Update to use Python 3.5 #48

Open
janetriley opened this issue Jun 2, 2016 · 20 comments
Comments

@janetriley
Collaborator

janetriley commented Jun 2, 2016

Update the project to be compatible with Python 3.5 so we have the option to use asyncio.
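As a rough illustration of what that would make possible, here is a minimal sketch of the `async`/`await` syntax introduced in Python 3.5. The `fetch_feed` coroutine and the feed names are hypothetical stand-ins, not Baleen code, and `asyncio.sleep` simulates network latency instead of making a real request.

```python
import asyncio


async def fetch_feed(name, delay):
    # Hypothetical stand-in for a network fetch; sleep simulates latency.
    await asyncio.sleep(delay)
    return "{}: fetched".format(name)


async def fetch_all(feeds):
    # Run all fetches concurrently; gather returns results in input order.
    return await asyncio.gather(*(fetch_feed(n, d) for n, d in feeds))


def run(feeds):
    # An explicit loop is used because this thread targets 3.5;
    # asyncio.run (added in 3.7) is the shorter spelling on newer versions.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(fetch_all(feeds))
    finally:
        loop.close()


results = run([("feed-a", 0.02), ("feed-b", 0.01)])
```

Both simulated fetches wait at the same time, so the total run time is close to the longest delay, not the sum of the two.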

@janetriley janetriley changed the title from "Update to use Python 3" to "Update to use Python 3.5" Jun 2, 2016
@bbengfort bbengfort mentioned this issue Jun 2, 2016
3 tasks
@bbengfort bbengfort added this to the Version 0.4 milestone Jun 2, 2016
@bbengfort
Member

bbengfort commented Jun 2, 2016

  • Make sure that the tests work
  • Drop 2.7 dependency from travis
  • Ensure Python 3 compatibility in all packages.

@janetriley
Collaborator Author

In addition there are references to 2.7 in

  • README.md
  • docs/index.md
  • setup.py
  • Dockerfile-app
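One way to track down such references is a small scan like the sketch below. It is only an illustration: the helper name `find_version_references` is hypothetical, the match is a plain substring test (so it would also flag strings such as `12.7`), and the demo runs against a throwaway file, not the real repository.

```python
import os
import tempfile


def find_version_references(paths, needle="2.7"):
    """Return (path, line_number, line) for each line containing needle."""
    hits = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    hits.append((path, number, line.rstrip("\n")))
    return hits


# Demo on a temporary file that mimics a setup.py with a 2.7 classifier.
with tempfile.TemporaryDirectory() as workdir:
    sample = os.path.join(workdir, "setup.py")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("name = 'demo'\nclassifier = 'Python :: 2.7'\n")
    hits = find_version_references([sample])
    found = [(os.path.basename(p), n, text) for p, n, text in hits]
```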

@will2041
Collaborator

will2041 commented Jun 2, 2016

Pull request that includes code changes and Travis update.

@will2041
Collaborator

will2041 commented Jun 2, 2016

Have another pull request out that takes care of the documentation updates and removes an unused method. Only work after that will be any updates needed on the docker front.

@bbengfort
Member

@will2041 is this complete now? I'm using Python 3.5 and everything seems to be fine.

@will2041
Collaborator

will2041 commented Jun 5, 2016

Basically. The Dockerfile still uses 2.7 and there are a few remaining references to it, but I think that's it. I did run into some weird behavior with the export command using bin/baleen, but I'm not sure it's a 3.5 problem.

This could probably be closed. I think there's a separate item for Docker updates.

@bbengfort
Member

@will2041 -- ok this can be closed; I'm just hesitant to actually push to production, especially since things have been running so well! We may have to find a time where we're both available to try to do the release together and push to production - any thoughts when?

@will2041
Collaborator

will2041 commented Aug 8, 2016

I suppose a weekend is easiest for coordinating schedules. I'm free Saturday, but after that I've got visitors or am traveling until after Labor Day.

@bbengfort
Member

So uh, I guess you meant this Saturday? I guess it'll have to keep until after Labor Day then! Sorry about that. Want to get something on the calendar?

@will2041
Collaborator

Yeah, let's schedule something. I sent you an invite for the 10th. Maybe if Labor Day weekend ends up being free we could move it up, but I probably won't know that until the last minute.

@bbengfort
Member

So the 10th I'm teaching -- though I could do it later in the evening; and like I said, I'll be driving back from North Carolina on the 17th; so 24th? Labor Day weekend could work.

Ben


@will2041
Collaborator

Ha! Oh man, now we're just pushing it out crazy far. Sundays work? 11th? My parents are in town that weekend of the 24th... How about some evening during the week? I could maybe swing that.

@bbengfort
Member

This is my life - scheduling months in advance; seriously ...

Sundays do work but in the evenings for me, not the morning. Want to do the 11th anytime after 2pm EST?

@will2041
Collaborator

Sundays are wonderful. I updated the invite to 3PM EST on the 11th.

@bbengfort
Member

Perfect, we figured it out!

@will2041
Collaborator

Updates before push:

Master branch change - https://github.com/bbengfort/baleen/blob/master/baleen/exceptions.py#L69 TimeoutError is already a built-in exception in Python 3 (a subclass of OSError)
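To illustrate the clash: since Python 3.3, `TimeoutError` is a built-in subclass of `OSError`, so a module-level class with the same name shadows it inside that module. Below is a minimal sketch of one possible way around it, which gives the project exception a distinct name. `AppError` and `FetchTimeoutError` are hypothetical names chosen for this example, not necessarily what the project uses.

```python
import builtins


class AppError(Exception):
    """Hypothetical base class for project-specific errors."""


class FetchTimeoutError(AppError, TimeoutError):
    """Distinct name, so the built-in TimeoutError is not shadowed.

    Inheriting from the built-in as well lets `except TimeoutError`
    handlers catch it.
    """


# The built-in exists on Python 3.3+ and derives from OSError.
builtin_is_os_error = issubclass(builtins.TimeoutError, OSError)

try:
    raise FetchTimeoutError("feed did not respond in time")
except TimeoutError as exc:
    caught = type(exc).__name__
```

Whether the project exception should also inherit from the built-in is a design choice. Dropping `TimeoutError` from the bases keeps the two hierarchies fully separate.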

@will2041
Collaborator

Example of current error when running ingestion locally:

baleen.ingest INFO [11/Sep/2016:12:41:27 -0700] -- MongoIngestor job baf0c464-7857-11e6-89aa-60f81dac6496 started
baleen.ingest ERROR [11/Sep/2016:12:41:38 -0700] -- Post Error for feed Washington Post: Breaking News, World, US, DC News & Analysis on entry 4: Tried to save duplicate unique keys (E11000 duplicate key error collection: baleen.posts index: url_1 dup key: { : "https://www.washingtonpost.com/politics/clinton-holds-lead-over-trump-in-new-poll-but-warning-signs-emerge/2016/09/10/800dee0c-76c8-11e6-b786-19d0cb1e..." })
baleen.ingest ERROR [11/Sep/2016:12:41:38 -0700] -- Post Error for feed Washington Post: Breaking News, World, US, DC News & Analysis on entry 6: 'NoneType' object has no attribute 'encode'
<<SKIPPED 59 more entries like above and below lines>>
baleen.ingest ERROR [11/Sep/2016:12:41:41 -0700] -- Post Error for feed Washington Post: Breaking News, World, US, DC News & Analysis on entry 78: 'NoneType' object has no attribute 'encode'
baleen.ingest ERROR [11/Sep/2016:12:41:57 -0700] -- Ingestion Error: 'PostWrangler' object has no attribute 'title'
baleen.ingest CRITICAL [11/Sep/2016:12:41:57 -0700] -- MongoIngestor job baf0c464-7857-11e6-89aa-60f81dac6496 failed!

@will2041
Collaborator

will2041 commented Sep 12, 2016

Well, I got it mostly working. Only weird thing is that the output has some errors:

Processed 35 (1 unchanged) feeds (5 minutes 25 seconds): 659 posts with 62 errors

52 errors are:

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url

But then there are 10 like this:

baleen.ingest ERROR [11/Sep/2016:16:43:44 -0700] -- Post Error for feed Washington Post: Breaking News, World, US, DC News & Analysis on entry 70: Post {'wp_uuid': 'd0905852-6eb7-11e5-b31c-d80d62b53e28', 'title_detail': {'value': 'This weekend’s open houses in D.C., Maryland, Virginia', 'base': 'http://feeds.washingtonpost.com/rss/homepage', 'language': None, 'type': 'text/plain'}, 'content': None, 'pubdate': None, 'url': 'https://www.washingtonpost.com/realestate/this-weekends-open-houses-in-dc-maryland-virginia/2015/10/09/d0905852-6eb7-11e5-b31c-d80d62b53e28_story.html', 'tags': [], 'title': 'This weekend’s open houses in D.C., Maryland, Virginia', 'links': [{'href': 'https://www.washingtonpost.com/realestate/this-weekends-open-houses-in-dc-maryland-virginia/2015/10/09/d0905852-6eb7-11e5-b31c-d80d62b53e28_story.html', 'rel': 'alternate', 'type': 'text/html'}], 'guidislink': False} does not contain any content

The logging is new. Previously the save to Mongo failed because the content field was None, and None can't be encoded to compute the unique hash. With what I have in my workspace we no longer fail, but these contentless posts are silently dropped.
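The failure mode and one possible guard can be sketched as follows. This is not the project's code: the function names are hypothetical, and SHA-256 is an assumption, since the thread does not say which hash is used. The idea is to check for missing content before hashing and to keep the contentless posts in a separate list so they can be counted or logged.

```python
import hashlib


def content_signature(content):
    """Return a hex digest for post content, or None when there is none.

    SHA-256 is an assumption made for this sketch.
    """
    if content is None:
        # Calling None.encode() is what raises the AttributeError
        # "'NoneType' object has no attribute 'encode'".
        return None
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def partition_posts(posts):
    """Split posts into (saveable, contentless) so none vanish silently."""
    saveable, contentless = [], []
    for post in posts:
        if content_signature(post.get("content")) is None:
            contentless.append(post)
        else:
            saveable.append(post)
    return saveable, contentless


posts = [
    {"title": "With body", "content": "<p>hello</p>"},
    {"title": "Open houses", "content": None},
]
saveable, contentless = partition_posts(posts)
```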

@will2041
Collaborator

Updating status months later:

  • We've switched to having the DistrictDataLabs account own the repo
  • No production push with the 3.5 code yet
  • Python 3.6 has been released in the interim...

Next steps:

  • Evaluate deployment process for repeatability
  • Try things out with Python 3.6
  • Update Docker image

@will2041
Collaborator

Tried everything out with Python 3.6 locally and it all seems to work. I'm going to switch gears and look into deployment to see if I can get all this running (and repeatable/documented).
