[Bug] UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-17: ordinal not in range(128) #3

Closed
Kristinita opened this issue Apr 10, 2017 · 6 comments

@Kristinita

1. Summary

I get a stack trace when I use the deadlinks plugin.

2. Settings

My project: https://github.com/Kristinita/KristinitaPelican
The Nas-Izu.md file, which contains Cyrillic characters: https://github.com/Kristinita/KristinitaPelican/blob/master/content/Giologica/Nas-Izu.md

Part of my pelicanconf.py file:

PLUGIN_PATHS = ['pelican-plugins']
PLUGINS = [
    'pagefixer',
    'pelican_javascript',
    'section_number', 'interlinks', 'deadlinks'
]

DEADLINK_VALIDATION = True

DEADLINK_OPTS = {
    'archive': True,
    'classes': ['custom-class1', 'disabled'],
    'labels': True
}

3. Steps to reproduce

I run this command in a terminal:

pelican content --debug > DeadlinkDebug.txt 2>&1

See full output on Gist — https://gist.github.com/Kristinita/63c81829c196afd7dc68cbe5e3dba12a.

4. Expected behavior

No stack trace.

5. Actual behavior

ERROR: Could not process Giologica\Nas-Izu.md
  | UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-17: ordinal not in range(128)
  |___
  | Traceback (most recent call last):
  |   File "c:\python36\lib\site-packages\pelican\generators.py", line 629, in generate_context
  |     context_sender=self)
  |   File "c:\python36\lib\site-packages\pelican\readers.py", line 572, in read_file
  |     context=context)
  |   File "c:\python36\lib\site-packages\pelican\contents.py", line 153, in __init__
  |     signals.content_object_init.send(self)
  |   File "c:\python36\lib\site-packages\blinker\base.py", line 267, in send
  |     for receiver in self.receivers_for(sender)]
  |   File "c:\python36\lib\site-packages\blinker\base.py", line 267, in <listcomp>
  |     for receiver in self.receivers_for(sender)]
  |   File "D:\Kristinita\pelican-plugins\deadlinks\deadlinks.py", line 163, in content_object_init
  |     avail, success, code = get_status_code(url)
  |   File "D:\Kristinita\pelican-plugins\deadlinks\deadlinks.py", line 32, in get_status_code
  |     urlopen(url)
  |   File "c:\python36\lib\urllib\request.py", line 223, in urlopen
  |     return opener.open(url, data, timeout)
  |   File "c:\python36\lib\urllib\request.py", line 526, in open
  |     response = self._open(req, data)
  |   File "c:\python36\lib\urllib\request.py", line 544, in _open
  |     '_open', req)
  |   File "c:\python36\lib\urllib\request.py", line 504, in _call_chain
  |     result = func(*args)
  |   File "c:\python36\lib\urllib\request.py", line 1346, in http_open
  |     return self.do_open(http.client.HTTPConnection, req)
  |   File "c:\python36\lib\urllib\request.py", line 1318, in do_open
  |     encode_chunked=req.has_header('Transfer-encoding'))
  |   File "c:\python36\lib\http\client.py", line 1239, in request
  |     self._send_request(method, url, body, headers, encode_chunked)
  |   File "c:\python36\lib\http\client.py", line 1250, in _send_request
  |     self.putrequest(method, url, **skips)
  |   File "c:\python36\lib\http\client.py", line 1117, in putrequest
  |     self._output(request.encode('ascii'))
  | UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-17: ordinal not in range(128)

6. Environment

Operating system and version:
Windows 10 Enterprise LTSB 64-bit EN
Python:
3.6.1
Pelican:
3.7.1
BeautifulSoup4:
4.5.3

Thanks.

@silentlamb
Collaborator

Thank you for your bug reports, I love how detailed they are.

The cause of this stack trace was this link: http://www.wikireality.ru/wiki/Оффтопик, which was not percent-encoded before being passed to urlopen.
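
For anyone hitting the same error, here is a minimal sketch of percent-encoding with the standard library, assuming Python 3 and UTF-8 encoded paths. The helper name `percent_encode_url` is hypothetical and is not the plugin's actual code. It only escapes the path and query and leaves the host name untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_url(url):
    """Percent-encode non-ASCII characters in the path and query of a URL.

    Hypothetical helper for illustration; not taken from the deadlinks plugin.
    """
    parts = urlsplit(url)
    # Keep "/" and "%" so separators and existing escapes are not re-encoded.
    path = quote(parts.path, safe="/%")
    # Keep the usual query delimiters as well.
    query = quote(parts.query, safe="=&%+")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# The request line built from the raw URL cannot be encoded as ASCII,
# which is the failure shown at the bottom of the traceback.
try:
    "GET /wiki/Оффтопик HTTP/1.1".encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)

encoded = percent_encode_url("http://www.wikireality.ru/wiki/Оффтопик")
print(encoded)
```

The encoded URL contains only ASCII characters, so the request line can be sent without the codec error.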

@silentlamb silentlamb self-assigned this Apr 10, 2017
@Kristinita
Author

Kristinita commented Apr 11, 2017

@silentlamb, maybe this Stack Overflow answer will help you.

Thanks.

silentlamb added a commit that referenced this issue Apr 17, 2017
This change replaces urllib with requests (an additional dependency).
The reason is to handle Unicode-related issues much more easily,
and also to make Python 2.7 and Python 3.5 compatibility issues go
away.

This change also adds support (enabled by default) for treating timeouts
as skipped dead links (logged, but ignored).

Hopefully this fixes #3.
@silentlamb
Collaborator

silentlamb commented Apr 17, 2017

@Kristinita I've tested the fix and everything seemed to work properly, but I have two requests, if you don't mind.

  1. Check from your side whether the current master branch resolves the stack trace bug (just a sanity check, if it doesn't work please reopen the issue)
  2. Check whether all or some of the issues from [Bug] Warnings for good links #4 are gone (I've changed request handling a bit: timeouts, SSL errors, etc.)

@Kristinita
Author

@silentlamb, after the update I get a similar stack trace for all my articles and pages. Example:

ERROR: Could not process Sublime-Text\ValeriyaSpeller.md
  | TypeError: '>=' not supported between instances of 'NoneType' and 'int'
  |___
  | Traceback (most recent call last):
  |   File "c:\python36\lib\site-packages\pelican\generators.py", line 523, in generate_context
  |     context_sender=self)
  |   File "c:\python36\lib\site-packages\pelican\readers.py", line 572, in read_file
  |     context=context)
  |   File "c:\python36\lib\site-packages\pelican\contents.py", line 153, in __init__
  |     signals.content_object_init.send(self)
  |   File "c:\python36\lib\site-packages\blinker\base.py", line 267, in send
  |     for receiver in self.receivers_for(sender)]
  |   File "c:\python36\lib\site-packages\blinker\base.py", line 267, in <listcomp>
  |     for receiver in self.receivers_for(sender)]
  |   File "D:\Kristinita\pelican-plugins\deadlinks\deadlinks.py", line 183, in content_object_init
  |     if code >= 400 and code < 500:
  | TypeError: '>=' not supported between instances of 'NoneType' and 'int'

See full deadlinks output on Gist — https://gist.github.com/Kristinita/a2be9ec597752c9934f4e68cbd67908d.

Thanks.

@silentlamb silentlamb reopened this Apr 21, 2017
@silentlamb
Collaborator

OK, it seems I hadn't tested the timeout path properly (which is strange, because I thought I did). The bug was caused by a stupid typo:

availibility = False, instead of availibility = False

(hopefully...) fixed.

One more thing: I've exposed two previously hidden tuning parameters, since these may vary from person to person: the timeout duration, and a flag indicating whether each timeout counts as a failure (dead link) or is just logged as unavailable (not a dead link). By default, timeouts are skipped and the duration is set to 1000 ms.
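
To make the timeout behaviour concrete, here is a rough sketch of the idea, not the plugin's actual code. The function name, the `timeout_is_dead` flag, and the returned labels are hypothetical. The only parts taken from this thread are the 4xx range check from the traceback and the default of skipping timeouts. Treating 5xx responses as "unavailable" is an assumption.

```python
def classify_link(code, timeout_is_dead=False):
    """Classify a link from its HTTP status code.

    `code` is None when the request timed out. All names here are
    hypothetical and only illustrate the behaviour described above.
    """
    if code is None:
        # Guard first: comparing None with an int raises TypeError on Python 3.
        return "dead" if timeout_is_dead else "skipped"
    if 400 <= code < 500:
        return "dead"
    if code >= 500:
        # Assumption: server errors are reported but not treated as dead links.
        return "unavailable"
    return "ok"


for code in (200, 404, None):
    print(code, classify_link(code))
```

Checking for None before the range comparison avoids the TypeError shown in the traceback above, whatever value the timeout flag has.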

@Kristinita
Author

This problem is fixed for me, so I am closing the issue.

Thanks for a responsible approach to development!
