htdig cannot crawl urls with Hindi characters #14

Open
andy5995 opened this issue Nov 12, 2017 · 2 comments
Comments

@andy5995
Contributor

While running Jekyll locally, I ran htdig -i to crawl the Jekyll web server at http://localhost:4000. The console printed the following:

[2017-11-12 14:44:22] ERROR bad URI `/tag/कैनबिस/'.
[2017-11-12 14:44:22] ERROR bad URI `/tag/अध-ययन/'.
[2017-11-12 14:44:22] ERROR bad URI `/tag/मनःचिकित-सा/'.
[2017-11-12 14:44:22] ERROR bad URI `/tag/कोगनीटिव/'.
[2017-11-12 14:44:22] ERROR bad URI `/tag/दवा/'.
[2017-11-12 14:44:22] ERROR bad URI `/tag/चिकित-सा/'.
@martijndeb
Collaborator

I guess the whole String/Retriever/HtWord* code would need to be converted to handle Unicode, perhaps with iconv? I'm not experienced enough to take this on, though.
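
For illustration only, here is a minimal Python sketch of one way non-ASCII URL paths are commonly made ASCII-safe: percent-encoding their UTF-8 bytes (RFC 3986). It is not htdig code, and `encode_path` is a hypothetical helper. It assumes the tag paths are UTF-8 text; whether sending encoded paths would actually resolve the "bad URI" errors above is not confirmed.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a URL path (hypothetical helper, not part of htdig).

    Each non-ASCII character is encoded as its UTF-8 bytes in %XX form;
    "/" is left alone so the path segments stay intact.
    """
    return quote(path, safe="/", encoding="utf-8")


# Two of the paths from the log output in this issue.
paths = ["/tag/कैनबिस/", "/tag/दवा/"]

for path in paths:
    encoded = encode_path(path)
    # The encoded form is pure ASCII and decodes back to the original path.
    assert encoded.isascii()
    assert unquote(encoded) == path
    print(encoded)
```

A crawler that handled Unicode internally would presumably still need a step like this before putting a path on the wire, since a raw HTTP request line is expected to be ASCII.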

@andy5995
Contributor Author

That's a great hint, though (if you're right). Thanks!
