zimdump stops if cannot create redirect because of invalid filename #190

Open
asashnov opened this issue Nov 20, 2020 · 9 comments

@asashnov
Contributor

zim-tools(5ed81f8)
zimdump stops if it cannot create a redirect symlink because of an invalid filename. When dumping regular files, it catches the exception and continues.

$ ~/oz/zim-tools/build/src/zimdump dump --redirect --dir=wikipedia_en_100_maxi    ZIM.files/wikipedia_en_100_maxi_2020-05.zim
Exception: Error creating symlink from United_States to wikipedia_en_100_maxi/A/United_States_of_America/Introduction

$ du -hs wikipedia_en_100_maxi
1,7M	wikipedia_en_100_maxi

$ find wikipedia_en_100_maxi -type f | wc -l
16

$ find wikipedia_en_100_maxi -type f 
wikipedia_en_100_maxi/A/List_of_prime_ministers_of_India
wikipedia_en_100_maxi/A/List_of_United_States_cities_by_population
wikipedia_en_100_maxi/A/Protein
wikipedia_en_100_maxi/A/Hippopotamus
wikipedia_en_100_maxi/A/Spider
wikipedia_en_100_maxi/A/Association_football
wikipedia_en_100_maxi/-/style.css
wikipedia_en_100_maxi/-/j/js_modules/script.js
wikipedia_en_100_maxi/-/j/js_modules/images_loaded.min.js
wikipedia_en_100_maxi/-/j/js_modules/article_list_home.js
wikipedia_en_100_maxi/-/j/js_modules/node_module/details-element-polyfill/dist/details-element-polyfill.js
wikipedia_en_100_maxi/-/j/js_modules/masonry.min.js
wikipedia_en_100_maxi/-/s/css_modules/inserted_style.css
wikipedia_en_100_maxi/-/s/css_modules/style.css
wikipedia_en_100_maxi/-/s/css_modules/content.parsoid.css
wikipedia_en_100_maxi/-/s/css_modules/mobile_main_page.css

Without the --redirect option, zimdump handles the exceptions:

$ ~/oz/zim-tools/build/src/zimdump dump --dir=wikipedia_en_100_maxi    ZIM.files/wikipedia_en_100_maxi_2020-05.zim
Wrote wikipedia_en_100_maxi/A/United_States_of_America/Introduction to wikipedia_en_100_maxi/_exceptions/A%2fUnited_States_of_America%2fIntroduction
Wrote wikipedia_en_100_maxi/A/United_States_of_America/OldPage to wikipedia_en_100_maxi/_exceptions/A%2fUnited_States_of_America%2fOldPage
Wrote wikipedia_en_100_maxi/A/United_States to wikipedia_en_100_maxi/_exceptions/A%2fUnited_States


$ du -hs wikipedia_en_100_maxi
66M	wikipedia_en_100_maxi

$ find wikipedia_en_100_maxi -type f | wc -l
3710
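
The `_exceptions` paths in the non-redirect run above look like the entry path with every "/" replaced by "%2f". A minimal Python sketch of that mapping, inferred only from the three log lines and not taken from the zimdump source; the function name `exception_path` is hypothetical:

```python
import os

def exception_path(dump_dir, entry_path):
    """Build a fallback path for an entry that cannot be written in place."""
    # Assumption: only "/" is escaped (as "%2f"); that is all the log lines show.
    flat_name = entry_path.replace("/", "%2f")
    return os.path.join(dump_dir, "_exceptions", flat_name)

print(exception_path("wikipedia_en_100_maxi", "A/United_States"))
```

Flattening the path this way sidesteps the need for intermediate directories, which is why the plain dump can keep going where the symlink dump cannot.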
@FledgeXu

I ran into the same problem.

@kelson42 kelson42 added the bug label Feb 7, 2021
@kelson42 kelson42 pinned this issue Feb 7, 2021
@kelson42 kelson42 self-assigned this Feb 7, 2021
@kelson42
Contributor

kelson42 commented Feb 7, 2021

Similar to #213

@kelson42 kelson42 added this to the 3.2.0 milestone Sep 25, 2022
@adamlamar

I looked into this problem, but I'm not sure what a good solution would be. Because a file and a directory can't share the same name on Linux, some Wikipedia redirects break with the symlink approach. Example:

% => Percent_sign
%/% => Environment_variables

If % is created first, a file named % already exists. To create %/%, we first need to create a directory named %, and that fails.
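
The collision described above can be reproduced outside zimdump. This is a minimal Python sketch assuming a POSIX filesystem; `reproduce_collision` is a hypothetical helper and does not mirror zimdump's actual code path:

```python
import os
import tempfile

def reproduce_collision(root):
    """Create the redirect '%' first, then try to make room for '%/%'."""
    # Redirect "%" => "Percent_sign", dumped as a symlink named "%".
    os.symlink("Percent_sign", os.path.join(root, "%"))
    try:
        # Redirect "%/%" needs a directory named "%", but that name is taken.
        os.makedirs(os.path.join(root, "%"))
    except OSError as exc:
        return type(exc).__name__
    return None

with tempfile.TemporaryDirectory() as root:
    print(reproduce_collision(root))
```

The order matters: if %/% were dumped first, the directory % would exist and creating the symlink % would fail instead.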

@kelson42
Contributor

@adamlamar There is a file with all exceptions. Not sure about the details, but it should be listed in the exception file IMO.

@adamlamar

I'm able to see and understand the printed exception. The problem is the --redirect option can't save everything due to shared file and directory names. Either the symlink creation fails, or the content creation fails.

The easy fix would be to simply print the error and move on without throwing an exception. However, this leaves the content and/or redirects in a partially extracted state on the filesystem.
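
As a rough illustration of that "print and move on" approach, here is a Python sketch that attempts every redirect and collects the failures instead of stopping at the first one. It assumes a POSIX filesystem, and `create_redirects` with its (name, target) pairs is hypothetical, not zimdump's API:

```python
import os
import tempfile

def create_redirects(root, redirects):
    """Try every redirect; return (name, error message) for each failure."""
    failures = []
    for name, target in redirects:
        link_path = os.path.join(root, name)
        try:
            # Parent directories may collide with an earlier file or symlink.
            os.makedirs(os.path.dirname(link_path), exist_ok=True)
            os.symlink(target, link_path)
        except OSError as exc:
            # Record the problem and continue with the next redirect.
            failures.append((name, str(exc)))
    return failures

with tempfile.TemporaryDirectory() as root:
    redirects = [("%", "Percent_sign"), ("%/%", "Environment_variables")]
    for name, message in create_redirects(root, redirects):
        print(f"skipped {name}: {message}")
```

Returning the failure list keeps the partial state visible to the caller, who could then write it to an exceptions file rather than only to the console.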

@nickhuang99

@kelson42 Could a developer take a look at my patch at 190 so that this exception can be resolved? Thanks.

@kelson42
Contributor

@nickhuang99 yes, but please create a PR.

@nickhuang99

nickhuang99 commented Jun 18, 2024

@kelson42 sure. Created.

@nickhuang99

@kelson42 Hi, I have another PR that handles one case of this issue: symlink creation failing because the symlink was already created earlier. That can happen when dump is run repeatedly, when two ZIM files are merged into the same directory, etc.
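
One way to handle the "already created" case described above is to treat an existing symlink with the same target as success. The Python sketch below illustrates the idea under that assumption; it is not the content of the PR, and `ensure_symlink` with its return values is hypothetical:

```python
import os
import tempfile

def ensure_symlink(target, link_path):
    """Return 'created', 'unchanged', or 'conflict' without raising."""
    if os.path.islink(link_path):
        # A previous run may have left the same link behind; that is fine.
        if os.readlink(link_path) == target:
            return "unchanged"
        return "conflict"
    if os.path.lexists(link_path):
        # A regular file or directory already occupies the name.
        return "conflict"
    os.symlink(target, link_path)
    return "created"

with tempfile.TemporaryDirectory() as root:
    link = os.path.join(root, "United_States")
    print(ensure_symlink("United_States_of_America", link))
    print(ensure_symlink("United_States_of_America", link))
```

Reporting a conflict instead of overwriting leaves the decision (replace, skip, or log to the exceptions file) to the caller.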

@kelson42 kelson42 modified the milestones: 3.4.2, 3.5.0 Jul 8, 2024
@kelson42 kelson42 modified the milestones: 3.5.0, 3.6.0 Aug 29, 2024
5 participants