Freeze (memory/CPU chewed up) when trying to spell long string #18

Open
dylan-chong opened this issue Apr 27, 2019 · 1 comment

@dylan-chong

spell('ç§�ã�Ÿã�¡ã�¯ãƒ‰ãƒ¼ãƒ“ルã�«2009å¹´7月ã�«4泊ã�—ã�¾ã�—ã�Ÿã€‚ 地下鉄ã�‹ã‚‰2分ã€�ブãƒ\xadードウェイã�¾ã�§æ\xad©ã�„ã�¦ã‚‚ã€�ã��ã‚“ã�ªã�«æ°—ã�«ã�ªã‚Šã�¾ã�›ã‚“ã�§ã�—ã�Ÿã�‹ã‚‰ã€�立地ã�§ã‚‚ã�™ã�¦ã��ã�ªãƒ›ãƒ†ãƒ«ã�§ã�™ã€‚経営者ã�‹ã�¨æ€�ã‚�れるè€�夫婦ã�¨å¨˜ã�•ã‚“ã€�ã�‚ã�¨2人ã�®ã‚¹ã‚¿ãƒƒãƒ•ã�«å‡ºä¼šã�„ã�¾ã�—ã�Ÿã€‚ ゴスペルã�®æ‰€åœ¨åœ°ã‚’å°‹ã�\xadã�Ÿã‚‰ã€�ãƒ�ットã�§åœ°å›³ã‚’プリントã�—ã�¦ã��ã‚Œã�¦ã€�親切ã�«èª¬æ˜Žã�—ã�¦ã��ã‚Œã�¾ã�—ã�Ÿã€‚ æ\xad´å�²ã‚’ä¿�ã�¨ã�†ã�¨ã�¨ã�—ã�¦ã�„るニューヨークをæ\xad©ã��æ‹\xa0点ã�¨ã�—ã�¦ã€�最é�©ã�ªãƒ›ãƒ†ãƒ«ã�§ã�™ã€‚ スターãƒ�ックスã€�マクドナルドã€�ã‚\xadングãƒ�ーガも近ã��ã�«ã�‚ã‚Šã€�エンパイアステートビル迄10分弱ã�§ã�™ã�Œã€�コリアã�®çµŒå–¶ã�™ã‚‹ã‚³ãƒ³ãƒ“ニ兼飲食店も数軒有りã€�ホテルã�®è£�手ã�®é€šã‚Šã�«ã�¯æ¶ˆè²»ç¨Žç„¡æ–™ã�®ã‚³ãƒ³ãƒ“ニもã�‚ã�£ã�¦ä¾¿åˆ©ã�§ã�—ã�Ÿã€‚ 100å¹´ã�®æ\xad´å�²ã�¨ã�„ã�£ã�¦ã‚‚改装ã�•ã‚Œã�¦ã�„ã�¦æ¸…æ½”ã�ªãƒ›ãƒ†ãƒ«ã�§ã�™ã€‚手動ã�®ã‚¨ãƒ¬ãƒ™ãƒ¼ã‚¿ã‚‚å�°è±¡ã�«æ®‹ã‚Šã�¾ã�—ã�Ÿã€‚ 冷蔵庫ã�Œã�ªã�„点ã�¨ã€�ウインドウタイプã�®ã‚¨ã‚¢ã‚³ãƒ³ã�Œã�¡ã‚‡ã�£ã�¨ä¸�便ã�ªç‚¹ä»¥å¤–ã�Šå‹§ã‚�ã�§ã�™ã€‚')

This causes memory usage to climb past 6 GB in a matter of seconds.

Took me all day to figure this out!

It would be good to include some sort of blacklist of weird characters to prevent the mysterious memory hog, e.g. throw an error (and provide some API to check whether a string is spellable).
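A minimal sketch of what such a check could look like. Everything here is hypothetical: `is_spellable`, `safe_spell`, the 30-character cutoff and the English-only allowed set are illustrative assumptions, not part of the library.

```python
import string

# Hypothetical limits, chosen only for illustration.
MAX_WORD_LENGTH = 30
ALLOWED = frozenset(string.ascii_letters + "'-")  # assumes an English-only alphabet


def is_spellable(word, max_length=MAX_WORD_LENGTH):
    """Return True if `word` is non-empty, short enough, and uses only allowed characters."""
    return 0 < len(word) <= max_length and all(ch in ALLOWED for ch in word)


def safe_spell(word, spell):
    """Call `spell` only on inputs that pass the check; otherwise raise ValueError."""
    if not is_spellable(word):
        raise ValueError(f"refusing to spell-check {len(word)}-character input")
    return spell(word)
```

A caller would pass the real `spell` function as the second argument, so unspellable input fails fast with an error instead of eating memory.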

@dylan-chong dylan-chong changed the title Livelock (memory chew) when trying to spell weird character Freeze (memory/CPU chewed up) when trying to spell weird character Apr 27, 2019
@dylan-chong

This seems to happen with long words like

spell('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')

Maybe the length of the word is what screws it up.
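A rough count supports the length theory, assuming the corrector enumerates every string within two single-character edits (a Norvig-style approach; that is an assumption about the implementation, and the 26-letter alphabet is too). The function names are made up for this sketch.

```python
def edits1_count(n, alphabet_size=26):
    """Number of distance-1 candidates (duplicates included) for a word of length n."""
    deletes = n
    transposes = max(n - 1, 0)
    replaces = alphabet_size * n
    inserts = alphabet_size * (n + 1)
    return deletes + transposes + replaces + inserts


def edits2_upper_bound(n, alphabet_size=26):
    """Upper bound on distance-2 candidates: each distance-1 string has at most n + 1 characters."""
    return edits1_count(n, alphabet_size) * edits1_count(n + 1, alphabet_size)


print(edits1_count(20))         # 1105
print(edits1_count(200))        # 10825
print(edits2_upper_bound(200))  # over a hundred million strings
```

The distance-2 count grows quadratically with word length, so a 200-character input held in a list or set means on the order of a hundred million 200-character strings, which is consistent with the memory blow-up.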

@dylan-chong dylan-chong changed the title Freeze (memory/CPU chewed up) when trying to spell weird character Freeze (memory/CPU chewed up) when trying to spell long string Apr 27, 2019
filyp added a commit to filyp/autocorrect-deprecated that referenced this issue Sep 15, 2019
Optimize by using generators, so the possible typos don't have to be stored in memory.
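The idea can be sketched as follows. This is an illustrative reimplementation, not the code from the commit: the names `typos`, `double_typos` and `first_known`, and the lowercase English alphabet, are assumptions made for the example.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed alphabet for the sketch


def typos(word):
    """Lazily yield every string one edit away from `word`."""
    for i in range(len(word) + 1):
        head, tail = word[:i], word[i:]
        if tail:
            yield head + tail[1:]                      # delete one character
            for c in ALPHABET:
                yield head + c + tail[1:]              # replace one character
        if len(tail) > 1:
            yield head + tail[1] + tail[0] + tail[2:]  # swap adjacent characters
        for c in ALPHABET:
            yield head + c + tail                      # insert one character


def double_typos(word):
    """Lazily yield every string two edits away, one candidate at a time."""
    for first in typos(word):
        yield from typos(first)


def first_known(candidates, known):
    """Return the first candidate found in `known`, or None."""
    return next((c for c in candidates if c in known), None)
```

Because each candidate is produced on demand and discarded after the lookup, only one string is alive at a time, and the search can stop at the first hit instead of building the whole candidate set first.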

For long English words:

%timeit spell('disproporttionatelly')
before: 1 s ± 7.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
after: 762 ms ± 18.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit spell('indistimguishabble')
before: 821 ms ± 5.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
after: 619 ms ± 10 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

More importantly, I'm working on support for Polish, where the alphabet is larger and words tend to be longer, so the change is significant:

%timeit spell('gżegrzółka')
before: 1.51 s ± 67.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
after: 370 ms ± 16.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit spell('anarchokolektuwistycznychh')
before: 3.83 s ± 116 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
after: 2.2 s ± 11.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

It also solves issue phatpiglet#18.
Before, running spell('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
would consume all the RAM it could. Now it takes only 8 KB more than idle and doesn't freeze.
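One way to check a claim like this is with the standard library's `tracemalloc`. The sketch below uses a simplified, made-up candidate generator (insertions only) rather than the library's real one, so the numbers it prints only illustrate the list-versus-generator gap.

```python
import tracemalloc


def candidates(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Lazily yield strings formed by inserting one letter into `word` (simplified candidate set)."""
    for i in range(len(word) + 1):
        for c in alphabet:
            yield word[:i] + c + word[i:]


def peak_bytes(func):
    """Run `func` and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


word = "x" * 200
eager_peak = peak_bytes(lambda: len(list(candidates(word))))      # all candidates held at once
lazy_peak = peak_bytes(lambda: sum(1 for _ in candidates(word)))  # one candidate at a time
print(f"list: {eager_peak} bytes, generator: {lazy_peak} bytes")
```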

When using PyPy, the difference gets even more dramatic:

%timeit spell('disproporttionatelly')
before: 668 ms ± 19.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
after: 377 ms ± 25.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit spell('indistimguishabble')
before: 585 ms ± 15.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
after: 330 ms ± 29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit spell('gżegrzółka')
before: Gets killed because it eats up too much RAM
after: 166 ms ± 11.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit spell('anarchokolektuwistycznychh')
before: Gets killed because it eats up too much RAM
after: 994 ms ± 29.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)