Allocation of xxx exceeds 10% of system memory #20

Open
kiang opened this issue Jan 1, 2020 · 2 comments
Labels
good first issue Good for newcomers

Comments

@kiang

kiang commented Jan 1, 2020

Script being run: https://github.com/kiang/bribes_data/blob/master/03_ckip.py
Input file (JFULL field): https://github.com/kiang/bribes_data/blob/master/filter/200610/%E8%87%BA%E7%81%A3%E9%AB%98%E7%AD%89%E6%B3%95%E9%99%A2%E8%87%BA%E4%B8%AD%E5%88%86%E9%99%A2%E5%88%91%E4%BA%8B/TCHM%2C95%2C%E9%81%B8%E4%B8%8A%E8%A8%B4%2C1051%2C20061025%2C1.json

I searched online, and the explanations I found say the batch size needs to be adjusted. What would you generally recommend?

@jacobvsdanniel
Collaborator

jacobvsdanniel commented Jan 2, 2020

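Very long sentences in the input list slow processing down considerably and use a lot of memory; that may be the cause here.

You could consider splitting the text into sentences at line breaks, for example: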
pos([data['JFULL']]) -> pos(data['JFULL'].split("\n"))
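A minimal Python sketch of that suggestion, assuming the JFULL value is a plain string with `\n` line breaks. The helper name `split_lines` and the 200-character cap are hypothetical choices, not part of 03_ckip.py: blank lines are dropped, and any line that is still longer than the cap is cut into consecutive pieces so that no single input string stays very long. The returned list is what would be passed to the tagger in place of `[data['JFULL']]`.

```python
from typing import List


def split_lines(text: str, max_len: int = 200) -> List[str]:
    """Split text on line breaks, drop blank lines, and cut any line longer
    than max_len characters into consecutive max_len-sized pieces.

    max_len=200 is an arbitrary, hypothetical cap; tune it to your memory budget.
    """
    pieces: List[str] = []
    for line in text.split("\n"):
        line = line.strip()  # remove indentation/trailing spaces used for layout
        if not line:
            continue  # blank lines carry no text to tag
        for start in range(0, len(line), max_len):
            pieces.append(line[start:start + max_len])
    return pieces


if __name__ == "__main__":
    print(split_lines("第一行\n\n  第二行  \n"))
```

Capping the length as well as splitting on `\n` matters because a judgment may contain a long paragraph with no line breaks at all, which would otherwise still reach the model as one very long string.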

@kiang
Author

kiang commented Jan 2, 2020

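The trickier part may be that court judgments conventionally use line breaks for layout. If the text is fed in line by line, won't that produce a lot of fragmented phrases and lead to mis-segmentation?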
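One possible middle ground, sketched in Python under the assumption that the line breaks in these judgments are purely layout: join the lines back together, then split on full-width sentence-ending punctuation (。！？；) rather than on `\n`. The name `split_sentences`, the punctuation set, and the 200-character cap are all hypothetical choices, not something confirmed in this thread.

```python
import re
from typing import List

# Hypothetical boundary set: split *after* full-width sentence-ending marks.
# A zero-width lookbehind keeps the punctuation attached to its sentence
# (splitting on empty matches requires Python 3.7+).
_SENTENCE_END = re.compile(r"(?<=[。！？；])")


def split_sentences(text: str, max_len: int = 200) -> List[str]:
    """Rejoin layout line breaks, split on sentence-ending punctuation, and
    cut any remaining over-long sentence into max_len-sized pieces."""
    # Assumption: line breaks are layout only. Chinese needs no space between
    # characters, so lines are concatenated directly.
    joined = "".join(line.strip() for line in text.split("\n"))
    result: List[str] = []
    for sentence in _SENTENCE_END.split(joined):
        if not sentence:
            continue  # a trailing boundary yields an empty string
        for start in range(0, len(sentence), max_len):
            result.append(sentence[start:start + max_len])
    return result


if __name__ == "__main__":
    print(split_sentences("甲乙\n丙。丁！\n戊"))
```

The trade-off: headings or list items that lack closing punctuation are merged into the following sentence, while words that a layout break had split across two lines are put back together. The length cap still bounds memory if a "sentence" runs on for a whole paragraph.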

jacobvsdanniel added the "good first issue" (Good for newcomers) label on Feb 29, 2020