Importing books #1

Open
benjamingeer opened this issue Oct 10, 2019 · 3 comments

benjamingeer commented Oct 10, 2019

With the file the-city-of-god-vol-1.txt (1.3M of text and 170,580 standoff tags):

  • 2 minutes to create the resource
  • 5 minutes to read it back with all its markup

With the current settings in webapiJavaRunOptions in KnoraBuild.sbt (1G of heap space), Knora threw an OutOfMemoryError at the end of the Twirl template generateInsertStatementsForCreateValue.scala.txt. I changed the heap settings to -Xms3G -Xmx4G.

Then when GraphDB processed the insert, it also crashed with an OutOfMemoryError. I increased its heap space to -Xms3G -Xmx5G.
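
When changing heap options like these, it can help to confirm that the running JVM actually picked them up. The following is a minimal sketch, not part of Knora or GraphDB; the class name `HeapCheck` and its helper methods are hypothetical. It uses only the standard `Runtime.maxMemory()` call, whose result can be slightly lower than the `-Xmx` value depending on the garbage collector.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of Knora or GraphDB.
final class HeapCheck {
    // Converts a byte count to whole mebibytes (rounding down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Maximum amount of memory the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return toMiB(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        // Run with e.g. `java -Xms3G -Xmx4G HeapCheck` to see the effective limit.
        System.out.printf(Locale.ROOT, "max heap: %d MiB%n", maxHeapMiB());
    }
}
```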

benjamingeer commented Oct 10, 2019

complete-works-of-shakespeare.txt (5.5M of text) can't be imported: the byte array needed to hold the SPARQL request that creates the resource would be longer than the maximum value of an int in Java (2^31 − 1, about 2G), which is also the upper bound on array length. In OpenJDK, this causes an integer overflow, resulting in a NegativeArraySizeException because of bug JDK-8188099.
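
The general mechanism can be sketched in a few lines. This is an illustration only, not the code path in Knora or in the JDK: the class `OverflowDemo`, its methods, and the buffer size are hypothetical. It shows how an `int` size computation that passes `Integer.MAX_VALUE` wraps around to a negative number, how allocating an array with that size throws `NegativeArraySizeException`, and how `Math.multiplyExact` reports the overflow instead of wrapping.

```java
// Hypothetical demo for illustration; not taken from Knora or the JDK.
final class OverflowDemo {
    // Naive growth: doubling an int silently wraps past Integer.MAX_VALUE.
    static int naiveDouble(int capacity) {
        return capacity * 2;
    }

    // Checked growth: throws ArithmeticException instead of wrapping.
    static int checkedDouble(int capacity) {
        return Math.multiplyExact(capacity, 2);
    }

    // Returns true if allocating a byte array of this size
    // fails with NegativeArraySizeException.
    static boolean allocationFailsAsNegative(int size) {
        try {
            byte[] unused = new byte[size];
            return false;
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int capacity = 1_500_000_000;        // assumed size, still below the int limit
        int wrapped = naiveDouble(capacity); // 3,000,000,000 does not fit in an int

        System.out.println("wrapped size: " + wrapped);
        System.out.println("negative allocation fails: " + allocationFailsAsNegative(wrapped));

        try {
            checkedDouble(capacity);
        } catch (ArithmeticException e) {
            System.out.println("checked arithmetic reports: " + e.getMessage());
        }
    }
}
```

Checked arithmetic does not make a request of this size fit in one array; it only turns the silent wrap-around into an explicit error at the point where the size is computed.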

benjamingeer commented Oct 10, 2019

Imported books whose text was about 2 MB or less (the largest, The Brothers Karamazov, has a textLength of 2,148,741):

| title | author | textLength (xsd:integer) | standoffCount (xsd:integer) |
|---|---|---|---|
| The City of God, Volume I | Aurelius Augustine | 1429450 | 170580 |
| Ulysses | James Joyce | 1671273 | 222164 |
| A Tale of Two Cities | Charles Dickens | 845686 | 106972 |
| The Mysterious Island | Jules Verne | 1225416 | 159077 |
| The City of God, Volume II | Aurelius Augustine | 1493907 | 180829 |
| The Canterbury Tales | Geoffrey Chaucer | 1691072 | 212687 |
| Twenty Years After | Alexandre Dumas, Pere | 1525395 | 209739 |
| Notre-Dame de Paris | Victor Hugo | 1172917 | 153688 |
| Adventures of Huckleberry Finn, Complete | Mark Twain (Samuel Clemens) | 649502 | 83701 |
| Ivanhoe | Walter Scott | 1221984 | 156357 |
| The Federalist Papers | Alexander Hamilton, John Jay, and James Madison | 1243931 | 147323 |
| Adventures of Sherlock Holmes | A. Conan Doyle | 646613 | 84462 |
| Pride and Prejudice | Jane Austen | 769430 | 85927 |
| The Idiot | Fyodor Dostoyevsky | 1503848 | 187924 |
| Jane Eyre | Charlotte Bronte | 1150074 | 134899 |
| Madame Bovary | Gustave Flaubert | 729520 | 92523 |
| Hard Times | Charles Dickens | 650147 | 84955 |
| Annals of the Turkish Empire, from 1591 to 1659 | Mustafa Naima | 1202285 | 148615 |
| Dracula | Bram Stoker | 940050 | 113273 |
| Emma | Jane Austen | 982327 | 116913 |
| Little Women | Louisa May Alcott | 1127269 | 137047 |
| Grimms’ Fairy Tales | The Brothers Grimm | 590019 | 77981 |
| The Brothers Karamazov | Fyodor Dostoyevsky | 2148741 | 272277 |
| The Scarlet Letter | Nathaniel Hawthorne | 548936 | 66236 |
| Wuthering Heights | Emily Bronte | 727518 | 83038 |
| Gulliver's Travels | Jonathan Swift | 644229 | 76522 |
| The Social Cancer | José Rizal | 1114659 | 136282 |
| The Republic | Plato | 1305740 | 155832 |
| Our Mutual Friend | Charles Dickens | 2016245 | 268073 |
| Little Dorrit | Charles Dickens | 2076555 | 265140 |
| Moby Dick; or The Whale | Herman Melville | 1344452 | 171837 |
| From the Earth to the Moon | Jules Verne | 614536 | 73790 |
| Swann's Way | Marcel Proust | 1197050 | 135968 |
| Great Expectations | Charles Dickens | 1107826 | 137539 |
| The Iliad of Homer | Homer | 1225502 | 160361 |

@benjamingeer

In the end I gave Knora -Xmx8G and GraphDB -Xmx10G.
