Add file-based reconciliation service hangs #5
When trying to add a reconciliation service from a local RDF file (Turtle syntax), the system opens the import (spinning wheel) pane, but it does not go past this window.
Comments
I'll look into it ASAP. |
I have the same issue with the file-based reconciliation service. I cannot add an .nt file!? |
I'm working on it. Updating some libraries in the rdf extension caused all this havoc. |
@erajabi Can you please provide more details and your .nt file, or a part of it? I was able to add Locations from NY Times data as a reconciliation service. |
I provided a list of countries, which I got from DBpedia in NT or RDF/XML. I added the file to LODRefine. It hangs on the "Adding new reconciliation service" status. You can find the example in DBpedia by running this query. I exported the file in NT or RDF/XML, about 388K in size. |
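The query link itself was not preserved in this thread. Judging from the result variables (country, name, label) visible in the RDF/XML excerpt quoted later in the thread, it was presumably something along these lines; this is a hypothetical reconstruction, not the original query:

```sparql
# Hypothetical reconstruction of the DBpedia countries query --
# the original query link was not preserved in this thread.
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?country ?name ?label
WHERE {
  ?country a dbo:Country ;              # every resource typed as a country
           rdfs:label ?name .
  OPTIONAL { ?country foaf:name ?label }  # ?label is only bound when present
  FILTER (lang(?name) = "en")           # keep English labels only
}
```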
@erajabi Unfortunately, DBpedia query results are not formatted in a way that is useful for file-based reconciliation. Please take a look at one of the datasets from NY Times and compare it to the RDF/XML output of the query you tried to use. You'll see the difference in the structure of the files. While NY Times data can be imported, LODRefine cannot parse the DBpedia result. @paulzh When using large datasets for file-based reconciliation, it can take very long before the file is indexed (hence the spinning wheel), mostly due to the Jena library this extension relies upon. I updated the library and tried to speed up the import as much as possible. Some performance issues might still remain. Please note that if you have large datasets you want to reconcile with, it might be better to install the open-source version of Virtuoso and set up a SPARQL endpoint for your data. Refine itself is an awesome tool, but it has its limitations, and so do extensions. I'm closing this issue. |
First, let me make this clear: you are absolutely right about the file being valid, but the structure of the file is not suitable for import. This is your file, a SPARQL result set serialized as RDF/XML:

```xml
<rdf:RDF xmlns:res="http://www.w3.org/2005/sparql-results#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:nodeID="rset">
    <rdf:type rdf:resource="http://www.w3.org/2005/sparql-results#ResultSet" />
    <res:resultVariable>country</res:resultVariable>
    <res:resultVariable>name</res:resultVariable>
    <res:resultVariable>label</res:resultVariable>
    <res:solution rdf:nodeID="r0">
      <res:binding rdf:nodeID="r0c0"><res:variable>country</res:variable><res:value rdf:resource="http://dbpedia.org/resource/Alamannia"/></res:binding>
      <res:binding rdf:nodeID="r0c1"><res:variable>name</res:variable><res:value xml:lang="en">Alamannia</res:value></res:binding>
    </res:solution>
    <res:solution rdf:nodeID="r1">
      <res:binding rdf:nodeID="r1c0"><res:variable>country</res:variable><res:value rdf:resource="http://dbpedia.org/resource/Alamannia"/></res:binding>
      <res:binding rdf:nodeID="r1c1"><res:variable>name</res:variable><res:value xml:lang="en">Alamannia</res:value></res:binding>
    </res:solution>
  </rdf:Description>
</rdf:RDF>
```

And this is how it should be formatted for import (notice the `<rdf:Description>` tags):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:blah="http://example.com/blah#">
<rdf:Description rdf:about="http://data.nytimes.com/N7749429140577003771">
<rdfs:label>Afghanistan</rdfs:label>
<blah:test rdf:resource="http://localhost:3333/capital_city/Tahiti"/>
</rdf:Description>
<rdf:Description rdf:about="http://data.nytimes.com/66220885864277068001">
<rdfs:label>Albania</rdfs:label>
<blah:test rdf:resource="http://localhost:3333/capital_city/Tirana"/>
</rdf:Description>
</rdf:RDF>
```

The triples (from Kasabi) you pasted are from another dataset, and now I'm not quite sure what you're trying to achieve. If you explain your case in more detail, maybe I can provide some suggestions on what to do and how to do it. About reconciliation time: it depends on the service and the amount of data you're trying to reconcile. It can take from a few minutes to half an hour or even more. Usually, this has little to do with LODRefine (and the rdf-extension). |
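For comparison, the same import-friendly structure expressed in Turtle (a direct translation of the RDF/XML sample above):

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix blah: <http://example.com/blah#> .

# One resource per subject, each with an rdfs:label -- the shape
# the file-based reconciliation import expects.
<http://data.nytimes.com/N7749429140577003771>
    rdfs:label "Afghanistan" ;
    blah:test <http://localhost:3333/capital_city/Tahiti> .

<http://data.nytimes.com/66220885864277068001>
    rdfs:label "Albania" ;
    blah:test <http://localhost:3333/capital_city/Tirana> .
```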
It is clear that the RDF file I sent is not from DBpedia. That was a sample, valid RDF file which takes a long time to add. I just wanted to be sure whether adding a file works correctly or not. As I mentioned, it hangs. |
Thank you for the details. From what you wrote, LODRefine should be able to handle your data. Some issues you're experiencing are probably due to the outdated LODRefine Windows binary on SourceForge. I plan to update it in the next few days. |
Thanks in advance. |
@erajabi Requests to DBpedia have been timing out (or taking several minutes to return results) if the reconciliation type was anything but owl:Thing. I tested the reconciliation queries (used in reconciliation requests) in the DBpedia web interface and got the same performance issues as in LODRefine. The funny thing is that these same requests worked just fine in the past. Removing a small part of the SPARQL query improved performance for types other than owl:Thing with DBpedia. I was able to reconcile your dataset. I did some cleaning first. After that, I reconciled the data with DBpedia twice: once with the type set to http://dbpedia.org/ontology/Place and once with the type set to http://dbpedia.org/ontology/Country. The cleaned and reconciled dataset is available here. If you have any questions related to reconciliation or the resulting dataset, please email me; I'll be glad to help. |
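For context, a label-based lookup of this general shape (a hypothetical sketch, not the extension's exact query) is the kind of request that slows down once a type constraint other than owl:Thing is added:

```sparql
# Hypothetical sketch of a type-constrained reconciliation query;
# the extension's actual query is not shown in this thread.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>

SELECT DISTINCT ?entity ?label
WHERE {
  ?entity rdfs:label ?label ;
          a dbo:Country .                      # the type constraint that hurts performance
  FILTER (lang(?label) = "en")
  FILTER (regex(str(?label), "albania", "i"))  # match the cell value being reconciled
}
LIMIT 10
```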
@sparkica: Thanks for your time and effort. I found this tool very useful. I could import the reconciled dataset and am now going through the data. Regarding the timeout issue, could you please tell me when I can have the new version? Did you resolve the timeout issue? How did you reconcile the data against DBpedia? Did DBpedia propose dbo:Country or dbo:Place to you, or did you specify the type yourself? It would be great if you could clarify these points, as I want to test other data as well and do the reconciliation myself. You absolutely helped with this issue; I really appreciate it. |
@erajabi The code here has been updated. If you have a JDK (javac) installed, you can clone the repository, build it, and you should be able to run it with refine.bat. You'll have to wait a little longer for a new binary version for Windows, as I plan to fix some more issues before building it. |
Unfortunately, after adding a simple file in 7.0.1 (to add a file-based service) containing the following sample rows: .... |
The RDF extension uses the Jena library to read RDF files (which are later used for reconciliation), so the input has to be in one of the RDF formats Jena understands. Your sample rows make Jena throw a SAXParseException: "Content is not allowed in prolog." I updated your sample to proper Turtle and saved it into test.ttl.
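The Turtle snippet itself was not preserved in this thread; a minimal valid Turtle file of the kind Jena parses cleanly (a hypothetical sample, not the original test.ttl) might look like:

```turtle
# Hypothetical sample -- the original test.ttl was not preserved in this thread.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.com/countries/> .

ex:afghanistan rdfs:label "Afghanistan"@en .
ex:albania     rdfs:label "Albania"@en .
ex:algeria     rdfs:label "Algeria"@en .
```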
I was able to import test.ttl and use it to reconcile from file. |
In my opinion, if it uses Jena it should be able to read either NT or TTL files. I am again testing a large N-Triples file (around 500 MB) and it seems to take a very long time. Can we have a progress indicator for reading the file (a percentage read, or something like that)? I also think that if the file cannot be read, the tool should inform the user, right? |