DBpedia Live
To set up a running DBpedia Live instance you need to set up a local Wikipedia mirror. We use mwdumper for this job.
First download the following files from the latest Wikipedia dump for the language you want:
- pages-articles.xml.bz2
- imagelinks.sql.gz
- image.sql.gz
- langlinks.sql.gz
- templatelinks.sql.gz
Then unzip the latest MediaWiki release into a folder visible to your Apache server.
You can also use this shell script to download the latest dumps for your language.
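As a rough illustration of the download step (not the script referred to above), the helper below prints one URL per required dump file. The `dump_urls` name and the `dumps.wikimedia.org/<lang>wiki/latest/` layout are assumptions; check them against the dump site for your language before relying on them.

```shell
# Print one download URL per required dump file for a wiki language code.
# ASSUMPTION: the "latest" directory layout on dumps.wikimedia.org.
dump_urls() {
  lang="$1"
  base="https://dumps.wikimedia.org/${lang}wiki/latest"
  for name in pages-articles.xml.bz2 imagelinks.sql.gz image.sql.gz \
              langlinks.sql.gz templatelinks.sql.gz; do
    printf '%s/%swiki-latest-%s\n' "$base" "$lang" "$name"
  done
}

# Example (resumes partial downloads):
#   dump_urls de | xargs -n 1 wget -c
```

Printing the URLs instead of downloading directly keeps the helper free of side effects, so the list can be inspected first.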
Edit the maintenance/tables.sql file of your MediaWiki installation and append DEFAULT CHARACTER SET binary
after every table definition. E.g.:
CREATE TABLE /*_*/user_former_groups (
-- Key to user_id
ufg_user int unsigned NOT NULL default 0,
ufg_group varbinary(255) NOT NULL default ''
) /*$wgDBTableOptions*/ DEFAULT CHARACTER SET binary;
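Editing every table definition by hand is tedious, so the change can be sketched with sed. The `add_binary_charset` name is hypothetical, and the filter only touches statements that end in `/*$wgDBTableOptions*/;`. Table definitions that end differently still need a manual edit.

```shell
# Read a tables.sql on stdin and write the patched version to stdout.
# Only statements ending in "/*$wgDBTableOptions*/;" are changed, and an
# already patched statement no longer matches, so re-running is harmless.
add_binary_charset() {
  sed 's|/\*\$wgDBTableOptions\*/[[:space:]]*;|/*$wgDBTableOptions*/ DEFAULT CHARACTER SET binary;|'
}

# Example:
#   add_binary_charset < maintenance/tables.sql > tables.binary.sql
```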
Create a MySQL database, e.g. dbpedia_live, and load tables.sql into that database. Then run the following command:
$ java -jar mwdumper.jar --format=sql:1.5 <dump-lang>-pages-articles.xml.bz2 | mysql -u <username> -p -f --default-character-set=utf8 <databasename>
If the import fails due to UTF-8 errors, try to "clean" the XML dump by piping it through iconv -f utf-8 -t utf-8 -c (or try a different dump). After a successful import, load the other table dumps (image, imagelinks, langlinks, templatelinks) into the database.
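The cleaning step might look like the sketch below. The `clean_utf8` name and the file names in the example are placeholders. Some iconv implementations may exit with a non-zero status when `-c` actually drops bytes, so do not treat the exit code alone as a failure.

```shell
# Copy stdin to stdout, dropping byte sequences that are not valid UTF-8.
clean_utf8() {
  iconv -f utf-8 -t utf-8 -c
}

# Example (hypothetical file names): write a cleaned copy of the dump,
# then pass the cleaned file to mwdumper as in the command above.
#   bzcat dewiki-pages-articles.xml.bz2 | clean_utf8 | bzip2 > clean-pages-articles.xml.bz2
```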
Since your Wikipedia mirror needs to keep up with the changes made to its source, you need to install the OAI extension on your clone.
$ cd /var/www/wikipedia/extensions/
$ git clone http://git.wikimedia.org/git/mediawiki/extensions/OAI.git
In your browser, go to http://localhost/wikipedia/ (or wherever you installed MediaWiki) and configure your copy of Wikipedia. Download the generated LocalSettings.php and place it in your wikipedia folder.
The OAI extension is configured in the LocalSettings.php file (adjust the credentials for the OAI source repository):
require_once("$IP/extensions/OAI/OAIHarvest.php");
$oaiSourceRepository = "http://<user>:<password>@<oaiserverurl>";
# OAI repository for update server
require_once("$IP/extensions/OAI/OAIRepo.php");
$oaiAuth = true;
$wgDebugLogGroups['oai'] = '<pathToLog>/oai.log';
Import the OAI tables into your Wikipedia database. Note that importing update_table.sql can take a long time and/or give errors; the best approach is to run the CREATE TABLE statement from the command line first, followed by the INSERT/UPDATE statements.
$ mysql dbpedia_live -uroot -p < OAI/update_table.sql
$ mysql dbpedia_live -uroot -p < OAI/oaiuser_table.sql
$ mysql dbpedia_live -uroot -p < OAI/oaiharvest_table.sql
$ mysql dbpedia_live -uroot -p < OAI/oaiaudit_table.sql
$ echo "INSERT INTO /*$wgDBprefix*/oaiuser(ou_name, ou_password_hash) VALUES ('dbpedia', md5('<aPasswordForLocalOAI>') );" | mysql dbpedia_live -u root -p
Before we forget it, create a new file pw.txt that contains your <aPasswordForLocalOAI>.
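The INSERT statement for the local OAI user can also be generated by a small helper, which avoids quoting mistakes when the password contains special characters. This is a sketch: the `oai_user_sql` name is hypothetical, and it assumes that no table prefix is configured, so the `/*$wgDBprefix*/` placeholder is omitted.

```shell
# Print the INSERT statement for a local OAI user (arguments: name, password).
# Backslashes and single quotes are escaped so the SQL string stays valid.
# ASSUMPTION: the wiki uses no table prefix.
oai_user_sql() {
  name=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/''/g")
  pass=$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e "s/'/''/g")
  printf "INSERT INTO oaiuser(ou_name, ou_password_hash) VALUES ('%s', md5('%s'));\n" "$name" "$pass"
}

# Example:
#   oai_user_sql dbpedia '<aPasswordForLocalOAI>' | mysql dbpedia_live -u root -p
```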
Import the previously downloaded files into the database. If you're using phpMyAdmin you'll most likely have to edit php.ini to increase the maximum POST size and my.ini to increase the maximum memory available. The BigDump script can help when importing big files.
You can also use this shell script to load all the relevant files into your database.
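For readers without the linked script, a command-line load of the gzipped table dumps could be sketched as follows. The `load_sql_dumps` name and the `DRY_RUN` switch are hypothetical, and the sketch assumes that MySQL credentials come from a client option file such as `~/.my.cnf`, so no `-u`/`-p` flags are passed.

```shell
# Load gzipped SQL dumps into a MySQL database, one file at a time.
# Usage: load_sql_dumps <database> <file.sql.gz>...
# With DRY_RUN=1 the commands are only printed, not executed.
# ASSUMPTION: credentials are read from a client option file.
load_sql_dumps() {
  db="$1"
  shift
  for f in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf 'gunzip -c %s | mysql %s\n' "$f" "$db"
    else
      # Stop at the first failing import so errors are not overlooked.
      gunzip -c "$f" | mysql "$db" || return 1
    fi
  done
}

# Example:
#   load_sql_dumps dbpedia_live image.sql.gz imagelinks.sql.gz langlinks.sql.gz templatelinks.sql.gz
```

Streaming through `gunzip -c` avoids keeping an uncompressed copy of each dump on disk.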
$ cd /var/www/wiki/extensions/OAI/
(or the equivalent path on your system; this is the one I recommend)
$ php oaiUpdate.php
(this actually starts the synchronization)
Attention! The synchronization does not include any kind of delay, so you will hammer the wikiproxy. Since the wikiproxy is relatively slow, it will stop responding and oaiUpdate.php will crash with an error. (This plugin needs to be refactored to introduce better exception handling, but for now the following hack works.)
To introduce a delay, open the OAIHarvest.php file and look for the following line:
function fetchURL( $url, &$resultCode ) {
Add the following line directly after it:
sleep(<delay>);
(where <delay> is the number of seconds between update calls; I recommend a value of at least 10.)
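The hack above can also be applied with a small filter instead of a manual edit. This is only a sketch: the `add_fetch_delay` name is hypothetical, and it writes to stdout so that the original file is left untouched for comparison. Running it twice on its own output would add a second sleep() call.

```shell
# Copy a PHP file from stdin to stdout, adding a sleep() call right after
# the line that opens fetchURL(). index() does a literal substring search,
# so no regular-expression escaping is needed.
# Usage: add_fetch_delay <seconds>
add_fetch_delay() {
  awk -v delay="$1" '
    { print }
    index($0, "function fetchURL(") > 0 { print "\t\tsleep(" delay ");" }
  '
}

# Example:
#   add_fetch_delay 10 < OAIHarvest.php > OAIHarvest.patched.php
```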
More information about how to install, set up and configure the OAI extension can be found here:
- http://www.mediawiki.org/wiki/Extension_talk:OAIRepository
- http://www.sciencemedianetwork.org/wiki/Mediawiki/OAI_mirror/OAIRepository
- http://wiki.dbpedia.org/DBpediaLiveTutorial?show_comments=1
$ git clone git://github.com/dbpedia/extraction-framework.git
$ cd extraction-framework
$ mvn clean install # Compiles the code
Adjust the settings in live.ini and live.xml according to your language and needs. Put the pw.txt file in the live folder. (Examples of preconfigured files for German can be found here)
Create a MySQL database for caching extracted triples, e.g. dbpedia_live_cache, and load the dbstructure.sql in that database.
$ mysql dbpedia_live_cache -uroot -p < dbstructure.sql
(NOTE: if you get an error when importing the SQL file, replace the "SET SESSION" entries with "SET GLOBAL" in the dbstructure.sql file.)
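The replacement mentioned in the note can be done with a one-line filter. The `session_to_global` name is hypothetical. Keep in mind that SET GLOBAL changes the server-wide value and requires a MySQL account with sufficient privileges.

```shell
# Rewrite every "SET SESSION" in an SQL file to "SET GLOBAL" (stdin to stdout).
session_to_global() {
  sed 's/SET SESSION/SET GLOBAL/g'
}

# Example:
#   session_to_global < dbstructure.sql | mysql dbpedia_live_cache -uroot -p
```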
You must place the following in your Apache website configuration:
# Enable cross origin policy
Header set Access-Control-Allow-Origin "*"
# Avoid opening your server to proxying
ProxyRequests Off
# Let apache pass the original host not the ProxyPass one
ProxyPreserveHost On
ProxyTimeout 1200
# Virtuoso / DBpedia VAD proxying
ProxyPass /conductor http://localhost:XXXX/conductor
ProxyPassReverse /conductor http://localhost:XXXX/conductor
ProxyPass /about http://localhost:XXXX/about
ProxyPassReverse /about http://localhost:XXXX/about
ProxyPass /category http://localhost:XXXX/category
ProxyPassReverse /category http://localhost:XXXX/category
ProxyPass /class http://localhost:XXXX/class
ProxyPassReverse /class http://localhost:XXXX/class
ProxyPass /data4 http://localhost:XXXX/data4
ProxyPassReverse /data4 http://localhost:XXXX/data4
ProxyPass /data3 http://localhost:XXXX/data3
ProxyPassReverse /data3 http://localhost:XXXX/data3
ProxyPass /data2 http://localhost:XXXX/data2
ProxyPassReverse /data2 http://localhost:XXXX/data2
ProxyPass /data http://localhost:XXXX/data
ProxyPassReverse /data http://localhost:XXXX/data
ProxyPass /describe http://localhost:XXXX/describe
ProxyPassReverse /describe http://localhost:XXXX/describe
ProxyPass /delta.vsp http://localhost:XXXX/delta.vsp
ProxyPassReverse /delta.vsp http://localhost:XXXX/delta.vsp
ProxyPass /fct http://localhost:XXXX/fct
ProxyPassReverse /fct http://localhost:XXXX/fct
ProxyPass /isparql http://localhost:XXXX/isparql
ProxyPassReverse /isparql http://localhost:XXXX/isparql
ProxyPass /ontology http://localhost:XXXX/ontology
ProxyPassReverse /ontology http://localhost:XXXX/ontology
ProxyPass /page http://localhost:XXXX/page
ProxyPassReverse /page http://localhost:XXXX/page
ProxyPass /property http://localhost:XXXX/property
ProxyPassReverse /property http://localhost:XXXX/property
ProxyPass /rdfdesc http://localhost:XXXX/rdfdesc
ProxyPassReverse /rdfdesc http://localhost:XXXX/rdfdesc
ProxyPass /resource http://localhost:XXXX/resource
ProxyPassReverse /resource http://localhost:XXXX/resource
ProxyPass /services http://localhost:XXXX/services
ProxyPassReverse /services http://localhost:XXXX/services
ProxyPass /snorql http://localhost:XXXX/snorql
ProxyPassReverse /snorql http://localhost:XXXX/snorql
ProxyPass /sparql-auth http://localhost:XXXX/sparql-auth
ProxyPassReverse /sparql-auth http://localhost:XXXX/sparql-auth
ProxyPass /sparql http://localhost:XXXX/sparql
ProxyPassReverse /sparql http://localhost:XXXX/sparql
ProxyPass /statics http://localhost:XXXX/statics
ProxyPassReverse /statics http://localhost:XXXX/statics
ProxyPass /void http://localhost:XXXX/void
ProxyPassReverse /void http://localhost:XXXX/void
ProxyPass /wikicompany http://localhost:XXXX/wikicompany
ProxyPassReverse /wikicompany http://localhost:XXXX/wikicompany
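Since every rule above follows the same pattern, the list can be generated for a given Virtuoso HTTP port instead of replacing XXXX by hand. This is a sketch with a hypothetical `proxy_rules` helper. The order of the paths is kept exactly as above because Apache applies the first matching ProxyPass rule, so longer prefixes such as /data4 and /sparql-auth must come before /data and /sparql.

```shell
# Print ProxyPass/ProxyPassReverse pairs for a Virtuoso HTTP port.
# The path order is significant: the first matching rule wins in Apache,
# so longer prefixes are listed before the shorter ones they contain.
proxy_rules() {
  port="$1"
  for p in conductor about category class data4 data3 data2 data \
           describe delta.vsp fct isparql ontology page property rdfdesc \
           resource services snorql sparql-auth sparql statics void \
           wikicompany; do
    printf 'ProxyPass /%s http://localhost:%s/%s\n' "$p" "$port" "$p"
    printf 'ProxyPassReverse /%s http://localhost:%s/%s\n' "$p" "$port" "$p"
  done
}

# Example (ASSUMPTION: Virtuoso listens on its default HTTP port 8890):
#   proxy_rules 8890 > dbpedia-proxy.conf
```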
Install Virtuoso and configure it as follows, where XX is the ISO 639-1 code of your clone.
DB.DBA.RDF_GRAPH_GROUP_CREATE ('http://XX.dbpedia.org',1);
DB.DBA.RDF_GRAPH_GROUP_INS ('http://XX.dbpedia.org','http://live.XX.dbpedia.org');
DB.DBA.RDF_GRAPH_GROUP_INS ('http://XX.dbpedia.org','http://static.XX.dbpedia.org');
DB.DBA.RDF_GRAPH_GROUP_INS ('http://XX.dbpedia.org','http://XX.dbpedia.org/resource/classes#');
The rest works the same way as for a normal XX.dbpedia.org installation. Check https://github.com/dbpedia/dbpedia-vad-i18n for details.
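To avoid replacing XX by hand, the graph group statements above can be generated for a language code. The `graph_group_sql` helper is hypothetical, and the example invocation assumes Virtuoso's isql client on the default port 1111 with the dba account, which may differ on your installation.

```shell
# Print the Virtuoso graph group statements for an ISO 639-1 language code.
graph_group_sql() {
  lang="$1"
  group="http://${lang}.dbpedia.org"
  printf "DB.DBA.RDF_GRAPH_GROUP_CREATE ('%s',1);\n" "$group"
  for graph in "http://live.${lang}.dbpedia.org" \
               "http://static.${lang}.dbpedia.org" \
               "${group}/resource/classes#"; do
    printf "DB.DBA.RDF_GRAPH_GROUP_INS ('%s','%s');\n" "$group" "$graph"
  done
}

# Example (ASSUMPTION: isql client, port 1111, user dba):
#   graph_group_sql de | isql 1111 dba <password>
```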