Paul Houle edited this page May 22, 2014 · 7 revisions

Introduction

An empty instance of RDFEasy is a toolkit for loading RDF data into a triple store and packaging it as a virtual machine image in the AWS Cloud. When you launch a virtual machine from this image, you get a fully functional copy of OpenLink Virtuoso 7 Open Source Edition pre-loaded with data.

This page teaches you how to use an instance of RDFEasy that is pre-loaded.

RDFEasy is intended for use with R3-series machines, which come with high-performance SSD storage attached. When RDFEasy boots up, the database is stored in Amazon EBS; with a billion-triple database, startup can take 10 to 20 minutes. Although the system answers simple queries quickly on EBS, queries and loading run much faster if you transfer the data to the onboard SSD, a process that takes about an hour.

Checklist

  1. Launch the AMI; keep track of the security key that you'll need to log into it.
  2. Wait. It can take 10 to 20 minutes for the system to initialize for the first time. If you log in before it is ready, the database login credentials will not be displayed above the prompt.
  3. Log into the AWS instance as the "ubuntu" user using your security key.
  4. If the system is ready to use, the database login credentials are displayed at the command line. You are free to change these manually, but this will break automated scripts in the RDFEasy package if you choose to use them.
  5. Make queries

Making queries

There are a number of ways to make queries against RDFEasy.

"sql" command

RDFEasy commands are included in the $PATH of the ubuntu user. The sql command is a wrapper for the isql command that uses the stored database credentials. You can type Virtuoso SQL commands into this program, and run SPARQL queries by prefixing them with the word sparql:

ubuntu@ip-10-81-158-211:~$ sql
Connected to OpenLink Virtuoso
Driver: 07.10.3208 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> sparql select count(*) as ?cnt { ?s ?p ?o . } ;
cnt
INTEGER
_______________________________________________________________________________

776100175

1 Rows. -- 2344 msec.

Web interface

OpenLink Virtuoso operates a web interface at

http://your-ip:8890/

If you log into this with your web browser, using the dba credentials displayed when you log in as the ubuntu user, you get access to a rich administrative interface. To run SPARQL queries, click on the Database tab, then on the Interactive SQL tab, and prefix your queries with the keyword sparql.

SPARQL protocol endpoint

A SPARQL protocol endpoint is available at

http://your-ip:8890/sparql-auth

This endpoint supports HTTP digest authentication. The following snippet makes a query from Java with the Jena framework:

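  // Package names as of the Jena 2.x series (2.11 or later); jena-arq must be on the classpath
  import com.hp.hpl.jena.query.*;
  import org.apache.jena.atlas.web.auth.*;

  // Authenticated endpoint and the dba credentials shown at login (substitute your own)
  String service = "http://54.196.0.151:8890/sparql-auth";
  Query q = QueryFactory.create("select (count(*) as ?cnt) { ?s ?p ?o . }");
  HttpAuthenticator a = new SimpleAuthenticator("dba", "EvyWhb1UWaLK5wfD".toCharArray());
  QueryExecution x = QueryExecutionFactory.sparqlService(service, q, a);
  ResultSet s = x.execSelect();

  // The result has a single row with a single binding, ?cnt
  int count = s.next().get("cnt").asLiteral().getInt();
  System.out.println("Count = " + count);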

Note that the SPARQL query we're running here is slightly different from the one I ran in the sql session because (1) Jena parses queries sent through this interface and (2) Virtuoso is more lenient than a standard SPARQL implementation. The following query works on all SPARQL implementations:

  select (count(*) as ?cnt) { ?s ?p ?o . }

The specific difference here is that the standard requires parentheses around the expression and its name in the select clause, but Virtuoso lets you get away without them.

By default, OpenLink Virtuoso publishes a public SPARQL endpoint at

http://your-ip:8890/sparql

but RDFEasy disables it during the packaging process. If you wish, you can enable the public SPARQL endpoint by typing the shell command

enable_public_sparql
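
Once the public endpoint is enabled, no credentials are needed, so a plain HTTP GET should be enough. The following is a minimal sketch and not part of RDFEasy: it assumes Java 11 or later and relies on the SPARQL protocol convention of passing the query text in the query URL parameter. The class name PublicSparqlGet and its method names are made up for illustration, and your-ip is the same placeholder used above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class PublicSparqlGet {
    // Build a SPARQL protocol GET URL: the query text travels in the "query" parameter.
    static String queryUrl(String endpoint, String sparql) {
        return endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
    }

    // Send the query and return the raw response body, asking for JSON results.
    static String run(String endpoint, String sparql) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(queryUrl(endpoint, sparql)))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String query = "select (count(*) as ?cnt) { ?s ?p ?o . }";
        if (args.length == 0) {
            // No endpoint given: only show the URL shape, using the placeholder host.
            System.out.println(queryUrl("http://your-ip:8890/sparql", query));
        } else {
            // args[0] is the endpoint URL of your own instance.
            System.out.println(run(args[0], query));
        }
    }
}
```

Run it with no arguments to print the request URL, or pass the endpoint URL of your instance as the first argument to execute the count query.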

ODBC access

You can make SQL connections to Virtuoso on port 1111 through ODBC or JDBC. See the driver documentation for more details.
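
As a rough illustration of the JDBC route, here is a sketch that runs the same triple count over port 1111. It rests on several assumptions that you should check against the driver documentation: that the Virtuoso JDBC 4 driver jar (virtjdbc4.jar) is on the classpath, that its driver class is virtuoso.jdbc4.Driver, and that connection URLs have the form jdbc:virtuoso://host:port. The class name VirtuosoJdbcCount is hypothetical, and the host, user and password are supplied on the command line rather than hard-coded.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

class VirtuosoJdbcCount {
    // Assumed URL form for the Virtuoso JDBC driver: jdbc:virtuoso://host:port
    static String jdbcUrl(String host, int port) {
        return "jdbc:virtuoso://" + host + ":" + port;
    }

    // Count triples by sending SPARQL through the SQL channel, using the same
    // "sparql" prefix as in the sql command session shown earlier.
    static long countTriples(String host, String user, String password) throws Exception {
        // Assumption: driver class name for the JDBC 4 build of the Virtuoso driver.
        Class.forName("virtuoso.jdbc4.Driver");
        try (Connection c = DriverManager.getConnection(jdbcUrl(host, 1111), user, password);
             Statement st = c.createStatement();
             ResultSet rs = st.executeQuery(
                     "sparql select (count(*) as ?cnt) { ?s ?p ?o . }")) {
            if (!rs.next()) {
                throw new IllegalStateException("query returned no rows");
            }
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            // Without arguments, only show the usage and the URL shape.
            System.out.println("usage: VirtuosoJdbcCount host user password");
            System.out.println("example URL: " + jdbcUrl("your-ip", 1111));
            return;
        }
        System.out.println("Count = " + countTriples(args[0], args[1], args[2]));
    }
}
```

Use the dba credentials displayed when you log in as the ubuntu user. Port 1111 also has to be reachable from the machine running the code, which may require a change to the instance's security group.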