Indexing Backend Overview

Dan LaRocque edited this page Jul 24, 2013 · 20 revisions

Titan provides two types of index systems: the standard index, and the external index interface, which supports an arbitrary number of external indexing backends for geo, numeric range, and full-text search.

The standard index is very fast and always available without any further configuration, but only supports exact index matches. In other words, the standard index can only retrieve vertices and edges by matching one of their properties exactly.

The external index interface is more flexible and supports retrieving vertices and edges by bounding their geo-location, by properties that fall into a numeric range, or by matching tokens in full text. The external index interface connects to separate systems as index backends for indexing and retrieval, similar to how storage backends are used for persistence. Index backends need to be configured in the graph configuration before they can be used.

The choice of index backend determines which search features are supported, as well as the performance and scalability of the index. Titan currently supports two index backends: Elasticsearch and Lucene.

Indexing Vertices and Edges

To retrieve vertices and edges by their property values through an index lookup, the property key must be registered with the index when it is defined, that is, before its first use. The index is registered through the TypeMaker when the type is constructed.

For example, the following property key definitions register all four indexed properties from the Graph of the Gods example with the respective index backends.

// 1) Index vertices by their unique name property
graph.makeType().name("name").dataType(String.class).indexed(Vertex.class).unique(Direction.BOTH).makePropertyKey();
// 2) Index vertices by their age property
graph.makeType().name("age").dataType(Integer.class).indexed("search", Vertex.class).unique(Direction.OUT).makePropertyKey();
// 3) Index edges by their geo-location 
graph.makeType().name("place").dataType(Geoshape.class).indexed("search", Edge.class).unique(Direction.OUT).makePropertyKey();
// 4) Index vertices and edges by their full-text reason property
graph.makeType().name("reason").dataType(String.class).indexed("search",Vertex.class).indexed("search",Edge.class).unique(Direction.OUT).makePropertyKey();
  1. The name property key is indexed for vertices using Titan’s standard index. The standard index only supports exact match index retrievals but is very fast. When no index name is specified, the standard index is used.
  2. The age property key is indexed for vertices using the configured external index backend with the name “search”. Since the data type of age is a number, property values are indexed as numbers and can be retrieved via numeric range search.
  3. The place property key is indexed for edges using the external index backend named “search”. Since the data type of place is Geoshape, property values are indexed as geo-locations and can be retrieved via geo-searches such as circular region or bounding box search (depending on what the index backend supports).
  4. The reason property key is indexed for vertices and edges using the external index backend named “search”. Since the data type of reason is String, property values are indexed as text and can be retrieved via full-text search such as string-containment search. When the string is indexed, it is tokenized.
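
Item 4 notes that strings are tokenized when they are indexed. As a rough illustration of what that implies for containment queries, the following Java sketch splits text into lowercase tokens, drops punctuation and a few stop words, and matches a query word against whole tokens. The class name, method names, stop-word list, and sample sentence are all hypothetical; the actual analysis is performed by the configured index backend and may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class TokenizedContainsSketch {
    // Hypothetical stop-word list for illustration only; real backends ship their own.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "in", "by"));

    /** Lowercases the text, splits on runs of non-letter/non-digit characters, and drops stop words. */
    static Set<String> tokenize(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** True if the text, once tokenized, contains the query word as a whole token. */
    static boolean containsToken(String indexedText, String word) {
        return tokenize(indexedText).contains(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String reason = "Loves fresh breezes, and the waves"; // made-up sample value
        System.out.println(tokenize(reason));                     // [loves, fresh, breezes, waves]
        System.out.println(containsToken(reason, "loves"));       // true
        System.out.println(containsToken(reason, "BREEZES"));     // true
        System.out.println(containsToken(reason, "loves fresh")); // false: two words are never one token
    }
}
```

Under these assumptions a two-word query string never matches as a single token, so each word has to be checked on its own.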

When using Titan’s standard index, the name argument to the indexed() method is optional. An equivalent definition of the name property key, which identifies the standard index by its name, would be:

graph.makeType().name("name").dataType(String.class).indexed("standard",Vertex.class).unique(Direction.BOTH).makePropertyKey();

The name “standard” is always reserved for Titan’s standard index backend. External indexing backends may not be configured with this name.

Note that this section assumes an index backend named “search” has been defined in the graph configuration. Read “Next Steps” to find out how to configure an index backend.

Querying an Index

After the property keys have been registered with the index backends and vertices or edges have been added to the graph with corresponding property values, those elements can be queried for using index retrievals. Graph.query() constructs an index query through method chaining, and calling vertices() or edges() retrieves the vertices or edges matching the query using index retrievals.

Continuing with the Graph of the Gods, the following are index query examples:

// 1) Find vertices with the name "hercules"
g.query().has("name",EQUAL,"hercules").vertices()
// 2) Find all vertices with an age greater than 50
g.query().has("age",GREATER_THAN,50).vertices()
// or find all vertices between 1000 (inclusive) and 5000 (exclusive) years of age
g.query().has("age",GREATER_THAN_EQUAL,1000).has("age",LESS_THAN,5000).vertices()
// which is equivalent to
g.query().interval("age",1000,5000).vertices()
// 3) Find all edges where the place is at most 50 kilometers from the given latitude-longitude pair
g.query().has("place",WITHIN,Geoshape.circle(37.97,23.72,50)).edges()
// 4) Find all edges where reason contains the word "loves"
g.query().has("reason",CONTAINS,"loves").edges()
// or all edges which contain two words
g.query().has("reason",CONTAINS,"loves").has("reason",CONTAINS,"breezes").edges()
// 5) Find all vertices older than a thousand years and named "saturn"
g.query().has("age",GREATER_THAN,1000).has("name",EQUAL,"saturn").vertices()
  1. The standard index registered for name supports equality searches. In this case, we retrieve all vertices where the name matches “hercules” exactly (i.e. case sensitive).
  2. The external index registered for age supports numeric range search since the data type for age is numeric. Hence, querying for vertices whose age falls into a given range is supported.
  3. The geo-index registered for place supports retrieving edges by specifying a circular shape as the search region. The circular region is defined by the latitude and longitude of its center (first two arguments) and the radius of the circle in kilometers (third argument).
  4. The full-text index registered for reason supports retrieval by string containment. A property matches if it contains the given word (case insensitive). Note that the text is tokenized when indexed, and stop words as well as punctuation are removed. To query for properties that contain two or more words, split them up into separate has() clauses, one per word.
  5. Multiple has clauses can be combined into a composite index retrieval potentially spread across multiple index backends. In this example, the query contains a name and age constraint. Titan will optimize the query to find the optimal index retrieval plan.
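
The circular geo-search in item 3 can be pictured as a distance test against the circle's center. The Java sketch below is only an approximation under stated assumptions: it treats the Earth as a sphere with a mean radius of 6371 km and uses the haversine formula, whereas the actual index backend may use a different distance model. The class and method names and the two sample points are hypothetical; only the center and radius are taken from the query example above.

```java
public class GeoCircleSketch {
    private static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius (spherical approximation)

    /** Great-circle distance in kilometers between two latitude/longitude points (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** True if the point lies at most radiusKm from the circle's center. */
    static boolean withinCircle(double lat, double lon,
                                double centerLat, double centerLon, double radiusKm) {
        return distanceKm(lat, lon, centerLat, centerLon) <= radiusKm;
    }

    public static void main(String[] args) {
        // Center and radius as in the query example: (37.97, 23.72), 50 km.
        System.out.println(withinCircle(38.1, 23.7, 37.97, 23.72, 50)); // true, roughly 15 km away
        System.out.println(withinCircle(39.0, 22.0, 37.97, 23.72, 50)); // false, well over 100 km away
    }
}
```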

The Query.Compare enum specifies the comparison operators used for index query construction and used in the examples above:

  • EQUAL
  • NOT_EQUAL
  • GREATER_THAN
  • GREATER_THAN_EQUAL
  • LESS_THAN
  • LESS_THAN_EQUAL

In addition, the Text.CONTAINS operator is supported for full-text search and Geo.WITHIN for geo-location search, as used in examples 4 and 3 above.
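
To make the comparison semantics concrete, here is a small self-contained Java sketch. Its Compare enum only mirrors the operator names listed above and is not Titan's Query.Compare; the inInterval helper is a hypothetical name that models the interval() call from the query examples as an inclusive lower bound combined with an exclusive upper bound.

```java
import java.util.function.BiPredicate;

public class CompareSketch {
    /** Stand-in for the comparison operators listed above, evaluated on integers. */
    enum Compare implements BiPredicate<Integer, Integer> {
        EQUAL, NOT_EQUAL, GREATER_THAN, GREATER_THAN_EQUAL, LESS_THAN, LESS_THAN_EQUAL;

        @Override
        public boolean test(Integer value, Integer condition) {
            int c = value.compareTo(condition);
            switch (this) {
                case EQUAL:              return c == 0;
                case NOT_EQUAL:          return c != 0;
                case GREATER_THAN:       return c > 0;
                case GREATER_THAN_EQUAL: return c >= 0;
                case LESS_THAN:          return c < 0;
                case LESS_THAN_EQUAL:    return c <= 0;
                default: throw new AssertionError(this);
            }
        }
    }

    /** Models interval(key, start, end): start is inclusive, end is exclusive. */
    static boolean inInterval(int value, int start, int end) {
        return Compare.GREATER_THAN_EQUAL.test(value, start)
                && Compare.LESS_THAN.test(value, end);
    }

    public static void main(String[] args) {
        System.out.println(inInterval(1000, 1000, 5000)); // true: lower bound is inclusive
        System.out.println(inInterval(4999, 1000, 5000)); // true
        System.out.println(inInterval(5000, 1000, 5000)); // false: upper bound is exclusive
    }
}
```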

Choosing an Index

Here are some guidelines for choosing the best indexing strategy for a particular use case:

  1. Use the standard index for exact match index retrievals. The standard index does not require configuring or operating an external index system and is often significantly faster than external index backends.
    1. As an exception, use an external index for exact matches when the number of distinct values for the property key is relatively small or if you expect one value to be associated with many elements in the graph (i.e., in the case of low selectivity).
  2. Use an external index for numeric range, full-text or geo-spatial indexing.
    1. Use Elasticsearch when there is an expectation that Titan will be distributed across multiple machines.
    2. Use Lucene for small-scale, single-machine applications, where it performs better. Unit tests are one example.
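
One way to reason about the low-selectivity exception in guideline 1 is to estimate how many elements share each property value. The Java sketch below does this for a list of sample values. It is an illustration under loud assumptions: the class and method names are hypothetical, the returned strings merely echo the index names used on this page, and the threshold is an arbitrary parameter, not a value recommended by Titan.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SelectivitySketch {
    /** Average number of elements that share one distinct property value. */
    static double averageElementsPerValue(List<String> values) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : values) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts.isEmpty() ? 0.0 : (double) values.size() / counts.size();
    }

    /** Suggests "standard" when values are selective enough, otherwise "search" (hypothetical rule). */
    static String suggestIndex(List<String> values, double maxAveragePerValue) {
        return averageElementsPerValue(values) <= maxAveragePerValue ? "standard" : "search";
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("hercules", "saturn", "jupiter", "pluto");
        List<String> types = Arrays.asList("god", "god", "god", "titan", "god", "god");
        System.out.println(suggestIndex(names, 2.0)); // standard: every value is distinct
        System.out.println(suggestIndex(types, 2.0)); // search: 6 elements share 2 values
    }
}
```

In practice the right threshold depends on the data and the backend, so a measurement on representative data is a safer guide than any fixed number.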

Indexing Gotchas

Result Scoring

Titan currently supports only discrete index queries: an element either matches a query or it does not. The index backends that Titan supports additionally provide result scoring, that is, they can return the elements that score highest against a particular query. This feature is often used in text search but is not yet supported by Titan.

Next Steps: Configuring an Index Backend

In order to use an external indexing system with Titan, an index backend has to be configured. Titan currently supports two indexing systems:

  • Lucene: The popular Apache Lucene indexing system
  • Elasticsearch: A distributed indexing system based on Apache Lucene

Refer to the respective documentation pages on how to configure these index backends.