Indexing Backend Overview
Titan provides two types of index systems: the standard index and the external index interface, which supports an arbitrary number of external indexing backends to provide support for geo, numeric range, and full-text search.
The standard index is very fast and always available without any further configuration, but only supports exact index matches. In other words, the standard index can only retrieve vertices and edges by matching one of their properties exactly.
The external index interface is more flexible and supports retrieving vertices and edges by bounding their geo-location, by properties that fall into a numeric range, or by matching tokens in full text. The external index interface connects to separate systems as index backends for indexing and retrieval, similarly to how storage backends are used for persistence. Index backends need to be configured in the graph configuration before they can be used.
The choice of index backend determines which search features are supported, as well as the performance and scalability of the index. Titan currently supports two index backends: Elasticsearch and Lucene.
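To make the difference between exact-match and range retrieval concrete, here is a minimal, self-contained Java sketch. It is a toy model and not Titan's implementation or API: the class ToyIndexes, its methods, and the sample ids and ages are hypothetical. A hash map stands in for an index that can only answer exact matches, and a sorted map stands in for an index that can also answer numeric range lookups.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: illustrates exact-match vs. range lookups, not Titan internals.
final class ToyIndexes {
    // Exact-match index: property value -> ids of elements with that value.
    private final Map<String, List<Long>> exact = new HashMap<>();
    // Ordered index: keys are kept sorted, so a range of values can be scanned.
    private final TreeMap<Integer, List<Long>> ordered = new TreeMap<>();

    void put(long id, String name, int age) {
        exact.computeIfAbsent(name, k -> new ArrayList<>()).add(id);
        ordered.computeIfAbsent(age, k -> new ArrayList<>()).add(id);
    }

    // Exact match: the value must equal the key, including letter case.
    List<Long> byName(String name) {
        return exact.getOrDefault(name, Collections.<Long>emptyList());
    }

    // Range match: ages in [from, to), i.e. from inclusive and to exclusive.
    List<Long> byAgeRange(int from, int to) {
        List<Long> ids = new ArrayList<>();
        for (List<Long> bucket : ordered.subMap(from, true, to, false).values()) {
            ids.addAll(bucket);
        }
        return ids;
    }

    // Hypothetical sample data, chosen for illustration.
    static ToyIndexes sample() {
        ToyIndexes idx = new ToyIndexes();
        idx.put(1L, "hercules", 30);
        idx.put(2L, "jupiter", 5000);
        idx.put(3L, "neptune", 4500);
        idx.put(4L, "saturn", 10000);
        return idx;
    }

    public static void main(String[] args) {
        ToyIndexes idx = sample();
        System.out.println(idx.byName("hercules"));     // prints [1]
        System.out.println(idx.byAgeRange(1000, 5000)); // prints [3]
    }
}
```

The exact lookup needs nothing more than key equality, while the range lookup depends on the keys being ordered. That ordering is the kind of capability an external index backend contributes.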
To retrieve vertices and edges by their property values through an index lookup, the property key must be registered with the index when it is defined, that is, before its first use. The index is registered through the TypeMaker when the type is constructed.
For example, the following property key definitions register all four indexed properties from the Graph of the Gods example with the respective index backends.
// 1) Index vertices by their unique name property
graph.makeType().name("name").dataType(String.class).indexed(Vertex.class).unique(Direction.BOTH).makePropertyKey();
// 2) Index vertices by their age property
graph.makeType().name("age").dataType(Integer.class).indexed("search", Vertex.class).unique(Direction.OUT).makePropertyKey();
// 3) Index edges by their geo-location
graph.makeType().name("place").dataType(Geoshape.class).indexed("search", Edge.class).unique(Direction.OUT).makePropertyKey();
// 4) Index vertices and edges by their full-text reason property
graph.makeType().name("reason").dataType(String.class).indexed("search",Vertex.class).indexed("search",Edge.class).unique(Direction.OUT).makePropertyKey();
- The name property key is indexed for vertices using Titan’s standard index. The standard index only supports exact match index retrievals but is very fast. When no index name is specified, the standard index is used.
- The age property key is indexed for vertices using the configured external index backend with the name “search”. Since the data type of age is a number, property values are indexed as numbers and can be retrieved via numeric range search.
- The place property key is indexed for edges using the external index backend named “search”. Since the data type of place is Geoshape, property values are indexed as geo-locations and can be retrieved via geo-searches such as circular region or bounding box search (depending on what the index backend supports).
- The reason property key is indexed for vertices and edges using the external index backend named “search”. Since the data type of reason is String, property values are indexed as text and can be retrieved via full-text search such as string-containment search. When the string is indexed, it is tokenized.
When using Titan’s standard index, the name argument to the indexed() method is optional. An equivalent definition of the name property key, which identifies the standard index by its name, would be:
graph.makeType().name("name").dataType(String.class).indexed("standard",Vertex.class).unique(Direction.BOTH).makePropertyKey();
The name “standard” is always reserved for Titan’s standard index backend. External indexing backends may not be configured with this name.
Note that this section assumes that an index backend named search has been defined in the graph configuration. Read “Next Steps” to find out how to configure an index backend.
After the property keys have been registered with the index backends and vertices or edges have been added to the graph with corresponding property values, those elements can be retrieved through index queries. Graph.query() constructs an index query through method chaining, and calling vertices() or edges() retrieves the vertices or edges matching the query using index retrievals.
Continuing with the Graph of the Gods, the following are index query examples:
// 1) Find vertices with the name "hercules"
g.query().has("name",EQUAL,"hercules").vertices()
// 2) Find all vertices with an age greater than 50
g.query().has("age",GREATER_THAN,50).vertices()
// or find all vertices between 1000 (inclusive) and 5000 (exclusive) years of age
g.query().has("age",GREATER_THAN_EQUAL,1000).has("age",LESS_THAN,5000).vertices()
// which is equivalent to
g.query().interval("age",1000,5000).vertices()
// 3) Find all edges where the place is at most 50 kilometers from the given latitude-longitude pair
g.query().has("place",WITHIN,Geoshape.circle(37.97,23.72,50)).edges()
// 4) Find all edges where reason contains the word "loves"
g.query().has("reason",CONTAINS,"loves").edges()
// or all edges which contain two words
g.query().has("reason",CONTAINS,"loves").has("reason",CONTAINS,"breezes").edges()
// 5) Find all vertices older than a thousand years and named "saturn"
g.query().has("age",GREATER_THAN,1000).has("name",EQUAL,"saturn").vertices()
- The standard index registered for name supports equality searches. In this case, we retrieve all vertices where the name matches “hercules” exactly (i.e. case sensitive).
- The external index registered for age supports numeric range search since the data type for age is numeric. Hence, querying for vertices whose age falls into a given range is supported.
- The geo-index registered for place supports retrieving edges by specifying a circular shape as the search region. The circular region is defined by the latitude and longitude of its center (first two arguments) and the radius of the circle in kilometers (third argument).
- The full-text index registered for reason supports retrieval by string containment. A property matches if it contains the given word (case insensitive). Note that the text is tokenized when indexed and that stop words as well as punctuation are removed. To query for properties that contain two or more words, split them up into separate has() clauses.
- Multiple has() clauses can be combined into a composite index retrieval potentially spread across multiple index backends. In this example, the query contains a name and an age constraint. Titan will optimize the query to find the optimal index retrieval plan.
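The full-text behavior described above (tokenization, case-insensitive matching, removal of stop words and punctuation) can be sketched in plain Java. This is a toy model under stated assumptions, not the analyzer that Lucene or Elasticsearch actually applies: the class ToyTextMatch, the splitting rule, and the stop-word list are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model only: real index backends use their own analyzers.
final class ToyTextMatch {
    // Hypothetical stop-word list, chosen for illustration.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Lowercase the text, split on anything that is not a letter or digit,
    // and drop empty strings and stop words.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Containment is a lookup of one word in the token set.
    static boolean contains(String text, String word) {
        return tokenize(text).contains(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String reason = "Loves fresh breezes!";
        System.out.println(contains(reason, "LOVES"));         // prints true
        System.out.println(contains(reason, "loves breezes")); // prints false
        System.out.println(contains(reason, "loves")
                && contains(reason, "breezes"));               // prints true
    }
}
```

Because the index stores individual tokens, a two-word string never equals any single token in this model. That is why a multi-word search is written as one containment clause per word.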
The Query.Compare enum specifies the comparison operators used for index query construction and used in the examples above:
- EQUAL
- NOT_EQUAL
- GREATER_THAN
- GREATER_THAN_EQUAL
- LESS_THAN
- LESS_THAN_EQUAL
In addition, the Text.CONTAINS operator is supported for full-text search and Geo.WITHIN, as used in the place example above, for geo-location search.
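For the geo-location case, the circular-region query on place shown earlier asks whether a point lies within a given radius, in kilometers, of a center point. The following self-contained Java sketch shows one way such a test can be computed. It assumes the haversine great-circle formula and a mean Earth radius of 6371 km; how a particular index backend actually measures distance is not stated here, and the class ToyGeoCircle and the sample coordinates are hypothetical.

```java
// Toy model only: a point-in-circle test using the haversine formula.
final class ToyGeoCircle {
    // Assumed mean Earth radius in kilometers.
    private static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two latitude/longitude pairs, in kilometers.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // True if the point is at most radiusKm away from the circle's center.
    static boolean withinCircle(double lat, double lon,
                                double centerLat, double centerLon, double radiusKm) {
        return distanceKm(lat, lon, centerLat, centerLon) <= radiusKm;
    }

    public static void main(String[] args) {
        // Circle from the query example: center (37.97, 23.72), radius 50 km.
        System.out.println(withinCircle(38.1, 23.7, 37.97, 23.72, 50)); // prints true
        System.out.println(withinCircle(39.0, 22.0, 37.97, 23.72, 50)); // prints false
    }
}
```

The first sample point is roughly 15 km from the center and the second is well over 100 km away, so only the first falls inside the 50 km circle.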
Here are some guidelines for choosing the best indexing strategy for a particular use case:
- Use the standard index for exact match index retrievals. The standard index does not require configuring or operating an external index system and is often significantly faster than external index backends.
- As an exception, use an external index for exact matches when the number of distinct values for the property key is relatively small or if you expect one value to be associated with many elements in the graph (i.e. in case of low selectivity).
- Use an external index for numeric range, full-text or geo-spatial indexing.
- Use Elasticsearch when there is an expectation that Titan will be distributed across multiple machines.
- Lucene performs better in small-scale, single-machine applications. It performs better in unit tests, for instance.
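The low-selectivity case mentioned in these guidelines can be checked with a quick count over sample data. The sketch below is only an illustration in plain Java: the class ToySelectivity, the sample property values, and the two measures it computes are hypothetical, and the document does not define a threshold at which an external index becomes preferable.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: estimates how many elements share one property value.
final class ToySelectivity {
    // Number of elements per distinct property value.
    static Map<String, Integer> countPerValue(List<String> values) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    // Average number of elements that share a single value.
    static double averageElementsPerValue(List<String> values) {
        Map<String, Integer> counts = countPerValue(values);
        return counts.isEmpty() ? 0.0 : (double) values.size() / counts.size();
    }

    // Size of the largest group of elements sharing one value.
    static int largestGroup(List<String> values) {
        int max = 0;
        for (int c : countPerValue(values).values()) {
            max = Math.max(max, c);
        }
        return max;
    }

    public static void main(String[] args) {
        // Hypothetical values of a low-selectivity property.
        List<String> types = Arrays.asList(
                "god", "god", "god", "titan", "demigod", "monster", "monster", "monster");
        System.out.println(averageElementsPerValue(types)); // prints 2.0
        System.out.println(largestGroup(types));            // prints 3
    }
}
```

When these numbers are large relative to the size of the graph, a single exact-match lookup returns many elements, which is the situation the exception above describes.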
Titan currently only supports discrete index queries, that is, an element either matches a query or it does not. The index backends that Titan supports additionally provide result scoring, that is, they return the elements that score highest against a particular query. This feature is often used in text search, but is not yet supported by Titan.
In order to use an external indexing system with Titan, an index backend has to be configured. Titan currently supports two indexing systems:
- Lucene: The popular Apache Lucene indexing system
- Elasticsearch: A distributed indexing system based on Apache Lucene
Refer to the respective documentation pages on how to configure these index backends.