Skip to content

Basic Graph Traversals

okram edited this page Sep 13, 2010 · 51 revisions

This section will present basic graph traversals by way of examples on the simple property graph diagrammed below.

gremlin> $_g := tg:open()                         
==>tinkergraph[vertices:0]
gremlin> g:load('data/graph-example-1.xml')       
==>true
gremlin> $_g                                 
==>tinkergraph[vertices:6]
gremlin> $_ := g:id-v(1)
==>v[1]

In Gremlin, there exists a root iterable. This iterable can be a singleton (e.g. vertex 1). For example, assume a root list with vertex 1 from the above diagram as its only element. The root can be queried in Gremlin using a directory structure-like notation, called GPath. For example, the current root is identified with the following expression.

gremlin> .
==>v[1]

The symbol v denotes that the element is a vertex and 1 denotes the elements unique identifier. To determine all of the outgoing edges from the root, the following statement suffices.

gremlin> ./outE
==>e[7][1-knows->2]
==>e[9][1-created->3]
==>e[8][1-knows->4]

As a convenience, Gremlin prints the outgoing and incoming vertex identifiers along with the edge label. To acquire the vertices at the head of these edges (known as the incoming vertices), apply another segment in the path.

gremlin> ./outE/inV
==>v[2]
==>v[3]
==>v[4]

It is important to note that in Gremlin, vertices are adjacent to edges and edges are adjacent to vertices. The reason for this will become apparent later when making use of element properties in path expressions. The reserved terms for denoting adjacency selection are outE, inE, bothE, outV, inV, and bothV (see Cheat Sheet). The components of a property graph are diagrammed in the example sub-graph below.

The process of traversing a graph, in this manner, can continue indefinitely (granted, if there are loops in the graph).

gremlin> .
==>v[1]
gremlin> ./outE/inV/outE/inV
==>v[5]
==>v[3]

Moreover, it is possible to make use of Gremlin’s language statements to repeat patterns. For example, the previous example can be denoted as follows.

gremlin> repeat 2
           $_ := ./outE/inV
           end
gremlin> .
==>v[3]
==>v[5]

The variable $_ is a reserved variable that denotes the root list. In this way, the root list can be redefined.

If the Gremlin graph data structure was only a directed graph, then outgoing/incoming edges and outgoing/incoming vertices would be the limits of what could be expressed. However, given that vertices and edges can have properties, it is possible to use these properties within a path expression. For example, suppose you want to know the name of vertex 1.

gremlin> $_ := g:id-v(1)
==>v[1]
gremlin> ./@name
==>marko

The @name construct denotes the property key name and returns the value of that key. The first component of the path is vertex 1. Thus, the name of vertex 1 is “marko.” Another, more complex example that uses vertex and edge properties is to determine the name of the vertices that vertex 1 knows and that are older than 30 years of age, is expressed as such.

gremlin> ./outE[@label='knows']/inV[@age > 30]/@name
==>josh

In this expression, the [ ] notation serves to filter results of previous step in the path (see Path Functions). Thus, ./outE is filtered to only those edges that have a label of “knows.” With respect to the diagrammed graph, this leaves only two edges. Next, the incoming vertices at the head of these two edges are determined and then filtered to only those whose age property is greater than 30. Given the diagram, this only leaves vertex 4. In the final segment of the path expression, the name of vertex 4 is selected and what is returned is “josh.”

To conclude, let’s do a more complicated graph traversal that uses backtracking and an in-line regular expression.

gremlin> ./outE[@label='knows']/inV[@age > 21]/@name[g:matches(.,'jo.{2}|JO.{2}')]/../@age
==>32

With the root vertex being vertex 1, this path expression returns the age of those vertices that vertex 1 knows, are older than 21, and whose names are 4 characters and start with a ‘jo’ or ‘JO’. While contrived, it demonstrates regular expression matching using the function boolean g:matches(string, string) on string properties as well as backtracking (known as a “future filter”) /.. to a vertex previously visited.

This expression does the same thing without backtracking. Both are provided in order to demonstrate the many ways in which to express the same thing.

gremlin> ./outE[@label='knows']/inV[@age > 21 and g:matches(./@name, 'jo.{2}|JO.{2}')]/@age   
==>32