This is a Ruby driver/client for the neo4j HTTP API. Due to driver/version incompatibility with the neo4j-ruby-driver (via FFI/Seabolt) with MRI/CRuby and neo4j 4.x, we came up with the HTTP API solution. This driver/client was built with performance in mind, to lower the JSON/HTTP API overhead (compared to the Bolt protocol (binary)) as much as possible.
All the documented features of the HTTP API are implemented. JSON streams are enabled to improve performance and lower memory overhead on the server side. (unfortunately the HTTP API does not stream results in the ndjson format, which would even be better)
The driver uses a connection pool out of persistent HTTP connections which are transparently being re-used. The HTTP transportation is done via http.rb, a client library which implements the HTTP protocol natively and outsources the parsing to native extensions.
Due to the fact that the neo4j HTTP API produces and consumes data in the JSON format, we chose the best in class JSON parsing/serializing libraries available. For parsing (consuming the HTTP API response, database data) we use FastJsonparser a Ruby binding to the simdjson library. And for serializing (sending the Cypher statements) we use Oj, a fast Object marshaller.
We focus on the bare-metal/low-level neo4j interaction, with raw Cypher. As an
alternative solution with higher abstraction check out
neo4j-http. It provides some clever
CRUD abstrctions like .upsert_node()
or .find_nodes_by()
.
Add this line to your application's Gemfile:
gem 'boltless'
And then execute:
$ bundle
Or install it yourself as:
$ gem install boltless
- neo4j (>=3.5, all featured tested with 4.4.x)
- Ruby (>=2.7, tested on CRuby/MRI only, may work with other implementations as well)
You can configure the Boltless gem in serveral ways. The most relevant settings should be set on an initializer file when used together with Rails. Here comes a self descriptive example:
# All the configured values here represent the Gem defaults.
Boltless.configure do |config|
# The base URL of the neo4j HTTP API (port 7474 for HTTP, port 7473
# for HTTPS when configured at server side)
config.base_url = 'http://neo4j:7474'
# The username for the neo4j database (used for HTTP Basic Authentication)
config.username = 'neo4j'
# The password for the neo4j database (used for HTTP Basic Authentication)
config.password = 'neo4j'
# The default user database of the neo4j instance/cluster, for the
# community edition this is always +neo4j+, only a single user database is
# supported with the community editon. You can always specify which
# database to operate on, on each top-level querying method (eg.
# +Boltless.execute(.., database: 'custom')+
config.default_db = 'neo4j'
# The size of the connection pool, make sure it matches your application
# server (eg. Puma) thread pool size, in order to avoid
# timeouts/bottlenecks
config.connection_pool_size = 10
# The seconds to wait for a connection from the pool,
# when all connections are currently in use
config.connection_pool_timeout = 15.seconds
# The overall timeout for a single HTTP request (including connecting,
# transmitting and response completion)
config.request_timeout = 10.seconds
# We allow the neo4j server to bootup for the configured time. This allows
# parallel starts of the user application and the neo4j server, without
# glitching.
config.wait_for_upstream_server = 30.seconds
# Configure a logger for the gem
config.logger = Logger.new($stdout).tap do |logger|
logger.level = :warn
end
# Whenever we should log the neo4j queries, including benchmarking. When
# disabled we reduce the logging overhead even more as no debug logs hit
# the logger at all. Enable this for testing purposes or for local
# development. (Heads up: No parameter sanitation is done, so passwords etc
# will be logged with this enabled) Setting the value to +:debug+ will
# print the actual Cypher statements before any request is sent. This may
# be helpful inspection of slow/never-ending Cypher statements.
config.query_log_enabled = false
# We allow the http.rb gem to be configured by the user for special needs.
# Just assign a user given block here and you can reconfigure the client.
# Just make sure to return the configured +HTTP::Client+ instance
# afterwards.
config.http_client_configure = proc do |connection|
# Enable the request/response instrumentation
# connection.use(instrumentation: {
# instrumenter: ActiveSupport::Notifications.instrumenter
# })
# Return the configured connection
connection
end
# We allow to check/modify the raw neo4j response body string before it
# gets JSON parsed and handled. This may be handy for some low-level
# analyses on the raw responses. The first argument of the user defined
# block will be the raw response body string and the second it the full
# HTTP response object (+HTTP::Response+). The user defined block MUST
# return a JSON parsable string (eg. the original input) in order to
# continue regular processing.
config.raw_response_handler = proc do |body, response|
# Do some checks on the raw/unparsed response body ..
# Return the original (or modified) response body
body
end
end
TL;DR: You just want to run a single Cypher statement, do it like that:
result = Boltless.execute!('RETURN date() AS date')
# => #<Boltless::Result
# columns=[:date],
# rows=[#<Boltless::ResultRow columns=[:date],
# values=["2022-08-16"],
# meta=[{:type=>"date"}], graph=nil>],
# stats=nil>
result.value
# => "2022-08-16"
This way the Boltless gem sends a single HTTP request to the neo4j HTTP API to
perform the statement and it fetches the results in a lazily mapped way. With
the result mapping you have to pay a little performance penalty, but its only
~3.57% overhead, so its enabled by default. To turn it off, add raw_results: true
as addtional keyword argument to all top-level calls. (eg.
Boltless.execute!('RETURN 1', raw_results: true)
)
We allow the usage of the full neo4j Cypher transaction API.
- BEGIN:
Boltless.transaction(&block)
,Boltless.transaction!(&block)
- RUN:
tx.run
,tx.run!
,tx.run_in_batch
,tx.run_in_batch!
- COMMIT:
tx.commit
,tx.commit!
- ROLLBACK:
tx.rollback
,tx.rollback!
- RUN:
- BEGIN & COMMIT:
Boltless.execute
,Boltless.execute!
, (aliases are:Boltless.query
,Boltless.query!
,Boltless.read
,Boltless.read!
,Boltless.write
,Boltless.write!
)Boltless.one_shot(&block)
,Boltless.one_shot!(&block)
All the transaction workflow/Cypher executing methods are available in a bang
and non-bang variant. The bang variants will raise on network issues (like
connection failures, timeouts, etc), user issues (like try a write operation
inside a read-only transaction, Cypher syntax errors, etc). The non-bang
variants will suppress these and will return nil
or false
(based on the
context). This reflects common Rails/ActiveRecord method behaviour (eg.
Model.save
/Model.save!
).
When using the Boltless.transaction
or Boltless.transaction!
methods, we
automatically handle the transaction for you. After the end of the user given
block, the transaction will be committed. When an exception is raised from
within the user given block, then the transaction is automatically rolled back.
This is also true for bad interactions with neo4j (eg. sending an Cypher
statement with syntax errors, etc).
Here we setup a transaction with the read-write mode,
on the default database (neo4j
):
Boltless.transaction! do |tx|
count = tx.run('MATCH (n) RETURN count(n)').value
# => 5
# You can also run multiple Cypher statements as
# batch (within a single HTTP request) like this:
tx.run_in_batch!(
['CREATE (n:User { name: $name, email: $email })',
name: 'Klaus', email: '[email protected]'],
['CREATE (n:User { name: $name, email: $email })',
name: 'Bernd', email: '[email protected]'],
['CREATE (n:User { name: $name, email: $email })',
name: 'Uwe', email: '[email protected]']
)
# Works also for read/query statements:
pp tx.run_in_batch!(
['RETURN 1 AS count'],
['RETURN date()'],
['MATCH (n:User { name: $name }) RETURN n.email', name: 'Klaus'],
['MATCH (n:User) RETURN n.name AS name, n.email AS email LIMIT 10']
).map(&:values)
# => [
# # Mapped result values of first statement
# [{:count=>1}],
# # Mapped result values of second statement
# [{:date=>"2022-08-16"}],
# # Mapped result values of third statement
# [],
# # Mapped result values of fourth statement
# [
# {:name=>"Klaus", :email=>"[email protected]"},
# {:name=>"Bernd", :email=>"[email protected]"},
# {:name=>"Uwe", :email=>"[email protected]"}
# ]
# ]
# The result of the user block is returned as result
tx.run('MATCH (n) WHERE n.name = $name RETURN count(n)', name: 'Klaus')
end
# => #<Boltless::Result
# columns=[:"count(n)"],
# rows=[#<Boltless::ResultRow
# columns=[:"count(n)"],
# values=[9],
# meta=[nil],
# graph=nil>],
# stats=nil>
The neo4j HTTP API allows to send one or multiple Cypher statements in a single
shot transaction endpoint. This can be used with the Boltless gem in
two-flavors. The first is Boltless.execute(!)
(and their aliases
Boltless.query(!)
, Boltless.read(!)
, Boltless.write(!)
). This one allows
a single Cypher query to be send in this one-shot manner. The other is
Boltless.one_shot(!)
which allows to send one/more Cypher statements from a
user given block.
# Setup a read-only one-shot transaction (see +BEGIN & COMMIT+)
Boltless.one_shot!(:read) do |tx|
# In this scope you can just prepare the statements which are sent in "one
# shot" (a single HTTP request) to the neo4j HTTP API, therefore you cannot
# access the result of each statement after it was added
tx.add('MATCH (n) RETURN count(n)')
# => <Boltless::StatementCollector ..>
tx.add('MATCH (n) WHERE n.name = $name RETURN count(n)', name: 'Klaus')
# => <Boltless::StatementCollector ..>
end
# => [
# # Result of the first statement
# #<Boltless::Result ...>,
# # Result of the second statement
# #<Boltless::Result ...>
# ]
The Boltless gem allows you to access the Cypher statement result data in a convenient way like this:
Boltless.execute!(
'MATCH (n:User) RETURN n.name AS name, n.email AS email LIMIT 10',
).map { |row| "#{row[:name]} (#{row[:email]})" }
# => ["Klaus ([email protected])", "Bernd ([email protected])",
# "Uwe ([email protected])"]
If you prefer to work on the raw HTTP API results (eg. for performance
reasons), you can add raw_results: true
as addtional keyword argument to all
top-level calls. This is how it looks like:
Boltless.execute!(
'MATCH (n:User) RETURN n.name AS name, n.email AS email LIMIT 10',
raw_results: true
)
# => {:columns=>["name", "email"],
# :data=>
# [{:row=>["Klaus", "[email protected]"], :meta=>[nil, nil]},
# {:row=>["Bernd", "[email protected]"], :meta=>[nil, nil]},
# {:row=>["Uwe", "[email protected]"], :meta=>[nil, nil]}]}
Checkout the fixtures directory for more HTTP API response examples.
We allow to fetch the query statistics on each individual Cypher statement. The
only thing you need to do is to specify with_stats: true
as Cypher statement
parameter, like this:
Boltless.execute!(
'CREATE (n:User { name: $name, email: $email })',
name: 'Monika',
email: '[email protected]',
with_stats: true
)
# => #<Boltless::Result
# columns=[],
# rows=[],
# stats={:contains_updates=>true,
# :nodes_created=>1,
# :nodes_deleted=>0,
# :properties_set=>2,
# :relationships_created=>0,
# :relationship_deleted=>0,
# :labels_added=>1,
# :labels_removed=>0,
# :indexes_added=>0,
# :indexes_removed=>0,
# :constraints_added=>0,
# :constraints_removed=>0,
# :contains_system_updates=>false,
# :system_updates=>0}>
See the HTTP API documentation for further details.
If you want to understand the graph structure of nodes and relationships returned by your query, you can specify the
graph
results data format. This is useful when you want to visualize the graph structure. The format collates all the nodes and relationships from all columns of the result, and also flattens collections of nodes and relationships, including paths.
Boltless.execute!(
'MATCH (n:User) RETURN n.name AS name, n.email AS email LIMIT 10',
result_as_graph: true
)
# => #<Boltless::Result
# columns=[:name, :email],
# rows=[#<Boltless::ResultRow
# columns=[:name, :email],
# values=["Klaus", "[email protected]"],
# meta=[nil, nil],
# graph={:nodes=>[], :relationships=>[]}>,
# [+3 ..]],
# stats=nil>
See the HTTP API documentation for further details.
The Boltless gem ships with a set of helpers and utilities for the daily business of a graph database developer. Check out the following sections!
Just like SQL-Injection, the same thing is doable at your neo4j database with the Cypher query language. The ultimate Pro Tip is to avoid constructing a Cypher query (concatenation) with user given inputs. Instead use Cypher parameters, just like in ActiveRecord/Hibernate this way:
class UsersController < ApplicationController
# [..]
def new
# Bad: Don't do this, because of Cypher-Injection.
Boltless.execute!("CREATE (n:User { name: '#{params[:name]}' })")
# Good: Use Cypher parameters.
Boltless.execute!('CREATE (n:User { name: $name })', name: params[:name])
end
end
Yay. This is great. But sometimes this is just not sufficient, because you cannot wildy place parameters inside the Cypher statement. Like this:
// We want to check if a specific user node has a relationship to a specific
// payment node, which reflects that the user can read and write it
MATCH (s:User { id: $subject_id })
-[r:READ|WRITE]->
(o:Payment { id: $payment_id })
RETURN 1
LIMIT 1
Then you cannot use a Cypher parameter for the s:User
(like
s:$subject_label
) or -[r:READ:WRITE]->
(like -[r:${types}]->
) parts
directly. You could do some glichting with the where statement, but this would
be ugly or non-index using.
Therefore, the Boltless gem ships an utilities to help you with that. See Prepare a Cypher statement.
Ok, you are aware of the Cypher parameters and their shortcommings. The
Boltless gem comes with a utilities named Boltless.build_cypher
. Use it like
that:
cypher = Boltless.build_cypher(
subject_label: subject.model_name,
object_label: object.model_name,
predicate_types: [:read, :write]
) do
<<~CYPHER
MATCH (s:%<subject_label>s { id: $subject_id })
-[r:%<predicate_types>s]->
(o:%<object_label>s { object_id: $object_id })
RETURN 1
LIMIT 1
CYPHER
end
# => " MATCH (s:User { id: $subject_id })\n" +
# " -[r:READ|WRITE]->\n" +
# " (o:Payment { id: $object_id })\n" +
# "RETURN 1\n" +
# " LIMIT 1"
This utility takes care of correct Cypher escaping, based on the parameters (by name) you pass in. Parameters ending with:
_label(s)
are treated as node labels and are camelized and joined with colons_type(s)
are treated as relationship types and are uppercased and joined with pipes_str(s)
are treated as free-text, it will be escaped properly with double quotes
When the given string(s) include special characters they will be escaped with backticks respectively.
Heads up! Do not use the Boltless.build_query
utility blindly with
ALL of your variables. Whenever possible use the native Cypher parameters
as this helps the neo4j database to re-use already parsed Cypher statements,
and therefore results in better performance.
Furthermore, you can use the Boltless.build_cypher
utility directly within
transaction contexts like this:
# Can directly be used in a regular transaction:
Boltless.transaction! do |tx|
# tx: Boltless::Transaction
cypher = tx.build_cypher(
# [your parameters]
) do
# [your Cypher template]
end
tx.run!(cypher, param_a: 7892)
end
# And also in a one-shot transaction:
Boltless.one_shot! do |tx|
# tx: Boltless::StatementCollector
cypher = tx.build_cypher(
# [your parameters]
) do
# [your Cypher template]
end
tx.add(cypher, param_a: 7892)
end
We also allow to handle the common neo4j database operations in a seamless way. This includes adding/dropping indexes/constraints, full database cleanups and index/constraints listing. Take a look at the following example:
# Add a new index for user (+User+ labeled nodes) +id+ properties,
# this may be helpful for querying/merging later
Boltless.add_index(name: 'user_id',
for: '(n:User)',
on: 'n.id')
# Add a new constraint, to enforce unique user
# emails across all +User+ labeled nodes
Boltless.add_constraint(name: 'uniq_user_email',
for: '(n:User)',
require: 'n.email IS UNIQUE')
# Fetch the index names of the +neo4j+ database,
# you can also specify the +database+ argument to every querying
# method in order to specify the database in use
Boltless.index_names(database: 'neo4j')
# => ["user_id"]
# Fetch the constraint names of the +neo4j+ database (by default)
Boltless.constraint_names
# => ["uniq_user_email"]
# Drop an index by its name. When it does not exists
# this method does not raise errors.
Boltless.drop_index('user_id')
# Drop a constraint by its name. When it does not exists
# this method does not raise errors.
Boltless.drop_constraint('uniq_user_email')
# Clear the neo4j database (remove all indexes, constraints,
# nodes and relationships)
Boltless.clear_database!
Fetching a large result (68.845 rows, 8.73 MB raw JSON) from neo4j results in:
- 1.52971222s (avg), when the raw result is used (no restructuring in Ruby)
- 1.58435011s (avg, ±1.0x same-ish, ~3.57%), when the result is restructured in Ruby (but lazily mapped)
Another interesting comparison looks at some of our production workloads:
Operation(avg) | Bolt (neo4j-ruby-driver) | HTTP API (neo4j-http) | HTTP API (Boltless) |
---|---|---|---|
read many rows | 191ms (±1x base) | 316ms (1.7x slower) | 258ms (1.4x slower) |
read single row | 52ms (±1x base) | 52ms (±1.0x same-ish) | 53ms (±1.0x same-ish) |
write | 350ms (±1x base) | 289ms (1.2x faster) | 291ms (1.2x faster) |
198ms (±1x base) | 219ms (1.1x slower) | 201ms (±1.0x same-ish) |
After checking out the repo, run make install
to install dependencies. Then,
run make test
to run the tests. You can also run make shell-irb
for an
interactive prompt that will allow you to experiment.
Everyone interacting in the project codebase, issue tracker, chat rooms and mailing lists is expected to follow the code of conduct.
Bug reports and pull requests are welcome on GitHub at https://github.com/hausgold/boltless. Make sure that every pull request adds a bullet point to the changelog file with a reference to the actual pull request.
The release process of this Gem is fully automated. You just need to open the Github Actions Release Workflow and trigger a new run via the Run workflow button. Insert the new version number (check the changelog first for the latest release) and you're done.