DEV-1096 Rights API: translate OFFSET queries into inequalities #13

Draft
wants to merge 18 commits into
base: main
3 changes: 3 additions & 0 deletions lib/rights_api.rb
@@ -4,8 +4,11 @@ module RightsAPI
end

require "rights_api/app"
require "rights_api/cursor"
require "rights_api/database"
require "rights_api/order"
require "rights_api/query"
require "rights_api/query_parser"
require "rights_api/result"
require "rights_api/result/error_result"
require "rights_api/schema"
1 change: 1 addition & 0 deletions lib/rights_api/app.rb
@@ -1,5 +1,6 @@
# frozen_string_literal: true

require "cgi"
require "sinatra"
require "sinatra/json"
require "sinatra/reloader" if development?
114 changes: 114 additions & 0 deletions lib/rights_api/cursor.rb
@@ -0,0 +1,114 @@
# frozen_string_literal: true

require "base64"
require "json"
require "sequel"

# Given a list of sort fields (an Array of RightsAPI::Order)
# and a list of field -> value mappings (the last result),
# create a string that can be decoded into
# a WHERE list

# Tries to store just enough information so the next query can
# pick up where the current one left off. The Base64-encoded cursor string
# is a serialized array containing the current offset and one value for each
# sort parameter (explicit or default) used in the query.

# CURSOR LIFECYCLE
# - At the beginning of the query a Cursor is created with the "cursor" URL parameter, or
# nil if none was supplied (indicating first page of results, i.e. no previous query).
# - Query calls `cursor.where` to create a WHERE clause based on the decoded values
# from the previous result (in effect saying "WHERE fields > last_result")
# - Query calls `cursor.offset` for the current page of results.
# - Query calls `cursor.encode` to calculate a new offset and new last_result values
# dictated by the current ORDER BY.

# IVAR SEMANTICS
# - `offset` the (zero-based) offset into the overall results set produced by `where`, or perhaps
# "give me the results at offset N (by using values X Y and Z)"
# - `values` the X Y and Z from above, these are the relevant values from the previous result
# if there was one.
# It is possibly counterintuitive that X Y and Z are NOT at offset N. Offset N is the location
# of the NEXT record.

# CAVEATS
# Relies on the search parameters being unchanged between queries.
# If they do change, the results are undefined.
module RightsAPI
  class Cursor
    OFFSET_KEY = "off"
    LAST_ROW_KEY = "last"
    attr_reader :values, :offset

    # @param cursor_string [String] the URL parameter to decode
    # @return [Array] of the form [offset, "val1", "val2" ...]
    def self.decode(cursor_string)
      JSON.parse(Base64.urlsafe_decode64(cursor_string))
    end

    # JSON-encode and Base64-encode an object
    # @param arg [Object] a serializable object (always an Array in this class)
    # @return [String]
    def self.encode(arg)
      Base64.urlsafe_encode64(JSON.generate(arg))
    end

    def initialize(cursor_string: nil)
      @offset = 0
      @values = []
      if cursor_string
        @values = self.class.decode cursor_string
        @offset = @values.shift
      end
    end

    # Generate zero or one WHERE clauses that will generate a pseudo-OFFSET
    # based on ORDER BY parameters.
    # ORDER BY a, b, c (where the last row had a = 1, b = 2, c = 3) TRANSLATES TO
    # WHERE (a > 1)
    # OR (a = 1 AND b > 2)
    # OR (a = 1 AND b = 2 AND c > 3)
Comment on lines +67 to +70
I would like to see this, in more detail, up top in the documentation.
Because this is the meat and potatoes of this class, no?

With an actual 5-ish line example using the code...

# Example:
require "rights_api/cursor"
cursor = RightsAPI.Cursor.new("x y z")
# ...
cursor.where(x,y,z) # -> [x OR y OR z]

    # @param model [Class] Sequel::Model subclass for the table being queried,
    #   only used for qualifying column names in the WHERE clause.
    # @param order [Array<RightsAPI::Order>] the current query's ORDER BY
    # @return [Array<Sequel::LiteralString>] zero or one Sequel literals
    def where(model:, order:)
      return [] if values.empty?

      # Create one OR clause for each ORDER.
      # Each OR clause is a series of AND clauses.
      # The last element of each AND clause is < or >, the others are =
      # The first AND clause has only the first ORDER parameter.
      # Each subsequent one adds one ORDER parameter.
      or_clause = []
      order.count.times do |order_index|
        # Take a slice of ORDER of size order_index + 1
        and_clause = order[0, order_index + 1].each_with_index.map do |ord, i|
          # in which each element is a "col op val" string and the last is an inequality
          op = if i == order_index
            ord.asc? ? ">" : "<"
          else
            "="
          end
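          # Quote the value through Sequel rather than interpolating it raw: the cursor is client-supplied.
          "#{model.table_name}.#{ord.column}#{op}#{model.db.literal(values[i])}"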
        end
        or_clause << "(" + and_clause.join(" AND ") + ")"
      end
      [Sequel.lit(or_clause.join(" OR "))]
    end

    # Encode the offset and the relevant values from the last result row
    # (i.e. those used in the current ORDER BY)
    # @param order [Array<RightsAPI::Order>] the current query's ORDER BY
    # @param rows [Array<Sequel::Model>] the rows returned by the current query
    # @return [String]
    def encode(order:, rows:)
@mwarin mwarin Jun 17, 2024

I get sweaty seeing def encode and def self.encode in the same class.
If both methods have to be called encode, I would mention the justification for that.
If they don't have to, I would give them more distinct names to avoid operator errors down the line.

      data = [offset + rows.count]
      row = rows.last
      order.each do |ord|
        data << row[ord.column]
      end
      self.class.encode data
    end
  end
end
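
The review comment above asks for a short worked example of the cursor. The following is a minimal sketch of one, assuming only the wire format described in the file header (Base64 over a JSON array of `[offset, val1, val2, ...]`). `DemoOrder`, `encode_cursor`, `decode_cursor`, and `keyset_condition` are hypothetical stand-ins written for illustration, not names from this PR, and the sample values are made up. The condition is built with `?` placeholders plus a bind list, which is one possible way to keep values out of the SQL text.

```ruby
require "base64"
require "json"

# Hypothetical stand-in for RightsAPI::Order so the sketch runs without Sequel.
DemoOrder = Struct.new(:column, :asc) do
  def asc?
    asc
  end
end

# Same shape as the cursor described above: Base64(JSON([offset, val1, val2, ...])).
def encode_cursor(offset, values)
  Base64.urlsafe_encode64(JSON.generate([offset, *values]))
end

def decode_cursor(cursor_string)
  offset, *values = JSON.parse(Base64.urlsafe_decode64(cursor_string))
  [offset, values]
end

# Build one OR clause per ORDER column. Each clause is equality on the
# leading columns and an inequality (> for ASC, < for DESC) on the last one.
def keyset_condition(table, order, values)
  binds = []
  clauses = order.each_index.map do |last|
    terms = order[0..last].each_with_index.map do |ord, i|
      op = if i == last
        ord.asc? ? ">" : "<"
      else
        "="
      end
      binds << values[i]
      "#{table}.#{ord.column} #{op} ?"
    end
    "(#{terms.join(" AND ")})"
  end
  [clauses.join(" OR "), binds]
end

order = [
  DemoOrder.new(:namespace, true),
  DemoOrder.new(:id, true),
  DemoOrder.new(:time, false)
]
cursor_string = encode_cursor(2, ["mdp", "39015012345678", "2009-01-01 05:00:00"])
offset, values = decode_cursor(cursor_string)
sql, binds = keyset_condition("rights_log", order, values)
```

Here `offset` is the position of the next record, while `values` belong to the last record already returned, which matches the "IVAR SEMANTICS" note in the file header.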
6 changes: 4 additions & 2 deletions lib/rights_api/model_extensions.rb
@@ -1,5 +1,7 @@
# frozen_string_literal: true

require_relative "order"

module RightsAPI
module ModelExtensions
# Overridden by classes that want to do some kind of #eager or #eager_graph
@@ -19,9 +21,9 @@ def default_key
end

# For use in ORDER BY clause.
# @return [Sequel::SQL::QualifiedIdentifier]
# @return [Array<RightsAPI::Order>]
def default_order
query_for_field field: default_key
[Order.new(column: default_key)]
end

# @param field [String, Symbol]
7 changes: 7 additions & 0 deletions lib/rights_api/models/access_statement_map.rb
@@ -9,6 +9,13 @@ def self.default_key
:attr_access_id
end

def self.default_order
[
Order.new(column: :a_attr),
Order.new(column: :a_access_profile)
]
end

# @param [String, Symbol] field
# @return [Sequel::SQL::Expression]
def self.query_for_field(field:)
10 changes: 7 additions & 3 deletions lib/rights_api/models/rights_current.rb
@@ -30,10 +30,14 @@ def self.query_for_field(field:)
super field: field
end

# rights_current and rights_log should order by timestamp
# @return [Sequel::SQL::Expression]
# rights_current and rights_log should order by htid, timestamp
# @return [Array<RightsAPI::Order>]
def self.default_order
qualify field: :time
[
Order.new(column: :namespace),
Order.new(column: :id),
Order.new(column: :time)
]
end

def to_h
12 changes: 8 additions & 4 deletions lib/rights_api/models/rights_log.rb
@@ -10,7 +10,7 @@ class RightsLog < Sequel::Model(:rights_log)
many_to_one :attribute_obj, class: :"RightsAPI::Attribute", key: :attr
many_to_one :reason_obj, class: :"RightsAPI::Reason", key: :reason
many_to_one :source_obj, class: :"RightsAPI::Source", key: :source
set_primary_key [:namespace, :id]
set_primary_key [:namespace, :id, :time]

# Maybe TOO eager. This makes us partially responsible for the fact that rights_current.source
# has an embedded access_profile.
@@ -31,10 +31,14 @@ def self.query_for_field(field:)
super field: field
end

# rights_current and rights_log should order by timestamp
# @return [Sequel::SQL::Expression]
# rights_current and rights_log should order by htid, timestamp
# @return [Array<RightsAPI::Order>]
def self.default_order
qualify field: :time
[
Order.new(column: :namespace),
Order.new(column: :id),
Order.new(column: :time)
]
end

def to_h
30 changes: 30 additions & 0 deletions lib/rights_api/order.rb
@@ -0,0 +1,30 @@
# frozen_string_literal: true

# A class that encapsulates the field and ASC/DESC properties of a
# single ORDER BY argument.

module RightsAPI
  class Order
    attr_reader :column
    # @param column [Symbol] the field to ORDER BY
    # @param asc [Boolean] true if ASC, false if DESC
    def initialize(column:, asc: true)
      @column = column
      @asc = asc
    end

    # @return [Boolean] is the order direction ASC?
    def asc?
      @asc
    end

    # @return [Sequel::SQL::OrderedExpression]
    def to_sequel(model:)
      if asc?
        Sequel.asc(model.qualify(field: column))
      else
        Sequel.desc(model.qualify(field: column))
      end
    end
  end
end
29 changes: 17 additions & 12 deletions lib/rights_api/query.rb
@@ -1,16 +1,15 @@
# frozen_string_literal: true

require "benchmark"
require "cgi"

require_relative "cursor"
require_relative "error"
require_relative "query_parser"
require_relative "result"
require_relative "services"

module RightsAPI
class Query
attr_reader :model, :params, :parser, :total, :dataset
attr_reader :model, :params, :parser, :total

# @param model [Class] Sequel::Model subclass for the table being queried
# @param params [Hash] CGI parameters submitted to the Sinatra frontend
@@ -19,29 +18,35 @@ def initialize(model:, params: {})
@params = params
@parser = QueryParser.new(model: model)
@total = 0
@dataset = nil
end

# @return [Result]
def run
dataset = nil
# This may raise QueryParserError
parser.parse(params: params)
time_delta = Benchmark.realtime do
@dataset = model.base_dataset
dataset = model.base_dataset
parser.where.each do |where|
@dataset = dataset.where(where)
dataset = dataset.where(where)
end
# Save this here because offset and limit may alter the count.
dataset = dataset.order(*(parser.order.map { |order| order.to_sequel(model: model) }))
# Save this here because limit and cursor would otherwise alter the count.
@total = dataset.count
@dataset = dataset.order(*parser.order)
.offset(parser.offset)
.limit(parser.limit)
.all
# Apply the cursor to get to the offset we want
parser.cursor.where(model: model, order: parser.order).each do |where|
dataset = dataset.where(where)
end
dataset = dataset.limit(parser.limit).all
end
result = Result.new(offset: parser.offset, total: total, milliseconds: 1000 * time_delta)
result = Result.new(offset: parser.cursor.offset, total: total, milliseconds: 1000 * time_delta)
dataset.each do |row|
result.add! row: row.to_h
end
if result.more?
cursor = parser.cursor.encode(order: parser.order, rows: dataset)
result.cursor = cursor
end
result
end
end
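
To illustrate why `run` can drop `.offset(...)` in favor of a cursor-derived WHERE clause, here is a small in-memory sketch, assuming rows already sorted ascending by a unique `(namespace, id)` pair. `ROWS`, `SORT_KEYS`, and `page_after` are hypothetical names used only for this illustration. Ruby's `Array#<=>` compares element by element, which plays the role of `(a > x) OR (a = x AND b > y)` for an all-ascending order.

```ruby
# Sample rows, already in ORDER BY namespace, id (both ascending).
ROWS = [
  {namespace: "a", id: "1"},
  {namespace: "a", id: "2"},
  {namespace: "b", id: "1"},
  {namespace: "b", id: "3"},
  {namespace: "c", id: "1"}
].freeze
SORT_KEYS = %i[namespace id].freeze

# Return up to `limit` rows that sort strictly after `last_values`.
# A nil `last_values` means "first page", like a Cursor with no values.
def page_after(rows, last_values, limit)
  remaining = if last_values.nil?
    rows
  else
    rows.select { |row| (row.values_at(*SORT_KEYS) <=> last_values) > 0 }
  end
  remaining.first(limit)
end

pages = []
last_values = nil
offset = 0
loop do
  page = page_after(ROWS, last_values, 2)
  break if page.empty?
  pages << page
  # The next cursor carries the running offset plus the last row's sort values.
  offset += page.size
  last_values = page.last.values_at(*SORT_KEYS)
end
```

The pages come out identical to slicing the sorted list by offset and limit, but only because the sort key is unique per row. With ties on the sort key, a strict inequality would skip rows, which is presumably why the parser always appends the model's default order.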
34 changes: 25 additions & 9 deletions lib/rights_api/query_parser.rb
@@ -2,7 +2,9 @@

require "sequel"

require_relative "cursor"
require_relative "error"
require_relative "order"

# Processes the Hash of URL parameters passed to the API into an
# Array of WHERE constraints, as well as LIMIT, and OFFSET values.
@@ -11,35 +13,40 @@
module RightsAPI
class QueryParser
DEFAULT_LIMIT = 1000
DEFAULT_OFFSET = 0
attr_reader :params, :model, :where, :order, :offset, :limit
attr_reader :params, :model, :where, :order, :limit

# @param model [Class] Sequel::Model subclass for the table being queried
def initialize(model:)
@model = model
@where = []
@order = []
@cursor = nil
@limit = DEFAULT_LIMIT
@offset = DEFAULT_OFFSET
end

def parse(params: {})
@params = params
params.each do |key, values|
key = key.to_sym
case key
when :offset
parse_offset(values: values)
when :cursor
parse_cursor(values: values)
when :limit
parse_limit(values: values)
else
parse_parameter(key: key, values: values)
end
end
@order = [model.default_order] if @order.empty?
# Always tack on the default order even if it is redundant.
# The cursor implementation requires that there be an intrinsic order.
@order += model.default_order
self
end

def cursor
@cursor ||= Cursor.new
end

private

# Parses a general search parameter and appends the resulting Sequel
@@ -55,9 +62,16 @@ def parse_parameter(key:, values:)
end
end

# Extract a single integer that can be passed to dataset.offset.
def parse_offset(values:)
@offset = parse_int_value(values: values, type: "OFFSET")
# Parse cursor value into an auxiliary WHERE clause
def parse_cursor(values:)
if values.count > 1
raise QueryParserError, "multiple cursor values (#{values})"
end
begin
@cursor = Cursor.new(cursor_string: values.first)
rescue ArgumentError, JSON::ParserError => e
raise QueryParserError, "cannot decode cursor: #{e.message}"
end
end
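
A cursor string can fail to decode in two different ways, and they raise different exception classes. The sketch below shows both using only the standard library. `classify_cursor` is a hypothetical helper written for this illustration, not part of the PR. Invalid Base64 raises `ArgumentError`, while valid Base64 wrapping malformed JSON raises `JSON::ParserError`, which does not inherit from `ArgumentError`.

```ruby
require "base64"
require "json"

# Hypothetical helper: run the same decode steps as Cursor.decode,
# but report which step failed instead of raising.
def classify_cursor(cursor_string)
  decoded = JSON.parse(Base64.urlsafe_decode64(cursor_string))
  decoded.is_a?(Array) ? :ok : :not_an_array
rescue ArgumentError
  # Raised by Base64.urlsafe_decode64 for input that is not valid Base64.
  :bad_base64
rescue JSON::ParserError
  # Raised by JSON.parse when the decoded bytes are not valid JSON.
  :bad_json
end

good = Base64.urlsafe_encode64(JSON.generate([10, "mdp", "001"]))
not_json = Base64.urlsafe_encode64("not json")

results = {
  good: classify_cursor(good),
  garbage: classify_cursor("!!!"),
  not_json: classify_cursor(not_json)
}
```

A caller that wants to turn every malformed cursor into a single user-facing error would need to handle both classes.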

# Extract a single integer that can be passed to dataset.limit.
@@ -72,6 +86,8 @@ def parse_limit(values:)
# @param type [String] "OFFSET" or "LIMIT", used only for reporting errors.
# @return [Integer]
def parse_int_value(values:, type:)
return values.last if values.last.is_a? Integer

value = values.last.to_i
# Make sure the offset can make a round-trip conversion between Int and String
# https://stackoverflow.com/a/1235891
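
The rest of `parse_int_value` is not shown in this diff. As a rough sketch of the round-trip idea the comment refers to, and not the PR's actual code, a check along these lines accepts a value only if converting it to an Integer and back reproduces the original string. `strict_int` is a hypothetical name.

```ruby
# Hypothetical helper: accept Integers as-is, and accept Strings only when
# String -> Integer -> String gives back exactly the same text.
def strict_int(raw)
  return raw if raw.is_a?(Integer)

  value = raw.to_i
  unless value.to_s == raw.to_s
    raise ArgumentError, "not an integer: #{raw.inspect}"
  end
  value
end

# Collect which sample inputs are rejected.
rejected = ["12abc", "abc", "007", "1.5"].select do |raw|
  begin
    strict_int(raw)
    false
  rescue ArgumentError
    true
  end
end
```

One consequence of this approach is that zero-padded input such as "007" is rejected, since `to_i` drops the padding and the round trip no longer matches.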