-
Notifications
You must be signed in to change notification settings - Fork 13
Search Interface
All queries use a single thread. They check that the query parameters are valid. The following functions are member functions of GBWT
, CachedGBW
, and DynamicGBWT
. The structures are defined in gbwt.h, cached_gbwt.h, and dynamic_gbwt.h, respectively.
See Query Interface for the definitions of search states.
These queries return a search state, which is empty if the parameters are invalid. A pattern matches a search state, if the prefixes in the range of record offsets end with the pattern. The iterators are assumed to be input iterators.
SearchState find(node_type node) const
template<class Iterator> SearchState find(Iterator begin, Iterator end) const
This query returns the search state matching the pattern.
-
node
: Node identifier. -
begin
: Start of the pattern, inclusive. -
end
: End of the pattern, exclusive.
SearchState prefix(node_type node) const
template<class Iterator> SearchState prefix(Iterator begin, Iterator end) const
These queries return the search state for the prefixes of the indexed paths that are equal to the pattern.
-
node
: Node identifier. -
begin
: Start of the pattern, inclusive. -
end
: End of the pattern, exclusive.
SearchState extend(SearchState state, node_type node) const
template<class Iterator> SearchState extend(SearchState state, Iterator begin, Iterator end) const
These queries extend the search from the given search state. extend(find(A), B)
is equivalent to searching for the concatenation with find(A + B)
.
-
state
: Search state where the search is started. -
node
: Node identifier. -
begin
: Start of the pattern, inclusive. -
end
: End of the pattern, exclusive.
Bidirectional search maintains the search states for both orientations of the pattern. It requires that both orientations of the paths have been indexed. This can be checked with GBWT::bidirectional()
. The queries do not check whether the index supports bidirectional search.
These queries return a bidirectional search state, which is empty if the parameters are invalid.
BidirectionalState bdFind(node_type node) const
Returns the bidirectional search state for a pattern consisting of a single node node
. This is similar to the unidirectional query find(node)
.
BidirectionalState bdExtendForward(BidirectionalState state, node_type node) const
Assume that search state state
corresponds to pattern P. This returns the bidirectional search state for the concatenation of P and node
. This is similar to the unidirectional query extend(state, node)
.
BidirectionalState bdExtendBackward(BidirectionalState state, node_type node) const
Assume that search state state
corresponds to pattern P. This returns the bidirectional search state for the concatenation of node
and P.
These convert positions to path identifiers. On the average, it takes d/2 steps of LF()
to find a sampled identifier, where d is the sample interval.
size_type locate(node_type node, size_type i) const
size_type locate(edge_type position) const
This query returns the path identifier for the given position. It iterates LF()
until it finds a sample. If the position is invalid, the return value is invalid_sequence()
.
-
node
: Node identifier. -
i
: Record offset. -
position
: A combination of the above.
std::vector<size_type> locate(node_type node, range_type range) const
std::vector<size_type> locate(SearchState state) const
This query returns the set of path identifiers for the given range of positions. It is much faster than calling locate()
separately for each record offset. If the range of positions is invalid, the returned set will be empty.
-
node
: Node identifier. -
range
: Range of record offsets. -
state
: A combination of the above.
vector_type extract(size_type sequence) const
vector_type extract(edge_type position) const
vector_type extract(edge_type position, size_type max_length) const
These queries return the path, a suffix, and a substring, respectively. If the query parameters are invalid, the returned path is empty.
-
sequence
: Path identifier. -
position
: Starting position of the extaction as a combination of a node indentifier and a record offset. -
max_length
: Maximum length of the returned substring.