Data Types

Jouni Siren edited this page Jul 15, 2019 · 3 revisions

General

These data types and utility methods are defined in utils.h.

By default, the header defines GBWT_SAVE_MEMORY, which makes certain data types use 32-bit integers instead of 64-bit integers to save memory.
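The general pattern of switching an integer width with a preprocessor macro can be sketched as follows. This is only an illustration: EXAMPLE_SAVE_MEMORY and example_compact_type are hypothetical names, and the sketch does not show which types utils.h actually changes.

```cpp
#include <cstdint>

// Hypothetical stand-in for GBWT_SAVE_MEMORY. It is defined here to mirror
// the "defined by default" behavior described above.
#define EXAMPLE_SAVE_MEMORY

#ifdef EXAMPLE_SAVE_MEMORY
typedef std::uint32_t example_compact_type;  // 32-bit variant when the macro is defined
#else
typedef std::uint64_t example_compact_type;  // 64-bit variant otherwise
#endif

// With the macro defined, the type is four bytes wide.
static_assert(sizeof(example_compact_type) == 4, "expected the 32-bit variant");
```

Removing the #define line in this sketch selects the 64-bit variant, and the static_assert would then fail.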

Integer types

typedef std::uint64_t size_type;
typedef std::uint32_t short_type;
typedef std::uint8_t  byte_type;

The interface uses several aliases for size_type:

  • node_type: Node identifier.
  • comp_type: Record identifier, compacted node identifier.
  • rank_type: Edge rank, a character in the local alphabet of a record.

Plain size_type is used for path identifiers and record offsets.
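Because these names are aliases rather than distinct types, they document intent without adding type safety. A minimal sketch, assuming the aliases are declared as plain typedefs of size_type (the helper functions are hypothetical):

```cpp
#include <cstdint>
#include <type_traits>

typedef std::uint64_t size_type;

// Assumed declarations, following the alias list above.
typedef size_type node_type;
typedef size_type comp_type;
typedef size_type rank_type;

// All aliases name the same underlying type.
static_assert(std::is_same<node_type, rank_type>::value, "aliases are interchangeable");
static_assert(std::is_same<comp_type, size_type>::value, "aliases are interchangeable");

// Hypothetical function that expects an edge rank.
inline rank_type as_rank(rank_type rank) { return rank; }

// The compiler accepts a node identifier where a rank is expected,
// so keeping the two apart is up to the caller.
inline rank_type mixes_up_aliases()
{
  node_type node = 42;
  return as_rank(node);
}
```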

Ranges

typedef std::pair<size_type, size_type> range_type;

Closed ranges are represented as pairs of integers. If rng is a range, it denotes the integers from rng.first to rng.second (inclusive). The range is empty if rng.first + 1 > rng.second + 1. Because unsigned arithmetic wraps around, range_type(0, -1) is also empty: -1 converts to the largest size_type value, and adding 1 to it yields 0.
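The emptiness condition can be checked with a few lines of code. In this sketch, is_empty, naive_is_empty, and emptiness_demo are hypothetical names used for illustration; only the comparison itself comes from the text above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>

typedef std::uint64_t size_type;
typedef std::pair<size_type, size_type> range_type;

// The emptiness test quoted in the text. Unsigned addition wraps around,
// so an upper bound equal to the maximum value becomes 0 after adding 1.
inline bool is_empty(range_type rng)
{
  return rng.first + 1 > rng.second + 1;
}

// A naive comparison without the + 1 terms, shown only for contrast.
inline bool naive_is_empty(range_type rng)
{
  return rng.first > rng.second;
}

inline void emptiness_demo()
{
  const size_type max = std::numeric_limits<size_type>::max();  // what -1 converts to
  assert(is_empty(range_type(0, max)));         // range_type(0, -1) is empty
  assert(!naive_is_empty(range_type(0, max)));  // the naive test misses this case
  assert(is_empty(range_type(1, 0)));           // the usual empty range
  assert(!is_empty(range_type(3, 5)));          // denotes 3, 4, 5
}
```

The + 1 on both sides is what lets a pair such as (0, -1) act as an empty range that ends just before position 0.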

struct Range defines a number of static functions:

  • size_type length(range_type range): The length of range.
  • bool empty(range_type range): Is range empty?
  • size_type bound(size_type value, range_type range): Adjusts value up or down until it falls within range. If range is empty, returns range.first.
  • size_type bound(size_type value, size_type low, size_type high): As above with range_type(low, high) as range.
  • range_type empty_range(): Returns the empty range range_type(1, 0).
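The documented behavior of these functions can be reproduced with a small stand-in. RangeSketch below is a hypothetical struct written for this page, not the implementation in utils.h. Returning 0 as the length of every empty range is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

typedef std::uint64_t size_type;
typedef std::pair<size_type, size_type> range_type;

// Hypothetical stand-in that follows the descriptions above.
struct RangeSketch
{
  static bool empty(range_type range)
  {
    return range.first + 1 > range.second + 1;
  }

  // Assumption of this sketch: every empty range has length 0.
  static size_type length(range_type range)
  {
    return empty(range) ? 0 : range.second + 1 - range.first;
  }

  // Clamps value into the closed range; an empty range yields range.first.
  static size_type bound(size_type value, range_type range)
  {
    if(empty(range)) { return range.first; }
    return std::min(std::max(value, range.first), range.second);
  }

  static size_type bound(size_type value, size_type low, size_type high)
  {
    return bound(value, range_type(low, high));
  }

  static range_type empty_range() { return range_type(1, 0); }
};
```

For example, with the range (3, 5), RangeSketch::bound maps 1 to 3, leaves 4 unchanged, and maps 9 to 5.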

Integer pairs

The following types use 32-bit integers to reduce the size of the dynamic GBWT:

typedef std::pair<short_type, short_type> edge_type;
typedef std::pair<short_type, short_type> run_type;
typedef std::pair<short_type, short_type> sample_type;

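Their intended semantics are the following. These declarations are conceptual: they show what each component means, while the stored components are short_type values as declared above.

typedef std::pair<node_type, size_type> edge_type;
typedef std::pair<rank_type, size_type> run_type;
typedef std::pair<size_type, size_type> sample_type;

  • edge_type: Node identifier and either record offset or path count.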
  • run_type: Edge rank and run length.
  • sample_type: Record offset and path identifier.
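Since the interface works with 64-bit size_type values while these pairs store 32-bit components, values have to be narrowed when a pair is built. The sketch below shows one way to do that with a range check. The helpers to_short and make_run are hypothetical and are not part of utils.h.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <utility>

typedef std::uint64_t size_type;
typedef std::uint32_t short_type;
typedef std::pair<short_type, short_type> run_type;

// Hypothetical helper: narrows a 64-bit value and rejects values that
// do not fit in 32 bits instead of silently truncating them.
inline short_type to_short(size_type value)
{
  if(value > std::numeric_limits<short_type>::max())
  {
    throw std::out_of_range("value does not fit in short_type");
  }
  return static_cast<short_type>(value);
}

// Hypothetical helper: builds a run from an edge rank and a run length.
inline run_type make_run(size_type rank, size_type length)
{
  return run_type(to_short(rank), to_short(length));
}
```

On common platforms a pair of two short_type values takes 8 bytes, compared with 16 bytes for a pair of two size_type values, which is where the space saving comes from.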