Removing Sequences
Removing sequences from a GBWT index is similar to inserting them. The implemented algorithm is an in-memory variant of the parallel merging algorithm. Multiple search threads search for the sequences to be removed, building the rank array in memory. The positions specified by the rank array are then removed from the index. Because the uncompressed rank array is held in memory, the algorithm temporarily needs up to tens of bytes for each position in the sequences being removed. It is therefore best suited for removing a small number of sequences.
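The memory cost mentioned above can be turned into simple arithmetic. The following is only a rough sketch: the function name and the default of 16 bytes per position are assumptions chosen for illustration, since the text only says "up to tens of bytes".

```python
def estimate_rank_array_bytes(total_length: int, bytes_per_position: int = 16) -> int:
    """Rough estimate of the temporary memory used by the rank array.

    total_length: combined length of all sequences to be removed.
    bytes_per_position: assumed per-position cost (hypothetical value).
    """
    if total_length < 0 or bytes_per_position < 0:
        raise ValueError("arguments must be non-negative")
    return total_length * bytes_per_position


# Example: removing 100 sequences of one million positions each.
total = 100 * 1_000_000
print(f"{estimate_rank_array_bytes(total) / 2**30:.2f} GiB")
```

The actual overhead depends on the implementation and the data, so treat the result as an order-of-magnitude guide.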
If the index is bidirectional, any request to remove sequence N will actually remove sequences Path::encode(N, false) and Path::encode(N, true). Otherwise, only sequence N is removed. If at least one of the specified sequence identifiers is invalid, no sequences are removed.
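The mapping from requested identifiers to removed sequences can be sketched as follows. This is an illustration, not the GBWT code: it assumes that Path::encode(N, reversed) computes 2 * N + reversed, and that an identifier is valid when every resulting sequence number is below the number of sequences stored in the index. The helper names `encode_path` and `sequences_to_remove` are hypothetical.

```python
from typing import Iterable, List


def encode_path(path_id: int, is_reverse: bool) -> int:
    # Assumption: mirrors Path::encode as 2 * path_id + orientation bit.
    return 2 * path_id + int(is_reverse)


def sequences_to_remove(requested: Iterable[int],
                        num_sequences: int,
                        bidirectional: bool) -> List[int]:
    """Return the sequence numbers that would be removed, or [] if any
    requested identifier is invalid (all-or-nothing behaviour)."""
    result = set()
    for n in requested:
        if bidirectional:
            ids = [encode_path(n, False), encode_path(n, True)]
        else:
            ids = [n]
        # Assumed validity rule: every id must refer to a stored sequence.
        if n < 0 or any(i >= num_sequences for i in ids):
            return []
        result.update(ids)
    return sorted(result)


# In a bidirectional index with 10 stored sequences (5 paths, 2 orientations):
print(sequences_to_remove([1, 3], 10, bidirectional=True))
```

Checking all identifiers before touching the index is what makes the operation all-or-nothing: a single invalid identifier leaves the index unchanged.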
Sequences can be removed with remove_seq.
remove_seq [options] base_name seq1 [seq2 ...]
The program reads base_name.gbwt and removes the sequences with identifiers seq1, seq2, and so on. The output is written back to base_name.gbwt, unless specified otherwise.
- -c N: Use chunks of N sequences per search thread.
- -o X: Write the output to X.gbwt.
- -r: Remove the range of sequences seq1 to seq2 (inclusive). Requires exactly two sequence arguments.
Example: remove_seq -r -o output input 11 20
Reads input.gbwt, removes sequences 11 to 20, and writes the result to output.gbwt.
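When remove_seq is called from a script, the argument list can be assembled programmatically. The sketch below only builds the command line from the usage and options described above; the function name `build_remove_seq_command` and its parameters are hypothetical, and it assumes the remove_seq binary is available under that name.

```python
from typing import List, Optional, Sequence


def build_remove_seq_command(base_name: str,
                             seq_ids: Sequence[int],
                             output: Optional[str] = None,
                             chunk_size: Optional[int] = None,
                             use_range: bool = False,
                             executable: str = "remove_seq") -> List[str]:
    """Build an argument list of the form
    remove_seq [options] base_name seq1 [seq2 ...]."""
    if not seq_ids:
        raise ValueError("at least one sequence identifier is required")
    if use_range and len(seq_ids) != 2:
        # -r requires exactly two sequence arguments.
        raise ValueError("-r requires exactly two sequence identifiers")

    cmd = [executable]
    if use_range:
        cmd.append("-r")
    if chunk_size is not None:
        cmd += ["-c", str(chunk_size)]
    if output is not None:
        cmd += ["-o", output]
    cmd.append(base_name)
    cmd += [str(s) for s in seq_ids]
    return cmd


# Reproduces the example above.
cmd = build_remove_seq_command("input", [11, 20], output="output", use_range=True)
print(" ".join(cmd))  # prints: remove_seq -r -o output input 11 20
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to actually run the tool. Passing an explicit output name keeps the input file intact, which is useful because the default is to overwrite base_name.gbwt.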