Guide library format
The guide library format is designed to handle single- or multi-fragment guide data. At present pycroquet only accepts single and dual guide libraries in this format.
Minimal example (this is a tab-separated file):
#id sgrna_ids sgrna_seqs gene_pair_id
0 a ACGT A~B
1 x ACGT X~Y
Metadata headers are optional, but recommended. They begin with ## and should precede the column header, e.g.:
##library-type: single
##library-name: my first library
##species: human
##assembly: GRCh38
##gene-build-source: ensembl
##gene-build-version: 103
You are able to define anything you like here, although the above are recommended. One exception is library-type: this can be used to enforce the validation of columns that can use the | separator, see below.
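The header layout described above can be read with a few lines of Python. This is a minimal sketch, not pycroquet's own parser: the function name `read_guide_library` is hypothetical, and it assumes that `##` lines are `key: value` metadata, that the single `#` line is the column header, and that all remaining lines are tab-separated rows.

```python
import io


def read_guide_library(handle):
    """Split a guide library into metadata, column names and row dicts.

    Hypothetical helper for illustration; not part of pycroquet.
    """
    metadata = {}
    columns = None
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            # Metadata header, e.g. "##library-type: single"
            key, _, value = line[2:].partition(":")
            metadata[key.strip()] = value.strip()
        elif line.startswith("#"):
            # Column header, e.g. "#id<TAB>sgrna_ids<TAB>..."
            columns = line[1:].split("\t")
        else:
            if columns is None:
                raise ValueError("data row found before the column header")
            rows.append(dict(zip(columns, line.split("\t"))))
    return metadata, columns, rows


example = (
    "##library-type: single\n"
    "##species: human\n"
    "#id\tsgrna_ids\tsgrna_seqs\tgene_pair_id\n"
    "0\ta\tACGT\tA~B\n"
    "1\tx\tACGT\tX~Y\n"
)
metadata, columns, rows = read_guide_library(io.StringIO(example))
print(metadata["library-type"], columns, rows[0]["gene_pair_id"])
```

Keeping the metadata in a plain dict means any user-defined header is preserved alongside the recommended ones.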
To allow for the possibility of experimental error resulting in R1/R2 being swapped relative to the sgrna_seqs order in the library, a dual-guide specific header has been defined. To swap the order, simply include:
##dual-orientation: R2_R1
Other values have no impact.
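As a rough illustration of the rule above, the sketch below decides which read is paired with which entry of sgrna_seqs. The function name `read_pairing` and the tuple representation are assumptions made for this example; how pycroquet applies the swap internally is not described here.

```python
def read_pairing(metadata):
    """Return the reads in the order they are paired with sgrna_seqs entries.

    Hypothetical helper: the first element is paired with the first guide
    sequence, the second element with the second guide sequence.
    """
    if metadata.get("dual-orientation") == "R2_R1":
        # Reads were swapped in the experiment, so swap the pairing.
        return ("R2", "R1")
    # Header absent or any other value: no impact, keep the default order.
    return ("R1", "R2")


print(read_pairing({"dual-orientation": "R2_R1"}))
print(read_pairing({"library-type": "dual"}))
```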
There are 4 required fields and 11 optional items
id
A unique identifier for the vector; it must be different for each vector in the library. This is the id that will be used when outputting the counts.
sgrna_ids
This is the set of identifiers for the guides that are used in the vector. Dual CRISPR-Cas9 libraries have two guides, and combination screens have more than two. The guide ids are joined using the separator |. For dual CRISPR-Cas9 knockout screens the order of the guides is <left_guide_id>|<right_guide_id>.
In single-guide libraries no | separator is expected.
sgrna_seqs
This is the sequence of each guide in the vector, joined using the separator character |. The order of the sequences is the same as the order of the guide ids in the sgrna_ids field; this is necessary for the mapping. For dual CRISPR-Cas9 knockout screens the first guide is expected to map in the forward direction and the second guide in the reverse direction. Sequences are always provided in 5'-3' orientation; see sgrna_strands for orientation.
gene_pair_id
This should still be completed for single-guide libraries.
This is an id that represents the pair of regions (genes, non-targeting and intergenic regions) targeted by the vector. This can be a numerical ID.
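The relationship between sgrna_ids and sgrna_seqs can be checked per row as follows. This is only a sketch under stated assumptions: `pair_guides` is a hypothetical helper, each row is a dict keyed by column name, and the only library-type value relied on is `single`, the one shown in the metadata example.

```python
SEPARATOR = "|"


def pair_guides(row, library_type=None):
    """Pair each guide id with its sequence, checking that the counts agree.

    Hypothetical helper for illustration; not part of pycroquet.
    """
    ids = row["sgrna_ids"].split(SEPARATOR)
    seqs = row["sgrna_seqs"].split(SEPARATOR)
    vector_id = row.get("id", "?")
    if len(ids) != len(seqs):
        raise ValueError(
            f"vector {vector_id}: {len(ids)} guide ids but {len(seqs)} sequences"
        )
    if library_type == "single" and len(ids) != 1:
        # In a single-guide library no '|' separator is expected.
        raise ValueError(f"vector {vector_id}: separator found in a single library")
    return list(zip(ids, seqs))


dual_row = {
    "id": "0",
    "sgrna_ids": "left_guide|right_guide",
    "sgrna_seqs": "ACGT|TTGA",
    "gene_pair_id": "A~B",
}
print(pair_guides(dual_row))
```

Splitting both columns with the same separator and zipping them keeps the id-to-sequence ordering explicit, which is what the mapping depends on.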
Items with a separator are expected to follow the ordering defined in sgrna_ids above.
- sgrna_strands
  - Can be used to override expected mapping orientation of sgrna_seqs
  - separator: '|'
- sgrna_symbols
  - separator: '|'
- sgrna_chrs
  - separator: '|'
- sgrna_starts
  - separator: '|'
- sgrna_ends
  - separator: '|'
- sgrna_confidences
  - separator: '|'
- sgrna_off_targets
  - separator: '|'
- sgrna_libraries
  - separator: '|'
- scaffold
- target_type
- custom_annotation
  - No tabs.
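Because the separated optional columns follow the ordering of sgrna_ids, a simple consistency check is that each one holds the same number of items. The sketch below assumes rows are dicts keyed by column name; `misaligned_columns` is a hypothetical helper and does not reflect how pycroquet validates these columns.

```python
# Optional columns listed above as using the '|' separator.
SEPARATED_COLUMNS = (
    "sgrna_strands",
    "sgrna_symbols",
    "sgrna_chrs",
    "sgrna_starts",
    "sgrna_ends",
    "sgrna_confidences",
    "sgrna_off_targets",
    "sgrna_libraries",
)


def misaligned_columns(row):
    """List optional columns whose item count differs from sgrna_ids."""
    expected = len(row["sgrna_ids"].split("|"))
    return [
        name
        for name in SEPARATED_COLUMNS
        if name in row and len(row[name].split("|")) != expected
    ]


row = {
    "id": "0",
    "sgrna_ids": "left_guide|right_guide",
    "sgrna_seqs": "ACGT|TTGA",
    "gene_pair_id": "A~B",
    "sgrna_symbols": "A|B",
    "sgrna_chrs": "1",  # one item for two guides: reported as misaligned
}
print(misaligned_columns(row))
```

Columns that are absent from a row are skipped, since all of them are optional.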