Ensembl pads some of the PATCH contigs with N's so that coordinates reported during alignment are identical to the corresponding positions on the primary contig the patch is derived from:
All alternative assembly and patch regions have their sequence padded
with N's to ensure alignment programs can report the correct index
regions
e.g. A patch region with a start position of 1,000,001 will have 1e6 N's added to
its start so an alignment program will report coordinates with respect to the
whole chromosome.
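The padding rule quoted above amounts to a fixed offset per patch. The following is a minimal sketch of that arithmetic, assuming 1-based positions and padding at the start of the sequence only; the function names are hypothetical and not part of any Ensembl tooling.

```python
def padding_length(patch_start: int) -> int:
    """Number of leading N's for a patch whose first real base sits at
    the 1-based chromosome position `patch_start` (assumed convention)."""
    return patch_start - 1


def to_unpadded(position: int, patch_start: int) -> int:
    """Convert a 1-based position on the padded contig into a 1-based
    position on the unpadded patch sequence."""
    return position - padding_length(patch_start)


# The example from the quoted text: a patch starting at 1,000,001
# carries 1e6 leading N's, so its first real base maps back to position 1.
print(padding_length(1_000_001))
print(to_unpadded(1_000_001, 1_000_001))
```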
Example contig from the HG-2365 PATCH:
| Reference Build | Name | Length |
| --- | --- | --- |
| Ensembl | CHR_HG2365_PATCH | 102714182 |
| gencode | ML143371.1 | 5500449 |
| NCBI | NW_021160017.1 | 5500449 |
So for this contig Ensembl has added 97,213,733 extra N's to the sequence (102,714,182 - 5,500,449), which means we cannot match it by length with gencode's ML143371.1 or NCBI's NW_021160017.1.
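As a quick check of the figure above, the padding can be derived from the two reported lengths, or measured directly from a sequence. This is a sketch that assumes all padding sits at the start of the contig; `leading_n_count` is a hypothetical helper, not an existing recontig function.

```python
def leading_n_count(sequence: str) -> int:
    """Count the run of N/n characters at the start of a sequence."""
    return len(sequence) - len(sequence.lstrip("Nn"))


# Lengths reported for the HG-2365 PATCH contig.
ENSEMBL_LENGTH = 102_714_182   # CHR_HG2365_PATCH
UNPADDED_LENGTH = 5_500_449    # ML143371.1 / NW_021160017.1

padding = ENSEMBL_LENGTH - UNPADDED_LENGTH
print(padding)

# Toy sequence: four padding N's followed by real bases.
print(leading_n_count("NNNNACGT"))
```

If the count of leading N's in the Ensembl sequence equals the length difference, the two contigs differ only by the padding.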
To address this, we would have to amend the file format of the recontig mapping files to include an optional column containing the offset to apply to record start positions.
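One way such an optional column could work is sketched below. It assumes a tab-separated mapping file with the source contig name, the target contig name, and an optional third column holding the padding offset; this layout, the sign convention (offset subtracted when converting away from the padded contig), and the function names are all assumptions for illustration, not the existing recontig format.

```python
from typing import Dict, Tuple

Mapping = Dict[str, Tuple[str, int]]


def parse_mapping(text: str) -> Mapping:
    """Parse 'source<TAB>target[<TAB>offset]' lines (hypothetical format).
    A missing or empty offset column defaults to 0."""
    mapping: Mapping = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        source, target = fields[0], fields[1]
        offset = int(fields[2]) if len(fields) > 2 and fields[2] else 0
        mapping[source] = (target, offset)
    return mapping


def convert(contig: str, start: int, mapping: Mapping) -> Tuple[str, int]:
    """Rename a contig and shift a 1-based start position by its offset."""
    target, offset = mapping[contig]
    new_start = start - offset
    if new_start < 1:
        raise ValueError(f"{contig}:{start} falls inside the N-padding")
    return target, new_start


# Illustrative mapping: one padded patch contig, one unpadded chromosome.
EXAMPLE = (
    "CHR_HG2365_PATCH\tNW_021160017.1\t97213733\n"
    "1\tNC_000001.11\n"
)
mapping = parse_mapping(EXAMPLE)
print(convert("CHR_HG2365_PATCH", 97_213_734, mapping))
print(convert("1", 12_345, mapping))
```

Keeping the column optional means existing two-column mapping files would continue to parse unchanged, with an implied offset of 0.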