Refget defines three supported identifier algorithms: MD5, TRUNC512 and GA4GH Identifier. All three algorithms normalise sequence input by stripping all whitespace characters and restricting to characters in the range A-Z. We chose this character range as a compromise between the methods and requirements employed by CRAM, ENA and the Variation Representation Specification (VRS) [10].

MD5 is the default checksum algorithm used by the CRAM format’s M5 tag and hence the CRR. It is provided for backwards compatibility with existing CRAM files. However, there are limitations to MD5’s algorithm; the occurrence of a checksum collision between non-identical sequences would be catastrophic. To mitigate this concern, we co-developed two schemes with the Genomic Knowledge Standards’ Variation Representation Specification (VRS) based on the SHA-512 checksum algorithm, called TRUNC512 and GA4GH identifier. Both schemes use the first 24 bytes of a SHA-512 digest. TRUNC512 chooses to represent this as a hex encoded string. GA4GH identifier converts these bytes into a base64 URL encoded string formatted as “ga4gh:SQ.XXXX”. Both algorithms are interchangeable since both represent the same underlying SHA-512 digest; however, the GA4GH identifier is preferred to maintain VRS compatibility.
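To make the quoted description concrete, here is a minimal Python sketch of the three identifiers as I read them from the passage. The function names (`normalise`, `md5_id`, `trunc512_id`, `ga4gh_id`) are my own, and rejecting anything outside A-Z (rather than, say, uppercasing it) is an assumption, not something the quote specifies.

```python
import base64
import hashlib
import re


def normalise(sequence: str) -> bytes:
    """Strip all whitespace and require the remainder to be in A-Z."""
    cleaned = re.sub(r"\s+", "", sequence)
    # Assumption: characters outside A-Z are rejected rather than converted.
    if re.fullmatch(r"[A-Z]*", cleaned) is None:
        raise ValueError("sequence contains characters outside A-Z")
    return cleaned.encode("ascii")


def md5_id(sequence: str) -> str:
    """Hex-encoded MD5 digest of the normalised sequence."""
    return hashlib.md5(normalise(sequence)).hexdigest()


def trunc512_id(sequence: str) -> str:
    """First 24 bytes of the SHA-512 digest, hex encoded (48 characters)."""
    return hashlib.sha512(normalise(sequence)).digest()[:24].hex()


def ga4gh_id(sequence: str) -> str:
    """First 24 bytes of the SHA-512 digest, base64url encoded, with prefix."""
    truncated = hashlib.sha512(normalise(sequence)).digest()[:24]
    # 24 bytes encode to exactly 32 base64 characters, so no '=' padding appears.
    return "ga4gh:SQ." + base64.urlsafe_b64encode(truncated).decode("ascii")
```

Since `trunc512_id` and `ga4gh_id` encode the same 24 bytes, either one can be derived from the other without access to the sequence.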
I thought that refget only used trunc512 and md5, but it seems we should also support the GA4GH identifiers. Luckily, I think we can keep storing just trunc512 and md5, as we do at the moment, and allow searches by GA4GH id by transforming it on the fly to the trunc512 id.
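A rough sketch of what that on-the-fly transformation could look like in Python. The helper names `ga4gh_to_trunc512` and `trunc512_to_ga4gh` are hypothetical (they are not part of our code base or of any refget library); the only things taken from the paper are the `ga4gh:SQ.` prefix, the 24-byte truncation, and the hex versus base64url encodings.

```python
import base64

GA4GH_PREFIX = "ga4gh:SQ."
DIGEST_BYTES = 24  # both schemes keep the first 24 bytes of the SHA-512 digest


def ga4gh_to_trunc512(identifier: str) -> str:
    """Convert 'ga4gh:SQ.<base64url>' into the equivalent TRUNC512 hex string."""
    if not identifier.startswith(GA4GH_PREFIX):
        raise ValueError("not a ga4gh:SQ. identifier")
    encoded = identifier[len(GA4GH_PREFIX):]
    # 24 bytes always encode to 32 base64 characters with no padding.
    if len(encoded) != 32:
        raise ValueError("unexpected identifier length")
    raw = base64.urlsafe_b64decode(encoded)
    if len(raw) != DIGEST_BYTES:
        raise ValueError("identifier does not decode to 24 bytes")
    return raw.hex()


def trunc512_to_ga4gh(trunc512: str) -> str:
    """Convert a 48-character TRUNC512 hex string into a ga4gh:SQ. identifier."""
    raw = bytes.fromhex(trunc512)
    if len(raw) != DIGEST_BYTES:
        raise ValueError("TRUNC512 must be 24 bytes (48 hex characters)")
    return GA4GH_PREFIX + base64.urlsafe_b64encode(raw).decode("ascii")
```

With something like this, a lookup by GA4GH id would first call `ga4gh_to_trunc512` and then query the existing trunc512 column, so no extra storage is needed.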
(The passage quoted above is from the refget paper.)