Exceptions for v2 to v3 conversion #291

lcreteig · 2023-12-05T17:01:51Z

I noticed that the py-ard implementation for converting v2 to v3 typings is not yet complete:

Lines 586 to 600 in d43a035

    
           # TODO: Create mapping table using both the allele list history and
           #  deleted alleles as reference.
           # Temporary Example
           v2_to_v3_example = {
               "A*0104": "A*01:04:01:01N",
               "A*0105N": "A*01:04:01:01N",
               "A*0111": "A*01:11N",
               "A*01123": "A*01:123N",
               "A*0115": "A*01:15N",
               "A*0116": "A*01:16N",
               "A*01160": "A*01:160N",
               "A*01162": "A*01:162N",
               "A*01178": "A*01:178N",
               "A*01179": "A*01:179N",
               "DRB5*02ZB": "DRB5*02:UTV",

I was wondering if this is still on the roadmap and/or if I can contribute anything to it.

The heuristic conversion in ard._predict_v3() does not work in all cases, because there are a number of exceptions. For example, if I apply it to the Current Name column of the IPD-IMGT/HLA pre-2010 nomenclature file, the result differs from the Name as of April 2010 column in 1,045 out of 4,826 cases.
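
A comparison like this can be sketched in a few lines of Python. This is only an illustration: naive_predict_v3 is a hypothetical stand-in that splits the digits into two-digit fields, not py-ard's actual _predict_v3, and the reference table holds just three pairs taken from this thread rather than the full nomenclature file.

```python
import re


def naive_predict_v3(v2: str) -> str:
    """Hypothetical heuristic: split the digits after '*' into 2-digit fields.

    This is NOT py-ard's _predict_v3; it only stands in for "some heuristic".
    """
    locus, rest = v2.split("*", 1)
    match = re.fullmatch(r"(\d+)([A-Z]*)", rest)
    if match is None:
        return v2  # leave anything unexpected untouched
    digits, suffix = match.groups()
    fields = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return f"{locus}*{':'.join(fields)}{suffix}"


def find_mismatches(reference: dict, predict) -> dict:
    """Return {v2: (predicted, expected)} for every pair where they differ."""
    mismatches = {}
    for v2, expected in reference.items():
        predicted = predict(v2)
        if predicted != expected:
            mismatches[v2] = (predicted, expected)
    return mismatches


# A few v2 -> v3 pairs quoted in this thread (assumed correct for illustration).
reference = {
    "A*020113": "A*02:01:13",
    "A*0104": "A*01:04:01:01N",
    "A*01123": "A*01:123N",
}

mismatches = find_mismatches(reference, naive_predict_v3)
print(f"{len(mismatches)} of {len(reference)} predictions differ")
```

With the real nomenclature file, reference would be built from the Current Name and Name as of April 2010 columns, and predict would be the heuristic under test.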

I've made a mapping table such as in the linked snippet above for my own use case, based on the following files:

The IPD-IMGT/HLA pre-2010 nomenclature file
The IPD-IMGT/HLA deleted alleles file. If a v2 allele has been deleted, its correct v3 equivalent is not included in the nomenclature file (but is "None"), so I've pulled these from the Description column in the deleted alleles file.
The obsolete allele-specific codes and obsolete DPB1-specific codes from https://bioinformatics.bethematchclinical.org/hla-resources/allele-codes/allele-code-nomenclature/ (Excel files linked under points 4 and 5)

Can you think of any other exceptions that should be included? In that case I'd really appreciate your feedback.

I'd be happy to share the mapping table, or the code that creates it (I have this in R now, but could probably translate it to Python) if that's of any use to you.


pbashyal-nmdp · 2023-12-05T20:47:52Z

Hi Leon,

There wasn't a clear standard V2 to V3 mapping file available, so it wasn't fleshed out properly. We'll review your approach. If you are able to, you can share your R script (no need to convert it). Thanks for following up on this.

Organizations have their own mappings, so we do support supplying your own V2 to V3 mappings through pyard-import's --v2-to-v3-mapping option. You can provide a CSV with 2 columns (V2 version, V3 version) and no column headers. See the pyard-import section in the README.

$ pyard-import  --v2-to-v3-mapping map2to3.csv
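
For anyone building such a file from Python, here is a minimal sketch of writing a two-column, headerless CSV. The helper name write_mapping_csv and the example pairs are illustrative assumptions; only the file layout (V2 in the first column, V3 in the second, no header row) comes from the description above.

```python
import csv
import io


def write_mapping_csv(mapping: dict, fh) -> None:
    """Write v2,v3 pairs as a two-column CSV without a header row."""
    writer = csv.writer(fh, lineterminator="\n")
    for v2, v3 in mapping.items():
        writer.writerow([v2, v3])


# Illustrative pairs; a real file would hold the full mapping table.
example_mapping = {
    "A*0104": "A*01:04:01:01N",
    "A*0111": "A*01:11N",
}

# Written to an in-memory buffer here; to create the actual file, use
# open("map2to3.csv", "w", newline="") instead.
buffer = io.StringIO()
write_mapping_csv(example_mapping, buffer)
csv_text = buffer.getvalue()
print(csv_text, end="")
```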

pbashyal-nmdp · 2023-12-05T20:49:44Z

Looks like the current version fails when the --v2-to-v3-mapping option is used, with the error:

AttributeError: 'tuple' object has no attribute 'cursor'

I'll have an update soon that should fix the issue.

pbashyal-nmdp · 2023-12-05T21:50:23Z

Just released 1.0.8. If you upgrade or pip install py-ard==1.0.8, the --v2-to-v3-mapping option should work with a mapping file.

lcreteig · 2023-12-06T00:41:32Z

Ah, I didn't know about the --v2-to-v3-mapping flag. That's helpful: I could then still use my mapping table together with the heuristic prediction. Here's the table in case you're interested: map2to3.csv. Also, here's the R code that generated (something like) it.

I tried it with py-ard 1.0.8, database version 3540, and the above mapping table. This works for almost all cases, but strangely enough there are a handful (61) of cases in the IPD-IMGT/HLA pre-2010 nomenclature file where pyard seems to simply return the v2 typing, even though its v3 equivalent is in the mapping table. This might be something on my end, but in case you'd like to try and reproduce it:

For example:

pyard-import --imgt-version 3.54.0  --v2-to-v3-mapping map2to3.csv

import pyard
ard = pyard.init('3540')
ard.v2_to_v3("A*020113") # works fine
# >>> 'A*02:01:13'
ard.v2_to_v3("A*020114") # just returns v2
# >>> 'A*020114'

pbashyal-nmdp · 2023-12-06T14:57:52Z

Awesome. Thanks, I'll take a look.

pbashyal-nmdp · 2023-12-06T18:21:58Z

Looks like A*02:01:14 is not a valid allele for 3540 db version.
I think it should skip this test in non-strict mode. I'll update it to use the mapping when in non-strict mode.
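
Roughly, the intended behaviour could look like the sketch below. This is a hypothetical illustration, not py-ard's actual code: the function name, the valid_alleles set, and the strict parameter are assumptions made for the example.

```python
def v2_to_v3_with_mapping(v2, mapping, valid_alleles, strict=True):
    """Hypothetical sketch of a mapping lookup with an optional validity check.

    In strict mode a mapped v3 name is used only if it is a valid allele for
    the database version; in non-strict mode the mapping is used as-is.
    """
    v3 = mapping.get(v2)
    if v3 is None:
        return v2  # no mapping entry: return the input unchanged
    if strict and v3 not in valid_alleles:
        return v2  # mapped name is not valid in this database version
    return v3


# Illustrative data based on the two alleles discussed in this thread.
mapping = {"A*020113": "A*02:01:13", "A*020114": "A*02:01:14"}
valid_alleles = {"A*02:01:13"}

print(v2_to_v3_with_mapping("A*020114", mapping, valid_alleles, strict=True))
print(v2_to_v3_with_mapping("A*020114", mapping, valid_alleles, strict=False))
```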
