Exceptions for v2 to v3 conversion #291
Comments
Hi Leon, there wasn't a clear V2 to V3 standard mapping file available, so it wasn't fleshed out properly. We'll review your approach. If you are able to, you can share your R script (no need to convert it). Thanks for following up on this. Organizations have their own mappings, so we do support having your own V2 to V3 mappings using
Looks like the current version is failing for
I'll have an update soon that should fix the issue.
Just released
Ah, I didn't know about that. I tried it with py-ard 1.0.8, database version 3540, and the above mapping table. This works for almost all cases, but strangely enough there are a handful (61) of cases in the IPD-IMGT/HLA pre-2010 nomenclature file where pyard seems to simply return the v2 typing, even though its v3 equivalent is in the mapping table. This might be something on my end, but in case you'd like to try and reproduce it, for example:
pyard-import --imgt-version 3.54.0 --v2-to-v3-mapping map2to3.csv
import pyard
ard = pyard.init('3540')
ard.v2_to_v3("A*020113") # works fine
# >>> 'A*02:01:13'
ard.v2_to_v3("A*020114") # just returns v2
# >>> 'A*020114'
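A rough sketch of how the unconverted cases could be listed in one go. It assumes the mapping file is a two-column CSV of v2,v3 names without a header, which is a guess about map2to3.csv rather than its documented layout. `load_mapping`, `find_unconverted` and `stub_convert` are hypothetical names; in practice `convert` would be `ard.v2_to_v3`, and the stub only stands in so the snippet runs without pyard installed.

```python
import csv
import io
from typing import Callable, Dict, List


def load_mapping(handle) -> Dict[str, str]:
    """Read v2 -> v3 pairs from a two-column CSV (assumed layout, no header)."""
    return {
        row[0].strip(): row[1].strip()
        for row in csv.reader(handle)
        if len(row) >= 2
    }


def find_unconverted(mapping: Dict[str, str],
                     convert: Callable[[str], str]) -> List[str]:
    """Return v2 names that convert() hands back unchanged
    although the table lists a different v3 name for them."""
    return [v2 for v2, v3 in mapping.items() if v3 != v2 and convert(v2) == v2]


def stub_convert(v2: str) -> str:
    """Stand-in for ard.v2_to_v3 that mimics the behaviour reported above."""
    return {"A*020113": "A*02:01:13"}.get(v2, v2)


# The v3 name in the second row is a placeholder for illustration,
# not checked against the real mapping table.
sample = io.StringIO("A*020113,A*02:01:13\nA*020114,A*02:01:14\n")
mapping = load_mapping(sample)
unconverted = find_unconverted(mapping, stub_convert)
print(unconverted)
```

Swapping `stub_convert` for `ard.v2_to_v3` and `sample` for the opened mapping file should give the full list of affected typings, assuming the file layout matches.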
Awesome. Thanks, I'll take a look.
Looks like
I noticed that the py-ard implementation for converting v2 to v3 typings is not yet complete:
py-ard/pyard/db.py
Lines 586 to 600 in d43a035
I was wondering if this is still on the roadmap and/or if I can contribute anything to it.
The heuristic conversion in ard._predict_v3() does not work in all cases, because there are a number of exceptions. For example, if I apply it to the Current Name column of the IPD-IMGT/HLA pre-2010 nomenclature file, I get a different result than in the Name as of April 2010 column in 1,045 out of 4,826 cases.
I've made a mapping table such as in the linked snippet above for my own use case, based on the following files:
"None"), so I've pulled these from the Description column in the deleted alleles file.
Can you think of any other exceptions that should be included? In that case I'd really appreciate your feedback.
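For reference, the comparison described above could be sketched roughly as follows. `naive_v3` is a deliberately simple stand-in that inserts a colon after every two digits; it is an assumption for illustration and not the actual ard._predict_v3() logic. `find_mismatches` is a hypothetical helper, the column names are the ones mentioned above, and the two sample rows are illustrative only (the second is meant to reflect the A*92 to A*02:101 renaming and should be checked against the nomenclature file).

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple


def naive_v3(v2_name: str) -> str:
    """Insert a colon after every two digits.
    A simplification for illustration, not pyard's heuristic."""
    match = re.fullmatch(r"([A-Za-z0-9]+\*)(\d+)([A-Z]?)", v2_name)
    if match is None:
        return v2_name  # leave anything unexpected untouched
    locus, digits, suffix = match.groups()
    fields = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return locus + ":".join(fields) + suffix


def find_mismatches(rows: Iterable[Dict[str, str]],
                    predict: Callable[[str], str],
                    old_col: str,
                    new_col: str) -> List[Tuple[str, str, str]]:
    """Return (old name, predicted name, official name) for every row
    where the prediction disagrees with the official name."""
    mismatches = []
    for row in rows:
        predicted = predict(row[old_col])
        if predicted != row[new_col]:
            mismatches.append((row[old_col], predicted, row[new_col]))
    return mismatches


# Illustrative rows only; real input would come from the nomenclature file.
rows = [
    {"Current Name": "A*020113", "Name as of April 2010": "A*02:01:13"},
    {"Current Name": "A*9201", "Name as of April 2010": "A*02:101"},
]
mismatches = find_mismatches(rows, naive_v3,
                             "Current Name", "Name as of April 2010")
print(mismatches)
```

Each mismatch found this way is a candidate row for the exceptions table.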
I'd be happy to share the mapping table, or the code that creates it (I have this in R now, but could probably translate it to Python) if that's of any use to you.