BUG: Changed read_in_txt helper function to use unique column as index for allele mapping data #27
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #27      +/-   ##
==========================================
+ Coverage   93.35%   93.37%   +0.01%
==========================================
  Files          18       18
  Lines        1054     1056       +2
==========================================
+ Hits          984      986       +2
  Misses         70       70
```
Hey @ChristosMatzoros, could you please check this one out? Thanks!
Thank you for putting this together @VinzentRisch. I executed both the old and new code and achieved the expected results. You did an excellent job introducing the unique identifier for allele mappings, effectively preventing duplicate IDs in the biom table. I consider this to be the most straightforward and clean solution. Regarding the size of the unique identifiers, I believe they do not detract from readability; quite the opposite, they may provide important information to the user. Everything seems ok from my side!
Hi @misialq, @ChristosMatzoros just reviewed and approved my changes, but I still can't merge the PR.
Hey @VinzentRisch, sorry about that - I fixed the permissions and you should be able to merge now (after updating). 🚀
This PR was created to fix #26.

In the `read_in_txt` helper function:

- If `map_type` is set to "allele", the column `Reference Sequence` is used as the index for the count tables. `Reference Sequence` values are unique, which solves the issue with the biom table format. The only downside is that the unique identifiers are quite long and not very readable (e.g. `Prevalence_Sequence_ID:97743|ID:1437|Name:AcrF|ARO:3000502` instead of just `AcrF`).
- If `map_type` is set to "gene", the behaviour is unchanged and the column `ARO Term` is used as the index.

To test:

1. Set up an environment.
2. Run it locally.
3. Download the test files `reads.qza` and `card_db.qza`.
4. Test it out! This takes about 10 minutes.
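As a rough illustration of the index selection described above, here is a stdlib-only Python sketch. It is not the code from this PR. The names `read_counts_sketch`, `INDEX_COLUMNS` and `count_col`, the `All Mapped Reads` column name, and the second allele ID in the sample data are all hypothetical. The sketch assumes tab-separated mapping output with a header row. It shows why the index column must be unique: biom tables expect unique feature IDs, so the sketch rejects duplicates instead of passing them on.

```python
import csv
import io

# Index column per map_type, following the PR description.
INDEX_COLUMNS = {
    "allele": "Reference Sequence",  # unique per allele
    "gene": "ARO Term",              # previous behaviour
}


def read_counts_sketch(text, map_type, count_col):
    """Hypothetical stand-in for read_in_txt: parse tab-separated mapping
    output and return {feature_id: count}, rejecting duplicate IDs."""
    if map_type not in INDEX_COLUMNS:
        raise ValueError(f"map_type must be 'allele' or 'gene', got {map_type!r}")
    index_col = INDEX_COLUMNS[map_type]
    counts = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        feature_id = row[index_col]
        if feature_id in counts:
            # Duplicate feature IDs would break the biom table, so fail loudly.
            raise ValueError(f"duplicate feature ID: {feature_id}")
        counts[feature_id] = int(row[count_col])
    return counts


# Illustrative allele-level rows: two alleles that share the ARO Term "AcrF".
# The second Reference Sequence ID is made up for this example.
sample = (
    "Reference Sequence\tARO Term\tAll Mapped Reads\n"
    "Prevalence_Sequence_ID:97743|ID:1437|Name:AcrF|ARO:3000502\tAcrF\t12\n"
    "Prevalence_Sequence_ID:12345|ID:1437|Name:AcrF|ARO:3000502\tAcrF\t3\n"
)
```

With `map_type="allele"`, each `Reference Sequence` becomes its own feature. Indexing the same allele-level rows by `ARO Term` would produce the duplicate `AcrF` ID that the allele mode avoids, so the sketch raises `ValueError` in that case.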