MAINT: adjust regex for genome data semantic types #322
Hey @Sann5,
I think you should simplify those regexes even more. There isn't really any need for the part that recognizes slashes (since it may or may not be there, we may as well just get rid of it altogether). Also, you should simplify all the tests: there is no need for all those test cases with prefixes and suffixes anymore, so one case should be enough.
@misialq done :)
@misialq I also changed some paths in the transformer tests :)
LGTM ✅ @lizgehret over to you!
changes all look reasonable on my end 👍
Regexes defining the file collections of semantic types GenomeData[Loci], GenomeData[Genes] and GenomeData[Proteins] were built from three parts: <name> is either loci, genes or proteins; <format> is the format of the files, like FASTA or GFF; and <extension> is the associated file extension.

We propose changing the regex to one that basically matches any file name with an optional parent directory. The keyword <name> (loci, genes or proteins) is redundant, as the specific formats are linked to their respective semantic types anyway. Ergo, there is no need to indicate it in file names.

Additional Info
This PR blocks bokulich-lab/q2-moshpit#154
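
For illustration only, here is a minimal sketch of what a relaxed file-collection pattern of the kind described above might look like. The patterns, the `fa`/`fasta` extensions, and the helper name `accepts` are assumptions made for this example and are not the regexes actually used in this PR. `RELAXED` allows one optional parent directory, while `SIMPLIFIED` drops the slash handling altogether, as suggested in the review.

```python
import re

# Hypothetical patterns, NOT the ones from this PR.
# RELAXED: one optional parent directory, any file name, assumed FASTA extension.
RELAXED = re.compile(r"^(?:[^/]+/)?[^/]+\.(?:fa|fasta)$")
# SIMPLIFIED: no explicit slash handling; anything ending in the extension.
SIMPLIFIED = re.compile(r".+\.(?:fa|fasta)$")


def accepts(pattern: re.Pattern, path: str) -> bool:
    """Return True if the relative path matches the given pattern."""
    return pattern.match(path) is not None


if __name__ == "__main__":
    for path in ["sample1/anything.fasta", "anything.fa", "notes.txt"]:
        print(path, accepts(RELAXED, path), accepts(SIMPLIFIED, path))
```

Neither pattern requires a `loci`, `genes` or `proteins` keyword in the file name, which is the point of the change: the semantic type already determines the format, so the file name no longer needs to carry that information.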