MAINT: adjust regex for genome data semantic types #322

Merged: 5 commits into qiime2:dev on May 2, 2024

Sann5
Contributor

@Sann5 Sann5 commented Apr 30, 2024

Regexes defining the file collections of semantic types GenomeData[Loci], GenomeData[Genes], GenomeData[Proteins] had the following structure:

class <name>DirectoryFormat(model.DirectoryFormat):
    <name> = model.FileCollection(r'(.*\_)?<name>[0-9]*\.<extension>$',
                                format=<format>)
  • where <name> is either loci, genes or proteins
  • <format> is the file format, such as FASTA or GFF, and <extension> is the associated file extension

We propose changing the regex to something like

(.*\/)?(.*)\.<extension>$

basically matching any file name, with an optional parent directory. The keyword <name> (loci, genes or proteins) is redundant, as the specific formats are already linked to their respective semantic types. Ergo, there is no need to indicate it in file names.
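
To make the difference concrete, here is a minimal sketch comparing the old and proposed structures using Python's `re` module. The `fasta` extension and the file names are made up for illustration, and `re.match` is an assumption about how the pattern gets applied; the thread does not show how the framework evaluates it.

```python
import re

# Hypothetical extension, for illustration only; the PR leaves <extension> as a placeholder.
EXT = "fasta"

# Old structure with <name> = genes, and the structure proposed above.
OLD = re.compile(r'(.*\_)?genes[0-9]*\.' + EXT + r'$')
NEW = re.compile(r'(.*\/)?(.*)\.' + EXT + r'$')

# Made-up file names, not taken from the test data of this PR.
names = [
    "genes1.fasta",
    "sample_genes.fasta",
    "MAG001.fasta",
    "subdir/MAG001.fasta",
    "MAG001.gff",
]


def accepted(pattern, candidates):
    """Return the candidates that the pattern matches from the start of the string."""
    return [n for n in candidates if pattern.match(n)]


old_ok = accepted(OLD, names)
new_ok = accepted(NEW, names)

print("old:", old_ok)
print("new:", new_ok)
```

With these inputs, the old structure only accepts names that contain the `genes` keyword, while the proposed one accepts any name with the right extension, with or without a parent directory.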

Additional Info

This PR blocks bokulich-lab/q2-moshpit#154

Collaborator

@misialq misialq left a comment

Hey @Sann5,
I think you should simplify those regexes even more - there isn't really any need for the part that recognizes slashes (since it may or may not be there, we may as well just get rid of it altogether). Also, you should simplify the tests: there is no need for all those test cases with prefixes and suffixes anymore - one case should be enough.
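
The point about the slash part can be checked with a short sketch: since `.` already matches `/`, dropping the optional `(.*\/)?` group should not change which names are accepted. The `fasta` extension and the file names below are hypothetical, and whether the merged code uses exactly this simplified pattern is not shown in the thread.

```python
import re

# Hypothetical extension, for illustration only.
EXT = "fasta"

# Pattern as first proposed, and the same pattern without the slash-recognizing group.
WITH_SLASH = re.compile(r'(.*\/)?(.*)\.' + EXT + r'$')
WITHOUT_SLASH = re.compile(r'(.*)\.' + EXT + r'$')

# Made-up names, with and without parent directories.
names = [
    "MAG001.fasta",
    "subdir/MAG001.fasta",
    "a/b/c.fasta",
    "MAG001.gff",
    "noextension",
]


def same_verdict(name):
    """True if both patterns agree on whether the name is accepted."""
    return bool(WITH_SLASH.match(name)) == bool(WITHOUT_SLASH.match(name))


all_same = all(same_verdict(n) for n in names)
print("patterns agree on all names:", all_same)
```

If both patterns accept the same names, a single test case per format is enough to cover the file-name matching, which is in line with the suggestion to trim the tests.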

@Sann5
Contributor Author

Sann5 commented May 2, 2024

@misialq done :)

@Sann5
Contributor Author

Sann5 commented May 2, 2024

@misialq I also changed some paths in the transformer tests :)

Collaborator

@misialq misialq left a comment

LGTM ✅ @lizgehret over to you!

@lizgehret
Member

changes all look reasonable on my end 👍

@lizgehret lizgehret merged commit 2883160 into qiime2:dev May 2, 2024
4 checks passed
@Sann5 Sann5 deleted the refactor_genome_data branch May 3, 2024 09:12