MAINT: adjust regex for genome data semantic types #322

Sann5 · 2024-04-30T12:07:51Z

The regexes defining the file collections of the semantic types GenomeData[Loci], GenomeData[Genes], and GenomeData[Proteins] had the following structure:

class <name>DirectoryFormat(model.DirectoryFormat):
    <name> = model.FileCollection(r'(.*\_)?<name>[0-9]*\.<extension>$',
                                format=<format>)

where <name> is either loci, genes or proteins,
<format> is the format of the files (e.g. FASTA or GFF), and <extension> is the associated file extension.

We propose changing the regex to something like:

(.*\/)?(.*)\.<extension>$

This matches any file name with the given extension, optionally preceded by a parent directory. The keyword <name> (loci, genes or proteins) is redundant because the specific formats are already linked to their respective semantic types, so there is no need to indicate it in the file names.
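To make the difference concrete, here is a minimal sketch comparing the two patterns with Python's `re` module. It assumes `<name>` is `genes` and `<extension>` is `fasta`, and that the pattern is applied from the start of a relative path with `re.match`; the file names and the helper `accepted` are made up for illustration and are not part of q2-types.

```python
import re

# Hypothetical concrete instance of the templates:
# <name> = "genes", <extension> = "fasta".
OLD = re.compile(r'(.*\_)?genes[0-9]*\.fasta$')
NEW = re.compile(r'(.*\/)?(.*)\.fasta$')


def accepted(pattern, path):
    """Return True if the pattern matches the path from its start."""
    return pattern.match(path) is not None


# Illustrative file names (assumptions, not taken from the test data).
paths = [
    "genes1.fasta",
    "sample_genes.fasta",
    "my_genome.fasta",
    "subdir/my_genome.fasta",
    "genes.gff",
]

# Map each path to (accepted by old pattern, accepted by proposed pattern).
results = {p: (accepted(OLD, p), accepted(NEW, p)) for p in paths}

for path, (old_ok, new_ok) in results.items():
    print(f"{path:26} old={old_ok!s:5} new={new_ok}")
```

Under these assumptions, the old pattern rejects files that lack the `genes` keyword, while the proposed one accepts any file with the right extension and still rejects other extensions.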

Additional Info

This PR blocks bokulich-lab/q2-moshpit#154

misialq

Hey @Sann5,
I think you should simplify those regexes even more: there isn't really any need for the part that recognizes slashes (since it may or may not be there, we may as well just get rid of it altogether). Also, you should simplify all the tests: there is no need for all those test cases with prefixes and suffixes anymore; one case should be enough.
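The redundancy of the slash-recognizing part can be checked with a short sketch. Because `.` in a Python regex also matches `/`, an optional `(.*\/)?` prefix followed by `(.*)` should accept the same paths as a bare `.*`. The simplified pattern below is only an assumed form of what such a simplification could look like (the exact regex that was merged is not shown in this thread), and `fasta` and the sample paths are illustrative.

```python
import re

# Proposed pattern from the PR description, with <extension> = "fasta" (assumed).
WITH_DIR = re.compile(r'(.*\/)?(.*)\.fasta$')
# Assumed simplified form without the explicit directory group.
SIMPLE = re.compile(r'.*\.fasta$')

# Illustrative paths (assumptions), with and without parent directories.
paths = ["c.fasta", "a/c.fasta", "a/b/c.fasta", ".fasta", "c.fasta.bak", "c.gff"]

# For each path, record whether both patterns give the same verdict.
agree = {
    p: (WITH_DIR.match(p) is not None) == (SIMPLE.match(p) is not None)
    for p in paths
}
all_equal = all(agree.values())
print(all_equal)
```

If the two patterns agree on every path, dropping the directory group changes nothing about which files are accepted; it only makes the regex easier to read.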

Sann5 · 2024-05-02T13:36:51Z

@misialq done :)

q2_types/genome_data/tests/test_format.py

Sann5 · 2024-05-02T14:28:01Z

@misialq I also changed some paths in the transformer tests :)

misialq

LGTM ✅ @lizgehret over to you!

lizgehret · 2024-05-02T18:12:48Z

changes all look reasonable on my end 👍

adjust regex for genome data semantic types

235e4cd

github-actions bot mentioned this pull request Apr 30, 2024

ENH: allow SampleData[MAGs] as input to predict-genes-prodigal bokulich-lab/q2-moshpit#154

Merged

Sann5 added 2 commits April 30, 2024 16:11

adjust tests

3e90c57

adjust data paths in setup.py

1eb31d8

misialq requested changes May 1, 2024

simplify regex and tests

084b5f1

misialq reviewed May 2, 2024

q2_types/genome_data/tests/test_format.py (outdated, resolved)

rename '*-with-suffix' folders and paths to '*'

b168eb3

misialq approved these changes May 2, 2024

misialq assigned lizgehret May 2, 2024

lizgehret approved these changes May 2, 2024

lizgehret merged commit 2883160 into qiime2:dev May 2, 2024
4 checks passed

Sann5 deleted the refactor_genome_data branch May 3, 2024 09:12
