[FEATURE REQUEST]: skip over/ignore short sequences #62

Open
ptrebert opened this issue Jan 11, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

ptrebert commented Jan 11, 2024

Is this a feature request for FCS-adaptor or FCS-GX?
FCS-adaptor (I am using v0.4.0)

Describe the problem you'd like to be solved
Don't fail on sequences <10 bp

Describe the solution you'd like
Please add a CLI switch to simply skip over/ignore sequences that are shorter than 10 bp.

Describe alternatives you've considered
Checking/filtering all input sequences beforehand, which implies that each sequence file is processed at least twice (checking and then adaptor scanning)

Thanks

Contributor

etvedte commented Jan 11, 2024

Can I ask in what context you are working with sequences <10 bp? NCBI GenBank submissions have length requirements (200 bp for genome sequences, 10 bp for others), hence the validation check included here. If these sequences aren't intended for submission to NCBI archives, that's fine, and we can consider adding this as an optional flag in a future release. For now, my best suggestion is a workaround: extract the short sequences, set them aside while running FCS-adaptor on the longer sequences, then add them back in.
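
For readers who need the workaround in the meantime, the following is a minimal sketch of the "set short sequences aside" step in plain Python. It is not part of FCS-adaptor; the function names (`read_fasta`, `split_by_length`, `write_fasta`) are hypothetical helpers made up for this illustration, and the default threshold of 10 bp is taken from the length check discussed above.

```python
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str]  # (header without the leading '>', full sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(
    records: Iterable[Record], min_len: int = 10
) -> Tuple[List[Record], List[Record]]:
    """Separate records into (kept, short); short means len(seq) < min_len."""
    kept: List[Record] = []
    short: List[Record] = []
    for header, seq in records:
        (kept if len(seq) >= min_len else short).append((header, seq))
    return kept, short


def write_fasta(records: Iterable[Record], handle: TextIO, width: int = 80) -> None:
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")
```

In use, one would open the assembly FASTA, pass the file handle to `read_fasta`, write the `kept` records to the file given to FCS-adaptor, and write the `short` records to a separate file so they can be appended again afterwards. This sketch holds all records in memory, which may not suit very large assemblies.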

@ptrebert
Author

I am not "working" on these sequences; I strongly assume they are garbage contained in a handful of the genome assemblies I am analyzing. If that length requirement exists because of strict filter criteria for submissions, maybe a cleaner solution would be a strict setting, on by default, that simply flags too-short sequences for removal, analogous to how any remainders of adaptor sequences are flagged.

In my case, of course, I am now pre-filtering the assembly FASTAs.

Contributor

etvedte commented Jan 18, 2024

Point taken. We will consider this for our next release.

@etvedte etvedte added the enhancement New feature or request label Jan 18, 2024