Improved QC and inclusion of end of execution reports #15

Merged
merged 9 commits into from
Aug 19, 2024

mberacochea
Member

QC-failed and assembled runs .csv "end of run" reports.

Using the fastq json file and some Groovy, I've added an exclusion and reporting mechanism.

I've added two new QC options for miassembler:
--filter_ratio_threshold:
- The maximum allowed fraction of reads removed by filtering. With the default of 0.9, if more than 90% of the reads are filtered out, the threshold is considered exceeded, and the run is not assembled. [default: 0.9]

--low_reads_count_threshold:
- The minimum number of reads required after filtering. If the read count falls below this threshold, the run is not assembled. [default: 1000]

If a run fails either check, it is skipped, and its accession is recorded in a CSV file (e.g., `SRR6180434,filter_ratio_threshold_exceeded`).
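
To illustrate the rule described above, here is a minimal Python sketch. It is not the Groovy code from this PR. The function names, the read-count inputs, the handling of empty input, and the `low_reads_count_threshold` label are assumptions made for illustration; only the two option names, their defaults, and the `filter_ratio_threshold_exceeded` label come from the description.

```python
from typing import Optional

FILTER_RATIO_THRESHOLD = 0.9      # default of --filter_ratio_threshold
LOW_READS_COUNT_THRESHOLD = 1000  # default of --low_reads_count_threshold


def qc_failure_reason(
    reads_before: int,
    reads_after: int,
    filter_ratio_threshold: float = FILTER_RATIO_THRESHOLD,
    low_reads_count_threshold: int = LOW_READS_COUNT_THRESHOLD,
) -> Optional[str]:
    """Return a failure label, or None if the run should be assembled."""
    # Fraction of reads removed by filtering.
    # Assumption: an empty input counts as 0.0 removed and is then caught
    # by the low read count check below.
    removed = (reads_before - reads_after) / reads_before if reads_before else 0.0
    if removed > filter_ratio_threshold:
        return "filter_ratio_threshold_exceeded"  # label quoted in the PR text
    if reads_after < low_reads_count_threshold:
        return "low_reads_count_threshold"  # hypothetical label, by analogy
    return None


def qc_failed_row(accession: str, reason: str) -> str:
    """Format one line of the failed-runs CSV: accession,reason."""
    return f"{accession},{reason}"
```

For example, a run that goes from 100000 reads to 5000 has lost 95% of its reads, so `qc_failure_reason(100000, 5000)` returns the ratio label, and `qc_failed_row("SRR6180434", ...)` produces the line format shown in the description.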

[Warning]
The unit tests need to be adjusted, and these changes should be tested thoroughly. I have only tested this locally.

Member

@SandyRogers SandyRogers left a comment

Thank you @mberacochea !

My only request for now is to add an example head of the assembled_runs.csv and qc_failed_runs.csv files to the README.

nextflow_schema.json (outdated, resolved)
workflows/miassembler.nf (outdated, resolved)
Member

@Ge94 Ge94 left a comment

Looks good, thanks Martin! Just a note on commented code (unless I misunderstood the filter) :)

Contributor

@KateSakharova KateSakharova left a comment

That Groovy magic is beyond my understanding :D
Could you add an example of the output .csv with failed runs? Even better, add it to the README, I think...

…ports.

The fetch tool unit tests are not failing ATM, ENA API issues.
Member

@SandyRogers SandyRogers left a comment

Thanks @mberacochea ! One tiny wording suggestion on README.

README.md (outdated, resolved)
Contributor

@KateSakharova KateSakharova left a comment

Great!

@mberacochea mberacochea merged commit 132a251 into main Aug 19, 2024
1 check passed
@mberacochea mberacochea changed the title from "QC filtering bits." to "Improved QC and inclusion of end of execution reports" Aug 19, 2024
4 participants