Add test results
suecharo committed Sep 21, 2022
1 parent b92cd00 commit 3f4810a
Showing 13 changed files with 3,983 additions and 22 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -22,6 +22,9 @@ It compares the execution results of [nf-core/rnaseq v3.7](https://nf-co.re/rnas

```bash
$ tonkaz ./tests/example_crate/rnaseq_1st.json ./tests/example_crate/rnaseq_2nd.json

# Example output:
$ cat ./tests/comparison_results/rnaseq_same_env.log
```

We provide various examples in the [tests/README.md](./tests/README.md).
71 changes: 49 additions & 22 deletions tests/README.md
@@ -2,39 +2,64 @@

This directory contains tests for `Tonkaz`.

These test data are generated using [`sapporo-wes/sapporo-service`](https://github.com/sapporo-wes/sapporo-service) and [`sapporo-wes/yevis-cli`](https://github.com/sapporo-wes/yevis-cli).
These test data are generated using [`sapporo-wes/sapporo-service`](https://github.com/sapporo-wes/sapporo-service), [`sapporo-wes/yevis-cli`](https://github.com/sapporo-wes/yevis-cli) and Tonkaz.

Also, please check [sapporo-wes/test-workflow](https://github.com/sapporo-wes/test-workflow).
Each data set is generated as follows:

```
workflow -- (Sapporo-service/Yevis) --> execution_results + ro_crate -- (Tonkaz) --> comparison_results
```

## Run tests

Several combinations of crates are available as follows:

```bash
# GATK (Linux, 1st) <-> GATK (Linux, 2nd)
# Use case: Same environment
# Result: ./comparison_results/gatk_same_env.log
$ deno test -A ./tests/gatk_test.ts

# GATK (Linux) <-> GATK (Mac)
# Use case: Different environment
# Result: ./comparison_results/gatk_diff_env.log
$ deno test -A ./tests/gatk_mac_test.ts

# JGA (Linux, 1st) <-> JGA (Linux, 2nd)
# Use case: Same environment
# Result: ./comparison_results/jga_same_env.log
$ deno test -A ./tests/jga_test.ts

# JGA (Linux) <-> JGA (Mac)
# Use case: Different environment
# Result: ./comparison_results/jga_diff_env.log
$ deno test -A ./tests/jga_mac_test.ts

# RNA-seq (Linux, 1st) <-> RNA-seq (Linux, 2nd)
# Use case: Same environment
# Result: ./comparison_results/rnaseq_same_env.log
$ deno test -A ./tests/rnaseq_test.ts

# RNA-seq (Linux) <-> RNA-seq (Mac)
# Use case: Different environment
# Result: ./comparison_results/rnaseq_diff_env.log
$ deno test -A ./tests/rnaseq_mac_test.ts

# RNA-seq (Linux, 1st) <-> RNA-seq (Linux, v3.6)
# Use case: Different version
# Result: ./comparison_results/rnaseq_diff_ver.log
$ deno test -A ./tests/rnaseq_v3.6_test.ts

# RNA-seq (Linux, 1st) <-> RNA-seq (Linux, small)
# Use case: Missing dataset
# Result: ./comparison_results/rnaseq_missing_data.log
$ deno test -A ./tests/rnaseq_small_test.ts

# RNA-seq (Linux, 1st) <-> RNA-seq (Linux, small)
# Use case: All files
# Result: ./comparison_results/rnaseq_all_files.log
$ deno test -A ./tests/rnaseq_all_files_test.ts

# RNA-seq (Linux, with yevis) <-> RNA-seq (Linux, only sapporo)
$ deno test -A ./tests/rnaseq_only_sapporo_test.ts

@@ -44,7 +69,9 @@ $ deno test -A ./tests/trimming_mac_test.ts

## About test data

The json files contained in [`example_crate`](./example_crate) are generated using [`sapporo-wes/sapporo-service`](https://github.com/sapporo-wes/sapporo-service) and [`sapporo-wes/yevis-cli`](https://github.com/sapporo-wes/yevis-cli).
The raw data of workflow execution results are stored in [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7098337.svg)](https://doi.org/10.5281/zenodo.7098337).

The crate files contained in [`example_crate`](./example_crate):

```
example_crate/
@@ -66,7 +93,7 @@ example_crate/

### Executed environment

About the environment in which the crate was generated.
About the environment in which these crates were generated.

| Field | Linux env | Mac Apple silicon env |
| ----------------------- | ------------------------------------------ | --------------------- |
@@ -97,6 +124,24 @@ Executed as follows:
$ yevis test --fetch-ro-crate https://raw.githubusercontent.com/sapporo-wes/test-workflow/main/yevis-metadata_gatk-workflows_mitochondria-pipeline.yml
```

### JGA

- Crate:
- [`jga_1st.json`](./example_crate/jga_1st.json)
- Crate generated on `Linux` environment. (1st execution)
- [`jga_2nd.json`](./example_crate/jga_2nd.json)
- Crate generated on `Linux` environment. (2nd execution (same settings))
- [`jga_mac.json`](./example_crate/jga_mac.json)
- Crate generated on `Mac Apple silicon` environment.

See https://github.com/sapporo-wes/test-workflow#biosciencedbcjga-analysis---per-sample-workflow for more details about the executed workflow.

Executed as follows:

```bash
$ yevis test --fetch-ro-crate https://raw.githubusercontent.com/sapporo-wes/test-workflow/main/yevis-metadata_jga-workflow_per-sample.yml
```

### RNA-seq

- Crate:
@@ -131,24 +176,6 @@ $ yevis test --fetch-ro-crate https://raw.githubusercontent.com/sapporo-wes/test
$ yevis test --fetch-ro-crate https://raw.githubusercontent.com/sapporo-wes/test-workflow/main/yevis-metadata_nf-core_rnaseq_v3.6.yml
```

### JGA

- Crate:
- [`jga_1st.json`](./example_crate/jga_1st.json)
- Crate generated on `Linux` environment. (1st execution)
- [`jga_2nd.json`](./example_crate/jga_2nd.json)
- Crate generated on `Linux` environment. (2nd execution (same settings))
- [`jga_mac.json`](./example_crate/jga_mac.json)
- Crate generated on `Mac Apple silicon` environment.

See https://github.com/sapporo-wes/test-workflow#biosciencedbcjga-analysis---per-sample-workflow for more details about the executed workflow.

Executed as follows:

```bash
$ yevis test --fetch-ro-crate https://raw.githubusercontent.com/sapporo-wes/test-workflow/main/yevis-metadata_jga-workflow_per-sample.yml
```

### Trimming

- Crate:
69 changes: 69 additions & 0 deletions tests/comparison_results/gatk_diff_env.log
@@ -0,0 +1,69 @@
Tonkaz 0.2.2

Checking Crate2 based on Crate1:

Crate1: ./tests/example_crate/gatk_1st.json
Crate2: ./tests/example_crate/gatk_mac.json

.----------------------------------------------------------------------------------------------.
| | Crate1 | Crate2 |
|----------------|--------------------------------------|--------------------------------------|
| WF Name | GATK workf ... Mitochondria Pipeline | GATK workf ... Mitochondria Pipeline |
| WF ID | 68c32f9c-bdaa-420c-879c-90b40d8bc4d5 | 68c32f9c-bdaa-420c-879c-90b40d8bc4d5 |
| WF Ver | 1.0.0 | 1.0.0 |
| WF Type | Workflow Description Language | Workflow Description Language |
| WF Type Ver | 1.0 | 1.0 |
| WF Eng Name | cromwell | cromwell |
| WF Eng Version | 80 | 80 |
| Sapporo Ver | 1.4.8 | 1.4.8 |
| Run Name | example_test | example_test |
| Run State | COMPLETE  | EXECUTOR_ERROR  |
| ExitCode | 0  | 1  |
| Start Time | 2022-09-08 09:30:03 | 2022-09-14 05:42:22 |
| End Time | 2022-09-08 09:58:32 | 2022-09-14 05:44:11 |
| Duration | 28m 29s | 1m 49s |
| # Attachments | 2 files | 2 files |
| # Intermediate | 314 files  | 12 files  |
| # Outputs | 13 files (5 EDAM-assigned files)  | 0 files (0 EDAM-assigned files)  |
'----------------------------------------------------------------------------------------------'
* EDAM extensions: .bam/.bb/.bed/.bw/.fa/.fasta/.fastq/.fastq.gz/.fq/.fq.gz/.gtf/.gff/.sam/.vcf/.vcf.gz/.wig

Comparing workflow results...
Calculate the reproducibility level by comparing the EDAM-assigned output files of Crate1 and Crate2. (option `--all` to use all output files)

Reproducibility level is defined as follows:

- Level3 ⭐⭐⭐ : Files are identical with the same checksum
- Level2 ⭐⭐ : Files are different, but their features (file size, map rate, etc.) are similar (within threshold: 0.05)
- Level1 ⭐ : Files are different, and their features are different (beyond threshold)
- Level0 : File not found

Level3: "Fully Reproduced" <---> Level0: "Not Reproduced"

=== Level3 ⭐⭐⭐ (Same Checksum, 0/0 files)

=== Level2 ⭐⭐ (Similar Features, 0/0 files)

=== Level1 ⭐ (Different Features, 0/0 files)

=== Level0 (Not Found, Crate1: 5 files, Crate2: 0 files)

- Only in Crate1:

- outputs/G97753.NA12878.bam
- outputs/G97753.NA12878.final.split.vcf
- outputs/G97753.NA12878.realigned.bam
- outputs/G97753.NA12878.vcf
- outputs/splitAndPassOnly.vcf

Summarize compare result:

.---------------------------------------------------------------------------.
| Reproducibility | Level | Definition | File # |
|---------------------------|-----------|---------------------|-------------|
| Fully Reproduced | ⭐⭐⭐ | Same Checksum | 0 files |
| Acceptable Differences | ⭐⭐ | Similar Features | 0 files |
| Unacceptable Differences | ⭐ | Different Features | 0 files |
| Not Reproduced | | Not Found | 5 files |
'---------------------------------------------------------------------------'

108 changes: 108 additions & 0 deletions tests/comparison_results/gatk_same_env.log
@@ -0,0 +1,108 @@
Tonkaz 0.2.2

Checking Crate2 based on Crate1:

Crate1: ./tests/example_crate/gatk_1st.json
Crate2: ./tests/example_crate/gatk_2nd.json

.----------------------------------------------------------------------------------------------.
| | Crate1 | Crate2 |
|----------------|--------------------------------------|--------------------------------------|
| WF Name | GATK workf ... Mitochondria Pipeline | GATK workf ... Mitochondria Pipeline |
| WF ID | 68c32f9c-bdaa-420c-879c-90b40d8bc4d5 | 68c32f9c-bdaa-420c-879c-90b40d8bc4d5 |
| WF Ver | 1.0.0 | 1.0.0 |
| WF Type | Workflow Description Language | Workflow Description Language |
| WF Type Ver | 1.0 | 1.0 |
| WF Eng Name | cromwell | cromwell |
| WF Eng Version | 80 | 80 |
| Sapporo Ver | 1.4.8 | 1.4.8 |
| Run Name | example_test | example_test |
| Run State | COMPLETE | COMPLETE |
| ExitCode | 0 | 0 |
| Start Time | 2022-09-08 09:30:03 | 2022-09-08 08:59:34 |
| End Time | 2022-09-08 09:58:32 | 2022-09-08 09:26:25 |
| Duration | 28m 29s | 26m 51s |
| # Attachments | 2 files | 2 files |
| # Intermediate | 314 files | 314 files |
| # Outputs | 13 files (5 EDAM-assigned files) | 13 files (5 EDAM-assigned files) |
'----------------------------------------------------------------------------------------------'
* EDAM extensions: .bam/.bb/.bed/.bw/.fa/.fasta/.fastq/.fastq.gz/.fq/.fq.gz/.gtf/.gff/.sam/.vcf/.vcf.gz/.wig

Comparing workflow results...
Calculate the reproducibility level by comparing the EDAM-assigned output files of Crate1 and Crate2. (option `--all` to use all output files)

Reproducibility level is defined as follows:

- Level3 ⭐⭐⭐ : Files are identical with the same checksum
- Level2 ⭐⭐ : Files are different, but their features (file size, map rate, etc.) are similar (within threshold: 0.05)
- Level1 ⭐ : Files are different, and their features are different (beyond threshold)
- Level0 : File not found

Level3: "Fully Reproduced" <---> Level0: "Not Reproduced"

=== Level3 ⭐⭐⭐ (Same Checksum, 0/5 files)

=== Level2 ⭐⭐ (Similar Features, 5/5 files)

- G97753.NA12878.bam
.--------------------------------------------------------------------.
| | in Crate1 | in Crate2 |
|----------------|-------------------------|-------------------------|
| File Size | 147.78 MB (154962060)  | 147.78 MB (154962147)  |
| Total Reads | 2349886 | 2349886 |
| # Mapped | 2349886 (100.00%) | 2349886 (100.00%) |
| # Duplicate | 719878 (30.63%) | 719878 (30.63%) |
'--------------------------------------------------------------------'

- G97753.NA12878.final.split.vcf
.--------------------------------------------------------------------.
| | in Crate1 | in Crate2 |
|----------------|-------------------------|-------------------------|
| File Size | 18.51 KB (18955)  | 18.51 KB (18953)  |
| Line Count | 107 | 107 |
'--------------------------------------------------------------------'

- G97753.NA12878.realigned.bam
.--------------------------------------------------------------------.
| | in Crate1 | in Crate2 |
|----------------|-------------------------|-------------------------|
| File Size | 98.46 MB (103241649)  | 98.46 MB (103241732)  |
| Total Reads | 2361236 | 2361236 |
| # Mapped | 2361223 (100.00%) | 2361223 (100.00%) |
| # Duplicate | 723304 (30.63%) | 723304 (30.63%) |
'--------------------------------------------------------------------'

- G97753.NA12878.vcf
.--------------------------------------------------------------------.
| | in Crate1 | in Crate2 |
|----------------|-------------------------|-------------------------|
| File Size | 24.48 KB (25069)  | 24.48 KB (25063)  |
| Line Count | 99 | 99 |
'--------------------------------------------------------------------'

- splitAndPassOnly.vcf
.--------------------------------------------------------------------.
| | in Crate1 | in Crate2 |
|----------------|-------------------------|-------------------------|
| File Size | 15.85 KB (16229)  | 15.85 KB (16228)  |
| Line Count | 96 | 96 |
| Variant Count | 18 | 18 |
| SNPs Count | 16 | 16 |
| Indels Count | 2 | 2 |
'--------------------------------------------------------------------'

=== Level1 ⭐ (Different Features, 0/5 files)

=== Level0 (Not Found, Crate1: 0 files, Crate2: 0 files)

Summarize compare result:

.---------------------------------------------------------------------------.
| Reproducibility | Level | Definition | File # |
|---------------------------|-----------|---------------------|-------------|
| Fully Reproduced | ⭐⭐⭐ | Same Checksum | 0 files |
| Acceptable Differences | ⭐⭐ | Similar Features | 5 files |
| Unacceptable Differences | ⭐ | Different Features | 0 files |
| Not Reproduced | | Not Found | 0 files |
'---------------------------------------------------------------------------'
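
Note on the level definitions printed in the logs above: they can be read as a small decision procedure. The TypeScript sketch below is illustrative only and is not taken from the Tonkaz source. The names `FileSummary`, `relativeDiff`, and `reproducibilityLevel` are hypothetical, and the use of a relative difference for the "within threshold: 0.05" check is an assumption, since the logs do not state how the threshold is applied.

```typescript
// Hypothetical summary of one output file; not Tonkaz's actual data model.
interface FileSummary {
  checksum: string;
  // Numeric features such as file size or line count.
  features: Record<string, number>;
}

type Level = 0 | 1 | 2 | 3;

// Relative difference between two numbers (assumed metric); 0 when both are 0.
function relativeDiff(a: number, b: number): number {
  const denom = Math.max(Math.abs(a), Math.abs(b));
  return denom === 0 ? 0 : Math.abs(a - b) / denom;
}

// Maps a pair of files to Level0..Level3 as described in the log text.
function reproducibilityLevel(
  file1: FileSummary | undefined,
  file2: FileSummary | undefined,
  threshold = 0.05,
): Level {
  // Level0: the file is missing from one of the crates.
  if (file1 === undefined || file2 === undefined) return 0;
  // Level3: identical checksum.
  if (file1.checksum === file2.checksum) return 3;
  // Level2 if every feature of file1 is close to the same feature in file2,
  // otherwise Level1.
  const similar = Object.keys(file1.features).every(
    (key) =>
      key in file2.features &&
      relativeDiff(file1.features[key], file2.features[key]) <= threshold,
  );
  return similar ? 2 : 1;
}

// Feature values copied from the splitAndPassOnly.vcf table in
// gatk_same_env.log; the checksums are placeholders.
const crate1File: FileSummary = {
  checksum: "placeholder-checksum-1",
  features: { fileSize: 16229, lineCount: 96, variantCount: 18 },
};
const crate2File: FileSummary = {
  checksum: "placeholder-checksum-2",
  features: { fileSize: 16228, lineCount: 96, variantCount: 18 },
};
console.log(reproducibilityLevel(crate1File, crate2File)); // 2
```

Under these assumptions the one-byte size difference (16229 vs 16228) is far below the 0.05 threshold, which matches the Level2 classification reported for this file in the same-environment log.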
