Commit

test: add unit tests (#72)
* ci: add pytest

* build: add pysam dependency

* build: add pytest and coverage dependencies

* refactor: add necessary things to pass pytest

* test: add test for filter_multimappers.py

* test: add test file for filter_multimappers.py

* refactor: change some lines

* revert: revert to initial version

* test: remove test file

* test: remove test file

* test: add test file

* test: modify test_write_output

* build: add pysam dependencies

* test: add coverage file

* Delete snakemake_report.html

* build: delete unused import

* build: remove pysam dependency

* build: remove pytest-cov and add pysam dependency

* ci: remove .coverage

* refactor: change path to files

* ci: update unit tests call

* test: add files for testing

* test: add test for secondary and supp. alignments

* test: add pragma to avoid testing

* ci: update codecov-actions to v3

* ci: update pytest call

* build: add pytest-cov dependency

* ci: update pytest call

* ci: update pytest call

* ci: update pytest call

* ci: update pytest call

* ci: update pytest call

* ci: update coverage report call

* ci: update coverage report call

* ci: update coverage report call

* ci: update coverage report and pytest call

* ci: update coverage report call

* ci: update coverage report call

* ci: update coverage report call

* ci: update coverage report call

* ci: update coverage report call

* ci: update coverage report call

* ci: update coverage report call

* ci: update coverage report call

* ci: update coverage report call

* ci: update coverage report call

* ci: update coverage report call

* ci: update coverage report call

* ci: update coverage call

* ci: update coverage call

* ci: update coverage call

* ci: update coverage call

* ci: update submit coverage call

* style: correct whitespaces in pragma

* refactor: change first iteration

* refactor: delete unuseful tests

* refactor: modify functions

* test: modify output file

* refactor: remove skip secondary alns

* test: add codecov.yml

* test: add codecov.yml

* test: add codecov.yml

* test: update path on codecov.yml

* refactor: remove first assignment

* test: add token in coverage call

* build: remove codecov.yml

* test: delete coverage call

* style: remove whitespaces

---------

Co-authored-by: Iris Mestres <[email protected]>
deliaBlue and Iris Mestres authored Apr 29, 2023
1 parent 5585ff9 commit e5f282e
Showing 15 changed files with 419 additions and 21 deletions.
29 changes: 29 additions & 0 deletions .github/workflows/CI.yml
@@ -15,6 +15,7 @@ jobs:
    defaults:
      run:
        shell: bash -l {0}

    steps:

      - name: Checkout Repository
@@ -43,3 +44,31 @@ jobs:

      - name: Run local test
        run: bash test/test_workflow_local.sh

  unit-testing:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -l {0}

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: Setup environment
        uses: conda-incubator/setup-miniconda@v2
        with:
          mamba-version: "*"
          activate-environment: mirflowz
          environment-file: environment.yml
          auto-activate-base: false

      - name: Update mirflowz env with root packages
        run: mamba env update -n mirflowz -f environment.root.yml

      - name: Update mirflowz env with dev packages
        run: mamba env update -n mirflowz -f environment.dev.yml

      - name: run unit tests
        working-directory: ./scripts/tests
        run: pytest --cov=scripts --cov-branch --cov-report=term-missing
5 changes: 5 additions & 0 deletions environment.dev.yml
@@ -1,5 +1,10 @@
name: mirflowz
channels:
  - bioconda
  - defaults
dependencies:
  - coverage>=7.2
  - pytest>=7.1.2
  - pytest-cov
  - pysam>=0.21.0
  - snakefmt
45 changes: 24 additions & 21 deletions scripts/filter_multimappers.py
100644 → 100755
@@ -67,7 +67,7 @@ def parse_arguments():
return parser


def count_indels(aln: pysam.AlignedSegment) -> int:
def count_indels(aln: pysam.libcalignedsegment.AlignedSegment) -> int:
"""Count the number of indels in an alignment based on its CIGAR string.
This function counts the number of indels in the alignment based on the
@@ -101,29 +101,29 @@ def find_best_alignments(alns: List[pysam.AlignedSegment]) -> List[pysam.Aligned
Returns:
best_alignments: alignments with the fewest indels
"""
aln_indels = [(aln, count_indels(alignment=aln)) for aln in alns]
min_indels = min(aln_indels, key=lambda x: x[1])[1]
best_alignments = [aln for i, (aln, indels) in enumerate(aln_indels) if indels == min_indels]
if len(alns) == 1:
return alns

for i in range(len(best_alignments)):
best_alignments[i].set_tag('NH', len(best_alignments))
best_alignments[i].set_tag('HI', i + 1)
else:
aln_indels = [(aln, count_indels(aln=aln)) for aln in alns]
min_indels = min(aln_indels, key=lambda x: x[1])[1]
best_alignments = [aln for i, (aln, indels) in enumerate(aln_indels) if indels == min_indels]

for i in range(len(best_alignments)):
best_alignments[i].set_tag('NH', len(best_alignments))
best_alignments[i].set_tag('HI', i + 1)

return best_alignments
return best_alignments


def write_output(alignments: List[pysam.AlignedSegment]) -> None:
def write_output(alns: List[pysam.AlignedSegment]) -> None:
"""Write the output to the standard output (stdout).
Args:
alignments: alignments with the same query name
"""
if len(alignments) == 1:
sys.stdout.write(alignments[0].to_string() + '\n')
else:
best_alignments = find_best_alignments(alignments=alignments)
for alignment in best_alignments:
sys.stdout.write(alignment.to_string() + '\n')
for alignment in alns:
sys.stdout.write(alignment.to_string() + '\n')


def main(sam_file: Path) -> None:
@@ -136,12 +136,12 @@ def main(sam_file: Path) -> None:

sys.stdout.write(str(samfile.header))

current_query = None
current_alignments: list[pysam.AlignedSegment] = []
current_query = None

for alignment in samfile:

if alignment.is_secondary or alignment.is_supplementary:
if alignment.is_supplementary:
continue

if current_query is None:
@@ -151,14 +151,17 @@ def main(sam_file: Path) -> None:
current_alignments.append(alignment)

else:
write_output(alignments=current_alignments)
current_alignments = find_best_alignments(current_alignments)
write_output(alns=current_alignments)

current_query = alignment.query_name
current_alignments = [alignment]

write_output(alignments=current_alignments)
if len(current_alignments) > 0:
current_alignments = find_best_alignments(current_alignments)
write_output(alns=current_alignments)


if __name__ == "__main__":
args = parse_arguments().parse_args()
main(sam_file=args.infile)
args = parse_arguments().parse_args() # pragma:no cover
main(sam_file=args.infile) # pragma: no cover
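
For readers following the change to find_best_alignments, the selection rule can be sketched without pysam. This is a minimal illustration under stated assumptions, not the script's actual code: it works on plain CIGAR strings instead of pysam.AlignedSegment objects, the names CIGAR_OP and find_best are hypothetical, and it assumes indels are counted as the summed lengths of I and D operations (the body of count_indels is not shown in this diff, so that detail may differ).

```python
import re
from typing import List, Tuple

# Hypothetical helper: one (length, operation) pair per CIGAR element.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_indels(cigar: str) -> int:
    """Sum the lengths of insertions (I) and deletions (D) in a CIGAR string.

    Assumption: lengths are summed; the real function may count differently.
    """
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "ID")


def find_best(cigars: List[str]) -> List[Tuple[str, int, int]]:
    """Keep the CIGARs with the fewest indels.

    Returns (cigar, NH, HI) tuples, where NH is the number of kept
    alignments and HI is the 1-based index among them.
    """
    if not cigars:
        return []
    counts = [count_indels(c) for c in cigars]
    fewest = min(counts)
    best = [c for c, n in zip(cigars, counts) if n == fewest]
    return [(c, len(best), i) for i, c in enumerate(best, start=1)]


if __name__ == "__main__":
    # CIGAR strings taken from the test files added in this commit.
    print(find_best(["14M1D8M", "4M1D1M1I3M1D13M"]))
    print(find_best(["15M1I5M", "6M1I14M", "6M1I14M"]))
```

With the CIGAR strings from in_sam_diff_multimappers.sam, only 14M1D8M survives and is tagged NH 1, HI 1; with those from in_sam_equal_multimappers.sam, all three alignments tie and are tagged NH 3 with HI 1 to 3, which matches the expected output files.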
4 changes: 4 additions & 0 deletions scripts/tests/files/in_sam_diff_multimappers.sam
@@ -0,0 +1,4 @@
@HD VN:1.0 SO:queryname
@SQ SN:19 LN:600000 M5:f9635dbff42b16049de830a7a121fa12 UR:NA
1-1 16 19 77595 255 14M1D8M * 0 0 GCAGGAGAATCACTGATGTCAG * MD:Z:14^T2A1C3 NH:i:2 NM:i:3 XA:Z:Q XI:i:1
1-1 0 19 330456 255 4M1D1M1I3M1D13M * 0 0 CTGACATCAGTGATTCTCCTGC * MD:Z:4^G4^A13 NH:i:2 NM:i:3 XA:Z:Q XI:i:0
3 changes: 3 additions & 0 deletions scripts/tests/files/in_sam_empty.sam
@@ -0,0 +1,3 @@
@HD VN:1.0 SO:queryname
@SQ SN:19 LN:600000 M5:f9635dbff42b16049de830a7a121fa12 UR:NA
@PG ID:samtools PN:samtools VN:1.16.1 CL:samtools sort -n -o results/test_lib/header_sorted_catMappings.sam results/test_lib/concatenated_header_catMappings.sam
5 changes: 5 additions & 0 deletions scripts/tests/files/in_sam_equal_multimappers.sam
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
@HD VN:1.0 SO:queryname
@SQ SN:19 LN:600000 M5:f9635dbff42b16049de830a7a121fa12 UR:NA
1-1 0 19 142777 255 15M1I5M * 0 0 GCTAGGTGGGAGGCTTGAAGC * MD:Z:4C0T14 NH:i:3 NM:i:3 XA:Z:Q XI:i:0
1-1 16 19 270081 255 6M1I14M * 0 0 GCTTCAAGCCTCCCACCTAGC * MD:Z:14G0G4 NH:i:3 NM:i:3 XA:Z:Q XI:i:2
1-1 16 19 545543 255 6M1I14M * 0 0 GCTTCAAGCCTCCCACCTAGC * MD:Z:14A0G4 NH:i:3 NM:i:3 XA:Z:Q XI:i:1
13 changes: 13 additions & 0 deletions scripts/tests/files/in_sam_multimappers.sam
@@ -0,0 +1,13 @@
@HD VN:1.0 SO:queryname
@SQ SN:19 LN:600000 M5:f9635dbff42b16049de830a7a121fa12 UR:NA
1-1 0 19 7589 255 24M * 0 0 ATTTCAAGCCAGGTGGCGTTTTTC * MD:Z:13G10 NH:i:1 NM:i:1
2-1 0 19 63250 255 22M * 0 0 GAAAGCGCTTCGCTTCAGAGTG * MD:Z:11C10 NH:i:1 NM:i:1
3-1 0 19 63251 255 22M * 0 0 AAAGCGCTTCCCTTCAGAGTGA * MD:Z:21T NH:i:1 NM:i:1
4-1 0 19 63250 255 22M * 0 0 TAAAGCGCTTCCCTTCAGAGTG * MD:Z:G21 NH:i:1 NM:i:1
5-1 0 19 7589 255 24M * 0 0 CTTTCAAGCCAGGGGGCGTTTTTC * MD:Z:A23 NH:i:1 NM:i:1
6-1 0 19 7590 255 24M * 0 0 TTTCAAGCCAGGTGGCGTTTTTCT * MD:Z:12G11 NH:i:1 NM:i:1
7-1 0 19 142777 255 15M1I5M * 0 0 GCTAGGTGGGAGGCTTGAAGC * MD:Z:4C0T14 NH:i:3 NM:i:3 XA:Z:Q XI:i:0
7-1 16 19 270081 255 6M1I14M * 0 0 GCTTCAAGCCTCCCACCTAGC * MD:Z:14G0G4 NH:i:3 NM:i:3 XA:Z:Q XI:i:2
7-1 16 19 545543 255 6M1I14M * 0 0 GCTTCAAGCCTCCCACCTAGC * MD:Z:14A0G4 NH:i:3 NM:i:3 XA:Z:Q XI:i:1
8-1 16 19 77595 255 14M1D8M * 0 0 GCAGGAGAATCACTGATGTCAG * MD:Z:14^T2A1C3 NH:i:2 NM:i:3 XA:Z:Q XI:i:1
8-1 0 19 330456 255 4M1D1M1I3M1D13M * 0 0 CTGACATCAGTGATTCTCCTGC * MD:Z:4^G4^A13 NH:i:2 NM:i:3 XA:Z:Q XI:i:0
13 changes: 13 additions & 0 deletions scripts/tests/files/in_sam_sec_sup.sam
@@ -0,0 +1,13 @@
@HD VN:1.0 SO:queryname
@SQ SN:19 LN:600000 M5:f9635dbff42b16049de830a7a121fa12 UR:NA
1-1 0 19 7589 255 24M * 0 0 ATTTCAAGCCAGGTGGCGTTTTTC * MD:Z:13G10 NH:i:1 NM:i:1
2-1 0 19 63250 255 22M * 0 0 GAAAGCGCTTCGCTTCAGAGTG * MD:Z:11C10 NH:i:1 NM:i:1
3-1 2128 19 63251 255 22M * 0 0 AAAGCGCTTCCCTTCAGAGTGA * MD:Z:21T NH:i:1 NM:i:1
4-1 0 19 63250 255 22M * 0 0 TAAAGCGCTTCCCTTCAGAGTG * MD:Z:G21 NH:i:1 NM:i:1
5-1 2064 19 7589 255 24M * 0 0 CTTTCAAGCCAGGGGGCGTTTTTC * MD:Z:A23 NH:i:1 NM:i:1
6-1 256 19 7590 255 24M * 0 0 TTTCAAGCCAGGTGGCGTTTTTCT * MD:Z:12G11 NH:i:1 NM:i:1
7-1 0 19 142777 255 15M1I5M * 0 0 GCTAGGTGGGAGGCTTGAAGC * MD:Z:4C0T14 NH:i:3 NM:i:3 XA:Z:Q XI:i:0
7-1 16 19 270081 255 6M1I14M * 0 0 GCTTCAAGCCTCCCACCTAGC * MD:Z:14G0G4 NH:i:3 NM:i:3 XA:Z:Q XI:i:2
7-1 16 19 545543 255 6M1I14M * 0 0 GCTTCAAGCCTCCCACCTAGC * MD:Z:14A0G4 NH:i:3 NM:i:3 XA:Z:Q XI:i:1
8-1 16 19 77595 255 14M1D8M * 0 0 GCAGGAGAATCACTGATGTCAG * MD:Z:14^T2A1C3 NH:i:2 NM:i:3 XA:Z:Q XI:i:1
8-1 0 19 330456 255 4M1D1M1I3M1D13M * 0 0 CTGACATCAGTGATTCTCCTGC * MD:Z:4^G4^A13 NH:i:2 NM:i:3 XA:Z:Q XI:i:0
1 change: 1 addition & 0 deletions scripts/tests/files/out_sam_diff_multimappers.sam
@@ -0,0 +1 @@
1-1 16 19 77595 255 14M1D8M * 0 0 GCAGGAGAATCACTGATGTCAG * MD:Z:14^T2A1C3 NM:i:3 XA:Z:Q XI:i:1 NH:i:1 HI:i:1
3 changes: 3 additions & 0 deletions scripts/tests/files/out_sam_empty.sam
@@ -0,0 +1,3 @@
@HD VN:1.0 SO:queryname
@SQ SN:19 LN:600000 M5:f9635dbff42b16049de830a7a121fa12 UR:NA
@PG ID:samtools PN:samtools VN:1.16.1 CL:samtools sort -n -o results/test_lib/header_sorted_catMappings.sam results/test_lib/concatenated_header_catMappings.sam
3 changes: 3 additions & 0 deletions scripts/tests/files/out_sam_equal_multimappers.sam
@@ -0,0 +1,3 @@
1-1 0 19 142777 255 15M1I5M * 0 0 GCTAGGTGGGAGGCTTGAAGC * MD:Z:4C0T14 NM:i:3 XA:Z:Q XI:i:0 NH:i:3 HI:i:1
1-1 16 19 270081 255 6M1I14M * 0 0 GCTTCAAGCCTCCCACCTAGC * MD:Z:14G0G4 NM:i:3 XA:Z:Q XI:i:2 NH:i:3 HI:i:2
1-1 16 19 545543 255 6M1I14M * 0 0 GCTTCAAGCCTCCCACCTAGC * MD:Z:14A0G4 NM:i:3 XA:Z:Q XI:i:1 NH:i:3 HI:i:3
12 changes: 12 additions & 0 deletions scripts/tests/files/out_sam_multimappers.sam
@@ -0,0 +1,12 @@
@HD VN:1.0 SO:queryname
@SQ SN:19 LN:600000 M5:f9635dbff42b16049de830a7a121fa12 UR:NA
1-1 0 19 7589 255 24M * 0 0 ATTTCAAGCCAGGTGGCGTTTTTC * MD:Z:13G10 NH:i:1 NM:i:1
2-1 0 19 63250 255 22M * 0 0 GAAAGCGCTTCGCTTCAGAGTG * MD:Z:11C10 NH:i:1 NM:i:1
3-1 0 19 63251 255 22M * 0 0 AAAGCGCTTCCCTTCAGAGTGA * MD:Z:21T NH:i:1 NM:i:1
4-1 0 19 63250 255 22M * 0 0 TAAAGCGCTTCCCTTCAGAGTG * MD:Z:G21 NH:i:1 NM:i:1
5-1 0 19 7589 255 24M * 0 0 CTTTCAAGCCAGGGGGCGTTTTTC * MD:Z:A23 NH:i:1 NM:i:1
6-1 0 19 7590 255 24M * 0 0 TTTCAAGCCAGGTGGCGTTTTTCT * MD:Z:12G11 NH:i:1 NM:i:1
7-1 0 19 142777 255 15M1I5M * 0 0 GCTAGGTGGGAGGCTTGAAGC * MD:Z:4C0T14 NM:i:3 XA:Z:Q XI:i:0 NH:i:3 HI:i:1
7-1 16 19 270081 255 6M1I14M * 0 0 GCTTCAAGCCTCCCACCTAGC * MD:Z:14G0G4 NM:i:3 XA:Z:Q XI:i:2 NH:i:3 HI:i:2
7-1 16 19 545543 255 6M1I14M * 0 0 GCTTCAAGCCTCCCACCTAGC * MD:Z:14A0G4 NM:i:3 XA:Z:Q XI:i:1 NH:i:3 HI:i:3
8-1 16 19 77595 255 14M1D8M * 0 0 GCAGGAGAATCACTGATGTCAG * MD:Z:14^T2A1C3 NM:i:3 XA:Z:Q XI:i:1 NH:i:1 HI:i:1
10 changes: 10 additions & 0 deletions scripts/tests/files/out_sam_sec_sup.sam
@@ -0,0 +1,10 @@
@HD VN:1.0 SO:queryname
@SQ SN:19 LN:600000 M5:f9635dbff42b16049de830a7a121fa12 UR:NA
1-1 0 19 7589 255 24M * 0 0 ATTTCAAGCCAGGTGGCGTTTTTC * MD:Z:13G10 NH:i:1 NM:i:1
2-1 0 19 63250 255 22M * 0 0 GAAAGCGCTTCGCTTCAGAGTG * MD:Z:11C10 NH:i:1 NM:i:1
4-1 0 19 63250 255 22M * 0 0 TAAAGCGCTTCCCTTCAGAGTG * MD:Z:G21 NH:i:1 NM:i:1
6-1 256 19 7590 255 24M * 0 0 TTTCAAGCCAGGTGGCGTTTTTCT * MD:Z:12G11 NH:i:1 NM:i:1
7-1 0 19 142777 255 15M1I5M * 0 0 GCTAGGTGGGAGGCTTGAAGC * MD:Z:4C0T14 NM:i:3 XA:Z:Q XI:i:0 NH:i:3 HI:i:1
7-1 16 19 270081 255 6M1I14M * 0 0 GCTTCAAGCCTCCCACCTAGC * MD:Z:14G0G4 NM:i:3 XA:Z:Q XI:i:2 NH:i:3 HI:i:2
7-1 16 19 545543 255 6M1I14M * 0 0 GCTTCAAGCCTCCCACCTAGC * MD:Z:14A0G4 NM:i:3 XA:Z:Q XI:i:1 NH:i:3 HI:i:3
8-1 16 19 77595 255 14M1D8M * 0 0 GCAGGAGAATCACTGATGTCAG * MD:Z:14^T2A1C3 NM:i:3 XA:Z:Q XI:i:1 NH:i:1 HI:i:1
7 changes: 7 additions & 0 deletions scripts/tests/files/sam_no_multimappers.sam
@@ -0,0 +1,7 @@
@HD VN:1.0 SO:queryname
@SQ SN:19 LN:600000 M5:f9635dbff42b16049de830a7a121fa12 UR:NA
1-1 0 19 44414 255 21M * 0 0 GAAGGCGCTTCCCTTTGGAGT * MD:Z:21 NH:i:1 NM:i:0
2-1 0 19 44377 255 11M3I11M * 0 0 CTACAAAGGGAGGTAGCACTTTCTC * MD:Z:22 NH:i:1 NM:i:3 XA:Z:Q XI:i:0
3-1 0 19 7590 255 20M3I3M * 0 0 TTTCAAGCCAGGGGGCGTTTCCGTTC * MD:Z:23 NH:i:1 NM:i:3 XA:Z:Q XI:i:0
4-1 0 19 7590 255 23M * 0 0 TTTCAAGTCAGGGGGCGTTTTTC * MD:Z:7C15 NH:i:1 NM:i:1
5-1 0 19 30893 255 32M2I53M * 0 0 CTCAAGCTGTGACTCTCCAGAGGGATGCACTTGATCTCTTATGTGAAAAAAAAGAAGGCGCTTCCCTTTAGAGCGTTACGGTTTGGG * MD:Z:85 NH:i:1 NM:i:2 XA:Z:Q XI:i:0