Commit

reduce wordcount paper
TahiriNadia authored Jun 8, 2024
1 parent d121a2c commit c808a9a
Showing 1 changed file with 2 additions and 16 deletions.
18 changes: 2 additions & 16 deletions paper/paper.md
@@ -118,25 +118,11 @@ In the comparison of phylogenetic trees, which are constructed based on genetic

## Editing Multiple Sequence Alignment Methods

-Multiple Sequence Alignment (MSA) holds immense significance in bioinformatics as it serves as a foundational step for the comparison and analysis of biological sequences. Here is an in-depth overview of some widely used MSA methods:
-
-- **Pairwise Alignment**: Fundamental in comparing two sequences [@li2018minimap2].
-- **MUSCLE**: Multiple Sequence Comparison by Log-Expectation, a popular tool for high-quality MSA [@edgar2004muscle].
-- **CLUSTALW**: A widely-used software for multiple sequence alignment [@hung2016sequence].
-- **MAFFT**: Multiple Alignment using Fast Fourier Transform, known for its accuracy and efficiency [@katoh2013mafft].
+Multiple Sequence Alignment (MSA) holds immense significance in bioinformatics as it serves as a foundational step for the comparison and analysis of biological sequences. Here is an in-depth overview of some widely used MSA methods: 1) **Pairwise Alignment** [@li2018minimap2], 2) **MUSCLE** [@edgar2004muscle], 3) **CLUSTALW** [@hung2016sequence], and **MAFFT** [@katoh2013mafft].

## Similarity Methods

-Sequences with notable variability were specifically retained for analysis. The dissimilarity assessment between each sequence pair involved the application of an extensive set of 8 metrics:
-
-1. **Hamming distance**: Measures the difference between two strings of equal length [@labib2019hamming].
-2. **Levenshtein distance**: Evaluates the minimum number of single-character edits required to transform one sequence into another [@yujian2007normalized].
-3. **Damerau-Levenshtein distance**: Similar to Levenshtein distance, with an additional operation allowing transpositions of adjacent characters [@zhao2019string].
-4. **Jaro similarity**: Computes the similarity between two strings, considering the number of matching characters and transpositions [@pradhan2015review].
-5. **Jaro-Winkler similarity**: An enhancement of Jaro similarity, giving more weight to common prefixes [@pradhan2015review].
-6. **Smith–Waterman similarity**: Utilizes local sequence alignment to identify similar regions within sequences [@waterman1978similarity].
-7. **Jaccard similarity**: Measures the similarity between finite sample sets [@bag2019efficient].
-8. **Sørensen-Dice similarity**: Particularly useful for comparing the similarity of two samples [@li2020generic].
+Sequences with notable variability were specifically retained for analysis. The dissimilarity assessment between each sequence pair involved the application of an extensive set of 8 metrics: 1) **Hamming distance** [@labib2019hamming], 2) **Levenshtein distance** [@yujian2007normalized], 3) **Damerau-Levenshtein distance** [@zhao2019string], 4) **Jaro similarity** [@pradhan2015review], 5) **Jaro-Winkler similarity** [@pradhan2015review], 6) **Smith–Waterman similarity** [@waterman1978similarity], 7) **Jaccard similarity** [@bag2019efficient], and 8) **Sørensen-Dice similarity** [@li2020generic].

This comprehensive methodology ensures a nuanced and high-quality analysis, contributing to a deeper understanding of sequence distinctions.
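
As a reading aid for the Similarity Methods paragraph, here is a minimal Python sketch of four of the eight named metrics, written from their textbook definitions. It is not taken from the project's code. The function names are hypothetical, and the choice to compare sequences as sets of overlapping k-mers (with `k = 3` by default) for the Jaccard and Sørensen-Dice measures is an assumption made only for illustration; the paper text does not say how sequences are turned into sets.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn `a` into `b` (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def kmers(seq: str, k: int = 3) -> set:
    """Set of overlapping substrings of length k (illustrative assumption)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """|A ∩ B| / |A ∪ B| over the k-mer sets of the two sequences."""
    sa, sb = kmers(a, k), kmers(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def sorensen_dice_similarity(a: str, b: str, k: int = 3) -> float:
    """2 |A ∩ B| / (|A| + |B|) over the k-mer sets of the two sequences."""
    sa, sb = kmers(a, k), kmers(b, k)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 1.0
```

For example, `hamming_distance("GATTACA", "GACTATA")` counts the two mismatched positions, while `jaccard_similarity("ACGT", "ACGA", k=2)` compares the 2-mer sets `{AC, CG, GT}` and `{AC, CG, GA}`. Note that the first two functions return distances (larger means more different), whereas the last two return similarities in the range 0 to 1.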
