workflow

📊 Understanding the aPhyloGeo workflow helps you get the most out of this bioinformatics pipeline. The sections below describe each stage of the phylogeographic analysis.

Algorithm Workflow

The diagram below illustrates the workflow of the algorithm, consisting of several key blocks, each highlighted with a distinct color.

First Block (Light Blue): This block creates climate trees from the input climate data (CSV file) and validates the input parameters using a YAML file. It processes climatic variables such as temperature, precipitation, and elevation to construct trees that represent the relationships between geographic locations based on their climatic similarity. More precisely, the pairwise differences between the values of the species' habitats are calculated and normalized between the minimum and maximum of the parameter, which yields a symmetric square matrix. The climate tree is then inferred from this matrix using the Neighbor-Joining method.
Second Block (Light Green): This block creates phylogenetic trees based on input genetic data and performs input parameter validation (refer to the YAML file). This entails aligning DNA or amino acid sequences, inferring phylogenetic relationships using various methods (e.g., maximum likelihood, Bayesian inference), and assessing the statistical support for the inferred tree topology.
Third Block (Light Pink): The third block, referred to as the phylogeography step, is the crux of the analysis. It compares the genetic trees (representing evolutionary relationships) with the climate trees (representing environmental similarity). This comparison utilizes either the Robinson-Foulds distance or the Least Squares distance to quantify the degree of congruence between the two tree types. The output of this step includes:
Topological congruence statistics: Quantifying the degree of similarity between the genetic and climate trees.
Co-phylogenetic visualizations: Graphical representations highlighting the associations between genetic lineages and climatic niches.
Statistical tests: Assessing the significance of the observed phylogeographic patterns.
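
The matrix construction described for the first block can be sketched in a few lines of Python. This is a minimal illustration under one reading of that description (absolute pairwise differences divided by the range of the parameter), not aPhyloGeo's actual code. The function name climate_distance_matrix, the site labels, and the temperature values are made up for the example.

```python
def climate_distance_matrix(values):
    """Build a symmetric matrix of range-normalized pairwise differences.

    Assumption: "normalized between the minimum and maximum of the
    parameter" is read as |a - b| / (max - min).
    """
    low, high = min(values), max(values)
    span = high - low
    n = len(values)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            # Guard against a constant parameter (span of zero).
            d = abs(values[i] - values[j]) / span if span else 0.0
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical mean temperatures for four sampling sites.
sites = ["site_A", "site_B", "site_C", "site_D"]
temperature = [10.0, 15.0, 20.0, 30.0]
matrix = climate_distance_matrix(temperature)

for name, row in zip(sites, matrix):
    print(name, [round(d, 2) for d in row])
```

With Biopython installed, the lower triangle of this matrix (rows matrix[i][:i + 1]) can be wrapped in Bio.Phylo.TreeConstruction.DistanceMatrix and passed to DistanceTreeConstructor().nj(...) to obtain a Neighbor-Joining tree.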

This third block is pivotal: it produces the output data (i.e., name of gene, name of climate parameter, bootstrap value, Robinson-Foulds distance, entropy distance, Least Squares distance, the start and end positions of the windows, and the climatic and genetic trees) and performs the essential calculations (i.e., distances, tree inference, sequence alignment). Our approach adapts elastically to various computing environments, using parallelism and the available GPUs/CPUs based on resource usage per unit of computation. This flexibility enables efficient processing of a single genetic window, as outlined in the workflow below.
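
To make the tree comparison concrete, here is a small sketch of the Robinson-Foulds distance: the number of non-trivial bipartitions (splits of the leaf set induced by an internal branch) that appear in one tree but not in the other. Trees are written as nested tuples of leaf names, both trees are assumed to share the same leaves, and the helper names are hypothetical. The implementation used by aPhyloGeo may differ.

```python
def leaf_sets(tree, clades):
    """Return the leaves under `tree`, recording every internal clade."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, clades) for child in tree))
    clades.append(leaves)
    return leaves


def bipartitions(tree):
    """Collect the non-trivial splits of an unrooted view of the tree."""
    clades = []
    all_leaves = leaf_sets(tree, clades)
    splits = set()
    for clade in clades:
        other = all_leaves - clade
        # A split is informative only if both sides hold two or more leaves.
        if len(clade) > 1 and len(other) > 1:
            splits.add(frozenset([clade, other]))
    return splits


def robinson_foulds(tree_a, tree_b):
    """Count the splits present in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical genetic and climate trees over the same five taxa.
genetic_tree = ((("A", "B"), "C"), ("D", "E"))
climate_tree = ((("A", "C"), "B"), ("D", "E"))
rf = robinson_foulds(genetic_tree, climate_tree)
print(rf)
```

A distance of 0 means the two topologies agree on every internal branch; larger values indicate less congruence between the genetic and climate trees.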

Multiprocessing

The algorithm supports multiprocessing, allowing simultaneous analysis of multiple windows. This feature is particularly recommended for large datasets.
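
As a rough sketch of what analyzing several windows at once can look like, the example below splits a sequence into sliding windows and hands them to a pool of worker processes from the Python standard library. The function names, the window parameters, and the GC-content calculation are placeholders for illustration only; in the real pipeline each window would go through alignment, tree inference, and distance calculation, and aPhyloGeo's own multiprocessing code may be organized differently.

```python
from concurrent.futures import ProcessPoolExecutor


def sliding_windows(length, window_size, step):
    """Return (start, end) index pairs for windows over a sequence."""
    return [(s, s + window_size)
            for s in range(0, length - window_size + 1, step)]


def analyze_window(task):
    """Placeholder per-window analysis: GC fraction of the window."""
    start, end, sequence = task
    window = sequence[start:end]
    gc = sum(base in "GC" for base in window) / len(window)
    return start, end, gc


def analyze_all(sequence, window_size, step, workers=2):
    """Analyze every window in parallel, keeping the window order."""
    tasks = [(s, e, sequence)
             for s, e in sliding_windows(len(sequence), window_size, step)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_window, tasks))


if __name__ == "__main__":
    # Hypothetical sequence; the guard keeps worker processes from
    # re-running this block on platforms that spawn new interpreters.
    for start, end, gc in analyze_all("ATGCGCGATATTAGCGC", window_size=8, step=3):
        print(start, end, round(gc, 2))
```

Because each window is independent of the others, the work divides cleanly across processes, which is why this approach pays off mainly on large datasets with many windows.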

Dependencies

This work relies on the following software packages:

Biopython version 1.79 (BSD 3-Clause License)
Bio version 1.5.2 (New BSD License)
numpy version 1.21.6 (BSD 3-Clause License)

Feel free to explore and contribute to our repository!

Please email us at: [email protected] for any questions or feedback.
