catheriz/Random_walk_with_Restart

Random Walk with Restart on Multiplex Network

This project employs the Random Walk with Restart (RWR) algorithm on a multiplex network, with -log10(P-value) used as the initial score vector. Each network consists of three layers:

  1. Protein-Protein Interaction (PPI) Layer: Contains genes/proteins directly interacting with disease-associated genes (1st shell) and those interacting with 1st shell genes/proteins (2nd shell).
  2. KEGG Pathways Layer: Represents pathways in which the disease genes participate.
  3. Co-expression Layer: Derived from RNA expression data from 109 immune cell samples and myositis/immune-related tissues, including bone marrow, spleen, thymus, tonsils, skin, lymph node, and skeletal muscle.
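As a rough illustration of the idea above, the following sketch runs RWR on a tiny two-layer multiplex network in plain Python. It is not the code in RWR_with_Permutation_Test.R. The supra-adjacency construction, the restart probability `r=0.7`, the inter-layer jump probability `delta=0.5`, the equal split of the restart vector across layers, and the toy matrices and P-value are all assumptions made for this example.

```python
import math

def supra_adjacency(layers, delta=0.5):
    """Stack L layers (each n x n) into one (n*L) x (n*L) matrix.

    Diagonal blocks hold (1 - delta) * A_k. Off-diagonal blocks hold
    delta / (L - 1) on their diagonal, linking each node to its own copies.
    """
    n_layers, n = len(layers), len(layers[0])
    size = n * n_layers
    supra = [[0.0] * size for _ in range(size)]
    for a in range(n_layers):
        for b in range(n_layers):
            for i in range(n):
                if a == b:
                    for j in range(n):
                        supra[a * n + i][a * n + j] = (1 - delta) * layers[a][i][j]
                else:
                    supra[a * n + i][b * n + i] = delta / (n_layers - 1)
    return supra

def column_normalize(matrix):
    """Scale every column to sum to 1 (all-zero columns stay zero)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def restart_vector(p_values, n_layers):
    """-log10(P) for seeds, 0 for non-seeds (None), normalized to sum to 1
    and split equally across layers (the equal split is an assumption)."""
    raw = [-math.log10(p) if p is not None else 0.0 for p in p_values]
    total = sum(raw)
    return [x / total / n_layers for _ in range(n_layers) for x in raw]

def rwr(transition, p0, r=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * M p + r * p0 until the L1 change is below tol."""
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - r) * sum(row[j] * p[j] for j in range(len(p))) + r * start
               for row, start in zip(transition, p0)]
        if sum(abs(x - y) for x, y in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

# Toy data: three nodes, two layers, node 0 is the only seed.
ppi = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
kegg = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
p_values = [1e-8, None, None]

M = column_normalize(supra_adjacency([ppi, kegg]))
p0 = restart_vector(p_values, n_layers=2)
scores = rwr(M, p0)
# Sum each node's score over its two layer copies (one simple way to combine).
per_node = [scores[i] + scores[i + 3] for i in range(3)]
```

Because the transition matrix is column-stochastic, the scores stay a probability distribution, and the seed node keeps most of the mass when the restart probability is high.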

Permutation Testing

We ran RWR 1,000 times, each time with randomly permuted seed nodes. Each candidate's p-value was calculated as the proportion of permuted RWR scores that were equal to or greater than its observed RWR score.
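The p-value calculation described above can be sketched in a few lines of Python. The function names are hypothetical, and drawing each permuted seed set as a random sample of the same size is an assumption about how the permutation is done; the actual procedure is in RWR_with_Permutation_Test.R.

```python
import random

def permutation_p_value(observed, permuted_scores):
    """Proportion of permuted scores that are >= the observed score."""
    hits = sum(1 for score in permuted_scores if score >= observed)
    return hits / len(permuted_scores)

def permuted_seed_sets(nodes, n_seeds, n_permutations=1000, rng_seed=0):
    """Draw random seed sets of a fixed size (assumed permutation scheme)."""
    rng = random.Random(rng_seed)
    return [rng.sample(nodes, n_seeds) for _ in range(n_permutations)]

# Toy numbers: two of the four permuted scores reach the observed score.
p_value = permutation_p_value(0.5, [0.1, 0.5, 0.9, 0.2])
seed_sets = permuted_seed_sets(["G1", "G2", "G3", "G4", "G5"],
                               n_seeds=2, n_permutations=3)
```

In a full run, each permuted seed set would be passed through RWR, and the resulting score of a candidate node would be one entry of `permuted_scores`.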

Running Sample Code

To perform the RWR and Permutation Test, use the following command structure:

Rscript RWR_with_Permutation_Test.R <disease_name> <PPI_input_file> <KEGG_input_file> <coexpression_network_input_file> <output_directory>

Parameters

  • <disease_name>: Name of the disease for analysis (e.g., "PM")
  • <PPI_input_file>: Path to the PPI adjacency matrix file
  • <KEGG_input_file>: Path to the KEGG pathway gene adjacency matrix file
  • <coexpression_network_input_file>: Path to the co-expression network adjacency matrix file
  • <output_directory>: Directory where results will be saved

Output

The program generates four output files with the following suffixes:

  • _RWR_M_result.txt: Contains the results of the RWR implementation with seed nodes.
  • _No_Seeds_RWR_M_result.txt: Contains the results of the RWR implementation without seed nodes.
  • _permutation_result.txt: Contains the results of the permutation test with seed nodes.
  • _permutation_no_seed_result.txt: Contains the results of the permutation test without seed nodes.

Algorithm Reference

Our paper, Meta-analyses Uncover the Genetic Architecture of Idiopathic Inflammatory Myopathies, forms the foundation of this analysis (accepted; citation forthcoming).

The RWR algorithm implementation is based on: A Valdeolivas, L Tichit, C Navarro, S Perrin, G Odelin, N Levy, P Cau, E Remy, and A Baudot. 2018. “Random walk with restart on multiplex and heterogeneous biological networks.” Bioinformatics 35 (3).

Our code modifies the original source code. The adjustments are documented in the Supplementary Information and include:

  • Layer Normalization: Performed on each layer of the multiplex network.
  • Degree-degree Spearman Correlation: Calculated for identical nodes across layers.
  • Permutation Test: Conducted to assess the significance of RWR scores.
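To make the degree-degree Spearman correlation concrete, here is a minimal sketch that compares the degrees of the nodes shared by two layers. The dictionary-of-neighbour-sets representation, the function names, and the toy layers are assumptions for illustration; the exact calculation used in the analysis is described in the Supplementary Information.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def degree_spearman(layer_a, layer_b):
    """Spearman correlation of node degrees over nodes present in both layers.

    Each layer is a dict mapping a node to the set of its neighbours.
    """
    shared = sorted(set(layer_a) & set(layer_b))
    deg_a = [len(layer_a[node]) for node in shared]
    deg_b = [len(layer_b[node]) for node in shared]
    return pearson(average_ranks(deg_a), average_ranks(deg_b))

# Toy layers with placeholder node names.
ppi = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
kegg = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}, "E": set()}
rho = degree_spearman(ppi, kegg)
```

Spearman correlation is Pearson correlation applied to ranks, which is why the sketch ranks the degrees first and handles ties by averaging.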

Other Resources

PPI_get_interactions_by_gene.py

This script is adapted from the BioGRID REST Service to streamline the retrieval and curation of protein-protein interaction (PPI) data from the BioGRID database. With modifications, it allows users to easily retrieve interaction data for a specified list of genes and save the results to an output file.

Usage:

python PPI_get_interactions_by_gene.py <input_gene_list> <output_file_name>

Parameters:

  • <input_gene_list>: Path to a text file containing a list of genes (each gene should be enclosed in single or double quotes and separated by commas)
  • <output_file_name>: Name of the file where the interaction data will be saved.
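For reference, a gene list in the quoted, comma-separated format described above could look like `'GENE_A', "GENE_B", 'GENE_C'`. The sketch below shows one way such text might be parsed; `parse_gene_list` and the gene names are hypothetical and are not taken from PPI_get_interactions_by_gene.py.

```python
import re

def parse_gene_list(text):
    """Return the gene names found between single or double quotes."""
    return re.findall(r"""['"]([^'"]+)['"]""", text)

# Placeholder gene names, mixing both quote styles and a line break.
example_text = "'GENE_A', \"GENE_B\",\n'GENE_C'"
genes = parse_gene_list(example_text)
```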

process_PPI_results_to_get_another_layer.py

This script takes the output from PPI_get_interactions_by_gene.py to generate a new input file for additional interaction layers.

Usage:

python process_PPI_results_to_get_another_layer.py <input_file_name_from_PPI_get_interactions_by_gene.py> <output_gene_list_for_next_layer>

Parameters:

  • <input_file_name_from_PPI_get_interactions_by_gene.py>: Path to the gene interaction file generated by PPI_get_interactions_by_gene.py.
  • <output_gene_list_for_next_layer>: Path to the output file where genes for the next interaction layer will be saved.
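One plausible reading of this step is sketched below: from the interactions retrieved for the current shell, collect the interaction partners that were not already queried, so they can seed the next query. The in-memory pair format, the exclusion of already-queried genes, and the function name are assumptions; the real script works on the file written by PPI_get_interactions_by_gene.py, whose layout is not shown here.

```python
def next_shell_genes(interactions, queried_genes):
    """Return sorted interaction partners that are not in queried_genes.

    interactions: iterable of (gene_a, gene_b) pairs.
    queried_genes: genes already used as query input for this shell.
    """
    queried = set(queried_genes)
    partners = set()
    for gene_a, gene_b in interactions:
        partners.update((gene_a, gene_b))
    return sorted(partners - queried)

# Placeholder data: G1 was queried and interacts with G2 and G3.
first_shell = [("G1", "G2"), ("G3", "G1"), ("G1", "G1")]
next_genes = next_shell_genes(first_shell, ["G1"])
```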

form_PPI_adjacency_matrix.py

This script creates a PPI adjacency matrix to be used in RWR analysis. It combines data from one or more interaction layers to form a complete adjacency matrix.

Usage:

python form_PPI_adjacency_matrix.py \
    <layer1_interactions.txt> \
    <layer2_interactions.txt> ... \
    <combined_edge_list.csv> \
    <adjacency_matrix.csv>

Parameters:

  • <layerN_interactions.txt>: Paths to one or more input files containing gene interaction data, one file per PPI shell. You can provide as many files as needed.
  • <combined_edge_list.csv>: Output file for the combined edge list across all layers, with edge weights representing interaction counts.
  • <adjacency_matrix.csv>: Output file for the final adjacency matrix in CSV format.
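The combination step can be illustrated with a small sketch: count how often each undirected gene pair appears across the shell files, then place those counts in a symmetric matrix. This is an assumption-laden outline rather than the logic of form_PPI_adjacency_matrix.py; the pair-based input, the function names, and the treatment of edges as undirected are choices made for the example.

```python
from collections import Counter

def combine_edges(*shells):
    """Count occurrences of each undirected gene pair across all shells."""
    counts = Counter()
    for shell in shells:
        for gene_a, gene_b in shell:
            counts[tuple(sorted((gene_a, gene_b)))] += 1
    return counts

def adjacency_matrix(edge_counts):
    """Build a symmetric matrix whose entries are the interaction counts."""
    genes = sorted({gene for edge in edge_counts for gene in edge})
    index = {gene: i for i, gene in enumerate(genes)}
    matrix = [[0] * len(genes) for _ in genes]
    for (gene_a, gene_b), weight in edge_counts.items():
        matrix[index[gene_a]][index[gene_b]] = weight
        matrix[index[gene_b]][index[gene_a]] = weight
    return genes, matrix

# Placeholder interactions: the G1-G2 pair appears in both shells.
shell1 = [("G1", "G2"), ("G2", "G3")]
shell2 = [("G2", "G1"), ("G3", "G4")]
edge_counts = combine_edges(shell1, shell2)
genes, matrix = adjacency_matrix(edge_counts)
```

Sorting each pair before counting makes `("G1", "G2")` and `("G2", "G1")` the same edge, so the repeated interaction receives a weight of 2.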

get_KEGG_pathway.py

This script fetches KEGG pathway information for a given list of disease genes using the BioServices library. It queries KEGG to find all pathways associated with each gene and saves the results to a CSV file, making it easy to integrate KEGG pathway data into downstream analyses.

Usage:

python get_KEGG_pathway.py <input_gene_list> <output_file_name>

Parameters:

  • <input_gene_list>: Path to a text file containing a list of gene names (one gene per line).
  • <output_file_name>: Desired output CSV file name where the KEGG pathway information will be saved.

KEGG_network.R

This script parses and curates genes from the output of get_KEGG_pathway.py, creating a gene interaction network for each KEGG pathway. The result is a set of edge lists saved in the specified output folder. The resulting edge lists can be merged to create an input for KEGG adjacency matrix generation.

Usage:

Rscript KEGG_network.R <kegg_pathway_file> <output_directory>

Parameters:

  • <kegg_pathway_file>: Path to the CSV file containing KEGG pathway information, as generated by get_KEGG_pathway.py.
  • <output_directory>: Directory to save the resulting gene network files.

make_KEGG_adjacency_matrix.py

This script constructs a KEGG adjacency matrix from a combined KEGG edge list file, making it suitable for RWR analysis. It reads the gene interactions, applies weights, and creates a directed adjacency matrix.

Usage:

python make_KEGG_adjacency_matrix.py <input_edge_list> <output_edge_list_with_weights> <output_adjacency_matrix>

Parameters:

  • <input_edge_list>: The combined KEGG edge list file.
  • <output_edge_list_with_weights>: Output file path for the edge list with interaction weights.
  • <output_adjacency_matrix>: Output path for the directed adjacency matrix in CSV format.

Construct Co-expression Network

Please refer to RWR-MH.
