Analyzing Genomes in a Graph Database

This repository hosts the code and the data for the post Analyzing Genomes in a Graph Database.

Prerequisites:

If you want to prepare the data yourself, you need the pyphy library.

First, clone the pyphy repo, and prepare both the library and the data following the instructions there.

Scripts

First, download_kegg_various_databases.py download data from KEGG via its API. To download the genomes, run:

python download_kegg_various_databases.py genome [genome_output_folder]
Then use download_kegg_various_databases.py again to download the KO:

python download_kegg_various_databases.py ko [ko_output_folder]

Sometimes these two commands will miss some files because of the network connection. Rerun the command. It will check which files are already downloaded and then it will only download those missing files.

Use genome_parser.py to process the genome data:

python genome_parser.py [genome_folder] [target_taxonomic_rank] [target_taxonomic_name]

where target_taxonomic_rank and target_taxonomic_name are the taxonomic group you want to analyze. For the article, I use "phylum" and "Proteobacteria".
It outputs a mapping.csv file, which is needed for the next step.

Use kegg_parser.py to process the KO data:

python kegg_parser.py [ko_output_folder] [path_to_mapping.csv]

After finishing these steps, four csv files needed for Neo4j are produced, connections.csv, has_kegg.csv, kegg.csv, taxon.csv.

I have also attached mine in the repo.

Authors

Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License - see the LICENSE file for details

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Analyzing Genomes in a Graph Database

Prerequisites:

Scripts

Authors

License

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
LICENSE		LICENSE
README.md		README.md
download_kegg_various_databases.py		download_kegg_various_databases.py
genome_parser.py		genome_parser.py
kegg_parser.py		kegg_parser.py

License

dgg32/neo4j_genome_ko

Folders and files

Latest commit

History

Repository files navigation

Analyzing Genomes in a Graph Database

Prerequisites:

Scripts

Authors

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages