XML parsing using Python & Neo4j implementation

This repository contains Python scripts that parse several file formats, including XML and .txt, together with Neo4j scripts that load CSV files, create relationships between nodes, query with Cypher, and update or delete data.

FYI

All scripts were written for Python 3.6.8 and 3.7.
Atom, Sublime Text, and Jupyter Notebook were used for scripting.
Check your environment and home directory setup before using any of the utilities or tools.
Some ETL is done on the command line (see the ETL.rtf file).
The DDL and DML files contain schema definitions and data manipulation statements for PostgreSQL; ignore them if you do not need them.

The flow is: code to extract data --> write to CSV --> upload to Neo4j --> relationships --> querying

Code

The .py files contain broadly similar scripts; what changes from file to file is the level of data extraction.
The xml.etree.ElementTree module is imported to parse the files.

This is a snippet of the code; see the .py files in Knowledge-Project for the full scripts.

example:

import csv
import xml.etree.ElementTree as ET
tree = ET.parse('saliva_metabolites.xml') #small_saliva.xml
root = tree.getroot()
ns ={'nsstring':'http://www.hmdb.ca'} #storing namespace string in a variable

Conditions check for None before a value is used, since a missing element is returned as None.
A csv writer stores the extracted data in a CSV file.

example:

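writer.writerow(salivaheader)  # write the header row first
for row in salivadata:
    writer.writerow(row)  # one CSV row per extracted record
salivacsv.close()  # close the output file once all rows are written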
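
The parsing, None checks, and CSV writing described above can be sketched together as follows. This is a minimal illustration, not the repository's script: the inline XML sample, its values, the element names `metabolite`, `accession`, `name`, and `chemical_formula`, and the helpers `text_or_empty`, `extract_rows`, and `rows_to_csv` are all assumptions made for the example. Only the `http://www.hmdb.ca` namespace comes from the snippet above.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample with made-up values, standing in for a real export.
SAMPLE_XML = """<hmdb xmlns="http://www.hmdb.ca">
  <metabolite>
    <accession>EX0001</accession>
    <name>example compound A</name>
    <chemical_formula>C6H6</chemical_formula>
  </metabolite>
  <metabolite>
    <accession>EX0002</accession>
    <name>example compound B</name>
  </metabolite>
</hmdb>"""

ns = {'nsstring': 'http://www.hmdb.ca'}


def text_or_empty(parent, tag):
    """Return the stripped text of a child element, or '' if it is missing or empty."""
    child = parent.find('nsstring:' + tag, ns)
    if child is None or child.text is None:
        return ''
    return child.text.strip()


def extract_rows(xml_text):
    """Parse the XML text and return (header, rows) ready for csv.writer."""
    root = ET.fromstring(xml_text)
    fields = ['accession', 'name', 'chemical_formula']
    rows = []
    for metabolite in root.findall('nsstring:metabolite', ns):
        rows.append([text_or_empty(metabolite, field) for field in fields])
    return fields, rows


def rows_to_csv(header, rows):
    """Write the header and rows to an in-memory CSV and return it as a string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


header, rows = extract_rows(SAMPLE_XML)
csv_text = rows_to_csv(header, rows)
print(csv_text)
```

The second record has no `chemical_formula` element, so the None check turns it into an empty CSV cell instead of raising an AttributeError.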

Installation

Install Neo4j.

The Community edition is suggested.

Make sure the Neo4j home directory is set up so that Neo4j can be run from the command line.

Load CSV files WITH HEADERS

Neo4j can load CSV files with or without headers; all scripts in this repo are for CSV files with headers (see the Knowledge-Project/Neo4jloads folder for more examples).
By default, Neo4j cannot read a file on your local machine unless it is in the Neo4j import directory, so make sure the data is copied there.

phenolsdata directory is copied to the neo4j imports

cp /Users/aginni/Documents/phenolsdata/phenols_classification.csv /Users/aginni/opt/brew/Cellar/neo4j/3.5.3/libexec/import/phenolsdata/
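
With the default configuration, `LOAD CSV` resolves `file:///` URLs relative to the import directory, so the copy step and the resulting URL can be sketched in Python. The helper name `stage_for_import` and the temporary directories used in the demonstration are assumptions; substitute your own Neo4j import path.

```python
import shutil
import tempfile
from pathlib import Path


def stage_for_import(csv_path, import_dir, subdir=''):
    """Copy csv_path into import_dir/subdir and return the file:/// URL for LOAD CSV."""
    target_dir = Path(import_dir) / subdir
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(csv_path).name
    shutil.copy(csv_path, target)
    # The URL is relative to the import directory, not the filesystem root.
    relative = target.relative_to(import_dir).as_posix()
    return 'file:///' + relative


# Demonstration with temporary directories standing in for real paths.
with tempfile.TemporaryDirectory() as work:
    source = Path(work) / 'phenols_classification.csv'
    source.write_text('class,subclass\n')
    import_dir = Path(work) / 'import'
    url = stage_for_import(source, import_dir, 'phenolsdata')
    copied = (import_dir / 'phenolsdata' / 'phenols_classification.csv').exists()
    print(url)
```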

Either the MERGE or the CREATE clause can be used to create nodes with all their properties.
Cypher variable names are case-sensitive: for example, if the row variable is declared as Line, writing line.data instead of Line.data gives an error.
The property names you specify here are the ones that will appear on the nodes, so this is the point to give the properties good names (effectively renaming the CSV columns).

Example:

LOAD CSV WITH HEADERS FROM 'file:///phenolsdata/phenols_classification.csv' AS Line CREATE (:phenols_classfication {class: Line.class, subclass: Line.subclass, compound_name: Line.compound_name, phenol_id: Line.phenol_id, mol_wt: Line.mol_wt, formula: Line.formula})
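
Writing the property map by hand is error-prone for wide files. As a rough sketch, the statement can be generated from the CSV header row. The helper `build_load_csv` and the label `Phenols` are hypothetical; the helper simply interpolates names into the query text, so it rejects any label or column that is not a plain identifier.

```python
import csv
import io
import re

_IDENTIFIER = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*$')


def build_load_csv(url, label, header, row_var='Line'):
    """Build a LOAD CSV WITH HEADERS ... CREATE statement from a header row."""
    for name in [label, row_var, *header]:
        if not _IDENTIFIER.match(name):
            raise ValueError('not a plain identifier: %r' % name)
    # Each CSV column becomes a node property with the same name.
    props = ', '.join('%s: %s.%s' % (col, row_var, col) for col in header)
    return ("LOAD CSV WITH HEADERS FROM '%s' AS %s CREATE (:%s {%s})"
            % (url, row_var, label, props))


# Read only the header row; the sample text stands in for a real file.
sample = io.StringIO('class,subclass,phenol_id\n')
header = next(csv.reader(sample))
statement = build_load_csv(
    'file:///phenolsdata/phenols_classification.csv', 'Phenols', header)
print(statement)
```

Swapping CREATE for MERGE in the template would avoid duplicate nodes when the same file is loaded twice, at the cost of a slower load.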

Querying

Examples to test the mapping

Always check that the data exists before updating, creating relationships, or deleting, because Neo4j does not raise an error when a MATCH finds nothing; the query simply has no effect.
In the query below, we match nodes with two different labels based on their properties.

Example:

MATCH (r:phenol_to_chebi {phenol_id: '421'}) MATCH (x:CHEBI {chebi_id: 'CHEBI:62023'}) RETURN r, x
// r and x are variables bound to the phenol_to_chebi and CHEBI nodes respectively

As above, but here we match the nodes and create a relationship between them based on a shared property value.

MATCH (a:Compound) MATCH(b:PhenolCompounds) WHERE a.cid = b.cid CREATE (a)-[r:SAME_AS]->(b) RETURN a,b

Rename label and remove old one

Example:

Match the nodes by label and add the new label with the SET clause. It is important to remove the old label as well, since keeping both creates redundancy within the database.

MATCH (s:phenols_classfication) SET s:Phenols_Classification REMOVE s:phenols_classfication

Matching label and removing

MATCH (s:phenol_to_pubmed) REMOVE s:phenol_to_pubmed  // removes the unnecessary label; the nodes themselves are kept

MATCH (s:phenol_to_chebi) REMOVE s:phenol_to_chebi

Relationships between nodes

A relationship between nodes can be created by matching on their labels and specifying the properties that must agree.

Example:

MATCH (a:Compound) MATCH(b:HMDBMetabolites) WHERE a.cid = b.cid CREATE (a)<-[:SAMEAS]-(b) RETURN a,b
MATCH (a:CHEBI) MATCH(b:HMDBMetabolites) WHERE a.chebi_id = b.chebi_id CREATE (a)<-[:SAMEAS]-(b) RETURN a,b

MATCH (a:HMDBPathways) MATCH (b:`Pathway Interaction Database`) WHERE a.pathway_name = b.name RETURN a, b LIMIT 10 // to check

DELETE relationships between nodes

To delete a relationship between two nodes, it is important to specify the relationship type; otherwise every relationship that matches the pattern is removed.

Example:

MATCH (:phenol_to_chebi)-[r:REFERENCED_IN]-(:phenolcompounds) DELETE r

As above, but here we delete only the relationship between two nodes identified by specific property values.

MATCH (r:phenol_to_chebi{phenol_id : '421'})-[deleteme:SAME_AS]->(x:CHEBI{chebi_id:'CHEBI:62023'}) DELETE deleteme;
