Home
The aim of this project is to generate RDF data from the tabular data of the ICGC Data Portal, for better reusability and interoperability.
- Ubuntu 18.04
- On AWS, an EC2 t2.medium instance (4GB memory) is recommended.
- Generating the whole data set requires 250GB of disk space.
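The disk requirement can be checked before starting. The following is a minimal sketch and not part of the repository: the helper name `has_free_gb` is hypothetical, and it assumes the data will be written somewhere under `$HOME`.

```shell
# Hypothetical helper: succeed if the filesystem holding $1 has at least $2 GB free.
has_free_gb() {
  dir=$1
  need_gb=$2
  # df -Pk prints one line per filesystem; column 4 is the available space in 1K blocks.
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Assumption: the output is written under $HOME.
if has_free_gb "$HOME" 250; then
  echo "enough free disk space for the whole data set"
else
  echo "less than 250GB free; consider a test run with a smaller project list"
fi
```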
Clone this repository.
$ git clone https://github.com/med2rdf/icgc.git
Add the OS user to the docker group, then log out and log in to the console again.
$ sudo groupadd docker
$ sudo usermod -aG docker $USER
$ exit
Install docker via apt.
$ sudo apt-get update
$ sudo apt install docker.io
$ sudo systemctl start docker
$ docker --version
Docker version 18.06.1-ce, build e68fc7a
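After logging in again, it may be worth confirming that the group change has taken effect before running any docker commands. This is only a sketch; the helper name `user_in_group` is hypothetical.

```shell
# Hypothetical helper: succeed if the current user belongs to the named group.
user_in_group() {
  # id -nG lists the current user's group names, separated by spaces.
  id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

if user_in_group docker; then
  echo "docker group is active for this session"
else
  echo "docker group is not active yet; log out and log in again"
fi
```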
Get the Dockerfiles for building a Docker image of Oracle Database.
$ mkdir oracle
$ cd oracle
$ git clone https://github.com/oracle/docker-images.git
Download the Oracle Database 18.3.0 archive (LINUX.X64_180000_db_home.zip) from Oracle, and move it into the 18.3.0 directory of the cloned repository.
$ mv LINUX.X64_180000_db_home.zip \
~/oracle/docker-images/OracleDatabase/SingleInstance/dockerfiles/18.3.0/
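Since the image build depends on the archive being in place, a small guard can catch a missing or empty file early. This is a sketch only; `require_file` is a hypothetical helper, and the path is the one used in the `mv` command above.

```shell
# Hypothetical helper: succeed if the given file exists and is not empty.
require_file() {
  if [ -s "$1" ]; then
    return 0
  fi
  echo "missing or empty: $1" >&2
  return 1
}

# Path taken from the mv command above.
archive="$HOME/oracle/docker-images/OracleDatabase/SingleInstance/dockerfiles/18.3.0/LINUX.X64_180000_db_home.zip"
if require_file "$archive"; then
  echo "archive found; ready to build the image"
fi
```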
Build the Docker image (needs 4GB of memory).
$ cd ~/oracle/docker-images/OracleDatabase/SingleInstance/dockerfiles/
$ bash buildDockerImage.sh -v 18.3.0 -e
Launch Oracle Database in a Docker container.
$ docker run --name oracle \
-p 1521:1521 -e ORACLE_PWD=Welcome1 \
-v $HOME:/host-home \
oracle/database:18.3.0-ee
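The first start creates the database and can take a while, so the configuration step should wait until startup has finished. The sketch below polls a command until its output contains a pattern. `wait_for_output` and `POLL_INTERVAL` are hypothetical names, and the example call assumes the image prints a line containing "DATABASE IS READY TO USE" when it is ready; check the container output to confirm the exact message.

```shell
# Hypothetical helper: poll a command until its output contains a pattern.
# Usage: wait_for_output PATTERN MAX_TRIES COMMAND [ARGS...]
wait_for_output() {
  pattern=$1
  tries=$2
  shift 2
  while [ "$tries" -gt 0 ]; do
    # Run the command and search both stdout and stderr for the pattern.
    if "$@" 2>&1 | grep -q "$pattern"; then
      return 0
    fi
    tries=$((tries - 1))
    # Seconds between attempts; defaults to 10 if POLL_INTERVAL is unset.
    sleep "${POLL_INTERVAL:-10}"
  done
  return 1
}

# Example call from a second terminal (assumes the container name "oracle"
# and the readiness message described in the text):
# wait_for_output "DATABASE IS READY TO USE" 60 docker logs oracle
```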
Configure the database as a triplestore.
$ docker exec -it oracle \
sqlplus sys/Welcome1@ORCLPDB1 as sysdba @/host-home/icgc/scripts/setup.sql
To download the latest project list, open the ICGC Data Portal, click Available Data Type > SSM, and click the "Export Table as TSV" icon.
Create the project list: projects.tsv
$ cd ~/icgc/scripts/download/
$ bash 01_projects.sh projects_2018_02_14_10_50_42.tsv > projects.tsv
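Before starting the download, a quick sanity check on the generated list can show whether the export was processed at all. This is a sketch; `count_rows` is a hypothetical helper, and it makes no assumption about the columns of projects.tsv.

```shell
# Hypothetical helper: print the number of non-empty lines in a file.
count_rows() {
  grep -c . "$1"
}

list="projects.tsv"
if [ -f "$list" ]; then
  echo "$list contains $(count_rows "$list") non-empty lines"
else
  echo "$list not found; run 01_projects.sh first"
fi
```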
Download all files. Use projects_test.tsv for testing.
$ bash 02_download_all.sh projects.tsv
Use projects_test.tsv for testing.
$ chmod 777 ~/icgc/log ~/icgc/output
$ docker start oracle
$ docker exec -it oracle \
sqlplus sys/Welcome1@ORCLPDB1 as sysdba @/host-home/icgc/scripts/00_user.sql
$ docker exec -it oracle \
sh /host-home/icgc/scripts/00_run.sh download/projects.tsv \
> ~/icgc/log/main.log
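Once the run has finished, the log can be scanned for Oracle error codes. This is a sketch under the assumption that errors raised during the run end up in main.log; `count_ora_errors` is a hypothetical helper.

```shell
# Hypothetical helper: print the number of log lines that contain an Oracle error code.
count_ora_errors() {
  # Oracle error codes have the form ORA-nnnnn; grep -c prints 0 when none match.
  grep -c 'ORA-[0-9]' "$1"
}

log="$HOME/icgc/log/main.log"
if [ -f "$log" ]; then
  echo "lines with ORA- errors: $(count_ora_errors "$log")"
  # Show the end of the log for a quick look at the final status.
  tail -n 5 "$log"
fi
```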
Ontologies referenced
- hco: https://github.com/med2rdf/hco
- med2rdf: https://github.com/med2rdf/med2rdf-ontology
- faldo: https://github.com/OBF/FALDO (genomic positions of mutations)
Guideline
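- Store the tabular data from the ICGC Data Portal in a database.
- Convert the data to RDF according to the mapping definitions (written in R2RML).
See the following pages for details.
- External RDF resource referenced: UniProt (gene names)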
© 2018 github.com/ryotayamanaka