Working version for E94 EG41 WBPS11 (#15)
* Add missing A-MEXP-2210 for human
* maps symbol to ensemble gene id, as symbol is required for decorations
* Fix plasmodium biomart version (wasn't picked by automatic changes).
* Aspergillus chromosomes follow arabic numerals, not roman.
* Fixes database name for o glumipatula
* Adds python script to validate gtf urls per organism
* Fix brachypodium gtf URL
* Fix aspergillus GTF
* Fix brassica oleracea and rapa GTFs
* Fix chlamydomonas GTF
* Fix danio rerio GTF
* Fix hordeum vulgare GTF
* Fix musa acuminata GTF
* Fix o glumipatula and o glaberrima GTF
* Fix o nivara and o punctata GTF
* Fix physcomitrella patens GTF
* Fix populus trichocarpa GTF
* Fix setaria italica GTF
* Fix triticum aestivum GTF
* Fix vitis vinifera GTF
* Fix Zea mays GTF
* Makes GTF downloads a bit more silent (avoids progress bar).
Showing 8 changed files with 71 additions and 24 deletions.
@@ -0,0 +1,47 @@

    #!/usr/bin/env python3

    from ftplib import FTP, all_errors
    import argparse, re, os, sys

    def parse_url(url):
        """
        Parses a complete ftp URL into server, path and file name.
        """
        match = re.search(r"ftp://([a-z\.]*)/(.*)$", url)
        server = match.group(1)
        path_tokens = match.group(2).split("/")
        return server, "/"+"/".join(path_tokens[:-1]), path_tokens[-1]


    parser = argparse.ArgumentParser(description='Check GTF URLs for organism and release based on gxa_references.conf file.')
    parser.add_argument('--organism', help='Organism to validate for')
    # parser.add_argument('--source', help='ensembl or wbps')
    parser.add_argument('--release', help='release number')
    args = parser.parse_args()

    gxa_references_path = os.path.abspath(os.path.dirname(sys.argv[0]))+"/gxa_references.conf"

    for line in open(gxa_references_path, 'r'):
        (organism, url) = line.split()
        if organism == args.organism:
            corrected_url = url.replace("RELNO", args.release)  # fill in the release number
            server, path, gtf_file = parse_url(corrected_url)
            ftp = FTP(server)
            ftp.login()  # anonymous login
            try:
                ftp.cwd(path)
            except all_errors:  # the original "except Error" named an undefined class
                print("Path "+path+" not found!")
                sys.exit(1)
            files_listed = []
            ftp.retrlines('NLST', files_listed.append)  # collect the names in the directory
            if gtf_file in files_listed:
                print("File found!")
                sys.exit(0)
            else:
                print("Not found: "+path+'/'+gtf_file)
                print("Possible alternatives are:")
                for file_l in files_listed:
                    if "gtf" in file_l:
                        print("- "+file_l)
                sys.exit(1)
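
A possible follow-up, not part of this commit: the regex in `parse_url` only accepts lowercase letters and dots in the host name, so a server name containing digits or hyphens would not match. The sketch below shows one way the URL splitting and the "possible alternatives" filter could be written as pure functions using the standard library. The names `split_ftp_url` and `gtf_alternatives` and the `ftp.example.org` URL are hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit
import posixpath


def split_ftp_url(url):
    """Split an ftp:// URL into (server, directory, file name)."""
    parts = urlsplit(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError("Not an FTP URL: " + url)
    # posixpath.split separates the last path component from its directory.
    directory, file_name = posixpath.split(parts.path)
    return parts.hostname, directory, file_name


def gtf_alternatives(files_listed):
    """Return the listed names that contain 'gtf', as the script's fallback does."""
    return [name for name in files_listed if "gtf" in name]


# Hypothetical URL, only to show the shape of the result.
server, directory, file_name = split_ftp_url(
    "ftp://ftp.example.org/pub/release-94/gtf/homo_sapiens/Homo_sapiens.GRCh38.94.gtf.gz"
)
print(server, directory, file_name)
```

Keeping these two steps free of network access makes them easy to check without an FTP server; the `FTP(...)`, `cwd` and `retrlines('NLST', ...)` calls would stay as they are in the script.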