ADD: cross-compatibility with either Python2 or Python3 #52

Open. Wants to merge 91 commits into base: master.
35371f7
ADD: travis badge
kevinkle Feb 2, 2018
51017c4
START: python2/3 support
kevinkle Feb 3, 2018
19b2041
CHANGE: dont email me
kevinkle Feb 3, 2018
6688d55
ADD: future deps
kevinkle Feb 3, 2018
ced3832
FIX: python3/2 definition for os.getcwdu() and os.getcwd()
kevinkle Feb 3, 2018
128fd13
FIX: was throwing an AttributeError not import
kevinkle Feb 3, 2018
3cb657b
START: tests passing for pasturize, now making conda build for python2/3
kevinkle Feb 3, 2018
447afe8
UPDATE: bump the version number
kevinkle Feb 3, 2018
e1ba92d
FIX: noarch:python
kevinkle Feb 3, 2018
4eb042c
DEBUG: conda-build issues
kevinkle Feb 3, 2018
b8a20f1
DEBUG: conda_build.exceptions.DependencyNeedsBuildingError: Unsatisfi…
kevinkle Feb 3, 2018
537bcc8
ADD: git branch
kevinkle Feb 3, 2018
35de085
ADD: git branch
kevinkle Feb 3, 2018
9da129f
FIX: req future in conda too
kevinkle Feb 3, 2018
50d4aca
FIX: future
kevinkle Feb 3, 2018
65bf59c
DEBUG: conda install doesnt take --channel-priority seriously, try pu…
kevinkle Feb 3, 2018
59ab8b9
DEBUG: looks like its still forcing python3
kevinkle Feb 4, 2018
35c033a
CHANGE: unpin the req
kevinkle Feb 4, 2018
9dbaa7a
DEBUG: more py versioning
kevinkle Feb 4, 2018
6cdbadf
CHANGE: use local src
kevinkle Feb 4, 2018
bb7e369
DEBUG: build me a py27 ver conda!!!
kevinkle Feb 4, 2018
cec8e49
DEBUG: build me a py27 ver conda!!!
kevinkle Feb 4, 2018
2fd7131
DEBUG: more py27
kevinkle Feb 4, 2018
c3cdc9a
START: can now start building py27, but getting compatibility probles…
kevinkle Feb 4, 2018
e855c3c
CHANGE: looks like package_data isnt probably used in our case
kevinkle Feb 4, 2018
6a3a600
UPDATE: pre-PR changes
kevinkle Feb 4, 2018
01ffe08
Merge pull request #51 from phac-nml/kevinkle-patch-1
kevinkle Feb 4, 2018
7965483
ADD: readd version pinning for deps
kevinkle Feb 4, 2018
d1fd1d1
Merge branch 'pasteurize' of https://github.com/phac-nml/ecoli_seroty…
kevinkle Feb 4, 2018
3feec68
ADD: test building both py27 and py36 versions in travis
kevinkle Feb 4, 2018
7548588
DEL: ectyper.puml is >year old
kevinkle Feb 4, 2018
255df4f
CHANGE: allow flexibility in biopython version
kevinkle Feb 4, 2018
ba5c4ac
CHANGE: no longer hard pinning every dep
kevinkle Feb 4, 2018
34f2bb9
CHNAGE: allow a lower pandas version
kevinkle Feb 4, 2018
42c171d
FIX: need to backport TemporaryDirectory
kevinkle Feb 6, 2018
c818ed3
FIX: dep when py27
kevinkle Feb 6, 2018
723850b
FIX: added aliases
kevinkle Feb 6, 2018
66224de
DEBUG: imports
kevinkle Feb 6, 2018
3cd44d0
DEBUG: just define tempfile in a file
kevinkle Feb 6, 2018
8037db8
DEBUG: meta.yaml
kevinkle Feb 6, 2018
bcb5693
FIX?: backports.weakref
kevinkle Feb 6, 2018
a96f234
FIX: looks like tempfile finally imported correctly, fixed typo on Na…
kevinkle Feb 6, 2018
7a88e63
ADD: have travis test in 2.7 too
kevinkle Feb 6, 2018
f65f3fe
FIX: have travis env install backports.weakref if py27
kevinkle Feb 6, 2018
10e70dc
FIX: typo
kevinkle Feb 6, 2018
75947c6
CHANGE: nest the conda builds within a if py27 else
kevinkle Feb 6, 2018
aa3bbfe
FIX: add a subprocess.check_output replacement for py2
kevinkle Feb 7, 2018
dc838ed
DEBUG: use a backport instead
kevinkle Feb 7, 2018
959f4c2
FIX: imports
kevinkle Feb 7, 2018
f73c4d5
FIX: anaconda has an old version of subprocess32 without the run()
kevinkle Feb 7, 2018
6e8167a
FIX: "conda --add channels" is really "conda --add channel"
kevinkle Feb 7, 2018
31bab1a
ADD: log msgs to check if the py27 version sees the correct ref files…
kevinkle Feb 7, 2018
da7ddcd
FIX: typo
kevinkle Feb 7, 2018
a53f9f0
FIX: was wrong about the package_data vs include_package_data
kevinkle Feb 7, 2018
2fc624f
FIX: rm unicode literals
kevinkle Feb 7, 2018
1aa37c0
DEBUG: TypeError: write() argument 1 must be unicode, not str
kevinkle Feb 7, 2018
5e35232
FIX: read as unicode in store_df(output_df, parsed_output_file)
kevinkle Feb 7, 2018
37ab949
FIX: typo
kevinkle Feb 7, 2018
d2940b9
DEBUG: unicode where art thou
kevinkle Feb 7, 2018
8c34ce8
FIX?: explictedly have pandas read/write encoding=utf-8 (defaults to …
kevinkle Feb 7, 2018
a64f77f
UPDATE: bump the pandas ver
kevinkle Feb 11, 2018
e372c51
DEBUG: see why theres str data in blast_output_to_df
kevinkle Feb 11, 2018
b54ee26
DEBUG: check ectyper_dict_to_df()
kevinkle Feb 11, 2018
ee3dae4
DEBUG: print the dtypes of our df
kevinkle Feb 11, 2018
a44635b
DEBUG: deliberately convert df
kevinkle Feb 11, 2018
162fc12
FIX: typo
kevinkle Feb 11, 2018
6e33393
DEBUG: didnt convert?
kevinkle Feb 11, 2018
f43424f
DEBUG: print types
kevinkle Feb 11, 2018
05413c6
DEBUG: looks like still objects
kevinkle Feb 11, 2018
fe14096
DEBUG: more unicode checking
kevinkle Feb 11, 2018
5a35f24
DEBUG: keys being read as ?
kevinkle Feb 11, 2018
0bfd344
DEBUG: open
kevinkle Feb 11, 2018
2cd8ef5
DEBUG: dont have pandas encode utf-8 on write
kevinkle Feb 11, 2018
9b3da2e
DEUBG: bypass storing the output_df for now
kevinkle Feb 11, 2018
967b82f
DEBUG: dont use open from builtins
kevinkle Feb 11, 2018
b96e334
DEBUG: rm encoding from open
kevinkle Feb 11, 2018
94ac049
DEL: prints
kevinkle Feb 11, 2018
f41aa86
CHECK: writing csv ok now, but unsure if line conversion is effecting…
kevinkle Feb 11, 2018
fd0895b
FIX: convert the sframe value for py27 and ADD: checks for pandas fun…
kevinkle Feb 11, 2018
d40a1ae
DEBUG: see if the get_prediction() is being effected by backport
kevinkle Feb 11, 2018
e5918d1
DEBUG: looks like there is a predictors_df being created which isnt e…
kevinkle Feb 11, 2018
860f0a1
FIX: typo
kevinkle Feb 11, 2018
49d81c5
DEBUG: something isnt working in the get_prediction for py27
kevinkle Feb 11, 2018
8a2a623
DEBUG: pandas is performing as it should (there are valid predictors)…
kevinkle Feb 12, 2018
29b4cc5
FIX: should be usuing equality comparison not object comparison
kevinkle Feb 12, 2018
79cd169
DEL: stuff used for debugging
kevinkle Feb 12, 2018
0f8cdf2
CHANGE: dont try to upload
kevinkle Feb 12, 2018
dbf190c
FIX: set a global for read flags in predictionFunctions that differs …
kevinkle Feb 12, 2018
7594134
CHANGE: read_flags for genomeFunctions too and no longer force py3 op…
kevinkle Feb 12, 2018
05481ae
EDIT: add .bak files to .gitignore
kevinkle Feb 12, 2018
73cb919
EDIT: cleanup
kevinkle Feb 12, 2018
3 changes: 3 additions & 0 deletions .gitignore
@@ -110,3 +110,6 @@ output/
validation/enterobase_90_50_with_blacklist.csv
.coveragerc
coverage_html_report/

# Conda-build backup files
*.bak
26 changes: 22 additions & 4 deletions .travis.yml
@@ -1,6 +1,7 @@
language: python
python:
# We don't actually use the Travis Python, but this keeps it organized.
- "2.7"
- "3.6"
install:
- sudo apt-get update
@@ -18,10 +19,27 @@ install:
- conda update -q conda
# Useful for debugging any issues with conda
- conda info -a
- conda config --add channels bioconda
- conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION samtools bowtie2 mash bcftools biopython nose blast pandas seqtk
- conda config --add channels bioconda --add channels conda-forge
- conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION samtools bowtie2 mash bcftools biopython nose blast pandas seqtk future
- source activate test-environment
- if [[ "$TRAVIS_PYTHON_VERSION" == "2.7" ]]; then
    conda install backports.weakref && conda install -c kevinkle subprocess32;
  fi
- python setup.py install

# Setup automatic conda uploading.
# - conda install anaconda-client
# - conda config --set anaconda_upload yes
# test the conda build
- conda install conda-build
- if [[ "$TRAVIS_PYTHON_VERSION" == "2.7" ]]; then
    conda build -c kevinkle -c bioconda recipe/ --python=2.7;
  else
    conda build -c bioconda recipe/ --python=3.6;
  fi
-
script:
- nosetests
- nosetests
notifications:
email: false
18 changes: 15 additions & 3 deletions README.md
@@ -1,7 +1,9 @@
# ECTyper (an easy typer)
**ectyper** wraps a standalone serotyping module for _Escherichia coli_.
Supports _fasta_ and _fastq_ file formats.

[![Build Status](https://travis-ci.org/phac-nml/ecoli_serotyping.svg?branch=master)](https://travis-ci.org/phac-nml/ecoli_serotyping)

# Dependencies:
- python 3.6.3.*
- pandas 0.21.0.*
@@ -19,8 +21,8 @@ Supports _fasta_ and _fastq_ file formats.
1. `bash miniconda.sh -b -p $HOME/miniconda`
1. `export PATH="$HOME/miniconda/bin:$PATH"`
2. Install ectyper
* Directly via `conda`
1. `conda install -c bioconda ectyper`
* Through `github`
1. Install dependencies
`conda install pandas samtools bowtie2 mash bcftools biopython nose blast seqtk tqdm python=3.6`
@@ -62,3 +64,13 @@ optional arguments:
Directory location of output files.
```
* The first time species identification is enabled you will need to wait for **ectyper** to download the reference sequences.

# Building the conda package
Python 2.7
(requires a custom version of subprocess32 from the channel kevinkle)

`conda build -c kevinkle -c bioconda recipe/ --python=2.7`

Python 3.6

`conda build -c bioconda recipe/ --python=3.6`
45 changes: 0 additions & 45 deletions ectyper.puml

This file was deleted.

8 changes: 8 additions & 0 deletions ectyper/blastFunctions.py
@@ -3,6 +3,14 @@
"""
Functions for setting up, running, and parsing blast
"""
from __future__ import unicode_literals
from __future__ import print_function
from __future__ import division
from __future__ import absolute_import
from builtins import open
from builtins import str
from future import standard_library
standard_library.install_aliases()
import logging
import os

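Several commits above chase `TypeError: write() argument 1 must be unicode, not str` on Python 2.7. As far as I understand it, `from builtins import open` (from the `future` package) gives Python 2 the Python 3 `open()`, which in text mode accepts only text, never bytes. A minimal sketch of that behaviour using the standard-library `io.open` instead of `future`; the helper names `write_text` and `read_text` are hypothetical and not part of this PR:

```python
import io
import os
import tempfile


def write_text(path, text):
    """Write text to path as UTF-8 on both Python 2 and Python 3."""
    # io.open is the Python 3 open(); in text mode it only accepts text
    # (unicode on Python 2, str on Python 3), so bytes are decoded first.
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def read_text(path):
    """Read the whole file back as text, decoding it as UTF-8."""
    with io.open(path, "r", encoding="utf-8") as handle:
        return handle.read()
```

Decoding at the boundary keeps the rest of the module working with text only, which is the usual way to avoid mixing `str` and `unicode` on Python 2.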
7 changes: 7 additions & 0 deletions ectyper/commandLineOptions.py
@@ -1,5 +1,12 @@
#!/usr/bin/env python

from __future__ import unicode_literals
from __future__ import print_function
from __future__ import division
from __future__ import absolute_import
from builtins import int
from future import standard_library
standard_library.install_aliases()
import argparse


34 changes: 33 additions & 1 deletion ectyper/definitions.py
@@ -3,12 +3,25 @@
"""
Definitions for the ectyper project
"""
from __future__ import unicode_literals
from __future__ import print_function
from __future__ import division
from __future__ import absolute_import

from future import standard_library
standard_library.install_aliases()
import os
import sys

ROOT_DIR = os.path.dirname(os.path.abspath(__file__))
DATA_DIR = os.path.join(ROOT_DIR, 'Data')
WORKPLACE_DIR = os.getcwd()
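# Python 2 vs Python 3 difference.
try:
    # Python 2: os.getcwdu() returns the working directory as unicode.
    WORKPLACE_DIR = os.getcwdu()
except AttributeError:
    # Python 3: os.getcwdu() was removed; os.getcwd() already returns text.
    WORKPLACE_DIR = os.getcwd()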

SEROTYPE_FILE = os.path.join(DATA_DIR, 'ectyper_data.fasta')
SEROTYPE_ALLELE_JSON = os.path.join(DATA_DIR, 'ectyper_dict.json')
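For reference, `os.getcwdu()` exists only on Python 2; on Python 3 the attribute lookup raises `AttributeError` and `os.getcwd()` already returns text. A minimal sketch of the same selection without a `try`/`except`; the function name `get_workplace_dir` is hypothetical and not part of this PR:

```python
import os


def get_workplace_dir():
    """Return the current working directory as text on Python 2 and 3."""
    # Fall back to os.getcwd() when os.getcwdu() is absent (Python 3).
    getcwd = getattr(os, "getcwdu", os.getcwd)
    return getcwd()


WORKPLACE_DIR = get_workplace_dir()
```

Using `getattr` with a default avoids catching an exception that could also hide an unrelated error.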
@@ -18,3 +31,22 @@
SAMTOOLS = 'samtools'
REFSEQ_SUMMARY = os.path.join(DATA_DIR, 'assembly_summary_refseq.txt')
REFSEQ_SKETCH = os.path.join(DATA_DIR, 'refseq.genomes.k21s1000.msh')

if os.name == 'posix' and sys.version_info[0] < 3:
    # Python2
    from ectyper.tempfile import TemporaryDirectory
    from tempfile import NamedTemporaryFile
else:
    # Python3
    from tempfile import TemporaryDirectory, NamedTemporaryFile
# Aliases
TEMPDIR = TemporaryDirectory
NAMEDTEMPFILE = NamedTemporaryFile

# Python 2.7 Compatibility
if sys.version_info[0] < 3:
    # In Python 2.7, Pandas will need binary (not unicode) when using open().
    read_flags = 'rb'
else:
    # Python 3.6 will read as unicode text when using open().
    read_flags = 'r'
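The backported `ectyper/tempfile.py` is not shown in this diff, so the following is only a sketch of how a `TemporaryDirectory` stand-in for Python 2.7 could look, assuming nothing beyond the standard library. The name `temporary_directory` is hypothetical; it is not the implementation used in this PR:

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def temporary_directory(suffix="", prefix="tmp", dir=None):
    """Create a temporary directory and remove it when the block exits."""
    path = tempfile.mkdtemp(suffix=suffix, prefix=prefix, dir=dir)
    try:
        yield path
    finally:
        # Clean up even if the body of the with-block raised.
        shutil.rmtree(path, ignore_errors=True)


# Prefer the standard-library class where it exists (Python 3.2+).
TEMPDIR = getattr(tempfile, "TemporaryDirectory", temporary_directory)
```

Either way, `with TEMPDIR() as temp_dir:` yields a path string, so call sites such as `run_program()` would not need to change.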
27 changes: 18 additions & 9 deletions ectyper/ectyper.py
@@ -2,10 +2,17 @@
"""
Predictive serotyping for _E. coli_.
"""
from __future__ import division
from __future__ import unicode_literals
from __future__ import print_function
from __future__ import absolute_import
from builtins import range
from builtins import str
from future import standard_library
standard_library.install_aliases()
import logging
import os
import sys
import tempfile
import datetime
from urllib.request import urlretrieve

@@ -40,7 +47,7 @@ def run_program():
LOG.debug(args)

## Initialize temporary directories for the scope of this program
with tempfile.TemporaryDirectory() as temp_dir:
with definitions.TEMPDIR() as temp_dir:
temp_files = create_tmp_files(temp_dir, output_dir=args.output)
LOG.debug(temp_files)

@@ -167,22 +174,22 @@ def create_tmp_files(temp_dir, output_dir=None):

def run_prediction(genome_files, args, predictions_file):
'''Core prediction functionality

Args:
genome_files:
list of genome files
args:
commandline arguments
predictions_file:
filename of prediction output

Returns:
predictions_file with prediction written in it
'''
query_file = definitions.SEROTYPE_FILE
ectyper_dict_file = definitions.SEROTYPE_ALLELE_JSON
# create a temp dir for blastdb
with tempfile.TemporaryDirectory() as temp_dir:
with definitions.TEMPDIR() as temp_dir:
# Divide genome files into chunks
chunk_size = 50
genome_chunks = [
@@ -191,6 +198,8 @@ def run_prediction(genome_files, args, predictions_file):
]
for index, chunk in enumerate(genome_chunks):
LOG.info("Start creating blast database #{0}".format(index + 1))
LOG.info("Using SEROTYPE_FILE: {0}".format(query_file))
LOG.info("Using SEROTYPE_ALLELE_JSON: {0}".format(ectyper_dict_file))
blast_db = blastFunctions.create_blast_db(chunk, temp_dir)

LOG.info("Start blast alignment on database #{0}".format(index + 1))
@@ -204,10 +213,10 @@ def run_prediction(genome_files, args, predictions_file):

def get_raw_files(raw_files):
"""Take all the raw files, and filter not fasta / fastq

Args:
raw_files(str): list of files from user input

Returns:
A dictitionary collection of fasta and fastq files
example:
@@ -235,7 +244,7 @@ def filter_for_ecoli_files(raw_dict, temp_files, verify=False, species=False):
Assemble fastq files to fasta files,
then filter all files by reference method if verify is enabled,
if identified as non-ecoli, identify species by mash method if species is enabled.

Args:
raw_dict{fasta:list_of_files, fastq:list_of_files}:
dictionary collection of fasta and fastq files
@@ -266,7 +275,7 @@ def filter_file_by_species(genome_file, genome_format, temp_dir, verify=False, s
Assemble fastq file to fasta file,
then filter the file by reference method if verify is enabled,
if identified as non-ecoli, identify species by mash method if species is enabled.

Args:
genome_file: input genome file
genome_format(str): fasta or fastq
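`run_prediction()` splits the genome files into chunks of 50 before building each BLAST database, but the list comprehension itself is collapsed in this diff. One common way to write that step, shown here as an assumption rather than the PR's actual code (`chunk_list` is a hypothetical helper):

```python
def chunk_list(items, chunk_size=50):
    """Split items into consecutive lists of at most chunk_size elements."""
    return [
        items[start:start + chunk_size]
        for start in range(0, len(items), chunk_size)
    ]
```

With `from builtins import range` in effect, `range` is lazy on Python 2 as well, so the comprehension behaves the same on both interpreters.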
16 changes: 11 additions & 5 deletions ectyper/genomeFunctions.py
@@ -1,12 +1,18 @@
'''
Genome Utilities
'''
from __future__ import unicode_literals
from __future__ import print_function
from __future__ import division
from __future__ import absolute_import
#!/usr/bin/env python

from builtins import str
from future import standard_library
standard_library.install_aliases()
import logging
import os
import re
import tempfile
from tarfile import is_tarfile

from Bio import SeqIO
@@ -63,14 +69,14 @@ def get_valid_format(file):
"""
for fm in ['fastq', 'fasta']:
try:
with open(file, "r") as handle:
with open(file, definitions.read_flags) as handle:
data = SeqIO.parse(handle, fm)
if any(data):
if is_tarfile(file):
LOG.warning("Compressed file is not supported: {}".format(file))
return None
return fm
except FileNotFoundError as err:
except IOError as err:
LOG.warning("{0} is not found".format(file))
return None
except UnicodeDecodeError as err:
@@ -113,14 +119,14 @@ def get_genome_names_from_files(files, temp_dir):
n_name = file_path_name.replace(' ', '_')

# create a new file for the updated fasta headers
new_file = tempfile.NamedTemporaryFile(dir=temp_dir, delete=False).name
new_file = definitions.NAMEDTEMPFILE(dir=temp_dir, delete=False).name

# add the new name to the list of files and genomes
list_of_files.append(new_file)
list_of_genomes.append(n_name)

with open(new_file, "w") as outfile:
with open(file) as infile:
with open(file, definitions.read_flags) as infile:
for record in SeqIO.parse(infile, "fasta"):
outfile.write(">lcl|" + n_name + "|" + record.description + "\n")
outfile.write(str(record.seq) + "\n")
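The switch from `FileNotFoundError` to `IOError` in `get_valid_format()` works because Python 2 has no `FileNotFoundError`, while on Python 3 `IOError` is an alias of `OSError` and therefore also catches `FileNotFoundError`. `IOError` is broader than "file not found", though, so a sketch that checks `errno` may be worth considering; `first_line_or_none` is a hypothetical helper, and `mode` stands in for `definitions.read_flags`:

```python
import errno
import os
import tempfile


def first_line_or_none(path, mode="r"):
    """Return the first line of path, or None if the file does not exist."""
    try:
        with open(path, mode) as handle:
            return handle.readline()
    except IOError as err:
        # Only a missing file is swallowed; permission errors and the like
        # are re-raised so they are not reported as "not found".
        if err.errno == errno.ENOENT:
            return None
        raise
```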
6 changes: 6 additions & 0 deletions ectyper/loggingFunctions.py
@@ -2,7 +2,13 @@
"""
Set up the logging
"""
from __future__ import unicode_literals
from __future__ import print_function
from __future__ import division
from __future__ import absolute_import

from future import standard_library
standard_library.install_aliases()
import logging
import os
