Collection of short Python scripts for bioinformatics

Data scraping

Tandem repeat search with TRF

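Search for tandem repeats in all FASTA files of a given folder. Files are selected by a name mask (for example, the extension fa), and TRF runs in the given number of threads: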

python parallel_trf.py input_folder output_folder mask threads

Example:

python parallel_trf.py ~/human_genome/fasta ~/human_genome/trf fa 20
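A minimal sketch of the same idea in Python, for readers who want to adapt it. It assumes that `mask` is a filename suffix, that the `trf` binary is on `PATH`, and that TRF runs with the commonly used parameters `2 7 7 80 10 50 500 -d -h`. These parameters and the helper names (`find_fasta_files`, `trf_command`, `run_trf_parallel`) are illustrative assumptions, not necessarily what `parallel_trf.py` does. TRF writes its output files into the working directory, so each job runs with `cwd` set to the output folder.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Assumed TRF parameters: Match Mismatch Delta PM PI Minscore MaxPeriod,
# plus -d (write a .dat file) and -h (suppress HTML output).
TRF_PARAMS = ["2", "7", "7", "80", "10", "50", "500", "-d", "-h"]


def find_fasta_files(input_folder, mask):
    """Return sorted paths of files in input_folder whose names end with mask."""
    return sorted(
        os.path.join(input_folder, name)
        for name in os.listdir(input_folder)
        if name.endswith(mask) and os.path.isfile(os.path.join(input_folder, name))
    )


def trf_command(fasta_path, trf_binary="trf"):
    """Build the TRF command line for one FASTA file (absolute input path)."""
    return [trf_binary, os.path.abspath(fasta_path)] + TRF_PARAMS


def run_trf_parallel(input_folder, output_folder, mask, threads):
    """Run TRF on every matching file, at most `threads` jobs at a time."""
    os.makedirs(output_folder, exist_ok=True)
    commands = [trf_command(path) for path in find_fasta_files(input_folder, mask)]

    def run(cmd):
        # Run inside output_folder so TRF's output files land there.
        # Return codes are collected, so one failure does not stop the batch.
        return subprocess.run(cmd, cwd=output_folder,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL).returncode

    # Threads suffice: the heavy work happens in the TRF subprocesses.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(run, commands))
```

For example, `run_trf_parallel("fasta", "trf", "fa", 20)` would mirror the command line above.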

Illumina run statistics

Compute and plot the distribution of paired-end (PE) fragment lengths:

python fragments_length_from_sam.py -o image_file -i sam_file
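The sketch below shows one way such a distribution can be computed from SAM text. It is an assumption, not necessarily how `fragments_length_from_sam.py` works. It reads the TLEN column (9th field) and keeps only properly paired reads (FLAG bit 0x2) with a positive TLEN, so each pair is counted once. The optional `max_lines` argument stops after the first lines of the file. The function names are hypothetical.

```python
import statistics
from collections import Counter


def fragment_lengths(sam_lines, max_lines=None):
    """Collect fragment (template) lengths from SAM text lines."""
    lengths = []
    for i, line in enumerate(sam_lines):
        if max_lines is not None and i >= max_lines:
            break
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment line
        flag, tlen = int(fields[1]), int(fields[8])
        # Proper pair and positive TLEN: the leftmost mate of each pair.
        if flag & 0x2 and tlen > 0:
            lengths.append(tlen)
    return lengths


def summarize(lengths):
    """Return basic statistics and a length -> count histogram."""
    return {
        "n": len(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "histogram": Counter(lengths),
    }
```

To draw the histogram, one option is matplotlib, e.g. `plt.bar(list(h.keys()), list(h.values()))` followed by `plt.savefig("stat.png")`.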

Functions related to SAM files

Count unmapped reads:

from PyBioSnippets.sam.sam_functions import count_unmapped

(mapped, unmapped) = count_unmapped(sam_file)
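For reference, here is a minimal standalone sketch of the same count. It is not the library's implementation. It treats a record as unmapped when FLAG bit 0x4 is set and skips secondary (0x100) and supplementary (0x800) records, so each read is counted once. That filtering choice, and the name `count_unmapped_sketch`, are assumptions.

```python
def count_unmapped_sketch(sam_path):
    """Return (mapped, unmapped) counts of primary alignment records."""
    mapped = unmapped = 0
    with open(sam_path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("@"):
                continue  # skip blank and header lines
            flag = int(line.split("\t", 2)[1])
            if flag & (0x100 | 0x800):
                continue  # secondary or supplementary record
            if flag & 0x4:
                unmapped += 1
            else:
                mapped += 1
    return mapped, unmapped
```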

Save unmapped reads from a SAM file to a FASTA file:

from PyBioSnippets.sam.sam_functions import save_unmapped_to_fasta

save_unmapped_to_fasta(sam_file, fasta_file)
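A hedged sketch of what such a conversion can look like. The name `save_unmapped_to_fasta_sketch` is hypothetical and this is not the library's code. It uses QNAME (field 1) as the FASTA header and SEQ (field 10) as the sequence. Suffixing mates with `/1` and `/2` (from FLAG bits 0x40 and 0x80) is an added assumption that keeps paired reads distinguishable.

```python
def save_unmapped_to_fasta_sketch(sam_path, fasta_path):
    """Write unmapped reads (FLAG bit 0x4) to FASTA; return how many."""
    written = 0
    with open(sam_path) as sam, open(fasta_path, "w") as fasta:
        for line in sam:
            if not line.strip() or line.startswith("@"):
                continue  # skip blank and header lines
            fields = line.rstrip("\n").split("\t")
            flag = int(fields[1])
            if flag & 0x4 and fields[9] != "*":
                # Tag mates so both reads of a pair keep distinct names.
                suffix = "/1" if flag & 0x40 else "/2" if flag & 0x80 else ""
                fasta.write(f">{fields[0]}{suffix}\n{fields[9]}\n")
                written += 1
    return written
```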

Compute fragment length statistics using only the first l lines of the SAM file:

python fragments_length_from_sam.py -o stat.png -i data.sam -l 100000

Count FLAG values in a given SAM file:

python hiseq/sam_stats.py -i data.sam
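A small sketch of FLAG counting, with a decoder that names the bits as defined in the SAM specification. It may differ from what `sam_stats.py` actually reports. The function names `count_flags` and `decode_flag` are hypothetical.

```python
from collections import Counter

# Meaning of each FLAG bit, as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def count_flags(sam_lines):
    """Count how often each FLAG value occurs in SAM alignment lines."""
    counts = Counter()
    for line in sam_lines:
        if line.strip() and not line.startswith("@"):
            counts[int(line.split("\t", 2)[1])] += 1
    return counts


def decode_flag(flag):
    """List the names of the bits set in a FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]
```

Usage: `with open("data.sam") as handle:` then loop over `count_flags(handle).most_common()` and print each value with `decode_flag(flag)`.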

FASTQ operations

Join split HiSeq FASTQ files:

python hiseq/join_fastq.py --remove False --input some_folder --mask read_L001_R1
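A minimal sketch of joining split parts, assuming that the mask is a substring of the part filenames and that parts should be joined in sorted name order (`..._001`, `..._002`, ...). The function name and the `output_path` argument are hypothetical. Byte-level concatenation also works for `.gz` parts, because a sequence of gzip members is itself a valid gzip file.

```python
import os
import shutil


def join_fastq_parts(input_folder, mask, output_path, remove=False):
    """Concatenate files whose names contain mask; return the joined parts."""
    parts = sorted(
        os.path.join(input_folder, name)
        for name in os.listdir(input_folder)
        if mask in name
    )
    # Never read the output file as one of its own inputs.
    out_abs = os.path.abspath(output_path)
    parts = [p for p in parts if os.path.abspath(p) != out_abs]
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as handle:
                shutil.copyfileobj(handle, out)
    if remove:
        # Delete the inputs only after the joined file is fully written.
        for part in parts:
            os.remove(part)
    return parts
```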

Fix quality strings that are too long in corrupted HiSeq FASTQ files:

fix_uncorrect_long_quality(fastq_file, corrected_fastq_output)
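One plausible way to perform such a fix is sketched below. It assumes plain 4-line FASTQ records and that a quality string longer than its sequence should be truncated to the sequence length. The repository's `fix_uncorrect_long_quality` may handle corruption differently, and `fix_long_quality_sketch` is a hypothetical name.

```python
def fix_long_quality_sketch(in_path, out_path):
    """Truncate quality strings longer than their sequence; return fix count."""
    fixed = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            header = src.readline()
            if not header:
                break  # end of file
            seq = src.readline().rstrip("\n")
            plus = src.readline().rstrip("\n")
            qual = src.readline().rstrip("\n")
            if len(qual) > len(seq):
                qual = qual[:len(seq)]  # keep one score per base
                fixed += 1
            dst.write(f"{header.rstrip(chr(10))}\n{seq}\n{plus}\n{qual}\n")
    return fixed
```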

Iterator over paired-end files:

for read_obj1, read_obj2 in iter_pe_data(fastq_file1, fastq_file2):
    do_something(read_obj1, read_obj2)  # placeholder for per-pair processing
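A self-contained sketch of a paired-end iterator over two FASTQ streams. It is not the library's `iter_pe_data`. Reads are plain `(name, sequence, quality)` tuples here, whereas the real function yields read objects. Raising an error when one file runs out early is an assumed design choice that avoids silently dropping unmatched reads.

```python
from itertools import zip_longest


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a 4-line FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:].rstrip("\n"), seq, qual


def iter_pe_sketch(handle1, handle2):
    """Yield (read1, read2) pairs; fail if the files differ in length."""
    for read1, read2 in zip_longest(read_fastq(handle1), read_fastq(handle2)):
        if read1 is None or read2 is None:
            raise ValueError("paired-end files contain different numbers of reads")
        yield read1, read2
```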

Convert FASTQ to FASTA:

python hiseq/fastq_to_fasta.py -i data.fastq -o data.fasta
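The conversion itself is simple for plain 4-line FASTQ records. The header's leading `@` becomes `>`, the sequence line is kept, and the `+` and quality lines are dropped. Below is a sketch under that assumption. It does not handle multi-line FASTQ, and `fastq_to_fasta_sketch` is a hypothetical name.

```python
def fastq_to_fasta_sketch(in_path, out_path):
    """Convert 4-line FASTQ records to FASTA; return the record count."""
    count = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for i, line in enumerate(src):
            if i % 4 == 0:
                # '@name ...' header becomes '>name ...'
                dst.write(">" + line[1:])
            elif i % 4 == 1:
                dst.write(line)  # sequence line
                count += 1
            # lines 3 and 4 of each record ('+' and quality) are skipped
    return count
```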

K-mer analysis

Compute k-mer frequency percentages for a coverage plot:

python compute_kmer_coverage.py input_file output_file
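A sketch of one common way to prepare such a plot, assuming the goal is the share of distinct k-mers observed at each frequency (coverage). The input format expected by `compute_kmer_coverage.py` is not documented here, so this version counts k-mers directly from sequences. `count_kmers` and `coverage_percents` are hypothetical names.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count all overlapping k-mers in the given sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def coverage_percents(kmer_counts):
    """Map each k-mer frequency to the percent of distinct k-mers having it."""
    histogram = Counter(kmer_counts.values())  # frequency -> number of k-mers
    total = sum(histogram.values())
    return {freq: 100.0 * n / total for freq, n in sorted(histogram.items())}
```

For `"ACGTACGT"` and `k = 4`, `ACGT` occurs twice and three other 4-mers once, so 75% of distinct k-mers have frequency 1 and 25% have frequency 2.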

PacBio analysis

Convert bax.h5 files into FASTA and FASTQ files (64 parallel processes), then merge the results:

ls | grep bax.h5 | xargs -n 1 --max-procs 64 python baxh5_to_fastq.py

cat *fasta > pacbio.fasta

cat *fastq > pacbio.fastq
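For environments without `xargs`, a rough Python equivalent of the pipeline above is sketched here. It assumes `baxh5_to_fastq.py` is in the current directory and takes one bax.h5 path as its only argument, as in the shell command. The helper names are hypothetical. Unlike `cat *fasta > pacbio.fasta`, `concatenate` excludes the output file from its own inputs, so rerunning it is safe.

```python
import glob
import os
import shutil
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def convert_all(folder=".", workers=64):
    """Run baxh5_to_fastq.py once per *.bax.h5 file, in parallel."""
    files = sorted(glob.glob(os.path.join(folder, "*.bax.h5")))

    def run(path):
        # Same script and argument as in the xargs command.
        return subprocess.run([sys.executable, "baxh5_to_fastq.py", path]).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, files))


def concatenate(pattern, output_path):
    """Concatenate files matching pattern into output_path (like cat)."""
    out_abs = os.path.abspath(output_path)
    parts = sorted(p for p in glob.glob(pattern) if os.path.abspath(p) != out_abs)
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as handle:
                shutil.copyfileobj(handle, out)
    return parts
```

Usage: `convert_all()`, then `concatenate("*fasta", "pacbio.fasta")` and `concatenate("*fastq", "pacbio.fastq")`.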

Chromosome statistics

Get a dictionary that maps chromosome names to their lengths:

chr2length = get_chromosome_lengths(reference_multifasta)
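A minimal sketch of computing such a dictionary from a multi-FASTA file. It assumes the chromosome name is the first whitespace-separated word of each header line, which may differ from the repository's function. `get_chromosome_lengths_sketch` is a hypothetical name.

```python
def get_chromosome_lengths_sketch(fasta_path):
    """Return {sequence name: length} for every record in a multi-FASTA file."""
    lengths = {}
    name = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]  # first word of the header
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)  # sequences may span many lines
    return lengths
```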
