20171203

2017/12/03

Ran the first 1000 lines of AGO2KO_RNA1 in /proj/sllstore2017067/Project_TT_RNA_seq_MMK/trialrun/AGO2KO_RNA_1 as a trial read-mapping run.

sbatch -A g2017020 -t 2:00:00 -p core -n 20 -o AGO2KO_RNA_1_trial.out -e AGO2KO_RNA_1_trial.err sbatch_trial.sh

The SAM file contains alignments to more than just chr1 (which makes sense, as some regions could map ambiguously to sequences other than chr1). We therefore attempted a run with only the chr1 annotation (with a FASTA file that includes chr1 and the scaffold sequences), and we also attempted the mapping with the whole-genome FASTA file and annotation.

HTSeq was run, but the number of hits (and therefore the counts) was very low. We are concerned that we may not have run the program properly; one major open problem is that we are still unclear about the read information in the input files.

Galaxy was used as well; the read counts were much better than those from HTSeq on the Uppmax server (using the single-reads option).

Emailed Rui to ask whether RNA_seq_1 and RNA_seq_2 are read files or replicates. In the meantime, developing a pipeline for generating the mapping SAM files, assuming these files are read files (i.e. paired-end).
Organise the file hierarchy.
Set up an sbatch script for alignment filtering that keeps only the reads mapping to chr1.

Using a MAPQ threshold of 7 (-q 7).
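The filtering step can be sketched in plain awk for checking a small trial SAM file by hand. This is only an illustration of the idea, not the sbatch script itself: `filter_sam` is a hypothetical helper name, and it assumes an uncompressed SAM stream where column 3 is the reference name and column 5 is the MAPQ. On a real BAM file, `samtools view -q 7` applies the same MAPQ cut-off.

```shell
# Hypothetical helper: keep SAM header lines plus alignments that sit on one
# reference sequence and have MAPQ >= a threshold.
# SAM columns used: 3 = RNAME (reference name), 5 = MAPQ.
filter_sam() {
    ref="$1"       # reference name to keep, e.g. chr1
    min_mapq="$2"  # minimum mapping quality, e.g. 7
    awk -F '\t' -v ref="$ref" -v q="$min_mapq" '
        /^@/ { print; next }                   # header lines pass through
        $3 == ref && $5 + 0 >= q + 0 { print } # alignment lines are filtered
    '
}
```

Usage, with placeholder file names: `filter_sam chr1 7 < trial.sam > trial_chr1_q7.sam`.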

Questions:

Are RNA_seq_1 and RNA_seq_2 paired-end read files, or replicates?
If one file contains both ends, how do we extract them into separate files, or is there a way to run the mapping with a single file (containing both ends)?
Quick demonstration of Galaxy (for options and such)
For HTSeq: -s (stranded), -t option for RNAseq and TTseq, -m
For the annotation, would it be more appropriate to use just the chr1 genome? How do you ensure the files you have given us are from chr1 only?
Is there a way to incorporate the quality scores from the FASTQ file into the pipeline?

Sorted the alignments by name using Samtools, as the HTSeq website states that the alignments must be sorted before being submitted to HTSeq. BATCH SCRIPT!!!!!

The sorting was done, and we noticed that some of the alignments share the same name; hence we suspected that the FASTQ files contain reads from both ends.
Double-checked with grep "NAME" *.fastq -> result: 2 entries came up.
Thus, we manipulated the files in an attempt to separate them into two files, each containing the reads from only one end.
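Instead of grepping for one name at a time, the same check can be run over a whole file. A minimal sketch, assuming standard 4-line FASTQ records and that both mates share the same first whitespace-delimited token on the header line; `duplicated_read_names` is a hypothetical helper name.

```shell
# Hypothetical helper: print every read ID that occurs more than once
# among the FASTQ header lines (lines 1, 5, 9, ... of the file).
duplicated_read_names() {
    awk 'NR % 4 == 1 { print $1 }' "$1" | sort | uniq -d
}
```

For example, `duplicated_read_names AGO2KO_RNASeq_1_trial.fastq | wc -l` would give the number of read IDs that appear at least twice.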

Replace new lines with spaces: tr '\n' ' ' <AGO2KO_RNASeq_1_trial.fastq
Add a new line before each @NB: sed 's/@NB/\n&/g'
Sort according to the ID: sort -k 1 -n
Replace spaces with new lines: tr ' ' '\n'
Remove blank lines: sed '/^$/d'

tr '\n' ' ' <AGO2KO_RNASeq_1_trial.fastq | sed 's/@NB/\n&/g' | sort -k 1 -n | tr ' ' '\n' | sed '/^$/d' > AGO2KO_RNASeq_1_trial_oneline.fastq
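One possible way to go from there to two per-end files is sketched below. It is an assumption about how the split could be done, not the command that was actually run: `split_mates` is a hypothetical helper, it assumes 4-line FASTQ records, and it treats the first record seen for a read ID as end 1 and any later record with the same ID as end 2.

```shell
# Hypothetical helper: split a FASTQ file holding both ends of each pair
# into two files. The first record seen for a read ID goes to the first
# output; any later record with the same ID goes to the second output.
split_mates() {
    src="$1"; out1="$2"; out2="$3"
    : > "$out1"; : > "$out2"   # create/empty both outputs up front
    awk -v o1="$out1" -v o2="$out2" '
        NR % 4 == 1 {                      # header line starts a new record
            seen[$1]++
            dest = (seen[$1] == 1) ? o1 : o2
        }
        { print > dest }                   # all 4 lines follow their header
    ' "$src"
}
```

Usage, with placeholder output names: `split_mates AGO2KO_RNASeq_1_trial_oneline.fastq end1.fastq end2.fastq`. The read IDs are held in memory, so this is intended for trial-sized files.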
