20171203
2017/12/03
- Ran the first 1000 lines of AGO2KO_RNA1 in /proj/sllstore2017067/Project_TT_RNA_seq_MMK/trialrun/AGO2KO_RNA_1 as a trial read-mapping run:
sbatch -A g2017020 -t 2:00:00 -p core -n 20 -o AGO2KO_RNA_1_trial.out -e AGO2KO_RNA_1_trial.err sbatch_trial.sh
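A trial subset like this can be cut with head. Every FASTQ record spans exactly four lines, so 1000 lines correspond to 250 reads. The following is a minimal sketch that assumes plain, uncompressed FASTQ. The function name make_trial_fastq is hypothetical.

```shell
# Hypothetical helper: copy the first N lines (default 1000) of a FASTQ file
# for a quick trial run. Each FASTQ record is 4 lines, so N must be a multiple
# of 4 or the last record would be cut in half.
make_trial_fastq() {
    # $1 = input FASTQ, $2 = output FASTQ, $3 = number of lines (default 1000)
    n_lines=${3:-1000}
    if [ $((n_lines % 4)) -ne 0 ]; then
        echo "line count must be a multiple of 4" >&2
        return 1
    fi
    head -n "$n_lines" "$1" > "$2"
}
```

Example: make_trial_fastq AGO2KO_RNA_1.fastq AGO2KO_RNA_1_trial.fastq 1000. The input file name here is only a placeholder.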
The SAM files contain alignments not only to chr1 but also to other sequences. This makes sense, as some regions can map ambiguously to other chromosomes. We therefore attempted a run with only the chr1 annotation (using a FASTA file that includes chr1 and the scaffold sequences). We also attempted the mapping with the whole-genome FASTA file and annotation.
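To see which reference sequences the alignments actually land on, you can tally the RNAME column (column 3) of the SAM body. The following is a minimal awk sketch. The function name is hypothetical, and records with RNAME * are unmapped.

```shell
# Hypothetical helper: count SAM alignment records per reference sequence.
# Header lines start with "@" and are skipped; output is "RNAME<TAB>count",
# most frequent reference first.
count_alignments_per_reference() {
    # $1 = SAM file
    awk -F'\t' '!/^@/ { n[$3]++ } END { for (r in n) print r "\t" n[r] }' "$1" \
        | sort -t "$(printf '\t')" -k2,2nr
}
```

A result containing anything other than chr1 (besides *) would confirm that the input is not restricted to chr1.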
HTSeq was run, but the number of hits (and hence the counts) was very low. We are therefore unsure whether we ran the program properly. One major open problem is that we still do not know what read information the input files contain.
We also used Galaxy (with the single-end reads option), and the read counts were much better than with HTSeq on the UPPMAX server.
- Emailed Rui to ask whether RNA_seq_1 and RNA_seq_2 are read files (i.e. paired-end mates) or replicates. In the meantime, we are developing a pipeline to generate the mapping SAM files, assuming these files are read files (i.e. paired-end).
- Organised the file hierarchy.
- Set up an sbatch script to filter the alignments, keeping only reads mapped to chr1
  - using a MAPQ threshold of 7 (-q 7)
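The filter above can be expressed directly on a SAM file. The following is a minimal awk sketch that mirrors samtools view -q 7 by keeping records with MAPQ (column 5) of at least 7, and restricts the output to RNAME chr1. The function name is hypothetical. For coordinate-sorted, indexed BAM files, the samtools form would be samtools view -b -q 7 input.bam chr1 > output.bam.

```shell
# Hypothetical helper: keep SAM header lines plus alignments on chr1 with
# MAPQ >= the given threshold (default 7), like samtools view -q 7 ... chr1.
filter_chr1_mapq() {
    # $1 = input SAM, $2 = minimum MAPQ (default 7)
    awk -F'\t' -v minq="${2:-7}" '/^@/ || ($3 == "chr1" && $5 >= minq)' "$1"
}
```

Example: filter_chr1_mapq trial.sam > trial_chr1.sam. The file names are placeholders.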
Questions:
- Are RNA_seq_1 and RNA_seq_2 paired-end reads, or replicates?
- If one file contains both ends, how do we split them into separate files? Or is there a way to run the pipeline on a single file containing both ends?
- Could we have a quick demonstration of Galaxy (options and such)?
- For HTSeq: which settings should we use for -s (stranded), the -t option for RNA-seq and TT-seq, and -m?
- For annotation, would it be more appropriate to use only the chr1 genome? How can we be sure the files you gave us contain chr1 only?
- Is there a way to incorporate the base qualities from the FASTQ files into the pipeline?
Sorted the alignments by read name using Samtools, since the HTSeq website states that the alignments must be sorted before being passed to HTSeq. TODO: put this in a BATCH SCRIPT!
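Samtools does the name sort itself, for example with samtools sort -n -o out.bam in.bam. For a quick look at a small SAM file, the plain-text sketch below groups records by read name (QNAME, column 1) and lists names that occur more than once. It uses lexicographic order, which may differ from samtools' own name ordering but still places identical names next to each other. Both function names are hypothetical.

```shell
# Hypothetical helper: header lines first, then body records sorted by QNAME,
# so records sharing a read name (e.g. the two mates of a pair) are adjacent.
name_sort_sam() {
    # $1 = input SAM
    grep '^@' "$1"
    grep -v '^@' "$1" | sort -t "$(printf '\t')" -k1,1
}

# Hypothetical helper: print "count<TAB>QNAME" for every read name with more
# than one record. Note that secondary/supplementary alignments also repeat names.
reads_seen_more_than_once() {
    # $1 = SAM file
    grep -v '^@' "$1" | cut -f1 | sort | uniq -c | awk '$1 > 1 { print $1 "\t" $2 }'
}
```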
- After sorting, we noticed that some alignments share the same read name, so we suspected that the FASTQ files contain both ends of each read.
- Double-checked with grep "NAME" *.fastq (NAME being one of the shared read names). Result: 2 entries came up.
- We therefore manipulated the files in an attempt to split them into two separate files, each containing only one end of the reads:
- Replace newlines with spaces: tr '\n' ' ' <AGO2KO_RNASeq_1_trial.fastq
- Add a newline before each @NB: sed 's/@NB/\n&/g'
- Sort according to the ID: sort -k 1 -n
- Replace spaces with newlines: tr ' ' '\n'
- Remove blank lines: sed '/^$/d'
tr '\n' ' ' <AGO2KO_RNASeq_1_trial.fastq | sed 's/@NB/\n&/g' | sort -k 1 -n | tr ' ' '\n' | sed '/^$/d' > AGO2KO_RNASeq_1_trial_oneline.fastq
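Two caveats apply to the command above. First, sort -n treats every key starting with @NB as the number 0, so GNU sort falls back to comparing whole lines, and that fallback is what actually produces the order. Second, if the headers carry a comment after a space (Illumina style, e.g. @NB... 1:N:0:...), the final tr ' ' '\n' also splits each header across two lines.

A more conservative sketch follows. It assumes Illumina 1.8+ style headers in which the comment begins with 1: or 2: to mark the mate, and exactly four lines per record. It linearises each record with paste, sorts in the C locale so that mates of the same read stay in the same order in both outputs, and writes each mate to its own file. The function name is hypothetical.

```shell
# Hypothetical helper: split a FASTQ file holding both mates into R1/R2 files.
# Assumes headers like "@<read-id> 1:N:0:<index>" / "@<read-id> 2:N:0:<index>".
split_interleaved_fastq() {
    # $1 = FASTQ with both mates, $2 = output for read 1, $3 = output for read 2
    : > "$2"; : > "$3"                      # create/truncate both outputs
    paste - - - - < "$1" \
        | LC_ALL=C sort -t "$(printf '\t')" -k1,1 \
        | awk -F'\t' -v r1="$2" -v r2="$3" '
            {
                split($1, h, " ")           # h[1] = read ID, h[2] = comment
                if (h[2] ~ /^1:/)      out = r1
                else if (h[2] ~ /^2:/) out = r2
                else { print "unrecognised header: " $1 > "/dev/stderr"; next }
                # restore the 4-line FASTQ record
                printf "%s\n%s\n%s\n%s\n", $1, $2, $3, $4 > out
            }'
}
```

Example: split_interleaved_fastq AGO2KO_RNASeq_1_trial.fastq AGO2KO_RNASeq_1_trial_R1.fastq AGO2KO_RNASeq_1_trial_R2.fastq. The two output names are placeholders.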