DNA Sequencing pipeline for 454 GS/FLX runs

By Aung 🥷🏻

Posted 2013-10-23

This is the pipeline I use to process 454 sequencing data. I am not a biotech person, but after some time and effort, and with help from a colleague, I worked out a way to do it. The tools I used are the NCBI SRA Toolkit (sff-dump) and the 454 SFF tools (sfffile and sffinfo) for handling multiplexed, MID-tagged reads.
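
Since every step below depends on these command-line tools, it can help to check that they are on the PATH before starting. The following is a minimal sketch, not part of the original pipeline; the function name `require_tools` is hypothetical, and the tool names in the usage comment are simply the ones used in this post.

```shell
#!/bin/sh
# require_tools (hypothetical helper): report any named command that is
# not found on PATH. Returns 0 when all are present, 1 otherwise.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, with the tools named in this post:
#   require_tools sff-dump sfffile sffinfo || exit 1
```

Printing every missing tool, rather than stopping at the first one, means a single run shows everything that still has to be installed.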

Process

Downloaded SRA files from …
Converted these SRA files into SFF format using the sff-dump tool from the SRA Toolkit

    	$ sff-dump -A xxxx.sra

Rebuilt the scores of the converted SFF dataset with the sfffile tool

    	$ sfffile -o xxxxxxn.sff xxxxx.sff

Split the file by MID group name (use either GSMIDs or RLMIDs, depending on which MID set the run used)

    	$ sfffile -s GSMIDs xxxxx.sff
    	$ # or: sfffile -s RLMIDs xxxxx.sff

Calculated the total MID matches for each group

- Extract the sequences:

		 $ sffinfo -s 454Readsxx.sff > MIDx.fasta

- Count the total number of sequences (one `>` header line per read):

		 $ grep -c '^>' MIDx.fasta
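
When the split produces several MID files, counting them one at a time gets tedious. The following is a minimal sketch that counts the reads in several FASTA files at once; the function name `count_reads` is hypothetical, and it assumes each FASTA file was already extracted with `sffinfo -s` as shown above.

```shell
#!/bin/sh
# count_reads (hypothetical helper): for each FASTA file given, print
# "<file><TAB><number of records>", where a record is a line starting with '>'.
count_reads() {
    for fasta in "$@"; do
        # grep -c prints the number of matching lines; it exits non-zero
        # when the count is 0, so "|| true" keeps "set -e" scripts running.
        n=$(grep -c '^>' "$fasta" || true)
        printf '%s\t%s\n' "$fasta" "$n"
    done
}

# Usage:
#   count_reads MID1.fasta MID2.fasta MID3.fasta
```

The tab-separated output can be pasted into a spreadsheet or piped to `sort -k2,2n` to rank the MID groups by read count.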

Combined the SFF files into one main file

	$ sfffile -o combined.sff xxx1.sff xxx2.sff xxx3.sff ...

Other useful Commands

Get the quality scores from the sff file:

	$ sffinfo -q 454Readsxx.sff > MIDx.qual
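
A quick sanity check after extracting both files is to confirm that the FASTA and quality files describe the same number of reads. The following is a minimal sketch; the function name `same_record_count` is hypothetical, and it assumes the `.qual` file uses the same `>` header line per read as the FASTA file.

```shell
#!/bin/sh
# same_record_count (hypothetical helper): succeed only if both files
# contain the same number of '>' header lines.
same_record_count() {
    a=$(grep -c '^>' "$1" || true)
    b=$(grep -c '^>' "$2" || true)
    [ "$a" -eq "$b" ]
}

# Usage:
#   same_record_count MIDx.fasta MIDx.qual || echo "read counts differ" >&2
```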

Retrieve the flow intensities:

	$ sffinfo -f 454Readsxx.sff > MIDx.flow

View a file (with either more or less):

	$ less MIDx.fasta
