DNA Sequencing pipeline for 454 GS/FLX runs

This is the pipeline I use to process 454 sequencing runs. I am not a biotech guy, but after spending some time and effort on it, and with help from my colleague, I came up with a way to do it. The tools I used are the NCBI SRA Toolkit and the 454 sequencing tools for working with multiplexed (MID-tagged) reads.


  • Downloaded SRA files from …

  • Converted these SRA files into SFF format using the sff-dump tool
    	$ sff-dump -A xxxx.sra
  • Rebuilt the scores of the converted SFF dataset with the sfffile tool
    	$ sfffile -o xxxxxxn.sff xxxxx.sff
  • Split the file according to MID group name (either GSMIDs or RLMIDs)
    	$ sfffile -s GSMIDs xxxxx.sff
  • Calculated the total MID matches for each group
    • Extract sequence:
		 $ sffinfo -s 454Readsxx.sff > MIDx.fasta
    • Count the total number of sequences:
		 $ egrep -e '^>'  MIDx.fasta | wc -l
  • Combined the SFF files into one main file
	$ sfffile -o combined.sff xxx1.sff xxx2.sff xxx3.sff ...
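
When there are many MID groups, counting the sequences one file at a time gets tedious. Below is a minimal sketch that repeats the counting step over every extracted FASTA file. The function names (count_fasta_reads, summarize_mids) and the MID*.fasta naming pattern are my own assumptions for illustration, not part of the 454 tools.

```shell
#!/bin/sh
# Count the FASTA records (header lines starting with '>') in one file.
count_fasta_reads() {
    grep -c '^>' "$1"
}

# Print one "file<TAB>count" line for every FASTA file given as an argument.
# Both function names are hypothetical helpers, not part of any toolkit.
summarize_mids() {
    for fasta in "$@"; do
        # Skip anything that is not a regular file, e.g. an unmatched glob.
        [ -f "$fasta" ] || continue
        printf '%s\t%s\n' "$fasta" "$(count_fasta_reads "$fasta")"
    done
}

# Assumes the extracted files follow the MIDx.fasta naming used above.
summarize_mids MID*.fasta
```

Run it in the directory that holds the MIDx.fasta files; it prints nothing if no such files are present.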

Other useful commands

  • Get the quality scores from the sff file:
	$ sffinfo -q 454Readsxx.sff > MIDx.qual
  • Retrieve the flow intensities:
	$ sffinfo -f 454Readsxx.sff > MIDx.flow
  • View a file (with either more or less):
	$ less MIDx.fasta
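
After extracting both the sequences and the quality scores, a quick sanity check is to confirm that the two files hold the same number of records. This is only a sketch: it assumes the .qual output uses '>' header lines like the FASTA output, and same_record_count is a hypothetical helper name of my own.

```shell
#!/bin/sh
# Succeed when a FASTA file and a QUAL file contain the same number of records.
# Assumption: both files mark each record with a header line starting with '>'.
same_record_count() {
    # grep -c exits non-zero when it counts 0 matches, so ignore its status.
    n_fasta=$(grep -c '^>' "$1" || true)
    n_qual=$(grep -c '^>' "$2" || true)
    [ "$n_fasta" -eq "$n_qual" ]
}

# Only run the check when both example files from the steps above exist.
if [ -f MIDx.fasta ] && [ -f MIDx.qual ]; then
    if same_record_count MIDx.fasta MIDx.qual; then
        echo "MIDx: record counts match"
    else
        echo "MIDx: record counts differ" >&2
    fi
fi
```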
This post is licensed under CC BY 4.0 by the author.
