This is my pipeline for processing 454 sequencing data. I am not a biotech guy, but after spending some time on it and getting help from my colleague, I came up with a way to do it. The tools I used are the NCBI SRA Toolkit and the 454 sequencing tools for multiplexing (sfffile and sffinfo).
Process
- Downloaded SRA files from …
- Converted these SRA files into SFF format using the sff-dump tool
$ sff-dump -A xxxx.sra
- Rebuilt the scores of the converted SFF dataset with the sfffile tool
$ sfffile -o xxxxxxn.sff xxxxx.sff
- Split the file according to the MID group name (GSMIDs here; use RLMIDs instead if that is the MID set your data uses)
$ sfffile -s GSMIDs xxxxx.sff
- Calculated the total MID matches for each group
- Extract sequence:
$ sffinfo -s 454Readsxx.sff > MIDx.fasta
- Count the total number of sequences:
$ egrep -e '^>' MIDx.fasta | wc -l
- Combine sff files into one main file
$ sfffile -o combined.sff xxx1.sff xxx2.sff xxx3.sff ...
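When there are many MID groups, running the count step by hand for each FASTA file gets tedious. Here is a minimal sketch of a shell helper that prints the read count for every file given to it. The function name count_reads is my own (hypothetical), and it assumes each read starts with exactly one '>' header line, as in the sffinfo -s output above.

```shell
# count_reads FILE...
# Hypothetical helper: prints "<file><TAB><number of reads>" for each FASTA file.
count_reads() {
    for f in "$@"; do
        # grep -c counts the lines that start with '>', one per read
        printf '%s\t%s\n' "$f" "$(grep -c '^>' "$f")"
    done
}
```

Something like `count_reads MID*.fasta` should then give one line per MID group, which can be compared against the total in the unsplit file.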
Other useful commands
- Get the quality scores from the sff file:
$ sffinfo -q 454Readsxx.sff > MIDx.qual
- Retrieve the flow intensities:
$ sffinfo -f 454Readsxx.sff > MIDx.flow
- View a file (more works the same way):
$ less MIDx.fasta
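The .qual file is easier to judge with a summary than by paging through it. Below is a rough sketch that prints the mean quality score per read. The function name mean_qual is hypothetical, and it assumes the usual .qual layout: a '>' header line per read followed by one or more lines of space-separated integer scores.

```shell
# mean_qual FILE
# Hypothetical helper: prints "<read name><TAB><mean quality>" for each read in a .qual file.
mean_qual() {
    awk '
        # New read: print the previous read (if any) and reset the counters
        /^>/ { if (n) printf "%s\t%.2f\n", name, sum / n
               name = substr($1, 2); sum = 0; n = 0; next }
        # Score line: add every score to the running total
        { for (i = 1; i <= NF; i++) { sum += $i; n++ } }
        # Print the last read
        END { if (n) printf "%s\t%.2f\n", name, sum / n }
    ' "$1"
}
```

For example, `mean_qual MIDx.qual | sort -k2,2n | head` would list the reads with the lowest mean quality first.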