Pipeline to trim adaptor sequences from Illumina HiSeq paired-end reads

By Aung 🥷🏻

Posted 2013-10-29 1 min read

This week, I have to analyze and trim the adaptor sequences from 5 Illumina HiSeq 2000 datasets (Penaeus monodon and Litopenaeus vannamei species). If you want to know about the research work I am helping with, please go to this page: CentexShrimp.

OK, back to sequence trimming.

Detailed information about the sequencing runs can be viewed on the NCBI website.

This is the list of the tools I used.

SRA Toolkit: contains several tools for converting datasets from one format to another. For this analysis, I converted the SRA datasets to FASTQ format with the fastq-dump tool for further processing. For a paired-end run, fastq-dump can split the dataset into two files (plus a third file for any reads left without a mate) with the following command.

fastq-dump --split-3

FastQC: a useful tool for evaluating the quality of sequence data. It produces a report with various information about the dataset, which makes it a great way to see what is going on with the sequence data both before and after trimming.
Trimmomatic: This is the main tool my supervisor chose for me to work with on this task. It’s a combo tool that performs many trimming tasks on Illumina runs.
FLASH: After the adaptor sequences have been trimmed, FLASH provides a way to merge the paired-end reads.
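
To make the order of the tools concrete, here is a rough dry-run sketch of how they could be chained together. It is not the exact set of commands I ran: the accession SRR0000000 is a placeholder, the trimmomatic.jar path, the TruSeq3-PE.fa adaptor file and the ILLUMINACLIP settings 2:30:10 are assumptions borrowed from the usual Trimmomatic paired-end example, and the output file names are made up for illustration. By default the script only prints each command; set RUN=1 to execute them, which requires all four tools to be installed.

```shell
#!/bin/sh
# Dry-run sketch of the trimming pipeline (assumed settings, see notes above).
set -eu

ACC="${ACC:-SRR0000000}"                              # placeholder accession (hypothetical)
TRIMMOMATIC_JAR="${TRIMMOMATIC_JAR:-trimmomatic.jar}" # assumed location of the Trimmomatic jar
ADAPTERS="${ADAPTERS:-TruSeq3-PE.fa}"                 # assumed adaptor FASTA file

# Print a command; execute it only when RUN=1.
step() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

pipeline() {
  # 1. Convert SRA to FASTQ, split into ${ACC}_1.fastq and ${ACC}_2.fastq
  step fastq-dump --split-3 "$ACC"

  # 2. Quality report before trimming
  step fastqc "${ACC}_1.fastq" "${ACC}_2.fastq"

  # 3. Adaptor trimming in paired-end mode:
  #    P = reads whose mate survived, U = reads whose mate was dropped
  step java -jar "$TRIMMOMATIC_JAR" PE -phred33 \
    "${ACC}_1.fastq" "${ACC}_2.fastq" \
    "${ACC}_1P.fastq" "${ACC}_1U.fastq" \
    "${ACC}_2P.fastq" "${ACC}_2U.fastq" \
    "ILLUMINACLIP:${ADAPTERS}:2:30:10"

  # 4. Merge the surviving pairs; FLASH writes <prefix>.extendedFrags.fastq
  step flash "${ACC}_1P.fastq" "${ACC}_2P.fastq" -o "${ACC}_merged"

  # 5. Quality report on the merged reads
  step fastqc "${ACC}_merged.extendedFrags.fastq"
}

pipeline
```

Running it as is just lists the five commands, each prefixed with "+", so the file names can be checked before anything touches the data. Something like ACC=<your accession> RUN=1 sh pipeline.sh would run the real thing.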

As the last step, we can use FastQC again to evaluate the merged output file. The following image shows the base quality of one dataset after the process.

[Image: base quality of one dataset after the process]
