This week I have to analyze and trim the adaptor sequences from five Illumina HiSeq 2000 runs (Penaeus monodon and Litopenaeus vannamei). If you want to know more about the research work I am helping with, please go to this page: CentexShrimp.
OK, back to sequence trimming.
Detailed information about the sequencing runs can be viewed on the NCBI website.
Here is the list of tools I used:
- **SRA Toolkit**: contains several tools for converting datasets from one format to another. For this analysis, I converted the SRA datasets to FASTQ format with the fastq-dump tool for further processing. For a paired-end run, fastq-dump can split the dataset into two files.
- **FastQC**: a useful tool for evaluating the quality of sequence data; it produces a report with various information about the dataset. It is a great way to see what is going on with the data both before and after trimming.
- **Trimmomatic**: the main tool my supervisor chose for me to use for this task. It is a combo tool that performs many trimming tasks on Illumina runs.
- **FLASH**: after the adaptor sequences have been trimmed, FLASH provides a way to merge the paired-end reads.
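To make the order of the steps concrete, here is a rough sketch of how the four tools might be chained together, including a final FastQC pass on the merged file. It is a dry run in Python that only builds and prints the command lines. The accession `SRRXXXXXXX` is a placeholder, and the jar path, the adaptor file `TruSeq3-PE.fa`, the output file names, and the Trimmomatic step values (borrowed from the example in the Trimmomatic manual) are all assumptions for illustration, not necessarily the settings used for these runs.

```python
import shlex

ACCESSION = "SRRXXXXXXX"             # placeholder, not a real accession
TRIMMOMATIC_JAR = "trimmomatic.jar"  # hypothetical path to the Trimmomatic jar
ADAPTERS = "TruSeq3-PE.fa"           # assumed adaptor file; use the one matching the library prep


def fastq_dump_cmd(accession, outdir="."):
    # --split-files writes one FASTQ per mate: <accession>_1.fastq and <accession>_2.fastq
    return ["fastq-dump", "--split-files", "--outdir", outdir, accession]


def fastqc_cmd(files, outdir="."):
    # FastQC writes one report per input file; the output directory must already exist
    return ["fastqc", "--outdir", outdir, *files]


def trimmomatic_pe_cmd(r1, r2, p1, u1, p2, u2, adapters=ADAPTERS):
    # Paired-end mode: two inputs, then paired/unpaired outputs for each mate.
    # The step values below are the manual's example values (an assumption here).
    return [
        "java", "-jar", TRIMMOMATIC_JAR, "PE", "-phred33",
        r1, r2, p1, u1, p2, u2,
        f"ILLUMINACLIP:{adapters}:2:30:10",
        "LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:36",
    ]


def flash_cmd(p1, p2, prefix, outdir="."):
    # FLASH writes the merged reads to <prefix>.extendedFrags.fastq
    return ["flash", "-o", prefix, "-d", outdir, p1, p2]


def pipeline(accession):
    r1, r2 = f"{accession}_1.fastq", f"{accession}_2.fastq"
    p1, u1 = f"{accession}_1P.fastq", f"{accession}_1U.fastq"
    p2, u2 = f"{accession}_2P.fastq", f"{accession}_2U.fastq"
    return [
        fastq_dump_cmd(accession),                     # SRA -> FASTQ, one file per mate
        fastqc_cmd([r1, r2]),                          # quality report before trimming
        trimmomatic_pe_cmd(r1, r2, p1, u1, p2, u2),    # adaptor and quality trimming
        flash_cmd(p1, p2, prefix=accession),           # merge the surviving pairs
        fastqc_cmd([f"{accession}.extendedFrags.fastq"]),  # quality report after merging
    ]


if __name__ == "__main__":
    # Dry run: print each command instead of executing it
    for cmd in pipeline(ACCESSION):
        print(shlex.join(cmd))
```

Keeping each command as a list of arguments means the same lists could be handed to `subprocess.run(cmd, check=True)` once the tools are installed and the file names match your own data.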
As the last step, we can run FastQC again to evaluate the merged output file. The following image shows the base quality of one dataset after processing.