Title: Next-generation transcriptome assembly for Euterpe oleracea
Abstract:
Euterpe oleracea, a palm of the family Arecaceae, is used mainly for its fruit. Its natural habitat is the Amazon rainforest. Here we used high-throughput RNA sequencing (RNA-seq) to obtain a reference transcriptome for this species, establishing a genomic resource needed for future genetic studies. Leaves of one adult individual of E. oleracea were collected in the Amazon rainforest (Brazil), immediately frozen in liquid nitrogen, and lyophilized. Total RNA was isolated and converted into cDNA, which was sequenced on an Illumina NextSeq platform. A total of 102,576,656 raw reads (paired-end, 151 bp) were quality-filtered with Trimmomatic and assembled into 193,487 transcripts with Trinity. The E. oleracea de novo transcriptome assembly contains 153,805 unigenes represented by 117 Mbp, with a median (mean) contig length of 334 bp (608 bp) and a GC content of 44.9%. The unigenes were annotated for their putative functions based on the Arabidopsis thaliana transcriptome database. A total of 14,207 annotated unigenes were categorized into 30 functional groups under Gene Ontology terms. In the biological process category, cellular processes (39.09%) and metabolic processes (35.92%) were the predominant groups. In the cellular component category, the predominant groups were cell parts (51.98%) and organelles (32.6%). In the molecular function category, the main groups were catalytic activity (36.33%) and binding (35.48%). This transcriptome is a resource for the development of new molecular tools for conservation genetics and evolutionary studies.
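The summary statistics reported for the assembly (number of contigs, total length, median and mean contig length, GC content) can be recomputed from any FASTA file of assembled transcripts. The following is a minimal sketch in Python, not the pipeline used in this study: the function names `read_fasta` and `assembly_stats` and the toy sequences are hypothetical and serve only to illustrate how such figures are derived. A mean well above the median, as reported here (608 bp versus 334 bp), indicates a length distribution skewed toward many short contigs.

```python
from statistics import mean, median
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # sequences may be wrapped over several lines
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def assembly_stats(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Summarize contig count, total length, mean/median length and GC%.

    Assumes at least one non-empty sequence; empty input raises an error.
    """
    seqs: List[str] = [seq for _, seq in records]
    lengths = [len(s) for s in seqs]
    total = sum(lengths)
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    return {
        "n_contigs": len(lengths),
        "total_bp": total,
        "mean_len": mean(lengths),
        "median_len": median(lengths),
        "gc_percent": 100.0 * gc / total,
    }


# Toy input (hypothetical sequences, not data from this study).
toy_fasta = """>contig1
ATGCGC
>contig2
ATAT
GG
>contig3
GGCCAATT
""".splitlines()

stats = assembly_stats(read_fasta(toy_fasta))
print(stats)
```

For a real assembly, the same functions can be applied to an open file handle, for example `assembly_stats(read_fasta(open(path)))`, where `path` points to the FASTA file of assembled transcripts.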