Development of Computational Tools and Resources for Cotton microRNA Analysis
MicroRNAs (miRNAs) are an extensive class of small regulatory RNAs that regulate gene expression at the posttranscriptional level. miRNAs direct mRNA cleavage or translational inhibition of their target genes based on sequence complementarity between each miRNA and its target mRNA. Because these target genes control developmental timing, organ development, and responses to environmental stress, miRNAs have been shown to play important roles in almost all biological and metabolic processes. Upland cotton (Gossypium hirsutum L.), one of the most important fiber-producing crops, is planted widely around the world. Upland cotton originated from the merger of two ancestral cotton genomes (the A and D genomes) approximately 1-2 Myr ago and has a complex allotetraploid genome (AADD, 2n=4x=52) with an estimated haploid size of about 2.5 Gb. To date, about 80 miRNAs have been identified in cotton by computational prediction or small RNA sequencing, many of which were also shown to be differentially expressed during fiber development. However, although miRNA research has been one of the most active areas of biology over the past decade and thousands of miRNAs have been identified, miRNA research in cotton lags far behind that in other plant species. One major reason is the limited availability of computational tools and resources for cotton. In this dissertation project, we first developed a comprehensive computational tool named miRDeepFinder, which can be used for miRNA identification, target prediction, and GO-/KEGG-based functional analysis in both model and non-model plant species. A case study with an Arabidopsis small RNA sequencing data set showed that miRDeepFinder is an accurate and robust tool for plant miRNA analysis from deep sequencing data: 12 of the 13 novel Arabidopsis miRNAs it identified were confirmed by qRT-PCR.
miRDeepFinder also incorporates the widely used CleaveLand software package for analysis of degradome sequencing data. Although the cotton genome sequence is not yet available, the large collection of cotton ESTs is a valuable resource for identifying cotton miRNAs and their targets. To make better use of cotton ESTs for miRNA identification, we re-assembled all available cotton ESTs and built a cotton EST database from the assembly, in which cotton coding genes and miRNAs were extensively annotated using BLASTx, BLASTn, Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) resources. A total of 28,432 unique contigs were assembled from the 268,786 cotton ESTs currently available; these contigs fall into 5,461 groups, with a maximum cluster size of 196 members. Using these contigs, we also performed EST-based analyses of transcriptome similarity between cotton and other plant species, sequence polymorphisms, expressed miRNAs and their targets, and simple sequence repeats (SSRs). A total of 27,956 indels and 149,616 single nucleotide polymorphisms (SNPs) were identified from the consensus contigs. In a comparison with six model plant species, cotton ESTs showed the highest overall similarity to grape. We also identified 151 and 4,214 EST-SSRs from contigs and raw ESTs, respectively. Finally, all results were integrated into a comprehensive web-based cotton EST database (www.leonxie.com) to make these data widely available and to facilitate access to EST-related genetic information. Subsequently, three cotton small RNA libraries (control, drought treatment, and salinity treatment) were sequenced. Using miRDeepFinder, the annotated cotton EST database, and the D genome of Gossypium raimondii, we identified a total of 337 miRNAs with precursors, including 289 known and 48 novel miRNAs. Of these 337 miRNAs, 155 were differentially expressed among the three treatments.
Target prediction, GO-based functional classification, and KEGG-based functional enrichment indicated that many miRNAs and their stress-related targets might play roles in the response to salinity and drought stress. Using CitationRank-based literature mining, we ranked the importance of genes related to drought stress and to salinity stress. The NAC, MYB, and MAPK families ranked highest in the context of both drought and salinity, indicating their important roles in plant responses to these stresses. To identify miRNAs and mRNAs that potentially contribute significantly to cotton fiber development, we constructed and sequenced two libraries from 1-DPA (days post anthesis) leaf and ovule. A total of 128 pre-miRNAs, including 120 conserved and 8 novel pre-miRNAs, were identified in cotton by miRDeepFinder. At least 40 miRNAs were specific to either leaf or ovule, whereas 62 miRNAs were shared between the two tissues. Many transcription factors and other genes important for fiber development were predicted to be miRNA targets, and 22 predicted miRNA-target pairs were validated by degradome sequencing analysis. In addition to miRNAs, we identified 11 potential tasiRNA-derived genes, many of which might also be involved in fiber development. miRNAs from the cotton A and D genomes, which merged ~1-2 Myr ago, might have followed an evolutionary pattern similar to that of coding genes. However, little is known about miRNA origin, expansion, loss, and duplication, whether miRNAs derived from different genomes exchange with or affect each other, or how miRNAs and coding genes derived from different genomes interact in cotton. To address these questions, we systematically investigated miRNA expansion, expression patterns, and miRNA targets among three cotton species: Gossypium hirsutum (AADD), Gossypium arboreum (AA), and Gossypium raimondii (DD). The origins of miRNAs and coding genes in upland cotton were categorized for the first time.
Our results also showed that cotton-specific miRNAs might have undergone remarkable expansion and that some highly conserved miRNAs were likely lost, although most conserved miRNAs were retained after genome polyploidization. A comparison of miRNA expression in seedling and in fiber at five developmental stages revealed that miRNAs and miRNA*s derived from different genomes displayed asymmetric expression patterns, suggesting diverse functions in the upland cotton phenotype. Using all the miRNAs identified in upland cotton above, we also globally investigated miRNA modification features in cotton. In addition to modification features similar to those of other plant species, we found many interesting modification forms, such as a balance of modification between the 5' and 3' ends of miRNAs. Comparison of isomiR expression showed differential miRNA modification among the six developmental stages in terms of selective modification form, development-dependent modification, and differential expression abundance. In contrast to previous reports, cytidine was more frequently truncated and tailed at the two ends of isomiRs in cotton, implying the existence of a complex cytidine balance in isomiRs. Together, we developed a comprehensive computational tool and data resource for cotton miRNA research and used them to investigate the roles of miRNAs in cotton fiber development and the response to abiotic stress. Cotton miRNA evolution and modification were also studied. Our tools, data resources, and research findings should therefore help to decipher miRNA regulatory function and evolution in cotton.
Xie, Fuliang. (January 2014). Development of Computational Tools and Resources for Cotton microRNA Analysis (Doctoral Dissertation, East Carolina University). Retrieved from the Scholarship. (http://hdl.handle.net/10342/4714.)
East Carolina University