Genomic Database Conundrum: Widespread Misannotation of rRNA Sequences as Protein Sequences

dc.access.option: Open Access
dc.contributor.advisor: Stiller, John
dc.contributor.author: Raymond, Miranda
dc.contributor.department: Biology
dc.date.accessioned: 2018-03-13T16:25:19Z
dc.date.available: 2018-03-13T16:25:19Z
dc.date.created: 2017-12
dc.date.issued: 2017-12-11
dc.date.submitted: December 2017
dc.date.updated: 2018-03-12T13:17:58Z
dc.degree.department: Biology
dc.degree.discipline: Biochemistry
dc.degree.grantor: East Carolina University
dc.degree.level: Undergraduate
dc.degree.name: BS
dc.description.abstract: The genomics revolution introduced affordable technology capable of rapidly analyzing and comparing massive amounts of biological sequence data. A highly expressed gene sequence from the plant Leptosiphon jepsonii was analyzed with the Basic Local Alignment Search Tool (BLAST) on the National Center for Biotechnology Information (NCBI) website. This sequence was compared for similarity against sequences archived in the NCBI database, spanning diverse lineages that included other green plants, fungi, metazoans, algae and single-celled organisms. The original query sequence was first compared to inferred protein sequences. The mRNA sequences corresponding to these proteins were then searched against complete nucleotide accessions in reciprocal BLAST searches to verify the results. The best matches from these reciprocal searches were rRNA rather than mRNA sequences. This result indicates that numerous NCBI accessions are incorrectly annotated as mRNAs and proteins rather than as ribosomal sequences. To gauge the extent of this misannotation, sequences from a wide range of organisms, including those with model genomes, were also examined. This study indicates that rapid, automated computational analyses of massive amounts of sequence data, combined with a heightened focus on novel findings, have led to a sizable influx of erroneous data into even the most reputable databases.
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10342/6562
dc.publisher: East Carolina University
dc.subject: NCBI, BLAST, misannotation, rRNA, proteins
dc.title: Genomic Database Conundrum: Widespread Misannotation of rRNA Sequences as Protein Sequences
dc.type: Honors Thesis
dc.type.material: text
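
The reciprocal-BLAST check described in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual method. It assumes the reciprocal nucleotide searches were saved as tab-separated BLAST+ output written with `-outfmt "6 qseqid sseqid bitscore stitle"`. The function names (`best_hits`, `looks_like_rrna`, `flag_suspect_rrna`) and the keyword rule for recognizing rRNA titles are hypothetical. The rule is a crude heuristic and will misclassify some records.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical, crude keyword heuristic: a hit title "looks like rRNA" if it
# mentions "ribosomal RNA" or "rRNA" but not "protein" (to skip titles such
# as "28S ribosomal protein S18" or "rRNA-processing protein").
RRNA_PATTERN = re.compile(r"ribosomal RNA|\brRNA\b", re.IGNORECASE)


def looks_like_rrna(title: str) -> bool:
    return bool(RRNA_PATTERN.search(title)) and "protein" not in title.lower()


def best_hits(rows: Iterable[str]) -> Dict[str, Tuple[str, float, str]]:
    """Keep the highest-bitscore hit per query.

    Assumes tab-separated rows of qseqid, sseqid, bitscore, stitle, as written
    by BLAST+ with -outfmt "6 qseqid sseqid bitscore stitle".
    """
    best: Dict[str, Tuple[str, float, str]] = {}
    for row in rows:
        row = row.rstrip("\n")
        if not row or row.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid, bitscore, stitle = row.split("\t", 3)
        score = float(bitscore)
        if qseqid not in best or score > best[qseqid][1]:
            best[qseqid] = (sseqid, score, stitle)
    return best


def flag_suspect_rrna(reverse_rows: Iterable[str]) -> List[str]:
    """Return query IDs (e.g. putative mRNA accessions) whose best reciprocal
    nucleotide hit has an rRNA-like title."""
    return sorted(
        query
        for query, (_subject, _score, title) in best_hits(reverse_rows).items()
        if looks_like_rrna(title)
    )


if __name__ == "__main__":
    # Made-up example rows, for illustration only.
    rows = [
        "mRNA_A\tnuc1\t250.0\t18S ribosomal RNA gene, partial sequence",
        "mRNA_A\tnuc2\t120.0\thypothetical protein mRNA",
        "mRNA_B\tnuc3\t300.0\tactin mRNA, complete cds",
        "mRNA_B\tnuc4\t310.0\tactin-like protein mRNA",
        "mRNA_C\tnuc5\t90.5\t28S ribosomal protein S18, mitochondrial",
    ]
    print(flag_suspect_rrna(rows))  # ['mRNA_A']
```

The sketch uses a best-hit-by-bitscore rule because it is the simplest reciprocal criterion. A real survey would also apply e-value and coverage cutoffs, and it would check the feature annotations of each record rather than matching keywords in titles.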

Files

Original bundle
Name: RAYMOND-HONORSTHESIS-2017.pdf
Size: 1.27 MB
Format: Adobe Portable Document Format