    Genomic Database Conundrum: Widespread Misannotation of rRNA Sequences as Protein Sequences

    File
    RAYMOND-HONORSTHESIS-2017.pdf (1.272Mb)
    Author
    Raymond, Miranda
    Abstract
    The genomics revolution introduced affordable technology capable of rapidly analyzing and comparing massive amounts of biological sequence data. Using the Basic Local Alignment Search Tool (BLAST) program on the National Center for Biotechnology Information (NCBI) website, a highly expressed gene sequence obtained from the plant Leptosiphon jepsonii was analyzed. This sequence was compared against other sequences archived in the NCBI database for similarities. These comparisons spanned many lineages, including other green plants, fungi, metazoans, algae and single-celled organisms. The original query sequence was first compared to inferred protein sequences. The mRNA sequences corresponding to these proteins were then analyzed against complete nucleotide accessions through reciprocal BLAST searches to verify the results. The most similar sequences from these reciprocal BLAST searches were rRNA rather than mRNA sequences. This result indicates that numerous accessions in NCBI are inappropriately characterized as mRNAs and proteins rather than as ribosomal sequences. To explore the breadth of this misannotation issue, sequences from a wide range of organisms, including model genomes, were also examined. This study indicates that rapid, automated computational analyses of massive amounts of sequence data, combined with a heightened focus on novel findings, have led to a sizable influx of erroneous data within even the most reputable databases.
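The reciprocal check described in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual procedure (the abstract reports using the NCBI website). It assumes the reciprocal searches were instead run with command-line BLAST+ and saved as tab-separated output via `-outfmt "6 qseqid sseqid pident evalue bitscore stitle"`. The function names and the title-keyword heuristic are hypothetical, chosen only for illustration.

```python
import csv
import io
import re

# Assumed column layout, matching the custom tabular format
# -outfmt "6 qseqid sseqid pident evalue bitscore stitle"
FIELDS = ["qseqid", "sseqid", "pident", "evalue", "bitscore", "stitle"]

# Illustrative heuristic: subject titles that name a ribosomal RNA
RRNA_PATTERN = re.compile(r"\brRNA\b|ribosomal RNA", re.IGNORECASE)


def best_hits(tabular_text):
    """Return the highest-bitscore hit for each query as a dict keyed by query ID."""
    best = {}
    reader = csv.DictReader(
        io.StringIO(tabular_text),
        fieldnames=FIELDS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # subject titles may contain quote characters
    )
    for row in reader:
        query = row["qseqid"]
        score = float(row["bitscore"])
        if query not in best or score > float(best[query]["bitscore"]):
            best[query] = row
    return best


def flag_suspect_queries(tabular_text):
    """List query IDs whose best reciprocal hit is titled as an rRNA sequence."""
    hits = best_hits(tabular_text)
    return sorted(q for q, hit in hits.items() if RRNA_PATTERN.search(hit["stitle"]))
```

A query annotated as an mRNA whose best reciprocal hit is an rRNA accession would appear in the list returned by `flag_suspect_queries`, marking it as a candidate misannotation for manual review. Matching on titles alone is crude, so the list is a starting point rather than a verdict.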
    URI
    http://hdl.handle.net/10342/6562
    Subject
    NCBI, BLAST, misannotation, rRNA, proteins
    Date
    2017-12-11
    Citation:
    APA:
    Raymond, Miranda. (December 2017). Genomic Database Conundrum: Widespread Misannotation of rRNA Sequences as Protein Sequences (Honors Thesis, East Carolina University). Retrieved from the ScholarShip (http://hdl.handle.net/10342/6562).

    MLA:
    Raymond, Miranda. Genomic Database Conundrum: Widespread Misannotation of rRNA Sequences as Protein Sequences. Honors Thesis. East Carolina University, December 2017. The ScholarShip. http://hdl.handle.net/10342/6562. March 01, 2021.
    Chicago:
    Raymond, Miranda, “Genomic Database Conundrum: Widespread Misannotation of rRNA Sequences as Protein Sequences” (Honors Thesis, East Carolina University, December 2017).
    AMA:
    Raymond, Miranda. Genomic Database Conundrum: Widespread Misannotation of rRNA Sequences as Protein Sequences [Honors Thesis]. Greenville, NC: East Carolina University; December 2017.
    Collections
    • Biology
    • Honors College
    Publisher
    East Carolina University

    East Carolina University has created ScholarShip, a digital archive for the scholarly output of the ECU community.
