Evolution of olfactory receptors in birds
By Robert Driver
December, 2022
Director of Dissertation: Michael Brewer, PhD
Major Department: Biology

ABSTRACT

Olfaction is an evolutionarily ancient sense: the perception and interpretation of chemical stimuli from surrounding air or water. Olfaction is an essential sensory modality for nearly all animals, and is used to define territories, to identify kin, to navigate to breeding sites, and to select mates. Unlike vision, which detects different wavelengths of a single particle, the photon, olfaction must detect a wide range of odor molecules. Odor molecules can be simple or complex, large or small, and can contain a wide range of elements and chemical structures. To detect these diverse compounds, animals employ olfactory receptors, which constitute the largest gene family in vertebrates. The total number of olfactory receptors that a species possesses can be used as a measure of that species’ reliance upon smell in its ecology and behavior. Despite the importance of smell and olfactory receptors in mammals, little is known about olfactory receptors in birds. This gap stems from centuries-old misconceptions that birds rely on vision rather than olfaction in their behavior, which led scientists to overlook the use of smell in birds. Recent behavioral work is gradually debunking the notion that birds cannot smell, showing that birds use smell in ways similar to mammals: in foraging, individual recognition, and mate choice. However, research into olfactory receptors in birds continues to lag behind that of other vertebrate classes. My dissertation shows that birds have much larger olfactory receptor repertoires than the scientific community previously appreciated.
In chapter 1, I report hundreds of new olfactory receptors in birds that were overlooked in previous studies, and show that olfactory receptors in birds, particularly the bird-specific gamma-c OR subfamily, can only be properly counted using genome assemblies that employ long-read sequencing technology. Knowing the importance of long-read assemblies for obtaining accurate olfactory receptor counts, in chapter 2 I expand olfactory receptor counts to 70 bird species with publicly available long-read genomes, showing large olfactory repertoires across the bird phylogeny. I also show the dynamic birth and death of olfactory receptors through bird evolution, with a particularly high rate of death in the early lineages of the Neoaves bird group. However, genomic counts only give the number of olfactory receptor genes in the genome, and do not directly implicate the olfactory receptors in a role specific to smell. To address this, in chapter 3 I show that the vast majority of olfactory receptors detected in the genomes of birds are indeed expressed in the olfactory epithelium, the tissue located inside the bird’s bill that is relevant to smell and the olfactory system. I further show that the gamma-c olfactory receptor subfamily is expressed in the olfactory epithelium, and that certain members of the subfamily are expressed at high levels. These findings show that birds across the phylogeny likely use smell in their behavior and ecology, and that this sensory modality should not be overlooked in birds. My research paves the way for future studies to match bird olfactory receptors to the odors they respond to and to discover the odors that birds detect.

Evolution of olfactory receptors in birds

A Dissertation Presented to the Faculty of the Department of Biology and the Interdisciplinary Doctoral Program in Biology, Biomedicine, and Chemistry
East Carolina University

In Partial Fulfillment of the Requirements for the Degree
Doctor of Philosophy, Interdisciplinary Doctoral Program in Biology, Biomedicine, and Chemistry

By Robert Driver
December, 2022

Director of Dissertation: Michael Brewer, PhD
Thesis Committee Members:
Christopher Balakrishnan, PhD
John Stiller, PhD
Maude Baldwin, PhD

© Robert Driver, 2022

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1: HIGHLY CONTIGUOUS GENOMES IMPROVE THE UNDERSTANDING OF AVIAN OLFACTORY RECEPTOR REPERTOIRES
    Abstract
    Introduction
    Methods
        Assembly selection
        Olfactory receptor identification
        Classification of final OR set
    Results
        OR totals
        OR counts relative to previous studies
        Assembly effects on OR subgenome
        Physical location of ORs in avian genomes
    Discussion
        Long-read sequencing is critical for characterizing avian OR repertoire
        Phylogenetic implications of updated OR counts
        Towards a better understanding of the avian OR subgenome
        Long read assembly methods better characterize complex gene families
    Funding
    Acknowledgements
    References
CHAPTER 2: EVOLUTION OF OLFACTORY RECEPTOR REPERTOIRES ACROSS AVIAN PHYLOGENY
    Abstract
    Introduction
    Methods
        Assembly selection
        OR identification and classification
        Estimation of tree topology
        Estimation of branch lengths
        Trait analyses: data collection
        Trait analyses: phylogenetic generalized least squares
        Phylogenetic analyses of OR counts: ancestral state reconstructions
        Phylogenetic analyses of OR counts: branch birth and death rates
    Results
        OR totals
        Ancestral state reconstruction
        Olfactory receptor birth and death rates
        Comparisons of OR counts and traits
    Discussion
        Olfactory capabilities are potentially widespread in birds
        OR counts declined in early diverging Neoaves
        Dynamic birth and death of ORs across the bird tree
        Olfactory bulb size correlates with OR repertoire counts
        Diet and song learning are not correlated with OR counts
        Nocturnality increases total OR counts
        No evidence for influence of genome size on OR count
    Conclusion
    References
CHAPTER 3: FUNCTIONAL CHARACTERIZATION OF OLFACTORY RECEPTORS IN THE CONTEXT OF THEIR RADIATION IN BIRDS
    Abstract
    Introduction
    Methods
        Sample collection
        RNA extractions and sequencing
        Read mapping
        Counting and differential expression
    Results
        ORs found in tissues
        Differential expression
    Discussion
        Most genomic ORs are expressed in OE
        High expression levels of ORs, including gamma-c ORs
        Few differentially expressed ORs and high variance between samples
    Conclusion
    References

LIST OF TABLES

1. Table 1.1. List of assemblies used
2. Table 1.2. OR counts from short and long read assemblies

LIST OF FIGURES

1. Fig. 1.1. Maximum-likelihood phylogenetic reconstruction of OR repertoire from long-read assemblies
2. Fig. 1.2. Distribution of gamma-c ORs among chromosomes and scaffolds in chromosome-level assemblies
3. Fig. 1.3. Magnification of the largest OR cluster in the chicken genome on chromosome 33
4. Fig. 2.1. Ancestral character states of bird ORs in two topologies
5. Fig. 2.2. Ancestral state reconstruction of bird OR subfamily repertoires, generated by maximum likelihood using the fastAnc function in phytools in R
6. Fig. 2.3. Highest OR birth and death rates across two topologies
7. Fig. 2.4. Comparison of OR counts between diurnal and nocturnal bird species
8. Fig. 2.5. Significant correlations between OR counts and olfactory bulb size to telencephalon size ratio and whole brain ratio
9. Fig. 3.1. OR gene expression in OE and pectoralis in log CPM
10. Fig. 3.2. OR expression levels and the total number of ORs expressed varied substantially between OE samples within species

I.
HIGHLY CONTIGUOUS GENOMES IMPROVE THE UNDERSTANDING OF AVIAN OLFACTORY RECEPTOR REPERTOIRES

Abstract

Third-generation (long read-based) sequencing technologies are reshaping our understanding of genome structure and function. One of the most persistent challenges in genome biology has been confidently reconstructing radiations of complex gene families. Olfactory receptors (ORs) represent just such a gene family, with upwards of 1,000 receptors in some mammalian taxa. Although olfaction was historically an overlooked sensory modality in birds, new studies have revealed an important role for smell. Chromosome-level assemblies for birds provide a new opportunity to characterize patterns of OR diversity among major bird lineages. Previous studies of short-read (second-generation) genome assemblies have associated OR gene family size with avian ecology, but such conclusions could be premature if new assembly methods reshape our understanding of avian OR evolution. Here we provide a fundamental characterization of OR repertoires in five recent genome assemblies, including the most recent assembly of golden-collared manakin (Manacus vitellinus). We find that short read-based assemblies systematically undercount the avian-specific gamma-c OR subfamily, a subfamily that comprises over 65 percent of avian OR diversity. Therefore, in contrast to previous studies, we find a high diversity of gamma-c ORs across the avian tree of life. Building on these findings, ongoing sequencing efforts and improved genome assemblies will clarify the relationship between OR diversity and avian ecology.

Introduction

Our understanding of avian sensory biology has progressed substantially in recent years. Studies have discovered fantastic ways that birds experience the world, including the visual detection of non-spectral colors, the detection of sugar via a repurposed umami taste receptor in nectarivorous species, and amphibious hearing in cormorants (Baldwin et al. 2014; Larsen et al.
2020; Stoddard et al. 2020). However, while most senses, particularly vision, have received considerable attention, research into olfaction has lagged behind. Misconceptions about bird olfaction date back nearly 200 years, when John James Audubon falsely concluded that turkey vultures (Cathartes aura) could not smell carrion (Audubon 1826). Darwin also performed behavioral experiments on Andean condors (Vultur gryphus) and concluded that they could not smell meat (Darwin 1891). An examination of olfactory bulb size across a diversity of bird species concluded that birds could not have anything more than a rudimentary sense of smell (Hill 1905). As a result of these conclusions, bird olfaction remained relatively unexplored until behavioral studies showed odor recognition in pigeons (Michelsen 1959). Following this study, there has been a wealth of morphological and behavioral studies testing for olfaction in both captive and wild birds (Bang and Cobb 1968; Hagelin 2007; Gwinner and Berger 2008; Nevitt et al. 2008; Krause et al. 2012; Van Huynh and Rice 2019).

Building on this appreciation for the behavioral and ecological roles of olfaction in birds, researchers have characterized bird olfactory receptors at a genomic level. Olfactory receptors (ORs) are seven-transmembrane-domain, rhodopsin-like G protein-coupled receptors that detect odors when expressed in the olfactory sensory neurons of the nasal epithelium (Buck and Axel 1991; Mombaerts 2004). In the protruding cilia of the olfactory sensory neurons, ORs recognize specific volatile compounds in their transmembrane domain binding pocket, which triggers a signaling cascade that depolarizes the cell membrane and sends an action potential to the olfactory bulb glomeruli and later the brain (summarized in Breer 2003).
Each OR may recognize one or multiple odorants, and each odorant may be detected by one or multiple ORs; in this way, species may perceive a wide array of odors (Saito et al. 2009).

ORs constitute one of the largest gene families in vertebrates. For example, the elephant genome contains about 2,000 intact ORs (Niimura et al. 2014). In birds, there are three major subfamilies of ORs: alpha, gamma, and gamma-c (Niimura and Nei 2005). The alpha and gamma subfamilies are shared among all amniotes (Niimura and Nei 2005; Steiger et al. 2009). The third subfamily, gamma-c, is unique to birds (Niimura and Nei 2005; Silva et al. 2020). The gamma-c subfamily is numerous in some bird genomes, comprising over 65% of the OR repertoire (Steiger et al. 2009; Khan et al. 2015). Gamma-c ORs are similar in sequence, and their sequences cluster by species rather than by orthology among species (Steiger et al. 2009). Gamma-c ORs likely evolve with high rates of gene birth and death and with gene conversion, which together maintain the species-specific clustering (Niimura and Nei 2005; Steiger et al. 2009).

The first genomic investigations to determine bird OR repertoire counts provided further evidence for the potential of birds to recognize a wide variety of odors. A total of 214 intact ORs were reported in chicken (Gallus gallus) and 134 in zebra finch (Taeniopygia guttata) by Steiger et al. (2009). The finding of hundreds of ORs in the chicken and zebra finch genomes was replicated using multiple OR-identification pipelines (Wang et al. 2013; Khan et al. 2015; Vandewege et al. 2016). All of these studies identified similar OR counts as well as similar proportions of each OR subfamily, with the gamma-c subfamily dominating the OR repertoire. The majority of ORs, however, were located on unmapped contigs, including over 90% of gamma-c ORs in chicken and zebra finch.

Second-generation (Illumina, short read-based) genomes greatly broadened genome sampling across the tree of life, including birds (Jarvis et al. 2014). However, in these assemblies, intact OR numbers were significantly lower than had been observed in the Sanger-based chicken and zebra finch assemblies (Steiger et al. 2009; Khan et al. 2015). Particularly absent from analyses was the distinct avian radiation of the gamma-c OR subfamily, with 45 of 46 species with short-read assemblies yielding fewer than 25 gamma-c ORs (Khan et al. 2015). Although sequencing technology was a common thread in the 45 assemblies with lower OR counts, technical explanations were ruled out in favor of evolutionary explanations for the observed patterns of diversity (Khan et al. 2015).

Chromosome-level reference genomes, built with long-read sequencing technology, should provide more reliable information about OR repertoire diversity in birds. The Vertebrate Genomes Project recently expanded chromosome-scale assembly methods from model organisms across the vertebrate tree of life (Rhie et al. 2021). Combining these and other new assemblies, we are now able to characterize OR diversity in five bird genomes in which long-read approaches have been deployed (Feng et al. 2020; Liu et al. 2021; Rhie et al. 2021). Included in our species analyses is the new assembly of golden-collared manakin (Manacus vitellinus), which was sequenced as part of a collaborative effort within the National Science Foundation-supported Research Coordination Network for biologists studying manakins (Pipridae). We directly compare Sanger, Illumina, hybrid, and PacBio-based assemblies to examine the ways in which our understanding of bird OR family repertoire, and our comprehension of avian olfactory capabilities, are shaped by these higher-quality assemblies.

Methods

Assembly selection

We sought to compare OR discovery rates and assembly quality by using select bird species with multiple publicly available genome assemblies on GenBank (https://www.ncbi.nlm.nih.gov/genbank/). Assemblies for each species varied in the sequencing technology employed and assembly software used (Table 1). To examine how variation in genome sequencing methods impacts OR discovery and description, we included one genome of each species with long-read sequencing technology (Pacific Biosciences (PacBio) RSII or Sequel) as well as one genome without long-read sequencing. We obtained two assemblies for each of five bird species: emu (Dromaius novaehollandiae), chicken (Gallus gallus), Anna’s hummingbird (Calypte anna), golden-collared manakin (Manacus vitellinus), and zebra finch (Taeniopygia guttata) (Warren et al. 2010; Zhang et al. 2014; Feng et al. 2020; Liu et al. 2021; Rhie et al. 2021, Table 1). In addition to the availability of multiple genomes, these five species span the three major groupings of extant birds (the Paleognathae and two groups within the Neognathae, Galloanseres and Neoaves), represent diverse ecologies, and include two important avian models, chicken and zebra finch (Fig. 1A).

Table 1.1. List of assemblies used.

Species                 Abbreviation  Accession        Contig N50 (Mb)  Data types              Assembler
M. vitellinus (1)       Mvit1         GCF_000692015.1  0.04             Illumina                SOAPdenovo
M. vitellinus (2)       Mvit3         GCF_001715985.3  0.29             PacBio/Illumina         MaSuRCA
G. gallus (3)           Ggal4         GCF_000002315.3  0.30             Sanger/454              Celera
G. gallus (3)           Ggal6         GCF_000002315.6  17.65            PacBio                  Falcon
T. guttata (4)          Tgut1         GCF_000151805.1  0.038            Sanger                  PCAP
T. guttata (5)          Tgut2         GCA_003957565.3  12.00            PacBio/10x/Bionano/HiC  Falcon etc.
D. novaehollandiae      Dnov1         GCA_013396795.1  0.86             Illumina                Allpaths-LG
D. novaehollandiae      Dnov2         GCA_016128335.1  14.09            PacBio                  Falcon
C. anna (1)             Cann1         GCF_000699085.1  0.03             Illumina                SOAPdenovo
C. anna (5)             Cann2         GCF_003957555.1  14.52            PacBio/10x/Bionano/HiC  Falcon etc.

1. Zhang et al. 2014
2. Feng et al. 2020
3. International Chicken Genome Consortium
4. Warren et al. 2010
5. Rhie et al. 2021
6. Liu et al. 2021

Note: Each species included has two representative assemblies. Within each species, one assembly was sequenced with either the Illumina or the Sanger sequencing platform, while the other assembly was sequenced at least in part with PacBio.

Fig. 1.1. Maximum-likelihood phylogenetic reconstruction of OR repertoire from long-read assemblies. Phylogenetic trees were constructed using IQ-TREE, and only nodes with >50% support following a likelihood ratio test are shown. (A) Topological phylogeny of species used in this study, with red branches indicating Neoaves species. OR counts from long-read genomes for each species are given. The five species and assemblies shown are (B) M. vitellinus Mvit3, (C) C. anna Cann2, (D) T. guttata Tgut2, (E) G. gallus Ggal6, and (F) D. novaehollandiae Dnov2. The three OR subfamilies were assigned based on putative orthology to previously described bird ORs (Niimura and Nei 2005; Vandewege et al. 2016). Images not to scale. Image (A) is modified artwork by Kristen Orr.

Olfactory receptor identification

To detect putatively functional ORs in the selected genomes, we created a BLAST query with a set of 2,110 OR protein sequences from six mammals (Ornithorhynchus anatinus, Didelphis virginiana, Bos taurus, Canis lupus, Rattus norvegicus, Macaca mulatta), two birds (Gallus gallus, Taeniopygia guttata), and one crocodilian (Gavialis gangeticus). We obtained this query OR set by combining previously published OR subgenomes (Niimura and Nei 2007; Niimura 2009; Vandewege et al. 2016). Using this query file, we performed TBLASTN searches against all 11 bird genomes with a threshold of E < 1e-20. To remove pseudogenized and truncated ORs, we filtered for hits >250 amino acids long.
For any single location on the genome, we collapsed hits within 100 bp of each other and retained the hit with the lowest E-value at that location. After obtaining unique BLAST hits, we extracted the associated nucleotide sequence from the genome as well as the 300 bp regions flanking the hit both upstream and downstream. We used a modified Perl script from Beichman et al. (2019) to detect open reading frames (ORFs) within each extracted region (Montague et al. 2014; Beichman et al. 2019). We then aligned these ORFs to each other as well as to the human Olfactory Receptor Family 2 Subfamily J Member 3 (OR2J3) sequence using the E-INS-i default parameters in MAFFT (Katoh and Standley 2013). Using the previously characterized transmembrane domains of OR2J3 as a guide, we removed any sequences that had five or more amino acid insertions or deletions within a transmembrane domain in the alignment (McRae et al. 2012; Beichman et al. 2019). This included ORFs with stop codons appearing prior to the end of the seventh transmembrane domain.

Using this alignment, we recorded the position of the first amino acid in the first transmembrane domain. To estimate the location of the ORF start codon, we used modified Perl scripts from Beichman et al. (2019) to find the most appropriate methionine upstream of this recorded transmembrane start position (Montague et al. 2014; Beichman et al. 2019). ORF sequences were then truncated at the 5’ end to begin with this methionine. This set of ORFs was then aligned using the E-INS-i parameters in MAFFT to a set of T. guttata reference ORs as well as 11 non-OR rhodopsin-like G protein-coupled receptors (non-OR GPCRs) that functioned as an outgroup (Katoh and Standley 2013; Niimura 2013; Vandewege et al. 2016; Beichman et al. 2019). We then used ClustalW to generate a neighbor-joining tree from this alignment with 1000 bootstraps, gaps removed, and Kimura’s distance correction (Kimura 1980; Goujon et al. 2010).
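
The hit-filtering steps described in this section (E-value threshold, minimum length, one hit per genomic location, 300 bp flanks) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Hit` record, the function names, and the example coordinates are hypothetical, and reading "within 100 bp of each other" as "overlapping or separated by a gap of at most 100 bp on the same scaffold" is an assumption.

```python
from dataclasses import dataclass

E_VALUE_MAX = 1e-20   # E-value threshold stated in the text
MIN_AA_LENGTH = 250   # hits must be longer than 250 amino acids
MERGE_DISTANCE = 100  # assumed meaning of "within 100 bp of each other"
FLANK = 300           # bases extracted upstream and downstream of a hit


@dataclass
class Hit:
    """Hypothetical record for one TBLASTN hit (1-based coordinates, start <= end)."""
    scaffold: str
    start: int
    end: int
    aa_length: int
    evalue: float


def best_hit_per_location(hits):
    """Apply the thresholds, then keep the lowest-E-value hit per location."""
    passing = sorted(
        (h for h in hits if h.evalue < E_VALUE_MAX and h.aa_length > MIN_AA_LENGTH),
        key=lambda h: (h.scaffold, h.start),
    )
    groups = []  # each entry: [scaffold, furthest end seen so far, best hit]
    for h in passing:
        last = groups[-1] if groups else None
        if last and last[0] == h.scaffold and h.start - last[1] <= MERGE_DISTANCE:
            last[1] = max(last[1], h.end)
            if h.evalue < last[2].evalue:
                last[2] = h
        else:
            groups.append([h.scaffold, h.end, h])
    return [g[2] for g in groups]


def flanked_region(hit, scaffold_length):
    """Return the hit coordinates widened by FLANK and clipped to the scaffold."""
    return max(1, hit.start - FLANK), min(scaffold_length, hit.end + FLANK)


# Made-up hits for illustration only.
hits = [
    Hit("scaf1", 1000, 1900, 300, 1e-50),
    Hit("scaf1", 1050, 1950, 300, 1e-80),  # same location, better E-value
    Hit("scaf1", 5000, 5900, 300, 1e-30),  # separate location
    Hit("scaf2", 100, 700, 200, 1e-40),    # too short
    Hit("scaf2", 100, 1000, 300, 1e-10),   # E-value too weak
]
kept = best_hit_per_location(hits)
for h in kept:
    print(h.scaffold, h.start, h.end, h.evalue)
```

In this sketch the flanks are clipped at scaffold ends, so a hit near the edge of a short scaffold simply yields a shorter extracted region.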
We then removed any ORFs that were phylogenetically more closely related to the non-OR GPCRs than to the reference ORs.

Classification of final OR set

We classified all remaining ORFs as functional ORs. Using this final set, we inferred a maximum-likelihood tree using IQ-TREE with automatic model selection and 1000 SH-like approximate likelihood ratio test replicates (Minh et al. 2020). Using ML support values, we collapsed all nodes with <50% support into polytomies using iTOL software, and rooted the tree using the ancestral branch leading to the 11 non-OR GPCRs (Letunic and Bork 2019). We classified bird ORs into subfamilies alpha, gamma, and gamma-c based on the subfamily of the query sequence used to identify the OR and the location of the OR in one of the three distinct avian OR clades (Steiger et al. 2009; Vandewege et al. 2016). We then counted the final number of OR sequences as well as the number of ORs from each subfamily.

Results

OR totals

We identified a total of 1496 ORs across all 10 bird assemblies from five species (Table 2). Of these ORs, the gamma-c subfamily constituted 77% (1158) of the total, while 18% (263) of the identified ORs were gamma, and 5% (74) were alpha subfamily ORs. For assemblies with long-read sequencing, we found 946 ORs, with 42 alpha (4%), 162 gamma (17%), and 741 gamma-c (78%). Within a single assembly, the chicken Ggal6 (see assembly abbreviations in Table 1) yielded the largest number of ORs, with 355 total, 303 (85%) of which were gamma-c ORs (Fig. 1E). Gamma-c represented 97% (179/184) of the ORs in zebra finch Tgut1, the highest proportion of gamma-c in the total OR repertoire of any assembly.
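
As a quick check on the percentages quoted above, the gamma-c share of a repertoire is simply the gamma-c count divided by the total OR count. The sketch below uses only counts stated in the preceding paragraph; the function name is a hypothetical helper, and rounding to whole percentages is an assumption about how the figures were reported.

```python
def gamma_c_percent(gamma_c, total):
    """Gamma-c ORs as a whole-number percentage of the total OR repertoire."""
    return round(100 * gamma_c / total)


# Counts quoted in the text above.
ggal6 = gamma_c_percent(303, 355)      # chicken long-read assembly
tgut1 = gamma_c_percent(179, 184)      # zebra finch Sanger assembly
long_read = gamma_c_percent(741, 946)  # all long-read assemblies combined

print(ggal6, tgut1, long_read)
```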

Table 1.2. OR counts from short and long read assemblies.

Name   Total ORs  Alpha ORs  Gamma ORs  Gamma-c ORs  Proportion gamma-c ORs  Contig N50 (Mb)  OR total in literature
Mvit1  9          1          8          0            0.000                   0.04             9 (1)
Mvit3  117        2          18         97           0.829                   0.29
Ggal4  272        10         33         229          0.842                   0.30             266 (2)
Ggal6  355        11         41         303          0.854                   17.65
Tgut1  184        2          3          179          0.973                   0.038            190 (2)
Tgut2  69         3          6          60           0.870                   12.00
Dnov1  57         17         33         7            0.123                   0.86
Dnov2  296        26         75         195          0.659                   14.09
Cann1  27         2          23         2            0.071                   0.03             21 (1)
Cann2  109        0          23         86           0.761                   14.52

1. Khan et al. 2015
2. Vandewege et al. 2016

Emu, Dnov2, yielded both the highest gamma count (75) and the highest alpha count (26), in addition to 195 gamma-c ORs (Fig. 1F). In the OR maximum-likelihood phylogenies, we elected to present each species separately for clarity of visualization (Fig. 1B-E). We noted that in these analyses the gamma OR subfamily for D. novaehollandiae was not recovered as monophyletic (Fig. 1F). This is not the case in other multi-species analyses we have done, nor in our Dnov2 neighbor-joining tree (analyses not shown). The unusual pattern here seems to be driven by the long branch at the base of the Dnov2 alpha OR clade.

OR counts relative to previous studies

Overall, our reanalysis of previously analyzed genomes was consistent with previous findings (Khan et al. 2015; Vandewege et al. 2016, Table 2). For Ggal4, we recovered 272 ORs, six more than previously recovered (Vandewege et al. 2016). Two previous searches of the Tgut1 assembly yielded 182 and 190 ORs, similar to our count of 184 ORs (Wang et al. 2013; Vandewege et al. 2016, respectively). We also found similar subfamily diversity of gamma and gamma-c ORs in zebra finch and chicken (Niimura 2009; Steiger et al. 2009; Wang et al. 2013; Khan et al. 2015; Vandewege et al. 2016). Similar patterns emerged for the other short-read assemblies in our analysis (Table 2), giving us confidence in our methods of recovering ORs.
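
The agreement with earlier searches can be made explicit by differencing the counts quoted in the paragraph above. The sketch below is only a bookkeeping aid: the dictionary labels and the function name are hypothetical, and the counts are the ones given in the text.

```python
# (count in this study, count in the earlier report), as quoted in the text.
comparisons = {
    "Ggal4 vs Vandewege et al. 2016": (272, 266),
    "Tgut1 vs Wang et al. 2013": (184, 182),
    "Tgut1 vs Vandewege et al. 2016": (184, 190),
}


def count_differences(pairs):
    """Return this-study minus earlier-report OR counts for each comparison."""
    return {label: ours - theirs for label, (ours, theirs) in pairs.items()}


diffs = count_differences(comparisons)
for label, diff in diffs.items():
    print(f"{label}: {diff:+d}")
```

All three differences are in the single digits, which is the sense in which the reanalysis is described as consistent with previous findings.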

Assembly effects on OR subgenome

We found that assembly had substantial effects on the ability to reconstruct OR subgenomes. In 4 of the 5 surveyed species, inclusion of long-read sequencing (Pacific Biosciences) increased OR counts (Table 2). The most pronounced effect on OR repertoire was in the gamma-c subfamily, which also constitutes the majority of known avian ORs. Between D. novaehollandiae assemblies, contig N50 improved from 0.86 Mb in Dnov1 with Illumina sequencing to 14.09 Mb in Dnov2. We detected an additional 239 ORs in Dnov2, of which 188 (78.6%) were from the gamma-c subfamily. In the case of M. vitellinus, no gamma-c representatives were recovered from Mvit1 in our analysis or a previous analysis (Khan et al. 2015), yet our search of Mvit3 yielded 97 gamma-c ORs (Fig. 1B).

Improved assemblies also resulted in the identification of additional ORs in the alpha and gamma subfamilies, but these effects were less pronounced (Table 2). The gamma subfamily more than doubled in count both in D. novaehollandiae (Dnov2 compared to Dnov1; 42 new ORs) and in M. vitellinus (Mvit3 compared to Mvit1; 10 new ORs). In other species, the relative increase in gamma ORs was smaller (Table 2). Alpha OR counts were similar between assemblies of the same species (Table 2). Unexpectedly, we identified no alpha ORs in the Calypte anna Cann2 assembly, though two were previously reported based on Cann1.

The Sanger sequencing-based T. guttata genome Tgut1 unexpectedly yielded a greater number of ORs than Tgut2, which was assembled with several technologies. This species was the only case in which a newer assembly based on long-read technology yielded fewer ORs than the assembly without long-read sequencing. Despite an overall higher OR count in Tgut1, we found more alpha ORs (3 versus 2) and more gamma ORs (6 versus 3) in Tgut2 than in Tgut1. However, the overall count of ORs in Tgut2 was substantially lower, as there were 119 fewer gamma-c ORs in Tgut2 than in Tgut1.
Therefore, the lower OR count in Tgut2 is entirely due to differences in gamma-c OR recovery.

Physical location of ORs in avian genomes

Although new, chromosome-scale assemblies have assigned the vast majority of genomic sequence data to chromosomes (Rhie et al. 2021), gamma-c OR regions remain primarily assigned to unmapped scaffolds (Fig. 2). The exception to this rule is the Ggal6 assembly, in which 302 out of 303 identified ORs have been assigned to chromosomes. Most of these (274 ORs) map to a single microchromosome, chromosome 33 (Lee et al. 2020, Fig. 3). The remaining ORs are divided between two other microchromosomes, chromosome 16 (8 ORs) and chromosome 31 (16 ORs), with a single receptor on an unmapped scaffold. For C. anna Cann2, only 20% of gamma-c receptors were on scaffolds assigned to chromosomes. The main clusters of ORs were a group of 10 assigned to the W chromosome and another 18 assigned to a single scaffold (RRCD01000065.1). For D. novaehollandiae Dnov2, only 8% of gamma-c ORs are assigned to chromosomes, with a large cluster of 108 on scaffold JABVCD010000554.1 (Fig. 2). The remaining gamma-c loci were distributed among 29 chromosomes and scaffolds. Finally, only 3% (2/60) of T. guttata Tgut2 gamma-c ORs were assigned to chromosomes, with a cluster of 24 loci on scaffold VOHI02000029.1.

Fig. 1.2. Distribution of gamma-c ORs among chromosomes and scaffolds in chromosome-level assemblies. The largest gamma-c OR cluster in each long-read assembly is located on unmapped scaffolds except in G. gallus, where the largest cluster is localized to chromosome 33, a microchromosome.

Discussion

Long-read sequencing is critical for characterizing avian OR repertoire

Our results show that Illumina short read-based approaches were not successful in accurately characterizing the gamma-c OR subfamily. The three Illumina-based assemblies we assessed undercounted gamma-c diversity, revealing fewer than 10 gamma-c ORs in each assembly.
In all five of the assemblies sequenced with long reads, we found that gamma-c ORs constituted at least 66% of the OR subgenome, and on average there were 148 gamma-c loci per species. The hybrid assembly approach using MaSuRCA (Zimin et al. 2013) also substantially increased gamma-c recovery in Mvit3.

In the most phylogenetically comprehensive analysis of avian ORs to date, Khan et al. (2015) analyzed OR repertoire from 48 bird species. Forty-six of the assemblies surveyed were sequenced and assembled using short read-based methods. The two other species included were the Sanger-based chicken and zebra finch, and those had the highest OR diversity, which was attributed to ecological adaptations of these two particular species (Khan et al. 2015). Other reports in the literature also interpret a lack of gamma-c ORs as biologically meaningful without considering the shortcomings of short read-based assemblies (Zhan et al. 2013). Importantly, these issues extend beyond gamma-c ORs to other complex gene families such as the major histocompatibility complex (MHC). Long-read based studies are also improving our understanding of the avian MHC (He et al. 2021). Select MHC genes are linked to ORs in mammals, providing further evidence that this family repertoire may also be obscured in Illumina sequenced assemblies (Santos et al. 2010).

Prior to our analysis, chicken (Ggal3, Ggal4) and zebra finch (Tgut1) were the only Sanger-based assemblies analyzed (Niimura 2009; Steiger et al. 2009; Wang et al. 2013; Khan et al. 2015; Vandewege et al. 2016). Analyses of these assemblies, which involve longer read lengths and Bacterial Artificial Chromosome (BAC)-based scaffolding, provided the only previous evidence of substantial gamma-c OR diversity.
Whereas the incorporation of long-read sequencing methods greatly increased the count of gamma-c ORs relative to Illumina-based assemblies, it also reduced the count of gamma-c ORs in T. guttata compared to the previous Sanger-based assembly. We propose two potential reasons for this disparity. It is possible that the original Tgut1 assembly resolved alternative alleles as separate loci, artificially inflating the total gamma-c count with duplicate loci. Tgut2 and many third-generation assemblies are haplotype phased, mitigating this problem. Alternatively, filtering steps used for quality control at the end of the Tgut2 assembly curation process may aggressively remove repetitive regions that harbor tandem gamma-c loci.

Phylogenetic implications of updated OR counts

Our finding of high OR diversity for D. novaehollandiae has potentially important implications for broad scale patterns of OR evolution in birds. High OR diversity in this paleognath genome contrasts with lower diversity found in previous analyses (Le Duc et al. 2015) and now suggests that the avian ancestor had high gamma-c OR diversity. Based on our assessment, all three OR subfamilies have lower diversity in the three neoavian lineages tested relative to D. novaehollandiae and G. gallus. Due to our limited taxonomic sampling, it remains unclear whether the differences reflect broader patterns of phylogenetic change or lineage-specific adaptations. For example, D. novaehollandiae and G. gallus are both omnivorous and therefore may retain ORs because a variety of different odorants are relevant when foraging. The three Neoaves species we selected are not generalists: C. anna is mostly nectarivorous, M. vitellinus is mostly frugivorous, and T. guttata is a granivore. Our understanding of phylogenetic patterns of OR diversity will continue to change with improved genome assembly and taxonomic sampling.
For example, by sequencing additional manakin species, we will be able to determine whether OR repertoire is elaborated within a frugivorous family or if counts vary substantially within individual lineages.

Towards a better understanding of the avian OR subgenome

Many key features of OR subgenomes and olfaction generally are well-characterized in mammals, but not in birds (Olender et al. 2008; Niimura et al. 2014). There are no previous reports of using assemblies with long-read sequencing to search for an OR repertoire in birds, and until now, no previous reports of an expansive gamma-c OR repertoire outside of G. gallus and T. guttata assemblies. To date, there is only one report of transcriptome sequencing of an avian olfactory epithelium, a critical undertaking considering that ORs are expressed in non-olfactory tissues and some identified ORs may be nonfunctional despite having open reading frames (Pluznick et al. 2009; Sin et al. 2019). Although Sin et al. (2019) do show the expression of at least three gamma-c ORs in Oceanodroma leucorhoa olfactory epithelium, this still leaves the expression of a potentially large number of gamma-c genes uncharacterized. A current priority would be to sequence the olfactory epithelium transcriptome from species with high-quality genome assemblies to understand the extent to which the high number of gamma-c ORs recovered from the genome are functionally expressed in the olfactory epithelium (OE).

Information about which ORs are expressed in the olfactory epithelium will also help with functional testing to identify the binding properties of avian ORs in a process known as “deorphanizing”. To date, no avian ORs have been deorphanized, and so it remains unclear what ligands are bound by any avian ORs. The diverse gamma-c ORs, unique to birds, are of particular interest. Avian olfaction is in many fundamental ways a frontier in the field of sensory biology.
There is also a great deal left to be learned about the molecular evolutionary mechanisms that have given rise to the diversity of gamma-c genes in birds. First, enhanced spatial information on the physical location of OR genes will be informative for understanding the genetic processes at play. With most loci still scattered among unmapped scaffolds, it is somewhat unclear how clustered these loci are. That said, the G. gallus Ggal6 OR repertoire is highly spatially localized (Figures 2, 3), a pattern that is likely in the other species as well. Given spatial clustering and extremely short branch lengths, gamma-c species-specific clades could be a result of gene conversion among loci, but further study is needed. In the passerine bird MHC, endogenous retroviral elements may have played a role in generating gene family diversity in MHC Class II (Balakrishnan et al. 2010), and the same could be the case in the gamma-c radiation across birds. We note, however, that high repeat content is a general pattern across avian microchromosomes and is not restricted to OR and MHC regions (Fig. 3, Burt 2002).

Fig. 1.3. Magnification of the largest OR cluster in the chicken genome on chromosome 33. Numerous ORs are present in the displayed region and are flanked by repetitive elements, potentially contributing to the difficulty of gamma-c OR subfamily discovery. Image from the UCSC Genome Browser (Lee et al. 2020).

Long read assembly methods better characterize complex gene families

Our increased OR counts in long-read assemblies contribute to the growing literature quantifying the advantages of third generation (long read) sequencing technology over second generation (short read) sequencing technology.
Third generation sequencing has greatly improved the detection of tandem repeats generated by long terminal repeat retrotransposons, microsatellites, homonucleotide stretches, and repetitive regions (Mason et al. 2016; Kapusta and Suh 2017; Korlach et al. 2017). Large gene families found in clusters, like the ORs described here, may show the greatest improvement following the incorporation of long-read data. Long read sequencing has already led to better characterization of MHC loci in birds, and also greatly increased the resolution and count of vomeronasal receptors in mammals (Larsen et al. 2014; He et al. 2021).

Even with long read sequencing, the chromosomal location of most ORs remains largely unresolved. With the exception of the chicken, ORs in the long-read assemblies that we analyzed largely mapped to unassigned scaffolds, including the largest OR cluster in each assembly (Fig. 2). The inability to assign these scaffolds to a chromosome is likely related to expansion of duplicate regions that contain OR loci and the high repeat element content found in microchromosomes (Fig. 3, Burt 2002). Indeed, the assignment of an OR-containing region to the hummingbird female-specific W chromosome is likely spurious and driven by the repetitive sequences in these regions. Bird chromosomes are highly syntenic, suggesting that the large OR clusters in other species likely map to chromosomes homologous to chicken microchromosome 33 (Nanda et al. 2011). Manual curation of these regions may be required to resolve remaining uncertainties. Other solutions to this complexity include physically mapping OR loci to chromosomes, and/or using approaches less dependent on assembly. Sin and colleagues (2019) incorporated assessment of depth of coverage in their OR quantification pipeline, an approach that should be informative in the face of varying assembly quality.
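The idea behind a depth-of-coverage check can be sketched briefly. The example below is a minimal illustration under stated assumptions and is not the pipeline used by Sin and colleagues: it assumes that per-base read depths are already available for each assembled OR locus and for a set of regions believed to be single copy, and it treats the ratio of median depths as a rough estimate of how many genomic copies were collapsed into one assembled locus. The function names and the toy depth values are hypothetical.

```python
from statistics import median


def estimate_copy_number(locus_depths, single_copy_depths):
    """Estimate how many genomic copies are collapsed into one assembled locus.

    locus_depths: per-base read depths across the assembled OR locus.
    single_copy_depths: per-base read depths across regions assumed single copy.
    """
    baseline = median(single_copy_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    return median(locus_depths) / baseline


def adjusted_or_count(loci, single_copy_depths):
    """Sum depth-based copy estimates, counting each assembled locus at least once."""
    return sum(
        max(1, round(estimate_copy_number(depths, single_copy_depths)))
        for depths in loci.values()
    )


if __name__ == "__main__":
    # Toy data (invented for illustration): one locus at baseline depth,
    # one locus at roughly three times baseline depth.
    baseline_regions = [30, 31, 29, 30]
    loci = {"OR_A": [30, 29, 31], "OR_B": [88, 92, 90]}
    print(adjusted_or_count(loci, baseline_regions))
```

With these toy values, the second locus has about three times the baseline depth, so the adjusted count exceeds the two loci present in the assembly. A real analysis would also need to account for mapping quality and GC bias, which this sketch ignores.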
Funding

This work was supported by National Science Foundation awards [grant numbers 1457541, IOS-1456612].

Acknowledgements

We would like to thank the editors, two anonymous reviewers, and Dr. Maude Baldwin for helpful discussion of the analyses presented here. In addition, we thank Drs. Michael Brewer and John Stiller for serving on RJD's dissertation committee. The Dromaius novaehollandiae, Calypte anna, and Taeniopygia guttata genomes were generated as part of the Vertebrate Genomes Project (https://vgp.github.io/genomeark/) and pre-publication public access to these now published data made these analyses possible. Likewise, we thank the Manakin Genomics Research Coordination Network (https://www.manakinsrcn.org/) for generating public genomic resources for the manakin species analyzed here. This material is based upon work conducted while CNB was serving at the National Science Foundation.

References

Audubon J. J. (1826). Account of the habits of the turkey buzzard, Vultur aura, particularly with the view of exploding the opinion generally entertained of its extraordinary power of smelling. Edinburgh New Philosophical Journal 2:172-184.
Balakrishnan C. N., Ekblom R., Völker M., Westerdahl H., Godinez R., Kotkiewicz H., Burt D. W., Graves T., Griffin D. K., Warren W. C., Edwards S. V. (2010). Gene duplication and fragmentation in the zebra finch major histocompatibility complex. BMC Biology 8:29.
Baldwin M. W., Toda Y., Nakagita T., O’Connell M. J., Klasing K. C., Misaka T., Edwards S. V., Liberles S. D. (2014). Evolution of sweet taste perception in hummingbirds by transformation of the ancestral umami receptor. Science 345:929–33.
Bang B. G., Cobb S. (1968). The size of the olfactory bulb in 108 species of birds. The Auk 85:55–61.
Beichman A. C., Koepfli K-P., Li G., Murphy W., Dobrynin P., Kliver S., Tinker M. T., Murray M.
J., Johnson J., Lindblad-Toh K., Karlsson E. K., Lohmueller K. E., Wayne R. K. (2019). Aquatic adaptation and depleted diversity: a deep dive into the genomes of the sea otter and giant otter. Molecular Biology and Evolution 36:2631–55.
Breer H. (2003). Olfactory receptors: molecular basis for recognition and discrimination of odors. Analytical and Bioanalytical Chemistry 377:427–33.
Buck L., Axel R. (1991). A novel multigene family may encode odorant receptors: A molecular basis for odor recognition. Cell 65:175–87.
Burt D. W. (2002). Origin and evolution of avian microchromosomes. Cytogenetic and Genome Research 96:97–112.
Darwin C. (1891). Journal of researches into the natural history and geology of the countries visited during the voyage of H.M.S. Beagle round the world. London, England: Ward, Lock & Co.
Feng S., Stiller J., Deng Y., Armstrong J., Fang Q., Reeve A. H., Xie D., Chen G., Guo C., Faircloth B. C., Petersen B., Wang Z., Zhou Q., Diekhans M., Chen W., Andreu-Sánchez S., Margaryan A., Howard J. T., Parent C., Pacheco G., Sinding M-H. S., Puetz L., Cavill E., Ribeiro Â. M., Eckhart L., Fjeldså J., Hosner P. A., Brumfield R. T., Christidis L., Bertelsen M. F., Sicheritz-Ponten T., Tietze D. T., Robertson B. C., Song G., Borgia G., Claramunt S., Lovette I. J., Cowen S. J., Njoroge P., Dumbacher J. P., Ryder O. A., Fuchs J., Bunce M., Burt D. W., Cracraft J., Meng G., Hackett S. J., Ryan P. G., Jønsson K. A., Jamieson I. G., da Fonseca R. R., Braun E. L., Houde P., Mirarab S., Suh A., Hansson B., Ponnikas S., Sigeman H., Stervander M., Frandsen P. B., van der Zwan H., van der Sluis R., Visser C., Balakrishnan C. N., Clark A. G., Fitzpatrick J. W., Bowman R., Chen N., Cloutier A., Sackton T. B., Edwards S. V., Foote D.
J., Shakya S. B., Sheldon F. H., Vignal A., Soares A. E. R., Shapiro B., González-Solís J., Ferrer-Obiol J., Rozas J., Riutort M., Tigano A., Friesen V., Dalén L., Urrutia A. O., Székely T., Liu Y., Campana M. G., Corvelo A., Fleischer R. C., Rutherford K. M., Gemmell N. J., Dussex N., Mouritsen H., Thiele N., Delmore K., Liedvogel M., Franke A., Hoeppner M. P., Krone O., Fudickar A. M., Milá B., Ketterson E. D., Fidler A. E., Friis G., Parody-Merino Á. M., Battley P. F., Cox M. P., Lima N. C. B., Prosdocimi F., Parchman T. L., Schlinger B. A., Loiselle B. A., Blake J. G., Lim H. C., Day L. B., Fuxjager M. J., Baldwin M. W., Braun M. J., Wirthlin M., Dikow R. B., Ryder T. B., Camenisch G., Keller L. F., DaCosta J. M., Hauber M. E., Louder M. I. M., Witt C. C., McGuire J. A., Mudge J., Megna L. C., Carling M. D., Wang B., Taylor S. A., Del-Rio G., Aleixo A., Vasconcelos A. T. R., Mello C. V., Weir J. T., Haussler D., Li Q., Yang H., Wang J., Lei F., Rahbek C., Gilbert M. T. P., Graves G. R., Jarvis E. D., Paten B., Zhang G. (2020). Dense sampling of bird diversity increases power of comparative genomics. Nature 587:252–57.
Goujon M., McWilliam H., Li W., Valentin F., Squizzato S., Paern J., Lopez R. (2010). A new bioinformatics analysis tools framework at EMBL–EBI. Nucleic Acids Research 38:W695–99.
Gwinner H., Berger S. (2008). Starling males select green nest material by olfaction using experience-independent and experience-dependent cues. Animal Behaviour 75:971–76.
Hagelin J. C. (2007). The citrus-like scent of crested auklets: reviewing the evidence for an avian olfactory ornament. Journal of Ornithology 2:195–201.
He K., Minias P., Dunn P. O. (2021). Long-read genome assemblies reveal extraordinary variation in the number and structure of MHC loci in birds. Genome Biology and Evolution 13:evaa270.
Hill A. (1905). Can birds smell? Nature 71:318–19.
Huynh A.
V., Rice A. M. (2019). Conspecific olfactory preferences and interspecific divergence in odor cues in a chickadee hybrid zone. Ecology and Evolution 9:9671–83.
Jarvis E. D., Mirarab S., Aberer A. J., Li B., Houde P., Li C., Ho S. Y. W., Faircloth B. C., Nabholz B., Howard J. T., Suh A., Weber C. C., Fonseca R. R. da, Li J., Zhang F., Li H., Zhou L., Narula N., Liu L., Ganapathy G., Boussau B., Bayzid M. S., Zavidovych V., Subramanian S., Gabaldón T., Capella-Gutiérrez S., Huerta-Cepas J., Rekepalli B., Munch K., Schierup M., Lindow B., Warren W. C., Ray D., Green R. E., Bruford M. W., Zhan X., Dixon A., Li S., Li N., Huang Y., Derryberry E. P., Bertelsen M. F., Sheldon F. H., Brumfield R. T., Mello C. V., Lovell P. V., Wirthlin M., Schneider M. P. C., Prosdocimi F., Samaniego J. A., Velazquez A. M. V., Alfaro-Núñez A., Campos P. F., Petersen B., Sicheritz-Ponten T., Pas A., Bailey T., Scofield P., Bunce M., Lambert D. M., Zhou Q., Perelman P., Driskell A. C., Shapiro B., Xiong Z., Zeng Y., Liu S., Li Z., Liu B., Wu K., Xiao J., Yinqi X., Zheng Q., Zhang Y., Yang H., Wang J., Smeds L., Rheindt F. E., Braun M., Fjeldsa J., Orlando L., Barker F. K., Jønsson K. A., Johnson W., Koepfli K-P., O’Brien S., Haussler D., Ryder O. A., Rahbek C., Willerslev E., Graves G. R., Glenn T. C., McCormack J., Burt D., Ellegren H., Alström P., Edwards S. V., Stamatakis A., Mindell D. P., Cracraft J., Braun E. L., Warnow T., Jun W., Gilbert M. T. P., Zhang G. (2014). Whole-genome analyses resolve early branches in the tree of life of modern birds. Science 346:1320–31.
Kapusta A., Suh A. (2017). Evolution of bird genomes—a transposon’s-eye view. Annals of the New York Academy of Sciences 1:164–85.
Katoh K., Standley D. M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution 30:772–80.
Khan I., Yang Z., Maldonado E., Li C., Zhang G., Gilbert M. T. P., Jarvis E. D., O’Brien S. J., Johnson W. E., Antunes A. (2015). Olfactory receptor subgenomes linked with broad ecological adaptations in Sauropsida. Molecular Biology and Evolution 32:2832–43.
Kimura M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16:111–20.
Korlach J., Gedman G., Kingan S. B., Chin C-S., Howard J. T., Audet J-N., Cantin L., Jarvis E. D. (2017). De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads. GigaScience 6.
Krause E. T., Krüger O., Kohlmeier P., Caspers B. A. (2012). Olfactory kin recognition in a songbird. Biology Letters 8:327–29.
Larsen O. N., Wahlberg M., Christensen-Dalsgaard J. (2020). Amphibious hearing in a diving bird, the great cormorant (Phalacrocorax carbo sinensis). Journal of Experimental Biology 223.
Larsen P. A., Heilman A. M., Yoder A. D. (2014). The utility of PacBio circular consensus sequencing for characterizing complex gene families in non-model organisms. BMC Genomics 15:720.
Le Duc D., Renaud G., Krishnan A., Almén M. S., Huynen L., Prohaska S. J., Ongyerth M., Bitarello B. D., Schiöth H. B., Hofreiter M., Stadler P. F., Prüfer K., Lambert D., Kelso J., Schöneberg T. (2015). Kiwi genome provides insights into evolution of a nocturnal lifestyle. Genome Biology 16:147.
Lee C. M., Barber G. P., Casper J., Clawson H., Diekhans M., Gonzalez J. N., Hinrichs A. S., Lee B. T., Nassar L. R., Powell C. C., Raney B. J., Rosenbloom K. R., Schmelter D., Speir M. L., Zweig A. S., Haussler D., Haeussler M., Kuhn R. M., Kent W. J. (2020). UCSC Genome Browser enters 20th year. Nucleic Acids Research 48:D756–61.
Letunic I., Bork P. (2019).
Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Research 47:W256–59.
Liu J., Wang Z., Li J., Xu L., Liu J., Feng S., Guo C., Chen S., Ren Z., Rao J., Wei K., Chen Y., Jarvis E. D., Zhang G., Zhou Q. (2021). A new emu genome illuminates the evolution of genome configuration and nuclear architecture of avian chromosomes. Genome Research 31:497–511.
Mason A. S., Fulton J. E., Hocking P. M., Burt D. W. (2016). A new look at the LTR retrotransposon content of the chicken genome. BMC Genomics 17:688.
McRae J. F., Mainland J. D., Jaeger S. R., Adipietro K. A., Matsunami H., Newcomb R. D. (2012). Genetic variation in the odorant receptor OR2J3 is associated with the ability to detect the “grassy” smelling odor, cis-3-hexen-1-ol. Chemical Senses 37:585–93.
Michelsen W. J. (1959). Procedure for studying olfactory discrimination in pigeons. Science 130:630–31.
Minh B. Q., Schmidt H. A., Chernomor O., Schrempf D., Woodhams M. D., von Haeseler A., Lanfear R. (2020). IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Molecular Biology and Evolution 37:1530–34.
Mombaerts P. (2004). Odorant receptor gene choice in olfactory sensory neurons: the one receptor–one neuron hypothesis revisited. Current Opinion in Neurobiology 14:31–36.
Montague M. J., Li G., Gandolfi B., Khan R., Aken B. L., Searle S. M. J., Minx P., Hillier L. W., Koboldt D. C., Davis B. W., Driscoll C. A., Barr C. S., Blackistone K., Quilez J., Lorente-Galdos B., Marques-Bonet T., Alkan C., Thomas G. W. C., Hahn M. W., Menotti-Raymond M., O’Brien S. J., Wilson R. K., Lyons L. A., Murphy W. J., Warren W. C. (2014). Comparative analysis of the domestic cat genome reveals genetic signatures underlying feline biology and domestication. Proceedings of the National Academy of Sciences 111:17230–35.
Nanda I., Benisch P., Fetting D., Haaf T., Schmid M. (2011). Synteny conservation of chicken macrochromosomes 1–10 in different avian lineages revealed by cross-species chromosome painting. Cytogenetic and Genome Research 132:165–81.
Nevitt G. A., Losekoot M., Weimerskirch H. (2008). Evidence for olfactory search in wandering albatross, Diomedea exulans. Proceedings of the National Academy of Sciences 105:4576–81.
Niimura Y. (2009). On the origin and evolution of vertebrate olfactory receptor genes: comparative genome analysis among 23 chordate species. Genome Biology and Evolution 1:34–44.
Niimura Y. (2013). Identification of Olfactory Receptor Genes from Mammalian Genome Sequences. In: Crasto CJ, editor. Olfactory Receptors: Methods and Protocols. Methods in Molecular Biology. Totowa, NJ: Humana Press. p. 39–49.
Niimura Y., Matsui A., Touhara K. (2014). Extreme expansion of the olfactory receptor gene repertoire in African elephants and evolutionary dynamics of orthologous gene groups in 13 placental mammals. Genome Research 24:1485–96.
Niimura Y., Nei M. (2005). Evolutionary dynamics of olfactory receptor genes in fishes and tetrapods. Proceedings of the National Academy of Sciences 102:6039–44.
Niimura Y., Nei M. (2007). Extensive gains and losses of olfactory receptor genes in mammalian evolution. PLoS ONE 2:e708.
Olender T., Lancet D., Nebert D. W. (2008). Update on the olfactory receptor (OR) gene superfamily. Human Genomics 3:87.
Pluznick J. L., Zou D-J., Zhang X., Yan Q., Rodriguez-Gil D. J., Eisner C., Wells E., Greer C. A., Wang T., Firestein S., Schnermann J., Caplan M. J. (2009). Functional expression of the olfactory signaling system in the kidney. Proceedings of the National Academy of Sciences 106:2059–2064.
Rhie A., McCarthy S. A., Fedrigo O., Damas J., Formenti G., Koren S., Uliano-Silva M., Chow W., Fungtammasan A., Gedman G.
L., Cantin L. J., Thibaud-Nissen F., Haggerty L., Lee C., Ko B. J., Kim J., Bista I., Smith M., Haase B., Mountcastle J., Winkler S., Paez S., Howard J., Vernes S. C., Lama T. M., Grutzner F., Warren W. C., Balakrishnan C. N., Burt D., George J. M., Biegler M., Iorns D., Digby A., Eason D., Edwards T., Wilkinson M., Turner G., Meyer A., Kautt A. F., Franchini P., Detrich H. W., Svardal H., Wagner M., Naylor G. J. P., Pippel M., Malinsky M., Mooney M., Simbirsky M., Hannigan B. T., Pesout T., Houck M., Misuraca A., Kingan S. B., Hall R., Kronenberg Z., Korlach J., Sović I., Dunn C., Ning Z., Hastie A., Lee J., Selvaraj S., Green R. E., Putnam N. H., Ghurye J., Garrison E., Sims Y., Collins J., Pelan S., Torrance J., Tracey A., Wood J., Guan D., London S. E., Clayton D. F., Mello C. V., Friedrich S. R., Lovell P. V., Osipova E., Al-Ajli F. O., Secomandi S., Kim H., Theofanopoulou C., Zhou Y., Harris R. S., Makova K. D., Medvedev P., Hoffman J., Masterson P., Clark K., Martin F., Howe K., Flicek P., Walenz B. P., Kwak W., Clawson H., Diekhans M., Nassar L., Paten B., Kraus R. H. S., Lewin H., Crawford A. J., Gilbert M. T. P., Zhang G., Venkatesh B., Murphy R. W., Koepfli K-P., Shapiro B., Johnson W. E., Palma F. D., Marques-Bonet T., Teeling E. C., Warnow T., Graves J. M., Ryder O. A., Haussler D., O’Brien S. J., Howe K., Myers E. W., Durbin R., Phillippy A. M., Jarvis E. D. (2021). Towards complete and error-free genome assemblies of all vertebrate species. Nature 592:737–746.
Saito H., Chi Q., Zhuang H., Matsunami H., Mainland J. D. (2009). Odor coding by a mammalian receptor repertoire. Science Signaling 2:ra9–ra9.
Santos P. S. C., Kellermann T., Uchanska-Ziegler B., Ziegler A. (2010). Genomic architecture of MHC-linked odorant receptor gene repertoires among 16 vertebrate species. Immunogenetics 62:569–84.
Silva M. C., Chibucos M., Munro J. B., Daugherty S., Coelho M.
M., Silva J. C. (2020). Signature of adaptive evolution in olfactory receptor genes in Cory’s Shearwater supports molecular basis for smell in procellariiform seabirds. Scientific Reports 10:543.
Sin S. Y. W., Cloutier A., Nevitt G., Edwards S. V. (2022). Olfactory receptor subgenome and expression in a highly olfactory procellariiform seabird. Genetics 220:iyab210.
Steiger S. S., Kuryshev V. Y., Stensmyr M. C., Kempenaers B., Mueller J. C. (2009). A comparison of reptilian and avian olfactory receptor gene repertoires: Species-specific expansion of group γ genes in birds. BMC Genomics 10:446.
Stoddard M. C., Eyster H. N., Hogan B. G., Morris D. H., Soucy E. R., Inouye D. W. (2020). Wild hummingbirds discriminate nonspectral colors. Proceedings of the National Academy of Sciences 117:15112–22.
Vandewege M. W., Mangum S. F., Gabaldón T., Castoe T. A., Ray D. A., Hoffmann F. G. (2016). Contrasting patterns of evolutionary diversification in the olfactory repertoires of reptile and bird genomes. Genome Biology and Evolution 8:470–80.
Wang Z., Pascual-Anaya J., Zadissa A., Li W., Niimura Y., Huang Z., Li C., White S., Xiong Z., Fang D., Wang B., Ming Y., Chen Y., Zheng Y., Kuraku S., Pignatelli M., Herrero J., Beal K., Nozawa M., Li Q., Wang J., Zhang H., Yu L., Shigenobu S., Wang J., Liu J., Flicek P., Searle S., Wang J., Kuratani S., Yin Y., Aken B., Zhang G., Irie N. (2013). The draft genomes of soft-shell turtle and green sea turtle yield insights into the development and evolution of the turtle-specific body plan. Nature Genetics 45:701–6.
Warren W. C., Clayton D. F., Ellegren H., Arnold A. P., Hillier L. W., Künstner A., Searle S., White S., Vilella A. J., Fairley S., Heger A., Kong L., Ponting C. P., Jarvis E. D., Mello C. V., Minx P., Lovell P., Velho T. A. F., Ferris M., Balakrishnan C. N., Sinha S., Blatti C., London S.
E., Li Y., Lin Y-C., George J., Sweedler J., Southey B., Gunaratne P., Watson M., Nam K., Backström N., Smeds L., Nabholz B., Itoh Y., Whitney O., Pfenning A. R., Howard J., Völker M., Skinner B. M., Griffin D. K., Ye L., McLaren W. M., Flicek P., Quesada V., Velasco G., Lopez-Otin C., Puente X. S., Olender T., Lancet D., Smit A. F. A., Hubley R., Konkel M. K., Walker J. A., Batzer M. A., Gu W., Pollock D. D., Chen L., Cheng Z., Eichler E. E., Stapley J., Slate J., Ekblom R., Birkhead T., Burke T., Burt D., Scharff C., Adam I., Richard H., Sultan M., Soldatov A., Lehrach H., Edwards S. V., Yang S-P., Li X., Graves T., Fulton L., Nelson J., Chinwalla A., Hou S., Mardis E. R., Wilson R. K. (2010). The genome of a songbird. Nature 464:757–62.
Zhang G., Li B., Li C., Gilbert M. T. P., Jarvis E. D., Wang J., The Avian Genome Consortium. (2014). Comparative genomic data of the Avian Phylogenomics Project. GigaScience 3.
Zimin A. V., Marçais G., Puiu D., Roberts M., Salzberg S. L., Yorke J. A. (2013). The MaSuRCA genome assembler. Bioinformatics 29:2669–77.

II. EVOLUTION OF OLFACTORY RECEPTOR REPERTOIRES ACROSS AVIAN PHYLOGENY

Abstract

Olfaction is a critical sensory modality, allowing animals to process information from environmental chemicals. It plays a central role in recognizing food, mates, predators, territories, and kin. Olfactory receptors (ORs), a gene family largely expressed in the olfactory epithelium, are responsible for odor detection. To accommodate the incredible variety of odorants in nature, olfactory receptors constitute the largest gene family in vertebrates, with over 1,000 genes in some mammals and over 300 genes in some bird species. Birds are a highly diverse class of vertebrates, inhabiting nearly all land environments, with a broad range of social systems and foraging strategies.
Yet, early 20th century studies dismissed the use of olfaction in birds, a misconception that at one time pervaded sensory biology. More recently, studies have shown that birds indeed rely on olfaction in behavior and ecology, such as when locating food and nesting material, and in individual and species recognition. To contribute to the rapidly expanding knowledge of bird olfaction, in this study we show that birds have many more OR genes than previously detected, and that the majority of bird ORs belong to an OR subgroup unique to birds, called the gamma-c OR subfamily. Using a dataset of 70 long read bird genome assemblies, we show that the highest surveyed OR counts occur in rails (Laterallus jamaicensis) and the lowest in crows, specifically Corvus monedula and Corvus corone. We mapped ancestral OR repertoires and show that OR counts declined early in the Neoaves lineage 60-70 million years ago, but remained high through the Cretaceous-Paleogene extinction event in Palaeognathae and Galloanserae. We show that nocturnality is associated with increased OR counts, and that OR counts correlate positively with olfactory bulb size. Taken together, we show that the OR superfamily in birds experienced dynamic births and deaths throughout the bird tree, reflecting the ability of olfaction to adapt and support bird behavior and ecology.

Introduction

Olfaction is essential for survival and reproduction in many animals. It plays a central role in foraging, avoiding predation, kin recognition, and territorial behavior. In vertebrates, air or waterborne odor molecules are detected with olfactory receptors (ORs), a gene family of G protein-coupled receptors expressed in the olfactory sensory neurons (OSNs) of the olfactory epithelium (OE, Buck and Axel 1991, Strotmann et al. 1992).
To accommodate the incredible variety of odorants in nature, ORs constitute the largest gene family in vertebrates, with over 1,000 genes in some mammals and over 300 genes in some birds (Niimura et al. 2014, Niimura and Nei 2005).

Birds are the most speciose class of terrestrial vertebrates, inhabiting nearly all land environments. Among birds there is high diversity of social structures and foraging strategies, yet birds were long thought to rely on visual rather than olfactory signals (Audubon 1826, Hill 1905). Recent behavioral work in birds has shown important roles for olfaction in foraging, locating nest sites, seed caching behavior, and species recognition, among other behaviors (Buitron and Nuechterlein 1985, Molina-Morales et al. 2020, Bonadonna and Gagliardo 2021, Wikelski et al. 2021, Van Huynh and Rice 2021). Additionally, specific bird species rely on a highly specialized olfactory system for foraging, including Cathartes aura (turkey vulture) and many seabirds (Procellariiformes; Owre and Northington 1961, Grubb 1972).

In addition to the recent surge of interest in how olfaction influences bird behavior, we showed that birds have many more OR genes in their genomes than previously realized (Driver and Balakrishnan 2021, see Chapter 1). Genomic analysis divides bird species’ OR repertoires into three phylogenetic subgroups: alpha, gamma, and gamma-c ORs (Niimura and Nei 2005, Steiger et al. 2009, Driver and Balakrishnan 2021). The alpha and gamma OR subgroups are shared across tetrapods: chicken alpha and gamma ORs form phylogenetic clades with alpha and gamma ORs from amphibians, reptiles, and mammals (Niimura and Nei 2005, Steiger et al. 2009, Vandewege et al. 2016).
This illustrates a degree of sequence conservation in the OR repertoire of these subgroups despite at least 315 million years of divergence between mammal and bird lineages (Laurin and Reisz 1995). In contrast, the gamma-c OR subgroup is present only in birds (Niimura and Nei 2005, Steiger et al. 2009, Driver and Balakrishnan 2021). Previous studies show that the gamma-c OR subfamily was the most abundant OR clade in most species (Steiger et al. 2009, Khan et al. 2015). For example, the gamma-c subfamily constituted over 85% of all OR genes in the zebra finch (60 total gamma-c ORs) and chicken (303 total gamma-c ORs, Driver and Balakrishnan 2021). Phylogenetic analyses of OR repertoires containing multiple bird species reveal that gamma-c ORs cluster into species-specific clades as opposed to showing clear orthologous relationships among species (Zhan et al. 2013, Silva et al. 2020), suggesting possible species-specific roles for gamma-c ORs. Gamma-c ORs within a species also have shorter phylogenetic terminal branch lengths compared to alpha and gamma ORs, showing a high degree of sequence similarity between gamma-c genes (Steiger et al. 2009, Silva et al. 2020). Together, these patterns suggest that gamma-c ORs evolve through a dynamic birth-and-death model of gene evolution, with ubiquitous duplication events occurring over short evolutionary time scales that post-date the divergence of many modern bird genera (Silva et al. 2020). However, without accurate OR counts from species across the bird phylogeny, the patterns of OR turnover among bird lineages remain unknown.

Only in the last five years have numerous long-read bird assemblies become publicly available on NCBI’s GenBank, making accurate comparisons of OR counts across the bird phylogeny possible, including across all three of the major bird lineages: Palaeognathae, Galloanserae, and Neoaves (Bravo et al. 2021). We therefore investigated OR gene family and subfamily counts across the bird phylogeny to detect any lineage-specific gains and losses in ORs. We tested for associations between OR counts and the diverse ecological niches and diets of the species in our dataset. From these results, we hope to understand the evolutionary patterns of olfactory receptors, including the gamma-c subfamily, and gain a better understanding of the importance of smell in the life of birds.

Methods

Assembly selection
We investigated OR diversity in birds by selecting multiple publicly available genome assemblies on GenBank (https://www.ncbi.nlm.nih.gov/genbank/). Assemblies for each species implemented some form of long-read sequencing technology, including Pacific Biosciences or Oxford Nanopore methods. Genomes varied in the assembly methods used and in the size and total number of contigs and scaffolds. We selected only assemblies using long-read sequencing due to the difficulty in recovering total OR counts in assemblies with shorter contigs (Driver and Balakrishnan 2021). In total, we analyzed 70 different bird assemblies, including species from the three main lineages of birds, the Palaeognathae, Galloanserae, and Neoaves. The set of 70 species represents diverse ecology, diets, and trophic levels.

OR identification and classification
To detect putatively functional ORs in the selected genomes, we created a BLAST query with a set of 2,110 OR protein sequences from 6 mammals (Ornithorhynchus anatinus, Didelphis virginiana, Bos taurus, Canis lupus, Rattus norvegicus, Macaca mulatta), 2 birds (Gallus gallus, Taeniopygia guttata), and 1 crocodilian (Gavialis gangeticus). We obtained this query OR set by combining previously published OR subgenomes (Niimura and Nei 2007; Niimura 2009; Vandewege et al. 2016). Using this query file, we performed TBLASTN searches against all 70 bird genomes with a threshold of E < 1e-20. The TBLASTN -num_alignments option was set to 200,000 to capture all genomic ORs similar to a single query sequence. To remove pseudogenized and truncated ORs, we filtered for hits > 250 amino acids long. Where multiple hits fell within 100 bp of each other at a single genomic location, we retained only the hit with the lowest E-value.

After obtaining unique BLAST hits, we extracted the associated nucleotide sequence from the genome as well as 300-bp regions flanking the hit both upstream and downstream. We used a modified Perl script from Beichman et al. (2019) to detect open reading frames (ORFs) within each extracted region (Montague et al. 2014; Beichman et al. 2019). We then aligned these ORFs to each other as well as to the human Olfactory Receptor Family 2 Subfamily J Member 3 (OR2J3) sequence using the E-INS-i default parameters in MAFFT (Katoh and Standley 2013). Using the previously characterized transmembrane domains of OR2J3 as a guide, we removed any sequences that had five or more amino acid insertions or deletions within a transmembrane domain in the alignment (McRae et al. 2012; Beichman et al. 2019). This included ORFs with stop codons appearing prior to the end of the seventh transmembrane domain.
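
The hit-filtering rules described above (E-value threshold, minimum length, and one best hit per genomic location) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the `Hit` record, the `filter_hits` function, and the choice to treat overlapping hits or hits separated by 100 bp or less on the same contig as one location are all hypothetical, and strand is ignored for simplicity.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One TBLASTN hit (hypothetical record; field names are illustrative)."""
    query: str
    subject: str   # contig or scaffold name
    aln_len: int   # alignment length in amino acids
    sstart: int    # hit start on the subject (bp)
    send: int      # hit end on the subject (bp)
    evalue: float


def filter_hits(hits, max_evalue=1e-20, min_len=250, merge_dist=100):
    """Keep long, significant hits, then one best hit per genomic location."""
    # Step 1: apply the E-value and length thresholds.
    kept = [h for h in hits if h.evalue < max_evalue and h.aln_len > min_len]

    # Step 2: group the surviving hits by contig.
    by_subject = defaultdict(list)
    for h in kept:
        by_subject[h.subject].append(h)

    # Step 3: walk each contig left to right, merging hits that overlap or
    # lie within merge_dist bp, and keep the lowest E-value in each cluster.
    best = []
    for subject_hits in by_subject.values():
        subject_hits.sort(key=lambda h: min(h.sstart, h.send))
        cluster_best, cluster_end = None, None
        for h in subject_hits:
            lo, hi = sorted((h.sstart, h.send))
            if cluster_best is not None and lo - cluster_end <= merge_dist:
                cluster_end = max(cluster_end, hi)
                if h.evalue < cluster_best.evalue:
                    cluster_best = h
            else:
                if cluster_best is not None:
                    best.append(cluster_best)
                cluster_best, cluster_end = h, hi
        if cluster_best is not None:
            best.append(cluster_best)
    return best
```

In this sketch, two overlapping hits on the same contig collapse to the one with the lower E-value, while hits that fail either threshold are dropped before clustering.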

Using this alignment, we recorded the position of the first amino acid in the first transmembrane domain. To estimate the location of the ORF start codon, we used modified Perl scripts from Beichman et al. (2019) to find the most appropriate methionine upstream of this recorded transmembrane start position (Montague et al. 2014; Beichman et al. 2019). ORF sequences were then truncated at the 5’ ends to begin with this methionine. This set of ORFs was then aligned using the E-INS-i parameters in MAFFT to a set of T. guttata reference ORs as well as 11 non-OR rhodopsin-like G-protein coupled receptors (non-OR GPCRs) that functioned as an outgroup (Katoh and Standley 2013; Niimura 2013; Vandewege et al. 2016; Beichman et al. 2019). We then used ClustalW to generate a neighbor-joining tree from this alignment with 1000 bootstraps, gaps removed, and Kimura's distance correction (Kimura 1980; Goujon et al. 2010). We then removed any ORFs that were phylogenetically more closely related to the non-OR GPCRs.

We classified all remaining ORFs as putatively functional ORs. We classified bird ORs into subfamilies alpha, gamma, and gamma-c based on the subfamily of the query sequence used to identify the OR and the location of the OR in one of the three distinct avian OR clades (Steiger et al. 2009; Vandewege et al. 2016). We then counted the final number of OR sequences as well as the number of ORs from each subfamily.

Estimation of tree topology
To analyze olfactory receptor counts in a phylogenetic context, we sought to create a phylogeny of the bird species set. The bird species used in this study are a unique set, with no preexisting published phylogenies containing all 70 species in a single tree. Therefore, we used topologies from seven existing phylogenies in the literature.
We used previously published phylogenies to delineate relationships within the orders Accipitriformes and Passeriformes and within the families Falconidae, Corvidae, and Psittacidae (Wright et al. 2008, Haring et al. 2012, Mindell et al. 2018, Wink 2018, Oliveros et al. 2019). For topological relationships between orders, we referenced two established competing phylogenies in the literature (Jarvis et al. 2014, Prum et al. 2015). We created two separate topologies, both following the same topology for within-family level relationships, but one following the inter-order relationships in Jarvis et al. and one following Prum et al. This choice to include multiple competing topologies is due to the contentious nature of the phylogenetic relationships in birds following the Cretaceous-Paleogene extinction (Jarvis et al. 2014). Between 60 and 70 million years ago, the Neoaves lineage of birds underwent rapid diversification to form all modern-day Neoaves orders, and the relative timing of various lineage divergences is disputed between different molecular datasets (Jarvis et al. 2014, Prum et al. 2015). Therefore, we created two topologies, one corresponding to each phylogeny (Jarvis et al. 2014, Prum et al. 2015).

Estimation of branch lengths
To determine the branch lengths for our literature-based topologies, we mined the 70 genomes for ultraconserved elements (UCEs). We used the UCE 5K probe set available in the phyluce pipeline to search for 5,472 UCEs from the 70 bird genomes (Faircloth et al. 2012). We recovered 5,044 UCEs from this search and, using custom shell scripts, extracted these UCEs from the bird assemblies. Using further shell scripts, we assigned the top hit in each bird assembly from each UCE query to a fasta file. In this way, we obtained 5,044 fasta files, each containing the top UCE hit from each bird assembly.
We then aligned the individually grouped UCEs using the E-INS-i parameters in MAFFT (Katoh and Standley 2013). We then ran the FASconCAT Perl script (Kück and Meusemann 2010) to concatenate all UCEs from individual species. Together, this created one concatenated alignment of all UCEs for the 70 bird species.

Using the input topologies and the UCE concatenated alignment, we generated branch lengths using IQ-TREE (Minh et al. 2020). We used a partition file generated by FASconCAT (Kück and Meusemann 2010) to partition the concatenated alignment into each individual input UCE. We set all partitions to the general time-reversible (GTR+FO) substitution model, a parameter-rich substitution model that allows all substitution rates and base frequencies to differ (GTR), with base frequencies optimized by maximum likelihood (+FO) (Minh et al. 2020). We then ran IQ-TREE twice, once for each input topology (Jarvis et al. 2014, Prum et al. 2015). We viewed output trees using iTOL software, and rooted the tree appropriately (Letunic and Bork 2019).

Trait analyses: data collection
For each bird species, we collected a variety of trait data for comparisons with olfactory receptor counts. As a positive control, we tested the previously reported positive correlation between olfactory receptor repertoire size and olfactory bulb size (Steiger et al. 2008). We used a previously published dataset of olfactory bulb measurements, and recorded the ratio of log olfactory bulb volume to both telencephalon volume (the section of the brain where the olfactory bulb is located) and total brain volume (as recorded in Corfield et al. 2015). We omitted species in this analysis that were not represented in the Corfield et al. dataset. Using information from Birds of the World (Billerman et al.
2022), we recorded whether each species is nocturnal or diurnal, has a learned song or innate song, and whether the species is mostly terrestrial or aquatic. When selecting these traits, we considered the possible sensory trade-offs, such as decreased vision in nocturnal species and increased reliance on auditory cues in song-learning species. We also recorded the trophic level and diet of each species from the EltonTraits 1.0 dataset (Wilman et al. 2014). To understand the potential relationship between transposable element proliferation and olfactory receptor counts, we also recorded the estimated overall genome size for each species, using the Animal Genome Size Database (Gregory 2022). Overall variation in genome size is driven in large part by the extent of repeat element proliferation, and repeat element proliferation is associated with gene duplication events (Kidwell 2002). For species without a recorded genome size, sizes were averaged for recorded members of the same family. Species in families without any recorded genome sizes were not included in this analysis.

Trait analyses: phylogenetic generalized least squares
To control for the phylogenetic non-independence of our trait comparisons across bird species, we ran phylogenetic generalized least squares (PGLS) models. The phylogenetic trees with branch lengths generated from the UCE dataset were converted to a correlation structure in R using the ape package function corBrownian to estimate a Brownian motion (BM) model of trait evolution and corMartins to estimate an Ornstein-Uhlenbeck (OU) model (Paradis and Schliep 2019). The OU model may better replicate actual biological processes because it adds one parameter to the “random walk” of BM: an attraction toward a central value that grows stronger the further the trait is from this value.
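
The contrast between the two models can be written compactly. The notation below is the standard textbook form and is an assumption here rather than something taken from the analyses themselves: X(t) is the trait value, W(t) is a standard Wiener process, σ is the rate of random change, θ is the central value, and α ≥ 0 is the strength of attraction toward it.

```latex
\begin{align}
dX(t) &= \sigma \, dW(t) && \text{(BM)} \\
dX(t) &= \alpha \bigl(\theta - X(t)\bigr) \, dt + \sigma \, dW(t) && \text{(OU)}
\end{align}
```

Setting α = 0 in the second equation recovers BM, so OU differs from BM by one extra parameter, and the pull term α(θ − X(t)) grows with the distance of the trait from θ.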
We then used the function gls in the R nlme package (Pinheiro and Bates 2022). This function fits a linear model to the traits of interest while considering either the BM or OU correlation structure as defined by one of the two phylogenetic trees. For each trait comparison, we compared the AIC values of each model to determine whether to select BM or the additional parameter in OU. These methods were repeated for both phylogenetic trees based on the two original topologies.

Phylogenetic analyses of olfactory receptor counts: ancestral state reconstructions
To estimate ancestral states across the bird phylogeny, we ran maximum likelihood estimates under a Brownian motion model using the function fastAnc in the R phytools package (Revell 2012). The character state input to these analyses was the log of the total intact OR count. We also obtained estimates of variance and 95% confidence intervals at each node. We used the phytools function contMap to set the ancestral state reconstructions on both of the phylogenetic trees, and used the setMap and plot functions to generate the tree image (Revell 2012).

Phylogenetic analyses of OR counts: branch birth and death rates
To estimate rates of gene family birth and death across the bird phylogeny, we ran the program BadiRate (Librado et al. 2012). BadiRate uses either a gain-and-death or a birth, death, and innovation stochastic population model in a phylogenetic context. BadiRate has advantages over other gene family birth and death modeling tools, such as being able to set separate birth and death rates, rather than the equal rates assumed by CAFE (Mendes et al. 2020). BadiRate takes a phylogenetic tree and a gene family table as input. The gene family table (or “size file”) can be divided into known subfamily groups, to reduce the extent to which birth and death rates mask each other.
Here, the size file was divided into the total counts for the alpha, gamma, and gamma-c OR subfamilies for each bird species. A free rates branch model was selected, giving each branch its own birth and death rate. The birth and death estimation procedure used was a parsimony-based method. Here, birth, death, and innovation rates are determined by counting gain and loss events inferred from the numbers of family members at internal nodes, using the Wagner parsimony algorithm and two equations from Vieira et al. (2007). We ran a birth and death rates model along all tips and branches across both phylogenies. We recorded the birth and death rates at each branch with particular attention to branches with high birth and death rates.

Results

OR totals
Across all 70 bird species examined, we found a total of 8,880 ORs. This included 551 alpha (6.21% of total) and 2,427 gamma (27.33%) ORs. A total of 5,902 gamma-c ORs constituted nearly two-thirds (66.46%) of the total bird ORs found. Individual species OR repertoires ranged from 7 in Corvus corone and 9 in Corvus monedula to 399 in Laterallus jamaicensis and 385 in Gallus gallus. Alpha OR counts in individual species ranged from 0–21, gamma counts ranged from 5–134 ORs, and gamma-c counts ranged from 0–351 ORs, revealing a wide range of individual OR subfamily counts across species. All ORs grouped into alpha, gamma, or gamma-c subfamilies. Although theta ORs were previously reported in Gallus gallus and Taeniopygia guttata (Steiger et al. 2009), we did not detect any ORs in the theta subfamily.

Ancestral state reconstruction
We generated ancestral state reconstructions of log-transformed total OR counts using maximum likelihood methods and the fastAnc function in phytools. We then visualized the ancestral state reconstructions across both topologies (Fig. 2.1a, b).
Ancestral states were consistently highest in the deepest nodes of the tree, prior to the divergence of Galloanserae from Neoaves. Across the 69 nodes within the phylogeny, five nodes are not within the Neoaves clade. These five nodes are among the six highest ancestral character estimates in both topologies, with ancestral states in these branches ranging from 5.16–5.39 in the Prum topology (the Jarvis topology is consistent with the Prum topology unless stated otherwise). The only Neoavian branch among the six highest ancestral OR counts is the ancestor of the Rallidae. In the Jarvis topology this is the highest ancestral OR count (5.39, 95% CI 4.73–6.05), and in the Prum topology it is the second highest branch (5.37, 95% CI 4.72–6.03). OR counts were consistently the lowest in the Corvidae family, with all three nodes within Corvidae ranking lowest (ancestral state range within Corvidae nodes from the Prum topology 3.38–3.60). Other consistently low-ranking branches were in parrots (for example, node 117 in Prum 4.02, 95% CI 3.41–4.63) and in the node at the common ancestor of all passerines (node 118 Prum 4.23, 95% CI 3.78–4.76).

Fig. 2.1. Ancestral character states of bird ORs in two topologies. (A) Ancestral character states of bird OR repertoires mapped onto the Prum et al. 2015 topology. (B) Ancestral character states of bird OR repertoires mapped onto the Jarvis et al. 2014 topology. We estimated ancestral character states using maximum likelihood methods and the phytools fastAnc function in R (Revell 2012). We mapped colors to the phylogeny using contMap.

Similar to overall OR counts, each of the three OR subfamilies showed a general pattern where some of the highest ancestral state reconstruction estimates occurred in the earliest diverging nodes of the tree, followed by a decline after the divergence of Neoaves (Fig. 2.2a–c).
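
Because the ancestral states in this section are reported on a log scale, it can help to convert them back to approximate gene counts. The sketch below assumes the transformation was the natural log (the default of R's log function); that assumption is consistent with the largest observed repertoire of 399 ORs, whose natural log is about 5.99. The helper name `back_transform` is hypothetical.

```python
import math


def back_transform(log_estimate, ci_low, ci_high):
    """Convert a log-scale estimate and its CI bounds to OR counts.

    Assumes natural-log input; this is an assumption of the sketch.
    """
    return tuple(math.exp(v) for v in (log_estimate, ci_low, ci_high))


# Rallidae ancestor under the Jarvis topology, using the values reported above.
est, low, high = back_transform(5.39, 4.73, 6.05)
print(f"{est:.0f} ORs (95% CI {low:.0f}-{high:.0f})")
# prints: 219 ORs (95% CI 113-424)
```

Under the same assumption, the Corvidae node estimates of 3.38–3.60 correspond to roughly 29–37 ORs. Note that an interval that is symmetric on the log scale becomes asymmetric once converted back to counts.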

In alpha ORs, the node with the highest ancestral character state is at the common ancestor of all modern birds (Prum, alpha = 2.61, 95% CI 1.95–3.26; Fig. 2.2a). Ancestral alpha OR counts also rebound in nodes leading to the carnivorous Accipitridae (for example node 97 Prum topology, 2.58, 95% CI 2.16–2.99). Alpha OR counts also increased at the Gruidae ancestral node (node 86 Prum topology, 2.41, 95% CI 1.98–2.85). Alpha ORs were lowest in Piciformes (node 103 Prum topology, 1.02, 95% CI 0.53–1.52). After the divergence of Psittaciformes and Passeriformes, alpha OR values decrease substantially, with the 11 nodes within this clade showing ancestral states with an average below 1.48.

The gamma OR subfamily also shows high ancestral values at the common ancestor of all modern birds (Prum, 4.00, 95% CI 3.42–4.59; Fig. 2.2b). Gamma ORs are high in different clades throughout the phylogeny, including Accipitridae (e.g., Prum node 100, 4.00, 95% CI 3.64–4.37) and Psittaciformes (e.g., Prum node 116, 3.91, 95% CI 3.50–4.33). Unlike alpha ORs, gamma ORs remain high in parrots, but decline in passerines, and do not recover. The twenty lowest ancestral state reconstructions for gamma ORs are the nodes within passerines (Prum topology, all below 3.02).

For gamma-c ORs, the three highest nodes are within Galloanserae (Prum nodes 71–73; e.g., Prum node 72, 5.05, 95% CI 4.19–5.92; Fig. 2.2c), and the ancestral state is also high at the common ancestor of all birds (Prum 4.85, 95% CI 3.48–6.24). After a decrease in the Neoaves common ancestor, gamma-c counts increase in the ancestor of Gruidae (Prum 4.91, 95% CI 3.90–5.92). Interestingly, despite an overall decrease in ORs in passerines, gamma-c ancestral states increased in one lineage of oscine passerines including Motacillidae, Fringillidae, Thraupidae, and Icteridae (e.g., Prum node 135, 4.75, 95% CI 3.95–5.55).
The smallest gamma-c values were at nodes within Psittaciformes (e.g., Prum node 114, 2.33, 95% CI 1.37–3.29) and Corvidae (e.g., Prum node 121, 2.61, 95% CI 1.75–3.48).

Olfactory receptor birth and death rates
To estimate the birth and death model of gene family evolution across our phylogeny, we ran BadiRate (Librado et al. 2012). We input the three gene subfamilies in separate rows, allowing BadiRate to estimate birth and death while simultaneously considering the three subfamilies independently. Across both topologies, the largest birth rate occurred on the branch leading to suboscines and oscines, following the divergence of Acanthisittidae (birth rate: Prum 58.29, Jarvis 57.71; but see Discussion). Following this branch, additional high birth rates occurred on various passerine lineages. Consistent with other results, a high birth rate occurred on the branch leading to Rallidae (birth rate: Prum 17.88, Jarvis 19.48). Due to different birth and death rates among subfamilies, a small death rate was also found on the Rallidae ancestral branch (death rate: Prum 0.10, Jarvis 0.11). Other high birth rates occurred in Neoaves, including Theristicus caerulescens (birth rate: Prum 20.86, Jarvis 21.69) and at the common ancestor of Aquila chrysaetos and Accipiter gentilis (birth rate: Prum 11.98, Jarvis 13.85).

The highest gene death rates in both topologies occurred in the earliest diverging lineages of Neoaves (Fig. 2.3a, b). However, the relationships among modern Neoaves orders, which diverged 60–70 million years ago, are highly debated and are the main difference between the two topologies. Both topologies showed an initial death rate in the ancestor of all Neoaves (death rate: Prum 7.40, Jarvis 14.10). However, this death rate was lower than subsequent death rates within different Neoaves lineages.
In the Prum topology, two major OR declines occur on these branches, the first following the first divergence within Neoaves, that of Strisores (death rate: Prum 36.61). The Neoavian ORs then undergo a second decline following the divergence of Gruidae, in the lineage leading to Aequorlitornithes, Accipitriformes, and all other Neoaves (death rate: Prum 33.51). The Jarvis topology detects two losses as well, one following the divergence of Strisores (death rate: Jarvis 60.47), and a second following the divergence of Cursorimorphae (Charadriiformes and Gruiformes) and leading to all other Neoaves (death rate: Jarvis 29.36). While both topologies agree that Strisores diverged prior to a decline in OR diversity, the topologies disagree on whether certain orders experienced any, some, or all of this OR loss. For example, Phoenicopteriformes (flamingos) diverge prior to either of these losses in the Jarvis topology, but diverge following both losses in the Prum topology.

Additional high OR death rates occurred within Coraciimorphae following the divergence of trogons (leading to barbets and woodpeckers; death rate: Prum 20.21, Jarvis 43.64; Fig. 2.3). Two passerine lineages also experienced high death rates, Sylviidae (death rate: Prum 30.46, Jarvis 28.99) and Camarhynchus parvulus within Thraupidae (death rate: Prum 19.48, Jarvis 22.68). However, these lineages also experienced different rates of change within subfamilies, as both experienced gene duplications as well (birth rate: Prum Sylviidae 1.70, Camarhynchus 1.44; Jarvis Sylviidae 1.61, Camarhynchus 1.68).

In independent BadiRate runs for the specific OR subfamilies, we detected large death rates consistent with the declines in the subfamily-specific ancestral state reconstructions from fastAnc.
A large reduction in gamma receptors occurred in the Australaves common ancestor (seriemas, falcons, parrots, passerines; death rate: Prum 22.58, Jarvis 19.76), and then again a substantial gamma decline occurred in the branch leading to all passerines (death rate: Prum 17.57, Jarvis 12.02). A large decline in alpha ORs occurred on a single branch leading to parrots and passerines (death rate: Prum 41.56, Jarvis 64.15), but subsequent increases occurred in specific lineages, such as Sylviidae (birth rate: Prum 49.07, Jarvis 46.71). Gamma-c birth and death rates were similar to those in the three-subfamily analysis, given the influence of the large gamma-c counts on that analysis.

Fig. 2.2. Ancestral state reconstruction of bird OR subfamily repertoires, generated by maximum likelihood using the fastAnc function in phytools in R (Revell 2012). Topology displayed is derived from the Prum et al. 2015 topology. (A) Ancestral reconstruction of alpha OR subfamily, (B) gamma OR subfamily, and (C) gamma-c OR subfamily.

Fig. 2.3. Highest OR birth and death rates across two topologies. (A) Prum et al. 2015 topology with the top five largest OR birth rate branches highlighted in green, and the top five largest OR death rate branches highlighted in red. (B) Jarvis et al. 2014 topology with the top five largest birth rate branches highlighted in green and the top five largest OR death rate branches highlighted in red.

Comparisons of OR counts and traits
Using phylogenetic generalized least squares, we compared OR counts across all 70 species with behavioral and ecological phenotypes, including species diet, trophic level, environment type, song learning ability, and nocturnality. None of the eight measured diet types showed a correlation with OR counts.
This lack of significant correlation included frugivore (t = 0.78, P = 0.44, BM model, Prum topology), granivore (t = -0.14, P = 0.89, OU model, Prum), aquatic herbivore (t = 0.21, P = 0.83, BM model, Prum), invertivore (t = 0.22, P = 0.82, OU model, Prum), nectarivore (t = 0.29, P = 0.29, OU model, Prum), omnivore (t = -0.57, P = 0.57, BM model, Prum), scavenger (t = -1.14, P = 0.26, BM model, Prum), and vertivore (t = 1.38, P = 0.17, BM model, Prum). Dividing the dataset into eight diet types may over-partition the data and limit the number of independent gains of the trait across the phylogeny. Therefore, we also looked at trophic level, which more coarsely defines species as herbivores, carnivores, omnivores, and scavengers. Here too, however, we did not see any significant correlation with herbivory (t = -0.85, P = 0.40, BM model, Prum), carnivory (t = -1.08, P = 0.29, BM model, Prum), omnivory (t = -1.78, P = 0.08, BM model, Prum), or scavenging (t = -1.14, P = 0.26, BM model, Prum).

We also detected no significant correlation with OR total count when defining species as terrestrial or aquatic (t = 0.89, P = 0.38, BM model, Prum; t = 0.71, P = 0.48, BM model, Jarvis). In testing for a trade-off with reliance on auditory cues, we saw no correlation between OR counts and song learning (t = -1.22, P = 0.22, BM model, Prum; t = -1.06, P = 0.30, BM model, Jarvis). Both topologies, however, showed a significant positive correlation between OR count and nocturnality (t = 2.83, P = 0.01, BM model, Prum; t = 3.00, P < 0.01, BM model, Jarvis; Fig. 2.4, Prum topology).

In birds, olfactory bulb size is a long-standing measurement used to assess potential reliance on olfactory ability (Cobb 1959, Bang and Cobb 1968, Zelenitsky et al. 2011). Research has also shown a positive relationship between OR repertoire size and olfactory bulb size in birds (Steiger et al. 2008, Steiger et al. 2009, Khan et al. 2015).
Using measurements from 24 species in Corfield et al. 2015, we found a significant positive correlation between the ratio of olfactory bulb size to telencephalon size and OR counts in both topologies (t = 2.19, P = 0.04, BM model, Prum; Fig. 2.5a). We saw the same significant correlation when measuring the ratio of olfactory bulb size to overall brain size and comparing to OR count (t = 2.16, P = 0.04, BM model, Prum; Fig. 2.5b).

We also compared counts of the three OR subfamilies, alpha, gamma, and gamma-c, with the set of traits. The majority of traits compared did not show a significant correlation with OR subfamily counts; however, several traits did show correlations with specific subfamilies. Alpha OR counts were negatively correlated with nectarivory, with low counts in all three nectarivorous species, across two separate gains (Trochilidae, Thraupidae) (t = 2.59, P = 0.01, OU model, Prum). Alpha OR counts were also negatively correlated with song learning, with low alpha OR counts in passerines, parrots, and hummingbirds (t = -3.17, P < 0.01, BM model, Prum). Alpha OR counts increased in granivorous species (t = 2.47, P = 0.02, OU model, Prum). Gamma-c OR counts were also positively correlated with omnivory (t = -2.18, P = 0.03, BM model, Prum).

Fig. 2.4. Comparison of OR counts between diurnal and nocturnal bird species. Using phylogenetic generalized least squares methods, we detected a significant increase in OR counts in nocturnal species (t = 2.83, P = 0.01, BM model, Prum topology).

Fig. 2.5. Significant correlations between OR counts and olfactory bulb size to telencephalon size ratio and whole brain ratio. (A) Comparison of total OR counts and olfactory bulb size relative to telencephalon (Prum topology, t = 2.19, P = 0.04, BM model).
(B) Comparison of total OR counts and olfactory bulb size relative to whole brain (Prum topology, t = 2.16, P = 0.04, BM model). Olfactory bulb, telencephalon, and whole brain measurements from Corfield et al. 2015.

Discussion

Olfactory capabilities are potentially widespread in birds

With the use of long read assemblies, we found that 47 of the 70 species analyzed had a repertoire size of at least 75 ORs, and these 47 species were present across diverse lineages of birds. This is in contrast to the previous study investigating olfactory receptor counts across the bird phylogeny, which found repertoire sizes >75 ORs in only three of 48 species (Khan et al. 2015). We therefore demonstrate the value of long read genomes for characterizing bird OR counts, as well as the potential importance of smell for birds across the phylogeny. These counts were largely driven by the bird-specific gamma-c OR subfamily, which had an average of 84 ORs per species, or 66.46% of the total ORs recovered. Due to the lack of functional studies of bird ORs, we know little about the gamma-c ORs and their role in smell. While orthologs with characterized binding odors exist in mammals and reptiles for bird alpha and gamma ORs (Saito et al. 2009, Steiger et al. 2009, Vandewege et al. 2016), the gamma-c ORs do not have comparable orthologs in other vertebrate classes. In the first investigation of bird olfactory epithelium RNA expression, gamma-c OR expression was detected in the olfactory epithelium, suggesting a role in olfaction (Sin et al. 2022). The role of gamma-c in olfaction would suggest that gamma-c, and overall bird OR counts, are related to a species’ behavioral or ecological reliance on smell, as is shown in other vertebrates, such as mammals (Niimura et al. 2014).

The highest OR count was in Laterallus jamaicensis (black rail) at 399 ORs.
Laterallus jamaicensis lives in dense marsh habitat, is nocturnal, and is one of the most challenging birds in North America to observe (Billerman et al. 2022). The third highest OR count, in Porphyrio hochstetteri (takahe), was also in the family Rallidae. These two species show a remarkably and consistently high repertoire size within Rallidae, which exceeds all other Neoaves species by at least 40 ORs. The second, fourth, and fifth highest OR counts (Gallus gallus 385, Dromaius novaehollandiae 297, Aythya fuligula 273 ORs) include one Palaeognathae and two Galloanserae species, illustrating the phylogenetic retention of high OR counts in these groups and the possible importance of smell in these species through evolutionary history to the present day. Conversely, all four Corvus species in the dataset had relatively low counts, and in particular, the lowest counts among all birds were in Corvus monedula (jackdaw, 10 ORs) and Corvus corone (carrion crow, 7 ORs). This could mean a decreased reliance on smell for these species, and perhaps a tradeoff with other senses, such as increased reliance on vision, or with other energy investments in the brain, for example, increased cognitive performance (Cobb 1959).

OR counts declined in early diverging Neoaves

The most diverse ancestral nodes across the bird phylogeny were in Galloanserae, particularly Anatidae, in the ancestor of all modern birds, and in the common ancestor of all Neognathae. Following the divergence of Neoaves, total OR counts decline, although the two major bird topologies (Jarvis et al. 2014, Prum et al. 2015) disagree on the placement of these declines. Both topologies agree that prior to the divergence of Strisores, ancestral OR counts remain high. Following the Neoaves radiation, total OR counts do not recover to their previous states, with one exception, in the Rallidae.
In the Jarvis topology, the ancestral state of the common ancestor of Laterallus jamaicensis and Porphyrio hochstetteri exceeds ancestral states prior to the divergence of Neoaves. The ancestor of these two rail species possibly adapted to dense marsh habitat with limited visibility, where potential prey items are beneath substrate, ecological conditions that could promote olfactory abilities. This pattern across the phylogeny shows that in the birth and death model of gene family evolution, gene counts can decline substantially, but recover under specific circumstances, likely driven by ecological selection. Consistent with the OR counts in extant species, the lowest ancestral state OR counts occurred in crows, with the three ancestral nodes in this clade having the lowest states across all birds. Due to our species sampling, it is unclear whether these low ancestral character states are unique to the genus Corvus, or if these low counts extend to other corvids, such as jays or magpies. Mining additional Corvidae species for OR counts could help determine where this decline took place on the phylogeny. Behavioral experiments indicate that magpies (Pica hudsonia) can more easily find cached food items scented with cod liver than unscented food, suggesting that perhaps the very low OR counts arose within Corvidae, perhaps in an ancestral Corvus species (Buitron and Nuechterlein 1985).

Ancestral states of OR subfamily counts show that the history of each subfamily is unique, and that the composition of the total OR counts in birds has changed over evolutionary time. Three clades in particular show low alpha counts: the Trochilidae (hummingbird) clade, the hornbill, bee-eater, and woodpecker clade within Coraciimorphae, and the Psittaciformes and Passeriformes clade.
These clades show consistently low levels of alpha ORs, despite including species with diverse diets and habitats. A similar result is present in the Passeriformes clade for gamma ORs, and in the Psittaciformes clade for gamma-c ORs. Although we do not know the specific reason why these clades saw declines in these OR subfamilies, it is possible that the ORs no longer detect relevant odors in these clades, while other OR subfamilies retain relevance. Like crows, these clades, particularly woodpeckers, parrots, and passerines, are considered to have highly developed cognitive abilities, a potential tradeoff with olfactory abilities (Cobb 1959).

Dynamic birth and death of ORs across the bird tree

Our analysis through Badirate detected non-zero birth and death rates for many branches across the tree, showing a dynamic birth and death model of gene family evolution for ORs across birds. For total OR counts, the largest expansion occurred on the branch separating the passerine Acanthisitti from suboscines and oscines. This result is partially due to a very low OR count in Acanthisitta chloris, which has the lowest OR count among all species (19 ORs), aside from crows. This could potentially reflect an issue with obtaining the original Acanthisitta chloris DNA sample, as this species is restricted to New Zealand and may be difficult to access. However, the contig N50 and total number of contigs for the assembly were consistent with other assemblies used, and the assembly was created with the standard Vertebrate Genomes Project pipeline. If this large birth rate is indeed accurate, then following the divergence of Acanthisittidae (New Zealand wrens), the ancestor of oscine and suboscine passerines experienced a birth rate nearly three times higher than at any other point in the bird phylogeny.
This birth rate substantially impacted the gamma-c subfamily, as alpha and gamma ORs remain low across all passerines.

The ancestor of rails also had a high birth rate, and was one of the few branches on the tree that had both a high birth rate and one of the highest ancestral state reconstructions. This suggests that an ancestral rail had one of the largest OR repertoires in birds, and that many of the ORs in this repertoire arose recently, following the divergence from Gruidae (cranes). The gene birth rate along this branch was also high in the gamma-c OR subfamily (birth rate = 43.15). This raises the possibility of an ancestral Gruiformes bird entering marsh habitat, experiencing gamma-c OR duplications, and retaining those genes to aid in olfaction. This is in contrast to other bird species with high OR counts and high ancestral state reconstructions, such as the chicken. The branch leading to the chicken does have a small birth rate (birth rate = 1.41), but the chicken’s large OR repertoire size is largely due to the maintained ancestral state throughout Galloanserae evolution.

The highest death rates in birds occurred along the early diverging branches in Neoaves. In the Prum topology, a high OR death rate occurs following the divergence of Strisores, then Columbaves (cuckoos, turacos, doves, and sandgrouse) diverge, and then a second high death rate occurs in the rest of Neoaves. In the Jarvis topology, the loss is positioned following the divergence of Phoenicopteriformes (flamingos), Columbiformes (doves), Pterocliformes (sandgrouse), and Strisores. Following this death, the lineage leading to Charadriiformes and Gruiformes (including rails) diverges, and the branch leading to all other Neoaves experiences a high rate of gene death. Between 60 and 70 million years ago, the Neoaves underwent a massive radiation, splitting into all of today’s modern orders (Jarvis et al. 2014, Prum et al. 2015).
This rapid radiation is difficult for phylogeneticists to resolve, and remains unresolved even with a variety of approaches, including the use of both coding and non-coding DNA regions to construct trees (Suh 2016). The time frame of this radiation includes the Cretaceous-Paleogene extinction, and following the extinction of non-avian dinosaurs, birds likely began to move into newly available niches. A previous study measuring olfactory bulb size from fossilized Cretaceous bird species showed that bulb size increased in early Neornithine and Palaeognathae evolution, and perhaps olfaction aided these species during the Cretaceous-Paleogene extinction event (Zelenitsky et al. 2011). However, the authors detect only one olfactory bulb increase in early diverging Neoaves branches, in the branch leading to Gruiformes, Procellariiformes, and other mostly aquatic lineages, although this comparison used a topology that we did not consider in the current study (Zelenitsky et al. 2011). Many other early diverging Neoaves lineages experienced a decrease in olfactory bulb size (Zelenitsky et al. 2011). Therefore, although Palaeognathae and Galloanserae retained a large olfactory bulb that originated in the ancestor of all modern birds, this comparatively large olfactory bulb decreased in Neoaves. Our OR counts agree with this pattern: although olfaction may have aided Palaeognathae and Galloanserae through the Cretaceous extinction, our results do not support the idea that smell played a major role in the Neoaves radiation during this same time, but rather suggest that reliance on smell decreased in most Neoaves lineages.

In OR subfamily birth and death rate analyses, Badirate similarly detected the decrease in alpha OR counts, occurring in the ancestor of parrots and passerines, and in gamma ORs, occurring in two ancestral branches, including the ancestor of passerines.
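The birth and death model referred to throughout this section treats each gene copy as duplicating or being lost at constant per-gene rates along a branch. The Python sketch below illustrates that general model only. It is not Badirate's likelihood machinery, and the rates, starting repertoire size, and branch length are hypothetical values that are not in the units of the estimates reported here. It compares the expected family size under a linear birth-death process, n0 * exp((birth - death) * t), with the mean of stochastic simulations.

```python
import math
import random

def expected_family_size(n0, birth, death, t):
    """Expected gene count after time t under a linear birth-death process."""
    return n0 * math.exp((birth - death) * t)

def simulate_family_size(n0, birth, death, t, rng):
    """Simulate one branch with the Gillespie algorithm and return the final gene count."""
    n, elapsed = n0, 0.0
    while True:
        total_rate = n * (birth + death)
        if total_rate <= 0:            # family extinct, or no events possible
            return n
        elapsed += rng.expovariate(total_rate)
        if elapsed > t:                # next event falls beyond the end of the branch
            return n
        if rng.random() < birth / (birth + death):
            n += 1                     # duplication
        else:
            n -= 1                     # gene loss

# Hypothetical ancestral repertoire of 100 ORs on a branch of length 10.
rng = random.Random(42)
runs = [simulate_family_size(100, 0.02, 0.01, 10.0, rng) for _ in range(500)]
print("expected:", expected_family_size(100, 0.02, 0.01, 10.0))
print("simulated mean:", sum(runs) / len(runs))
```

When the death rate exceeds the birth rate the expected repertoire shrinks along the branch, and when the birth rate is higher it grows, which is the sense in which expansions and contractions are described above.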
Despite the low alpha and gamma counts in passerines, as mentioned earlier, the common passerine ancestor (excluding Acanthisittidae) experienced a radiation of gamma-c ORs. This high gamma-c birth rate is furthered by marginal lineage-specific gamma-c gains, including in the Icteridae ancestor (birth rate = 2.84), and again in the icterid Agelaius phoeniceus (birth rate = 4.39). It is possible that over evolutionary time species shift their reliance among OR subfamilies to accommodate different ecologies.

Olfactory bulb size correlates with OR repertoire counts

Consistent with previous work, olfactory bulb to brain size ratios positively correlated with total OR repertoire counts. This further supports the idea that both measurements can reliably be used as a proxy for olfactory ability. One outlier in the comparison of bulb to brain size ratio with OR count was the chicken, which had a much larger OR count relative to its olfactory bulb size. While it is uncertain why the chicken is such an outlier compared to other species (Fig. 5a,b), the DNA reference source for the chicken assembly used here for OR counts was from a domesticated bird. Previous studies show that domesticated mammals, including rats, llamas, sheep, pigs, and dogs, have a lower volume of olfactory structures relative to wild “ancestral” species (Kruska 1980, Kruska 1988). Our measurements of the olfactory bulb are slightly different, and consider the volume of the olfactory bulb relative to the telencephalon or overall brain (Corfield et al. 2015); however, decreased olfactory bulb volume could lead to the outlier position of the chicken that we observed.
Although other species in both the OR count dataset and the olfactory bulb size dataset come from domesticated birds (for example, Taeniopygia guttata), people in the Indus Valley are estimated to have domesticated the chicken about 4,500 years ago, far earlier than any other bird species (Tixier-Boichard et al. 2011). Although there is no evidence of how OR repertoire size is impacted by domestication, it is possible that OR repertoire does not change at the same rate as olfactory bulb size in response to domestication. In this case, the chicken may have a reduced olfactory bulb, while retaining much of its ancestral OR repertoire. Additional studies on different chicken breeds, as well as wild red junglefowl (Gallus gallus), could help reveal the impact of domestication on OR counts. It is also unclear whether Corfield et al. obtained a wild red junglefowl or a domesticated chicken for their morphological analysis (Corfield et al. 2015).

Of the 24 species examined for olfactory bulb size, the two smallest olfactory bulbs were in the genus Corvus (C. moneduloides and C. corone), which matched our extremely low counts of ORs in Corvus. The olfactory bulb of Corvus macrorhynchos is very small relative to the cerebral hemisphere and in one study did not have distinct posterior conchae present (Yokosuba et al. 2009, Kondoh et al. 2011). Although it can be challenging to define “intelligence” across many different bird species, it has been suggested that “intelligent” birds have smaller olfactory bulbs (Cobb 1959).

Diet and song learning are not correlated with OR counts

Across all observed diets and trophic niches, there were no correlations with total OR counts. This was true for a comparison that assigned species to one of eight potential diet niches and a comparison that broke species into four trophic levels.
The lack of a relationship was surprising, because different diet types presumably involve specific foraging methods that vary in their reliance on olfaction. This negative result could be due to the potentially diverse ways to arrive at a given diet. For example, the diet category ‘invertivore’ encompasses a variety of different foraging strategies. Apus apus (swift) is a diurnal aerial hunter, while Cuculus canorus (cuckoo) gleans arboreal insects, and Picoides pubescens (woodpecker) excavates insects from tree bark (Billerman et al. 2022). However, species with these diverse foraging behaviors are all classified as ‘invertivores’ in EltonTraits (Wilman et al. 2014). Therefore, OR totals may better correlate with particular foraging strategies as opposed to diet. More species should be surveyed for OR counts to increase the number of species representing each foraging strategy.

Another possible explanation for the lack of correlation between diet and OR counts is that dietary changes do not greatly impact the total number of ORs. It is possible that shifts in olfactory ability could occur due to changes in the sensitivity of existing ORs. Alternatively, only a small number of OR gains and losses could potentially confer great changes to olfactory abilities, but not be reflected when looking at the comparatively large number of ORs in the total repertoire. For example, the three subfamilies could permit the detection of different types of odors, and so a change in diet would only impact a given group of ORs or subfamily. We show that alpha ORs significantly decreased in both nectarivorous lineages (in Trochilidae and Thraupidae). Our PGLS analysis did not include zero values, and the hummingbird Calypte anna had zero alpha ORs. Therefore, our result, which only considers low counts in Phaethornis and Diglossa, is further supported by Calypte counts.
We also saw an increase in alpha ORs for granivorous species, and an increase in gamma-c ORs in omnivorous species, further suggesting that dietary shifts may fine-tune subfamily repertoires, as opposed to drastically altering total counts.

Similar to diet, song learning did not correlate with overall OR counts, but did correlate with a decrease in alpha OR counts. This was due to exceptionally low alpha OR counts in hummingbirds, parrots, and oscine passerines. Woodpeckers also had very low alpha OR counts and are not song learners by standard measures, but the forebrain nuclei linked to their territorial drum displays mirror those that enable vocal learning in songbirds (Schuppe et al. 2022). Therefore, song learning may have a relationship with decreases in alpha ORs even more so than detected in our traditional trait analysis.

Nocturnality increases total OR counts

Nocturnality was positively associated with OR counts. Our results agree with a previous morphological comparison showing that nocturnality increases olfactory bulb size in birds, whereas other ecological variables, including diet, do not show a correlation (Healy and Guilford 1990). Across the phylogeny, there were five nocturnal species and four presumed gains of nocturnality: one in Strisores (Caprimulgus europaeus and Nyctibius grandis), one in Rallidae (Laterallus jamaicensis), one in Strigiformes (Tyto alba), and one in Psittaciformes (Strigops habroptila). The increase in OR repertoire in nocturnal species is significant despite the diverse behavior and ecology of the nocturnal species included. The Strisores species are aerial insectivores, L. jamaicensis is a skulking invertivore in dense marsh habitat, T. alba is primarily a predator of mammals, and S. habroptila is a giant herbivorous flightless parrot (Billerman et al. 2022).
Despite disparate underlying ecology, a nocturnal lifestyle is a strong transition that greatly impacts the sensory biology of the organism. For example, owls lack a functional UV-sensing shortwave sensitive 1 opsin but have greater hearing abilities (Grothe 2018, Höglund et al. 2019). Therefore, although different diets may give rise to a variety of foraging methods that may influence a species’ sensory biology in various ways, nocturnality has a consistent signal in birds, where olfactory receptors significantly increase in number.

No evidence for influence of genome size on OR count

While high OR counts may reflect a true reliance on olfaction, we wanted to know if the propensity of a genome to experience duplications, as measured by total genome size, was also responsible for OR count. Many ORs are located in large clusters on unmapped contigs, flanked by repetitive DNA and transposable elements (Glusman et al. 2000, Vandewege et al. 2016, Driver and Balakrishnan 2021). In humans, large OR clusters are interspersed with repetitive elements, particularly LINEs (Glusman et al. 2000). LINEs are a common source of reverse transcriptase and can retrotranspose intron-less paralogs into genomic DNA (Kidwell 2002). Additionally, transposable elements or DNA replication slippage could increase DNA content, and carry local ORs along in the duplication. However, we did not see any relationship between OR counts and overall genome size. Although a weak correlation did appear, following phylogenetic correction we did not see a significant result. This was somewhat surprising, since we noticed that hummingbirds, particularly Phaethornis superciliosus, have low OR counts and hummingbirds have the smallest genome sizes of any bird family (Gregory et al. 2009).
However, the correlation between OR counts and genome size is not significant when tested across the 70 species presented here.

Conclusion

We show a high level of dynamism in OR repertoire counts across the bird phylogeny. We show that some birds have large OR repertoires, such as rails, where the total OR count of Laterallus jamaicensis, at 399, is comparable to repertoire sizes at the lower end of the mammalian range, including primates (Niimura et al. 2014). We also show that some birds have very small OR repertoires, such as crows in the genus Corvus, which, consistent with evidence from the morphological features of the crow olfactory system, likely have a poor sense of smell (Kondoh et al. 2011). In between these high and low OR repertoire extremes are ever-changing OR ancestral character states and branches experiencing OR gene family births and deaths. Included among these branches is a high rate of death in the early diverging lineages of Neoaves, about 60 to 70 million years ago. Through evolutionary time, OR expansions and contractions of various degrees appear frequently in the phylogeny, showing the high turnover of a gene family undergoing the birth and death model of evolution. We show that nocturnality is an ecological factor that increases OR counts during evolution. We also find that increased OR counts are associated with a larger olfactory bulb, further suggesting that OR counts can be used as a proxy for the reliance of a species on smell.

Although we have characterized bird OR genomic repertoires in this study, not all OR genes will be functional or relevant to the olfactory system (Maßberg and Hatt 2018). In mammals, many ORs are expressed in tissues outside of the olfactory system, with roles that include environmental responses in the skin and chemotaxis in sperm (Maßberg and Hatt 2018).
Therefore, future gene expression studies of the bird olfactory epithelium can pinpoint which ORs within the genomic repertoire may be involved in olfaction. Finally, even for bird ORs expressed in the olfactory epithelium, it is unclear what odors cause a response. This is particularly true for the gamma-c ORs, which have no clear orthology to ORs in other vertebrate classes. For gamma-c ORs, binding properties are entirely unknown and cannot be compared with mammalian orthologs that may have known response odors. The subfamily-specific births and deaths across the phylogeny are often difficult to explain using only bird ecology and behavior. Functional work in the future will allow us to better make sense of births and deaths across the tree, for example, why hummingbirds, parrots, and passerines have few alpha ORs. Our study provides the genomic data to further investigate the individual ORs within our counts, to better understand how birds use smell in their ecology and behavior.

References

Audubon J. J. (1826). Account of the habits of the turkey buzzard, Vultur aura, particularly with the view of exploding the opinion generally entertained of its extraordinary power of smelling. Edinburgh New Philosophical Journal 2:172–184.
Bang B. G., Cobb S. (1968). The size of the olfactory bulb in 108 species of birds. The Auk 85:55–61.
Beichman A. C., Koepfli K. P., Li G., Murphy W., Dobrynin P., Kliver S., Tinker M. T., Murray M. J., Johnson J., Lindblad-Toh K., Karlsson E. K., Lohmueller K. E., Wayne R. K. (2019). Aquatic adaptation and depleted diversity: a deep dive into the genomes of the sea otter and giant otter. Molecular Biology and Evolution 36:2631–2655.
Billerman S. M., Keeney B. K., Rodewald P. G., Schulenberg T. S. (2022). Birds of the World. Cornell Laboratory of Ornithology, Ithaca, NY USA.
https://birdsoftheworld.org/bow/home
Bonadonna F., Gagliardo A. (2021). Not only pigeons: avian olfactory navigation studied by satellite telemetry. Ethology Ecology & Evolution 33:273–289.
Bravo G. A., Schmitt C. J., Edwards S. V. (2021). What have we learned from the first 500 avian genomes? Annual Review of Ecology, Evolution, and Systematics 52:611–639.
Buck L., Axel R. (1991). A novel multigene family may encode odorant receptors: a molecular basis for odor recognition. Cell 65:175–187.
Buitron D., Nuechterlein G. L. (1985). Experiments on olfactory detection of food caches by black-billed magpies. Condor 87:92–95.
Cobb S. (1959). A note on the size of the avian olfactory bulb. Epilepsia 1:394–402.
Corfield J. R., Price K., Iwaniuk A. N., Gutierrez-Ibañez C., Birkhead T., Wylie D. R. (2015). Diversity in olfactory bulb size in birds reflects allometry, ecology, and phylogeny. Frontiers in Neuroanatomy 9:102.
Driver R. J., Balakrishnan C. N. (2021). Highly contiguous genomes improve the understanding of avian olfactory receptor repertoires. Integrative & Comparative Biology 61:1281–1290.
Faircloth B. C., McCormack J. E., Crawford N. G., Harvey M. G., Brumfield R. T., Glenn T. C. (2012). Ultraconserved elements anchor thousands of genetic markers spanning multiple evolutionary timescales. Systematic Biology 61:717–726.
Glusman G., Bahar A., Sharon D., Pilpel Y., White J., Lancet D. (2000). The olfactory receptor gene superfamily: data mining, classification, and nomenclature. Mammalian Genome 11:1016–1023.
Goujon M., McWilliam H., Li W., Valentin F., Squizzato S., Paern J., Lopez R. (2010). A new bioinformatics analysis tools framework at EMBL-EBI. Nucleic Acids Research 38:W695–W699.
Gregory T. R., Andrews C. B., McGuire J. A., Witt C. C. (2009). The smallest avian genomes are found in hummingbirds.
Proceedings of the Royal Society B 276:3753–3757.
Gregory T. R. (2022). Animal Genome Size Database. http://www.genomesize.com
Grothe B. (2018). How the barn owl computes auditory space. Trends in Neurosciences 41:115–117.
Grubb T. C. (1972). Smell and foraging in shearwaters and petrels. Nature 237:404–405.
Haring E., Däubi B., Pinsker W., Kryukov A., Gamauf A. (2012). Genetic divergences and intraspecific variation in corvids of the genus Corvus (Aves: Passeriformes: Corvidae) – a first survey based on museum specimens. Journal of Zoological Systematics and Evolutionary Research 50:230–246.
Healy S., Guilford T. (1990). Olfactory-bulb size and nocturnality in birds. Evolution 44:339–346.
Hill A. (1905). Can birds smell? Nature 71:318–319.
Höglund J., Mitkus M., Olsson P., Lind O., Drews A., Bloch N. I., Kelber A., Strandh M. (2019). Owls lack UV-sensitive cone opsin and red oil droplets, but see UV light at night: retinal transcriptomes and ocular media transmittance. Vision Research 158:109–119.
Jarvis E. D., Mirarab S., Aberer A. J., Li B., Houde P., Li C., Ho S. Y., Faircloth B. C., Nabholz B., Howard J. T., Suh A., Weber C. C., da Fonseca R. R., Li J., Zhang F., Li H., Zhou L., Narula N., Liu L., Ganapathy G., Boussau B., Bayzid M. S., Zavidovych V., Subramanian S., Gabaldón T., Capella-Gutiérrez S., Huerta-Cepas J., Rekepalli B., Munch K., Schierup M., Lindow B., Warren W. C., Ray D., Green R. E., Bruford M. W., Zhan X., Dixon A., Li S., Li N., Huang Y., Derryberry E. P., Bertelsen M. F., Sheldon F. H., Brumfield R. T., Mello C. V., Lovell P. V., Wirthlin M., Schneider M. P., Prosdocimi F., Samaniego J. A., Vargas Velazquez A. M., Alfaro-Núñez A., Campos P. F., Petersen B., Sicheritz-Ponten T., Pas A., Bailey T., Scofield P., Bunce M., Lambert D. M., Zhou Q., Perelman P., Driskell A.
C., Shapiro B., Xiong Z., Zeng Y., Liu S., Li Z., Liu B., Wu K., Xiao J., Yinqi X., Zheng Q., Zhang Y., Yang H., Wang J., Smeds L., Rheindt F. E., Braun M., Fjeldsa J., Orlando L., Barker F. K., Jønsson K. A., Johnson W., Koepfli K. P., O'Brien S., Haussler D., Ryder O. A., Rahbek C., Willerslev E., Graves G. R., Glenn T. C., McCormack J., Burt D., Ellegren H., Alström P., Edwards S. V., Stamatakis A., Mindell D. P., Cracraft J., Braun E. L., Warnow T., Jun W., Gilbert M. T., Zhang G. (2014). Whole-genome analyses resolve early branches in the tree of life of modern birds. Science 346:1320–1331.
Joseph L., Toon A., Schirtzinger E. E., Wright T. F., Schodde R. (2012). A revised nomenclature and classification for family-group taxa of parrots (Psittaciformes). Zootaxa 3205:26–40.
Katoh K., Standley D. M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution 30:772–780.
Khan I., Yang Z., Maldonado E., Li C., Zhang G., Gilbert M. T. P., Jarvis E. D., O’Brien S. J., Johnson W. E., Antunes A. (2015). Olfactory receptor subgenomes linked with broad ecological adaptations in Sauropsida. Molecular Biology and Evolution 32:2832–2843.
Kidwell M. G. (2002). Transposable elements and the evolution of genome size in eukaryotes. Genetica 115:49–63.
Kimura M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16:111–120.
Kondoh D., Nashimoto M., Kanayama S., Nakamuta N., Taniguchi K. (2011). Ultrastructural and histochemical properties of the olfactory system in the Japanese jungle crow, Corvus macrorhynchos. Journal of Veterinary Medical Science 8:1007.
Kruska D. (1988). Effects of domestication on brain structure and behavior in mammals.
Human Evolution 3:473–485.
Kück P., Meusemann K. (2010). FASconCAT: convenient handling of data matrices. Molecular Phylogenetics and Evolution 56:1115–1118.
Laurin M., Reisz R. R. (1995). A reevaluation of early amniote phylogeny. Zoological Journal of the Linnean Society 113:165–223.
Letunic I., Bork P. (2019). Interactive tree of life (iTOL) v4: recent updates and new developments. Nucleic Acids Research 47:W256–W259.
Librado P., Vieira F. G., Rozas J. (2012). Badirate: estimating family turnover rates by likelihood-based methods. Bioinformatics 28:279–281.
Maßberg D., Hatt H. (2018). Human olfactory receptors: novel cellular functions outside of the nose. Physiological Reviews 98:1739–1763.
McRae J. F., Mainland J. D., Jaeger S. R., Adipietro K. A., Matsunami H., Newcomb R. D. (2012). Genetic variation in the odorant receptor OR2J3 is associated with the ability to detect the “grassy” smelling odor, cis-3-hexen-1-ol. Chemical Senses 37:585–593.
Mendes F. K., Vanderpool D., Fulton B., Hahn M. W. (2020). CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics 36:5516–5518.
Mindell D. P., Fuchs J., Johnson J. A. (2018). Phylogeny, taxonomy, and geographic diversity of diurnal raptors: Falconiformes, Accipitriformes, and Cathartiformes. In: Sarasola J., Grande J., Negro J. (eds) Birds of Prey. Springer, Cham. https://doi.org/10.1007/978-3-319-73745-4_1
Minh B. Q., Schmidt H. A., Chernomor O., Schrempf D., Woodhams M. D., von Haeseler A., Lanfear R. (2020). IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Molecular Biology and Evolution 37:1530–1534.
Molina-Morales M., Castro J., Albaladejo G., Parejo D. (2020). Precise cache detection by olfaction in a scatter-hoarder bird. Animal Behaviour 167:185–191.
Montague M. J., Li G., Gandolfi B., Khan R., Aken B.
L., Searle S. M. J., Minx P., Hillier L. W., Koboldt D. C., Davis B. W., Driscoll C. A., Barr C. S., Blackistone K., Quilez J., Lorente-Galdos B., Bonet-Marques T., Alkan C., Thomas G. W. C., Hahn M. W., Menotti-Raymond M., O’Brien S. J., Wilson R. K., Lyons L. A., Murphy W. J., Warren W. C. (2014). Comparative analysis of the domestic cat genome reveals genetic signatures underlying feline biology and domestication. Proceedings of the National Academy of Sciences 111:17230–17235.
Niimura Y., Nei M. (2005). Evolutionary dynamics of olfactory receptor genes in fishes and tetrapods. Proceedings of the National Academy of Sciences 102:6039–6044.
Niimura Y., Nei M. (2007). Extensive gains and losses of olfactory receptor genes in mammalian evolution. PLoS ONE 2:e708.
Niimura Y. (2009). On the origin and evolution of vertebrate olfactory receptor genes: comparative genome analysis among 23 chordate species. Genome Biology and Evolution 1:34–44.
Niimura Y. (2013). Identification of olfactory receptor genes from mammalian genome sequences. Methods in Molecular Biology 1003:39–49.
Niimura Y., Matsui A., Touhara K. (2014). Extreme expansion of the olfactory receptor gene repertoire in African elephants and evolutionary dynamics of orthologous gene groups in 13 placental mammals. Genome Research 24:1485–1496.
Oliveros C. H., Field D. J., Ksepka D. T., Barker F. K., Aleixo A., Andersen M. J., Alström P., Benz B. W., Braun E. L., Braun M. J., Bravo G. A., Brumfield R. T., Chesser R. T., Claramunt S., Cracraft J., Cuervo A. M., Derryberry E. P., Glenn T. C., Harvey M. G., Hosner P. A., Joseph L., Kimball R. T., Mack A. L., Miskelly C. M., Peterson A. T., Robbins M. B., Sheldon F. H., Silveira L. F., Smith B. T., White N. D., Moyle R. G., Faircloth B. C. (2019). Earth history and the passerine superradiation.
Proceedings of the National Academy of Sciences 116:7916–7925.
Owre O. T., Northington P. O. (1961). Indication of the sense of smell in the turkey vulture, Cathartes aura (Linnaeus), from feeding tests. American Midland Naturalist 66:200–205.
Paradis E., Schliep K. (2019). Ape 5.0: an environment for modern phylogenetics and evolutionary analysis in R. Bioinformatics 35:526–528.
Pinheiro J., Bates D., DebRoy S., Sarkar D., R Core Team. (2021). nlme: linear and nonlinear mixed effects models. R package version 3.1-144 https://CRAN.R-project.org/package=nlme.
Prum R. O., Berv J. S., Dornburg A., Field D. J., Townsend J. P., Moriarty Lemmon E., Lemmon A. R. (2015). A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature 526:569–573.
Revell L. J. (2012). Phytools: an R package for phylogenetic comparative biology (and other things). Methods in Ecology and Evolution 3:217–223.
Saito H., Chi Q., Zhuang H., Matsunami H., Mainland J. D. (2009). Odor coding by a mammalian receptor repertoire. Science Signaling 2:ra9.
Schuppe E. R., Cantin L., Chakraborty M., Biegler M. T., Jarvis E. R., Chen C. C., Hara E., Bertelsen M. F., Witt C. C., Jarvis E. D., Fuxjager M. F. (2022). Forebrain nuclei linked to woodpecker territorial drum displays mirror those that enable vocal learning in songbirds. PLoS Biology 20:e3001751.
Silva M. C., Chibucos M., Munro J. B., Daugherty S., Coelho M. M., Silva J. C. (2020). Signature of adaptive evolution in olfactory receptor genes in Cory’s shearwater supports molecular basis for smell in procellariiform seabirds. Scientific Reports 10:543.
Sin S. Y. W., Cloutier A., Nevitt G., Edwards S. V. (2022). Olfactory receptor subgenome and expression in a highly olfactory procellariiform seabird. Genetics 220:iyab210.
Steiger S. S., Fidler A. E., Valcu M., Kempenaers B. (2008).
Avian olfactory receptor gene repertoires: evidence for a well-developed sense of smell in birds? Proceedings of the Royal Society B 275:2309–2317.
Steiger S. S., Kuryshev V. Y., Stensmyr M. C., Kempenaers B., Mueller J. C. (2009). A comparison of reptilian and avian olfactory receptor gene repertoires: species-specific expansion of group γ genes in birds. BMC Genomics 10:446.
Strotmann J., Wanner I., Krieger J., Raming K., Breer H. (1992). Expression of odorant receptors in spatially restricted subsets of chemosensory neurons. Neuroreport 3:1053–1056.
Suh A. (2016). The phylogenomic forest of bird trees contains a hard polytomy at the root of Neoaves. Zoologica Scripta 45:50–62.
Tixier-Boichard M., Bed’hom B., Rognon X. (2011). Chicken domestication: from archeology to genomics. Comptes Rendus Biologies 3:197–204.
Van Huynh A., Rice A. M. (2021). Odor preferences in hybrid chickadees: implications for reproductive isolation and asymmetric introgression. Behavioral Ecology and Sociobiology 75:129.
Vandewege M. W., Mangum S. F., Gabaldón T., Castoe T. A., Ray D. A., Hoffmann F. G. (2016). Contrasting patterns of evolutionary diversification in the olfactory repertoires of reptile and bird genomes. Genome Biology and Evolution 8:470–480.
Vieira F. G., Sánchez-Gracia A., Rozas J. (2007). Comparative genomic analysis of the odorant-binding protein family in 12 Drosophila genomes: purifying selection and birth-and-death evolution. Genome Biology 8:R235.
Wikelski M., Quetting M., Cheng Y., Fiedler W., Flack A., Gagliardo A., Salas R., Zannoni N., Williams J. (2021). Smell of green leaf volatiles attracts white storks to freshly cut meadows. Scientific Reports 11:12912.
Wilman H., Belmaker J., Simpson J., de la Rosa C., Rivadeneira M. M., Jetz W. (2014).
EltonTraits 1.0: species-level foraging attributes of the world’s birds and mammals. Ecology 95:2027.
Wink M. (2018). Phylogeny of Falconidae and phylogeography of peregrine falcons. Ornis Hungarica 26:27–37.
Yokosuba M., Hagiwara A., Saito T. R., Tsukahara N., Aoyama M., Wakabayashi Y., Sugita S., Ichikawa M. (2009). Histological properties of the nasal cavity and olfactory bulb of the Japanese jungle crow Corvus macrorhynchos. Chemical Senses 34:581–593.
Zelenitsky D. K., Therrien F., Ridgely R. C., McGee A. R., Witmer L. M. (2011). Evolution of olfaction in non-avian theropod dinosaurs and birds. Proceedings of the Royal Society B 278:3625–3634.
Zhan X., Pan S., Wang J., Dixon A., He J., Muller M. G., Ni P., Hu L., Liu Y., Hou H., Chen Y., Xia J., Luo Q., Xu P., Chen Y., Liao S., Cao C., Gao S., Wang Z., Yue Z., Li G., Yin Y., Fox N., Wang J., Bruford M. W. (2013). Peregrine and saker falcon genome sequences provide insights into evolution of a predatory lifestyle. Nature Genetics 45:563–566.

III. FUNCTIONAL CHARACTERIZATION OF OLFACTORY RECEPTORS IN THE CONTEXT OF THEIR RADIATION IN BIRDS

Abstract

Olfaction plays a critical role in animal behavior and ecology. In birds, olfaction is used in foraging, kin recognition, and mate choice. Odorants are detected by olfactory receptors (ORs); however, ORs also function outside of the olfactory system in tissues throughout the body. Gene expression studies of the olfactory epithelium (OE) can inform researchers about which ORs are involved in olfaction. Such studies have been conducted in reptiles and mammals, but only recently in birds, and in a limited number of species. Here, we perform the first formal measurement of OR expression in the OE across the bird phylogeny, targeting four species that span avian diversity and represent diverse ecology and behavior.
We successfully detected the set of ORs from the genomic repertoire with expression in the olfactory epithelium (OE) and pectoralis muscle tissues. Our results show that the majority of the genomic OR repertoire is expressed in the bird OE, including the large bird-specific gamma-c OR subfamily. We show that some gamma-c ORs are highly expressed in the OE relative to other bird ORs, and that many gamma-c ORs are present in the OE. In addition to indicating which ORs in birds are used in olfaction, my study will provide a framework for future functional assays pinpointing the odors perceived by birds.

Introduction

Olfaction is essential for survival and reproduction in many animals. It plays a central role in foraging, avoiding predation, kin recognition, and territorial behavior. In vertebrates, air or waterborne odor molecules are detected with olfactory receptors (ORs), a gene family of G protein-coupled receptors expressed in the olfactory sensory neurons of the olfactory epithelium (OE, Buck and Axel 1991, Strotmann et al. 1992). To accommodate the incredible variety of odors in nature, ORs constitute the largest gene family in vertebrates, with over 1,000 genes in some mammals and over 300 genes in some birds (Niimura et al. 2014, Niimura and Nei 2005, Chapter II).

The number of ORs in a species’ genome can be used to derive total genomic repertoire counts (Niimura et al. 2014), but not all of the genomic repertoire will be functional or relevant to the olfactory system (Maßberg and Hatt 2018). Many ORs are expressed in tissues outside of the olfactory system. Such ORs play diverse roles, including regulating environmental responses in the skin and chemotaxis in sperm (Maßberg and Hatt 2018). Within this context of a complex gene family, understanding the function (or lack thereof) of specific ORs is a major challenge.
Gene expression studies of the olfactory epithelium can distinguish ORs that likely bind odorants from ORs with other physiological roles and from non-functional pseudogenes. Expression studies of the OE have been conducted in other vertebrate classes, including fish, amphibians, reptiles, and mammals (Ressler et al. 1993, Marchand et al. 2004, Komakov et al. 2008, Kishida et al. 2019). However, OE expression studies in birds have lagged behind those in other vertebrates (Sin et al. 2022).

Birds are the most speciose class of terrestrial vertebrates, inhabiting nearly all land environments. Among birds there is high diversity of social structures and foraging strategies, yet birds were long thought to rely on visual rather than olfactory signals (Audubon 1826, Hill 1905). Recent behavioral work in birds has shown important roles for olfaction in foraging, locating nest sites, seed caching behavior, and species recognition, among other behaviors (Buitron and Nuechterlein 1985, Molina-Morales et al. 2020, Bonadonna and Gagliardo 2021, Wikelski et al. 2021, Van Huynh and Rice 2021). Additionally, specific bird species rely on a highly specialized olfactory system for foraging, including Cathartes aura (turkey vulture) and many seabirds (order Procellariiformes; Owre and Northington 1961, Grubb 1972, Bonadonna and Gagliardo 2021).

To add to the recent surge of interest in how olfaction influences bird behavior, we showed that birds have many more OR genes in their genomes than previously realized (Driver and Balakrishnan 2021, see Chapter I, Chapter II). Genomic analysis divides bird species’ OR repertoires into three phylogenetic subgroups: alpha, gamma, and gamma-c ORs (Niimura and Nei 2005, Steiger et al. 2009, Driver and Balakrishnan 2021).
The alpha and gamma OR subgroups are shared across tetrapods: chicken alpha and gamma ORs form phylogenetic clades with alpha and gamma ORs from amphibians, reptiles, and mammals (Niimura and Nei 2005, Steiger et al. 2009, Vandewege et al. 2016). This illustrates a degree of sequence conservation in the OR repertoire of these subfamilies despite at least 315 million years of divergence between mammalian and bird lineages (Laurin and Reisz 1995). In contrast, the gamma-c OR subfamily is only present in birds (Niimura and Nei 2005, Steiger et al. 2009, Driver and Balakrishnan 2021). Previous studies show that the gamma-c OR subfamily was the most abundant OR clade in most species (Steiger et al. 2009, Khan et al. 2015). For example, the gamma-c subfamily constituted over 85% of all OR genes in the zebra finch (60 total gamma-c ORs) and chicken (303 total gamma-c ORs, Driver and Balakrishnan 2021). Phylogenetic analyses of OR repertoires containing multiple bird species reveal that gamma-c ORs cluster into species-specific clades as opposed to showing clear orthologous relationships among species (Zhan et al. 2013, Silva et al. 2020), suggesting possible species-specific roles for gamma-c ORs. Gamma-c ORs within a species also have shorter phylogenetic terminal branch lengths compared to alpha and gamma ORs, showing a high degree of sequence similarity between gamma-c genes (Steiger et al. 2009, Silva et al. 2020). However, we cannot discern the functional roles of such ORs in smell without expression studies of the bird OE.

Expression studies of the OE have not occurred in birds until recently, with only two studies published this year (Luo et al. 2022, Sin et al. 2022). In the Leach’s storm-petrel (Oceanodroma leucorhoa), the OE expressed over 30 different ORs from the 61 OR genomic repertoire, nearly all at low expression levels (Sin et al. 2022).
Only two ORs were “highly” expressed relative to the other ORs, and neither was a gamma-c OR (Sin et al. 2022). In the black-crowned night heron (Nycticorax nycticorax), the OE expressed 61 ORs of the 93 OR genomic repertoire, and again most ORs were expressed at low levels (Luo et al. 2022). The little egret (Egretta garzetta) also expressed ORs at low levels in the OE, with 132 ORs present (Luo et al. 2022). However, for these three bird species, only short-read Illumina-based genome assemblies are available (Luo et al. 2022, Sin et al. 2022). Therefore, the total count of the genomic repertoire may be underestimated in these species (Driver and Balakrishnan 2021). Indeed, the little egret expresses 132 ORs but had a detectable genomic repertoire of only 108 ORs, providing strong evidence of an incomplete genomic count in these studies.

To properly understand the portion of the genomic repertoire expressed in the OE, expression levels need to be compared to genomic repertoires from species with long-read assemblies (Driver and Balakrishnan 2021). Additionally, previous studies looked at either single bird species or at species within the same bird family (Luo et al. 2022, Sin et al. 2022), and therefore it is still unknown how expression varies across multiple bird orders. Given the dynamic birth and death rates of ORs across the bird phylogeny (Chapter II), it is possible that expression is also dynamic, and the portion of the OR repertoire that is relevant to smell may change between species. We hypothesize that birds express a subset of their genomic OR repertoire in the OE, and that the subset of ORs expressed varies across species. ORs not detected in the OE would represent either nonfunctional ORs or ORs with potentially unexplored and unknown functions in other tissues.

Methods

Sample collection

To determine the location of the OE and of specific OE regions (the anterior, middle, and posterior conchae), we referenced morphological descriptions and images of the maxilla (Yokosuba et al. 2009, Danner et al. 2017). We originally practiced dissections on bird carcasses donated by the North Carolina Museum of Natural Sciences. In this dissection, the maxilla is cut transversely through the nares, and from this incision the sides of the maxilla are cut proximally towards the lores. This produces three cuts in the maxilla: one transverse and distal, and two sagittal from the nares to the lores. The proximal half of the maxilla can then be lifted up from the nares, exposing the tissue in the maxilla. We sampled as much tissue as possible in this part of the maxilla, tried to sample from all three regions of the conchae, and immediately placed samples in microcentrifuge tubes on dry ice. Following sample collection, samples were stored in −80 °C freezers. In the case of the hummingbird, maxillas were cut off at the lores, stored on dry ice and at −80 °C, and dissected at the time of extraction. We obtained pectoralis muscle at the same time, following olfactory epithelium sampling.

We obtained olfactory epithelia from four bird species: chicken (Gallus gallus), Anna’s hummingbird (Calypte anna), zebra finch (Taeniopygia guttata), and brown-headed cowbird (Molothrus ater). In total, we obtained four OE samples each from chicken and cowbird, and five OE samples each from hummingbird and zebra finch. We obtained three pectoralis samples each from chicken, zebra finch, and cowbird, but we did not obtain pectoralis for hummingbird. I personally sampled the chickens immediately following a routine dispatch in the laboratory of Dr. Ken Anderson at the Prestage Department of Poultry Science at North Carolina State University.
The chickens were 21-week-old Hy-Line W-36 white leghorn hens. I personally collected the zebra finch samples from the laboratory of Dr. Richard Mooney in the Department of Neurobiology at the Duke University School of Medicine. All zebra finches were adult females from separate parents. Dr. Christopher Clark at the Department of Evolution, Ecology, and Organismal Biology at the University of California Riverside collected the Anna’s hummingbird maxilla, and I performed the olfactory epithelium dissections (permits USFWS MB-087454 and CDFW SC-006598 to Christopher Clark). Dr. Marc Schmidt at the Department of Biology at the University of Pennsylvania collected and dissected the cowbirds. All brown-headed cowbirds were adult males. All four species were sampled from captive populations, including the domesticated chicken and zebra finch.

RNA extractions and sequencing

To extract RNA from the olfactory epithelium and pectoralis tissue, we cut a small amount of tissue (roughly 2x2 cm) from each sample, cutting samples on dry ice. We immediately transferred tissue to RNAzol RT (RNAzol® RT Brochure, 2017) and dissolved the sample with a homogenizer. We then added water to precipitate DNA, protein, and polysaccharides, and we centrifuged to remove these. We also added 4-bromoanisole for phase separation, and we performed this optional step of the protocol twice. We then precipitated the isolated RNA with ethanol, washed with isopropanol, and solubilized in water. We tested RNA concentration and purity using a Nanodrop, and tested for RNA quality and integrity using a BioAnalyzer at the Brody Integrative Genomics Core in the Department of Pathology & Laboratory Medicine at East Carolina University.

RNA quality was examined with the 4200 TapeStation (Agilent Technologies, Santa Clara, CA), with RNA integrity numbers (RIN) of samples ranging from 6 to 10.
RNA concentration was determined by Qubit Fluorometric Quantitation (Thermo Fisher, Waltham, MA), with 150 ng of RNA used for each NGS library preparation. Stranded cDNA libraries were prepared using the TruSeq Stranded LT mRNA kit (Illumina, San Diego, CA) in accordance with the manufacturer’s protocol using the poly-adenylated RNA isolation. Sequencing of paired-end reads (100 bp × 2) was performed by pooling all the samples together on the NextSeq 2000 system with a P3 200 cycles reagent. Raw sequence reads were de-multiplexed and trimmed for adapters by the on-instrument DRAGEN GenerateFastQ pipeline (v3.7.4).

Read mapping

We mapped reads using the Spliced Transcripts Alignment to a Reference (STAR) aligner (Dobin et al. 2013). We were interested in OR expression specifically, so we generated the STAR reference genome not from the available species’ genome assemblies, but from our previously established genomic OR repertoires of each species (Chapter II). We found the genomic OR repertoires for chicken, hummingbird, zebra finch, and cowbird as described previously (Driver and Balakrishnan 2021, Chapter II). From our final curated OR alignments, we used custom R scripts and bedtools to extract nucleotides from the associated genome (R core team, Quinlan and Hall 2010). We generated the reference genome of OR sequences without using a GTF reference annotation. We then mapped reads to the genomic OR repertoires using STAR default parameters.

Counting and differential expression

We counted the number of reads in output SAM files using the dplyr package in R (Wickham et al. 2022). To measure gene expression, we converted raw counts to counts per million (CPM).
CPM is the number of counts for a given locus divided by the total number of counts in the sample, multiplied by one million, which controls for the sequencing depth of the sample. We analyzed differential gene expression using the limma and edgeR packages in R (Robinson et al. 2010, Ritchie et al. 2015). We used the TMM method to normalize expression data (Robinson and Oshlack 2010). We did not filter genes with low expression due to previous reports of many bird ORs showing low expression levels (Luo et al. 2022, Sin et al. 2022). A standard linear model with “tissue” (either pectoralis “PEC” or olfactory epithelium “OE”) as the independent variable was used for testing within chicken, zebra finch, and brown-headed cowbird. We only obtained OE tissue from hummingbird, so we used only three species in the differential expression analyses. We adjusted P values for multiple testing using the Benjamini-Hochberg correction. We also ran a Student’s t-test comparing CPM values between OE and pectoralis samples for chicken, zebra finch, and cowbird, as an alternative way to measure differential expression from a relatively small number of overall genes. For mapping to phylogenetic trees, we used trees created as described previously, using maximum likelihood methods in IQ-TREE (Minh et al. 2020). We overlaid expression heatmap plots on the phylogeny using the gheatmap function in ggtree in R (Yu 2020).

Results

ORs found in tissues

We sequenced whole-mRNA transcriptomes from the OE of four bird species and from the PEC of three species. Across all four species, we detected 590 expressed ORs out of 667 genomic ORs from Chapter II (Fig. 3.1; 88.46% of genomic ORs showed expression). Zebra finch was the only species that had its entire genomic OR repertoire expressed in the OE.
Brown-headed cowbird expressed 136 of 137 ORs in the OE (99.27%). Anna’s hummingbird also had a high proportion of its ORs expressed in the OE (99 of 109, 90.83%). Although chicken had the highest total number of OR genes expressed in the OE, with 286 ORs, chicken also had the largest genomic OR repertoire of the species sampled, and had the lowest overall proportion of ORs expressed (286 of 352, 81.25%).

There was a large amount of variation between samples, even within the same species and tissue (Fig. 3.2). For all four species, individual variation was high, with specific samples consistently showing higher OR expression than other samples. This was not due to different ORs being expressed between samples, but rather to consistently high or low expression across the entire OR repertoire. For example, one zebra finch sample had an average OR expression of 1.07 log CPM, whereas another sample had an average of -0.20 log CPM. Variation was high in all species; for example, hummingbird had one sample with an average OR expression of 0.414 log CPM, and another sample with an average of -0.80 log CPM. Variation in log CPM between and within species for the OE is visualized in Fig. 3.2. In addition to variation in expression level, the total number of ORs expressed in the OE varied widely between samples. This variation was highest in the chicken, with one sample expressing 246 ORs, whereas another OE sample expressed only 9 ORs (Fig. 3.2). We saw a similar but less extreme version of this variation in other species, including hummingbird, with one OE sample containing 100 ORs and another sample containing only 3 ORs (Fig. 3.2).
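
The per-sample quantities discussed above can be made concrete with a short sketch. The Python below is illustrative only and is not the R code used in this study: the read counts are invented, the function names `cpm`, `log_cpm`, and `n_expressed` are hypothetical, and the log base (2) and prior count (0.5) are assumptions, since the text does not state them. The final part recomputes the pooled proportion of expressed ORs from the per-species counts reported in the text.

```python
import math

def cpm(counts):
    """Counts per million: locus count / total counts in the sample * 1e6."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: c / total * 1_000_000 for gene, c in counts.items()}

def log_cpm(counts, prior=0.5):
    """log2(CPM + prior). Base and prior are assumptions, not taken from the text."""
    return {gene: math.log2(v + prior) for gene, v in cpm(counts).items()}

def n_expressed(counts):
    """Number of loci with at least one mapped read in the sample."""
    return sum(1 for c in counts.values() if c > 0)

# Hypothetical read counts for two OE samples mapped to the same OR reference.
sample_a = {"OR_1": 600, "OR_2": 300, "OR_3": 100}
sample_b = {"OR_1": 4, "OR_2": 0, "OR_3": 0}

cpm_a = cpm(sample_a)
log_cpm_a = log_cpm(sample_a)
detected = {"sample_a": n_expressed(sample_a), "sample_b": n_expressed(sample_b)}

# (ORs expressed in OE, genomic OR repertoire) per species, as reported in the text.
reported = {
    "chicken": (286, 352),
    "hummingbird": (99, 109),
    "zebra finch": (69, 69),
    "cowbird": (136, 137),
}
expressed_total = sum(e for e, _ in reported.values())
genomic_total = sum(g for _, g in reported.values())
pooled_percent = round(100 * expressed_total / genomic_total, 2)
```

With the reported counts, the totals come to 590 expressed ORs out of 667 genomic ORs, matching the pooled figure given in the first paragraph of the Results. The two invented samples show how a low-depth sample can register far fewer detected ORs even when the same loci are present in the reference.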

Contrary to mammals, which express ORs with tissue-specific roles across the body (Maßberg and Hatt 2018), all ORs that were expressed in the pectoralis were also expressed in the OE, so that no ORs were expressed exclusively in the pectoralis. Zebra finch had the largest number of OR genes expressed in the pectoralis, with 46 total (66.67% of 69 genomic ORs). Brown-headed cowbird expressed 35 ORs in the pectoralis (25.55% of 137 genomic ORs). The chicken had the smallest OR repertoire in the pectoralis, with only 17 ORs (4.83% of 352 genomic ORs).

Expression in the OE included ORs from the alpha, gamma, and gamma-c subfamilies. The ORs with the highest expression levels in Anna’s hummingbird, zebra finch, and cowbird were in the gamma-c subfamily. The chicken had at least one gene in all three subfamilies that showed high expression levels, although the most abundant OR in the chicken (as well as the most abundant OR in this study) was a gamma-c OR. All OR subfamilies were also present in the pectoralis across the three sampled species, although with fewer representatives.

Differential expression

Overall, few ORs were differentially expressed between tissues, showing that the majority of ORs expressed in both tissues have similar expression levels following TMM normalization. However, fold changes tended to be in one direction, toward higher expression in the OE. Due to the large variance between OE samples, most of these differences were not significant. In all cases, OE samples had the highest levels of gene expression, and for ORs expressed in both tissues, OE expression was on average 266 times higher than in pectoralis in zebra finch, 40 times higher than in pectoralis in cowbird, and 26 times higher than in pectoralis in chicken.
In zebra finch, of 46 total ORs expressed in both tissues, there were five differentially expressed (DE) ORs between tissues (Fig. 3.1, red asterisks). These consisted of one alpha OR (t = 14.27, P-adj. < 0.01), one gamma OR (t = 13.49, P-adj. < 0.01), and three gamma-c ORs (t = 13.27, P-adj. = 0.02; t = 14.12, P-adj. = 0.03; t = 12.40, P-adj. = 0.03). Four of the ORs were more highly expressed in OE compared to PEC; however, the alpha OR was more highly expressed in PEC compared to OE. This was the only OR to show this pattern in our dataset. In the cowbird, of the 35 ORs expressed in both tissues, a single gamma-c OR showed increased expression in OE compared to PEC (t = 14.48, P-adj. = 0.02). In the chicken, of 17 ORs expressed in both tissues, one gamma-c OR showed higher expression in OE (t = 16.90, P-adj. = 0.01). In our Student’s t-test comparing OR expression between OE and pectoralis within species, we did not find any significant differences.

Discussion

Most genomic ORs are expressed in OE

We successfully detected OR expression in both the OE and pectoralis muscle tissues in three bird species, the chicken, zebra finch, and brown-headed cowbird, and in the OE tissue of Anna’s hummingbird. This is the first study of OR expression levels in the OE for the bird orders represented here, including Galliformes, Trochiliformes, and Passeriformes. These three orders represent divergent lineages within the bird phylogeny: the Galloanseres, which include Galliformes, separated from the Neoaves, which include Trochiliformes and Passeriformes, 85 to 90 million years ago, and are one of the earliest diverging lineages within the extant birds.
We show that the majority of genomic ORs are expressed in the OE in both Galloanseres and Neoaves species, illustrating that the majority of genomic ORs are involved in the olfactory system, and that this role is preserved across the phylogeny. Therefore, genomic OR counts across the bird phylogeny are likely relevant to the ecology and behavior of many bird species. These results agree with previous studies that showed that the majority of the genomic OR repertoires were also expressed in the OE of Leach’s storm-petrel, black-crowned night heron, and little egret (Luo et al. 2022, Sin et al. 2022). However, these genomic OR repertoires were determined by surveying short-read Illumina-based assemblies, which we have shown to undercount the number of genomic ORs (see Chapter I, Driver and Balakrishnan 2021). For example, the little egret expressed more ORs in the OE than were detected in the genome (Luo et al. 2022). Here, we present the first study comparing OR expression in the OE to the more reliable genomic OR counts from long-read assemblies, and we continue to show that the majority of ORs are expressed in the OE.

In the zebra finch, we found that all genomic ORs were expressed in the OE. Similarly, in the cowbird, we detected the expression of 136 of the 137 genomic ORs. This suggests that either the entire intact genomic OR repertoire of these species is functional and relevant to the olfactory system, or that we are still undercounting the genomic OR repertoires of these species, despite using long-read genomes (see Chapter II). These expression results support the possibility that, despite being highly contiguous, long-read assemblies still contain problematic areas, and that additional ORs may be present in these problem regions.
OR clusters in mammals and birds are flanked by repeat regions, thereby making the assembly of these regions particularly difficult (Glusman et al. 2000, Vandewege et al. 2016, Driver and Balakrishnan 2021). Therefore, even current technologies may not resolve these regions. Additional surveys could be performed to extract putative ORs from our RNA-seq data that are not based on the previously determined genomic OR repertoires. These searches may pull out unique ORs not detected in the genomic repertoire. However, given the high sequence similarity of the gamma-c ORs, it may be difficult to assign reads to particular ORs with no genomic reference, as sequence differences may be between alleles as opposed to different genes. Conversely, in hummingbird and particularly the chicken, there are also genomic ORs absent from the OE and muscle, indicating that a portion of the genomic OR repertoire was either transcriptionally inactive in the individual birds we sampled, or expressed in other tissues. It is unclear what role these unexpressed ORs may play in birds, although it is likely that these ORs serve some function, as their genomic sequences maintain an open reading frame (Chapter II). It is also possible that the expression of ORs, particularly in the OE, is dynamic and responsive to odorants in the environment, and that these ORs would be “turned on” in response to particular stimuli, which were not applied in this study.

Fig 3.1. OR gene expression in OE and pectoralis in log CPM. Left columns in each panel are OE, right columns pectoralis. Asterisks show differentially expressed ORs between tissue types. In B, the single column shows OE expression. Species genomic OR repertoires are depicted in phylogenetic trees.
Each tip is one OR, and corresponding OE and pectoralis expression levels are shown next to the OR. (A) chicken Gallus gallus, (B) Anna’s hummingbird Calypte anna, (C) zebra finch Taeniopygia guttata, (D) brown-headed cowbird Molothrus ater.

Fig 3.2. OR expression levels and the total number of ORs expressed varied substantially between OE samples within species. Each column represents one OE sample from one of the four species included (chicken, hummingbird, zebra finch, cowbird). Each cell represents an individual OR gene, and its color gives the log CPM count for that OR. Cells in the same row do not necessarily correspond to the same OR, especially between different species. Differences in expression between OE samples are especially strong in zebra finch and hummingbird.

High expression levels of ORs, including gamma-c ORs

Compared to previous studies of the bird OE, we found relatively high expression of numerous ORs, including gamma-c ORs. Previous studies of OR expression in the bird OE showed that although a large number of ORs may be present (for example, 132 OE-expressed ORs in little egret), the majority of these ORs are expressed at low levels. For example, in little egret, all expressed ORs were below 1.5 TPM (transcripts per million: read counts divided by the length of each gene in kilobases, then scaled so that each sample sums to one million), and all night heron ORs were expressed below 2 TPM, except one OR at 3.0 TPM (Luo et al. 2022). Additionally, only two ORs detected in the storm-petrel OE were expressed above 1.0 Log CPM (Sin et al. 2022). Of the two highly expressed ORs in the storm-petrel, one was in the alpha subfamily (OR5-11), and one was a member of the gamma subfamily (OR6-6, Sin et al. 2022). Although gamma-c ORs were present in the storm-petrel OE, all were expressed at low levels (OR family 14, Sin et al. 2022).
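The TPM and log CPM values compared above are normalized differently, which matters when thresholds from different studies are set side by side. The following is a minimal Python sketch of the two units, not the pipeline used in this study. The gene names, counts, and gene lengths are hypothetical, and the log base (2) and prior count (0.5) in `log_cpm` are assumptions chosen for illustration.

```python
import math

def cpm(counts):
    """Counts per million: each count scaled by the sample's total count."""
    library_size = sum(counts.values())
    return {gene: c / library_size * 1e6 for gene, c in counts.items()}

def log_cpm(counts, prior=0.5):
    """log2 CPM; the prior count (an assumed value) keeps zero counts finite."""
    library_size = sum(counts.values())
    return {
        gene: math.log2((c + prior) / (library_size + 1.0) * 1e6)
        for gene, c in counts.items()
    }

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rate = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total_rate = sum(rate.values())
    return {gene: r / total_rate * 1e6 for gene, r in rate.items()}

# Hypothetical single-sample data: two ORs and one non-OR gene.
toy_counts = {"OR_1": 30, "OR_2": 0, "other_gene": 9970}
toy_lengths_bp = {"OR_1": 1000, "OR_2": 1000, "other_gene": 2000}

toy_cpm = cpm(toy_counts)
toy_log_cpm = log_cpm(toy_counts)
toy_tpm = tpm(toy_counts, toy_lengths_bp)

for gene in toy_counts:
    print(gene, round(toy_cpm[gene], 2), round(toy_log_cpm[gene], 2), round(toy_tpm[gene], 2))
```

Because TPM divides by gene length before scaling and CPM does not, the same OR can sit on either side of a fixed cutoff depending on the unit, so cross-study comparisons of "low" expression are approximate.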
Low OR expression levels are also reported in mammals, including humans (Olender et al. 2016). Each olfactory sensory neuron expresses only one OR, meaning that expression of any individual OR is restricted to a subset of the total number of olfactory sensory neurons, decreasing overall OR expression levels (Lomvardas et al. 2006). These previous results are consistent with our findings in hummingbird and cowbird, where expression of all ORs in the OE was relatively low when averaged across samples. In hummingbird all ORs were expressed below 1.0 Log CPM, and in cowbird only one OR was expressed above 1.0 Log CPM.

However, across our zebra finch samples, we found that 32 of the ORs had expression levels above 1.0 Log CPM. Of these 32 ORs, 31 were in the gamma-c subfamily, and one OR was in the gamma subfamily. This is the highest expression level reported for gamma-c ORs, and this expression level is consistent across a substantial fraction of the total zebra finch gamma-c OR repertoire, providing strong support that gamma-c ORs are integral to the olfactory system. In the chicken, we also detected higher OR expression levels than previous studies, with 18 chicken ORs expressed above 1.0 Log CPM when averaged across all chicken OE samples. In contrast to the zebra finch, these 18 ORs were diverse across OR subfamily type, including three alpha ORs, two gamma ORs, and 15 gamma-c ORs. This shows that across highly divergent lineages of birds, the expression levels of subfamilies differ substantially. This is consistent with genomic patterns that show reduced numbers of alpha and gamma ORs but increased numbers of gamma-c ORs in passerines (Chapter II), whereas Galloanseres maintains high numbers of all subfamilies (Chapter II). The chicken therefore may rely on all subfamilies to detect odors, whereas zebra finch is more dependent on gamma-c.
Whether gamma-c in zebra finch has replaced the functional roles of odor detection provided by alpha and gamma in chicken, or whether zebra finch is simply detecting different odors, is unknown. In the chicken, a gamma-c OR (genomic coordinates CM000108.5_1785570_1786508) was the most highly expressed OR across all ORs and all species in our study, at 7.19 Log CPM. This is the most highly expressed bird OR reported to date, and the functional relevance of this OR could be investigated in future analyses.

Few differentially expressed ORs and high variance between samples

We detected relatively few differentially expressed ORs when comparing OE and pectoralis muscle tissues within species. Differentially expressed ORs included a single gamma-c OR in chicken and cowbird, and five ORs (one alpha, one gamma, three gamma-c) in zebra finch. All of these ORs, except one, showed higher expression in the OE than in the pectoralis when measuring differential expression using the limma and edgeR packages following TPM normalization of the counts. We also performed Student’s t-tests to compare our two tissues, but these did not show any significant differences. The zebra finch had the greatest overlap in expression between tissues, with 42 ORs present in both tissues and not showing differential expression between the tissues. These results suggest that for ORs present in both tissues, expression is similar, and that ORs may function in both tissues, perhaps in different functional roles. Alternatively, the ORs may function similarly across tissues, such as by performing essential “housekeeping” roles that are consistent and uniform across tissue types. For these ORs, a functional role in the olfactory system is therefore unclear despite expression in the OE.

The lack of differential expression found in our study is likely due to the large amount of variation within OE samples of the same species.
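The effect of within-tissue variation on a two-sample comparison can be illustrated with a pooled-variance Student's t statistic. The sketch below is only an illustration under stated assumptions: the log CPM values are hypothetical and not data from this study, and it does not reproduce the moderated statistics computed by limma. The critical value 2.776 is the standard two-sided 5% cutoff for 4 degrees of freedom.

```python
import math
from statistics import mean, variance

def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df

# Hypothetical log CPM values for one OR, three samples per tissue.
pectoralis = [0.1, 0.2, 0.0]
oe_consistent = [1.9, 2.0, 2.1]   # same mean as below, low spread
oe_variable = [0.1, 2.0, 3.9]     # same mean as above, high spread

T_CRITICAL_DF4 = 2.776  # two-sided alpha = 0.05, df = 4 (standard t table)

t_consistent, df = student_t(oe_consistent, pectoralis)
t_variable, _ = student_t(oe_variable, pectoralis)

print("df:", df)
print("consistent OE samples exceed cutoff:", t_consistent > T_CRITICAL_DF4)
print("variable OE samples exceed cutoff:", t_variable > T_CRITICAL_DF4)
```

Both hypothetical OE groups differ from the pectoralis by the same mean amount, yet only the low-variance group exceeds the cutoff. For p-values rather than a table lookup, `scipy.stats.ttest_ind` could be used instead.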
This high level of variation within the same species and tissue may mask true levels of differential expression. It is unclear why some individual OE samples express all ORs at higher levels than other OE samples, even following correction for sequencing depth. There are several possibilities, including sampling and RNA extraction differences or errors. We performed dissections as uniformly as possible, and in each case birds were freshly sacrificed and dissected in the same manner. We performed RNA extractions on different dates but followed a consistent procedure, with minimal time between removal from -80 °C storage and dissolving in RNAzol. It is possible that between individuals, different parts of the OE were sampled in the final tissue sent for sequencing. The bird OE is divided into three sections, the anterior, middle, and posterior conchae (Danner et al. 2017). Although there is no evidence as to which region of the OE expresses more or fewer ORs, Sin et al. (2022) specifically sampled from the anterior conchae. We sampled from the OE generally and did not target a particular region; therefore, slight differences in the region of the OE used for each sample may account for some of the variation that we observed.

Alternatively, the variation seen between samples could reflect real biological variation between the individual birds of the same species. Specific individuals may express more ORs or be more sensitive to odorants than others. Although we controlled for sex and age in the within-species comparisons in this study, it is possible that other genetic factors cause different individuals of the same species to express ORs in the OE at different rates. It is also possible that OR expression in the OE is highly dynamic and dependent on some type of external stimuli.
Zebra finch and chicken both showed variation between samples, and individuals from both species were sacrificed in a sterile laboratory setting and came from the same enclosures. However, it is possible that odorant stimuli in the air differed slightly between sampling efforts, and that the different birds were responding to different concentrations of odorants. These small effects could also explain the variation in the number of ORs expressed, which, in addition to expression levels, also varied substantially, particularly in chicken and hummingbird (Fig 3.2). Future studies in more controlled settings, as well as studies examining variation in relation to genetic background, could help resolve the reason for this high variation. In turn, these studies could more accurately characterize differential expression between tissues after carefully controlling for this variation.

Finally, our differential expression methods relied on limma and edgeR, methods traditionally used to analyze differential expression across transcriptome-wide data. Here, we apply these methods to a small set of target genes, the genomic OR repertoire for each species. It is unclear whether this alters the differential expression methods substantially. However, a t-test performed on the expression data comparing tissue types also did not detect differential expression, so although there may be issues with the differential expression methods, variation between samples remains a major issue.

In the zebra finch, we detected one alpha OR that was more highly expressed in the pectoralis muscle than in the OE. This was the only instance in our dataset of the pectoralis showing significantly higher expression of an OR than the OE.
Although this alpha OR was also present in the zebra finch OE, this expression pattern presents the interesting possibility of a bird OR with a primary function outside of olfaction, and potentially a function that is muscle specific. Further expression studies of the muscle, OE, and other tissues would help us understand the functional role of this OR in birds. Additionally, because this is an alpha OR, there are likely orthologous and paralogous relationships between the zebra finch OR and mammalian and reptilian alpha ORs (Steiger et al. 2009). It may be possible to match known expression levels or odor binding properties of the orthologous mammalian ORs to this alpha OR, to see if muscle expression is a consistent role of this alpha OR over evolutionary time.

Conclusion

We have shown that across multiple bird species and orders, the majority of ORs found in bird genomes are expressed in the olfactory epithelium, solidifying the connection between genomic OR repertoire size and a species’ reliance on olfaction. We show that most ORs are lowly expressed, with a few exceptions. We also show for the first time that many gamma-c ORs are expressed in the OE, and that gamma-c ORs are often the most highly expressed ORs in the OE. This is the first time that a large gamma-c OR repertoire has been shown to be expressed in the OE, and we also report the highest expression levels of gamma-c ORs detected in the bird OE. Gamma-c ORs are bird-specific, so OE expression studies in other vertebrate classes do not provide information about the functional role of gamma-c. We show the strongest evidence to date that this expansive bird OR subfamily has duplicated and retained duplications because of its relevance to olfaction. This expression study, the first of its kind in birds to look widely across the bird phylogeny, shows the importance of ORs and gamma-c ORs across distantly related bird lineages.
By detecting the expression of many ORs in the bird OE, we provide data that will facilitate future work to select ORs for functional (“deorphanization”) experiments that identify the specific odorants that bird ORs can detect (see Saito et al. 2009). In addition to characterizing expression in the OE of bird species, we have identified which ORs in the genomic repertoire are implicated in olfaction, allowing subsequent work to select and functionally test the unknown binding properties of bird ORs.

References

Audubon J. J. (1826). Account of the habits of the turkey buzzard, Vultur aura, particularly with the view of exploding the opinion generally entertained of its extraordinary power of smelling. Edinburgh New Philosophical Journal 2:172–184.
Bonadonna F., Gagliardo A. (2021). Not only pigeons: avian olfactory navigation studied by satellite telemetry. Ethology Ecology & Evolution 33:273–289.
Buck L., Axel R. (1991). A novel multigene family may encode odorant receptors: a molecular basis for odor recognition. Cell 65:175–187.
Buitron D., Nuechterlein G. L. (1985). Experiments on olfactory detection of food caches by black-billed magpies. Condor 87:92–95.
Danner R. M., Gulson-Castillo E. R., James H. F., Dzielski S. A., Frank D. C., Sibbald E. T., Winkler D. W. (2017). Habitat-specific divergence of air conditioning structures in bird bills. The Auk 134:65–75.
Dobin A., Davis C. A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T. R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15–21.
Driver R. J., Balakrishnan C. N. (2021). Highly contiguous genomes improve the understanding of avian olfactory receptor repertoires. Integrative & Comparative Biology 61:1281–1290.
Glusman G., Bahar A., Sharon D., Pilpel Y., White J., Lancet D. (2000). The olfactory receptor gene superfamily: data mining, classification, and nomenclature. Mammalian Genome 11:1016–1023.
Grubb T. C. (1972). Smell and foraging in shearwaters and petrels. Nature 237:404–405.
Hill A. (1905). Can birds smell? Nature 71:318–319.
Khan I., Yang Z., Maldonado E., Li C., Zhang G., Gilbert M. T. P., Jarvis E. D., O’Brien S. J., Johnson W. E., Antunes A. (2015). Olfactory receptor subgenomes linked with broad ecological adaptations in Sauropsida. Molecular Biology and Evolution 32:2832–2843.
Kishida T., Go Y., Tatsumoto S., Tatsumi K., Kuraku S., Toda M. (2019). Loss of olfaction in sea snakes provides new perspectives on the aquatic adaptation of amniotes. Proceedings of the Royal Society B 286:20191828.
Kolmakov N. N., Kube M., Reinhardt R., Canario A. V. M. (2008). Analysis of the goldfish Carassius auratus olfactory epithelium transcriptome reveals the presence of numerous non-olfactory GPCR and putative receptors for progestin pheromones. BMC Genomics 9:429.
Laurin M., Reisz R. R. (1995). A reevaluation of early amniote phylogeny. Zoological Journal of the Linnean Society 113:165–223.
Lomvardas S., Barnea G., Pisapia D. J., Mendelsohn M., Kirkland J., Axel R. (2006). Interchromosomal interactions and olfactory receptor choice. Cell 126:403–413.
Luo H., Luo S., Fang W., Lin Q., Chen X., Zhou X. (2022). Genomic insight into the nocturnal adaptation of the black-crowned night heron (Nycticorax nycticorax). BMC Genomics 23:683.
Maßberg D., Hatt H. (2018). Human olfactory receptors: novel cellular functions outside of the nose. Physiological Reviews 98:1739–1763.
Marchand J. E., Yang X., Chikaraishi D., Krieger J., Breer H., Kauer J. S. (2004). Olfactory receptor gene expression in tiger salamander olfactory epithelium. Journal of Comparative Neurology 474:453–467.
Minh B. Q., Schmidt H. A., Chernomor O., Schrempf D., Woodhams M. D., von Haeseler A., Lanfear R. (2020). IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Molecular Biology and Evolution 37:1530–1534.
Molina-Morales M., Castro J., Albaladejo G., Parejo D. (2020). Precise cache detection by olfaction in a scatter-hoarder bird. Animal Behaviour 167:185–191.
Niimura Y., Matsui A., Touhara K. (2014). Extreme expansion of the olfactory receptor gene repertoire in African elephants and evolutionary dynamics of orthologous gene groups in 13 placental mammals. Genome Research 24:1485–1496.
Niimura Y., Nei M. (2005). Evolutionary dynamics of olfactory receptor genes in fishes and tetrapods. Proceedings of the National Academy of Sciences 102:6039–6044.
Olender T., Keydar I., Pinto J. M., Tatarskyy P., Alkelai A., Chien M., Fishilevich S., Restrepo D., Matsunami H., Gilad Y., Lancet D. (2016). The human olfactory transcriptome. BMC Genomics 17:619.
Owre O. T., Northington P. O. (1961). Indication of the sense of smell in the turkey vulture, Cathartes aura (Linnaeus), from feeding tests. American Midland Naturalist 66:200–205.
Quinlan A. R., Hall I. M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841–842.
R Core Team (2022). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org
Ressler K. J., Sullivan S. L., Buck L. B. (1993). A zonal organization of odorant receptor gene expression in the olfactory epithelium. Cell 73:597–609.
Ritchie M. E., Phipson B., Wu D., Hu Y., Law C., Shi W., Smyth G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43:e47.
RNAzol® RT Brochure (2017). Molecular Research Center, Inc. Cincinnati, OH.
Robinson M. D., McCarthy D. J., Smyth G. K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26:139–140.
Robinson M. D., Oshlack A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11:R25.
Saito H., Chi Q., Zhuang H., Matsunami H., Mainland J. D. (2009). Odor coding by a mammalian receptor repertoire. Science Signaling 2:ra9.
Silva M. C., Chibucos M., Munro J. B., Daugherty S., Coelho M. M., Silva J. C. (2020). Signature of adaptive evolution in olfactory receptor genes in Cory’s shearwater supports molecular basis for smell in procellariiform seabirds. Scientific Reports 10:543.
Sin S. Y. W., Cloutier A., Nevitt G., Edwards S. V. (2022). Olfactory receptor subgenome and expression in a highly olfactory procellariiform seabird. Genetics 220:iyab210.
Steiger S. S., Kuryshev V. Y., Stensmyr M. C., Kempenaers B., Mueller J. C. (2009). A comparison of reptilian and avian olfactory receptor gene repertoires: species-specific expansion of group γ genes in birds. BMC Genomics 10:446.
Strotmann J., Wanner I., Krieger J., Raming K., Breer H. (1992). Expression of odorant receptors in spatially restricted subsets of chemosensory neurons. Neuroreport 3:1053–1056.
Vandewege M. W., Mangum S. F., Gabaldón T., Castoe T. A., Ray D. A., Hoffmann F. G. (2016). Contrasting patterns of evolutionary diversification in the olfactory repertoires of reptile and bird genomes. Genome Biology and Evolution 8:470–480.
Wickham H., François R., Henry L., Müller K. (2022). dplyr: A Grammar of Data Manipulation. R package version 1.0.10. https://CRAN.R-project.org/package=dplyr
Wikelski M., Quetting M., Cheng Y., Fiedler W., Flack A., Gagliardo A., Salas R., Zannoni N., Williams J. (2021). Smell of green leaf volatiles attracts white storks to freshly cut meadows. Scientific Reports 11:12912.
Yokosuka M., Hagiwara A., Saito T. R., Tsukahara N., Aoyama M., Wakabayashi Y., Sugita S., Ichikawa M. (2009). Histological properties of the nasal cavity and olfactory bulb of the Japanese jungle crow Corvus macrorhynchos. Chemical Senses 34:581–593.
Yu G. (2020). Using ggtree to visualize data on tree-like structures. Current Protocols in Bioinformatics 69:e96.
Zhan X., Pan S., Wang J., Dixon A., He J., Muller M. G., Ni P., Hu L., Liu Y., Hou H., Chen Y., Xia J., Luo Q., Xu P., Chen Y., Liao S., Cao C., Gao S., Wang Z., Yue Z., Li G., Yin Y., Fox N., Wang J., Bruford M. W. (2013). Peregrine and saker falcon genome sequences provide insights into evolution of a predatory lifestyle. Nature Genetics 45:563–566.