Kashyap et al. BMC Genomics (2020) 21:153

RESEARCH ARTICLE Open Access

Pan-tissue transcriptome analysis of long noncoding RNAs in the American beaver Castor canadensis

Amita Kashyap1, Adelaide Rhodes2, Brent Kronmiller2, Josie Berger3, Ashley Champagne3, Edward W. Davis2, Mitchell V. Finnegan5, Matthew Geniza6, David A. Hendrix7,8, Christiane V. Löhr1, Vanessa M. Petro3, Thomas J. Sharpton9,10, Jackson Wells2, Clinton W. Epps4, Pankaj Jaiswal6, Brett M. Tyler2,6 and Stephen A. Ramsey1,8*

Abstract

Background: Long noncoding RNAs (lncRNAs) have roles in gene regulation, epigenetics, and molecular scaffolding and it is hypothesized that they underlie some mammalian evolutionary adaptations. However, for many mammalian species, the absence of a genome assembly precludes the comprehensive identification of lncRNAs. The genome of the American beaver (Castor canadensis) has recently been sequenced, setting the stage for the systematic identification of beaver lncRNAs and the characterization of their expression in various tissues. The objective of this study was to discover and profile polyadenylated lncRNAs in the beaver using high-throughput short-read sequencing of RNA from sixteen beaver tissues and to annotate the resulting lncRNAs based on their potential for orthology with known lncRNAs in other species.

Results: Using de novo transcriptome assembly, we found 9528 potential lncRNA contigs and 187 high-confidence lncRNA contigs. Of the high-confidence lncRNA contigs, 147 have no known orthologs (and thus are putative novel lncRNAs) and 40 have mammalian orthologs. The novel lncRNAs mapped to the Oregon State University (OSU) reference beaver genome with greater than 90% sequence identity. While the novel lncRNAs were on average shorter than their annotated counterparts, they were similar to the annotated lncRNAs in terms of the relationships between contig length and minimum free energy (MFE) and between coverage and contig length. We identified beaver orthologs of known lncRNAs such as XIST, MEG3, TINCR, and NIPBL-DT. We profiled the expression of the 187 high-confidence lncRNAs across 16 beaver tissues (whole blood, brain, lung, liver, heart, stomach, intestine, skeletal muscle, kidney, spleen, ovary, placenta, castor gland, tail, toe-webbing, and tongue) and identified both tissue-specific and ubiquitous lncRNAs.

Conclusions: To our knowledge this is the first report of systematic identification of lncRNAs and their expression atlas in beaver. LncRNAs, both novel and those with known orthologs, are expressed in each of the beaver tissues that we analyzed. For some beaver lncRNAs with known orthologs, the tissue-specific expression patterns were phylogenetically conserved. The lncRNA sequence data files and raw sequence files are available via the web supplement and the NCBI Sequence Read Archive, respectively.

Keywords: lncRNA, Beaver, Transcriptome, Long noncoding RNA, Castor canadensis, Expression atlas

* Correspondence: stephen.ramsey@oregonstate.edu
1Department of Biomedical Sciences, Oregon State University, Corvallis, OR, USA
8School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
Full list of author information is available at the end of the article

© The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication applies to the data made available in this article, unless otherwise stated.

Background

Long noncoding RNAs (lncRNAs), functional ribonucleic acids that do not encode proteins and are at least 200 nucleotides (nt) in length [1], regulate gene expression through diverse mechanisms including epigenetic, chromatin, and molecular scaffolding interactions. For example, the primary effector for X-chromosome inactivation, XIST, is a lncRNA [2]. More broadly, various noncoding RNAs (ncRNAs) have been implicated in host defense against specific pathogens and in responses to various stressors, including hypoxia [3, 4]. Mounting evidence implicating species-specific ncRNAs and gene regulatory mechanisms in species adaptations [3, 5], including various species-specific responses to hypoxia [3, 4], suggests that species-specific and taxon-specific lncRNAs may underlie some of the adaptations seen in mammalian evolution. However, out of more than five thousand extant mammalian species (estimated as of 2019), fewer than 90 have high-quality genome assemblies available (according to the Ensembl genome database [6] release 96), and for those that do not, the absence of a genome or transcriptome sequence precludes comprehensive sequencing-based identification of lncRNAs.

The genome and three tissue transcriptomes of the American beaver Castor canadensis (Order Rodentia, Family Castoridae) have recently been sequenced [7, 8], enabling the systematic search for molecular determinants of this semi-aquatic herbivore’s unique physiologic, anatomic, and behavioral adaptations. For example, the beaver’s ability to hold its breath for up to fifteen minutes [9] suggests adaptations in the brain, heart, liver, and lungs to mitigate hypoxia-associated tissue damage and optimize oxygen uptake [10]. The beaver’s abilities to digest tree bark [11] and certain toxic plants [12] may depend on adaptations of detoxifying enzymes [13, 14] and lignocellulose-catabolizing gut microbes [15]. Such enzymatic adaptations may involve novel lncRNAs. Indeed, lncRNAs have been implicated in species-specific adaptations such as hibernation in grizzly bears [16] and adaptation to cold in zebrafish [17]. Therefore, establishing a compendium of beaver lncRNAs (both novel lncRNAs and those that are orthologous to known lncRNAs in other species) is an important starting point for efforts to understand the roles of noncoding RNAs in regulating expression of genes that underlie beaver anatomy and physiology.

Current high-throughput approaches for transcriptome profiling, especially for species for which only a draft reference genome is available, typically produce a fragmented transcriptome [18]. As a result, in the absence of an annotated genome, delineating a lncRNA transcript from a noncoding portion of a protein-coding transcript poses a bioinformatics challenge. Because a lncRNA is defined by not encoding a protein product, it is not possible to definitively identify a potential lncRNA by isolating a novel protein product, as is the case with an mRNA. Furthermore, lncRNAs often have weak sequence similarity across species [19], and the catalogue of validated lncRNAs outside of model vertebrates (human, mouse, rat) is incomplete. However, computational tools are now available for accurately scoring a transcript’s coding potential based on its sequence (e.g., longest ORF and hexamer usage bias [20]), closing a key informatics gap for lncRNA discovery.

We report on the first effort (of which we are aware) to systematically identify and map polyadenylated lncRNAs in the American beaver. Our rationale for focusing on polyadenylated lncRNAs (vs. nonpolyadenylated lncRNAs) is twofold: (1) biologically, the majority of functional lncRNAs reported to date are polyadenylated [21] and polyadenylated lncRNAs in general are expressed at higher abundances than nonpolyadenylated lncRNAs [22]; and (2) from a technical standpoint, use of poly-A selection enables strand-specific transcript profiling and avoids the requirement to validate (and ascertain the biases introduced by) the use of ribosomal RNA (rRNA) probe reagents in a species for which the reagents have not previously been tested [23]. As the foundation for this effort, we used the recently released Oregon State University beaver genome assembly (see Methods) and we acquired and analyzed high-throughput, short-read polyadenylated RNA sequence data from 16 beaver tissues. We designed and implemented a computational analysis software pipeline for (1) assembling a pan-tissue beaver transcriptome; (2) identifying candidate lncRNA contigs based on evidence for coding potential and annotations of orthologous genes; and (3) measuring expression levels of the lncRNA contigs in the 16-tissue atlas. We identified 9528 potential lncRNA contigs, which we then more stringently filtered by computational assessment of coding potential in order to minimize the number of coding transcripts erroneously identified as lncRNAs. We thus identified 187 putative lncRNAs in the beaver transcriptome, of which 147 appear to be novel and 40 are orthologs of known noncoding transcripts in other species, such as XIST, MEG3, TINCR, and NIPBL-DT. From the measured expression levels of the 187 lncRNAs across the 16 tissues, we (i) identified both tissue-specific and tissue-ubiquitous lncRNAs, (ii) correlated tissue expression profiles of three beaver lncRNAs with the tissue expression profiles of their orthologs and (iii) identified biological pathways and biological processes that beaver lncRNAs may regulate. These results lay the groundwork for studying the cellular and biochemical mechanisms underlying the beaver’s unique physiology and provide an analysis approach that can be used in lncRNA studies in other species.
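
As a rough illustration of the two sequence features named above (longest ORF and hexamer usage), the following Python sketch computes the longest forward-strand ORF, the fraction of the transcript that it covers, and raw overlapping hexamer counts. It is not the scoring model of any published tool and not the pipeline used in this study; the function names, the forward-strand-only search, and the toy sequence are all assumptions made for the example.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple[int, int]:
    """Return (start, end) of the longest ATG-to-stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon; (0, 0) means no ORF found.
    """
    seq = seq.upper().replace("U", "T")
    best = (0, 0)
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3) - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best


def orf_features(seq: str) -> dict[str, float]:
    """Longest-ORF length (nt) and the fraction of the transcript it covers."""
    start, end = longest_orf(seq)
    orf_length = end - start
    coverage = orf_length / len(seq) if seq else 0.0
    return {"orf_length": orf_length, "orf_coverage": coverage}


def hexamer_counts(seq: str) -> Counter:
    """Counts of all overlapping 6-mers (an assumed, simplified hexamer table)."""
    seq = seq.upper().replace("U", "T")
    return Counter(seq[i:i + 6] for i in range(len(seq) - 5))


toy = "GGATGAAACCCTAAGG"  # made-up 16-nt sequence, not a beaver contig
features = orf_features(toy)
```

A transcript with a short longest ORF that covers little of its length, and with hexamer usage unlike that of known coding sequences, would be a candidate noncoding transcript; a real classifier combines such features in a trained model rather than inspecting them one at a time.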

Results

Screening pipeline

In order to obtain a comprehensive profile of the noncoding transcriptome of the American beaver, we paired-end sequenced polyadenylated RNA pooled from samples of sixteen different beaver tissues and de novo assembled a “pan-tissue” beaver polyadenylated RNA transcriptome using Trinity (see Methods). We merged the transcript contigs into 86,714 non-redundant contigs, which became the basis for the remainder of the lncRNA screen. As a test of the completeness of the pan-tissue beaver polyadenylated RNA transcriptome, we used a benchmark set of 4014 genes (the mammalian Benchmarking Universal Single-Copy Ortholog [BUSCO] genes; see Methods) that had been previously validated as universal single-copy orthologs across various genome-sequenced mammalian species [24]. We found that 66% of the mammalian BUSCO genes had high-confidence matches (E-value cutoff of 10^-5) to one or more contigs in the Trinity-assembled, pan-tissue, beaver polyadenylated RNA transcriptome.

We filtered the 86,714 pan-tissue beaver transcript contigs to identify probable lncRNA contigs using five filtering steps, each shown in a row of Table 1: (1) identifying transcript contigs that have annotated orthologs in other species; this included identifying contigs with lncRNA orthologs (“known lncRNAs”, which were further curated); (2) filtering based on contigs’ coding potential score (p < 0.01) as predicted based on their hexamer sequence content and the length of and coverage of the transcript by the longest Open Reading Frame (ORF); (3) more stringently filtering based on contigs’ Coding Potential Assessment Tool (CPAT) score (q < 0.01; see Methods) to obtain a set of high-confidence noncoding contigs; (4) testing contigs for known protein domain sequences; and (5) aligning to the annotated reference beaver genome assembly, to determine if a transcript contig was in an untranslated region of a protein-coding gene. At Step 2, we obtained 9528 probable-noncoding contigs (see Additional file 3: Supplementary Data 1 for sequences). With a more stringent cutoff to control for false discovery rate (Step 3), and including additional filtering steps (4) and (5), we found a total of 187 probable lncRNA contigs: 40 noncoding transcript contigs that are orthologous to a known noncoding transcript in another species such as human or mouse (“known lncRNAs”) and 147 noncoding transcript contigs (see Table 1, bottom row) that appear to be novel from a species orthology standpoint (“novel lncRNAs”) (see Additional file 4: Supplementary Data 2 for sequences).

Table 1 Contig retention through the screening pipeline for novel lncRNAs
Step | % Contigs Eliminated | # Contigs Eliminated | # Contigs Remaining
Orthology analysis (BLASTn) | 62.7 | 54,405 (a) | 32,309 novel
Probable noncoding (CPAT p < 0.01) | 70.1 | 22,781 | 9528
High confidence noncoding (CPAT q < 0.01) | 98.1 | 9346 | 182
Pfam annotations | 0 | 0 | 182
Align to genome and compare to MAKER annotations | 19.2 | 35 | 147
Columns as follows: “Step”, the name of the program or step in the screening pipeline; “% Contigs Eliminated”, the percentage of contigs from Column 4 of the previous row in the table that were eliminated in this step of the analysis pipeline; “# Contigs Eliminated”, the number of contigs corresponding to the percentage in Column 2; “# Contigs Remaining”, the number of contigs remaining after the row’s filtering Step was applied. The number of starting contigs before step 1 (“Orthology analysis”) was 86,714.
(a) This includes the 40 beaver contigs that we identified that are orthologs of known noncoding transcripts in other species (Fig. 9, purple rectangle). The percentage shown in column “% Contigs Eliminated” is for that specific step (row) relative to the number of contigs before that step.

Length and secondary structure characterization of known and novel lncRNA contigs

To the extent that lncRNA biological function depends on a sufficiently stable structural conformation [25], in order to quantitatively assess the noncoding contigs’ potential for function, we computationally modeled the secondary structures and obtained model-based Minimum Free Energy (MFE) estimates for all 187 (known and novel) contigs (see Methods). Both sets of lncRNAs had the expected inverse relationship between transcript (contig) length and MFE, though the relationship was weaker in the novel lncRNAs (Fig. 1).

Fig. 1 Noncoding transcript contigs’ model-based structural stability is inversely correlated with length. Marks indicate lncRNA contigs that have no known orthologs (“novel”; a) and that have known noncoding orthologs (“known”, b). The outlier in (b) is labeled by its known ortholog, XIST

Overall, the transcript contigs for known lncRNAs were significantly (p < 10^-9; Kolmogorov-Smirnov test) longer than those of the novel lncRNAs (Fig. 2). Whereas the annotated lncRNAs were in the range of 204–4691 nt in length (consistent with GENCODE [26]), the putative novel lncRNA contigs were all below 400 nt in length. This is consistent with previous RNA-seq based lncRNA studies which have tended to produce shorter contigs (less than 400 nt) even with genome-guided assembly [27, 28].

Fig. 2 The lncRNA contigs with known orthologs are longer than the novel lncRNA contigs. Density distributions of contig lengths for the 147 novel noncoding transcript contigs (“novel”) and the 40 noncoding transcript contigs that are orthologous to known noncoding transcripts (“known”)

In terms of read-depth coverage level in the transcriptome assembly, the distributions for the two sets of noncoding transcript contigs were both right-skewed (Fig. 3). Contigs with orthologs that are known noncoding transcripts (“known”) had higher average coverage depth (mode of 20.0, average of 369) than the noncoding transcript contigs with no known orthologs (“novel”; mode of 9.5, average of 19.4); the difference between the sets of contigs was not as striking for coverage as for length.

Fig. 3 In the pan-tissue transcriptome assembly, known lncRNA contigs had overall higher coverage levels than novel lncRNA contigs. Density distributions of contig coverage depths for the 147 novel noncoding transcript contigs (“novel”) and the 40 noncoding transcript contigs that are orthologous to known noncoding transcripts (“known”). For both sets of noncoding transcript contigs, average depth of coverage in the assembly was not significantly correlated with contig length (Fig. 5)

The putative novel lncRNAs map back to the draft beaver genome

As a quality check, we aligned the 147 novel noncoding contigs to a reference beaver genome assembly (Oregon State University beaver genome assembly; see Methods). Every transcript contig aligned with upwards of 90% identity, and over 91% of putative novel lncRNA contigs had an alignment equivalent to at least 70% of the contig’s length (Additional file 1: Figure S1). One contig (Ccan OSU1 lncRNA contig62060.1) had two nonoverlapping alignments within 33 nucleotides of each other on the draft genome, which may indicate excision of an intron. To further validate the 147 novel contigs, we aligned them against a completely independently generated beaver genome assembly [7] using BLASTn (see Methods); 144 of them (all except contig72949.1, contig80019.1, and contig83657.1) aligned with a best-match E-value of less than 10^-18. Of the 144 aligned contigs, all had greater than 90% sequence mapped and 140 had greater than 95%.

Novel lncRNAs in the American beaver

The novel lncRNAs as a group performed similarly to their annotated counterparts on the measures that we used to determine biological plausibility. Eight candidate lncRNAs stood out, however, for having the strongest evidence across the various measures (Table 2). Five of these contigs were among the top ten contigs in terms of at least length and MFE. This concordance between length and MFE is not surprising in light of the inverse relationship between transcript length and secondary structural stability (Fig. 1). One novel lncRNA (Ccan OSU1 lncRNA contig62060.1) was notable for having two exons, as detected by gapped alignment to the beaver genome. All of the eight novel contigs had robust expression (≥ 6.5) in at least one tissue, as measured by Reads Per Kilobase of transcript per Million (RPKM) (see Table 2; Fig. 4; Methods).

Table 2 Novel lncRNA contigs with strongest evidence across multiple correlates
Contig | Length (nt) | MFE (kcal/mol) | Coverage | BLASTn Alignment Length (%) | Intronic | max(RPKM)
Ccan OSU1 lncRNA contig41254.1 | 367 | −96.8 | 26.71 | 100.00 | no |
Ccan OSU1 lncRNA contig46102.1 | 334 | −103.5 | 78.42 | 100.00 | no | 7.6
Ccan OSU1 lncRNA contig46174.1 | 333 | −126.5 | 16.66 | 100.00 | no | 6.5
Ccan OSU1 lncRNA contig43610.1 | 350 | −140.8 | 10.21 | 83.71 | no | 30.1
Ccan OSU1 lncRNA contig44966.1 | 341 | −149.8 | 11.81 | 63.93 | no | 48.6
Ccan OSU1 lncRNA contig45799.1 | 336 | −77 | 16.06 | 100.00 | no | 8.0
Ccan OSU1 lncRNA contig59927.1 | 267 | −103.7 | 13.66 | 100.75 | no | 13.0
Ccan OSU1 lncRNA contig62060.1 | 260 | −50.7 | 36.25 | 69.23 | yes | 22.8 7.8
Underlined text indicates that a particular contig was in the top ten, among all novel lncRNA contigs, for the given column feature (i.e., length, MFE, coverage, or alignment length). The BLASTn alignment length is computed as 100 × (length of alignment)/(length of contig). The sixth column (Intronic) reflects whether the contig’s alignment to the reference genome was gapped or not; a “yes” is indicative of a potential excised intron. The last column, max(RPKM), is the maximum RPKM for the contig across all tissues and was not a criterion for inclusion in the table.

Interestingly, none of the eight lncRNAs were among those contigs with the highest coverage.
This may be explained by the weakness of the relationship between length and observed coverage of novel lncRNA transcripts (Fig. 5). Furthermore, among the novel transcripts, the four contigs with exceptionally high coverage had coverage that was, on average, 15-fold greater than that of the rest of the contigs. Additionally, all of these contigs with exceptionally high coverage were under 250 nt long, while the ten longest novel lncRNAs were over 300 nt.

Fig. 5 Contig average depth of read coverage in the assembly is not correlated with contig length. Marks indicate contigs that do not have orthologs (a, 147 contigs) or that are orthologous to known noncoding transcripts (b, 40 contigs). The outlier in (b) is labeled by its known ortholog, XIST

Beaver orthologs of known lncRNAs or known noncoding transcript isoforms

Of the 40 lncRNA contigs for which a high-confidence ortholog gene could be identified, the ortholog annotations included 16 long noncoding RNA genes, 12 noncoding antisense RNAs, ten noncoding isoforms of protein-coding genes, and two sense-overlapping RNAs (Table 3). The relatively large proportion (12 out of 40) of antisense RNAs is consistent with a previous report that antisense transcripts are highly prevalent in the human genome [29]. The list of 16 lncRNA genes includes beaver orthologs for well-known lncRNAs such as XIST [2] (which was the longest of 187 high-confidence lncRNA contigs at 3967 nt), maternally expressed gene 3 (MEG3) [30], terminal differentiation-induced noncoding RNA (TINCR) [31], and nipped-B homolog (Drosophila) long noncoding RNA bidirectional promoter (NIPBL-DT) [32].

To assess the possible functional coherence of the beaver lncRNAs with known orthologs, we analyzed KEGG biological pathway annotations for the human orthologs of the Table 3 (ortholog-mapped) lncRNAs for statistical enrichment (see Methods). The analysis yielded seven significantly enriched (FDR < 0.05) pathways (Table 4) whose constituent genes are (in human) significantly correlated in expression with the query lncRNAs.

Tissue-level expression of beaver lncRNAs

Following the lncRNA discovery phase of the analysis, we used RNA-seq to analyze lncRNA levels in the 16 beaver tissues or anatomic structures (the same set of tissues from which we constructed the pooled transcriptome library): whole blood, brain, lung, liver, heart, stomach, intestine, skeletal muscle, kidney, spleen, ovaries, placenta, castor gland, tail skin, toe-webbing, and tongue. For each of the 187 contigs¹ and in each of the 16 tissues, we estimated the transcript abundance in RPKM (see Additional file 6: Table S2 and Methods). Heatmap visualization of the tissue-specific expression profiles of the 147 novel (Fig. 4) and 40 known (Fig. 6) lncRNA contigs revealed both tissue-specific and ubiquitously expressed beaver lncRNAs.

Among the 147 novel lncRNA contigs, several contigs are notable: contig84039.1 has extremely high (RPKM 1910) expression in castor sac relative to the other tissues (average RPKM of 64); contig81051.1 was ubiquitously expressed and had overall highest expression (average RPKM of 433); and a cluster of four contigs (contig80136.1, contig83384.1, contig72740.1, and contig83657.1) are specifically expressed in stomach and kidney. From a tissue lncRNA expression standpoint, kidney and stomach clustered together in both the known and novel lncRNA datasets, consistent with previous findings from tissue transcriptome analysis [34]. Brain tissue was notable for having several tissue-specific lncRNA contigs (contig76717.1, contig65642.1, and contig43610.1). Finally, the heatmap analysis revealed that contig44966.1 is strongly expressed (over 20 RPKM) in spleen and ovary (annotated as “gonad”), but not in other tissues (Fig. 4, left panel, fifth row from bottom); it has no matches in the NCBI non-redundant nucleotide database, lncRNAdb [35], or in RNA Central [36], suggesting that if it is indeed a functional beaver lncRNA, it is not known to be conserved in other rodents.

¹ In this subsection, in the interest of brevity, we identify contigs without the “Ccan OSU1 lncRNA ” prefix.

Fig. 4 Tissue-specific expression of novel lncRNAs in the American beaver. Heatmap rows correspond to the 147 contigs and columns correspond to the 16 tissues that were profiled. Cells are colored by log2(1 + RPKM) expression level. Rows and columns are separately ordered by hierarchical agglomerative clustering and cut-based sub-dendrograms are colored (arbitrary color assignment to sub-clusters) as a guide for visualization. Rows are labeled with abbreviated contig names, e.g., contig4731.1 instead of Ccan OSU1 lncRNA contig4731.1
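
For readers unfamiliar with the expression units used above, the following Python sketch shows the standard RPKM definition (reads, per kilobase of contig, per million mapped reads), the log2(1 + RPKM) transform used for heatmap coloring, and a maximum taken across tissues as in the max(RPKM) column of Table 2. The read counts, library sizes, and function names are hypothetical and chosen only to make the arithmetic easy to follow; the sketch does not reproduce the quantification software described in Methods.

```python
import math


def rpkm(read_count: float, contig_length_nt: int, total_mapped_reads: float) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (contig_length_nt * total_mapped_reads)


def log2_1p(value: float) -> float:
    """log2(1 + x); the added 1 keeps zero expression finite (maps to 0)."""
    return math.log2(1.0 + value)


# Hypothetical inputs for one 350-nt contig in two tissues (not study data).
library_sizes = {"brain": 20_000_000, "liver": 25_000_000}  # mapped reads per tissue
read_counts = {"brain": 700, "liver": 0}                    # reads assigned to the contig
contig_length = 350

profile = {
    tissue: rpkm(read_counts[tissue], contig_length, library_sizes[tissue])
    for tissue in library_sizes
}
max_rpkm = max(profile.values())                      # highest RPKM across tissues
heatmap_values = {t: log2_1p(v) for t, v in profile.items()}
```

With these made-up numbers the contig has an RPKM of 100 in brain and 0 in liver, which is the kind of profile that would appear as a tissue-specific row in a heatmap.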

Kashyap et al. BMC Genomics(2020) 21:153Page 7 of 20Table 3 Beaver noncoding contigs that are probable orthologs of known lncRNAs or noncoding transcriptsSymbol;annotationContigSpecies withortholog hitsHuman EnsemblGene IDAC037459.2;(antisense toCCAR2)Ccan OSU1lncRNAcontig74544.1Homo sapiensAC019068.1;antisenseCcan OSU1lncRNAcontig10709.1AC083843.1BLASTn annotationE%ID ntENSG00000253200 CCAR2 lncRNA (cell cycle and apoptosisregulator 2)8.0 10 4689Homo sapiensENSG00000233611 AC079135.1 gene, antisense lncRNA (TPA predicted)2.4 10 1277.6 143Ccan OSU1lncRNAcontig47288.1Homo sapiensENSG00000253433 AC083843.1 gene, lincRNA (TPA predicted)7.7 10 1388.4 69AC095055.1(antisense toSH3D19)Ccan OSU1lncRNAcontig41532.1Homo sapiensENSG00000270681 SH3D19 antisense noncoding RNA (SH3domain containing 19)8.1 10 58 82.9 274AC116667.1;(antisense toZFHX3)Ccan OSU1lncRNAcontig71613.1Homo sapiensENSG00000271009 ZFHX3 antisense (zinc finger homeobox 3) 1.8 10 4783.6 231AL161747.2;(antisense toSALL2)Ccan OSU1lncRNAcontig44345.1Homo sapiensENSG00000257096 SALL2 lncRNA (spalt-like transcriptionfactor 2)7.5 10 6884.4 288AP000233.2Ccan OSU1lncRNAcontig22249.1Homo sapiensENSG00000232512 AP000233.2 gene lincRNA (TPA predicted)9.0 10 5100 31AP003068.1;(antisense toVPS51)Ccan OSU1lncRNAcontig24716.1Homo sapiens, Musmusculus, Bos taurusENSG00000254501 VPS51 antisense (vacuolar protein sorting51)093.2 438AP003068.1;(antisense toVPS51)Ccan OSU1lncRNAcontig55707.1Mus musculus,ENSG00000254501 VPS51 antisense/reverse strand (vacuolarHomo sapiens, Gallusprotein sorting 51)gallus1.7 10 8392CTA-204B4.6†Ccan OSU1lncRNAcontig29141.1Homo sapiensENSG00000259758 CTA-204B4.6 gene lincRNA (TPA predicted)1206.2 10 83.5 491CTA-204B4.6Ccan OSU1lncRNAcontig30023.1Homo sapiensENSG00000259758 CTA-204B4.6 gene lincRNA (TPA predicted)1292.1 10 94.5 308DNM3OS;(antisense toDNM3)Ccan OSU1lncRNAcontig78034.1Homo sapiens;various primatesENSG00000230630 DNM3OS (DNM3 opposite strand/antisense RNA) lncRNA3.4 10 6989.8 216GNB4; 
lncRNAisoform*Ccan OSU1lncRNAcontig55083.1Homo sapiensENSG00000114450 GNB4 (guanine nucleotide binding protein 6.4 10 38(G protein), beta polypeptide 4)78.8 287AC007038.2;(antisense toKANSL1L)Ccan OSU1lncRNAcontig54664.1Homo sapiens, MusmusculusENSG00000272807 KANSL1L antisense transcript (KAT8regulatory NSL complex subunit 1-like)KCNA3;noncodingisoformCcan OSU1lncRNAcontig27553.1Homo sapiens, MusmusculusENSG00000177272 KCNA3 lncRNA (potassium voltage-gated2.3 10 channel, shaker-related subfamily, member 1393)85.5 502KCNA3;noncodingisoformCcan OSU1lncRNAcontig29471.1Homo sapiensENSG00000177272 KCNA3 lncRNA (potassium voltage-gated1.8 10 70channel, shaker-related subfamily, member3)78.7 475KCNA3;noncodingisoformCcan OSU1lncRNAcontig79757.1Homo sapiens7.6 10 31ENSG00000177272 KCNA3 lncRNA (potassium voltage-gatedchannel, shaker-related subfamily, member3)80.2 197KCNA3;noncodingisoformCcan OSU1lncRNAcontig81530.1Homo sapiens, MusmusculusENSG00000177272 KCNA3 lncRNA (potassium voltage-gated7.1 10 61channel, shaker-related subfamily, member3)87.7 211LINC01355Ccan OSU1lncRNAcontig54147.1Homo sapiensENSG00000261326 LINC01355 lncRNA1.1 10 40921552261251.0 10 85 87.5 295

Kashyap et al. BMC Genomics(2020) 21:153Page 8 of 20Table 3 Beaver noncoding contigs that are probable orthologs of known lncRNAs or noncoding transcripts (Continued)Symbol;annotationContigSpecies withortholog hitsHuman EnsemblGene IDBLASTn annotationLMLN;noncodingisoform*Ccan OSU1lncRNAcontig28300.1Homo sapiensENSG00000185621 LMLN (leishmanolysin-like(metallopeptidase M8 family)MEG3Ccan OSU1lncRNAcontig11359.1Homo sapiens, Musmusculus, PongoabeliiENSG00000214548 MEG3 lncRNA (maternally expressed 3)MEG3Ccan OSU1lncRNAcontig30419.1Homo sapiens,Pongo abeliiENSG00000214548 MEG3 lncRNA (maternally expressed 3)MEG3Ccan OSU1lncRNAcontig6442.1Homo sapiens, Musmusculus, PongoabeliiN4BP2L2-IT2*Ccan OSU1lncRNAcontig81871.1NIPBL-DTE%ID nt3.1 10 73 80.4 4141.6 10 933137.6 10 93313ENSG00000214548 MEG3 lncRNA (maternally expressed 3)2.2 10 123 93313Homo sapiensENSG00000281026 N4BP2L2-IT2 lncRNA (N4BPL2 intronictranscript 2)2.2 10 676.2 130Ccan OSU1lncRNAcontig25986.1Homo sapiensENSG00000285967 NIPBL lncRNA bidirectional promoter(Nipped-B homolog)3.6 10 3880.9 225PDK3; noncoding Ccan OSU1isoform*lncRNAcontig72478.1Homo sapiensENSG00000067992 PDK3 (pyruvate dehydrogenase kinase,isozyme 3)1.8 10 3784.2 171RASSF3;noncodingisoform*Ccan OSU1lncRNAcontig10200.1Homo sapiensENSG00000153179 RASSF3 (Ras associated (RalGDS/AF-6)domain family member 3)083.2 963RASSF3;noncodingisoform*Ccan OSU1lncRNAcontig10200.2Homo sapiensENSG00000153179 RASSF3 (Ras associated (RalGDS/AF-6)domain family member 3)083.3 962AC098818.2†;(antisense toBMP2K)Ccan OSU1lncRNAcontig59404.1Homo sapiensENSG00000260278 RP11-109G23.3 gene, antisense lncRNA4.5 10 5983.3 275TRIM56; senseoverlappingCcan OSU1lncRNAcontig18315.1Homo sapiensENSG00000169871 RP11-395B7.7 gene, sense overlappinglncRNA (TPA - predicted)4.7 10 2872.8 519RP11-395B7.7Ccan OSU1lncRNAcontig47935.1Homo sapiensENSG00000260336 RP11-395B7.7 gene, sense overlappinglncRNA (TPA - predicted)9.7 10 2273.9 284AC090948.1Ccan OSU1lncRNAcontig29838.1Homo 
sapiensENSG00000271964 RP11-415F23.2 gene, antisense lncRNA(TPA - predicted)1.5 10 2693.3 89AL591848.4†Ccan OSU1lncRNAcontig59344.1Homo sapiensENSG00000260855 RP11-439E19.10 gene, antisense lncRNA(TPA - predicted)4.9 10 496.9 32AC022893.2Ccan OSU1lncRNAcontig76877.1Homo sapiensENSG00000260838 RP11-531A24.3 gene, lincRNA (TPA predicted)3.6 10 3981.4 226AL355488.1(antisense toSLC16A4)Ccan OSU1lncRNAcontig17784.1Homo sapiensENSG00000273373 RP5-1074 L1.4 gene, antisense lncRNA (TPA 1.0 10 44- predicted)89.9 149THRB-AS1;(antisense toTHRB)Ccan OSU1lncRNAcontig53102.1Homo sapiensENSG00000228791 THRB antisense/reverse strand (thyroidhormone receptor, beta)6.8 10 1880.9 136TINCR; lncRNAisoformCcan OSU1lncRNAcontig14850.1Homo sapiensENSG00000223573 TINCR lncRNA (tissue differentiationinducing non-protein coding RNA)4.1 10 4482.2 225TUG1; lncRNAisoform
