The whale shark genome reveals how genomic and ... - PNAS


PNAS August 25, 2020 117 (34) 20662-20671; first published August 4, 2020; https://doi.org/10.1073/pnas.1922576117

Authors: Jessica A. Weber (a), Seung Gu Park (b, c), Victor Luria (d), Sungwon Jeon (b, c), Hak-Min Kim (b, c), Yeonsu Jeon (b, c), Youngjune Bhak (b, c), Je Hun Jun (e), Sang Wha Kim (f, g), Won Hee Hong (h), Semin Lee (b, c), Yun Sung Cho (e), Amir Karger (i), John W. Cain (j), Andrea Manica (k), Soonok Kim (l), Jae-Hoon Kim (m), Jeremy S. Edwards (n), Jong Bhak (b, c, e), and George M. Church (a)

Affiliations:
(a) Department of Genetics, Harvard Medical School, Boston, MA 02115
(b) Korean Genomics Center, Ulsan National Institute of Science and Technology, 44919 Ulsan, Republic of Korea
(c) Department of Biomedical Engineering, School of Life Sciences, Ulsan National Institute of Science and Technology, 44919 Ulsan, Republic of Korea
(d) Department of Systems Biology, Harvard Medical School, Boston, MA 02115
(e) Clinomics Inc., 44919 Ulsan, Republic of Korea
(f) Laboratory of Aquatic Biomedicine, College of Veterinary Medicine, Seoul National University, 08826 Seoul, Republic of Korea
(g) Research Institute for Veterinary Science, College of Veterinary Medicine, Seoul National University, 08826 Seoul, Republic of Korea
(h) Hanwha Marine Biology Research Center, 63642 Jeju, Republic of Korea
(i) IT–Research Computing, Harvard Medical School, Boston, MA 02115
(j) Department of Mathematics, Harvard University, Cambridge, MA 02138
(k) Department of Zoology, University of Cambridge, CB2 3EJ Cambridge, United Kingdom
(l) National Institute of Biological Resources, 37242 Incheon, Republic of Korea
(m) College of Veterinary Medicine and Veterinary Medical Research Institute, Jeju National University, 63243 Jeju, Republic of Korea
(n) Department of Chemistry and Chemical Biology, UNM Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87131

For correspondence: Jeremy S. Edwards, Jong Bhak, and George M. Church.

Contributed by George M. Church, June 8, 2020 (sent for review December 24, 2019; reviewed by Manuel Corpas and Xiaohua Huang)
Significance

We sequenced and analyzed the genome of the endangered whale shark, the largest fish on Earth, and compared it to the genomes of 84 other species ranging from yeast to humans. We found strong scaling relationships between genomic and physiological features. We posit that these scaling relationships, some of which were remarkably general, mold the genome to integrate metabolic constraints pertaining to body size and ecological variables such as temperature and depth. Unexpectedly, we also found that the size of neural genes is strongly correlated with lifespan in most animals. In the whale shark, large gene size and large neural gene size strongly correlate with lifespan and body mass, suggesting longer gene lengths are linked to longer lifespans.

Abstract

The endangered whale shark (Rhincodon typus) is the largest fish on Earth and a long-lived member of the ancient Elasmobranchii clade. To characterize the relationship between genome features and biological traits, we sequenced and assembled the genome of the whale shark and compared its genomic and physiological features to those of 83 animals and yeast. We examined the scaling relationships between body size, temperature, metabolic rates, and genomic features and found both general correlations across the animal kingdom and features specific to the whale shark genome. Among animals, increased lifespan is positively correlated to body size and metabolic rate. Several genomic traits also significantly correlated with body size, including intron and gene length. Our large-scale comparative genomic analysis uncovered general features of metazoan genome architecture: Guanine and cytosine (GC) content and codon adaptation index are negatively correlated, and neural connectivity genes are longer than average genes in most genomes. Focusing on the whale shark genome, we identified multiple features that significantly correlate with lifespan. Among these were very long gene length, due to introns being highly enriched in repetitive elements such as CR1-like long interspersed nuclear elements, and considerably longer neural genes of several types, including connectivity, activity, and neurodegeneration genes. The whale shark genome also has the second slowest evolutionary rate observed in vertebrates to date. Our comparative genomics approach uncovered multiple genetic features associated with body size, metabolic rate, and lifespan and showed that the whale shark is a promising model for studies of neural architecture and lifespan.

Keywords: whale shark, lifespan, body size, metabolic rate, neural genes

The relationships between body mass, longevity, and basal metabolic rate (BMR) across diverse habitats and taxa have been researched extensively over the last century and have led to generalized rules and scaling relationships that explain many physiological and genetic trends observed across the tree of life. While the largest extant animals on the planet are aquatic, the impact of marine habitats on body size and other physiological and genetic characteristics is only beginning to be discovered (1). In an effort to better understand the selective pressures imposed on body size in marine environments, studies of endothermic aquatic mammals have shown that selection for larger body sizes has been driven by the minimization of heat loss (2). In ectothermic vertebrates, however, the relationship between environmental temperature and body size is more complex. In these species, metabolic rate is directly dependent on temperature, and decreased temperatures are correlated with decreased BMRs, decreased growth rates, longer generational times, and increased body sizes (3, 4).

The whale shark (Rhincodon typus) is the largest extant fish, reaches lengths of 20 m (5) and 42 tons in mass (6) and has a maximum lifespan estimated at 80 y (6). Worldwide populations have been declining, and the whale shark has been classified as an endangered species by the International Union for Conservation of Nature. Whale sharks are one of three species of filter-feeding sharks that use modified gill rakers to sieve plankton and small nektonic prey from the water column in a method convergent with that of the baleen whales (1). Unlike the two smaller filter-feeding shark species (Cetorhinus maximus, Megachasma pelagios) that inhabit colder temperate waters with increased prey availability, whale sharks have a cosmopolitan tropical and warm subtropical distribution and have rarely been sighted in areas with surface temperatures less than 21 °C (7–9). However, recent global positioning system (GPS) tagging studies have revealed that they routinely dive to mesopelagic (200 to 1,000 m) and bathypelagic (1,000 to 4,000 m) zones to feed, facing water temperatures less than 4 °C (10). Observations of increased surface occupation following deeper dives have led to the suggestion that thermoregulation is a primary driver for their occupation of the warmer surface waters (7, 11). Since larger body masses retain heat for longer periods of time, the large body mass of whale sharks may slow their cooling upon diving and maximize their dive times to cold depths, where food is abundant, and could thus play a key role in metabolic regulation.

Body size, environmental temperature, metabolic rate, and generation time are all correlated with variations in evolutionary rates (12, 13). Since many of these factors are interconnected, modeling studies have shown that observed evolutionary rate heterogeneity can be predicted by accounting for the impact of body size and temperature on metabolic rate (14), suggesting that these factors together drive the rate of evolution through their effects on metabolism. Consistent with these results, brownbanded bamboo shark, cloudy catshark, and elephant fish have the slowest evolutionary rates reported to date (15, 16). Moreover, genome size and intron size have also been linked to metabolic rate in multiple clades. Intron length varies between species and plays an important role in gene regulation and splice-site recognition. In an analysis of amniote genomes, intron size was reduced in species with metabolically demanding powered flight and correlated with overall reductions in genome size (17, 18). However, since most previous studies were limited by poor taxonomic sampling and absence of genome data for the deepest branches of the vertebrate tree, comprehensive comparative genomic analyses across gnathostomes are necessary to gain a deeper understanding of the evolutionary significance of the correlations between genome size, intron size, and metabolic demands.

Here we sequenced, assembled, and analyzed the genome of the whale shark and compared its genome and biological traits to those of 84 eukaryotic species with a focus on gnathostomes such as fishes, birds, and mammals. In particular, we identified scaling relationships between body size, temperature, metabolic rates, and genomic features and found general genetic and physiological correlations that span the animal kingdom. We also examined characteristics unique to the whale shark and its slow-evolving, large genome.

Results

The Whale Shark Genome. The DNA of an R. typus individual was sequenced to a depth of 164× using a combination of Illumina short-insert, mate-pair, and TruSeq Synthetic Long Read (TSLR) libraries (SI Appendix, Tables S1, S2, S12, and S13), resulting in a 3.2-Gb genome (SI Appendix, Fig. S1 and Table S4) with a scaffold N50 of 2.56 Mb (SI Appendix, Tables S2, S5, and S6). A sliding-window approach was used to calculate guanine and cytosine (GC) content and resulted in a genome-wide average of 42%, which is similar to the coelacanth and elephant fish (SI Appendix, Fig. S2). Roughly 50% of the whale shark genome is composed of transposable elements (TEs), which were identified using both homology-based and ab initio approaches (19, 20). Of these, long interspersed nuclear elements (LINEs) made up 27% of the total TEs identified (SI Appendix, Table S7). A combination of homology-based and ab initio genome annotation methods (21, 22) resulted in a total of 28,483 predicted protein-coding genes (SI Appendix, Tables S8–S11).

Correlation of Physiological Characteristics with Genome Features across 85 Taxa. Body mass is intrinsically linked to physiological traits such as lifespan and BMR (23). To better understand how genomic traits correlate with physiological and ecological parameters such as body weight, lifespan, temperature, and metabolic rate, we compared the whale shark to 83 animals and yeast (SI Appendix, Tables S15 and S16) using physiological and genomic data (Fig. 1 and SI Appendix, Figs. S3–S6 and Table S16). Across the 85 species examined, we find a strong positive correlation with significant P values between the log-transformed values for body weight and maximum lifespan (Spearman's correlation coefficient ρ = 0.787, Fig. 2A and SI Appendix, Table S17A) and BMR (SI Appendix, Table S17A and Fig. S9A, ρ = 0.962, exponent B = 0.688; SI Appendix, Fig. S24; n = 84 species, yeast is excluded), consistent with previous reports (23). Comparisons of physiological traits and genome characteristics across these 84 animals and yeast revealed several genetic features that also scaled with body weight. Among these, total gene length, intron length, and genome size all show a moderate statistical correlation with body mass, lifespan, and BMR (ρ = 0.4 to 0.7) (Fig. 2 B–E and SI Appendix, Table S17A). These results are consistent with previous findings of decreased intron size correlating with increased metabolic rates. Furthermore, genome size and relative intron size are strongly correlated (ρ = 0.72) (Fig. 2B and SI Appendix, Table S17A), with the three sharks and the pika being notable outliers. Moreover, genome size, measured as golden path length, scales with gene size, measured as the summed length of exons and introns per gene (power law exponent B = 1.31, SI Appendix, Fig. S25). Additionally, we find that, unlike in bacteria (24) and crustaceans (25), genome size in Chordates scales positively with temperature (SI Appendix, Fig. S9D; B = 0.97, SI Appendix, Fig. S26).

Fig. 1. Comparative genomic analysis across 85 species reveals traits linked to lifespan and body weight. (Top) Image of a whale shark. (Bottom) The phylogenetic tree was constructed using the NCBI common tree (https://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi) without divergence times. The rows show the following values in 85 species: five genomic parameters (A–E), golden path length (F), maximum lifespan (G), body weight (H), maximum lifespan controlled by weight^0.25 (I), body temperature (optimal temperatures for cold-blooded animals) (J), basal metabolic rate (yeast is excluded) (K), and basal metabolic rate adjusted by weight (yeast is excluded) (L). The exon length (C) shows length of exons in the coding region. The exon lengths of yeast (median length = 1,032 bp) and fruit fly (median length = 217 bp) and the weight of yeast (6E-14 kg), fruit fly (4.43E-10 kg), and nematode (1.2E-09 kg) are not shown here as they are too extreme to fit in each chart. The relative intron length (E) was calculated by dividing the total intron length between the first coding exon and the last coding exon by the CDS length. The nine colors of boxes and bars indicate biological classification (gray: Hyperoartia, Ascidiacea, Chromadorea, Insecta, and Saccharomycetes; turquoise: Chondrichthyes (the cyan color indicates whale shark); light blue: Actinopterygii; aquamarine: Sarcopterygii; dark green: Amphibia; light green: Reptilia; dark yellow: Aves; orange: Mammalia).
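
The relative intron length defined in the Fig. 1 caption (total intron length between the first and last coding exon, divided by the CDS length) can be illustrated with a minimal Python sketch. The function name, the half-open (start, end) coordinate convention, and the toy coordinates are assumptions made for illustration; they are not taken from the authors' pipeline.

```python
def relative_intron_length(coding_exons):
    """Return (total intron length) / (CDS length) for one transcript.

    coding_exons: list of (start, end) half-open intervals, one per coding
    exon, on the same strand and sequence. This convention is an assumption
    of the sketch, not something specified in the paper.
    """
    exons = sorted(coding_exons)
    if not exons:
        raise ValueError("at least one coding exon is required")

    # CDS length is the summed length of all coding exons.
    cds_length = sum(end - start for start, end in exons)
    if cds_length <= 0:
        raise ValueError("CDS length must be positive")

    # Introns are the gaps between consecutive coding exons.
    intron_length = 0
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if next_start < prev_end:
            raise ValueError("coding exons must not overlap")
        intron_length += next_start - prev_end

    return intron_length / cds_length


# Hypothetical three-exon transcript: CDS = 350 bp, introns = 650 bp.
toy_transcript = [(100, 200), (500, 650), (1000, 1100)]
print(relative_intron_length(toy_transcript))
```

A single-exon transcript has no introns, so the ratio is zero under this definition; a per-species value would then require some summary (for example a median) over transcripts, which the sketch leaves open.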
Fig. 2. Scaling relationships between genomic and physiologic properties across 85 species. For each plot, the properties on the x axis and y axis were used to calculate the Spearman's rank correlation coefficient: (A) Log10(Maximum lifespan) and Log10(Weight); (B) genome size and relative intron length; (C) maximum lifespan and relative intron length; (D) Log10(Weight) and Log10(Relative intron length); (E) Log10(Genome Size) and Log10(Weight); (F) genome size and CAI; (G) GC content and CAI; (H) GC3 and CAI; and (I) intron length between coding exons and CAI. All P values and rho (ρ) values are shown at the top of each plot. Overlapping species names in the same layer were not plotted. The nine dot colors indicate biological classification (gray: Hyperoartia, Ascidiacea, Chromadorea, Insecta, and Saccharomycetes; turquoise: Chondrichthyes [the cyan color indicates whale shark]; light blue: Actinopterygii; aquamarine: Sarcopterygii; dark green: Amphibia; light green: Reptilia; dark yellow: Aves; orange: Mammalia).

Our comparisons of genome features revealed that exon length is remarkably constant across animals, regardless of genome size or intron length (Fig. 1C). Early observations of this phenomenon across small numbers of taxa led to the suggestion that the splicing machinery imposes a minimum exon size while exon skipping begins to predominate when exons exceed ∼500 nucleotides in length (26). Interestingly, we also find a tight correlation (ρ = 0.975) between the overall GC content and GC3, the GC content of the third codon position (SI Appendix, Fig. S9B and Table S17), while both features are negatively correlated with the codon adaptation index (CAI) (ρ = −0.804 and ρ = −0.837, respectively; Fig. 2 G and H and SI Appendix, Table S17) in Eukaryota and negatively correlated with the genome size in Mammalia (ρ = −0.440 and ρ = −0.456, respectively) (SI Appendix, Table S17). These results are partially supported by previous research, which showed that GC3 is negatively correlated with body mass, genome size, and species longevity within 1,138 placental mammal orthologs (27). However, our results using whole-genome data do not support the GC3 correlation with body mass and longevity (ρ = 0.074 and ρ = −0.267; SI Appendix, Table S17). Thus, exon and intron length may affect body mass and longevity through a strong association between GC content and coding sequence length (28). Additionally, CAI and intron size are moderately positively correlated (ρ = 0.493; Fig. 2I and SI Appendix, Table S17). Since the CAI and codon usage bias have an inverse relationship, this is consistent with the negative correlation between intron length and codon usage bias in multicellular organisms (29).

Whale Shark Longevity and Genome Characteristics. The allometric scaling relationships between longevity, mass, temperature, and metabolic rate are well established (23), and the long lifespan of the whale shark can be explained by its large mass and the extremely low mass- and temperature-adjusted BMR (Fig. 1 H and L). There has been considerable debate in the literature over the evolutionary causes and consequences of genome size, particularly as it relates to BMR. At 3.2 Gb, the whale shark genome is significantly larger than the elephant fish genome, although both exon number and size are comparable. Similar to the brownbanded bamboo shark and cloudy catshark, the whale shark is notable among fish for its long introns (Fig. 1E and SI Appendix, Figs. S3E and S4E). Analyses of single-copy orthologous gene (SCOG) clusters did not reveal any large intron gains or losses in whale shark, brownbanded bamboo shark, or cloudy catshark (SI Appendix, Fig. S10), although retrotransposon analysis revealed a significant expansion of CR1-like LINEs and Penelope-like elements within introns (Fig. 3A and SI Appendix, Figs. S11–S15). The CR1-like LINEs are the dominant family of TEs in nonavian reptiles and birds (30). In these three sharks, the proportion of CR1-like LINE elements accounts for more than 10% of the total intron length (SI Appendix, Fig. S13C), which is higher than that of the anole lizard, a species known for expanded CR1-like LINEs (30). The total length of intronic repetitive elements is as great as in the opossum genome, known to be rich in repetitive elements (31) (SI Appendix, Fig. S11A). Although the whale shark has the fourth longest repetitive elements (SI Appendix, Fig. S11A), it has the highest proportion of LINEs (SI Appendix, Fig. S12B), particularly CR1-like LINEs and CR1-Zenon like LINEs (SI Appendix, Fig. S13 C and D). In the whale shark genome, 38% of the CR1-like LINEs, 39% of the CR1-Zenon like LINEs, and 30% of the Penelope-like elements are located in intronic regions (SI Appendix, Fig. S14). Strikingly, most genes (more than 88%) in the whale shark genome have CR1-like LINE elements within their introns (SI Appendix, Fig. S15), a proportion higher than in other Chondrichthians. Moreover, 56% of whale shark genes also have LINE1 elements (SI Appendix, Fig. S15). Thus, the whale shark has a relatively large genome and long introns due to an expansion of multiple types of repetitive elements.

Fig. 3. Repetitive elements, evolutionary rate model, and flow of genes in the whale shark genome. (A) Each pie chart summarizes the lengths of predicted intronic repetitive elements (labeled at the top of pies). Values from 84 animals were averaged across six classes (Mammalia, Aves, Reptilia, Amphibia, Sarcopterygii, Actinopterygii). The whale shark and the elephant fish are listed separately. Yeast was excluded from these analyses. (B) All pairwise distances from sea lamprey were calculated using the R-package "ape" (32). The species were ordered by the pairwise distances. The eight bar colors indicate biological classification (turquoise: Chondrichthyes (the cyan color indicates whale shark); light blue: Actinopterygii; aquamarine: Sarcopterygii; dark green: Amphibia; light green: Reptilia; dark yellow: Aves; orange: Mammalia). (C) While most genes (∼58%) in the whale shark genome are ancient, some (∼5.4%) are of intermediate age, a few (∼2%) are young, and a significant fraction (∼34.6%) are new. (D) Maximum-likelihood phylogenetic tree of 28 species (for orders with more than one member represented in our 85-species dataset, one species was randomly selected). Bootstrap support values are 100 unless otherwise marked at the nodes. Terminal branches are colored according to the biological classification (gray: Hyperoartia, Ascidiacea, Chromadorea, Insecta, and Saccharomycetes; turquoise: Chondrichthyes (the cyan color indicates whale shark); light blue: Actinopterygii; aquamarine: Sarcopterygii; dark green: Amphibia; light green: Reptilia; dark yellow: Aves; orange: Mammalia).

Codon usage and the evolutionary age of genes are associated in metazoans (33). Interestingly, two principal component analyses (PCA) of relative synonymous codon usage (RSCU) from 85 and 79 species (6 species having distant codon usage patterns were excluded), respectively, revealed that the whale shark pattern of RSCU is most similar to that of the coelacanth, with well-separated patterns of RSCU for each class (SI Appendix, Fig. S16). While the whale shark genome has a relatively short exon length (smaller than those of 25 species; SI Appendix, Table S11), notably, it has a smaller number of exons per gene than all but 3 species (yeast, fruit fly, and brownbanded bamboo shark) (SI Appendix, Figs. S3B and S4G and Table S11). Thus, the whale shark coding sequence (CDS) length is shorter than the CDS length of all species except the brownbanded bamboo shark and the vase tunicate (Fig. 1D and SI Appendix, Fig. S4D).

Evolutionary Rate and Historical Demography of the Whale Shark. Analyses of the whale shark genome show that it is the second slowest evolving vertebrate yet characterized. A relative rate test and two cluster analyses revealed that the whale shark has a slower evolutionary rate than those of brownbanded bamboo shark, of cloudy catshark, and of all other bony vertebrates examined, including the coelacanth (16) (Fig. 3B and SI Appendix, Fig. S17 and Tables S18–S20). These results support previous work predicting a slow evolutionary rate in ectothermic, large-bodied species with relatively low body temperatures (compared to similarly sized warm-blooded vertebrates) (14). They also are consistent with previous studies of nucleotide substitution rates in elasmobranchs, which are significantly lower than those of mammals (34, 35). A phylogenetic analysis of the 175 SCOG clusters from the whale shark and 27 other animal genomes (Fig. 3D) showed a divergence of the Elasmobranchii (sharks) and Holocephali (chimaeras) roughly 333 million years ago (MYA) and of the Chondrichthyes from the bony vertebrates about 358 MYA (Fig. 3D), consistent with previous estimates.

To better understand the evolutionary history of the genes within the whale shark genome, we evaluated the age of the whale shark protein-coding genes based on protein sequence similarity (36). Grouping the whale shark genes into four broad evolutionary eras, we observed that, while the majority (58%) of genes are ancient (older than 684 MYA), a few (∼5.4%) are middle age (684 to 199 MYA), fewer (∼2%) are young (199 to 93 MYA), and many (34.6%) are new (93 MYA to present) (Fig. 3C). Normalizing the number of genes by evolutionary time suggests that gene turnover is highest near the present time (SI Appendix, Fig. S18). Examining the age of genes shows that many genes are ancient and also that many genes appear very young (SI Appendix, Fig. S19). These results highlight both the conservation of a large part of the coding genome and the innovative potential of the whale shark genome, since many new genes have appeared within the last 93 million years.

Length of Neural Genes and Correlation with Physiological Features. Gene length has recently emerged as an important feature of neural genes, as long genes are preferentially expressed in neural tissues and their expression is under tight transcriptional and epigenetic control (37). Within 84 animals and yeast, we compared the dimensions of average genes with those of 10 categories of neural genes (neuronal connectivity, cell adhesion, olfactory receptors, ion channels, unfolded protein-response–associated genes, neuronal activity and memory, neuropeptides, homeobox genes, synaptic genes, and neurodegeneration) (Fig. 4A and SI Appendix, Figs. S20 and S21). Interestingly, we found that neuronal connectivity genes are longer than average genes in most vertebrates, with the length increase being significant in whale shark and most mammals, as well as in coelacanth and platypus (Fig. 4A and SI Appendix, Fig. S21A). Surprisingly, we found that neural genes are scaled to average genes with an exponent greater than 1 (B = 1.038, SI Appendix, Fig. S27A), with the whale shark, brownbanded bamboo shark, and cloudy catshark showing an extreme lengthening of neural genes (SI Appendix, Figs. S20, S21, and S27). Moreover, we found that cell adhesion, ion channels, unfolded protein-response–related genes, and neurodegeneration genes are increased in length in the whale shark and two other shark species (Fig. 4B and SI Appendix, Fig. S28), suggesting that this may be a general feature of sharks. Finally, neuronal functions are enriched in long genes in more than 60 species (Fig. 4C and SI Appendix, Table S21 and Dataset S1).
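
A scaling exponent such as the B = 1.038 reported for neural versus average gene length is the slope of a straight line in log-log space: if y = A·x^B, then log y = log A + B·log x. The text here does not state how the exponents were fitted, so the sketch below assumes ordinary least squares on log10-transformed values, which is one common choice. The function name and the synthetic gene lengths are hypothetical.

```python
import math


def fit_power_law(x, y):
    """Fit y = A * x**B by least squares on log10-transformed data.

    Returns (A, B). All values must be strictly positive. Ordinary least
    squares is an assumption of this sketch, not the paper's stated method.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")

    log_x = [math.log10(v) for v in x]
    log_y = [math.log10(v) for v in y]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)

    # Slope of the regression line in log-log space is the exponent B.
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    exponent = sxy / sxx

    # Intercept in log space gives the prefactor A.
    prefactor = 10 ** (mean_y - exponent * mean_x)
    return prefactor, exponent


# Synthetic per-species mean gene lengths (bp), generated from an exact
# power law so the fit should recover the chosen parameters.
average_gene_length = [1e3, 1e4, 1e5]
neural_gene_length = [2.0 * v ** 1.05 for v in average_gene_length]
prefactor, exponent = fit_power_law(average_gene_length, neural_gene_length)
print(prefactor, exponent)
```

An exponent above 1 means the dependent trait grows faster than proportionally: with B > 1, species with longer average genes have disproportionately longer neural genes.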
Fig. 4. The relationship between gene length and neural genes and SCOG families with correlations between gene length and maximum lifespan, weight, and BMR. (A) Neuronal connectivity genes are longer than average genes in 84 animals. The x and y axes show the average gene length and the gene length of neuronal connectivity-related genes, respectively. The dashed diagonal line represents "y = x." Spearman's rho correlation coefficient and P value are shown in the top right corner of the plot. (B) Of the 12 categories of neural genes that we analyzed in the whale shark genome, several are longer than average whale shark genes. (C) Most common GO terms are relevant to neural function. GO terms, shown based on the number of species in which they were found, were computed with Gene Set Enrichment Analysis. (D) Enriched GO functions in SCOG families in which relative intron length positively correlates with maximum lifespan. For each GO term, black boxes indicate human gene symbols representative of the family.

To determine which genes are linked with three physiological traits (maximum lifespan, body weight, and BMR), we examined the correlation of gene size and three physiological traits in SCOG families. We found 172 SCOG families in which gene lengths significantly correlated to three physiological traits (SI Appendix, Tables S22–S24). Gene Ontology (GO) analyses of the 172 SCOG families showed statistical enrichment of biological processes such as regulation of telomere maintenance via telomerase (GO:0032210; CCT3, CDK7, MAPKAPK5, NAT10, and XRCC5) and tRNA metabolic processes (GO:0006399; DDX1, EEF1E1, KARS, LARS2, NARS2, NAT10, PUS10, RARS2, RTCB, and ZBTB8OS; Fig. 4D and SI Appendix, Tables S22 and S23), both of which are associated with longevity and cancer (38, 39). Furthermore, the size of DLD, a gene with proteolytic activity and metabolic function (40, 41), significantly correlates with BMR (ρ = 0.67) (SI Appendix, Fig. S23). These results suggest that an evolutionary relationship exists between gene size and several physiological traits such as body size, metabolic rate, and lifespan. This holds particularly for genes whose functions are essential for living long lives, such as telomere maintenance and metabolic activity.

Discussion

We sequenced and assembled the genome of the whale shark (R. typus), an endangered species that is the largest extant fish on Earth. We compared it to the genomes of 84 eukaryotic species, and related genomic traits to physiological traits and environmental variables such as temperature to understand how ecological constraints shape genomes. Several major findings emerged from our comparative evolutionary analyses. First, at 3.2 Gb, the relatively large genome of the whale shark is the second slowest evolving vertebrate genome found to date and has a striking number of CR1-like LINE transposable elements. Second, in most genomes, we found that major genomic traits, including intron length and gene length, scale with body size, temperature, and lifespan. Some genomic traits are correlated to metabolic rate, which scales with both animal mass and temperature, thus reflecting both physiology and environment. These results suggest that ecological variables mold both genomes and morphology. Third, we found that GC content and codon adaptation index are negatively correlated. Furthermore, while the correlation of GC3 content and overall genomic GC content had previously been established (42), we extended the validity of this correlation to a wide range of species, densely sampling chordate genomes at genome scale. Fourth, unexpectedly, we found that neural connectivity genes are substantially longer than average genes. While it has previously been observed that neural genes are longer than average genes in the human genome, our comparative analysis has dramatically extended the range of this observation to more than 80 species. Interestingly, we found that introns are longer in the shark genomes than in most other species due to the high proportion of repetitive elements. Finally, we found that neural genes of several types, including neurodegeneration genes, are much longer than average genes in species with long lifespans.

As a general approach, studying whether distinct quantitative traits are correlated at vastly different spatial and temporal scales is an important discovery tool. First, for pairs of traits the correlation of which was not anticipated, such as intron size and body weight, the quantification of scaling enables the generation of mechanistic hypotheses. Second, examining the relationships between quantitative traits on a large evolutionary scale, as we have done here in a group of 85 Eukaryotic species centered on Chordates, enables the identification of the mathematical functions that best describe the relationships between traits. For some pairs of traits, these functions can be expressed as power law equations that may be succinctly summarized as scaling exponents. It should also be noted that, for many of the strongly correlated traits, there are notable outliers, such as the bowhead whale, in comparisons of longevity. When traits such as genome size and lifespan correlate, large-scale evolutionary comparisons can be used to identify the outlier species that are most suited for addressing particular research questions. Together, these results show the power of the comparative evolutionary approach and of mathematical modeling to uncover both general and specific relationships that reveal how genome architecture is shaped by size and ecology.

Methods

Sample Preparation and Sequencing. Genomic DNA was isolated from heart tissue acquired from a 7-y-old, 4.5-m deceased male whale shark from the Hanwha Aquarium, Jeju, Korea. DNA libraries were constructed using a TruSeq DNA library kit for the short-read libraries and a Nextera Mate Pair sample prep kit for the mate-pair libraries. Sequencing was performed using the Illumina HiSeq 2500 platform. Libraries were sequenced to a combined depth of 164× (SI Appendix, Tables S1 and S2).

Genome Assembly and Annotation. Reads were quality-filtered (SI Appendix, Table S3), and the error-corrected reads from the short insert size libraries (<1 kb) and mate-pair libraries (>1 kb) were used to assemble the whale shark genome using SOAPdenovo2 (43). As the quality of the assembled genome can be affected by the K-mer size, we used multi-K-mer values (minimum 45 to maximum 63) with the "all" command in the SOAPdenovo2 package (43). The gaps between the scaffolds were closed in two iterations with the short insert libraries (<1 kb) using the GapCloser program in the SOAPdenovo2 package (43). We then aligned the short insert size reads to the scaffolds using BWA-MEM (44) with default options. Variants were identified using SAMtools (45) and scaffolds were error corrected by substituting the short insert read allele. For heterozygous mapped alleles, the first variant was substituted. Finally, we mapped the Illumina TruSeq TSLRs to the assembly, corrected the gaps covered by the synthetic long reads to reduce erroneous gap regions in the assembly (SI Appendix, Tables S5 and S13), and assessed the genome assembly and genome completeness using the BUSCO approach (SI Appendix, Table S14) (46).

The GC distribution of the whale shark genome was calculated using a sliding-window approach. We employed 10-kb sliding windows to scan the genome and calculate the GC content. Tandem repeats were predicted using the Tandem Repeats Finder program (version 4.07) (47). TEs were identified using both homology-based and ab initio approaches. The Repbase database (version 19.02) (48) and RepeatMasker (version 4.0.5) (19) were used for the homology-based approach, and RepeatModeler (version 1.0.7) (20) was used for the ab initio approach. All predicted repetitive elements were merged using in-house Perl scripts. Two candidate gene sets were built to predict the protein-coding genes in the whale shark genome using AUGUSTUS (22) and EvidenceModeler (21), respectively (SI Appendix, 1.7 Annotation of protein-coding genes).

Genomic Context Calculations. From 85 species (SI Appendix, Table S15), we computed the following genomic factors: GC3 (GC content at third codon position), CAI, number and length of coding exon(s), and relative intron length between the first and last exon (or coding exon). CDS sequences with premature stop codons and lengths that were not multiples of three were excluded. The relative intron length was calculated by dividing the total intron length between first and last exon (or coding exon) by the CDS length (or messenger RNA length). GC3 was computed from concatenated third codon nucleotides (49). We measured RSCU using the method from Sharp et al. (50) and the CAI in a CDS using Sharp and Li's method (51) for each of the 85 species. The PCA on RSCU was performed using the R packages (version 3.3.0) (52) ggplot2 (53) and ggfortify (54).

Orthologous Gene Family Clustering and Phylogeny Construction. To identify orthologous gene families among the whale shark and the other 85 species, we downloaded all pair-wise reciprocal BLASTP results using the "peptide align feature" in the Ensembl genome database project (release 86) (55). To generate pair-wise orthologues that were not available in the Ensembl resources, we performed reciprocal BLASTP (56) with the "-evalue 1e-05 -seg no -max_hsps_per_subject 1 -use_sw_tback" options. From the pair-wise reciprocal BLASTP results among the 85 species, we generated similarity matrices by connecting possible orthologous pairs. To constrain the computational load, we did not join additional nodes when the number of nodes was larger than 1,500. The normalized weights for the similarity matrix were calculated using the OrthoMCL approach (57). We identified orthologous gene families using an in-house C++ script based on the Markov Cluster (MCL) algorithm (58) with inflation index 1.3. A total of 1,556,795 genes were assigned to 245,314 clusters including 209,992 singletons, and 175 single-copy gene families were extracted from 28 species. Multiple sequence alignments were performed using MUSCLE 3.8.31 (59) and were concatenated without gap regions. The phylogenetic tree was constructed using RAxML 8.2 (60) with maximum likelihood (1,000 bootstraps), using the PROTCATLG amino acid substitution model (Fig. 3D).

Gene Age Estimation. Phylostratigraphy uses BLASTP-scored sequence similarity to estimate the minimal age of every protein-coding gene. The National Center for Biotechnology Information (NCBI) nonredundant database was queried with a protein sequence to detect the most distant species in which a sufficiently similar sequence is present and then posit that the gene is at least as old as the age of the common ancestor (36). Using NCBI taxonomy for every species, the timing of lineage divergence events was estimated with TimeTree (61). To facilitate detection of protein sequence similarity, we used the e-value threshold of 10^-3. We evaluated the minimal evolutionary age of all protein-coding genes the protein sequence lengths of which are between 40 amino acids and 4,000 amino acids. First, we counted the number of genes in each phylostratum (PS), from the most ancient (PS 1, cellular organisms) to the most recent (PS 20, R. typus). It should be noted that, within the Rhincodontidae family (PS 18) and the Rhincodon genus (PS 19), the whale shark is currently the only species with a sequenced genome. Therefore, the large number of genes that appeared species-specific (7,647 genes in phylostratum R. typus, SI Appendix, Fig. S19) may include marginally older genes that are restricted to the genus Rhincodon (PS 19) or to the family Rhincodontidae (PS 18), but cannot
presently be assigned to these two phylostrata until additional high-quality genomes are sequenced and assembled for species in these clades. To evaluate broad evolutionary patterns, we aggregated the counts from several phylostrata into four broad evolutionary eras: ancient (PS1 to 7, cellular organisms to Deuterostomia, 4,204 to 684 MYA), middle (PS8 to 14, Chordata to Selachii, 684 to 199 MYA), young (PS15 to 17, Galeomorpha to Orectolobiformes, 199 to 93.2 MYA), and newest (PS18 to 20, Rhincodontidae to R. typus, 93.2 MYA to present). To estimate the gene flow per time unit, we normalized the number of genes in an era by the age and the duration of that evolutionary era.

Correlation Tests in Orthologous Gene Families. From these 85 species, we selected 9,180 SCOG gene families found in at least 40 species to calculate the correlation between gene length and three physiological properties (the maximum lifespan, body weight, and BMR). We identified gene families that had significant correlations between gene length and maximum lifespan (3,521 genes), body mass (2,620 genes), and BMR (3,267 genes). The statistical significance of correlations was evaluated by calculating Spearman’s rho (ρ) correlation coefficient and applying the Benjamini–Hochberg adjustment (adjusted P value ≤ 0.05). All of these gene families were subjected to an alignment filtering criterion that required more than 50% of exon–exon boundaries (intron positions) to be conserved in their CDS alignments. This step reduces the effect of gene length changes due to intron gain or loss and increases the accuracy of multiple sequence alignments (SI Appendix, Fig. S23). Finally, we acquired four sets of gene families where we observed correlations between gene length and three properties: 1) 18 gene families in which gene length correlated with the maximum lifespan only (SI Appendix, Table S24), 2) 3 gene families correlated with the body weight only (SI Appendix, Table S24), 3) 7 gene families correlated with the BMR only, and 4) 148 gene families correlated with all three physiological properties (SI Appendix, Table S23).

Statistical Analysis. For all pairs of median values of physiological and genomic features assessed in this analysis of 85 species, the Spearman’s rank correlation coefficient rho (ρ) values were calculated using the cor.test function in R with the following options: method = “spearman” and exact = TRUE, and were plotted using the ggplot2 package. If the resulting P values were lower than 2.2e-16, the smallest value reported by this function, P < 2.2e-16 was listed rather than an exact value. For selected pairs of values, the plots were displayed using median values or log10-transformed values, as appropriate. Second, for the nine pairs of genomic and physiological features the scaling correlations of which were evaluated in Fig. 2, the robustness of the Spearman’s correlation coefficients was calculated using a leave-one-out jackknifing procedure (62), performed using the StatPlus program (n = 85 species, 85 iterations). The results are reported in SI Appendix, Table S17B, where the Spearman’s correlation coefficient values (ρ) are shown along with measures of ρ variation (minimum, maximum, and SD). Third, the variability, skewness, and kurtosis were evaluated for all median values of physiological and genomic features and were also evaluated for the nine Spearman’s correlation coefficient distributions generated by jackknifing. Fourth, for every physiological and genomic feature assessed in this analysis, pairwise comparisons between the 85 species were done as two-sided Wilcoxon rank-sum tests and displayed as correlation matrices. All P values were adjusted using the Benjamini–Hochberg procedure, log-transformed, and displayed in a color scale ranging from 0.000 to 0.01; values higher than 0.01 are shown in gray.

Scaling Analysis. The adjustment of the basal metabolic rate to mass is based on Gillooly’s Eq. 1 (14) relating the mass-adjusted basal metabolic rate to mass and temperature, $B = b_0 M^{-1/4} e^{-E/kT}$, where $B$ = basal metabolic rate, $b_0$ is a coefficient independent of body size and temperature, $M$ = organism mass, $E$ = average activation energy for enzyme-catalyzed biochemical reactions of metabolism (∼0.65 eV), $T$ = absolute temperature (for poikilotherms, the environmental temperature at which the organism lives; for homeotherms, that of the organism itself), and $e^{-E/kT}$ = Arrhenius or Boltzmann factor, which includes $k = 8.62 \times 10^{-5}$ eV⋅K$^{-1}$ (Boltzmann constant). Furthermore, we compared BMR values that were measured experimentally to BMR values calculated with Gillooly’s Eq. 1 (14). We found a very strong correlation between measured and calculated BMR values (
SI Appendix, Fig. S7A, Spearman’s ρ = 0.954, n = 24 species; SI Appendix, Fig. S7B, ρ = 0.935, n = 21 species excluding cattle, pig, and human, which have very high BMRs). Since the whale shark routinely dives to cold, deep depths, we also calculated the whale shark BMR across its temperature range (SI Appendix, Fig. S8).

Data and Materials Availability. The whale shark whole-genome project has been deposited in the INSDC: International Nucleotide Sequence Database Collaboration under accession no. QPMN00000000. The version described in this paper is version QPMN01000000. DNA sequencing reads have been uploaded to the NCBI Sequence Read Archive (SRP155581). The C++ code used for the MCL algorithm was uploaded to the GitHub repository (https://github.com/jsungwon/MCL-clustering).

Acknowledgments

We thank Dr. Mark Erdmann for generously providing the whale shark photograph used in Fig. 1. Portions of Fig. 3 were created with https://biorender.com/. V.L. thanks Marc W. Kirschner (M.W.K.) and Anne O’Donnell-Luria (A.O.L.) for discussions and support. This work was supported by the Genome Korea Project in Ulsan (800 genome sequencing) Research Fund (1.180017.01) of the Ulsan National Institute of Science and Technology (UNIST); the Genome Korea Project in Ulsan (200 genome sequencing) Research Fund (1.180024.01) of UNIST; NIH Grants R01HD073104 and R01HD091846 (to M.W.K.); and the William Randolph Hearst Fund Award and a Boston Children’s Hospital Career Development Fellowship (to A.O.L.).

Footnotes

1 J.A.W., S.G.P., and V.L. contributed equally to this work.
2 J.S.E., J.B., and G.M.C. contributed equally to this work.
3 To whom correspondence may be addressed. Email: jsedwards{at}salud.unm.edu, jongbhak{at}genomics.org, or gchurch{at}genetics.med.harvard.edu.

Author contributions: J.A.W., V.L., Y.S.C., J.B., and G.M.C. designed research; H.-M.K., S.W.K., W.H.H., Y.S.C., S.K., and J.-H.K. performed research; J.A.W., S.G.P., V.L., S.J., H.-M.K., Y.J., Y.B., J.H.J., S.L., A.K., and J.W.C. analyzed data; and J.A.W., S.G.P., V.L., S.J., A.M., J.S.E., J.B., and G.M.C. wrote the paper.

Reviewers: M.C., Cambridge Precision Medicine Limited; and X.H., University of California San Diego.

The authors declare no competing interest.

Data deposition: The whale shark whole-genome project data have been deposited at INSDC: International Nucleotide Sequence Database Collaboration (accession no. QPMN00000000). The version described in this paper is version QPMN01000000. DNA sequencing reads have been uploaded to the National Center for Biotechnology Information Sequence Read Archive (SRP155581). The C++ code used for the Markov Cluster (MCL) algorithm was uploaded to the GitHub repository (https://github.com/jsungwon/MCL-clustering).

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1922576117/-/DCSupplemental.

Published under the PNAS license.

References

1. J. A. Goldbogen et al., Why whales are big but not bigger: Physiological drivers and ecological limits in the age of ocean giants. Science 366, 1367–1372 (2019).
2. W. Gearty, C. R. McClain, J. L. Payne, Energetic tradeoffs control the size distribution of aquatic mammals. Proc. Natl. Acad. Sci. U.S.A. 115, 4194–4199 (2018).
3. D. Atkinson, Temperature and organism size: A biological law for ectotherms? Adv. Ecol. Res. 25, 1–58 (1994).
4. D. Atkinson, B. J. Ciotti, D. J. Montagnes, Protists decrease in size linearly with temperature: Ca. 2.5% degrees C(-1). Proc. Biol. Sci. 270, 2605–2611 (2003).
5. C.-T. Chen, “Preliminary report on Taiwan’s whale shark fishery” in Elasmobranch Biodiversity, Conservation and Management: Proceedings of the International Seminar and Workshop, Sabah, Malaysia, July 1997, S. Fowler, T. M. Reed, F. A. Dipper, Eds. (IUCN, Gland, Switzerland, 1997), pp. 162–167.
6. H. H. Hsu, S. J. Joung, R. E. Hueter, K. M. Liu, Age and growth of the whale shark (Rhincodon typus) in the north-western Pacific. Mar. Freshw. Res. 65, 1145–1154 (2014).
7. J. G. Colman, A review of the biology and ecology of the whale shark. J. Fish Biol. 51, 1219–1234 (1997).
8. D. Rowat, K. S. Brooks, A review of the biology, fisheries and conservation of the whale shark Rhincodon typus. J. Fish Biol. 80, 1019–1056 (2012).
9. A. M. Sequeira, C. Mellin, L. Floch, P. G. Williams, C. J. Bradshaw, Inter-ocean asynchrony in whale shark occurrence patterns. J. Exp. Mar. Biol. Ecol. 450, 21–29 (2014).
10. J. P. Tyminski, R. de la Parra-Venegas, J. González Cano, R. E. Hueter, Vertical movements and patterns in diving behavior of whale sharks as revealed by pop-up satellite tags in the eastern Gulf of Mexico. PLoS One 10, e0142156 (2015).
11. M. Thums, M. Meekan, J. Stevens, S. Wilson, J. Polovina, Evidence for behavioural thermoregulation by the world’s largest fish. J. R. Soc. Interface 10, 20120477 (2013).
12. A. P. Martin, S. R. Palumbi, Body size, metabolic rate, generation time, and the molecular clock. Proc. Natl. Acad. Sci. U.S.A. 90, 4087–4091 (1993).
13. C. D. Laird, B. L. McConaughy, B. J. McCarthy, Rate of fixation of nucleotide substitutions in evolution. Nature 224, 149–154 (1969).
14. J. F. Gillooly, A. P. Allen, G. B. West, J. H. Brown, The rate of DNA evolution: Effects of body size and temperature on the molecular clock. Proc. Natl. Acad. Sci. U.S.A. 102, 140–145 (2005).
15. B. Venkatesh et al., Elephant shark genome provides unique insights into gnathostome evolution. Nature 505, 174–179 (2014).
16. C. T. Amemiya et al., The African coelacanth genome provides insights into tetrapod evolution. Nature 496, 311–316 (2013).
17. Q. Zhang, S. V. Edwards, The evolution of intron size in amniotes: A role for powered flight? Genome Biol. Evol. 4, 1033–1043 (2012).
18. A. Kapusta, A. Suh, C. Feschotte, Dynamics of genome size evolution in birds and mammals. Proc. Natl. Acad. Sci. U.S.A. 114, E1460–E1469 (2017).
19. J. A. Bedell, I. Korf, W. Gish, MaskerAid: A performance enhancement to RepeatMasker. Bioinformatics 16, 1040–1041 (2000).
20. G. Abrusán, N. Grundmann, L. DeMester, W. Makalowski, TEclass: A tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics 25, 1329–1330 (2009).
21. B. J. Haas et al., Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
22. M. Stanke, B. Morgenstern, AUGUSTUS: A web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res. 33, W465-7 (2005).
23. G. B. West, J. H. Brown, B. J. Enquist, A general model for the origin of allometric scaling laws in biology. Science 276, 122–126 (1997).
24. N. Sabath, E. Ferrada, A. Barve, A. Wagner, Growth temperature and genome size in bacteria are negatively correlated, suggesting genomic streamlining during thermal adaptation. Genome Biol. Evol. 5, 966–977 (2013).
25. K. Alfsnes, H. P. Leinaas, D. O. Hessen, Genome size in arthropods: Different roles of phylogeny, habitat and life history in insects and crustaceans. Ecol. Evol. 7, 5939–5947 (2017).
26. D. A. Sterner, T. Carlo, S. M. Berget, Architectural limits on split genes. Proc. Natl. Acad. Sci. U.S.A. 93, 15081–15085 (1996).
27. J. Romiguier, V. Ranwez, E. J. Douzery, N. Galtier, Contrasting GC-content dynamics across 33 mammalian genomes: Relationship with life-history traits and chromosome sizes. Genome Res. 20, 1001–1009 (2010).
28. J. L. Oliver, A. Marín, A relationship between GC content and coding-sequence length. J. Mol. Evol. 43, 216–223 (1996).
29. A. E. Vinogradov, Intron-genome size relationship on a large evolutionary scale. J. Mol. Evol. 49, 376–384 (1999).
30. A. Suh et al., Multiple lineages of ancient CR1 retroposons shaped the early genome evolution of amniotes. Genome Biol. Evol. 7, 205–217 (2014).
31. A. J. Gentles et al., Evolutionary dynamics of transposable elements in the short-tailed opossum Monodelphis domestica. Genome Res. 17, 992–1004 (2007).
32. E. Paradis, J. Claude, K. Strimmer, APE: Analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290 (2004).
33. Y. Prat, M. Fromer, N. Linial, M. Linial, Codon usage is associated with the evolutionary age of genes in metazoan genomes. BMC Evol. Biol. 9, 285 (2009).
34. A. P. Martin, G. J. Naylor, S. R. Palumbi, Rates of mitochondrial DNA evolution in sharks are slow compared with mammals. Nature 357, 153–155 (1992).
35. A. P. Martin, Substitution rates of organelle and nuclear genes in sharks: Implicating metabolic rate (again). Mol. Biol. Evol. 16, 996–1002 (1999).
36. T. Domazet-Loso, J. Brajković, D. Tautz, A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages. Trends Genet. 23, 533–539 (2007).
37. H. W. Gabel et al., Disruption of DNA-methylation-dependent long gene repression in Rett syndrome. Nature 522, 89–93 (2015).
38. K. L. Rudolph et al., Longevity, stress response, and cancer in aging telomerase-deficient mice. Cell 96, 701–712 (1999).
39. Z. Ke et al., Translation fidelity coevolves with longevity. Aging Cell 16, 988–993 (2017).
40. N. E. Babady, Y. P. Pang, O. Elpeleg, G. Isaya, Cryptic proteolytic activity of dihydrolipoamide dehydrogenase. Proc. Natl. Acad. Sci. U.S.A. 104, 6158–6163 (2007).
41. M. H. Odièvre et al., A novel mutation in the dihydrolipoamide dehydrogenase E3 subunit gene (DLD) resulting in an atypical form of α-ketoglutarate dehydrogenase deficiency. Hum. Mutat. 25, 323–324 (2005).
42. J. P. Thiery, G. Macaya, G. Bernardi, An analysis of eukaryotic genomes by density gradient centrifugation. J. Mol. Biol. 108, 219–235 (1976).
43. R. Luo et al., SOAPdenovo2: An empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012).
44. H. Li, Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997 (16 March 2013).
45. H. Li et al.; 1000 Genome Project Data Processing Subgroup, The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
46. F. A. Simão, R. M. Waterhouse, P. Ioannidis, E. V. Kriventseva, E. M. Zdobnov, BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
47. G. Benson, Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
48. J. Jurka et al., Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
49. T. Tatarinova, E. Elhaik, M. Pellegrini, Cross-species analysis of genic GC3 content and DNA methylation patterns. Genome Biol. Evol. 5, 1443–1456 (2013).
50. P. M. Sharp, T. M. Tuohy, K. R. Mosurski, Codon usage in yeast: Cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 14, 5125–5143 (1986).
51. P. M. Sharp, W. H. Li, The codon Adaptation Index: A measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295 (1987).
52. R Core Team, R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, 2014), 2013.
53. H. Wickham, ggplot2: Elegant Graphics for Data Analysis (Springer, 2016).
54. M. Horikoshi, Y. Tang, W. Li, ggfortify: Unified interface to visualize statistical results of popular R packages. R Journal 8, 478–489 (2016).
55. J. Herrero et al., Ensembl comparative genomics resources. Database (Oxford) 2016, bav096 (2016).
56. C. Camacho et al., BLAST+: Architecture and applications. BMC Bioinformatics 10, 421 (2009).
57. L. Li, C. J. Stoeckert Jr., D. S. Roos, OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
58. A. J. Enright, S. Van Dongen, C. A. Ouzounis, An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002).
59. R. C. Edgar, MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
60. A. Stamatakis, RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
61. S. Kumar, G. Stecher, M. Suleski, S. B. Hedges, TimeTree: A resource for timelines, timetrees, and divergence times. Mol. Biol. Evol. 34, 1812–1819 (2017).
62. A. J. Bishara, J. B. Hittner, Confidence intervals for correlations when data are not normal. Behav. Res. Methods 49, 294–309 (2017).
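
Supplementary sketch for the Scaling Analysis in Methods. The mass and temperature adjustment attributed there to Gillooly’s Eq. 1 can be evaluated numerically as follows. This is a minimal illustration, not the authors’ code: the function name `mass_adjusted_bmr` and its parameters are hypothetical, and the normalization `b0 = 1.0` is a placeholder because the text gives no value for it, so only ratios between two evaluations are meaningful. The constants E ≈ 0.65 eV and k = 8.62 × 10⁻⁵ eV·K⁻¹ are the ones stated in the text.

```python
import math

BOLTZMANN_EV_PER_K = 8.62e-5   # k, Boltzmann constant as given in the text
ACTIVATION_ENERGY_EV = 0.65    # E, approximate activation energy given in the text


def mass_adjusted_bmr(mass, temp_kelvin, b0=1.0):
    """Evaluate B = b0 * M**(-1/4) * exp(-E / (k * T)).

    b0 is a hypothetical placeholder (assumption); with b0 = 1 the result
    is in arbitrary units and only ratios should be interpreted.
    """
    if mass <= 0 or temp_kelvin <= 0:
        raise ValueError("mass and absolute temperature must be positive")
    boltzmann_factor = math.exp(
        -ACTIVATION_ENERGY_EV / (BOLTZMANN_EV_PER_K * temp_kelvin)
    )
    return b0 * mass ** -0.25 * boltzmann_factor


# Same (arbitrary) mass evaluated at 25 °C and at 5 °C; b0 cancels in the ratio.
warm = mass_adjusted_bmr(1000.0, 298.15)
cold = mass_adjusted_bmr(1000.0, 278.15)
ratio = warm / cold
print(f"warm/cold ratio of B: {ratio:.2f}")
```

Because the mass term enters as M^(-1/4), a 16-fold increase in mass halves B at a fixed temperature, while the exponential term makes B rise steeply with temperature. That temperature sensitivity is why a range of BMR values, rather than a single value, is relevant for an ectotherm that moves between warm surface water and cold depths.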