Towards complete and error-free genome assemblies of all vertebrate species


Subjects: Evolutionary genetics, Genome assembly algorithms, Molecular evolution, Research data

Abstract

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1,2,3,4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
Main

Chromosome-level reference genomes underpin the study of functional, comparative, and population genomics within and across species. The first high-quality genome assemblies of human [1] and other model species (for example, Caenorhabditis elegans [2], mouse [3], and zebrafish [4]) were put together using 500–1,000-base pair (bp) Sanger sequencing reads of thousands of hierarchically organized clones with 200–300-kilobase (kb) inserts, and chromosome genetic maps. This approach required tremendous manual effort, software engineering, and cost, in decade-long projects. Whole-genome shotgun approaches simplified the logistics (for example, in human [7] and Drosophila [8]), and later next-generation sequencing with shorter (30–150-bp) sequencing reads and short insert sizes (for example, 1 kb) ushered in more affordable and scalable genome sequencing [9]. However, the shorter reads resulted in lower-quality assemblies, fragmented into thousands of pieces, where many genes were missing, truncated, or incorrectly assembled, resulting in annotation and other errors [10]. Such errors can require months of manual effort to correct individual genes and years to correct an entire assembly. Genomic heterozygosity posed additional problems, because homologous haplotypes in a diploid or polyploid genome are forced together into a single consensus by standard assemblers, sometimes creating false gene duplications [11,12,13,14].

To address these problems, the G10K consortium [5,6] initiated the Vertebrate Genomes Project (VGP; https://vertebrategenomesproject.org) with the ultimate aim of producing at least one high-quality, near error-free and gapless, chromosome-level, haplotype-phased, and annotated reference genome assembly for each of the 71,657 extant named vertebrate species and using these genomes to address fundamental questions in biology, disease, and biodiversity conservation. Towards this end, having learned the lessons of having too many variables
that make conclusions more difficult to reach in the G10K Assemblathon 2 effort [15], we first evaluated multiple genome sequencing and assembly approaches extensively on one species, the Anna’s hummingbird (Calypte anna). We then deployed the best-performing method across sixteen species representing six major vertebrate classes, with a wide diversity of genomic characteristics. Drawing on the principles learned, we improved these methods further, discovered parameters and approaches that work better for species with different genomic characteristics, and made biological discoveries that had not been possible with the previous assemblies.

Complete, accurate assemblies require long reads

We chose a female Anna’s hummingbird because it has a relatively small genome (about 1 Gb), is heterogametic (has both Z and W sex chromosomes), and has an annotated reference of the same individual built from short reads [16]. We obtained 12 new sequencing data types, including both short and long reads (80 bp to 100 kb), and long-range linking information (40 kb to more than 100 Mb), generated using eight technologies (Supplementary Table 1). We benchmarked all technologies and assembly algorithms (Supplementary Table 2) in isolation and in many combinations (Supplementary Table 3). To our knowledge, this was the first systematic analysis of many sequence technologies, assembly algorithms, and assembly parameters applied on the same individual. We found that primary contiguous sequences (contigs) (pseudo-haplotype; Supplementary Note 1) assembled from Pacific Biosciences continuous long reads (CLR) or Oxford Nanopore long reads (ONT) were approximately 30- to 300-fold longer than those assembled from Illumina short reads (SR), regardless of data type combination or assembly algorithm used (Fig. 1a, Supplementary Table 3). The highest contig NG50s for short-read-only assemblies were about 0.025 to 0.169 Mb, whereas for long reads they were about 4.6 to 7.66 Mb (Fig. 1a); contig NG50 is an assembly metric based on a weighted median of the lengths of its gapless sequences relative to the estimated genome size. After fixing a function in the PacBio FALCON software [17] that caused artificial breaks in contigs between stretches of highly homozygous and heterozygous haplotype sequences (Supplementary Note
1, Supplementary Table 2), contig NG50 nearly tripled to 12.77 Mb (Fig. 1a). These findings are consistent with theoretical predictions [18] and demonstrate that, given current sequencing technology and assembly algorithms, it is not possible to achieve high contig continuity with short reads alone, as it is typically impossible to bridge through repeats that are longer than the read length.

Fig. 1: Comparative analyses of Anna’s hummingbird genome assemblies with various data types. a, Contig NG50 values of the primary pseudo-haplotype. b, Scaffold NG50 values. c, Number of joins (gaps). d, Number of mis-join errors compared with the curated assembly. The curated assembly has no remaining conflicts with the raw data and thus no known mis-joins. *Same as CLR + linked + Opt. + Hi-C, but with contigs generated with an updated FALCON [17] version and earlier Hi-C Salsa version (v2.0 versus v2.2; Supplementary Table 2) for less aggressive contig joining. e, f, Hi-C interaction heat maps before and after manual curation, which identified 34 chromosomes. Grid lines indicate scaffold boundaries. Red arrow, example mis-join that was corrected during curation. g, Karyotype of the identified chromosomes (n = 36 + ZW), consistent with previous findings [70]. h, Correlation between estimated chromosome sizes (in Mb) based on karyotype images in g and assembled scaffolds in Supplementary Table 4 (bCalAna1) on a log–log scale. v1.0, VGP assembly v1.0 pipeline; linked, 10X Genomics linked reads; Hi-C, Hi-C proximity ligation; 1D, 2D, Oxford Nanopore long reads; NRGene, NRGene paired-end Illumina reads; SR, paired-end Illumina short reads.

Iterative assembly pipeline

Scaffolds generated with all three scaffolding technologies (that is, 10X Genomics linked reads (10XG), Bionano optical maps (Opt.), and Arima Genomics, Dovetail Genomics, or Phase Genomics Hi-C) were approximately 50% to 150% longer than those generated using one or two technologies, regardless of whether we started with short- or long-read-based contigs (Fig. 1b, Extended Data Fig. 1a, Supplementary Table 3). These findings include improvements we made to each approach (Supplementary Note
1, Supplementary Tables 4, 5, Supplementary Fig. 1). Despite similar scaffold continuity, the short-read-only assemblies had from about 18,000 to about 70,000 gaps, whereas the long-read assemblies had substantially fewer (about 400 to about 4,000) gaps (Fig. 1c). Many gaps in the short-read assemblies were in repeat or GC-rich regions. Considering the curated version of this assembly to be more accurate, we also identified roughly 5,000 to 8,000 mis-joins in short-read-based assemblies, whereas long-read-based assemblies had only from 20 to around 700 mis-joins (Fig. 1d). These mis-joins included chimeric joins and inversions. After we curated this assembly for contamination, assembly errors, and Hi-C-based chromosome assignments (Fig. 1e, f), the final hummingbird assembly had 33 scaffolds that closely matched the chromosome karyotype in number (33 of 36 autosomes plus sex chromosomes) and estimated sizes (approximately 2 to 200 Mb; Fig. 1g, h), with only 1 to 30 gaps per autosome (bCalAnn1 in Supplementary Table 6). Of the five autosomes with only one gap each, three (chromosomes 14, 15, and 19) had complete spanning support by at least two technologies (reliable blocks, Extended Data Fig. 1c; bCalAnn1 in Supplementary Table 6), indicating that the chromosome contigs were nearly complete. However, they were missing long arrays of vertebrate telomere repeats within 1 kb of their ends (Extended Data Fig. 1c; bCalAnn1 in Supplementary Tables 6, 7).

Assembly pipeline across vertebrate diversity

Using the formula that gave the highest-quality hummingbird genome, we built an iterative VGP assembly pipeline (v1.0) with haplotype-separated CLR contigs, followed by scaffolding with linked reads, optical maps, and Hi-C, and then gap filling, base call polishing, and finally manual curation (Extended Data Figs. 2a, 3a). We systematically tested our pipeline on 15 additional species spanning all major vertebrate classes: mammals, birds, non-avian reptiles, amphibians, teleost fishes, and a cartilaginous fish (Supplementary Tables 8, 9, Supplementary Note
2). For the zebra finch, we used DNA from the same male as was used to generate the previous reference genome [19], and included a female trio for benchmarking haplotype completeness, where sequenced reads from the parents were used to bin parental haplotype reads from the offspring before assembly [20] (Extended Data Figs. 2a, 3b). We set initial minimum assembly metric goals of: 1 Mb contig NG50; 10 Mb scaffold NG50; assigning 90% of the sequence to chromosomes, structurally validated by at least two independent lines of evidence; Q40 average base quality; and haplotypes assembled as completely and correctly as possible. When these metrics were achieved, most genes were assembled with gapless exon and intron structures [11], and fewer than 3% had frame-shift base errors identified in annotation. Q40 is the mathematical inflection point at which genes go from usually containing an error to usually not [21].

Of the curated assemblies (Supplementary Table 10, Supplementary Note 2), 16 of 17 achieved the desired continuity metrics (Extended Data Table 1). Scaffold NG50 was significantly correlated with genome size (Fig. 2a), suggesting that larger genomes tend to have larger chromosomes. On average, 98.3% of the assembled bases had reliable block NG50s ranging from 2.3 to 40.2 Mb; collapsed repeat bases [22] with abnormally high CLR read coverage (more than 3 s.d.) ranged from 0.7 to 31.4 Mb per Gb; and the completeness of the genome assemblies ranged from 87.2 to 98.1%, with less than 4.9% falsely duplicated regions, consistent with the false duplication rate we found for the conserved BUSCO vertebrate gene set (Extended Data Table 1, Supplementary Tables 11, 12).

Fig. 2: Impact of repeats and heterozygosity on assembly quality. a, Correlation between scaffold NG50 and genome size of the curated assemblies. b, Nonlinear correlation between contig NG50 and repeat content, before and after curation. c, Correlation between number of gaps per Gb assembled and repeat content. d, Correlation between primary assembly size relative to estimated genome size (y axis) and genome heterozygosity (x axis), before and after purging of false duplications. Assembly sizes above 100% indicate the presence of false duplications and those below 100% indicate collapsed repeats. e, f, Correlations between genome duplication rate
using k-mers [23] (e) and conserved BUSCO vertebrate gene set (f), and genome heterozygosity before and after purging of false duplications. g, h, As in e, f, but with whole-genome repeat content before and after purging of false duplications. Genome size, heterozygosity, and repeat content were estimated from 31-mer counts using GenomeScope [71], except for the channel bull blenny, as the estimates were unreliable (see Methods). Repeat content was measured by modelling the k-mer multiplicity from sequencing reads. Sequence duplication rates were estimated with Merqury [23] using 21-mers. *P 1&&(GT=’’AA’’||GT = ‘’Aa’’)’-Hla.

VGP Trio Pipeline v1.0–v1.6

The trio pipeline is similarly designed to the standard pipeline, except for the use of parental data (Extended Data Fig. 3b). When parental genomes are available, the child’s CLR reads are binned to maternal and paternal haplotypes, and assembled separately as haplotype-specific contigs (haplotigs) using TrioCanu [20]. In brief, parental-specific marker k-mers were collected using Meryl [23] from the Illumina WGS reads of the parents. These markers were filtered and used to bin the child’s CLR reads. A haplotype was assigned given the markers observed, normalized by the total markers in each haplotype. The subsequent purging, scaffolding, and polishing steps were similarly updated with the use of Purge_Dups [14] (v1.6). We extended binning to linked reads and Hi-C reads, by excluding read pairs that had any parental-specific marker. The binned Hi-C reads were used to scaffold the corresponding haplotype assembly, which was then polished with the binned linked reads after haplotype switching was observed with the standard polishing approach. During curation, one of the haplotype assemblies with the higher QV and/or contiguity was chosen as the representative haplotype. The heterogametic sex chromosome from the unchosen haplotype was added to the representative assembly. However, while curating several trios, we found that in regions of low divergence between shared parental homogametic sex chromosomes (that is, X or Z), a small fraction of offspring CLR data was mis-assigned to the wrong haplotype. This mis-alignment resulted in a duplicate, low-coverage offspring X or Z assembly in the paternal (for mammals) or maternal (for birds) haplotype, respectively, which required removal
during curation. We are working on methods to improve the binning accuracy for resolution of this issue going forward.

For the female zebra finch in particular, contigs were generated before the binning was automated in the Canu assembler as TrioCanu 1.7, and therefore a manual binning process was applied as described in the original Trio-binning paper [20] (Supplementary Methods). Contigs were assembled for each haplotype using the binned reads, excluding unclassified reads. The contigs were polished with two rounds of Arrow polishing using the binned reads, and scaffolded following the v1.0 pipeline with no purging. Additional scaffolding rounds with Bionano (s4) and Hi-C were applied. Scaffolds were renamed according to the primary scaffold assembly of the same individual (s5), with sex chromosomes grouped as Z in the paternal assembly and W in the maternal assembly following synteny to the Z chromosome from the curated male zebra finch VGP assembly. Two rounds of SR polishing were applied using linked reads, by mapping on both haplotypes. After haplotype switches were discovered, additional rounds of polishing were applied using binned linked reads (Supplementary Methods).

Mitochondrial genome assembly

Similar to other recent methods [93,94], we developed a reference-guided MT assembly pipeline. MT reads in the raw CLR data were identified by mapping the whole read set to an existing reference sequence of the specific species or of closely related species using Blasr. Filtered mtDNA CLRs were assembled into a single contig using Canu v1.8, polished with Arrow using CLR and then FreeBayes v1.0.2 together with bcftools v1.9 using short reads from the 10XG data (Extended Data Fig. 3c). The overlapping sequences at the ends of the contig were trimmed, and the remaining contig sequence circularized. The mitoVGP pipeline is made available at https://github.com/VGP/vgp-assembly/tree/master/mitoVGP. A more detailed protocol description of the assembly pipeline and new discoveries from the MT assemblies are published elsewhere [33].

Curation

The VGP genome assembly pipeline produces high quality assemblies, yet no automated method to date is free from the production of errors, especially during the scaffolding stages. To minimize the impact of the remaining algorithmic shortcomings, we subjected all assemblies to rigorous manual curation. All data generated for a species in this study and other publicly available data (for example, genetic maps, gene sets and genome assemblies of the same or closely related species) were collated, aligned to the primary assembly and analysed in gEVAL [95] (https://vgp-geval.sanger.ac.uk/index.html), visualizing discordances in a feature browser and issue lists. In parallel, Hi-C data were mapped to the primary assembly and visualized using Juicebox [96] and/or HiGlass [97]. With these data, genome curators identified mis-joins, missed joins and other anomalies, and corrected the primary assembly accordingly. No change was made without unambiguous evidence from available data types; for example, a Hi-C suggested join would not be made unless supported by BioNano maps, long-read data, or gene alignments. When sequencing the heterogametic sex, we identified sex chromosomes based on half coverage, homology alignments to sex chromosomes in other species, and the presence of sex chromosome-specific genes.

Contamination removal

A succession of searches was used to identify potential contaminants in the generated assemblies.
1) A megaBLAST [98] search against a database of common contaminants (ftp://ftp.ncbi.nlm.nih.gov/pub/kitts/contam_in_euks.fa.gz) requiring e ≤ 1 × 10^−4, reporting matches with ≥98% sequence identity and match length 50–99 bp, ≥94% and match length 100–199 bp, or ≥90% and match length 200 bp or above.
2) A vecscreen (https://www.ncbi.nlm.nih.gov/tools/vecscreen/) search against a database of adaptor sequences (ftp://ftp.ncbi.nlm.nih.gov/pub/kitts/adaptors_for_screening_euks.fa).
3) After soft-masking repeats using Windowmasker [75], a megaBLAST search against chromosome-level assemblies from RefSeq requiring e ≤ 1 × 10^−4, match score ≥100, and sequence identity
≥98%; regions matching highly conserved rDNAs were ignored.

Manual inspection of the results was necessary to differentiate contamination from conservation and/or horizontal gene transfer. Adaptor sequences were masked; other contaminant sequences were removed. Assemblies were also checked for runs of Ns at the ends of scaffolds, created as artefacts of the iterative scaffolding process, and when found they were trimmed.

Organelle genomes

These were detected by a megaBLAST search against a database of known organelle genomes requiring e ≤ 1 × 10^−4, sequence identity ≥90%, and match length ≥500; the databases are available at ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/mito.nt.gz and ftp://ftp.ncbi.nlm.nih.gov/refseq/release/plastid/*genomic.fna.gz. Only scaffolds consisting entirely of organelle sequences were assumed to be organelle genomes, and replaced by the genome from the separate organelle assembly pipeline. Organelle matches embedded in nuclear sequences that were found to be NuMTs were kept.

False duplication removal

Retained false duplications were identified using Purge_Haplotigs [13] run either after scaffolding and polishing (Anna’s hummingbird, kākāpō, male zebra finch, female zebra finch, platypus, pale spear-nosed bat, and greater horseshoe bat) or on the c1 before scaffolding (two-lined caecilian, flier cichlid, Canada lynx, and Goode’s thornscrub tortoise). Subsequent manual curation identified additional haplotypic duplications for the listed assemblies and also those that were not treated with Purge_Haplotigs (Eastern happy, climbing perch, zig-zag eel). The evidence used included read coverage, sequence self-comparison, transcript alignments, Bionano map alignments and Hi-C 2D maps, all confirming the superfluous nature of one allele. The identified haplotype duplications were moved from the primary to the alternate assembly.

Chromosome assignment

For a scaffold to be annotated as a chromosome, we used evidence from Hi-C as well as genetic linkage or FISH karyotype mapping when available. For Hi-C evidence, we considered a scaffold as a complete chromosome (albeit with gaps) when there was a clear unbroken diagonal in the Juicebox or HiGlass plots for that scaffold and no other large scaffolds that could be joined to that same scaffold; if present and no unambiguous join was possible, we named it as an unlocalized scaffold for that chromosome. When we could not find evidence of a complete chromosome, we kept the scaffold number for its name. We named all evidence-validated scaffolds as chromosomes down to the smallest Hi-C box unit resolution allowed with these characteristics. When there was an established chromosome terminology for a given species or set of species, we used the established terminology except when our new assemblies revealed errors in the older assembly, such as scaffold/chromosome fusions, fissions, rearrangements, and non-chromosome names. For species without an established chromosome terminology, we named the scaffolds as chromosome numbers 1, 2, 3…, in descending order of scaffold size. For the sex chromosomes, we used the letters X and Y for mammals and Z and W for birds.

Using comparative genomics to assess assembly structure

In cases where a high-quality chromosome-level genome was available for a closely related species, comparative genome analysis was performed. The polished primary assembly (t3.p) was mapped to the related genome using MashMap2 [86] with --pi 75 -s 300000. The number of chromosomal differences was identified using a custom script available at https://github.com/jdamas13/assembly_comparison. This resulted in the identification of ~60 to ~450 regions for each genome assembly flanking putative misassemblies or lineage-specific genome rearrangements. To identify which were real misassemblies, the identified discrepancies were communicated to the curation team for manual verification (see above). To identify any possible remaining mis-joins, each curated avian and mammalian assembly was compared with the zebra finch (taeGut2) or human (hg38) genomes, respectively. Pairwise alignments between each of the VGP assemblies and the clade reference were generated with LastZ [99] (version 1.04) using the following parameters: C = 0 E = 30 H = 2000 K = 3000 L = 2200 O = 400. The pairwise alignments were converted into the UCSC ‘chain’ and ‘net’ formats with axtChain (parameters: -minScore = 1000 -verbose = 0 -linearGap =
medium) followed by chainAntiRepeat, chainSort, chainPreNet, chainNet and netSyntenic, all with default parameters [100]. Pairwise synteny blocks were defined using maf2synteny [101] at 100-, 300-, and 500-kb resolutions. Evolutionary breakpoint regions were detected and classified using an ad hoc statistical approach [102]. This analysis identified 2 to 90 genomic regions per assembly that could be flanking misassemblies, lineage-specific chromosome rearrangements, or reference-specific chromosome rearrangements (116 in the human and 26 in the zebra finch). Determining the underlying cause for each of the flagged regions will need further verification. All alignments are available for visualization at the Evolution Highway comparative chromosome browser (http://eh-demo.ncsa.illinois.edu/vgp/).

Annotation

The NCBI and Ensembl annotation pipelines used in this study are described in the Supplementary Methods.

Evaluation

Detailed methods for other types of evaluation, including BUSCO runs, mis-join and missed-join identification, reliable blocks, collapsed repeats, telomeres, RNA-seq and ATAC–seq mapping, and false gene duplications are in the Supplementary Methods. No statistical methods were used to predetermine sample size, the experiments were not randomized, and the investigators were not blinded to group during experiments and outcome assessment.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
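The length-dependent identity thresholds of the first contaminant screen (the megaBLAST search against common contaminants in the Contamination removal section) can be illustrated with a short filter. This is a minimal sketch, not the VGP curation code: the `Hit` record, its field names, and the function names are hypothetical, and it assumes the alignment hits have already been parsed into per-hit percent identity, match length, and e-value.

```python
from dataclasses import dataclass

# E-value ceiling quoted in the Methods (e <= 1 x 10^-4).
MAX_EVALUE = 1e-4


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one parsed alignment hit."""
    scaffold: str
    identity: float  # percent sequence identity, 0-100
    length: int      # match length in bp
    evalue: float


def min_identity_for_length(length):
    """Return the minimum percent identity required for a match of this
    length, or None if the match is too short to be reported."""
    if length >= 200:
        return 90.0
    if length >= 100:
        return 94.0
    if length >= 50:
        return 98.0
    return None


def is_reportable(hit):
    """Apply the e-value ceiling and the length-dependent identity floor."""
    threshold = min_identity_for_length(hit.length)
    if threshold is None:
        return False
    return hit.evalue <= MAX_EVALUE and hit.identity >= threshold


def filter_hits(hits):
    """Keep only the hits that pass the screen."""
    return [h for h in hits if is_reportable(h)]
```

Shorter matches need higher identity to be reported, which keeps short spurious alignments out of the candidate list. As the Methods state, hits that pass such a screen are still only candidates, and manual inspection decides whether they reflect contamination, conservation, or horizontal gene transfer.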
Data availability

All raw data, intermediate and final assemblies are publicly available via GenomeArk (https://vgp.github.io/genomeark), archived on NCBI/EBI BioProject under accession PRJNA489243 with annotations, and browsable on the UCSC Genome Browser (https://hgdownload.soe.ucsc.edu/hubs/VGP/). The final primary assembly from the automated pipeline before curation is browsable on gEVAL (https://vgp-geval.sanger.ac.uk) with all four raw data mappings. The VGP assembly pipeline is available as a stand-alone pipeline (https://github.com/VGP/vgp-assembly) as well as a workflow on DNAnexus (https://platform.dnanexus.com/). A VGP-specific assembly hub portal in the U.C. Santa Cruz browser is available as a gateway to access all VGP genome assemblies and annotations (https://hgdownload.soe.ucsc.edu/hubs/VGP).

Code availability

All code used in the VGP Assembly Pipeline and the VGP Trio Pipeline is publicly available at https://github.com/VGP/vgp-assembly/tree/master/pipeline.

References

1. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
2. Sulston, J. et al. The C. elegans genome sequencing project: a beginning. Nature 356, 37–41 (1992).
3. Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
4. Howe, K. et al. The zebrafish reference genome sequence and its relationship to the human genome. Nature 496, 498–503 (2013).
5. Genome 10K Community of Scientists. Genome 10K: a proposal to obtain whole-genome sequence for 10,000 vertebrate species. J. Hered. 100, 659–674 (2009).
6. Koepfli, K.-P., Paten, B., the Genome 10K Community of Scientists & O’Brien, S. J. The Genome 10K Project: a way forward. Annu. Rev. Anim. Biosci. 3, 57–111 (2015).
7. Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
8. Adams, M. D. et al. The genome sequence of Drosophila melanogaster. Science 287, 2185–2195 (2000).
9. Shendure, J. & Ji, H. Next-generation DNA sequencing. Nat. Biotechnol. 26, 1135–1145 (2008).
10. Yin, Z.-T. et al. Revisiting avian ‘missing’ genes from de novo assembled transcripts. BMC Genomics 20, 4 (2019).
11. Korlach, J. et al. De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads. Gigascience 6, 1–16 (2017).
12. Kelley, D. R. & Salzberg, S. L. Detection and correction of false segmental duplications caused by genome mis-assembly. Genome Biol. 11, R28 (2010).
13. Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19, 460 (2018).
14. Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898 (2020).
15. Bradnam, K. R. et al. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience 2, 10 (2013).
16. Zhang, G. et al. Comparative genomics reveals insights into avian genome evolution and adaptation. Science 346, 1311–1320 (2014).
17. Chin, C.-S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050–1054 (2016).
18. Bresler, G., Bresler, M. & Tse, D. Optimal assembly for high throughput shotgun sequencing. BMC Bioinformatics 14 (Suppl. 5), S18 (2013).
19. Warren, W. C. et al. The genome of a songbird. Nature 464, 757–762 (2010).
20. Koren, S. et al. De novo assembly of haplotype-resolved genomes with trio binning. Nat. Biotechnol. (2018).
21. Koren, S., Phillippy, A. M., Simpson, J. T., Loman, N. J. & Loose, M. Reply to ‘Errors in long-read assemblies can critically affect protein prediction’. Nat. Biotechnol. 37, 127–128 (2019).
22. Vollger, M. R. et al. Long-read sequence and assembly of segmental duplications. Nat. Methods 16, 88–94 (2019).
23. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
24. Waterhouse, R. M. et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol. Biol. Evol. 35, 543–548 (2018).
25. Howe, K. et al. Significantly improving the quality of genome assemblies through curation. Gigascience 10, giaa153 (2021).
26. Zhou, Y. et al. Platypus and echidna genomes reveal mammalian biology and evolution. Nature https://doi.org/10.1038/s41586-020-03039-0 (2021).
27. Kim, J. et al. False gene and chromosome losses affected by assembly and sequence errors. Preprint at https://doi.org/10.1101/2021.04.09.438906 (2021).
28. Lewin, H. A., Graves, J. A. M., Ryder, O. A., Graphodatsky, A. S. & O’Brien, S. J. Precision nomenclature for the new genomics. Gigascience 8, giz086 (2019).
29. Kronenberg, Z. N. et al. Extended haplotype phasing of de novo genome assemblies with FALCON-Phase. Nat. Commun. https://doi.org/10.1038/s41467-020-20536-y (2021).
30. Ewing, B., Hillier, L., Wendl, M. C. & Green, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998).
31. Tomaszkiewicz, M., Medvedev, P. & Makova, K. D. Y and W chromosome assemblies: approaches and discoveries. Trends Genet. 33, 266–282 (2017).
32. Kolesnikov, A. A. & Gerasimov, E. S. Diversity of mitochondrial genome organization. Biochem. (Mosc.) 77, 1424–1435 (2012).
33. Formenti, G. et al. Complete vertebrate mitogenomes reveal widespread repeats and gene duplications. Genome Biol. (in the press).
34. Harrison, G. L. A. et al. Four new avian mitochondrial genomes help get to basic evolutionary questions in the late cretaceous. Mol. Biol. Evol. 21, 974–983 (2004).
35. Zhao, H. et al. The complete mitochondrial genome of the Anabas testudineus (Perciformes, Anabantidae). Mitochondrial DNA A DNA Mapp. Seq. Anal. 27, 1005–1007 (2016).
36. Suzuki, A. et al. How the kinetochore couples microtubule force and centromere stretch to move chromosomes. Nat. Cell Biol. 18, 382–392 (2016).
37. Pfenning, A. R. et al. Convergent transcriptional specializations in the brains of humans and song-learning birds. Science 346, 1256846 (2014).
38. Robinson, R. For mammals, loss of yolk and gain of milk went hand in hand. PLoS Biol. 6, e77 (2008).
39. Brandl, K. et al. Yip1 domain family, member 6 (Yipf6) mutation induces spontaneous intestinal inflammation in mice. Proc. Natl Acad. Sci. USA 109, 12650–12655 (2012).
40. Malmstrøm, M. et al. Evolution of the immune system influences speciation rates in teleost fishes. Nat. Genet. 48, 1204–1210 (2016).
41. Japundžić-Žigon, N., Lozić, M., Šarenac, O. & Murphy, D. Vasopressin & oxytocin in control of the cardiovascular system: an updated review. Curr. Neuropharmacol. 18, 14–33 (2020).
42. Cataldo, I., Azhari, A. & Esposito, G. A review of oxytocin and arginine-vasopressin receptors and their modulation of autism spectrum disorder. Front. Mol. Neurosci. 11, 27 (2018).
43. Warren, W. C. et al. Genome analysis of the platypus reveals unique signatures of evolution. Nature 453, 175–183 (2008).
44. Ko, B. J. et al. Widespread false gene gains caused by duplication errors in genome assemblies. Preprint at https://doi.org/10.1101/2021.04.09.438957 (2021).
45. Lemaire, S. et al. Characterizing the interplay between gene nucleotide composition bias and splicing. Genome Biol. 20, 259 (2019).
46. Zhang, L., Kasif, S., Cantor, C. R. & Broude, N. E. GC/AT-content spikes as genomic punctuation marks. Proc. Natl Acad. Sci. USA 101, 16855–16860 (2004).
47. Jarvis, E. D. et al. Global view of the functional molecular organization of the avian cerebrum: mirror images and functional columns. J. Comp. Neurol. 521, 3614–3665 (2013).
48. Kubikova, L., Wada, K. & Jarvis, E. D. Dopamine receptors in a songbird brain. J. Comp. Neurol. 518, 741–769 (2010).
49. Sémon, M. & Wolfe, K. H. Rearrangement rate following the whole-genome duplication in teleosts. Mol. Biol. Evol. 24, 860–867 (2007).
50. Jebb, D. et al. Six reference-quality genomes reveal evolution of bat adaptations. Nature 583, 578–584 (2020).
51. Schneider, V. A. et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 27, 849–864 (2017).
52. Warren, W. C. et al. A new chicken genome assembly provides insight into avian genome structure. G3 (Bethesda) 7, 109–117 (2017).
53. Meredith, R. W. et al. Impacts of the Cretaceous Terrestrial Revolution and KPg extinction on mammal diversification. Science 334, 521–524 (2011).
54. Rodriguez-Agudo, D. et al. StarD5: an ER stress protein regulates plasma membrane and intracellular cholesterol homeostasis. J. Lipid Res. 60, 1087–1098 (2019).
55. Kim, J. et al. Reconstruction and evolutionary history of eutherian chromosomes. Proc. Natl Acad. Sci. USA 114, E5379–E5388 (2017).
56. Lin, B., Dutta, B. & Fraser, I. D. C. Systematic investigation of multi-TLR sensing identifies regulators of sustained gene activation in macrophages. Cell Syst. 5, 25–37.e3 (2017).
57. Theofanopoulou, C., Gedman, G. L., Cahill, J. A., Boeckx, C. & Jarvis, E. D. Universal nomenclature for oxytocin-vasotocin ligand and receptor families. Nature https://doi.org/10.1038/s41586-020-03040-7 (2021).
58. Ocampo Daza, D. & Haitina, T. Reconstruction of the carbohydrate 6-O sulfotransferase gene family evolution in vertebrates reveals novel member, CHST16, lost in amniotes. Genome Biol. Evol. 12, 993–1012 (2020).
59. Damas, J. et al. Broad host range of SARS-CoV-2 predicted by comparative and structural analysis of ACE2 in vertebrates. Proc. Natl Acad. Sci. USA 117, 22311–22322 (2020).
60. Dussex, N. et al. Population genomics reveals the impact of long-term small population size in the critically endangered kākāpō. Cell Genom. (in the press).
61. Teeling, E. C. et al. Bat biology, genomes, and the Bat1K project: to generate chromosome-level genomes for all living bat species. Annu. Rev. Anim. Biosci. 6, 23–46 (2018).
62. Lewin, H. A. et al. Earth BioGenome Project: sequencing life for the future of life. Proc. Natl Acad. Sci. USA 115, 4325–4333 (2018).
63. Jarvis, E. D. et al. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science 346, 1320–1331 (2014).
64. Li, S. et al. Genomic signatures of near-extinction and rebirth of the crested ibis and other endangered bird species. Genome Biol. 15, 557 (2014).
65.Koren,S.&Phillippy,A.M.Onechromosome,onecontig:completemicrobialgenomesfromlong-readsequencingandassembly.Curr.Opin.Microbiol.23,110–120(2015).CAS  PubMed  GoogleScholar  66.Jenjaroenpun,P.etal.Completegenomicandtranscriptionallandscapeanalysisusingthird-generationsequencing:acasestudyofSaccharomycescerevisiaeCEN.PK113-7D.NucleicAcidsRes.46,e38(2018).CAS  PubMed  PubMedCentral  GoogleScholar  67.Tyson,J.R.etal.MinION-basedlong-readsequencingandassemblyextendstheCaenorhabditiselegansreferencegenome.GenomeRes.28,266–274(2018).CAS  PubMed  PubMedCentral  GoogleScholar  68.Miga,K.H.etal.Telomere-to-telomereassemblyofacompletehumanXchromosome.Nature585,79–84(2020).CAS  PubMed  PubMedCentral  ADS  GoogleScholar  69.Logsdon,G.A.etal.Thestructure,functionandevolutionofacompletehumanchromosome8.Naturehttps://doi.org/10.1038/s41586-021-03420-7(2021).70.Beçak,M.L.,Beçak,W.,Roberts,F.L.,Shoffner,R.N.&Volpe,P.(eds.)ChromosomeAtlas:Fish,Amphibians,Reptiles,andBirdsVol.2(Springer,1973).71.Vurture,G.W.etal.GenomeScope:fastreference-freegenomeprofilingfromshortreads.Bioinformatics33,2202–2204(2017).CAS  PubMed  PubMedCentral  GoogleScholar  72.Kumar,S.,Stecher,G.,Suleski,M.&Hedges,S.B.TimeTree:aresourcefortimelines,timetrees,anddivergencetimes.Mol.Biol.Evol.34,1812–1819(2017).CAS  PubMed  GoogleScholar  73.Ondov,B.D.etal.Mash:fastgenomeandmetagenomedistanceestimationusingMinHash.GenomeBiol.17,132(2016).PubMed  PubMedCentral  GoogleScholar  74.Ning,Z.&Harry,E.Scaff10Xhttps://github.com/wtsi-hpag/Scaff10X.75.Morgulis,A.,Gertz,E.M.,Schäffer,A.A.&Agarwala,R.WindowMasker:window-basedmaskerforsequencedgenomes.Bioinformatics22,134–141(2006).CAS  PubMed  GoogleScholar  76.Chin,C.-S.etal.Nonhybrid,finishedmicrobialgenomeassembliesfromlong-readSMRTsequencingdata.Nat.Methods10,563–569(2013).CAS  PubMed  GoogleScholar  77.Koren,S.etal.Canu:scalableandaccuratelong-readassemblyviaadaptivek-merweightingandrepeatseparation.GenomeRes.27,722–736(2017).CAS  PubMed  PubMedCentral  GoogleScholar  
78.Weisenfeld,N.I.,Kumar,V.,Shah,P.,Church,D.M.&Jaffe,D.B.Directdeterminationofdiploidgenomesequences.GenomeRes.27,757–767(2017).CAS  PubMed  PubMedCentral  GoogleScholar  79.Ghurye,J.etal.IntegratingHi-Clinkswithassemblygraphsforchromosome-scaleassembly.PLoSComput.Biol.15,e1007273(2019).CAS  PubMed  PubMedCentral  GoogleScholar  80.Lieberman-Aiden,E.etal.Comprehensivemappingoflong-rangeinteractionsrevealsfoldingprinciplesofthehumangenome.Science326,289–293(2009).CAS  PubMed  PubMedCentral  ADS  GoogleScholar  81.Luo,R.etal.SOAPdenovo2:anempiricallyimprovedmemory-efficientshort-readdenovoassembler.Gigascience1,18(2012).PubMed  PubMedCentral  GoogleScholar  82.English,A.C.etal.Mindthegap:upgradinggenomeswithPacificBiosciencesRSlong-readsequencingtechnology.PLoSONE7,e47768(2012).CAS  PubMed  PubMedCentral  ADS  GoogleScholar  83.Bishara,A.etal.Readcloudsuncovervariationincomplexregionsofthehumangenome.GenomeRes.25,1570–1580(2015).CAS  PubMed  PubMedCentral  GoogleScholar  84.Walker,B.J.etal.Pilon:anintegratedtoolforcomprehensivemicrobialvariantdetectionandgenomeassemblyimprovement.PLoSONE9,e112963(2014).PubMed  PubMedCentral  ADS  GoogleScholar  85.Garrison,E.&Marth,G.Haplotype-basedvariantdetectionfromshort-readsequencing.Preprintathttp://arxiv.org/abs/1207.3907(2012).86.Jain,C.,Koren,S.,Dilthey,A.,Phillippy,A.M.&Aluru,S.Afastadaptivealgorithmforcomputingwhole-genomehomologymaps.Bioinformatics34,i748–i756(2018).CAS  PubMed  PubMedCentral  GoogleScholar  87.BionanoGenomics,Inc.BionanoSoftwareDownloads.https://bionanogenomics.com/support/software-downloads/.88.ArimaGenomics,Inc.ArimaGenomicsMappingPipeline.https://github.com/ArimaGenomics/mapping_pipeline.89.Li,H.&Durbin,R.FastandaccurateshortreadalignmentwithBurrows–Wheelertransform.Bioinformatics25,1754–1760(2009).CAS  PubMed  PubMedCentral  GoogleScholar  90.Li,H.Minimap2:pairwisealignmentfornucleotidesequences.Bioinformatics34,3094–3100(2018).CAS  PubMed  PubMedCentral  GoogleScholar  
91.Chaisson,M.J.&Tesler,G.Mappingsinglemoleculesequencingreadsusingbasiclocalalignmentwithsuccessiverefinement(BLASR):applicationandtheory.BMCBioinformatics13,238(2012).CAS  PubMed  PubMedCentral  GoogleScholar  92.Li,H.etal.Thesequencealignment/mapformatandSAMtools.Bioinformatics25,2078–2079(2009).PubMed  PubMedCentral  GoogleScholar  93.Dierckxsens,N.,Mardulyn,P.&Smits,G.NOVOPlasty:denovoassemblyoforganellegenomesfromwholegenomedata.NucleicAcidsRes.45,e18(2017).PubMed  GoogleScholar  94.Soorni,A.,Haak,D.,Zaitlin,D.&Bombarely,A.Organelle_PBA,apipelineforassemblingchloroplastandmitochondrialgenomesfromPacBioDNAsequencingdata.BMCGenomics18,49(2017).PubMed  PubMedCentral  GoogleScholar  95.Chow,W.etal.gEVAL — aweb-basedbrowserforevaluatinggenomeassemblies.Bioinformatics32,2508–2510(2016).CAS  PubMed  PubMedCentral  GoogleScholar  96.Durand,N.C.etal.JuiceboxprovidesavisualizationsystemforHi-Ccontactmapswithunlimitedzoom.CellSyst.3,99–101(2016).CAS  PubMed  PubMedCentral  GoogleScholar  97.Kerpedjiev,P.etal.HiGlass:web-basedvisualexplorationandanalysisofgenomeinteractionmaps.GenomeBiol.19,125(2018).PubMed  PubMedCentral  GoogleScholar  98.Camacho,C.etal.BLAST+:architectureandapplications.BMCBioinformatics10,421(2009).PubMed  PubMedCentral  GoogleScholar  99.Harris,R.S.ImprovedPairwiseAlignmentofGenomicDNA. 
Thesis,PennsylvaniaStateUniv.(2007).100.Kent,W.J.,Baertsch,R.,Hinrichs,A.,Miller,W.&Haussler,D.Evolution’scauldron:duplication,deletion,andrearrangementinthemouseandhumangenomes.Proc.NatlAcad.Sci.USA100,11484–11489(2003).CAS  PubMed  ADS  GoogleScholar  101.Kolmogorov,M.,Raney,B.,Paten,B.&Pham,S.Ragout—areference-assistedassemblytoolforbacterialgenomes.Bioinformatics30,i302–i309(2014).CAS  PubMed  PubMedCentral  GoogleScholar  102.Farré,M.etal.Novelinsightsintochromosomeevolutioninbirds,archosaurs,andreptiles.GenomeBiol.Evol.8,2442–2451(2016).PubMed  PubMedCentral  GoogleScholar  103.Guan,D.Asset.https://github.com/dfguan/asset.104.Tarailo-Graovac,M.&Chen,N.UsingRepeatMaskertoidentifyrepetitiveelementsingenomicsequences.Curr.Protoc.Bioinformatics 25,4.10.1–4.10.14(2009). GoogleScholar  105.Krumsiek,J.,Arnold,R.&Rattei,T.Gepard:arapidandsensitivetoolforcreatingdotplotsongenomescale.Bioinformatics23,1026–1028(2007).CAS  PubMed  GoogleScholar  106.Harry,E.PretextView.https://github.com/wtsi-hpag/PretextView.107.Kurtz,S.etal.Versatileandopensoftwareforcomparinglargegenomes.GenomeBiol.5,R12(2004).PubMed  PubMedCentral  GoogleScholar  
Acknowledgements

We thank the following persons for feedback and support: R. Johnson, E. Karlsson, K. Lindblad Toh, W. Jun, I. Korf, W. Haerty, G. Etherington, B. Clavijo, and A. Komissarov for discussions in the early stages of the project; R. Fuller for help with the G10K website maintenance, and H. Segal for help with VGP website development; M. Linh Pham for help with initial grant writing; L. Shalmiyev for administrative help; D. Church, G. Kol, K. Baruch, O. Barad, I. Liachko, E. Muzychenko, S. Garg, and M. Kolmogorov for preliminary analyses performed on one or more genomes; K. Oliver, C. Corton and J. Skelton for data generation; E. Harry for technical support in scaff10x and Pretext; C. Mazzoni for coordinating students and training at Leibniz Institute for Zoo and Wildlife Research and Berlin Center for Genomics in Biodiversity Research; and M. Driller, C. Caswara, M. Vafadar, N. Hill, D. De Panis, A. Whibley, B. Maloney, C. Mitchell, G. Gallo, J. Gaige, K. Amoako-Boadu, M. Jose Gomez, M. Montero, D. Ratnikov, S. Brown, S. Zylka, S. Marcus, and T. Carrasco for completing training and testing the VGP pipeline by producing ordinal representative genome assemblies not described in this manuscript. We thank our company partners (listed below), NCBI, EBI, and Amazon AWS, including AWS for sponsoring sequence storage. J. Fekecs and D. Leja created the animal images, and J. Kim modified them to silhouettes. We thank them for their permission to publish.

A.R., S.K., B.P.W. and A.M.P. were supported by the Intramural Research Program of the NHGRI, NIH (1ZIAHG200398). A.R. was also supported by the Korea Health Technology R&D Project through KHIDI, funded by the Ministry of Health & Welfare, Republic of Korea (HI17C2098). S.A.M., I.B. and R.D. were supported by Wellcome Trust grant WT207492; W.C., M. Smith, Z.N., Y.S., J.C., S. Pelan, J.T., A.T., J.W. and Kerstin Howe by WT206194; L.H., F.M., Kevin Howe and P. Flicek by WT108749/Z/15/Z, WT218328/B/19/Z and the European Molecular Biology Laboratory. O.F. and E.D.J. were supported by Howard Hughes Medical Institute and Rockefeller University start-up funds for this project. J.D. and H.A.L. were supported by the Robert and Rosabel Osborne Endowment. M.U.-S. received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement (750747). F.T.-N., J. Hoffman, P. Masterson and K.C. were supported by the Intramural Research Program of the NLM, NIH. C.L., B.J.K., J. Kim and H.K. were supported by the Marine Biotechnology Program of KIMST, funded by the Ministry of Ocean and Fisheries, Republic of Korea (20180430). M.C. was supported by Sloan Research Fellowship (FG-2020-12932). S.C.V. was funded by a Max Planck Research Group award from the Max Planck Society, and a Human Frontiers Science Program (HFSP) Research grant (RGP0058/2016). T.M.L., W.E.J. and the Canada lynx genome were funded by the Maine Department of Inland Fisheries & Wildlife (F11AF01099), including when W.E.J. held a National Research Council Research Associateship Award at the Walter Reed Army Institute of Research (WRAIR). C.B. was supported by the NSF (1457541 and 1456612). D.B. was funded by The University of Queensland (HFSP-RGP0030/2015). D.I. was supported by Science Exchange Inc. (Palo Alto, CA). H.W.D. was supported by NSF grants (OPP-0132032 ICEFISH 2004 Cruise, PLR-1444167 and OPP-1955368) and the Marine Science Center at Northeastern University (416). G.J.P.N. and the thorny skate genome were funded by Lenfest Ocean Program (30884). M.P. was funded by the German Federal Ministry of Education and Research (01IS18026C). M. Malinsky was supported by an EMBO fellowship (ALTF456-2016). The following authors’ contributions were supported by the NIH: S. Selvaraj (R44HG008118); C.V.M., S.R.F., P.V.L. (R21DC014432/DC/NIDCD); K.D.M. (R01GM130691); H.C. (5U41HG002371-19); M.D. (U41HG007234); and B.P. (R01HG010485). D.G. was supported by the National Key Research and Development Program of China (2017YFC1201201, 2018YFC0910504 and 2017YFC0907503). F.O.A. was supported by Al-Gannas Qatari Society and The Cultural Village Foundation-Katara, Doha, State of Qatar and Monash University Malaysia. C.T. was supported by The Rockefeller University. M. Hiller was supported by the LOEWE-Centre for Translational Biodiversity Genomics (TBG) funded by the Hessen State Ministry of Higher Education, Research and the Arts (HMWK). H.C. was supported by the NHGRI (5U41HG002371-19). R.H.S.K. was funded by the Max Planck Society with computational resources at the bwUniCluster and BinAC funded by the Ministry of Science, Research and the Arts Baden-Württemberg and the Universities of the State of Baden-Württemberg, Germany (bwHPC-C5). B.V. was supported by the Biomedical Research Council of A*STAR, Singapore. T.M.-B. was funded by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (864203), MINECO/FEDER, UE (BFU2017-86471-P), Unidad de Excelencia María de Maeztu, AEI (CEX2018-000792-M), a Howard Hughes International Early Career award, Obra Social “La Caixa” and Secretaria d’Universitats i Recerca and CERCA Programme del Departament d’Economia i Coneixement de la Generalitat de Catalunya (GRC2017SGR880). E.C.T. was supported by the European Research Council (ERC-2012-StG311000) and an Irish Research Council Laureate Award. M.T.P.G. was supported by an ERC Consolidator Award 681396-Extinction Genomics, and a Danish National Research Foundation Center Grant (DNRF143). T.W. was supported by the NSF (1458652). J.M. Graves was supported by the Australian Research Council (CEO561477). E.W.M. was partially supported by the German Federal Ministry of Education and Research (01IS18026C). Complementary sequencing support for the Anna’s hummingbird and several genomes was provided by Pacific Biosciences, Bionano Genomics, Dovetail Genomics, Arima Genomics, Phase Genomics, 10X Genomics, NRGene, Oxford Nanopore Technologies, Illumina, and DNAnexus. All other sequencing and assembly were conducted at the Rockefeller University, Sanger Institute, and Max Planck Institute Dresden genome labs. Part of this work used the computational resources of the NIH HPC Biowulf cluster (https://hpc.nih.gov). We acknowledge funding from the Wellcome Trust (108749/Z/15/Z) and the European Molecular Biology Laboratory. We thank Le Comité Scientifique Régional du Patrimoine Naturel and Direction de l’Environnement, de l’Aménagement et du Logement, Guyanne for research approvals and export permits.

Author information

Author notes

These authors contributed equally: Arang Rhie, Shane A. McCarthy, Olivier Fedrigo

These authors jointly supervised this work: Kerstin Howe, Eugene W. Myers, Richard Durbin, Adam M. Phillippy, Erich D. Jarvis

Affiliations

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA: Arang Rhie, Sergey Koren, Brian P. Walenz & Adam M. Phillippy
Department of Genetics, University of Cambridge, Cambridge, UK: Shane A. McCarthy, Iliana Bista, Dengfeng Guan & Richard Durbin
Wellcome Sanger Institute, Cambridge, UK: Shane A. McCarthy, William Chow, Iliana Bista, Michelle Smith, Milan Malinsky, Zemin Ning, Ying Sims, Joanna Collins, Sarah Pelan, James Torrance, Alan Tracey, Jonathan Wood, Kerstin Howe & Richard Durbin
Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA: Olivier Fedrigo, Giulio Formenti, Bettina Haase, Jacquelyn Mountcastle, Sadye Paez & Erich D. Jarvis
The Genome Center, University of California Davis, Davis, CA, USA: Joana Damas & Harris A. Lewin
Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA: Giulio Formenti, Gregory L. Gedman, Lindsey J. Cantin, Sadye Paez, Matthew T. Biegler, Constantina Theofanopoulou & Erich D. Jarvis
Leibniz Institute for Zoo and Wildlife Research, Department of Evolutionary Genetics, Berlin, Germany: Marcela Uliano-Silva
Berlin Center for Genomics in Biodiversity Research, Berlin, Germany: Marcela Uliano-Silva
DNAnexus Inc., Mountain View, CA, USA: Arkarachai Fungtammasan, Maria Simbirsky & Brett T. Hannigan
Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea: Juwan Kim, Chul Lee & Heebal Kim
Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea: Byung June Ko & Heebal Kim
University of Southern California, Los Angeles, CA, USA: Mark Chaisson & Robel E. Dagnew
National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA: Francoise Thibaud-Nissen, Jinna Hoffman, Patrick Masterson & Karen Clark
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK: Leanne Haggerty, Fergal Martin, Kevin Howe & Paul Flicek
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany: Sylke Winkler, Martin Pippel, Ekaterina Osipova &
Eugene W. Myers
DRESDEN-concept Genome Center, Dresden, Germany: Sylke Winkler
Novogene, Durham, NC, USA: Jason Howard
Neurogenetics of Vocal Communication Group, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands: Sonja C. Vernes
Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands: Sonja C. Vernes
School of Biology, University of St Andrews, St Andrews, UK: Sonja C. Vernes
University of Massachusetts Cooperative Fish and Wildlife Research Unit, Amherst, MA, USA: Tanya M. Lama
School of Biological Science, The Environment Institute, University of Adelaide, Adelaide, South Australia, Australia: Frank Grutzner
Bond Life Sciences Center, University of Missouri, Columbia, MO, USA: Wesley C. Warren
Department of Biology, East Carolina University, Greenville, NC, USA: Christopher N. Balakrishnan
UQ Genomics, University of Queensland, Brisbane, Queensland, Australia: Dave Burt
Department of Biological Sciences, Clemson University, Clemson, SC, USA: Julia M. George
The Genetic Rescue Foundation, Wellington, New Zealand: David Iorns
Kākāpō Recovery, Department of Conservation, Invercargill, New Zealand: Andrew Digby & Daryl Eason
Department of Zoology, University of Otago, Dunedin, New Zealand: Bruce Robertson
University of Arizona Genetics Core, Tucson, AZ, USA: Taylor Edwards
Department of Life Sciences, Natural History Museum, London, UK: Mark Wilkinson
School of Natural Sciences, Bangor University, Gwynedd, UK: George Turner
Department of Biology, University of Konstanz, Konstanz, Germany: Axel Meyer, Andreas F. Kautt, Paolo Franchini & Robert H. S. Kraus
Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA: Andreas F. Kautt
Department of Marine and Environmental Sciences, Northeastern University Marine Science Center, Nahant, MA, USA: H. William Detrich III
Department of Biology, University of Antwerp, Antwerp, Belgium: Hannes Svardal
Naturalis Biodiversity Center, Leiden, The Netherlands: Hannes Svardal
Institute of Biology, Karl-Franzens University of Graz, Graz, Austria: Maximilian Wagner
Florida Museum of Natural History, University of Florida, Gainesville, FL, USA: Gavin J. P. Naylor
Center for Systems Biology, Dresden, Germany: Martin Pippel, Ekaterina Osipova &
Eugene W. Myers
Zoological Institute, University of Basel, Basel, Switzerland: Milan Malinsky
Tag.bio, San Francisco, CA, USA: Mark Mooney
UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA: Trevor Pesout, Richard E. Green, Erik Garrison, Hiram Clawson, Mark Diekhans, Luis Nassar, Benedict Paten & David Haussler
San Diego Zoo Global, Escondido, CA, USA: Marlys Houck, Ann Misuraca & Oliver A. Ryder
Pacific Biosciences, Menlo Park, CA, USA: Sarah B. Kingan, Richard Hall, Zev Kronenberg, Ivan Sović, Christopher Dunn & Jonas Korlach
Digital BioLogic, Ivanić-Grad, Croatia: Ivan Sović
Bionano Genomics, San Diego, CA, USA: Alex Hastie & Joyce Lee
Arima Genomics, San Diego, CA, USA: Siddarth Selvaraj
Dovetail Genomics, Santa Cruz, CA, USA: Richard E. Green & Jay Ghurye
Independent Researcher, Santa Cruz, CA, USA: Nicholas H. Putnam
CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain: Ivo Gut
Universitat Pompeu Fabra, Barcelona, Spain: Ivo Gut
Department of Computer Science, University of Maryland College Park, College Park, MD, USA: Jay Ghurye
School of Computer Science and Technology, Center for Bioinformatics, Harbin Institute of Technology, Harbin, China: Dengfeng Guan
Department of Psychology, Institute for Mind and Biology, University of Chicago, Chicago, IL, USA: Sarah E. London
Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA: David F. Clayton
Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA: Claudio V. Mello, Samantha R. Friedrich & Peter V. Lovell
Max Planck Institute for the Physics of Complex Systems, Dresden, Germany: Ekaterina Osipova
Monash University Malaysia Genomics Facility, School of Science, Selangor Darul Ehsan, Malaysia: Farooq O. Al-Ajli
Tropical Medicine and Biology Multidisciplinary Platform, Monash University Malaysia, Selangor Darul Ehsan, Malaysia: Farooq O. Al-Ajli
Qatar Falcon Genome Project, Doha, Qatar: Farooq O. Al-Ajli
Department of Biosciences, University of Milan, Milan, Italy: Simona Secomandi
eGnome, Inc., Seoul, Republic of Korea: Heebal Kim &
Woori Kwak
LOEWE Centre for Translational Biodiversity Genomics, Frankfurt, Germany: Michael Hiller
Senckenberg Research Institute, Frankfurt, Germany: Michael Hiller
Goethe-University, Faculty of Biosciences, Frankfurt, Germany: Michael Hiller
BGI-Shenzhen, Shenzhen, China: Yang Zhou
Department of Biology, Pennsylvania State University, University Park, PA, USA: Robert S. Harris & Kateryna D. Makova
Center for Medical Genomics, Pennsylvania State University, University Park, PA, USA: Kateryna D. Makova & Paul Medvedev
Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA: Kateryna D. Makova & Paul Medvedev
Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA: Paul Medvedev
Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA: Paul Medvedev
Hoonygen, Seoul, Korea: Woori Kwak
Department of Migration, Max Planck Institute of Animal Behavior, Radolfzell, Germany: Robert H. S. Kraus
Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia: Andrew J. Crawford
Center for Evolutionary Hologenomics, The GLOBE Institute, University of Copenhagen, Copenhagen, Denmark: M. Thomas P. Gilbert
University Museum, NTNU, Trondheim, Norway: M. Thomas P. Gilbert
China National Genebank, BGI-Shenzhen, Shenzhen, China: Guojie Zhang
Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark: Guojie Zhang
State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China: Guojie Zhang
Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China: Guojie Zhang
Institute of Molecular and Cell Biology, A*STAR, Biopolis, Singapore, Singapore: Byrappa Venkatesh
Centre for Biodiversity, Royal Ontario Museum, Toronto, Ontario, Canada: Robert W. Murphy
Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, Washington, DC, USA: Klaus-Peter Koepfli & Warren E. Johnson
Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA: Beth Shapiro &
David Haussler
Howard Hughes Medical Institute, Chevy Chase, MD, USA: Beth Shapiro & Erich D. Jarvis
The Walter Reed Biosystematics Unit, Museum Support Center MRC-534, Smithsonian Institution, Suitland, MD, USA: Warren E. Johnson
Walter Reed Army Institute of Research, Silver Spring, MD, USA: Warren E. Johnson
Department of Biological Sciences, Earlham Institute, University of East Anglia, Norwich, UK: Federica Di Palma
Institute of Evolutionary Biology (UPF-CSIC), PRBB, Barcelona, Spain: Tomas Marques-Bonet
Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain: Tomas Marques-Bonet
Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain: Tomas Marques-Bonet
Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain: Tomas Marques-Bonet
School of Biology and Environmental Science, University College Dublin, Dublin, Ireland: Emma C. Teeling
Department of Computer Science, The University of Illinois at Urbana-Champaign, Urbana, IL, USA: Tandy Warnow
School of Life Science, La Trobe University, Melbourne, Victoria, Australia: Jennifer Marshall Graves
Department of Evolution, Behavior, and Ecology, University of California San Diego, La Jolla, CA, USA: Oliver A. Ryder
Laboratory of Genomics Diversity-Center for Computer Technologies, ITMO University, St. Petersburg, Russian Federation: Stephen J. O’Brien
Guy Harvey Oceanographic Center, Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Fort Lauderdale, FL, USA: Stephen J. O’Brien
Department of Evolution and Ecology, University of California Davis, Davis, CA, USA: Harris A. Lewin
John Muir Institute for the Environment, University of California Davis, Davis, CA, USA: Harris A. Lewin
Faculty of Computer Science, Technical University Dresden, Dresden, Germany: Eugene W. Myers
Contributions

Wrote the paper and co-coordinated the study: A.R., E.D.J., A.M.P., R.D., E.W.M., Kerstin Howe, S.A.M., O.F.
Coordination with vendors: J. Korlach, S. Selvaraj, R.E.G., A.H., M. Mooney.
Collected samples: M.T.P.G., W.E.J., R.W.M., G.Z., B.V., M.T.B., J. Howard, S.C.V., T.M.L., F.G., W.C.W., D.B., J.M. George, M.T.B., D.I., A.D., D.E., B.R., T.E., M. Wilkinson, G.T., A. Meyer, A.F.K., P. Franchini, H.W.D., H.S., M. Wagner, G.J.P.N., R.D., E.D.J., E.C.T., R.H.S.K.
Generated genome data: O.F., I.B., M. Smith, B.H., J.M., S.W., C.B., A. Meyer, A.F.K., P. Franchini, I.G., D.F.C., C.V.M.
Generated genome assemblies: A.R., S.A.M., S.K., M.P., S.B.K., R.H., J.G., Z.N., J.L., B.P.W., M. Malinsky.
Generated/modified software: S.K., A.R., S.B.K., R.H., Z.K., J. Korlach, I.S., C.D., Z.N., A.H., J.L., J.G., E.G., C.V.M., S.R.F., N.H.P.
Pipeline development: A.R., S.A.M., G.F., S.K., M.U.-S., A.F., M. Simbirsky, B.T.H., T.P., M.P., E.W.M., R.D., A.M.P.
Generated MT assemblies: G.F., J. Korlach.
Curation: Kerstin Howe, W.C., Y.S., J.C., S. Pelan, J.T., A.T., J.W., Y.Z., J.D., H.A.L.
Sex chromosomes: Y.Z., R.S.H., K.D.M., P. Medvedev, J.M. Graves.
Hummingbird karyotype analyses: M. Houck, A. Misuraca, M.P., E.W.M., E.D.J.
Annotation: F.T.-N., L.H., J. Hoffman, P. Masterson, K.C., F.M., Kevin Howe, P. Flicek, D.B.
Evaluation analysis: A.R., J.D., M.U.-S., J. Kim, C.L., B.J.K., M.C., G.L.G., L.J.C., F.T.-N., L.H., J.M. George, J.G., R.E.D., D.G., S.E.L., D.F.C., C.V.M., S.R.F., P.V.L., E.O., F.O.A.-A., S. Secomandi, C.T., M. Hiller, H.K., Kerstin Howe, E.W.M., R.D., A.M.P., E.D.J.
Biological findings: J.D., J. Kim, C.L., B.J.K., G.L.G., L.J.C., H.A.L., A.R., E.D.J.
Data availability: A.R., S.A.M., W.C., A.F., S. Paez, M. Simbirsky, B.T.H., B.P.W., W.K., H.C., M.D., L.N., B.P., A.M.P., E.D.J.
G10K council, founders, and coordination of VGP: T.M.-B., A.J.C., F.D.P., R.D., M.T.P.G., E.D.J., K.-P.K., H.A.L., R.W.M., E.W.M., E.C.T., B.V., G.Z., A.M.P., S. Paez, J.M. Graves, O.A.R., D.H., S.J.O., T.W. and B.S.
All authors reviewed the manuscript.

Corresponding authors

Correspondence to Kerstin Howe, Eugene W. Myers, Richard Durbin, Adam M. Phillippy or Erich D. Jarvis.

Ethics declarations

Competing interests

During the contributing period, B.T.H., M. Simbirsky, A.F. and M. Mooney were employees of DNAnexus Inc. S.B.K., R.H., Z.K., J. Korlach, I.S. and C.D. were full-time employees at Pacific Biosciences, a company developing single-molecule long read sequencing technologies. R.E.G., N.H.P., and J.G. were affiliated with Dovetail Genomics, a company developing genome assembly tools, including Hi-C. I.G. was affiliated with Oxford Nanopore Technologies, a company generating long read sequencing technologies. A.H. and J.L. were employees of Bionano Genomics, a company developing optical maps for genome assembly. S. Selvaraj was an employee of Arima Genomics, a company developing Hi-C data for genome assemblies. R.D. is a scientific advisory board member of Dovetail Inc. P. Flicek is a member of the Scientific Advisory Boards of Fabric Genomics, Inc., and Eagle Genomics, Ltd. H.C. receives royalties from the sale of UCSC Genome Browser source code, LiftOver, GBiB, and GBiC licenses to commercial entities. S.K. has received travel funds to speak at symposia organized by Oxford Nanopore. M.D. and L.N. receive royalties from licensing of UCSC Genome Browser. For W.E.J., the content here is not to be construed as the views of the DA or DOD. All other authors declare no competi
nginterests. AdditionalinformationPeerreviewinformationNaturethanksMichaelSchatz,JustinZookandtheother,anonymous,reviewer(s)fortheircontributiontothepeerreviewofthiswork.Peerreviewerreportsareavailable.Publisher’snoteSpringerNatureremainsneutralwithregardtojurisdictionalclaimsinpublishedmapsandinstitutionalaffiliations.ExtendeddatafiguresandtablesExtendedDataFig.1AssessmentofcompletenessoftheAnna’shummingbirdassembly.a,b,StepsandNG50continuityvaluesoftheVGPassemblypipelinethatgavethehighestqualityassemblyforAnna’shummingbird(a)andCanadalynx(b)inthisstudy.ThespecificstepsareoutlinedfurtherinExtendedDataFig.2a,andMethods.c,Whole-genomealignmentofCLR(red),linkedreads(green),opticalmaps(blue),andHi-Creads(purple)oftheAnna’shummingbird,alongwithtelomeremotif(TTAGGGanditsreversecomplement,yellow)andgaps(grey)usingAssetsoftware103.Foreachdatatype,thefirstrowshowsthemappedcoverage,andthesecondshowsthenumberofcountsoflowcoverageorsignsofcollapsedrepeats.Largerchromosomalscaffolds(1–19)havefewergapsandlowcoverageorcollapsedregionscomparedwiththemicrochromosomes(20–33).Chromosomes14,15and19oftheAnna’shummingbirdwerethemoststructurallyreliablescaffolds,havingonlyonegapeachwithnolow-supportregions.Wedefinedreliableblocksasthosesupportedbyatleasttwotechnologies.Reliableblocksexcludedregionswithstructuralassemblyerrors,suchascollapsedrepeatsorunresolvedsegmentalduplications.Low-supportregionsarethosewherethereliableblocksrowhasapeak.ExtendedDataFig.2VGPassemblypipelineappliedacrossmultiplespecies.a,Iterativeassemblypipelineofsequencedatatypes(colouredasinb)withincreasingchromosomaldistance.Thinbars,sequencereads;thickblackbars,assembledcontigs;blackbarswithspaceandarcinglinks,scaffolds;greybars,gapsplacedbyprevioussteps;thickredborder,trackingofanexamplecontiginthepipeline.Thecurationstepshowsanexampleofamis-assemblybreakidentifiedbysequencecoverage(grey,left)andanexampleofaninversionerror(right)detectedbytheopticalmap.b,Intra-moleculelengthdistributionofthefourdatatypesusedtogene
ratetheassembliesof16vertebratespecies,weightedbythefractionofbasesineachlengthbin(logscaled).Moleculelengthabove1 kbwasmeasuredfromreadlengthforCLR,estimatedmoleculecoverageforlinkedreads,rawmoleculelengthforopticalmaps,andinteractiondistanceforHi-Creads.Foreachspecies,thefragmentlengthdistributionofeachdatatypewassimilartothosefortheAnna’shummingbird,withdifferencesprimarilyinfluencedbytissuetype,preservationmethod,andcollectionorstorageconditions(unpublisheddata).ExtendedDataFig.3Flowchartsofassemblypipelinesusedtogeneratehigh-qualityassembliesinthisstudy.a,StandardVGPassemblypipelinewhensequencingdataofoneindividual,thatgeneratedthehighestqualityassemblies:generateprimarypseudo-haplotypeandalternatehaplotypecontigswithCLRusingFALCON-Unzip17;generatescaffoldswithlinkedreadsusingScaff10x74;breakmis-joinsandfurtherscaffoldwithopticalmapsusingSolve87;generatechromosome-scalescaffoldswithHi-CreadsusingSalsa279;fillingapsandpolishbase-errorswithCLRusingArrow(PacificBioSciences);performtwoormoreroundsofshort-readpolishingwithlinkedreadsusingFreeBayes85;andperformexpertmanualcurationtocorrectpotentialassemblyerrorsusinggEVAL25,95b,StandardVGPtrioassemblypipelinewhenDNAisavailableforachildandparents20.Dashedlineindicatesthattheotherhaplotypewentthroughthesamestepsbeforecuration.Inadditiontothecuratedassembliesofbothhaplotypes,arepresentativehaplotypewithbothsexchromosomesissubmitted.c,Mitochondrialassemblypipeline.Figurekeyappliestoa–c.Stepsnewlyintroducedinv1.5–v1.6arehighlightedinlightblue.c,contigs;p,purgedfalseduplicationsfromprimarycontigs;q,purgedalternatecontigs;s,scaffolds;t,polishedscaffolds.Furtherdetailsandinstructionsareavailableelsewhere33andathttps://github.com/VGP/vgp-assembly.ExtendedDataFig.4Relationshipbetweencollapsesandgenomiccharacteristics.a,Correlationbetweenthetotalnumberofcollapsesandpercentagerepeatcontentestimatedinthesubmittedcuratedversionsofn=17genomesfrom16species.b,CorrelationbetweentotalnumberofbasesincollapsedregionsperGbandrepeatcontent
.c,CorrelationbetweentotalmissingbasescollapsedperGbandrepeatcontent.d,Correlationbetweentotalnumberofgenes(codingandnon-coding)inthecollapsedregionsandrepeatcontent.e,Lackofcorrelationbetweentheaveragecollapsedsizeandrepeatcontent.f,Lackofcorrelationbetweenthetotalnumberofcollapsesandpercentageheterozygosity.g,LackofcorrelationbetweenthetotalnumberofcollapsesperGbandgenomesize.Genomesize,heterozygosity,andrepeatcontentwereestimatedfrom31-mercountsusingGenomeScope71.Reportedareadjustedr2andPvaluesfromF-statistics.h,CumulativecollapsedbasesperGbineachcollapseandpercentagerepeatmasked.Eachcircleiscolouredbyspecieswithitssizerelativetothelengthofthe collapseasitappearsintheassembly.Collapsesabovethehorizontalbar(>90%)arefurtherclassifiedascollapsedhigh-copyrepeats,andthosebelowthehorizontalbarareclassifiedassegmentalduplications(low-copyrepeats).i,Majorrepeattypesincollapsedhigh-copyrepeats.MostoftherepeatsweremaskedonlywithWindowMasker75,withnoannotationavailablebyRepeatMasker104.j,Minorrepeattypesincollapsedrepeats.Thisisabreakdownoftherepeatscategorizedas‘Others’ini,owingtothesmallerscale.Barcoloursiniandjareasinh.Notesmallerscaleinjcomparedwithi.Collapsedsatellitearrayswerealmostexclusivelyfoundintheplatypus,comprising~2.5Mb.Collapsedsimplerepeatswerethemajorsourceinthethornyskate(~400kb).TherewasahigherproportionofLTRsinbirds,LINEsandSINEsinmammals,andDNArepeatsintheamphibian.Amongthegenesinthecollapses,manywererepetitiveshortnon-codingRNAs.PvaluesfromF-statistics.ExtendedDataFig.5Falseduplicationmechanismsingenomeassembly.a,Falseheterotype(haplotype)duplicationsoccurswhenmoredivergentsequencereadsfromeachhaplotypeA(blue)andB(red)(maternalandpaternal)formgreaterdivergentpathsintheassemblygraph(bubbles),whilenearlyidenticalhomozygoussequences(black)becomecollapsed.Whentheassemblygraphisproperlyformedandcorrectlyresolved(greenarrow),oneofthehaplotype-specificpaths(redorblue)ischosenforbuildinga‘primary’pseudo-haplotypeassemblyandtheotherissetapartasan‘alternate’asse
mbly.Whenthegraphisnotcorrectlyresolved(purplearrow),oneoffourtypesofpatternareformedinthecontigsandsubsequentscaffolds.Dependingonthesupportingevidence,thescaffoldereitherkeepsthesehaplotypecontigsonseparatescaffoldsorbringsthemtogetheronthesamescaffold,oftenseparatedbygaps:1.Separatecontigs:bothcontigsareretainedintheprimarycontigset,anerroroftenobservedwhenhaplotype-specificsequencesarehighlydiverged.2.Flankingcontigs:theassemblygraphispartiallyformed,connectingthehomozygoussequenceofthe5′sidetoonehaplotype(blue)andthe3′sidetotheotherhaplotype(red).3.Partialflankingcontigs:onlyonehaplotype(blue)flanksonesideofthehomozygoussequence.4.Failedconnectingofcontigs:allhaplotypesequencesfailtoproperlyconnecttoflankinghomozygoussequences.b,Falsehomotypeduplicationsoccurwhereasequencefromthesamegenomiclocusisduplicated,andareoftwotypes:1.Overlappingsequencesatcontigboundaries:incurrentoverlap-layout-consensusassemblers,branchingsequencesinassemblygraphsthatarenotselectedastheprimarypathhaveasmalloverlappingsequence(purple),dovetailingtotheprimarypathwhereitoriginatedabranch.Thesizeoftheduplicatedsequenceisoftenthelengthofacorrectedread.Subsequentscaffoldingresultsintandemduplicatedsequenceswithagapbetween.2.Under-collapsedsequences:sequencingerrorsinreads(redx)randomlyorsystematicallypileup,formingunder-collapsedsequences.Subsequentduplicationerrorsinthescaffoldingaresimilartotheheterotypeduplications.Purge_haplotigs13alignsequencestothemselvestofindasmallersequencethatalignsfullytoalargercontigorscaffold,andremovesheterotypeduplicationtypes1,3,and4.Purge_dups14additionallyusescoverageinformationtodetectheterotypeduplicationtype2andhomotypeduplications.Wedistinguishedthetwotypesofduplicationsby:1)haplotype-specificvariantsinreadsaligningathalfcoveragetoeachheterotypeduplication;2)differingconsensusqualitythatresultedfromreadcoveragefluctuationswhenaligningreadstohomotypeduplications;and3)k-mercopynumberanomaliesinwhichhomotypeduplicationswereobservedintheassemblywithmoreth
antheexpectednumberofcopies.ExtendedDataFig.6Falseduplicationexamplesfixedduringmanualcuration.a,Anexampleofaheterotypeduplicationinthefemalezebrafinch,non-trioassembly.Left,aself-dotplotofthisregiongeneratedwithGepard105,withsequencescolouredbyhaplotypes.Gaps,duplicatedsequences(greenandpurple),andhaplotype-specificmarkerdensitiesareindicatedatthetop.Right,adetailedalignmentviewofthegreenhaplotypeduplicationwithpaternalandmaternalmarkers,self-alignmentcomponents,transcriptsannotated,contigs,bionanomaps,andrepeatcomponentsdisplayedingEVAL95.b,Exampleofahomotypeduplicationfoundinthehummingbirdassembly.ThesewerecausedbyanalgorithmbuginFALCON,whichwaslaterfixed.c,Exampleofacombinedduplicationinvolvingbothheterotype(green)andhomotype(orange)duplications.Assemblygraphstructureisshownontheleftforclarity,highlightingtheoverlappingsitesatthecontigboundaryshadedfollowingtheduplicationtype.Assemblyerrorsincludingtheabovefalseduplicationsweredetectedandfixedduringthecurationprocess.ExtendedDataFig.7Evidenceofnear-completechromosomescaffoldsintheVGPassemblies.ShownareHi-Cinteractionheat mapsforeachspeciesaftercuration,visualizedwithPretextView106.Ascaffoldisconsideredaputativearm-to-armchromosomewhenallHi-Creadpairsinarowandcolumnmaptoasquare(thatis,anassembledchromosome)onthediagonalwithoutanyotherinteractionsoffthediagonal.Thosewithremainingoff-diagonalmatchestosmallerscaffoldsarenotlinkedbecauseofambiguousorderororientation,andareinsteadsubmittedas‘unlocalized’belongingtotherelevantchromosome.Bandsatthetopofeachheat 
map show scaffolds identified as X, Z (blue) or Y, W (red) sex chromosomes. The Hi-C map of fAstCal1 is not included as we had no tissue remaining from the animal used to generate Hi-C reads.

Extended Data Fig. 8 Comparison of chromosomal organization between previous and new VGP assemblies. a, Zebra finch male compared to a previous reference assembly of the same animal. b, Platypus male compared with a previous reference female assembly (so the Y chromosomes are absent in the previous reference). c, Hummingbird female compared to a previous reference of the same animal. d, Climbing perch compared to a previous reference. Each row represents a VGP-generated chromosome for the target species. Colours depict identity with the reference (see key to the right); more than one colour indicates reorganization in the VGP assembly relative to the reference. The lines within each block depict orientation relative to the reference; a positive slope is the same orientation as the reference, whereas a negative slope is the inverse orientation. Gaps are white boxes with no lines, in the reference relative to the VGP assembly. A white box for the entire chromosome means a newly identified chromosome in the VGP assembly. Top 20 denotes the 20 longest scaffolds of the hummingbird and climbing perch assemblies. Accession numbers of the assemblies compared are listed in Supplementary Table 19.

Extended Data Fig. 9 Haplotype-resolved sex chromosomes and mitochondrial genomes. a, Alignment scatterplot, generated with MUMmer NUCmer¹⁰⁷, visualized with dot¹⁰⁸, of maternal and paternal chromosomes from the female zebra finch trio-based assembly. Blue, same orientation; red, inversion; orange, repeats between haplotypes. The paternal Z chromosome is highly divergent from the maternal W, and thus mostly unaligned. b, Alignment scatterplot of assembled Z and W chromosomes across the three bird species, approximated with MashMap2⁸⁶. Segments of 300 kb (green), 500 kb (blue), and 1
Mb (purple) are shaded darker with higher sequence identity, with a minimum of 85%. The smaller size and higher repeat content of the W chromosome are clearly visible. c, X and Y chromosome segments of the mammals (platypus, Canada lynx, pale spear-nosed bat, and greater horseshoe bat) showing a higher density of repeats within the mammalian X chromosome than the avian Z chromosome. d, VGP kākāpō mitochondrial genome assembly reveals previously missing repetitive sequences (adding 2,232 bp) in the origin of replication region, containing an 83-bp repeat unit. e, VGP climbing perch mitochondrial genome assembly showing a duplication of trnL2 and partial duplication of Nad1, which were absent from the prior reference. Orange arrows and red lines, tRNA genes and their alignments; dark grey arrows and grey shading, all other genes and their alignments; black, non-coding regions; green line, conventional starting point of the circular sequence.

Extended Data Fig. 10 Large haplotype inversions with direct evidence in the zebra finch trio assembly. a, Two inversions (green and red) in chromosome 5 found from the MUMmer NUCmer¹⁰⁷ alignment of the maternal and paternal haplotype assemblies, visualized with dot¹⁰⁸. b, Hi-C interaction plot showing that the trio-binned Hi-C data remove most of the interactions from the other haplotype (red arrows), which could be erroneously classified as a mis-assembly if only one haplotype was used as a reference. c, An 8.5-Mb inversion found on chromosome 11 and a complicated 8.1-Mb rearrangement on chromosome 13 between maternal and paternal haplotypes. d, No mis-assembly signals were detected from the binned Hi-C interaction plots, indicating that the haplotype-specific inversions are real. e, Half the PacBio CLR span and Bionano optical maps agree with the inversion breakpoints in chromosome 11, supporting the haplotype-specific inversion.

Extended Data Fig. 11 Polishing artefacts. a, An example of uneven mapping coverage in the primary and alternate sequence pair of the Anna’s hummingbird assembly. In this example, the alternate (alt) sequence was built at higher quality, attracting all linked reads for polishing. The matching locus in the primary (pri) assembly was left unpolished, resulting in frameshift errors in the TLK1 gene. b, Haplotype-specific markers (red for maternal, blue for paternal) and error markers found in the assembly on the Z chromosome (inherited from the paternal side) of the trio-binned female zebra finch assembly. Each row shows markers before short-read polishing, mapping all reads to both haplotype assemblies, and polishing by mapping paternally binned reads to the paternal assembly. Polishing improves QV, but introduces haplotype switch errors when using reads from both haplotypes as shown in row 2. This can be avoided when using haplotype-binned reads for polishing. c, Example of over-polishing. The nuclear mitochondrial (NuMT) sequence was transformed into a full mitochondrial (MT) sequence during long-read polishing owing to the absence of the MT contig, where the NuMT attracted all long reads from the MT. In comparison, the trio-binned assembly had the MT sequence assembled in place, preventing misplacement of MT reads during read mapping.

Extended Data Fig. 12 Chromosome evolution among the bat species sequenced. a, Genes surrounding an inversion in the greater horseshoe bat, relative to human chromosome 15 (red highlight). The STARD5 gene is directly disrupted by this inversion, which separates exons 1–5 from exon 6 in the greater horseshoe bat. b, RNA-seq tracks showing the lack of RNA splicing evidence of STARD5 transcripts in the greater horseshoe bat (bottom) in comparison to the pale spear-nosed bat where the STARD5 gene is not disrupted (top). c, Circos plots of chromosome organization relationships between each of the analysed bats and segments of the human chromosomes 1, 2, 6 and 10. Red star, breakpoint location in human chromosome 6, depicting the fission of the boreoeutherian chromosome 5 in the bat ancestor; blue star, the region upstream of the breakpoint in the bats; green star, the region downstream of the breakpoint in the bats. The red starred breakpoint was confirmed as reused, as opposed to assembly errors, in chromosomal rearrangements of the pale spear-nosed bat, Egyptian fruit bat, and greater horseshoe bat. There is no evidence of reuse for the velvety free-tailed bat. We could not confirm breakpoint reuse in the greater mouse-eared bat or Kuhl’s pipistrelle at the chromosomal scale because they were on small scaffolds that may not be completely assembled.

Extended Data Table 1 Summary metrics of the curated and submitted vertebrate species assemblies

Extended Data Table 2 Annotation summary statistics in previous and newly assembled VGP reference genomes

Supplementary information

Supplementary Information: This file contains Supplementary text, Supplementary Notes 1-7, Supplementary Figures 1-6 and Supplementary references.
Reporting Summary
Supplementary Tables: This file contains Supplementary Tables 1-23.
Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Rhie, A., McCarthy, S.A., Fedrigo, O. et al. Towards complete and error-free genome assemblies of all vertebrate species.
Nature 592, 737–746 (2021). https://doi.org/10.1038/s41586-021-03451-0

Received: 22 May 2020
Accepted: 12 March 2021
Published: 28 April 2021
Issue Date: 29 April 2021
DOI: https://doi.org/10.1038/s41586-021-03451-0

Further reading

Genomic insights into body size evolution in Carnivora support Peto’s paradox. Xin Huang, Di Sun, Guang Yang. BMC Genomics (2021)
Accurate long-read de novo assembly evaluation with Inspector. Yu Chen, Yixin Zhang, Zechen Chong. Genome Biology (2021)
Telomere-to-telomere assembly of a fish Y chromosome reveals the origin of a young sex chromosome pair. Lingzhan Xue, Yu Gao, Luohao Xu. Genome Biology (2021)
LeafGo: Leaf to Genome, a quick workflow to produce high-quality de novo plant genomes using long-read sequencing technology. Patrick Driguez, Salim Bougouffa, Luca Ermini. Genome Biology (2021)
Assembling vertebrate genomes. Katharine H. Wrighton. Nature Reviews Genetics (2021)


