Towards complete and error-free genome assemblies of all vertebrate species
Subjects: Evolutionary genetics; Genome assembly algorithms; Molecular evolution; Research data

Abstract
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species1,2,3,4. To address this issue, the international Genome 10K (G10K) consortium5,6 has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
Main
Chromosome-level reference genomes underpin the study of functional, comparative, and population genomics within and across species. The first high-quality genome assemblies of human1 and other model species (for example, Caenorhabditis elegans2, mouse3, and zebrafish4) were put together using 500–1,000-base pair (bp) Sanger sequencing reads of thousands of hierarchically organized clones with 200–300-kilobase (kb) inserts, and chromosome genetic maps. This approach required tremendous manual effort, software engineering, and cost, in decade-long projects. Whole-genome shotgun approaches simplified the logistics (for example, in human7 and Drosophila8), and later next-generation sequencing with shorter (30–150-bp) sequencing reads and short insert sizes (for example, 1 kb) ushered in more affordable and scalable genome sequencing9. However, the shorter reads resulted in lower-quality assemblies, fragmented into thousands of pieces, where many genes were missing, truncated, or incorrectly assembled, resulting in annotation and other errors10. Such errors can require months of manual effort to correct individual genes and years to correct an entire assembly. Genomic heterozygosity posed additional problems, because homologous haplotypes in a diploid or polyploid genome are forced together into a single consensus by standard assemblers, sometimes creating false gene duplications11,12,13,14.
To address these problems, the G10K consortium5,6 initiated the Vertebrate Genomes Project (VGP; https://vertebrategenomesproject.org) with the ultimate aim of producing at least one high-quality, near error-free and gapless, chromosome-level, haplotype-phased, and annotated reference genome assembly for each of the 71,657 extant named vertebrate species and using these genomes to address fundamental questions in biology, disease, and biodiversity conservation. Towards this end, having learned the lessons of having too many variables
that make conclusions more difficult to reach in the G10K Assemblathon 2 effort15, we first evaluated multiple genome sequencing and assembly approaches extensively on one species, the Anna’s hummingbird (Calypte anna). We then deployed the best-performing method across sixteen species representing six major vertebrate classes, with a wide diversity of genomic characteristics. Drawing on the principles learned, we improved these methods further, discovered parameters and approaches that work better for species with different genomic characteristics, and made biological discoveries that had not been possible with the previous assemblies.
Complete, accurate assemblies require long reads
We chose a female Anna’s hummingbird because it has a relatively small genome (about 1 Gb), is heterogametic (has both Z and W sex chromosomes), and has an annotated reference of the same individual built from short reads16. We obtained 12 new sequencing data types, including both short and long reads (80 bp to 100 kb), and long-range linking information (40 kb to more than 100 Mb), generated using eight technologies (Supplementary Table 1). We benchmarked all technologies and assembly algorithms (Supplementary Table 2) in isolation and in many combinations (Supplementary Table 3). To our knowledge, this was the first systematic analysis of many sequence technologies, assembly algorithms, and assembly parameters applied on the same individual. We found that primary contiguous sequences (contigs) (pseudo-haplotype; Supplementary Note 1) assembled from Pacific Biosciences continuous long reads (CLR) or Oxford Nanopore long reads (ONT) were approximately 30- to 300-fold longer than those assembled from Illumina short reads (SR), regardless of data type combination or assembly algorithm used (Fig. 1a, Supplementary Table 3). The highest contig NG50s for short-read-only assemblies were about 0.025 to 0.169 Mb, whereas for long reads they were about 4.6 to 7.66 Mb (Fig. 1a); contig NG50 is an assembly metric based on a weighted median of the lengths of its gapless sequences relative to the estimated genome size. After fixing a function in the PacBio FALCON software17 that caused artificial breaks in contigs between stretches of highly homozygous and heterozygous haplotype sequences (Supplementary Note
1, Supplementary Table 2), contig NG50 nearly tripled to 12.77 Mb (Fig. 1a). These findings are consistent with theoretical predictions18 and demonstrate that, given current sequencing technology and assembly algorithms, it is not possible to achieve high contig continuity with short reads alone, as it is typically impossible to bridge through repeats that are longer than the read length.
Fig. 1: Comparative analyses of Anna’s hummingbird genome assemblies with various data types.
a, Contig NG50 values of the primary pseudo-haplotype. b, Scaffold NG50 values. c, Number of joins (gaps). d, Number of mis-join errors compared with the curated assembly. The curated assembly has no remaining conflicts with the raw data and thus no known mis-joins. *Same as CLR + linked + Opt. + Hi-C, but with contigs generated with an updated FALCON17 version and earlier Hi-C Salsa version (v2.0 versus v2.2; Supplementary Table 2) for less aggressive contig joining. e, f, Hi-C interaction heat maps before and after manual curation, which identified 34 chromosomes. Grid lines indicate scaffold boundaries. Red arrow, example mis-join that was corrected during curation. g, Karyotype of the identified chromosomes (n = 36 + ZW), consistent with previous findings70. h, Correlation between estimated chromosome sizes (in Mb) based on karyotype images in g and assembled scaffolds in Supplementary Table 4 (bCalAna1) on a log–log scale. v1.0, VGP assembly v1.0 pipeline; linked, 10X Genomics linked reads; Hi-C, Hi-C proximity ligation; 1D, 2D, Oxford Nanopore long reads; NRGene, NRGene paired-end Illumina reads; SR, paired-end Illumina short reads.
Iterative assembly pipeline
Scaffolds generated with all three scaffolding technologies (that is, 10X Genomics linked reads (10XG), Bionano optical maps (Opt.), and Arima Genomics, Dovetail Genomics, or Phase Genomics Hi-C) were approximately 50% to 150% longer than those generated using one or two technologies, regardless of whether we started with short- or long-read-based contigs (Fig. 1b, Extended Data Fig. 1a, Supplementary Table 3). These findings include improvements we made to each approach (Supplementary Note
1, Supplementary Tables 4, 5, Supplementary Fig. 1). Despite similar scaffold continuity, the short-read-only assemblies had from about 18,000 to about 70,000 gaps, whereas the long-read assemblies had substantially fewer (about 400 to about 4,000) gaps (Fig. 1c). Many gaps in the short-read assemblies were in repeat or GC-rich regions. Considering the curated version of this assembly to be more accurate, we also identified roughly 5,000 to 8,000 mis-joins in short-read-based assemblies, whereas long-read-based assemblies had only from 20 to around 700 mis-joins (Fig. 1d). These mis-joins included chimeric joins and inversions. After we curated this assembly for contamination, assembly errors, and Hi-C-based chromosome assignments (Fig. 1e, f), the final hummingbird assembly had 33 scaffolds that closely matched the chromosome karyotype in number (33 of 36 autosomes plus sex chromosomes) and estimated sizes (approximately 2 to 200 Mb; Fig. 1g, h), with only 1 to 30 gaps per autosome (bCalAnn1 in Supplementary Table 6). Of the five autosomes with only one gap each, three (chromosomes 14, 15, and 19) had complete spanning support by at least two technologies (reliable blocks, Extended Data Fig. 1c; bCalAnn1 in Supplementary Table 6), indicating that the chromosome contigs were nearly complete. However, they were missing long arrays of vertebrate telomere repeats within 1 kb of their ends (Extended Data Fig. 1c; bCalAnn1 in Supplementary Tables 6, 7).
Assembly pipeline across vertebrate diversity
Using the formula that gave the highest-quality hummingbird genome, we built an iterative VGP assembly pipeline (v1.0) with haplotype-separated CLR contigs, followed by scaffolding with linked reads, optical maps, and Hi-C, and then gap filling, base call polishing, and finally manual curation (Extended Data Figs. 2a, 3a). We systematically tested our pipeline on 15 additional species spanning all major vertebrate classes: mammals, birds, non-avian reptiles, amphibians, teleost fishes, and a cartilaginous fish (Supplementary Tables 8, 9, Supplementary Note
2). For the zebra finch, we used DNA from the same male as was used to generate the previous reference genome19, and included a female trio for benchmarking haplotype completeness, where sequenced reads from the parents were used to bin parental haplotype reads from the offspring before assembly20 (Extended Data Figs. 2a, 3b). We set initial minimum assembly metric goals of: 1 Mb contig NG50; 10 Mb scaffold NG50; assigning 90% of the sequence to chromosomes, structurally validated by at least two independent lines of evidence; Q40 average base quality; and haplotypes assembled as completely and correctly as possible. When these metrics were achieved, most genes were assembled with gapless exon and intron structures11, and fewer than 3% had frame-shift base errors identified in annotation. Q40 is the mathematical inflection point at which genes go from usually containing an error to usually not21.
Of the curated assemblies (Supplementary Table 10, Supplementary Note 2), 16 of 17 achieved the desired continuity metrics (Extended Data Table 1). Scaffold NG50 was significantly correlated with genome size (Fig. 2a), suggesting that larger genomes tend to have larger chromosomes. On average, 98.3% of the assembled bases had reliable block NG50s ranging from 2.3 to 40.2 Mb; collapsed repeat bases22 with abnormally high CLR read coverage (more than 3 s.d.) ranged from 0.7 to 31.4 Mb per Gb; and the completeness of the genome assemblies ranged from 87.2 to 98.1%, with less than 4.9% falsely duplicated regions, consistent with the false duplication rate we found for the conserved BUSCO vertebrate gene set (Extended Data Table 1, Supplementary Tables 11, 12).
Fig. 2: Impact of repeats and heterozygosity on assembly quality.
a, Correlation between scaffold NG50 and genome size of the curated assemblies. b, Nonlinear correlation between contig NG50 and repeat content, before and after curation. c, Correlation between number of gaps per Gb assembled and repeat content. d, Correlation between primary assembly size relative to estimated genome size (y axis) and genome heterozygosity (x axis), before and after purging of false duplications. Assembly sizes above 100% indicate the presence of false duplications and those below 100% indicate collapsed repeats. e, f, Correlations between genome duplication rate
using k-mers23 (e) and conserved BUSCO vertebrate gene set (f), and genome heterozygosity before and after purging of false duplications. g, h, As in e, f, but with whole-genome repeat content before and after purging of false duplications. Genome size, heterozygosity, and repeat content were estimated from 31-mer counts using GenomeScope71, except for the channel bull blenny, as the estimates were unreliable (see Methods). Repeat content was measured by modelling the k-mer multiplicity from sequencing reads. Sequence duplication rates were estimated with Merqury23 using 21-mers. *P … 1 && (GT="AA" || GT="Aa")' -Hla.
VGP Trio Pipeline v1.0–v1.6
The trio pipeline is similarly designed to the standard pipeline, except for the use of parental data (Extended Data Fig. 3b). When parental genomes are available, the child’s CLR reads are binned to maternal and paternal haplotypes, and assembled separately as haplotype-specific contigs (haplotigs) using TrioCanu20. In brief, parental-specific marker k-mers were collected using Meryl23 from the Illumina WGS reads of the parents. These markers were filtered and used to bin the child’s CLR reads. A haplotype was assigned given the markers observed, normalized by the total markers in each haplotype. The subsequent purging, scaffolding, and polishing steps were similarly updated with the use of Purge_Dups14 (v1.6). We extended binning to linked reads and Hi-C reads, by excluding read pairs that had any parental-specific marker. The binned Hi-C reads were used to scaffold its haplotype assembly, and polished with the binned linked reads from the observation of haplotype switching using the standard polishing approach. During curation, one of the haplotype assemblies with the higher QV and/or contiguity was chosen as the representative haplotype. The heterogametic sex chromosome from the unchosen haplotype was added to the representative assembly. However, while curating several trios, we found that in regions of low divergence between shared parental homogametic sex chromosomes (that is, X or Z), a small fraction of offspring CLR data was mis-assigned to the wrong haplotype. This mis-assignment resulted in a duplicate, low-coverage offspring X or Z assembly in the paternal (for mammals) or maternal (for birds) haplotype, respectively, which required removal
during curation. We are working on methods to improve the binning accuracy for resolution of this issue going forward.
For the female zebra finch in particular, contigs were generated before the binning was automated in the Canu assembler as TrioCanu 1.7, and therefore a manual binning process was applied as described in the original Trio-binning paper20 (Supplementary Methods). Contigs were assembled for each haplotype using the binned reads, excluding unclassified reads. The contigs were polished with two rounds of Arrow polishing using the binned reads, and scaffolded following the v1.0 pipeline with no purging. Additional scaffolding rounds with Bionano (s4) and Hi-C were applied. Scaffolds were renamed according to the primary scaffold assembly of the same individual (s5), with sex chromosomes grouped as Z in the paternal assembly and W in the maternal assembly following synteny to the Z chromosome from the curated male zebra finch VGP assembly. Two rounds of SR polishing were applied using linked reads, by mapping on both haplotypes. After haplotype switches were discovered, additional rounds of polishing were applied using binned linked reads (Supplementary Methods).
Mitochondrial genome assembly
Similar to other recent methods93,94, we developed a reference-guided MT assembly pipeline. MT reads in the raw CLR data were identified by mapping the whole read set to an existing reference sequence of the specific species or of closely related species using Blasr. Filtered mtDNA CLRs were assembled into a single contig using Canu v1.8, polished with Arrow using CLR and then FreeBayes v1.0.2 together with bcftools v1.9 using short reads from the 10XG data (Extended Data Fig. 3c). The overlapping sequences at the ends of the contig were trimmed, and the remaining contig sequence circularized. The mitoVGP pipeline is made available at https://github.com/VGP/vgp-assembly/tree/master/mitoVGP. A more detailed protocol description of the assembly pipeline and new discoveries from the MT assemblies are published elsewhere33.
Curation
The VGP genome assembly pipeline produces high quality assemblies, yet no automated method to date is free from the production of errors, especially during the scaffolding stages. To minimize the impact of the remaining algorithmic shortcomings, we subjected all assemblies to rigorous manual curation. All data generated for a species in this study and other publicly available data (for example, genetic maps, gene sets and genome assemblies of the same or closely related species) were collated, aligned to the primary assembly and analysed in gEVAL95 (https://vgp-geval.sanger.ac.uk/index.html), visualizing discordances in a feature browser and issue lists. In parallel, Hi-C data were mapped to the primary assembly and visualized using Juicebox96 and/or HiGlass97. With these data, genome curators identified mis-joins, missed joins and other anomalies, and corrected the primary assembly accordingly. No change was made without unambiguous evidence from available data types; for example, a Hi-C suggested join would not be made unless supported by BioNano maps, long-read data, or gene alignments. When sequencing the heterogametic sex, we identified sex chromosomes based on half coverage, homology alignments to sex chromosomes in other species, and the presence of sex chromosome-specific genes.
Contamination removal
A succession of searches was used to identify potential contaminants in the generated assemblies. 1) A megaBLAST98 search against a database of common contaminants (ftp://ftp.ncbi.nlm.nih.gov/pub/kitts/contam_in_euks.fa.gz) requiring e ≤ 1 × 10⁻⁴, reporting matches with ≥98% sequence identity and match length 50–99 bp, ≥94% and match length 100–199 bp, or ≥90% and match length 200 bp or above. 2) A vecscreen (https://www.ncbi.nlm.nih.gov/tools/vecscreen/) search against a database of adaptor sequences (ftp://ftp.ncbi.nlm.nih.gov/pub/kitts/adaptors_for_screening_euks.fa). 3) After soft-masking repeats using Windowmasker75, a megaBLAST search against chromosome-level assemblies from RefSeq requiring e ≤ 1 × 10⁻⁴, match score ≥100, and sequence identity
≥98%; regions matching highly conserved rDNAs were ignored. Manual inspection of the results was necessary to differentiate contamination from conservation and/or horizontal gene transfer. Adaptor sequences were masked; other contaminant sequences were removed. Assemblies were also checked for runs of Ns at the ends of scaffolds, created as artefacts of the iterative scaffolding process, and when found they were trimmed.
Organelle genomes
These were detected by a megaBLAST search against a database of known organelle genomes requiring e ≤ 1 × 10⁻⁴, sequence identity ≥90%, and match length ≥500; the databases are available at ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/mito.nt.gz and ftp://ftp.ncbi.nlm.nih.gov/refseq/release/plastid/*genomic.fna.gz. Only scaffolds consisting entirely of organelle sequences were assumed to be organelle genomes, and replaced by the genome from the separate organelle assembly pipeline. Organelle matches embedded in nuclear sequences that were found to be NuMTs were kept.
False duplication removal
Retained false duplications were identified using Purge_Haplotigs13 run either after scaffolding and polishing (Anna’s hummingbird, kākāpō, male zebra finch, female zebra finch, platypus, pale spear-nosed bat, and greater horseshoe bat) or on the c1 before scaffolding (two-lined caecilian, flier cichlid, Canada lynx, and Goode’s thornscrub tortoise). Subsequent manual curation identified additional haplotypic duplications for the listed assemblies and also those that were not treated with Purge_Haplotigs (Eastern happy, climbing perch, zig-zag eel). The evidence used included read coverage, sequence self-comparison, transcript alignments, Bionano map alignments and Hi-C 2D maps, all confirming the superfluous nature of one allele. The identified haplotype duplications were moved from the primary to the alternate assembly.
Chromosome assignment
For a scaffold to be annotated as a chromosome, we used evidence from Hi-C as well as genetic linkage or FISH karyotype mapping when available. For Hi-C evidence, we considered a scaffold as a complete chromosome (albeit with gaps) when there was a clear unbroken diagonal in the Juicebox or HiGlass plots for that scaffold and no other large scaffolds that could be joined to that same scaffold; if present and no unambiguous join was possible, we named it as an unlocalized scaffold for that chromosome. When we could not find evidence of a complete chromosome, we kept the scaffold number for its name. We named all evidence-validated scaffolds as chromosomes down to the smallest Hi-C box unit resolution allowed with these characteristics. When there was an established chromosome terminology for a given species or set of species, we used the established terminology except when our new assemblies revealed errors in the older assembly, such as scaffold/chromosome fusions, fissions, rearrangements, and non-chromosome names. For species without an established chromosome terminology, we named the scaffolds as chromosome numbers 1, 2, 3…, in descending order of scaffold size. For the sex chromosomes, we used the letters X and Y for mammals and Z and W for birds.
Using comparative genomics to assess assembly structure
In cases where a high-quality chromosome-level genome was available for a closely related species, comparative genome analysis was performed. The polished primary assembly (t3.p) was mapped to the related genome using MashMap2 (ref. 86) with --pi 75 -s 300000. The number of chromosomal differences was identified using a custom script available at https://github.com/jdamas13/assembly_comparison. This resulted in the identification of ~60 to ~450 regions for each genome assembly flanking putative misassemblies or lineage-specific genome rearrangements. To identify which were real misassemblies, the identified discrepancies were communicated to the curation team for manual verification (see above). To identify any possible remaining mis-joins, each curated avian and mammalian assembly was compared with the zebra finch (taeGut2) or human (hg38) genomes, respectively. Pairwise alignments between each of the VGP assemblies and the clade reference were generated with LastZ99 (version 1.04) using the following parameters: C=0 E=30 H=2000 K=3000 L=2200 O=400. The pairwise alignments were converted into the UCSC ‘chain’ and ‘net’ formats with axtChain (parameters: -minScore=1000 -verbose=0 -linearGap=
medium) followed by chainAntiRepeat, chainSort, chainPreNet, chainNet and netSyntenic, all with default parameters100. Pairwise synteny blocks were defined using maf2synteny101 at 100-, 300-, and 500-kb resolutions. Evolutionary breakpoint regions were detected and classified using an ad hoc statistical approach102. This analysis identified 2 to 90 genomic regions per assembly that could be flanking misassemblies, lineage-specific chromosome rearrangements, or reference-specific chromosome rearrangements (116 in the human and 26 in the zebra finch). Determining the underlying cause for each of the flagged regions will need further verification. All alignments are available for visualization at the Evolution Highway comparative chromosome browser (http://eh-demo.ncsa.illinois.edu/vgp/).
Annotation
The NCBI and Ensembl annotation pipelines used in this study are described in the Supplementary Methods.
Evaluation
Detailed methods for other types of evaluation, including BUSCO runs, mis-join and missed-join identification, reliable blocks, collapsed repeats, telomeres, RNA-seq and ATAC–seq mapping, and false gene duplications are in the Supplementary Methods. No statistical methods were used to predetermine sample size, the experiments were not randomized, and the investigators were not blinded to group during experiments and outcome assessment.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
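Two of the assembly metrics used throughout the text, contig or scaffold NG50 and the Q40 base-quality goal, can be illustrated with a short Python sketch. It is not the code used in the VGP pipeline: the function names `ng50` and `phred_qv` are hypothetical, NG50 follows the description in the text (a length-weighted median measured against the estimated genome size), and the quality value uses the standard Phred definition, QV = −10 log10(error rate), with an error count supplied by the caller rather than estimated from k-mers.

```python
import math
from typing import Iterable


def ng50(lengths: Iterable[int], genome_size: int) -> int:
    """Return NG50 for a set of contig or scaffold lengths.

    NG50 is the length L such that sequences of length >= L together
    cover at least half of the *estimated genome size* (not half of the
    assembly size, which would be N50). Returns 0 when the assembly
    covers less than half of the estimated genome.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    target = genome_size / 2
    covered = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return 0


def phred_qv(errors: int, bases: int) -> float:
    """Phred-scaled quality value: QV = -10 * log10(errors / bases).

    Hypothetical helper; assumes the caller already has an error count.
    """
    if errors <= 0 or bases <= 0:
        raise ValueError("errors and bases must be positive")
    return -10.0 * math.log10(errors / bases)


if __name__ == "__main__":
    # Toy assembly: five contigs measured against a 200-base genome estimate.
    print(ng50([50, 40, 30, 20, 10], genome_size=200))
    # One error per 10,000 bases corresponds to the Q40 goal.
    print(phred_qv(errors=1, bases=10_000))
```

Because the denominator is the estimated genome size rather than the assembly size, NG50 values remain comparable between assemblies of the same genome that differ in total length, for example before and after purging of false duplications.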
Data availability
All raw data, intermediate and final assemblies are publicly available via GenomeArk (https://vgp.github.io/genomeark), archived on NCBI/EBI BioProject under accession PRJNA489243 with annotations, and browsable on the UCSC Genome Browser (https://hgdownload.soe.ucsc.edu/hubs/VGP/). The final primary assembly from the automated pipeline before curation is browsable on gEVAL (https://vgp-geval.sanger.ac.uk) with all four raw data mappings. The VGP assembly pipeline is available as a stand-alone pipeline (https://github.com/VGP/vgp-assembly) as well as a workflow on DNAnexus (https://platform.dnanexus.com/). A VGP-specific assembly hub portal in the U.C. Santa Cruz browser is available as a gateway to access all VGP genome assemblies and annotations (https://hgdownload.soe.ucsc.edu/hubs/VGP).
Code availability
All code used in the VGP Assembly Pipeline and the VGP Trio Pipeline is publicly available at https://github.com/VGP/vgp-assembly/tree/master/pipeline.
References
1. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
2. Sulston, J. et al. The C. elegans genome sequencing project: a beginning. Nature 356, 37–41 (1992).
3. Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
4. Howe, K. et al. The zebrafish reference genome sequence and its relationship to the human genome. Nature 496, 498–503 (2013).
5. Genome 10K Community of Scientists. Genome 10K: a proposal to obtain whole-genome sequence for 10,000 vertebrate species. J. Hered. 100, 659–674 (2009).
6. Koepfli, K.-P., Paten, B., the Genome 10K Community of Scientists & O’Brien, S. J. The Genome 10K Project: a way forward. Annu. Rev. Anim. Biosci. 3, 57–111 (2015).
7. Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
8. Adams, M. D. et al. The genome sequence of Drosophila melanogaster. Science 287, 2185–2195 (2000).
9. Shendure, J. & Ji, H. Next-generation DNA sequencing. Nat. Biotechnol. 26, 1135–1145 (2008).
10. Yin, Z.-T. et al. Revisiting avian ‘missing’ genes from de novo assembled transcripts. BMC Genomics 20, 4 (2019).
11. Korlach, J. et al. De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads. Gigascience 6, 1–16 (2017).
12. Kelley, D. R. & Salzberg, S. L. Detection and correction of false segmental duplications caused by genome mis-assembly. Genome Biol. 11, R28 (2010).
13. Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19, 460 (2018).
14. Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898 (2020).
15. Bradnam, K. R. et al. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience 2, 10 (2013).
16. Zhang, G. et al. Comparative genomics reveals insights into avian genome evolution and adaptation. Science 346, 1311–1320 (2014).
17. Chin, C.-S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050–1054 (2016).
18. Bresler, G., Bresler, M. & Tse, D. Optimal assembly for high throughput shotgun sequencing. BMC Bioinformatics 14 (Suppl. 5), S18 (2013).
19. Warren, W. C. et al. The genome of a songbird. Nature 464, 757–762 (2010).
20. Koren, S. et al. De novo assembly of haplotype-resolved genomes with trio binning. Nat. Biotechnol. (2018).
21. Koren, S., Phillippy, A. M., Simpson, J. T., Loman, N. J. & Loose, M. Reply to ‘Errors in long-read assemblies can critically affect protein prediction’. Nat. Biotechnol. 37, 127–128 (2019).
22. Vollger, M. R. et al. Long-read sequence and assembly of segmental duplications. Nat. Methods 16, 88–94 (2019).
23. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
24. Waterhouse, R. M. et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol. Biol. Evol. 35, 543–548 (2018).
25. Howe, K. et al. Significantly improving the quality of genome assemblies through curation. Gigascience 10, giaa153 (2021).
26. Zhou, Y. et al. Platypus and echidna genomes reveal mammalian biology and evolution. Nature https://doi.org/10.1038/s41586-020-03039-0 (2021).
27. Kim, J. et al. False gene and chromosome losses affected by assembly and sequence errors. Preprint at https://doi.org/10.1101/2021.04.09.438906 (2021).
28. Lewin, H. A., Graves, J. A. M., Ryder, O. A., Graphodatsky, A. S. & O’Brien, S. J. Precision nomenclature for the new genomics. Gigascience 8, giz086 (2019).
29. Kronenberg, Z. N. et al. Extended haplotype phasing of de novo genome assemblies with FALCON-Phase. Nat. Commun. https://doi.org/10.1038/s41467-020-20536-y (2021).
30. Ewing, B., Hillier, L., Wendl, M. C. & Green, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998).
31. Tomaszkiewicz, M., Medvedev, P. & Makova, K. D. Y and W chromosome assemblies: approaches and discoveries. Trends Genet. 33, 266–282 (2017).
32. Kolesnikov, A. A. & Gerasimov, E. S. Diversity of mitochondrial genome organization. Biochem. (Mosc.) 77, 1424–1435 (2012).
33. Formenti, G. et al. Complete vertebrate mitogenomes reveal widespread repeats and gene duplications. Genome Biol. (in the press).
34. Harrison, G. L. A. et al. Four new avian mitochondrial genomes help get to basic evolutionary questions in the late cretaceous. Mol. Biol. Evol. 21, 974–983 (2004).
35. Zhao, H. et al. The complete mitochondrial genome of the Anabas testudineus (Perciformes, Anabantidae). Mitochondrial DNA A DNA Mapp. Seq. Anal. 27, 1005–1007 (2016).
36. Suzuki, A. et al. How the kinetochore couples microtubule force and centromere stretch to move chromosomes. Nat. Cell Biol. 18, 382–392 (2016).
37. Pfenning, A. R. et al. Convergent transcriptional specializations in the brains of humans and song-learning birds. Science 346, 1256846 (2014).
38. Robinson, R. For mammals, loss of yolk and gain of milk went hand in hand. PLoS Biol. 6, e77 (2008).
39. Brandl, K. et al. Yip1 domain family, member 6 (Yipf6) mutation induces spontaneous intestinal inflammation in mice. Proc. Natl Acad. Sci. USA 109, 12650–12655 (2012).
40. Malmstrøm, M. et al. Evolution of the immune system influences speciation rates in teleost fishes. Nat. Genet. 48, 1204–1210 (2016).
41. Japundžić-Žigon, N., Lozić, M., Šarenac, O. & Murphy, D. Vasopressin & oxytocin in control of the cardiovascular system: an updated review. Curr. Neuropharmacol. 18, 14–33 (2020).
42. Cataldo, I., Azhari, A. & Esposito, G. A review of oxytocin and arginine-vasopressin receptors and their modulation of autism spectrum disorder. Front. Mol. Neurosci. 11, 27 (2018).
43. Warren, W. C. et al. Genome analysis of the platypus reveals unique signatures of evolution. Nature 453, 175–183 (2008).
44. Ko, B. J. et al. Widespread false gene gains caused by duplication errors in genome assemblies. Preprint at https://doi.org/10.1101/2021.04.09.438957 (2021).
45. Lemaire, S. et al. Characterizing the interplay between gene nucleotide composition bias and splicing. Genome Biol. 20, 259 (2019).
46. Zhang, L., Kasif, S., Cantor, C. R. & Broude, N. E. GC/AT-content spikes as genomic punctuation marks. Proc. Natl Acad. Sci. USA 101, 16855–16860 (2004).
47. Jarvis, E. D. et al. Global view of the functional molecular organization of the avian cerebrum: mirror images and functional columns. J. Comp. Neurol. 521, 3614–3665 (2013).
48. Kubikova, L., Wada, K. & Jarvis, E. D. Dopamine receptors in a songbird brain. J. Comp. Neurol. 518, 741–769 (2010).
49. Sémon, M. & Wolfe, K. H. Rearrangement rate following the whole-genome duplication in teleosts. Mol. Biol. Evol. 24, 860–867 (2007).
50. Jebb, D. et al. Six reference-quality genomes reveal evolution of bat adaptations. Nature 583, 578–584 (2020).
51. Schneider, V. A. et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 27, 849–864 (2017).
52. Warren, W. C. et al. A new chicken genome assembly provides insight into avian genome structure. G3 (Bethesda) 7, 109–117 (2017).
53. Meredith, R. W. et al. Impacts of the Cretaceous Terrestrial Revolution and KPg extinction on mammal diversification. Science 334, 521–524 (2011).
54. Rodriguez-Agudo, D. et al. StarD5: an ER stress protein regulates plasma membrane and intracellular cholesterol homeostasis. J. Lipid Res. 60, 1087–1098 (2019).
55. Kim, J. et al. Reconstruction and evolutionary history of eutherian chromosomes. Proc. Natl Acad. Sci. USA 114, E5379–E5388 (2017).
56. Lin, B., Dutta, B. & Fraser, I. D. C. Systematic investigation of multi-TLR sensing identifies regulators of sustained gene activation in macrophages. Cell Syst. 5, 25–37.e3 (2017).
57. Theofanopoulou, C., Gedman, G. L., Cahill, J. A., Boeckx, C. & Jarvis, E. D. Universal nomenclature for oxytocin-vasotocin ligand and receptor families. Nature https://doi.org/10.1038/s41586-020-03040-7 (2021).
58. Ocampo Daza, D. & Haitina, T. Reconstruction of the carbohydrate 6-O sulfotransferase gene family evolution in vertebrates reveals novel member, CHST16, lost in amniotes. Genome Biol. Evol. 12, 993–1012 (2020).
59. Damas, J. et al. Broad host range of SARS-CoV-2 predicted by comparative and structural analysis of ACE2 in vertebrates. Proc. Natl Acad. Sci. USA 117, 22311–22322 (2020).
60. Dussex, N. et al. Population genomics reveals the impact of long-term small population size in the critically endangered kākāpō. Cell Genom. (in the press).
61. Teeling, E. C. et al. Bat biology, genomes, and the Bat1K project: to generate chromosome-level genomes for all living bat species. Annu. Rev. Anim. Biosci. 6, 23–46 (2018).
62. Lewin, H. A. et al. Earth BioGenome Project: sequencing life for the future of life. Proc. Natl Acad. Sci. USA 115, 4325–4333 (2018).
63. Jarvis, E. D. et al. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science 346, 1320–1331 (2014).
64. Li, S. et al. Genomic signatures of near-extinction and rebirth of the crested ibis and other endangered bird species. Genome Biol. 15, 557 (2014).
65.Koren,S.&Phillippy,A.M.Onechromosome,onecontig:completemicrobialgenomesfromlong-readsequencingandassembly.Curr.Opin.Microbiol.23,110–120(2015).CAS PubMed GoogleScholar 66.Jenjaroenpun,P.etal.Completegenomicandtranscriptionallandscapeanalysisusingthird-generationsequencing:acasestudyofSaccharomycescerevisiaeCEN.PK113-7D.NucleicAcidsRes.46,e38(2018).CAS PubMed PubMedCentral GoogleScholar 67.Tyson,J.R.etal.MinION-basedlong-readsequencingandassemblyextendstheCaenorhabditiselegansreferencegenome.GenomeRes.28,266–274(2018).CAS PubMed PubMedCentral GoogleScholar 68.Miga,K.H.etal.Telomere-to-telomereassemblyofacompletehumanXchromosome.Nature585,79–84(2020).CAS PubMed PubMedCentral ADS GoogleScholar 69.Logsdon,G.A.etal.Thestructure,functionandevolutionofacompletehumanchromosome8.Naturehttps://doi.org/10.1038/s41586-021-03420-7(2021).70.Beçak,M.L.,Beçak,W.,Roberts,F.L.,Shoffner,R.N.&Volpe,P.(eds.)ChromosomeAtlas:Fish,Amphibians,Reptiles,andBirdsVol.2(Springer,1973).71.Vurture,G.W.etal.GenomeScope:fastreference-freegenomeprofilingfromshortreads.Bioinformatics33,2202–2204(2017).CAS PubMed PubMedCentral GoogleScholar 72.Kumar,S.,Stecher,G.,Suleski,M.&Hedges,S.B.TimeTree:aresourcefortimelines,timetrees,anddivergencetimes.Mol.Biol.Evol.34,1812–1819(2017).CAS PubMed GoogleScholar 73.Ondov,B.D.etal.Mash:fastgenomeandmetagenomedistanceestimationusingMinHash.GenomeBiol.17,132(2016).PubMed PubMedCentral GoogleScholar 74.Ning,Z.&Harry,E.Scaff10Xhttps://github.com/wtsi-hpag/Scaff10X.75.Morgulis,A.,Gertz,E.M.,Schäffer,A.A.&Agarwala,R.WindowMasker:window-basedmaskerforsequencedgenomes.Bioinformatics22,134–141(2006).CAS PubMed GoogleScholar 76.Chin,C.-S.etal.Nonhybrid,finishedmicrobialgenomeassembliesfromlong-readSMRTsequencingdata.Nat.Methods10,563–569(2013).CAS PubMed GoogleScholar 77.Koren,S.etal.Canu:scalableandaccuratelong-readassemblyviaadaptivek-merweightingandrepeatseparation.GenomeRes.27,722–736(2017).CAS PubMed PubMedCentral GoogleScholar 
78.Weisenfeld,N.I.,Kumar,V.,Shah,P.,Church,D.M.&Jaffe,D.B.Directdeterminationofdiploidgenomesequences.GenomeRes.27,757–767(2017).CAS PubMed PubMedCentral GoogleScholar 79.Ghurye,J.etal.IntegratingHi-Clinkswithassemblygraphsforchromosome-scaleassembly.PLoSComput.Biol.15,e1007273(2019).CAS PubMed PubMedCentral GoogleScholar 80.Lieberman-Aiden,E.etal.Comprehensivemappingoflong-rangeinteractionsrevealsfoldingprinciplesofthehumangenome.Science326,289–293(2009).CAS PubMed PubMedCentral ADS GoogleScholar 81.Luo,R.etal.SOAPdenovo2:anempiricallyimprovedmemory-efficientshort-readdenovoassembler.Gigascience1,18(2012).PubMed PubMedCentral GoogleScholar 82.English,A.C.etal.Mindthegap:upgradinggenomeswithPacificBiosciencesRSlong-readsequencingtechnology.PLoSONE7,e47768(2012).CAS PubMed PubMedCentral ADS GoogleScholar 83.Bishara,A.etal.Readcloudsuncovervariationincomplexregionsofthehumangenome.GenomeRes.25,1570–1580(2015).CAS PubMed PubMedCentral GoogleScholar 84.Walker,B.J.etal.Pilon:anintegratedtoolforcomprehensivemicrobialvariantdetectionandgenomeassemblyimprovement.PLoSONE9,e112963(2014).PubMed PubMedCentral ADS GoogleScholar 85.Garrison,E.&Marth,G.Haplotype-basedvariantdetectionfromshort-readsequencing.Preprintathttp://arxiv.org/abs/1207.3907(2012).86.Jain,C.,Koren,S.,Dilthey,A.,Phillippy,A.M.&Aluru,S.Afastadaptivealgorithmforcomputingwhole-genomehomologymaps.Bioinformatics34,i748–i756(2018).CAS PubMed PubMedCentral GoogleScholar 87.BionanoGenomics,Inc.BionanoSoftwareDownloads.https://bionanogenomics.com/support/software-downloads/.88.ArimaGenomics,Inc.ArimaGenomicsMappingPipeline.https://github.com/ArimaGenomics/mapping_pipeline.89.Li,H.&Durbin,R.FastandaccurateshortreadalignmentwithBurrows–Wheelertransform.Bioinformatics25,1754–1760(2009).CAS PubMed PubMedCentral GoogleScholar 90.Li,H.Minimap2:pairwisealignmentfornucleotidesequences.Bioinformatics34,3094–3100(2018).CAS PubMed PubMedCentral GoogleScholar 
91.Chaisson,M.J.&Tesler,G.Mappingsinglemoleculesequencingreadsusingbasiclocalalignmentwithsuccessiverefinement(BLASR):applicationandtheory.BMCBioinformatics13,238(2012).CAS PubMed PubMedCentral GoogleScholar 92.Li,H.etal.Thesequencealignment/mapformatandSAMtools.Bioinformatics25,2078–2079(2009).PubMed PubMedCentral GoogleScholar 93.Dierckxsens,N.,Mardulyn,P.&Smits,G.NOVOPlasty:denovoassemblyoforganellegenomesfromwholegenomedata.NucleicAcidsRes.45,e18(2017).PubMed GoogleScholar 94.Soorni,A.,Haak,D.,Zaitlin,D.&Bombarely,A.Organelle_PBA,apipelineforassemblingchloroplastandmitochondrialgenomesfromPacBioDNAsequencingdata.BMCGenomics18,49(2017).PubMed PubMedCentral GoogleScholar 95.Chow,W.etal.gEVAL — aweb-basedbrowserforevaluatinggenomeassemblies.Bioinformatics32,2508–2510(2016).CAS PubMed PubMedCentral GoogleScholar 96.Durand,N.C.etal.JuiceboxprovidesavisualizationsystemforHi-Ccontactmapswithunlimitedzoom.CellSyst.3,99–101(2016).CAS PubMed PubMedCentral GoogleScholar 97.Kerpedjiev,P.etal.HiGlass:web-basedvisualexplorationandanalysisofgenomeinteractionmaps.GenomeBiol.19,125(2018).PubMed PubMedCentral GoogleScholar 98.Camacho,C.etal.BLAST+:architectureandapplications.BMCBioinformatics10,421(2009).PubMed PubMedCentral GoogleScholar 99.Harris,R.S.ImprovedPairwiseAlignmentofGenomicDNA. 
Thesis,PennsylvaniaStateUniv.(2007).100.Kent,W.J.,Baertsch,R.,Hinrichs,A.,Miller,W.&Haussler,D.Evolution’scauldron:duplication,deletion,andrearrangementinthemouseandhumangenomes.Proc.NatlAcad.Sci.USA100,11484–11489(2003).CAS PubMed ADS GoogleScholar 101.Kolmogorov,M.,Raney,B.,Paten,B.&Pham,S.Ragout—areference-assistedassemblytoolforbacterialgenomes.Bioinformatics30,i302–i309(2014).CAS PubMed PubMedCentral GoogleScholar 102.Farré,M.etal.Novelinsightsintochromosomeevolutioninbirds,archosaurs,andreptiles.GenomeBiol.Evol.8,2442–2451(2016).PubMed PubMedCentral GoogleScholar 103.Guan,D.Asset.https://github.com/dfguan/asset.104.Tarailo-Graovac,M.&Chen,N.UsingRepeatMaskertoidentifyrepetitiveelementsingenomicsequences.Curr.Protoc.Bioinformatics 25,4.10.1–4.10.14(2009). GoogleScholar 105.Krumsiek,J.,Arnold,R.&Rattei,T.Gepard:arapidandsensitivetoolforcreatingdotplotsongenomescale.Bioinformatics23,1026–1028(2007).CAS PubMed GoogleScholar 106.Harry,E.PretextView.https://github.com/wtsi-hpag/PretextView.107.Kurtz,S.etal.Versatileandopensoftwareforcomparinglargegenomes.GenomeBiol.5,R12(2004).PubMed PubMedCentral GoogleScholar 
Acknowledgements

We thank the following persons for feedback and support: R. Johnson, E. Karlsson, K. Lindblad Toh, W. Jun, I. Korf, W. Haerty, G. Etherington, B. Clavijo, and A. Komissarov for discussions in the early stages of the project; R. Fuller for help with the G10K website maintenance, and H. Segal for help with VGP website development; M. Linh Pham for help with initial grant writing; L. Shalmiyev for administrative help; D. Church, G. Kol, K. Baruch, O. Barad, I. Liachko, E. Muzychenko, S. Garg, and M. Kolmogorov for preliminary analyses performed on one or more genomes; K. Oliver, C. Corton and J. Skelton for data generation; E. Harry for technical support in scaff10x and Pretext; C. Mazzoni for coordinating students and training at Leibniz Institute for Zoo and Wildlife Research and Berlin Center for Genomics in Biodiversity Research; and M. Driller, C. Caswara, M. Vafadar, N. Hill, D. De Panis, A. Whibley, B. Maloney, C. Mitchell, G. Gallo, J. Gaige, K. Amoako-Boadu, M. Jose Gomez, M. Montero, D. Ratnikov, S. Brown, S. Zylka, S. Marcus, and T. Carrasco for completing training and testing the VGP pipeline by producing ordinal representative genome assemblies not described in this manuscript. We thank our company partners (listed below), NCBI, EBI, and Amazon AWS, including AWS for sponsoring sequence storage. J. Fekecs and D. Leja created the animal images, and J. Kim modified them to silhouettes. We thank them for their permission to publish.

A.R., S.K., B.P.W. and A.M.P. were supported by the Intramural Research Program of the NHGRI, NIH (1ZIAHG200398). A.R. was also supported by the Korea Health Technology R&D Project through KHIDI, funded by the Ministry of Health & Welfare, Republic of Korea (HI17C2098). S.A.M., I.B. and R.D. were supported by Wellcome Trust grant WT207492; W.C., M. Smith, Z.N., Y.S., J.C., S. Pelan, J.T., A.T., J.W. and Kerstin Howe by WT206194; L.H., F.M., Kevin Howe and P. Flicek by WT108749/Z/15/Z, WT218328/B/19/Z and the European Molecular Biology Laboratory. O.F. and E.D.J. were supported by Howard Hughes Medical Institute and Rockefeller University start-up funds for this project. J.D. and H.A.L. were supported by the Robert and Rosabel Osborne Endowment. M.U.-S. received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement (750747). F.T.-N., J. Hoffman, P. Masterson and K.C. were supported by the Intramural Research Program of the NLM, NIH. C.L., B.J.K., J. Kim and H.K. were supported by the Marine Biotechnology Program of KIMST, funded by the Ministry of Ocean and Fisheries, Republic of Korea (20180430). M.C. was supported by Sloan Research Fellowship (FG-2020-12932). S.C.V. was funded by a Max Planck Research Group award from the Max Planck Society, and a Human Frontiers Science Program (HFSP) Research grant (RGP0058/2016). T.M.L., W.E.J. and the Canada lynx genome were funded by the Maine Department of Inland Fisheries & Wildlife (F11AF01099), including when W.E.J. held a National Research Council Research Associateship Award at the Walter Reed Army Institute of Research (WRAIR). C.B. was supported by the NSF (1457541 and 1456612). D.B. was funded by The University of Queensland (HFSP-RGP0030/2015). D.I. was supported by Science Exchange Inc. (Palo Alto, CA). H.W.D. was supported by NSF grants (OPP-0132032 ICEFISH 2004 Cruise, PLR-1444167 and OPP-1955368) and the Marine Science Center at Northeastern University (416). G.J.P.N. and the thorny skate genome were funded by Lenfest Ocean Program (30884). M.P. was funded by the German Federal Ministry of Education and Research (01IS18026C). M. Malinsky was supported by an EMBO fellowship (ALTF456-2016). The following authors’ contributions were supported by the NIH: S. Selvaraj (R44HG008118); C.V.M., S.R.F., P.V.L. (R21DC014432/DC/NIDCD); K.D.M. (R01GM130691); H.C. (5U41HG002371-19); M.D. (U41HG007234); and B.P. (R01HG010485). D.G. was supported by the National Key Research and Development Program of China (2017YFC1201201, 2018YFC0910504 and 2017YFC0907503). F.O.A. was supported by Al-Gannas Qatari Society and The Cultural Village Foundation-Katara, Doha, State of Qatar and Monash University Malaysia. C.T. was supported by The Rockefeller University. M. Hiller was supported by the LOEWE-Centre for Translational Biodiversity Genomics (TBG) funded by the Hessen State Ministry of Higher Education, Research and the Arts (HMWK). H.C. was supported by the NHGRI (5U41HG002371-19). R.H.S.K. was funded by the Max Planck Society with computational resources at the bwUniCluster and BinAC funded by the Ministry of Science, Research and the Arts Baden-Württemberg and the Universities of the State of Baden-Württemberg, Germany (bwHPC-C5). B.V. was supported by the Biomedical Research Council of A*STAR, Singapore. T.M.-B. was funded by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (864203), MINECO/FEDER, UE (BFU2017-86471-P), Unidad de Excelencia María de Maeztu, AEI (CEX2018-000792-M), a Howard Hughes International Early Career award, Obra Social “La Caixa” and Secretaria d’Universitats i Recerca and CERCA Programme del Departament d’Economia i Coneixement de la Generalitat de Catalunya (GRC2017SGR880). E.C.T. was supported by the European Research Council (ERC-2012-StG311000) and an Irish Research Council Laureate Award. M.T.P.G. was supported by an ERC Consolidator Award 681396-Extinction Genomics, and a Danish National Research Foundation Center Grant (DNRF143). T.W. was supported by the NSF (1458652). J.M. Graves was supported by the Australian Research Council (CEO561477). E.W.M. was partially supported by the German Federal Ministry of Education and Research (01IS18026C).

Complementary sequencing support for the Anna’s hummingbird and several genomes was provided by Pacific Biosciences, Bionano Genomics, Dovetail Genomics, Arima Genomics, Phase Genomics, 10X Genomics, NRGene, Oxford Nanopore Technologies, Illumina, and DNAnexus. All other sequencing and assembly were conducted at the Rockefeller University, Sanger Institute, and Max Planck Institute Dresden genome labs. Part of this work used the computational resources of the NIH HPC Biowulf cluster (https://hpc.nih.gov). We acknowledge funding from the Wellcome Trust (108749/Z/15/Z) and the European Molecular Biology Laboratory. We thank Le Comité Scientifique Régional du Patrimoine Naturel and Direction de l’Environnement, de l’Aménagement et du Logement, Guyanne for research approvals and export permits.

Author information

Author notes

These authors contributed equally: Arang Rhie, Shane A. McCarthy, Olivier Fedrigo

These authors jointly supervised this work: Kerstin Howe, Eugene W. Myers, Richard Durbin, Adam M. Phillippy, Erich D. Jarvis

Affiliations

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA: Arang Rhie, Sergey Koren, Brian P. Walenz & Adam M. Phillippy
Department of Genetics, University of Cambridge, Cambridge, UK: Shane A. McCarthy, Iliana Bista, Dengfeng Guan & Richard Durbin
Wellcome Sanger Institute, Cambridge, UK: Shane A. McCarthy, William Chow, Iliana Bista, Michelle Smith, Milan Malinsky, Zemin Ning, Ying Sims, Joanna Collins, Sarah Pelan, James Torrance, Alan Tracey, Jonathan Wood, Kerstin Howe & Richard Durbin
Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA: Olivier Fedrigo, Giulio Formenti, Bettina Haase, Jacquelyn Mountcastle, Sadye Paez & Erich D. Jarvis
The Genome Center, University of California Davis, Davis, CA, USA: Joana Damas & Harris A. Lewin
Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA: Giulio Formenti, Gregory L. Gedman, Lindsey J. Cantin, Sadye Paez, Matthew T. Biegler, Constantina Theofanopoulou & Erich D. Jarvis
Leibniz Institute for Zoo and Wildlife Research, Department of Evolutionary Genetics, Berlin, Germany: Marcela Uliano-Silva
Berlin Center for Genomics in Biodiversity Research, Berlin, Germany: Marcela Uliano-Silva
DNAnexus Inc., Mountain View, CA, USA: Arkarachai Fungtammasan, Maria Simbirsky & Brett T. Hannigan
Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea: Juwan Kim, Chul Lee & Heebal Kim
Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea: Byung June Ko & Heebal Kim
University of Southern California, Los Angeles, CA, USA: Mark Chaisson & Robel E. Dagnew
National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA: Francoise Thibaud-Nissen, Jinna Hoffman, Patrick Masterson & Karen Clark
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK: Leanne Haggerty, Fergal Martin, Kevin Howe & Paul Flicek
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany: Sylke Winkler, Martin Pippel, Ekaterina Osipova & Eugene W. Myers
DRESDEN-concept Genome Center, Dresden, Germany: Sylke Winkler
Novogene, Durham, NC, USA: Jason Howard
Neurogenetics of Vocal Communication Group, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands: Sonja C. Vernes
Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands: Sonja C. Vernes
School of Biology, University of St Andrews, St Andrews, UK: Sonja C. Vernes
University of Massachusetts Cooperative Fish and Wildlife Research Unit, Amherst, MA, USA: Tanya M. Lama
School of Biological Science, The Environment Institute, University of Adelaide, Adelaide, South Australia, Australia: Frank Grutzner
Bond Life Sciences Center, University of Missouri, Columbia, MO, USA: Wesley C. Warren
Department of Biology, East Carolina University, Greenville, NC, USA: Christopher N. Balakrishnan
UQ Genomics, University of Queensland, Brisbane, Queensland, Australia: Dave Burt
Department of Biological Sciences, Clemson University, Clemson, SC, USA: Julia M. George
The Genetic Rescue Foundation, Wellington, New Zealand: David Iorns
Kākāpō Recovery, Department of Conservation, Invercargill, New Zealand: Andrew Digby & Daryl Eason
Department of Zoology, University of Otago, Dunedin, New Zealand: Bruce Robertson
University of Arizona Genetics Core, Tucson, AZ, USA: Taylor Edwards
Department of Life Sciences, Natural History Museum, London, UK: Mark Wilkinson
School of Natural Sciences, Bangor University, Gwynedd, UK: George Turner
Department of Biology, University of Konstanz, Konstanz, Germany: Axel Meyer, Andreas F. Kautt, Paolo Franchini & Robert H. S. Kraus
Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA: Andreas F. Kautt
Department of Marine and Environmental Sciences, Northeastern University Marine Science Center, Nahant, MA, USA: H. William Detrich III
Department of Biology, University of Antwerp, Antwerp, Belgium: Hannes Svardal
Naturalis Biodiversity Center, Leiden, The Netherlands: Hannes Svardal
Institute of Biology, Karl-Franzens University of Graz, Graz, Austria: Maximilian Wagner
Florida Museum of Natural History, University of Florida, Gainesville, FL, USA: Gavin J. P. Naylor
Center for Systems Biology, Dresden, Germany: Martin Pippel, Ekaterina Osipova & Eugene W. Myers
Zoological Institute, University of Basel, Basel, Switzerland: Milan Malinsky
Tag.bio, San Francisco, CA, USA: Mark Mooney
UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA: Trevor Pesout, Richard E. Green, Erik Garrison, Hiram Clawson, Mark Diekhans, Luis Nassar, Benedict Paten & David Haussler
San Diego Zoo Global, Escondido, CA, USA: Marlys Houck, Ann Misuraca & Oliver A. Ryder
Pacific Biosciences, Menlo Park, CA, USA: Sarah B. Kingan, Richard Hall, Zev Kronenberg, Ivan Sović, Christopher Dunn & Jonas Korlach
Digital BioLogic, Ivanić-Grad, Croatia: Ivan Sović
Bionano Genomics, San Diego, CA, USA: Alex Hastie & Joyce Lee
Arima Genomics, San Diego, CA, USA: Siddarth Selvaraj
Dovetail Genomics, Santa Cruz, CA, USA: Richard E. Green & Jay Ghurye
Independent Researcher, Santa Cruz, CA, USA: Nicholas H. Putnam
CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain: Ivo Gut
Universitat Pompeu Fabra, Barcelona, Spain: Ivo Gut
Department of Computer Science, University of Maryland College Park, College Park, MD, USA: Jay Ghurye
School of Computer Science and Technology, Center for Bioinformatics, Harbin Institute of Technology, Harbin, China: Dengfeng Guan
Department of Psychology, Institute for Mind and Biology, University of Chicago, Chicago, IL, USA: Sarah E. London
Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA: David F. Clayton
Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA: Claudio V. Mello, Samantha R. Friedrich & Peter V. Lovell
Max Planck Institute for the Physics of Complex Systems, Dresden, Germany: Ekaterina Osipova
Monash University Malaysia Genomics Facility, School of Science, Selangor Darul Ehsan, Malaysia: Farooq O. Al-Ajli
Tropical Medicine and Biology Multidisciplinary Platform, Monash University Malaysia, Selangor Darul Ehsan, Malaysia: Farooq O. Al-Ajli
Qatar Falcon Genome Project, Doha, Qatar: Farooq O. Al-Ajli
Department of Biosciences, University of Milan, Milan, Italy: Simona Secomandi
eGnome, Inc., Seoul, Republic of Korea: Heebal Kim & Woori Kwak
LOEWE Centre for Translational Biodiversity Genomics, Frankfurt, Germany: Michael Hiller
Senckenberg Research Institute, Frankfurt, Germany: Michael Hiller
Goethe-University, Faculty of Biosciences, Frankfurt, Germany: Michael Hiller
BGI-Shenzhen, Shenzhen, China: Yang Zhou
Department of Biology, Pennsylvania State University, University Park, PA, USA: Robert S. Harris & Kateryna D. Makova
Center for Medical Genomics, Pennsylvania State University, University Park, PA, USA: Kateryna D. Makova & Paul Medvedev
Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA: Kateryna D. Makova & Paul Medvedev
Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA: Paul Medvedev
Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA: Paul Medvedev
Hoonygen, Seoul, Korea: Woori Kwak
Department of Migration, Max Planck Institute of Animal Behavior, Radolfzell, Germany: Robert H. S. Kraus
Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia: Andrew J. Crawford
Center for Evolutionary Hologenomics, The GLOBE Institute, University of Copenhagen, Copenhagen, Denmark: M. Thomas P. Gilbert
University Museum, NTNU, Trondheim, Norway: M. Thomas P. Gilbert
China National Genebank, BGI-Shenzhen, Shenzhen, China: Guojie Zhang
Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark: Guojie Zhang
State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China: Guojie Zhang
Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China: Guojie Zhang
Institute of Molecular and Cell Biology, A*STAR, Biopolis, Singapore, Singapore: Byrappa Venkatesh
Centre for Biodiversity, Royal Ontario Museum, Toronto, Ontario, Canada: Robert W. Murphy
Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, Washington, DC, USA: Klaus-Peter Koepfli & Warren E. Johnson
Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA: Beth Shapiro & David Haussler
Howard Hughes Medical Institute, Chevy Chase, MD, USA: Beth Shapiro & Erich D. Jarvis
The Walter Reed Biosystematics Unit, Museum Support Center MRC-534, Smithsonian Institution, Suitland, MD, USA: Warren E. Johnson
Walter Reed Army Institute of Research, Silver Spring, MD, USA: Warren E. Johnson
Department of Biological Sciences, Earlham Institute, University of East Anglia, Norwich, UK: Federica Di Palma
Institute of Evolutionary Biology (UPF-CSIC), PRBB, Barcelona, Spain: Tomas Marques-Bonet
Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain: Tomas Marques-Bonet
Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain: Tomas Marques-Bonet
Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain: Tomas Marques-Bonet
School of Biology and Environmental Science, University College Dublin, Dublin, Ireland: Emma C. Teeling
Department of Computer Science, The University of Illinois at Urbana-Champaign, Urbana, IL, USA: Tandy Warnow
School of Life Science, La Trobe University, Melbourne, Victoria, Australia: Jennifer Marshall Graves
Department of Evolution, Behavior, and Ecology, University of California San Diego, La Jolla, CA, USA: Oliver A. Ryder
Laboratory of Genomics Diversity-Center for Computer Technologies, ITMO University, St. Petersburg, Russian Federation: Stephen J. O’Brien
Guy Harvey Oceanographic Center, Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Fort Lauderdale, FL, USA: Stephen J. O’Brien
Department of Evolution and Ecology, University of California Davis, Davis, CA, USA: Harris A. Lewin
John Muir Institute for the Environment, University of California Davis, Davis, CA, USA: Harris A. Lewin
Faculty of Computer Science, Technical University Dresden, Dresden, Germany: Eugene W. Myers
Contributions

Wrote the paper and co-coordinated the study: A.R., E.D.J., A.M.P., R.D., E.W.M., Kerstin Howe, S.A.M., O.F. Coordination with vendors: J. Korlach, S. Selvaraj, R.E.G., A.H., M. Mooney. Collected samples: M.T.P.G., W.E.J., R.W.M., G.Z., B.V., M.T.B., J. Howard, S.C.V., T.M.L., F.G., W.C.W., D.B., J.M. George, M.T.B., D.I., A.D., D.E., B.R., T.E., M. Wilkinson, G.T., A. Meyer, A.F.K., P. Franchini, H.W.D., H.S., M. Wagner, G.J.P.N., R.D., E.D.J., E.C.T., R.H.S.K. Generated genome data: O.F., I.B., M. Smith, B.H., J.M., S.W., C.B., A. Meyer, A.F.K., P. Franchini, I.G., D.F.C., C.V.M. Generated genome assemblies: A.R., S.A.M., S.K., M.P., S.B.K., R.H., J.G., Z.N., J.L., B.P.W., M. Malinsky. Generated/modified software: S.K., A.R., S.B.K., R.H., Z.K., J. Korlach, I.S., C.D., Z.N., A.H., J.L., J.G., E.G., C.V.M., S.R.F., N.H.P. Pipeline development: A.R., S.A.M., G.F., S.K., M.U.-S., A.F., M. Simbirsky, B.T.H., T.P., M.P., E.W.M., R.D., A.M.P. Generated MT assemblies: G.F., J. Korlach. Curation: Kerstin Howe, W.C., Y.S., J.C., S. Pelan, J.T., A.T., J.W., Y.Z., J.D., H.A.L. Sex chromosomes: Y.Z., R.S.H., K.D.M., P. Medvedev, J.M. Graves. Hummingbird karyotype analyses: M. Houck, A. Misuraca, M.P., E.W.M., E.D.J. Annotation: F.T.-N., L.H., J. Hoffman, P. Masterson, K.C., F.M., Kevin Howe, P. Flicek, D.B. Evaluation analysis: A.R., J.D., M.U.-S., J. Kim, C.L., B.J.K., M.C., G.L.G., L.J.C., F.T.-N., L.H., J.M. George, J.G., R.E.D., D.G., S.E.L., D.F.C., C.V.M., S.R.F., P.V.L., E.O., F.O.A.-A., S. Secomandi, C.T., M. Hiller, H.K., Kerstin Howe, E.W.M., R.D., A.M.P., E.D.J. Biological findings: J.D., J. Kim, C.L., B.J.K., G.L.G., L.J.C., H.A.L., A.R., E.D.J. Data availability: A.R., S.A.M., W.C., A.F., S. Paez, M. Simbirsky, B.T.H., B.P.W., W.K., H.C., M.D., L.N., B.P., A.M.P., E.D.J. G10K council, founders, and coordination of VGP: T.M.-B., A.J.C., F.D.P., R.D., M.T.P.G., E.D.J., K.-P.K., H.A.L., R.W.M., E.W.M., E.C.T., B.V., G.Z., A.M.P., S. Paez, J.M. Graves, O.A.R., D.H., S.J.O., T.W. and B.S. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Kerstin Howe, Eugene W. Myers, Richard Durbin, Adam M. Phillippy or Erich D. Jarvis.

Ethics declarations

Competing interests

During the contributing period, B.T.H., M. Simbirsky, A.F. and M. Mooney were employees of DNAnexus Inc. S.B.K., R.H., Z.K., J. Korlach, I.S. and C.D. were full-time employees at Pacific Biosciences, a company developing single-molecule long read sequencing technologies. R.E.G., N.H.P., and J.G. were affiliated with Dovetail Genomics, a company developing genome assembly tools, including Hi-C. I.G. was affiliated with Oxford Nanopore Technologies, a company generating long read sequencing technologies. A.H. and J.L. were employees of Bionano Genomics, a company developing optical maps for genome assembly. S. Selvaraj was an employee of Arima Genomics, a company developing Hi-C data for genome assemblies. R.D. is a scientific advisory board member of Dovetail Inc. P. Flicek is a member of the Scientific Advisory Boards of Fabric Genomics, Inc., and Eagle Genomics, Ltd. H.C. receives royalties from the sale of UCSC Genome Browser source code, LiftOver, GBiB, and GBiC licenses to commercial entities. S.K. has received travel funds to speak at symposia organized by Oxford Nanopore. M.D. and L.N. receive royalties from licensing of UCSC Genome Browser. For W.E.J., the content here is not to be construed as the views of the DA or DOD. All other authors declare no competi
nginterests. AdditionalinformationPeerreviewinformationNaturethanksMichaelSchatz,JustinZookandtheother,anonymous,reviewer(s)fortheircontributiontothepeerreviewofthiswork.Peerreviewerreportsareavailable.Publisher’snoteSpringerNatureremainsneutralwithregardtojurisdictionalclaimsinpublishedmapsandinstitutionalaffiliations.ExtendeddatafiguresandtablesExtendedDataFig.1AssessmentofcompletenessoftheAnna’shummingbirdassembly.a,b,StepsandNG50continuityvaluesoftheVGPassemblypipelinethatgavethehighestqualityassemblyforAnna’shummingbird(a)andCanadalynx(b)inthisstudy.ThespecificstepsareoutlinedfurtherinExtendedDataFig.2a,andMethods.c,Whole-genomealignmentofCLR(red),linkedreads(green),opticalmaps(blue),andHi-Creads(purple)oftheAnna’shummingbird,alongwithtelomeremotif(TTAGGGanditsreversecomplement,yellow)andgaps(grey)usingAssetsoftware103.Foreachdatatype,thefirstrowshowsthemappedcoverage,andthesecondshowsthenumberofcountsoflowcoverageorsignsofcollapsedrepeats.Largerchromosomalscaffolds(1–19)havefewergapsandlowcoverageorcollapsedregionscomparedwiththemicrochromosomes(20–33).Chromosomes14,15and19oftheAnna’shummingbirdwerethemoststructurallyreliablescaffolds,havingonlyonegapeachwithnolow-supportregions.Wedefinedreliableblocksasthosesupportedbyatleasttwotechnologies.Reliableblocksexcludedregionswithstructuralassemblyerrors,suchascollapsedrepeatsorunresolvedsegmentalduplications.Low-supportregionsarethosewherethereliableblocksrowhasapeak.ExtendedDataFig.2VGPassemblypipelineappliedacrossmultiplespecies.a,Iterativeassemblypipelineofsequencedatatypes(colouredasinb)withincreasingchromosomaldistance.Thinbars,sequencereads;thickblackbars,assembledcontigs;blackbarswithspaceandarcinglinks,scaffolds;greybars,gapsplacedbyprevioussteps;thickredborder,trackingofanexamplecontiginthepipeline.Thecurationstepshowsanexampleofamis-assemblybreakidentifiedbysequencecoverage(grey,left)andanexampleofaninversionerror(right)detectedbytheopticalmap.b,Intra-moleculelengthdistributionofthefourdatatypesusedtogene
ratetheassembliesof16vertebratespecies,weightedbythefractionofbasesineachlengthbin(logscaled).Moleculelengthabove1 kbwasmeasuredfromreadlengthforCLR,estimatedmoleculecoverageforlinkedreads,rawmoleculelengthforopticalmaps,andinteractiondistanceforHi-Creads.Foreachspecies,thefragmentlengthdistributionofeachdatatypewassimilartothosefortheAnna’shummingbird,withdifferencesprimarilyinfluencedbytissuetype,preservationmethod,andcollectionorstorageconditions(unpublisheddata).ExtendedDataFig.3Flowchartsofassemblypipelinesusedtogeneratehigh-qualityassembliesinthisstudy.a,StandardVGPassemblypipelinewhensequencingdataofoneindividual,thatgeneratedthehighestqualityassemblies:generateprimarypseudo-haplotypeandalternatehaplotypecontigswithCLRusingFALCON-Unzip17;generatescaffoldswithlinkedreadsusingScaff10x74;breakmis-joinsandfurtherscaffoldwithopticalmapsusingSolve87;generatechromosome-scalescaffoldswithHi-CreadsusingSalsa279;fillingapsandpolishbase-errorswithCLRusingArrow(PacificBioSciences);performtwoormoreroundsofshort-readpolishingwithlinkedreadsusingFreeBayes85;andperformexpertmanualcurationtocorrectpotentialassemblyerrorsusinggEVAL25,95b,StandardVGPtrioassemblypipelinewhenDNAisavailableforachildandparents20.Dashedlineindicatesthattheotherhaplotypewentthroughthesamestepsbeforecuration.Inadditiontothecuratedassembliesofbothhaplotypes,arepresentativehaplotypewithbothsexchromosomesissubmitted.c,Mitochondrialassemblypipeline.Figurekeyappliestoa–c.Stepsnewlyintroducedinv1.5–v1.6arehighlightedinlightblue.c,contigs;p,purgedfalseduplicationsfromprimarycontigs;q,purgedalternatecontigs;s,scaffolds;t,polishedscaffolds.Furtherdetailsandinstructionsareavailableelsewhere33andathttps://github.com/VGP/vgp-assembly.ExtendedDataFig.4Relationshipbetweencollapsesandgenomiccharacteristics.a,Correlationbetweenthetotalnumberofcollapsesandpercentagerepeatcontentestimatedinthesubmittedcuratedversionsofn=17genomesfrom16species.b,CorrelationbetweentotalnumberofbasesincollapsedregionsperGbandrepeatcontent
.c,CorrelationbetweentotalmissingbasescollapsedperGbandrepeatcontent.d,Correlationbetweentotalnumberofgenes(codingandnon-coding)inthecollapsedregionsandrepeatcontent.e,Lackofcorrelationbetweentheaveragecollapsedsizeandrepeatcontent.f,Lackofcorrelationbetweenthetotalnumberofcollapsesandpercentageheterozygosity.g,LackofcorrelationbetweenthetotalnumberofcollapsesperGbandgenomesize.Genomesize,heterozygosity,andrepeatcontentwereestimatedfrom31-mercountsusingGenomeScope71.Reportedareadjustedr2andPvaluesfromF-statistics.h,CumulativecollapsedbasesperGbineachcollapseandpercentagerepeatmasked.Eachcircleiscolouredbyspecieswithitssizerelativetothelengthofthe collapseasitappearsintheassembly.Collapsesabovethehorizontalbar(>90%)arefurtherclassifiedascollapsedhigh-copyrepeats,andthosebelowthehorizontalbarareclassifiedassegmentalduplications(low-copyrepeats).i,Majorrepeattypesincollapsedhigh-copyrepeats.MostoftherepeatsweremaskedonlywithWindowMasker75,withnoannotationavailablebyRepeatMasker104.j,Minorrepeattypesincollapsedrepeats.Thisisabreakdownoftherepeatscategorizedas‘Others’ini,owingtothesmallerscale.Barcoloursiniandjareasinh.Notesmallerscaleinjcomparedwithi.Collapsedsatellitearrayswerealmostexclusivelyfoundintheplatypus,comprising~2.5Mb.Collapsedsimplerepeatswerethemajorsourceinthethornyskate(~400kb).TherewasahigherproportionofLTRsinbirds,LINEsandSINEsinmammals,andDNArepeatsintheamphibian.Amongthegenesinthecollapses,manywererepetitiveshortnon-codingRNAs.PvaluesfromF-statistics.ExtendedDataFig.5Falseduplicationmechanismsingenomeassembly.a,Falseheterotype(haplotype)duplicationsoccurswhenmoredivergentsequencereadsfromeachhaplotypeA(blue)andB(red)(maternalandpaternal)formgreaterdivergentpathsintheassemblygraph(bubbles),whilenearlyidenticalhomozygoussequences(black)becomecollapsed.Whentheassemblygraphisproperlyformedandcorrectlyresolved(greenarrow),oneofthehaplotype-specificpaths(redorblue)ischosenforbuildinga‘primary’pseudo-haplotypeassemblyandtheotherissetapartasan‘alternate’asse
mbly.Whenthegraphisnotcorrectlyresolved(purplearrow),oneoffourtypesofpatternareformedinthecontigsandsubsequentscaffolds.Dependingonthesupportingevidence,thescaffoldereitherkeepsthesehaplotypecontigsonseparatescaffoldsorbringsthemtogetheronthesamescaffold,oftenseparatedbygaps:1.Separatecontigs:bothcontigsareretainedintheprimarycontigset,anerroroftenobservedwhenhaplotype-specificsequencesarehighlydiverged.2.Flankingcontigs:theassemblygraphispartiallyformed,connectingthehomozygoussequenceofthe5′sidetoonehaplotype(blue)andthe3′sidetotheotherhaplotype(red).3.Partialflankingcontigs:onlyonehaplotype(blue)flanksonesideofthehomozygoussequence.4.Failedconnectingofcontigs:allhaplotypesequencesfailtoproperlyconnecttoflankinghomozygoussequences.b,Falsehomotypeduplicationsoccurwhereasequencefromthesamegenomiclocusisduplicated,andareoftwotypes:1.Overlappingsequencesatcontigboundaries:incurrentoverlap-layout-consensusassemblers,branchingsequencesinassemblygraphsthatarenotselectedastheprimarypathhaveasmalloverlappingsequence(purple),dovetailingtotheprimarypathwhereitoriginatedabranch.Thesizeoftheduplicatedsequenceisoftenthelengthofacorrectedread.Subsequentscaffoldingresultsintandemduplicatedsequenceswithagapbetween.2.Under-collapsedsequences:sequencingerrorsinreads(redx)randomlyorsystematicallypileup,formingunder-collapsedsequences.Subsequentduplicationerrorsinthescaffoldingaresimilartotheheterotypeduplications.Purge_haplotigs13alignsequencestothemselvestofindasmallersequencethatalignsfullytoalargercontigorscaffold,andremovesheterotypeduplicationtypes1,3,and4.Purge_dups14additionallyusescoverageinformationtodetectheterotypeduplicationtype2andhomotypeduplications.Wedistinguishedthetwotypesofduplicationsby:1)haplotype-specificvariantsinreadsaligningathalfcoveragetoeachheterotypeduplication;2)differingconsensusqualitythatresultedfromreadcoveragefluctuationswhenaligningreadstohomotypeduplications;and3)k-mercopynumberanomaliesinwhichhomotypeduplicationswereobservedintheassemblywithmoreth
antheexpectednumberofcopies.ExtendedDataFig.6Falseduplicationexamplesfixedduringmanualcuration.a,Anexampleofaheterotypeduplicationinthefemalezebrafinch,non-trioassembly.Left,aself-dotplotofthisregiongeneratedwithGepard105,withsequencescolouredbyhaplotypes.Gaps,duplicatedsequences(greenandpurple),andhaplotype-specificmarkerdensitiesareindicatedatthetop.Right,adetailedalignmentviewofthegreenhaplotypeduplicationwithpaternalandmaternalmarkers,self-alignmentcomponents,transcriptsannotated,contigs,bionanomaps,andrepeatcomponentsdisplayedingEVAL95.b,Exampleofahomotypeduplicationfoundinthehummingbirdassembly.ThesewerecausedbyanalgorithmbuginFALCON,whichwaslaterfixed.c,Exampleofacombinedduplicationinvolvingbothheterotype(green)andhomotype(orange)duplications.Assemblygraphstructureisshownontheleftforclarity,highlightingtheoverlappingsitesatthecontigboundaryshadedfollowingtheduplicationtype.Assemblyerrorsincludingtheabovefalseduplicationsweredetectedandfixedduringthecurationprocess.ExtendedDataFig.7Evidenceofnear-completechromosomescaffoldsintheVGPassemblies.ShownareHi-Cinteractionheat mapsforeachspeciesaftercuration,visualizedwithPretextView106.Ascaffoldisconsideredaputativearm-to-armchromosomewhenallHi-Creadpairsinarowandcolumnmaptoasquare(thatis,anassembledchromosome)onthediagonalwithoutanyotherinteractionsoffthediagonal.Thosewithremainingoff-diagonalmatchestosmallerscaffoldsarenotlinkedbecauseofambiguousorderororientation,andareinsteadsubmittedas‘unlocalized’belongingtotherelevantchromosome.Bandsatthetopofeachheat 
mapshowscaffoldsidentifiedasX,Z(blue)orY,W(red)sexchromosomes.TheHi-CmapoffAstCal1isnotincludedaswehadnoremainingtissueleftoftheanimalusedtogenerateHi-Creads.ExtendedDataFig.8ComparisonofchromosomalorganizationbetweenpreviousandnewVGPassemblies.a,Zebrafinchmalecomparedtoapreviousreferenceassemblyofthesameanimal.b,Platypusmalecomparedwithapreviousreferencefemaleassembly(sotheYchromosomesareabsentinthepreviousreference).c,Hummingbirdfemalecomparedtoapreviousreferenceofthesameanimal.d,Climbingperchcomparedtoapreviousreference.EachrowrepresentsaVGP-generatedchromosomeforthetargetspecies.Coloursdepictidentitywiththereference(seekeytotheright);morethanonecolourindicatesreorganizationintheVGPassemblyrelativetothereference.Thelineswithineachblockdepictorientationrelativetothereference;apositiveslopeisthesameorientationasthereference,whereasanegativeslopeistheinverseorientation.Gapsarewhiteboxeswithnolines,inthereferencerelativetotheVGPassembly.AwhiteboxfortheentirechromosomemeansanewlyidentifiedchromosomeintheVGPassembly.Top20isthelongest20scaffoldsofthehummingbirdandclimbingperchassemblies.AccessionnumbersoftheassembliescomparedarelistedinSupplementaryTable19.ExtendedDataFig.9Haplotype-resolvedsexchromosomesandmitochondrialgenomes.a,Alignmentscatterplot,generatedwithMUMmerNUCmer107,visualizedwithdot108,ofmaternalandpaternalchromosomesfromthefemalezebrafinchtrio-basedassembly.Blue,sameorientation;red,inversion;orange,repeatsbetweenhaplotypes.ThepaternalZchromosomeishighlydivergentfromthematernalW,andthusmostlyunaligned.b,AlignmentscatterplotofassembledZandWchromosomesacrossthethreebirdspecies,approximatedwithMashMap286.Segmentsof300 kb(green),500 kb(blue),and1 
Mb(purple)areshadeddarkerwithhighersequenceidentity,withaminimumof85%.ThesmallersizeandhigherrepeatcontentoftheWchromosomeareclearlyvisible.c,XandYchromosomesegmentsofthemammals(platypus,Canadalynx,palespear-nosedbat,andgreaterhorseshoebat)showingahigherdensityofrepeatswithinthemammalianXchromosomethantheavianZchromosome.d,VGPkākāpōmitochondrialgenomeassemblyrevealspreviouslymissingrepetitivesequences(adding2,232 bp)intheoriginofreplicationregion,containingan83-bprepeatunit.e,VGPclimbingperchmitochondrialgenomeassemblyshowingaduplicationoftrnL2andpartialduplicationofNad1,whichwereabsentfromthepriorreference.Orangearrowsandredlines,tRNAgenesandtheiralignments;darkgreyarrowsandgreyshading,allothergenesandtheiralignments;black,non-codingregions;greenline,conventionalstartingpointofthecircularsequence.ExtendedDataFig.10Largehaplotypeinversionswithdirectevidenceinthezebrafinchtrioassembly.a,Twoinversions(greenandred)inchromosome5foundfromtheMUMmerNUCmer107alignmentofthematernalandpaternalhaplotypeassemblies,visualizedwithdot108.b,Hi-Cinteractionplotshowingthatthetrio-binnedHi-Cdataremovemostoftheinteractionsfromtheotherhaplotype(redarrows),whichcouldbeerroneouslyclassifiedasamis-assemblyifonlyonehaplotypewasusedasareference.c,An8.5-Mbinversionfoundonchromosome11andacomplicated8.1-Mbrearrangementonchromosome13betweenmaternalandpaternalhaplotypes.d,Nomis-assemblysignalsweredetectedfromthebinnedHi-Cinteractionplots,indicatingthatthehaplotype-specificinversionsarereal.e,HalfthePacBioCLRspanandBionanoopticalmapsagreewiththeinversionbreakpointsinchromosome11,supportingthehaplotype-specificinversion.ExtendedDataFig.11Polishingartefacts.a,AnexampleofunevenmappingcoverageintheprimaryandalternatesequencepairoftheAnna’shummingbirdassembly.Inthisexample,thealternate(alt)sequencewasbuiltathigherquality,attractingalllinked-readsforpolishing.Thematchinglocusintheprimary(pri)assemblywasleftunpolished,resultinginframeshifterrorsintheTLK1gene.b,Haplotype-specificmarkers(redformaternal,blu
eforpaternal)anderrormarkersfoundintheassemblyontheZchromosome(inheritedfromthepaternalside)ofthetrio-binnedfemalezebrafinchassembly.Eachrowshowsmarkersbeforeshort-readpolishing,mappingallreadstobothhaplotypeassemblies,andpolishingbymappingpaternallybinnedreadstothepaternalassembly.PolishingimprovesQV,butintroduceshaplotypeswitcherrorswhenusingreadsfrombothhaplotypesasshowninrow2.Thiscanbeavoidedwhenusinghaplotypebinnedreadsforpolishing.c,Exampleofover-polishing.Thenuclearmitochondria(NuMT)sequencewastransformedasafullmitochondria(MT)sequenceduringlong-readpolishingowingtotheabsenceoftheMTcontig,wheretheNuMTattractedalllongreadsfromtheMT.Incomparison,thetrio-binnedassemblyhadtheMTsequenceassembledinplace,preventingmis-placingofMTreadsduringreadmapping.ExtendedDataFig.12Chromosomeevolutionamongthebatspeciessequenced.a,Genessurroundinganinversioninthegreaterhorseshoebat,relativetohumanchromosome15(redhighlight).TheSTARD5geneisdirectlydisruptedbythisinversion,whichseparatesexons1–5fromexon6inthegreaterhorseshoebat.b,RNA-seqtracksshowingthelackofRNAsplicingevidenceofSTARD5transcriptsinthegreaterhorseshoebat(bottom)incomparisontothepalespear-nosedbatwheretheSTARD5geneisnotdisrupted(top).c,Circosplotsofchromosomeorganizationrelationshipsbetweentheeachoftheanalysedbatsandsegmentsofthehumanchromosomes1,2,6and10.Redstar,breakpointlocationinhumanchromosome6,depictingthefissionoftheboreoeutherianchromosome5inthebatancestor;bluestar,theregionupstreamofthebreakpointinthebats;greenstar,theregiondownstreamofthebreakpointinthebats.Theredstarredbreakpointwasconfirmedasreused,asopposedtoassemblyerrors,inchromosomalrearrangementsofthepalespear-nosedbat,Egyptianfruitbat,andgreaterhorseshoebat.Thereisnoevidenceofreuseforthevelvetyfree-tailedbat.Wecouldnotconfirmbreakpointreuseinthegreatermouse-earedbatorKuhl’spipistrelleatthechromosomalscalebecausetheywereonsmallscaffoldsthatmaynotbecompletelyassembled.ExtendedDataTable1SummarymetricsofthecuratedandsubmittedvertebratespeciesassembliesFu
llsizetableExtendedDataTable2AnnotationsummarystatisticsinpreviousandnewlyassembledVGPreferencegenomesFullsizetableSupplementaryinformation SupplementaryInformationThisfilecontainsSupplementarytext,SupplementaryNotes1-7,SupplementaryFigures1-6andSupplementaryreferences.ReportingSummarySupplementaryTablesThisfilecontainsSupplementaryTables1-23.PeerReviewFileRightsandpermissions OpenAccessThisarticleislicensedunderaCreativeCommonsAttribution4.0InternationalLicense,whichpermitsuse,sharing,adaptation,distributionandreproductioninanymediumorformat,aslongasyougiveappropriatecredittotheoriginalauthor(s)andthesource,providealinktotheCreativeCommonslicense,andindicateifchangesweremade.Theimagesorotherthirdpartymaterialinthisarticleareincludedinthearticle’sCreativeCommonslicense,unlessindicatedotherwiseinacreditlinetothematerial.Ifmaterialisnotincludedinthearticle’sCreativeCommonslicenseandyourintendeduseisnotpermittedbystatutoryregulationorexceedsthepermitteduse,youwillneedtoobtainpermissiondirectlyfromthecopyrightholder.Toviewacopyofthislicense,visithttp://creativecommons.org/licenses/by/4.0/. ReprintsandPermissionsAboutthisarticleCitethisarticleRhie,A.,McCarthy,S.A.,Fedrigo,O.etal.Towardscompleteanderror-freegenomeassembliesofallvertebratespecies. 
Nature 592, 737–746 (2021). https://doi.org/10.1038/s41586-021-03451-0

Received: 22 May 2020
Accepted: 12 March 2021
Published: 28 April 2021
Issue Date: 29 April 2021
DOI: https://doi.org/10.1038/s41586-021-03451-0