DNA Sequencing Costs: Data - National Human Genome ...


For many years, the National Human Genome Research Institute (NHGRI) has tracked the costs associated with DNA sequencing performed at the sequencing centers funded by the Institute. This information has served as an important benchmark for assessing improvements in DNA sequencing technologies and for establishing the DNA sequencing capacity of the NHGRI Genome Sequencing Program. Here, NHGRI provides an analysis of these data, which gives one view of the remarkable improvements in DNA sequencing technologies and data-production pipelines in recent years.

Overview

The cost-accounting data presented here are summarized relative to two metrics: (1) "Cost per Megabase of DNA Sequence" - the cost of determining one megabase (Mb; a million bases) of DNA sequence of a specified quality [see below]; (2) "Cost per Genome" - the cost of sequencing a human-sized genome. For each, a graph is provided showing the data since 2001; in addition, the actual numbers reflected by the graphs are provided in a summary table.

NHGRI welcomes people to download these graphs and use them in their presentations and teaching materials. NHGRI plans to update these data on a regular basis. You can view the data in Excel by downloading the Sequencing Costs 2021 file.

To illustrate the nature of the reductions in DNA sequencing costs, each graph also shows hypothetical data reflecting Moore's Law, which describes a long-term trend in the computer hardware industry that involves the doubling of 'compute power' every two years (See: Moore's Law [wikipedia.org]). Technology improvements that 'keep up' with Moore's Law are widely regarded to be doing exceedingly well, making it useful for comparison.

In both graphs, note: (1) the use of a logarithmic scale on the Y axis; and (2) the sudden and profound out-pacing of Moore's Law beginning in January 2008. The latter represents the time when the sequencing centers transitioned from Sanger-based (dideoxy chain termination sequencing) to 'second generation' (or 'next-generation') DNA sequencing technologies. Additional details about these graphs are provided below.
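The Moore's Law comparison line can be approximated with a short calculation. The sketch below assumes that "keeping up" with Moore's Law means the cost halves every two years; the function name `moores_law_cost` and the starting cost of 1,000 are hypothetical illustrations, not NHGRI figures or NHGRI's actual method.

```python
def moores_law_cost(start_cost, years_elapsed, halving_period_years=2.0):
    """Return a hypothetical cost that halves once per halving period.

    Assumption: a doubling of 'compute power' every two years is treated
    as a halving of cost every two years.
    """
    return start_cost * 0.5 ** (years_elapsed / halving_period_years)


# Hypothetical starting cost (arbitrary units), NOT an NHGRI data point.
START_COST = 1000.0

for years in range(0, 11, 2):
    print(f"year +{years:2d}: {moores_law_cost(START_COST, years):10.2f}")
```

On a logarithmic Y axis such a curve appears as a straight line, which is why a cost series that falls faster than the baseline visibly bends away beneath it.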
These data, however, do not capture all of the costs associated with the NHGRI Large-Scale Genome Sequencing Program. The sequencing centers perform a number of additional activities whose costs are not appropriate to include when calculating costs for production-oriented DNA sequencing. In other words, NHGRI makes a distinction between 'production' activities and 'non-production' activities. Production activities are essential to the routine generation of large amounts of quality DNA sequence data that are made available in public databases; the costs associated with production DNA sequencing are summarized here and depicted on the two graphs. Additional information about the other activities performed by the sequencing centers is provided below.

Key Considerations

Cost Categories

The expenditures included in each category were established based on discussions between NHGRI staff and sequencing center personnel.

For the two graphs ("Cost per Megabase of DNA Sequence" and "Cost per Genome"), the following 'production' costs are accounted for:

- Labor, administration, management, utilities, reagents, and consumables
- Sequencing instruments and other large equipment (amortized over three years)
- Informatics activities directly related to sequence production (e.g., laboratory information management systems and initial data processing)
- Submission of data to a public database
- Indirect Costs as they relate to the above items

In the case of costs covered by significant subsidies to a sequencing center (e.g., a grantee institution providing funds for purchasing large equipment), NHGRI has attempted to appropriately account for such costs in these analyses.
The costs associated with the following 'non-production' activities are not reflected in the two graphs:

- Quality assessment/control for sequencing projects
- Technology development to improve sequencing pipelines
- Development of bioinformatics/computational tools to improve sequencing pipelines or to improve downstream sequence analysis
- Management of individual sequencing projects
- Informatics equipment
- Data analysis downstream of initial data processing (e.g., sequence assembly, sequence alignments, identifying variants, and interpretation of results)

DNA Sequencing Technologies

In both graphs, the data from 2001 through October 2007 represent the costs of generating DNA sequence using Sanger-based chemistries and capillary-based instruments ('first generation' sequencing platforms). Beginning in January 2008, the data represent the costs of generating DNA sequence using 'second-generation' (or 'next-generation') sequencing platforms. The change in instruments represents the rapid evolution of DNA sequencing technologies that has occurred in recent years.

Quality

For the Sanger-based sequence data, the cost accounting reflects the generation of bases with a minimum quality score of Phred 20 (or Q20), which represents an error probability of 1% and is an accepted community standard for a high-quality base. For sequence data generated with second-generation sequencing platforms, there is not yet a single accepted measure of accuracy; each manufacturer provides quality scores that are, at this time, accepted by the NHGRI sequencing centers as equivalent to or greater than Q20.

In the "Cost per Megabase of DNA Sequence" graph, the data reflect the cost of generating raw, unassembled sequence data; no adjustment was made for data generated using different instruments despite significant differences in the sequence read lengths. In contrast, the "Cost per Genome" graph does take these differences into account since sequence read length influences the ability to generate an assembled genome sequence.
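The link between a Phred score and an error probability follows from the standard Phred definition, Q = -10 * log10(P). The sketch below shows why Q20 corresponds to a 1% error probability; the function names are illustrative and are not taken from any particular sequencing toolkit.

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred quality score Q to a per-base error probability P."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert a per-base error probability P back to a Phred score Q."""
    return -10 * math.log10(p)


for q in (10, 20, 30):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:.4f} (about 1 error in {round(1 / p)} bases)")
```

Each increase of 10 in the score reduces the error probability tenfold, so Q20 means roughly one miscalled base in every hundred.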
Genome Coverage

The "Cost per Genome" graph was generated using the same underlying data as that used to generate the "Cost per Megabase of DNA Sequence" graph; the former thus reflects an estimate of the cost of sequencing a human-sized genome rather than the actual costs for specific genome-sequencing projects.

To calculate the cost for sequencing a genome, one needs to know the size of that genome and the required 'sequence coverage' (i.e., 'sequence redundancy') to generate a high-quality assembly of the genome given the specific sequencing platform being used. For generating the "Cost per Genome" graph, the assumed genome size was 3,000 Mb (i.e., the size of a human genome). The assumed sequence coverage needed differed among sequencing platforms, depending on the average sequence read length for that platform.

The following 'sequence coverage' values were used in calculating the cost per genome:

- Sanger-based sequencing (average read length = 500-600 bases): 6-fold coverage
- 454 sequencing (average read length = 300-400 bases): 10-fold coverage
- Illumina and SOLiD sequencing (average read length = 75-150 bases): 30-fold coverage

For data since January 2008 (representing data generated using 'second-generation' sequencing platforms), the "Cost per Genome" graph reflects projects involving the 're-sequencing' of the human genome, where a reference human genome sequence is available to serve as a backbone for downstream data analyses. The required 'sequence coverage' would be greater for sequencing genomes for which no reference genome sequence is available.
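The genome-cost estimate described in the Genome Coverage section can be sketched as a simple product: cost per megabase, times genome size in megabases, times fold coverage. This is one straightforward reading of the text, not NHGRI's published calculation. The genome size and coverage values come from the page, while the function name, the platform keys, and the $0.10 per Mb input are hypothetical.

```python
GENOME_SIZE_MB = 3000  # assumed human genome size in megabases, as stated on the page

# Fold coverage assumed for each platform, as listed on the page.
# The dictionary keys are illustrative labels, not official platform names.
COVERAGE_BY_PLATFORM = {
    "sanger": 6,
    "454": 10,
    "illumina_solid": 30,
}


def estimate_cost_per_genome(cost_per_mb, platform):
    """Estimate the cost of a human-sized genome from the raw cost per Mb.

    Assumption: total cost = cost per Mb * genome size * fold coverage.
    """
    coverage = COVERAGE_BY_PLATFORM[platform]
    return cost_per_mb * GENOME_SIZE_MB * coverage


# Hypothetical input: $0.10 per megabase (NOT an NHGRI data point).
for platform in COVERAGE_BY_PLATFORM:
    cost = estimate_cost_per_genome(0.10, platform)
    print(f"{platform}: ${cost:,.2f}")
```

Because short-read platforms need more coverage, a lower cost per megabase does not translate one-to-one into a lower cost per genome.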
References

Mardis E. A decade's perspective on DNA sequencing technology. Nature, 470:198-203. 2011.
Metzker M. Sequencing technologies - the next generation. Nature Reviews Genetics, 11:31-46. 2010.
Stein L. The case for cloud computing in genome informatics. Genome Biology, 11:207-213. 2010.
Human genome at ten: the sequence explosion. Nature, 464:670-671. 2010.
NHGRI Genome Sequencing Program

How to Cite this Web Page:

Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP). Available at: www.genome.gov/sequencingcostsdata. Accessed [date of access].

Contact

Kris A. Wetterstrand, M.S.
Scientific Liaison to the Director for Extramural Activities
Office of the Director ...

Last updated: November 1, 2021


