DNA Sequencing Costs: Data - National Human Genome ...
For many years, the National Human Genome Research Institute (NHGRI) has tracked the costs associated with DNA sequencing performed at the sequencing centers funded by the Institute. This information has served as an important benchmark for assessing improvements in DNA sequencing technologies and for establishing the DNA sequencing capacity of the NHGRI Genome Sequencing Program. Here, NHGRI provides an analysis of these data, which gives one view of the remarkable improvements in DNA sequencing technologies and data-production pipelines in recent years.

Overview

The cost-accounting data presented here are summarized relative to two metrics: (1) "Cost per Megabase of DNA Sequence" - the cost of determining one megabase (Mb; a million bases) of DNA sequence of a specified quality [see below]; (2) "Cost per Genome" - the cost of sequencing a human-sized genome. For each, a graph is provided showing the data since 2001; in addition, the actual numbers reflected by the graphs are provided in a summary table.

NHGRI welcomes people to download these graphs and use them in their presentations and teaching materials. NHGRI plans to update these data on a regular basis. You can view the data in Excel by downloading the Sequencing Costs 2021 spreadsheet.

To illustrate the nature of the reductions in DNA sequencing costs, each graph also shows hypothetical data reflecting Moore's Law, which describes a long-term trend in the computer hardware industry that involves the doubling of 'compute power' every two years (See: Moore's Law [wikipedia.org]). Technology improvements that 'keep up' with Moore's Law are widely regarded to be doing exceedingly well, making it useful for comparison.

In both graphs, note: (1) the use of a logarithmic scale on the Y axis; and (2) the sudden and profound out-pacing of Moore's Law beginning in January 2008. The latter represents the time when the sequencing centers transitioned from Sanger-based (dideoxy chain termination sequencing) to 'second-generation' (or 'next-generation') DNA sequencing technologies. Additional details about these graphs are provided below.
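As a rough illustration of the Moore's Law comparison described above, the following sketch computes a hypothetical cost curve that halves every two years, which is one way to read 'doubling of compute power every two years' as a cost trend. The starting cost, the function name, and the halving interpretation are assumptions made for illustration; they are not NHGRI's data or method.

```python
import math


def moores_law_cost(start_cost, years_elapsed, doubling_period_years=2.0):
    """Hypothetical cost after `years_elapsed` years, assuming the cost
    halves once per doubling period (an illustrative reading of Moore's Law)."""
    return start_cost / (2.0 ** (years_elapsed / doubling_period_years))


# Placeholder starting value for illustration only, NOT an NHGRI figure.
start_cost = 1000.0

for years in (0, 2, 4, 10):
    cost = moores_law_cost(start_cost, years)
    # On a logarithmic Y axis this curve is a straight line, because
    # log10(cost) drops by the same amount in every doubling period.
    print(f"year +{years}: cost = {cost:.3f}, log10(cost) = {math.log10(cost):.3f}")
```

Plotted on a logarithmic Y axis, a curve like this is a straight line, so a data series that bends sharply below it is falling faster than the Moore's Law reference.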
These data, however, do not capture all of the costs associated with the NHGRI Large-Scale Genome Sequencing Program. The sequencing centers perform a number of additional activities whose costs are not appropriate to include when calculating costs for production-oriented DNA sequencing. In other words, NHGRI makes a distinction between 'production' activities and 'non-production' activities. Production activities are essential to the routine generation of large amounts of quality DNA sequence data that are made available in public databases; the costs associated with production DNA sequencing are summarized here and depicted on the two graphs. Additional information about the other activities performed by the sequencing centers is provided below.

Key Considerations

Cost Categories

The expenditures included in each category were established based on discussions between NHGRI staff and sequencing center personnel.

For the two graphs ("Cost per Megabase of DNA Sequence" and "Cost per Genome"), the following 'production' costs are accounted for:
- Labor, administration, management, utilities, reagents, and consumables
- Sequencing instruments and other large equipment (amortized over three years)
- Informatics activities directly related to sequence production (e.g., laboratory information management systems and initial data processing)
- Submission of data to a public database
- Indirect Costs as they relate to the above items

In the case of costs covered by significant subsidies to a sequencing center (e.g., a grantee institution providing funds for purchasing large equipment), NHGRI has attempted to appropriately account for such costs in these analyses.
The costs associated with the following 'non-production' activities are not reflected in the two graphs:
- Quality assessment/control for sequencing projects
- Technology development to improve sequencing pipelines
- Development of bioinformatics/computational tools to improve sequencing pipelines or to improve downstream sequence analysis
- Management of individual sequencing projects
- Informatics equipment
- Data analysis downstream of initial data processing (e.g., sequence assembly, sequence alignments, identifying variants, and interpretation of results)

DNA Sequencing Technologies

In both graphs, the data from 2001 through October 2007 represent the costs of generating DNA sequence using Sanger-based chemistries and capillary-based instruments ('first-generation' sequencing platforms). Beginning in January 2008, the data represent the costs of generating DNA sequence using 'second-generation' (or 'next-generation') sequencing platforms. The change in instruments represents the rapid evolution of DNA sequencing technologies that has occurred in recent years.

Quality

For the Sanger-based sequence data, the cost accounting reflects the generation of bases with a minimum quality score of Phred 20 (or Q20), which represents an error probability of 1% and is an accepted community standard for a high-quality base. For sequence data generated with second-generation sequencing platforms, there is not yet a single accepted measure of accuracy; each manufacturer provides quality scores that are, at this time, accepted by the NHGRI sequencing centers as equivalent to or greater than Q20.

In the "Cost per Megabase of DNA Sequence" graph, the data reflect the cost of generating raw, unassembled sequence data; no adjustment was made for data generated using different instruments despite significant differences in the sequence read lengths. In contrast, the "Cost per Genome" graph does take these differences into account since sequence read length influences the ability to generate an assembled genome sequence.
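The Q20 threshold mentioned above follows from the standard Phred definition, Q = -10 * log10(P), where P is the estimated probability that a base call is wrong. A minimal sketch of the conversion in both directions, with function names chosen here purely for illustration:

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def error_probability_to_phred(p):
    """Convert an error probability P (0 < P <= 1) to a Phred score Q = -10*log10(P)."""
    return -10.0 * math.log10(p)


# Q20 corresponds to an error probability of about 1 in 100 (1%).
print(phred_to_error_probability(20))

# An error probability of 1 in 1,000 corresponds to about Q30.
print(error_probability_to_phred(0.001))
```

Because the scale is logarithmic, every additional 10 points of quality score reduces the error probability by a factor of ten.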
Genome Coverage

The "Cost per Genome" graph was generated using the same underlying data as that used to generate the "Cost per Megabase of DNA Sequence" graph; the former thus reflects an estimate of the cost of sequencing a human-sized genome rather than the actual costs for specific genome-sequencing projects.

To calculate the cost for sequencing a genome, one needs to know the size of that genome and the required 'sequence coverage' (i.e., 'sequence redundancy') to generate a high-quality assembly of the genome given the specific sequencing platform being used. For generating the "Cost per Genome" graph, the assumed genome size was 3,000 Mb (i.e., the size of a human genome). The assumed sequence coverage needed differed among sequencing platforms, depending on the average sequence read length for that platform.

The following 'sequence coverage' values were used in calculating the cost per genome:
- Sanger-based sequencing (average read length = 500-600 bases): 6-fold coverage
- 454 sequencing (average read length = 300-400 bases): 10-fold coverage
- Illumina and SOLiD sequencing (average read length = 75-150 bases): 30-fold coverage

For data since January 2008 (representing data generated using 'second-generation' sequencing platforms), the "Cost per Genome" graph reflects projects involving the 're-sequencing' of the human genome, where a reference human genome sequence is available to serve as a backbone for downstream data analyses. The required 'sequence coverage' would be greater for sequencing genomes for which no reference genome sequence is available.
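One plausible reading of the calculation described above is: cost per genome = cost per Mb x genome size in Mb x fold coverage. The sketch below applies that reading using the 3,000 Mb genome size and the coverage values listed above. The formula itself, the cost-per-Mb input, and all names are assumptions for illustration, not NHGRI's published procedure.

```python
# Assumed genome size in megabases, as stated in the text.
GENOME_SIZE_MB = 3000

# Fold coverage per platform family, as listed in the text.
COVERAGE = {
    "sanger": 6,
    "454": 10,
    "illumina_solid": 30,
}


def cost_per_genome(cost_per_mb, platform):
    """Estimate the cost of a human-sized genome from a cost per megabase.

    Assumes total sequenced megabases = genome size x fold coverage.
    """
    return cost_per_mb * GENOME_SIZE_MB * COVERAGE[platform]


# Placeholder cost per Mb for illustration only, NOT an NHGRI figure.
example_cost_per_mb = 0.10

for platform in COVERAGE:
    estimate = cost_per_genome(example_cost_per_mb, platform)
    print(f"{platform}: {estimate:.2f}")
```

Under this reading, a shorter-read platform needs a lower cost per megabase to reach the same cost per genome, because it must sequence more total bases to reach its required coverage.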
How to Cite this Web Page:

Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP) Available at: www.genome.gov/sequencingcostsdata. Accessed [date of access].

Contact

Kris A. Wetterstrand, M.S.
Scientific Liaison to the Director for Extramural Activities
Office of the Director ...

Last updated: November 1, 2021