A Guide to Sabermetric Research: How to Find Raw Data


Back in the beginning days of sabermetrics, data was hard to come by. Some things weren’t too bad: if you wanted to know Bill Terry’s batting average in 1933, there were two encyclopedias, Macmillan and Neft/Cohen, that would tell you. But if you wanted more esoteric statistics, like Joe Morgan’s career performance with the bases loaded, you were out of luck.

When Bill James started writing his self-published Baseball Abstracts back in the late 1970s, he had to compile situational statistics himself, from the daily box scores, without a computer. At the time, Bill marketed his book as “featuring 18 categories of statistical information that you just can’t get anywhere else.”

James found that he had to keep compiling those stats even into the 1980s; famously, in his 1981 book, he reprinted a letter from the Chicago Cubs refusing to provide him with such “intelligence-type” stats.

Now, of course, things are different. There is no shortage of almost any kind of data. My four favorites, in rough order of increasing detail, are MLB.com, Baseball-Reference.com, the Lahman Database, and Retrosheet.org.

MLB’s website provides copious statistical data, sortable and printable, updated instantly as games progress. But that stuff can be found elsewhere. The main attraction of the MLB website is that it provides PITCHf/x data. That is, for every pitch thrown by any pitcher in MLB, they’ll tell you the type of pitch, where it crossed the plate, and how much it broke vertically and horizontally. As a result, and not surprisingly, much of the groundbreaking research these days has to do with pitch analysis.

Easily the best source for precalculated historical statistics is Baseball-Reference.com (B-R). That site has pretty much rendered printed baseball encyclopedias obsolete. Not only do you get the regular Bill-Terry’s-batting-average data, but you also get a large selection of sabermetric stats, breakdowns by dozens of different criteria (left/right, day/night, April/September, and so on), and the ability to manipulate the data in ways that other websites don’t allow. You can also do absurdly specific searches. Want to know Joe Morgan’s longest consecutive streak of games where he came to the plate at least twice? The answer: 235 games. (If you want the details, you have to subscribe, but the overwhelming majority of the information on the site can be had for free.)

For those of us who want to do more complicated things, Baseball Reference, awesome as it is, just isn’t enough. We need the raw data on our own computers, so we can manipulate it in ways that B-R never thought of. There are two main sources of raw data: the Lahman Database and Retrosheet.

The Lahman Database can be obtained for free at seanlahman.com/baseball-archive/statistics, the website of its creator, Sean Lahman. It’s basically a standard Baseball Encyclopedia in downloadable form. You can get it in text form, for loading into Excel, but, more importantly, it also comes in relational database format (Microsoft Access). If you’re familiar with Access and with SQL database queries, you know how convenient it is to use it to do powerful, specific data searches quickly. (If you’re not familiar with SQL, there have been a few tutorials on sabermetric sites recently.)

Anyway, the Lahman Database has every player’s standard batting and pitching line for every year. It’s got managers, birth dates, awards, all-star games, and other good stuff. Its limitation is that data is available only as season totals: if you want to know how Eddie Murray hit in July 1979, there’s no way the Lahman Database will tell you. For that, you have to turn to Retrosheet.

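To give a feel for the kind of SQL query the Lahman Database makes easy, here is a minimal sketch in Python. It assumes a table shaped like Lahman's Batting table (one row per player, season, and stint, with columns playerID, yearID, stint, AB, and H) and loads a few made-up rows into an in-memory SQLite database instead of the real download. The player IDs and numbers below are invented for illustration, and the function name is hypothetical.

```python
import sqlite3


def season_batting_averages(rows, min_ab=1):
    """Return {(playerID, yearID): batting average}, summed across stints.

    `rows` are (playerID, yearID, stint, AB, H) tuples in the shape of a
    Lahman-style Batting table. This is a sketch, not the official schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Batting ("
        "playerID TEXT, yearID INTEGER, stint INTEGER, AB INTEGER, H INTEGER)"
    )
    conn.executemany("INSERT INTO Batting VALUES (?, ?, ?, ?, ?)", rows)

    # A player traded in mid-season has several stints in one year,
    # so hits and at-bats are summed before dividing.
    query = """
        SELECT playerID, yearID, SUM(H), SUM(AB)
        FROM Batting
        GROUP BY playerID, yearID
        HAVING SUM(AB) >= ?
        ORDER BY playerID, yearID
    """
    averages = {}
    # max(min_ab, 1) keeps seasons with zero at-bats out of the division.
    for player_id, year_id, hits, at_bats in conn.execute(query, (max(min_ab, 1),)):
        averages[(player_id, year_id)] = round(hits / at_bats, 3)
    conn.close()
    return averages


# Made-up sample rows; these are NOT real players or real statistics.
SAMPLE_ROWS = [
    ("playera01", 1933, 1, 400, 120),
    ("playerb01", 1933, 1, 200, 50),
    ("playerb01", 1933, 2, 100, 31),
]

averages = season_batting_averages(SAMPLE_ROWS)
for key, avg in averages.items():
    print(key, avg)
```

The same SELECT statement should carry over to Access or any other SQL engine with little change. Note that nothing in a table like this can answer a question about a single month, which is the limitation described above.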
Retrosheet is, basically, a miracle. It’s the result of a small army of volunteers combing historical sources to try to re-create the play-by-play of every game in baseball history and digitizing it for download and analysis. I can’t begin to imagine how difficult it is to find all that information, to reconstruct the top of the 6th inning of the Cardinals/Phillies game of April 29, 1953. But they did. (D. Rice grounded out (shortstop to first); Presko popped to first in foul territory; Hemus popped to first in foul territory.)

You can also see the entire career of any player, game by game. You can see the standings and results from any date in baseball history. You can see a coach’s career, which teams he coached for and what he coached, and even how many times he was ejected.

You can see this stuff online, or, if you have computer data-manipulation skills, you can download it and work with it yourself. You can load the data into Excel and write macros to manipulate it. Or, you can write programs to analyze it; I use Visual Basic, but any language will do. There’s a 2006 book called Baseball Hacks (O’Reilly), which explains how to use a computer language called “R” to download and analyze Retrosheet data (and, actually, lots of other baseball data that can be found on the internet).

Not all of baseball history is available on Retrosheet yet. The volunteers are still working on it, though. (Want to help? The Retrosheet site has details.) For now, you can see game-by-game summaries from 1871 on. You can see box scores for more than 90 percent of games since 1916. And, if you want full play-by-play data, it’s available for any game after 1952, and a large number of games before that. Some years even include pitch-by-pitch data, in terms of ball, strike, foul.

The result of literally tens of thousands of hours of volunteer labor, Retrosheet is the greatest sabermetric resource ever.
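For readers who want to work with downloaded play-by-play data in a program, here is a minimal sketch in Python. It assumes the comma-separated layout of Retrosheet event files, where an "id" record starts each game and each "play" record carries the inning, a 0/1 flag for visiting or home team, the batter ID, the count, the pitch sequence, and an event code. The game ID, batter IDs, and plays below are made up to resemble a half-inning like the one described above; they are not copied from a real Retrosheet file, and the function names are hypothetical.

```python
import csv
from collections import namedtuple

Play = namedtuple("Play", "inning is_home batter balls_strikes pitches event")


def parse_plays(lines):
    """Yield (game_id, Play) for every 'play' record in the given lines."""
    game_id = None
    for fields in csv.reader(lines):
        if not fields:
            continue  # skip blank lines
        if fields[0] == "id" and len(fields) >= 2:
            game_id = fields[1]  # a new game starts here
        elif fields[0] == "play" and len(fields) >= 7:
            yield game_id, Play(
                inning=int(fields[1]),
                is_home=(fields[2] == "1"),  # 0 = visiting team batting
                batter=fields[3],
                balls_strikes=fields[4],
                pitches=fields[5],
                event=fields[6],
            )


def half_inning(lines, inning, is_home):
    """Return the plays from one half-inning, in file order."""
    return [
        play
        for _, play in parse_plays(lines)
        if play.inning == inning and play.is_home == is_home
    ]


# Illustrative records only; IDs and events are invented for this sketch.
SAMPLE_LINES = [
    "id,XXX195304290",
    "play,6,0,batter01,??,,63",   # ground out, shortstop to first
    "play,6,0,batter02,??,,3/P",  # pop-up caught by the first baseman
    "play,6,0,batter03,??,,3/P",
    "play,6,1,batter04,??,,K",    # bottom of the inning: strikeout
]

top_of_sixth = half_inning(SAMPLE_LINES, 6, False)
for play in top_of_sixth:
    print(play.batter, play.event)
```

To run this against a real download, the same functions could be given an open file object in place of SAMPLE_LINES. Decoding the event codes fully (base runners, errors, substitutions) takes considerably more work than this sketch shows.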


