Tier-1 Operations

Transcription

Tier-1 Operations
DOE-NP Review of CMS Heavy Ion Computing
June 2nd, 2010, Vanderbilt University, Nashville TN
L.A.T. Bauerdick / Fermilab, U.S. CMS Software and Computing Manager

CMS Computing Model: Working Well in Battle Conditions!
- No change in the roles of sites as devised in the Computing Model.
- Tier-1 centers play a prominent role, as they share the custodial storage of raw and reconstructed data.
- The US Tier-1 at Fermilab is the largest in CMS and the only Tier-1 in the Americas.
- The network capabilities of the whole computing system are very important.
[Diagram: 1 Tier-0 at CERN (data recording, primary reconstruction, partial reprocessing, first archive copy of the raw data, cold); 7 Tier-1 centers (USA, UK, Italy, France, Germany, Spain, Taiwan), which share the raw and reconstructed data (FEVT) for custodial storage, each hold the full AOD, perform data reprocessing and analysis tasks (skimming), serve data to Tier-2 centers for analysis, and archive simulation from Tier-2; 25-50 Tier-2 centers for Monte Carlo production and as primary analysis facilities.]

Role of U.S. CMS Tier-1 at Fermilab for HI Computing
- U.S. to perform all Tier-1 functions for the CMS HI Program: custodial storage and serving of data, data processing, skimming etc.
- Separate the Tier-1 functions of storage (Fermilab) and processing (ACCRE).
- Modification of the pp Tier-1 role for CMS HI running: custodial storage of HI raw and reco data at the Fermilab Tier-1 in the large tape library; data processing at the Vanderbilt HI facility, streaming data over the WAN (a rough rate estimate follows below).
- This is technically quite feasible and cost effective: no need for a separate full-function HI Tier-1. Networking and data transfer operation, DM and WM services, interactions with WLCG etc. are all done the same as for pp through the Fermilab Tier-1; however, data processing is done on the farm at Vanderbilt, with data served from Fermilab.
- Fermilab Tier-1 functions: receives data from the CERN T0 and stores it on tape; serves data to ACCRE for data processing; receives processed data from ACCRE for tape storage.
- Given the estimated HI data sample sizes, this can be done at incremental cost.
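As a rough illustration of why streaming over the WAN is considered feasible, here is a back-of-envelope sketch (not from the slides) of the sustained rate needed to move an O(500 TB) HI raw-data sample from Fermilab to the Vanderbilt/ACCRE farm; the processing-window lengths are hypothetical assumptions.

```python
# Back-of-envelope sketch: sustained WAN rate needed to stream an O(500 TB)
# HI raw-data sample from the Fermilab Tier-1 to the Vanderbilt/ACCRE farm
# within an assumed processing window (window lengths are hypothetical).
SAMPLE_TB = 500                      # O(500 TB) HI sample, quoted later in the talk
for window_days in (30, 60, 90):     # assumed processing windows
    rate_gbps = SAMPLE_TB * 1e12 * 8 / (window_days * 86_400) / 1e9
    print(f"{SAMPLE_TB} TB over {window_days:3d} days -> {rate_gbps:.2f} Gb/s sustained")
# ~1.5 Gb/s for 30 days, ~0.5 Gb/s for 90 days -- small compared with the
# multi-Gb/s WAN links discussed elsewhere in the talk.
```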

CMS Tier-1 Computing Resources for 2010/11 Running

                   all CMS Tier-1s             FNAL pp
                   2009     2010     2011      in 2011
  CPU (kHS06)        46      100      151          61
  Disk (TB)       6,500   15,005   19,500       6,440
  Tape (TB)      11,900   25,924   52,400      21,000

- Significant data storage and processing facility for the Fermilab Tier-1 in 2011.
- The U.S. will pledge 40% of the total CMS pp Tier-1 needs (see the check below).
- US contributions to the CMS pp program: in FY10 the US Tier-1 can provide up to 94 kHS06 of CPU and up to 5 PB of disk.
- Fermilab is also a large analysis facility for US physics users: 2,850 CPU job slots and 1.5 PB of disk, with a 1 PB disk upgrade in FY11.
- For FY11 Fermilab plans for a tape capacity of 21,000 TB for pp data.
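A quick consistency check of the 40% pledge statement against the 2011 column; the values are taken from the table as reconstructed above, so treat the exact fractions as approximate.

```python
# Consistency sketch: FNAL 2011 figures vs. the "pledge 40% of total CMS pp
# Tier-1 needs" statement, using the (reconstructed) table values above.
all_t1_2011 = {"CPU (kHS06)": 151, "Disk (TB)": 19_500, "Tape (TB)": 52_400}
fnal_2011   = {"CPU (kHS06)": 61,  "Disk (TB)": 6_440,  "Tape (TB)": 21_000}

for key, total in all_t1_2011.items():
    print(f"{key:12s}: FNAL share {fnal_2011[key] / total:.0%}")
# CPU and tape come out near 40%; disk is closer to a third.
```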

Fermilab Facility

Large Tape Library: 30,000 Slots

Fermilab Computing Facility for U.S. CMS
- The Fermilab Facility has reached the goals of a four-year procurement ramp.
- The Tier-1 provides resources for CMS data processing and data serving.
- The LPC-CAF provides U.S. data analysis capabilities.
- Tape library of 3 SL8500 with 30,000 slots, and 25 LTO4 tape drives.
- LTO4 drives with 800 GB capacity per cartridge (90% fill factor).
- First LTO5 drives expected within the year, with 1600 GB/cartridge; transition of the library to LTO5 probably in 2011, maybe 2012.

  Fermilab Facility
  Processing nodes:           Tier-1: 7k cores; LPC: 2.8k cores
  Disk storage, Tier-1:       3.5 PB - dCache, high capacity, high throughput (1.6 GB/s)
  Disk storage, LPC:          1.5 PB - data sample for local analysis
  User disk (Tier-1 and LPC): 0.285 PB - user-generated data, nTuples etc. (135% allocated)
  Tape library:               30k tape slots - 3 SL8500, 25 LTO4 drives, 5.5 PB stored; LTO5 is late, filling up available libraries
  Network:                    15 Gb/s CERN to FNAL
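A minimal sketch of what the quoted slot count and cartridge capacities imply for total library capacity; applying the 90% fill factor to LTO5 as well is an assumption, since the slide only states it for LTO4.

```python
# Library capacity implied by the quoted numbers: 30,000 slots, LTO4 at
# 800 GB/cartridge (90% fill factor), LTO5 at 1600 GB/cartridge.
SLOTS = 30_000
FILL = 0.90                       # 90% fill factor (assumed for both generations)
for gen, cart_gb in (("LTO4", 800), ("LTO5", 1600)):
    usable_pb = SLOTS * cart_gb * FILL / 1e6    # GB -> PB (decimal units)
    print(f"{gen}: ~{usable_pb:.1f} PB if the whole library were {gen}")
# ~21.6 PB on LTO4, ~43.2 PB on LTO5 -- to be compared with the 5.5 PB
# currently stored and the multi-PB/year ingest quoted later in the talk.
```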

Operations Performance: Tier-1 Data Processing
- To maximize availability, CMS stores data on both sides of the Atlantic.
- The Fermilab Tier-1 for now takes all pp raw data; Tier-2s cover the analysis data sets.
- Data get to the U.S. rapidly: analyzable data are replicated out of CERN within 12 h and are on tape at the Tier-1; skimmed datasets go to Tier-2 centers and the LPC-CAF for analysis.
[Plot: Latency to get data to the US]
- Data re-processing at the Tier-1 is working reliably, with fast turn-arounds; the DataOps team is on top of this, supporting the many CMSSW releases, and many re-processings have been done "routinely".

Operations Performance: Data Transfers
[Plots: T0 -> T1 and T1 -> T2 transfer rates]
- The overall data rate is still not high on average, but peaks hit nominal rates.
- Operational availability is excellent.
- The rate needed to transfer the O(500 TB) of HI data should work out well (see the estimate below).
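To make the "should work out well" claim concrete, a small sketch using the CERN-to-FNAL rates quoted on the following slide (20 TB/day nominal, 50 TB/day tested); the 500 TB figure is the O(500 TB) HI sample size mentioned above, so this is only an order-of-magnitude illustration.

```python
# How long would O(500 TB) of HI data take at the CERN->FNAL rates quoted
# on the following slide?  Purely illustrative.
HI_SAMPLE_TB = 500
for label, tb_per_day in (("nominal", 20), ("tested", 50)):
    print(f"{label:7s} rate {tb_per_day} TB/day -> {HI_SAMPLE_TB / tb_per_day:.0f} days")
# ~25 days at the nominal rate, ~10 days at the tested rate.
```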

Data Transfer Performance
- In 2010Q1, 750 TB were delivered from Fermilab and 370 TB were absorbed.
- This did NOT include the "debug" transfers: 2,625 TB mostly from CERN to Fermilab, and 1,080 TB from Fermilab.
- A sustained transfer rate of 600 MB/s from CERN has been tested (nominal: 240 MB/s).
- This corresponds to 50 TB/day tested (20 TB/day nominal).
- We are not particularly worried about the HI data transfers required for Fermilab to function as a HI Tier-1.
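The TB/day figures follow directly from the sustained rates; a one-line conversion in decimal units reproduces them.

```python
# MB/s -> TB/day conversion behind the quoted figures (decimal units).
for mb_per_s in (240, 600):        # nominal and tested sustained rates
    tb_per_day = mb_per_s * 86_400 / 1e6
    print(f"{mb_per_s} MB/s ~= {tb_per_day:.0f} TB/day")
# 240 MB/s ~= 21 TB/day and 600 MB/s ~= 52 TB/day, matching the rounded
# 20 and 50 TB/day quoted on the slide.
```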

Transatlantic Connectivity
- Demands on transatlantic networking have been extensively tested at scale: first while commissioning the computing and software systems at scale, and since then during data taking of cosmics and of collision data at 0.9, 2.36 and 7 TeV.
- The ramp-up of data sizes is still slow; robustness has mostly been tested in challenges.
- So far demands reach "nominal" only in peaks, and no problems are seen in the peaks.
- End-to-end data transfers work well; network performance is adequate.
[Plot: Trans-Atlantic network throughput during STEP'09, the October Exercise (OctX) and LHC data tests; scale labels 20 Gbps and 1700 TB/day]

Performance & Availability of US Sites is Excellent
- US sites typically lead the CMS reliability/availability charts.
- The detailed site availability metric tests whether CMS can use a site.
- FNAL has been a consistent leader in the Tier-1 metrics since the start of WLCG recording, with excellent performance, e.g. compared to non-US sites.
- Also, accounting information is collected by OSG and reported to all stakeholders.
[Plot: Tier-1 readiness during Q1 2010 - the Fermilab Tier-1 achieved the 97% CMS availability goal in Q1 2010 despite problems]

Fermilab Storage System Performance
- Fermilab has a high performance hierarchical storage system: a large capacity of 30,000 cartridge slots, 25 tape drives with data servers, a dCache disk caching system with 10 GB/sec system throughput, 5 PB of disk caches etc.
- We continually monitor tape library performance, such as fill factors of tapes, and run operations like data squeezing and migration from LTO(n-1) to LTO(n).
- In 2010 we expect to digest 4,700 TB of data, and about 10,000 TB in 2011 (a rough drive-load check follows below).
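A rough check that the projected ingest is comfortable for the installed drive complement; the ~120 MB/s LTO4 native drive rate used here is an assumption, not a number from the slide.

```python
# Rough ingest check: average tape-write rate implied by the yearly volumes,
# versus the 25 installed drives.  The 120 MB/s LTO4 native drive rate is an
# assumption, not a figure from the slide.
LTO4_DRIVE_MB_S = 120
SECONDS_PER_YEAR = 365 * 86_400
for year, volume_tb in (("2010", 4_700), ("2011", 10_000)):
    avg_mb_s = volume_tb * 1e6 / SECONDS_PER_YEAR
    drives = avg_mb_s / LTO4_DRIVE_MB_S
    print(f"{year}: {volume_tb} TB/yr ~ {avg_mb_s:.0f} MB/s average "
          f"~ {drives:.1f} drives writing continuously (of 25)")
# Of order 1-3 drive-equivalents on average, leaving ample headroom for
# reads, migration and peak loads.
```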

Cost of HI Tape Archive
- We adopt a cost model that charges the "incremental cost" per tape slot, estimated at about $110 per tape slot.
- This includes the media cost, the incremental tape library cost, maintenance etc., but does not include personnel cost for storage management etc.
- In addition, HI should provide a tape drive that would always be available for HI data transfers/streaming (once the LTO5 drives become available).
- We assume that we can use LTO5 technology for HI data already in 2011.

  Category           year 1   year 2   year 3   year 4   year 5   Total
  Tape volume (PB)      0.6      1.0      0.5      1.4      1.4     4.9
  Cost                 $94K    $103K     $40K    $116K    $120K   $473K
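To illustrate the slot-count side of the cost model, a sketch converting the yearly tape volumes into LTO5 slots at the quoted ~$110 per slot. The 90% fill factor is carried over from the LTO4 description as an assumption, and the slide's cost column presumably folds in further items (for instance the dedicated HI drive), so the totals will not match exactly.

```python
# Illustrative slot/media-cost estimate from the yearly HI tape volumes,
# assuming LTO5 cartridges (1600 GB) with a 90% fill factor and the quoted
# ~$110 incremental cost per tape slot.  The slide's cost column includes
# further items (e.g. a dedicated HI tape drive), so it is higher in places.
import math

COST_PER_SLOT = 110                 # USD, quoted incremental cost per slot
USABLE_TB_PER_CART = 1.6 * 0.9      # LTO5 with assumed 90% fill factor
volumes_pb = [0.6, 1.0, 0.5, 1.4, 1.4]

total = 0
for year, pb in enumerate(volumes_pb, start=1):
    slots = math.ceil(pb * 1000 / USABLE_TB_PER_CART)
    cost = slots * COST_PER_SLOT
    total += cost
    print(f"year {year}: {pb:.1f} PB -> {slots:4d} slots -> ~${cost / 1000:.0f}K")
print(f"slot/media part of the 5-year total: ~${total / 1000:.0f}K (slide total: $473K)")
```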

Conclusions
- Fermilab is the U.S. CMS Tier-1 center and is well established as a reliable and high performance data center for tens of petabytes of data.
- Fermilab will provide custodial storage for all HI data; RAW and RECO (but not MC) are to be streamed to ACCRE for data processing through the WAN.
- U.S. CMS will provide sufficient storage capacity and throughput at incremental cost; the total cost over 5 years is expected to be around $473K.
- There is a possible issue with tape library capacity in 2012 (a risk).
- We expect the data sizes to be manageable, assuming conservative sample size estimates.
- An MoU between Fermilab, U.S. CMS and the HI program should outline the details of the agreement.

Backup Slides

[Backup: FNAL Tier-1 data-flow diagram; legible annotations:
 1. Data taking: T0 -> FNAL, 2.2 Gb/s
 2. Archiving/reading of FEVT, RECO, re-RECO, MCPROD and AOD: 490 MB/sec and 370 MB/sec
 3. AOD sync: exports 488 TB (3.3 Gb/s), imports 173 TB (1.2 Gb/s)
 4. MC PROD: 1,626 TB
 5. Daily export for data analysis: 51 TB
 Further labels: tape system 31%, users, FNAL 20,400 cores, 10 Gb/s, 0.43 Gb/s]
