AI for Particle Physics: Better, Smarter, Faster

Transcription

AI for Particle Physics: Better, Smarter, Faster
Kevin Pedro, Associate Scientist
Scientific Computing Division / Particle Physics Division, Fermilab
May 6, 2020

Outline
- Particle Physics & AI: particle detectors; artificial intelligence
- Better: recent improvements in AI; physics use case: tagging; open questions
- Smarter: cutting-edge R&D; graph-based algorithms; preliminary results
- Faster: High Luminosity upgrade; accelerating inference; coprocessors as a service

Particle Physics & AI

Collider Physics
Large Hadron Collider
- Largest, highest-energy particle collider
  - Circumference 27 km (17 mi)
  - Center-of-mass energy 13 TeV
- High data rate requires multiple levels of triggers

Collider Physics
Large Hadron Collider
- Largest, highest-energy particle collider
  - Circumference 27 km (17 mi)
  - Center-of-mass energy 13 TeV
CMS
- Focus on AI results from the CMS experiment
  - Many items also applicable to ATLAS, neutrino physics, cosmology, etc.

CMS Detector
- “Hit”: energy deposit in a single channel
- Tracks: built from consistent hits in the tracker and muon chambers
- Clusters: built from nearby hits in the calorimeters
- Particles: built from linked tracks and clusters
- Jets: collimated sprays of particles

Upgrades
(Figure: HL-LHC project schedule)
- Increase in luminosity → more data!
  - Also more radiation
- Corresponding CMS detector upgrades, including:
  - Pixel (innermost tracker): 66M → 1947M channels
  - Outer tracker: 9.6M → 215M channels
  - High Granularity Calorimeter (endcaps): 85K → 6M channels

Neutrino Physics
Long Baseline Neutrino Facility
- Neutrinos interact very rarely and weakly
- Detectors need a large volume of material and long exposure
  - Try to reduce backgrounds (unwanted hits) from cosmic rays, etc.
- Neutrinos have mass and therefore oscillate between different flavors
  - Near and far detectors compare proportions
- Upcoming LBNF: most intense neutrino beam, 120 GeV

What is Artificial Intelligence?
“AI is whatever hasn’t been done yet.” – Douglas Hofstadter
- In this colloquium: machine learning (ML)
- ML is function approximation: map inputs to outputs, x → y
  - y = F(x) unknown, probably not analytic
  - Try to find an approximation y ≈ F′(x; w) by optimizing weights w
- Deep learning:
  - Use thousands, even millions of weights
  - Use many layers, with intermediate features derived from inputs
  - More “neurons” → more multiplications
(See “The Neural Network Zoo”)
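To make the function-approximation picture concrete, here is a minimal sketch (not from the talk) of an F′(x; w) in PyTorch: a small fully connected network whose weights w are the quantities to be optimized. The layer sizes are arbitrary choices for illustration.

```python
import torch

# F'(x; w): a small fully connected network mapping inputs x to outputs.
# The weights w are everything returned by model.parameters().
model = torch.nn.Sequential(
    torch.nn.Linear(4, 32),   # 4 input features -> 32 intermediate features
    torch.nn.ReLU(),
    torch.nn.Linear(32, 32),  # "deep": layers derive features from features
    torch.nn.ReLU(),
    torch.nn.Linear(32, 1),   # one output, e.g. a regression target
)
```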

Training an AI
- Iteratively modify weights so F′ gets “closer” to y (training data)
  - “Closer” defined by a loss function
  - Use gradient descent to follow the change in loss
- Keep separate datasets for testing & validation
  - Otherwise, the AI could be overtrained
- Training is very intensive: large datasets, billions (!) of multiplications
  - GPUs are optimized for these operations
- Inference: applying the trained AI to (new) input data to get output
  - Output: classification, regression, etc.
(Comput. Meth. Appl. M. 353 (2019) 201)
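A minimal training-loop sketch to go with the model above, assuming PyTorch and a fabricated stand-in for the unknown F(x); the loss function, gradient descent step, and held-out validation set mirror the bullets on this slide.

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(4, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 1))
x = torch.randn(1000, 4)
y = (x ** 2).sum(dim=1, keepdim=True)         # stand-in for the unknown F(x)
x_val = torch.randn(200, 4)                   # held out to detect overtraining
y_val = (x_val ** 2).sum(dim=1, keepdim=True)

loss_fn = torch.nn.MSELoss()                        # defines "closer"
opt = torch.optim.SGD(model.parameters(), lr=1e-3)  # gradient descent

for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)   # distance between F'(x; w) and y
    loss.backward()               # gradient of the loss w.r.t. weights w
    opt.step()                    # step the weights downhill
    with torch.no_grad():
        val_loss = loss_fn(model(x_val), y_val)  # should track the training loss
```

Inference is then just `model(new_x)`, with the trained weights frozen.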

AI at FNAL: A Long History
B. Denby, “Neural Network Tutorial for High Energy Physicists”, FERMILAB-Conf-90/94, May 1990

Better

AI Today
- Massive industry efforts in R&D for deep neural networks
  - Many frameworks: TensorFlow, PyTorch, MXNet, scikit-learn, etc.
- Giant leaps in image recognition, language processing, even game playing
  - Similar leaps in computational requirements
(Figures: AlphaGo Zero; GPT-2; arXiv:1605.07678)

Convolutional Neural Networks
- Image recognition started the modern AI revolution
- Innovation: convolutional neural networks (CNNs)
  - Combine neighboring pixels according to a matrix of weights
  - Same convolution applied to the whole image → reduces # of weights
  - Derive features at different scales: edges, corners, etc.
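A one-layer sketch of the weight-sharing idea (illustrative, not from the talk): the same 3×3 weight matrix is slid over the entire image, so the number of weights is set by the kernel, not the image size.

```python
import torch

# one convolution: a 3x3 weight matrix applied to every pixel neighborhood
conv = torch.nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)

image = torch.randn(1, 1, 28, 28)   # batch of one 28x28 grayscale image
features = conv(image)              # (1, 8, 28, 28): 8 derived feature maps
n_weights = sum(p.numel() for p in conv.parameters())
print(n_weights)                    # 80 (= 8*1*3*3 + 8), regardless of image size
```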

CNNs for Neutrinos
- Neutrino detector data is naturally image-like
- NOvA was the first particle physics experiment to publish† a result from a CNN
- ResNet50 can distinguish charged-current events from cosmic background
(Figure: event selected w/ 90% prob.)
†JINST 11 (2016) P09001; Phys. Rev. Lett. 118 (2017) 231801

Collider Physics Example: Tagging
Prototypical case: tagging top quarks
- Many models of physics beyond the standard model (SM) include new particles that can decay to top quarks
  - Heavy new particles → boosted top quarks (t → bW, W → qq); decay products merge into a single wide jet
  - Clear signature of new physics
  - But background events (e.g. SM QCD) have a much higher rate
- Traditionally identified using jet substructure:
  - x = N-subjettiness, groomed jet mass, etc. (“expert” variables)
  - y = top quark or QCD
  - F(x) = selection criteria
  - Example: τ32 < 0.6
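As a toy illustration of the “expert variable” approach, a cut-based selection in Python; only the τ32 < 0.6 cut is quoted on the slide, while the field names and the groomed-mass window here are hypothetical.

```python
def is_boosted_top(jet):
    """Toy cut-based top tagger: F(x) is just a pair of selection criteria."""
    tau32 = jet["tau3"] / jet["tau2"]   # N-subjettiness ratio
    # the mass window below is illustrative, not from the talk
    return tau32 < 0.6 and 105.0 < jet["groomed_mass"] < 220.0

jet = {"tau3": 0.15, "tau2": 0.40, "groomed_mass": 172.0}
print(is_boosted_top(jet))  # True: consistent with a merged top-quark jet
```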

AI for Tagging
- Can machine learning algorithms do a better job than experts? (arXiv:2004.08262)
- Usual progression (each step better than the last):
  - Combine expert variables in a boosted decision tree (BDT)
  - Combine expert variables in a deep neural network (DNN)
  - Use lower-level variables (reconstructed tracks, particles, etc.) in a DNN
  - Use more advanced neural network architectures

Different Approaches
ImageTop: build an “image” out of jet constituents
- Pros: leverage ubiquitous industry tools for image recognition, convolutional neural networks (CNNs)
- Cons: some information is lost (jets aren’t “really” 2D images)
DeepAK8: learn from particle, vertex variables directly
- Pros: keep more information
- Cons: 1D convolutions may not fully capture all relationships between quantities
(Figure: top quark jet image)

AI Enables Discovery
Higgs → γγ: rare process, but clear signature and clean background
- Boosted decision trees critical for Higgs discovery in 2012 (Phys. Lett. B 716 (2012) 30)
- Event-level classification enhances the resonance
Higgs → bb̄: most common decay, but huge background
- Dedicated DNN tags boosted Higgs with two secondary vertices
- Once thought impossible
  - Now 2.5σ evidence in 2020 (CMS-PAS-HIG-19-003)

Open Questions
- How to handle differences between data and simulation?
  - Gradient reversal (domain adaptation) very promising (arXiv:1409.7495)
  - Can also help avoid other unwanted behavior, e.g. mass dependence
- How to explain what the network learns?
  - Should probably be an entire academic field in itself
  - See e.g. “The Building Blocks of Interpretability” for CNNs
- Far from an exhaustive list (of algorithms or questions)
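Gradient reversal (arXiv:1409.7495) is compact enough to sketch. A minimal PyTorch version, assuming the usual two-headed setup where shared features feed both a physics classifier and a data-vs-simulation discriminator: the layer is the identity going forward but flips the gradient going backward, so training the discriminator pushes the shared features toward domain invariance.

```python
import torch

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lam going back."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # no gradient w.r.t. lam

# hypothetical usage: the domain head sees reversed gradients, so separating
# data from simulation degrades, rather than sharpens, domain-specific features
# domain_pred = domain_head(GradientReversal.apply(features, 1.0))
```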

Smarter

Cutting Edge Tagging

Tagger       AUC    Acc    1/εB*  # Params
P-CNN        0.980  0.930  759    348K
ResNet50†    0.983  0.935  1000   25M
ResNeXt      0.984  0.936  1147   1.46M
ParticleNet  0.986  0.940  1615   366K

†CSBS 3 (2019) 13; * at εS = 0.3; P-CNN = simplified DeepAK8

- Apply massive image-recognition networks (from industry) for significant gains (Phys. Rev. D 101 (2020) 056019; see also SciPost Phys. 7 (2019) 014)
  - But regular grids are unnatural for collider data: sparse occupancy, varying geometry, etc.
- ParticleNet does even better with far fewer parameters
  - But more operations: 3–4× ResNeXt

All Roads Lead to Graphs
- Generalize convolutions → message passing w/ graphs (nodes & edges) (arXiv:1801.07829)
  - Derive new features for node xᵢ using its neighbors xⱼ
  - Can even assign features to edges
- Aside: recurrent networks (RNNs) for language processing are now supplanted by “Transformers” that use “attention”
  - These are just graphs!
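A minimal sketch of one message-passing step, loosely following the EdgeConv construction of arXiv:1801.07829 (the function name and mean aggregation are illustrative choices):

```python
import torch

def message_pass(x, edge_index, mlp):
    """Derive new features for node i from its neighbors j.

    x: (N, F) node features; edge_index: (2, E) pairs (source j, destination i).
    """
    src, dst = edge_index
    msgs = mlp(torch.cat([x[dst], x[src] - x[dst]], dim=1))  # one message per edge
    out = torch.zeros(x.size(0), msgs.size(1))
    out.index_add_(0, dst, msgs)                  # sum messages arriving at each node
    counts = torch.bincount(dst, minlength=x.size(0)).clamp(min=1)
    return out / counts.unsqueeze(1)              # mean aggregation

# e.g.: new_x = message_pass(torch.randn(5, 8),
#                            torch.tensor([[1, 2, 3], [0, 0, 1]]),
#                            torch.nn.Linear(16, 32))
```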

Graph Networks (GNNs) for Physics
- ParticleNet (leading top tagger) uses a “point cloud” (also a GNN)
  - Also called “interaction networks”, “graph CNNs”, etc.
  - Same techniques applicable to many other tasks
- Most fundamental problems in event reconstruction: tracking, vertexing, clustering
  - How to associate detector hits with other detector hits
  - Detector geometry very important!

GNNs for Tracking
- First application to reconstruction: tracking (CTD2018)
- Start w/ possible connections between hits
- Edge classification: GNN decides which edges are correct
- Work in progress: 97% efficient (CTD2019)
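The edge-classification step itself can be sketched in a few lines (illustrative; a real tracking GNN first refines the hit features with message passing before scoring):

```python
import torch

def classify_edges(x, edge_index, edge_mlp, threshold=0.5):
    """Score candidate hit-to-hit connections and keep the plausible ones."""
    src, dst = edge_index
    scores = torch.sigmoid(edge_mlp(torch.cat([x[src], x[dst]], dim=1))).squeeze(1)
    keep = scores > threshold
    return edge_index[:, keep], scores
```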

GNNs for Clustering
- CMS upgrades (~2026) include the integrated endcap High Granularity Calorimeter
  - Hexagonal wafers increase silicon yield
  - Especially non-grid-like geometry
- Edge classification works well for clustering (NeurIPS (ML4PS) 2019)
- Charged pion: 90% efficiency to find correct edges (99% for photons and muons)
  - 98–99% correct energy assignment

Edge Determination
- Default edge assignment: use k-nearest neighbors or a similar algorithm (based on detector geometry)
- GravNet: edges determined dynamically (Eur. Phys. J. C 79 (2019) 608)
  - k-nearest neighbors using derived features
  - GNN optimizes a latent space to associate detector hits
- Open question: how to handle an unknown number of clusters?
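A sketch of the edge-building step (illustrative): the same k-nearest-neighbor routine yields a static graph when fed detector positions, and a dynamic, GravNet-style graph when fed learned latent coordinates and rerun at each layer.

```python
import torch

def knn_edges(coords, k=4):
    """Directed k-nearest-neighbor edges from pairwise distances."""
    d = torch.cdist(coords, coords)               # (N, N) distances
    d.fill_diagonal_(float("inf"))                # exclude self-loops
    nbrs = d.topk(k, largest=False).indices       # (N, k) neighbor indices
    dst = torch.arange(coords.size(0)).repeat_interleave(k)
    return torch.stack([nbrs.reshape(-1), dst])   # (2, N*k) edge_index

# static:  knn_edges(hit_positions)    -- fixed by detector geometry
# dynamic: knn_edges(latent_features)  -- rebuilt per layer from derived features
```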

Graphs for Calibration
- Calibration: another fundamental problem in physics
  - Raw measurements usually have some bias
- Dynamic reduction network: arXiv:2003.08013
- Preliminary results (hadron resolution) competitive w/ expert algorithms
  - Approach still being refined
  - Also studying industry benchmarks such as MNIST
(Figure: preliminary hadron resolution, trained on 10K events)

Faster

Computing for AI
- AI has significant impacts on physics:
  - Helps us do things we couldn’t do before, e.g. tag top and bottom quarks with unprecedented accuracy
  - Helps us do better at fundamental problems: tracking, clustering, calibration, etc.
- But can we afford to keep doing all of this?
  - HL-LHC is just around the corner
(Figures: 136 simultaneous proton-proton collisions (2018 data); HGCal simulation, 200 simultaneous pp collisions)

More Data, More Problems
- HL-LHC vital statistics:
  - 10× data vs. Run 2/3
  - 200 simultaneous collisions vs. ~30 in Run 2
  - Detector upgrades: 15–65× increase in channels
- More data and more complexity
  - DUNE, LSST, SKA will provide similarly huge datasets (DUNE 2026: ~30 PB)
- Data volumes will approach the scale of Google and Facebook
  - But computing resources won’t
(Figure: CMS Offline Computing Results)

CPU Stagnation
Worldwide LHC Computing Grid (WLCG) provides: 42 countries, 170 computing centers, 2 million tasks/day, 1 million CPU cores, 1 exabyte of storage
- Moore’s Law continues
  - But without Dennard scaling
- Single-thread performance can’t keep up with accelerator intensity
- Projected shortfalls of 2–10×, depending on assumptions

Heterogeneous Revolution
- New coprocessors provide efficiency at the expense of flexibility
  - GPU: execute serial instructions on massive data
  - FPGA: spatial computing (execute many instructions simultaneously)
- Luckily, optimized for machine learning!
(Figures: Nvidia GPU; Microsoft FPGA)

AI & Coprocessors: Two Great Tastes
- Just adding new tagging algorithms doesn’t speed up reconstruction
  - ResNet50 inference on CPU: ~1 sec/image
- Focus on replacing classical algorithms with AI
  - Fundamental problems (clustering, etc.) involve comparing all detector hits to all other detector hits
  - O(N²) operation, can be reduced to O(N log N) w/ clever techniques
  - AI inference is O(N) → much better scaling w/ detector occupancy
- Use coprocessors to accelerate AI inference
  - GPUs also useful for training, but training only uses a subset of data
  - Inference must be performed for every event (billions, at least)
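The scaling argument, in a small illustration (assuming NumPy/SciPy; the hit count and 0.05 radius are arbitrary): pairwise comparison is quadratic, a spatial index is one of the "clever techniques" that makes it O(N log N), and a trained network costs a fixed amount per hit.

```python
import numpy as np
from scipy.spatial import cKDTree

hits = np.random.rand(1000, 3)   # toy detector hits (x, y, z)

# O(N^2): compare every hit with every other hit
close_brute = {(i, j) for i in range(len(hits)) for j in range(i + 1, len(hits))
               if np.linalg.norm(hits[i] - hits[j]) < 0.05}

# O(N log N): a spatial index finds the same pairs
close_tree = cKDTree(hits).query_pairs(r=0.05)
assert close_brute == close_tree

# O(N): a trained network does a fixed amount of work per hit
# outputs = model(hits)   # one forward pass, cost linear in N
```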

AI for Triggers
- CMS L1 trigger uses FPGAs to satisfy extreme latency requirements (~1 μs)
- hls4ml: open-source package
  - Optimize ML algorithms to run efficiently on FPGAs
  - Handles BDTs, various DNN architectures
- Planned for use during LHC Run 3
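The hls4ml workflow, as a hedged sketch (API details vary between hls4ml versions; the model file and FPGA part number here are placeholders):

```python
import hls4ml
from tensorflow import keras

model = keras.models.load_model("my_tagger.h5")   # hypothetical trained model

# derive an hls4ml configuration from the Keras model, then convert it
config = hls4ml.utils.config_from_keras_model(model, granularity="model")
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir="hls_tagger",
    part="xcvu9p-flga2104-2-e",   # placeholder FPGA part
)
hls_model.compile()   # C simulation for bit-accurate validation
# hls_model.build()   # full HLS synthesis for the FPGA
```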

Inference as a Service
- Offline computing: looser latency requirements
- Multiple CPUs send inference requests to a coprocessor server
- Ensures optimal utilization of GPUs/FPGAs, along with flexibility
- One coprocessor could serve ~100 CPUs
  - Depending on conditions and requirements: latency, bandwidth, memory, inference time, etc.
  - Much more cost effective than buying 1 GPU for every CPU in the grid
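A client-side sketch of the request path, assuming an Nvidia Triton server reachable over gRPC (the server URL, model name, and tensor names are placeholders):

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="triton.example.org:8001")

batch = np.random.rand(10, 224, 224, 3).astype(np.float32)  # toy image batch
inp = grpcclient.InferInput("input", batch.shape, "FP32")
inp.set_data_from_numpy(batch)
out = grpcclient.InferRequestedOutput("probabilities")

result = client.infer(model_name="resnet50", inputs=[inp], outputs=[out])
scores = result.as_numpy("probabilities")  # predictions computed on the server
```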

LHC Computing Model
- Inference as a service naturally fits into the existing computing model
- Reconstruction process involves 100s of algorithms
  - Only a few worth accelerating
- Most efficient method: asynchronous, nonblocking calls
  - Enabled by task-based multithreading
  - CPU can do other work while the inference request is ongoing
  - Significantly reduces the impact of network latency
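The nonblocking variant, again as a sketch with the Triton gRPC client (same placeholders as before): the request is dispatched, the CPU keeps working, and a callback delivers the prediction whenever the server responds.

```python
import queue
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="triton.example.org:8001")
results = queue.Queue()

def on_done(result, error):
    # runs on a client worker thread when the server responds
    results.put(error if error else result.as_numpy("probabilities"))

batch = np.random.rand(10, 224, 224, 3).astype(np.float32)
inp = grpcclient.InferInput("input", batch.shape, "FP32")
inp.set_data_from_numpy(batch)

client.async_infer(model_name="resnet50", inputs=[inp], callback=on_done)
# ... CPU continues with other reconstruction work here ...
scores = results.get()   # collect the prediction once it arrives
```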

SONIC Approach
- SONIC (Services for Optimized Network Inference on Coprocessors): inference as a service in experiment software frameworks
- Use industry tools:
  - gRPC communication
  - TensorFlow or Nvidia Triton inference servers
  - Kubernetes for dynamic scaling of resources
- Interact with cloud services: Azure, AWS, GCP
- Avoid rewriting millions of lines of C++ algorithm code in specialized coprocessor languages
  - User code just converts input and output data into the desired formats

Cloud vs. Edge
- Cloud service has higher latency
- Local installation of coprocessors: “on-prem” or “edge”
  - Provides a test of ultimate performance
- Use gRPC protocol either way
(Figure: experiment software on a CPU farm sends network input to, and receives predictions from, a heterogeneous cloud or edge FPGA/GPU resource)

FPGA Results (CSBS 3 (2019) 13)
- Microsoft Brainwave FPGA, ResNet50 top tagger inference
- Latency: time for a single request to complete
  - ⟨CPU⟩ 500–1000 ms, ⟨remote⟩ 60 ms, ⟨on-prem⟩ 10 ms
- Throughput: requests per second
  - FPGA processes one image at a time, very quickly (1.8 ms)
  - GPU (GTX 1080) needs a batch of ~50 images to attain similar throughput

Scaling Up GPUs
- Use Kubernetes + Triton to deploy a multi-GPU server
  - More GPUs support more CPUs, higher throughput
- Triton supports dynamic batching: combine requests from multiple CPUs
  - Huge increase in throughput for large networks with small batch size
(Figures: Nvidia T4 GPU; ResNet50 inference, 1 event = 10 images; DeepCalo† inference, 1.8M parameters, batch 5, dynamic batch 250–500. †Spaatind 2020, GitLab)

Neutrino Challenges
DUNE: Deep Underground Neutrino Experiment
- Largest liquid argon detector ever designed
- 1M channels, 1 ms integration time w/ MHz sampling → 30 petabytes/year
  - Rate ultimately limited by available computing
- ProtoDUNE operating at CERN (5% the size of DUNE)

ν-SONIC
- ProtoDUNE reconstruction dominated by a single ML algorithm
- Offload to GPU as a service: ~2× overall improvement!
  - Simple implementation (EmTrackMichelId) w/ blocking, synchronous call
  - Latency preferable to CPU inference

Process                        Time [s] (all CPU)  Time [s] (w/ GPU)
Full event                     227                 99
ML algorithm (EmTrackMichelId) 142                 10

Conclusions
Better
- Major strides in deep learning have been incorporated in particle physics
- Significant improvements in top quark tagging (and other tasks)
- AI enables new avenues for discovery, such as boosted H → bb̄
- Many open questions remain
Smarter
- Moving beyond fully-connected and convolutional neural networks → generalize by embedding data in graphs
- Cutting-edge techniques can handle fundamental tasks: tracking, clustering, calibration
Faster
- Need to accomplish fundamentals and encourage new capabilities, while coping with unprecedented floods of data
- Solution: accelerate AI inference with coprocessors as a service
- Promising and achievable path for colliders, neutrinos, & beyond!

Backup

Jet Substructure

AI for b-tagging
- Similar progression to top tagging:
  - Expert variables
  - Expert variables combined in a BDT
  - Expert variables combined in a DNN
  - Low-level variables improve the DNN (DeepJet)
- Double-b-tagging benefits similarly
  - “Expert” corresponds to b-tagging subjets
(CMS-DP-2018-046; JINST 13 (2018) P05011; CMS-DP-2019-003)

Tagging New Particles
- Many new AI approaches being developed all the time
  - Access to tools, frameworks, computing constantly increasing
- Can even tag new particles (beyond the SM)
  - e.g. displaced jets from long-lived particles
- Gradient reversal employed to avoid data/simulation discrepancies (arXiv:1912.12238)
