SKA, DOME & ASTRON Project - µServer

Transcription

SKA, DOME & ASTRON project - µServer
Ronald P. Luijten – Data Motion Architect
lui@zurich.ibm.com
IBM Research - Zurich
16 July 2015
DISCLAIMER: This presentation is entirely Ronald’s view and not necessarily that of IBM.

COMPUTE is FREE – DATA is NOT

DOME: PPP of ASTRON, IBM and the Dutch government
20 MEur funding over 5 years
Started Feb 2012
Ronald P. Luijten – BDEC @ ISC15 - 16Jul15

SKA (Square Kilometre Array) to measure Big Bang
Timeline figure, event labels in order: Big Bang; Inflation; Protons created; Start of nucleosynthesis through fusion; End of nucleosynthesis; Modern Universe
Time marks in order: 0; 10^-32 s; 10^-6 s; 0.01 s; 3 min; 380’000 years; 13.8 billion years
Picture source: NZZ, March 2014

SKA data flow: CSP and SDP
10 Pb/s?
15 ExaByte/day (86’400 sec/day)
1 PB/day: 330 disks/day, 120’000 disks/yr
Top-500 Supercomputing (11/2013): 0.3 Watt/Gflop/s
Today’s industry focus is 1 Eflop @ 20 MW (2018) (0.02 Watt/Gflop/s)
Latest need for SKA: 4 Exaflop (SKA1-Mid)
1.2 GW; 80 MW
Most recent data from SKA:
– CSP: max. power 7.5 MW
– SDP: max. power 1 MW
Factor 80-1200
Too hard; Too easy (for us)
Moore’s law; multiple breakthroughs needed
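The power figures on this slide are consistent with simple arithmetic, sketched here in Python. The inputs are the slide's own numbers; the function and constant names are illustrative and not from the talk, and reading "Factor 80-1200" as the ratio to the 1 MW SDP limit is an assumption. It is a rough consistency check, not an SKA power model.

```python
# Consistency check of the slide's power figures; all names are illustrative.
GFLOPS_PER_EXAFLOP = 1e9  # 1 Exaflop/s = 1e9 Gflop/s

def power_watts(exaflops, watts_per_gflops):
    """Power needed to sustain `exaflops` at the given efficiency."""
    return exaflops * GFLOPS_PER_EXAFLOP * watts_per_gflops

SKA_NEED_EXAFLOPS = 4.0                           # slide: SKA1-Mid
TOP500_W_PER_GFLOPS = 0.3                         # slide: Top-500, 11/2013
TARGET_W_PER_GFLOPS = 20e6 / GFLOPS_PER_EXAFLOP   # slide: 1 Eflop @ 20 MW
SDP_MAX_POWER_W = 1e6                             # slide: SDP max. power 1 MW

power_2013 = power_watts(SKA_NEED_EXAFLOPS, TOP500_W_PER_GFLOPS)
power_target = power_watts(SKA_NEED_EXAFLOPS, TARGET_W_PER_GFLOPS)

# Assumption: the slide's "Factor" is the gap to the SDP power limit.
factor_low = power_target / SDP_MAX_POWER_W
factor_high = power_2013 / SDP_MAX_POWER_W

print(f"at 2013 Top-500 efficiency: {power_2013 / 1e9:.1f} GW")
print(f"at 1 Eflop @ 20 MW efficiency: {power_target / 1e6:.0f} MW")
print(f"gap to SDP limit: factor {factor_low:.0f} to {factor_high:.0f}")
```

Under these assumptions the two efficiencies give the slide's 1.2 GW and 80 MW, and dividing by the 1 MW SDP limit gives the 80 to 1200 range.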

IBM / ASTRON DOME project
DOME Project: 5 Years, 33M Euro
Technology roadmap development
Diagram labels:
– Sustainable (Green) Computing
– Nanophotonics
– System ion
– Algorithms & Machines
– Computing: Microservers, Accelerators
– Transport: Nanophotonics, Real Time Communications, Compressive Sampling
– User Platform
– Data & Streaming
– Storage: Access Patterns

DOME µServer Motivation & Objectives
Create the world's highest density 64 bit µ-server drawer
– Useful to evaluate both SKA radio-astronomy and IBM future business
– Platform for Business Analytics appliance pre-product research
– High energy efficiency / very low cost
– Commodity components, HW / SW standards based
– Leverage ‘free computing’ paradigm
– Enhance with ‘Value Add’: packaging, system integration
– Density and speed of light
Most efficient cooling using IBM technology (ref: SuperMUC June 2012 TOP500 machine)
Must be true 64 bit to enable business applications
Must run server class OS (SLES11 or RHEL6, or equivalent)
– Precluded ARM (64-bit silicon was not available)
– PPC64 is available in SoC from FSL since 2011
– (no to build a new SoC)
This is the DOME project capability demonstrator – not a product


Definition
µServer: The integration of an entire server node motherboard* into a single microchip, except DRAM, NOR boot flash and power conversion logic.
This does NOT imply low performance!
* no graphics
Dimensions shown in the figure: 139 mm x 55 mm; 245 mm; 305 mm

T4240 Chip Overview
– 12 core, fully dual threaded
– 1.8 GHz ppc64 (e6500)
– 12 DP-FPU; 12 128b Altivec
– 3 DDR3 channels at 1.86 GT/s
– 3x 0.5MB L3 cache
– 4x 10GbE
– 2x SATA
– PCIe 3.0
– HW packet acceleration
– RegEx pattern match acceleration
– Crypto acceleration
– 28nm TSMC Bulk CMOS
– 239 mm2, 1.7B transistors
– 111 Mbit SRAM, 6M FF
– 7 power states (2 power gating)

T4240 Chip Overview
This is NOT the ideal part
However, a very good one
Built for Embedded market
Impressive power management features
Not great for HPC:
– not enough DP-FP units
– No DDR prefetching

DOME compute node board diagram
Diagram labels:
– T4240
– 3x 16GB DRAM, 72 bit, 1866 MT/s each
– 1 Gbit SPI flash
– PSoC: I2C, Serial, JTAG, USB
– Power converter: 1V / 40A
– Board input: 12V / 2.5A
– 4 x PCIe x8; 2 x SATA; 10 GbE

DOME compute node board diagram
PSoC collapses 6 functions into a small chip to save Area, Power and Cost
1. On/Off and Power up sequencing
2. Provide µServer boot configuration
3. JTAG debug access
4. Serial port access (Linux)
5. Temperature monitoring and protection, and current measurement
6. Management interface and control
Diagram labels: T4240; DRAM; SPI flash; PSoC (I2C, Serial, JTAG, USB); Power converter; 12V / 2.5A; 4 x PCIe x8; 2 x SATA; 10 GbE

DOME Compute node board form factor
Photo labels: FRONT and BACK views; T4240 SoC (lid removed); P5020 SoC (lid removed); decoupling capacitors area
Boards shown: P5020/P5040 (Generation 1); T4240 (Generation 2); standard 240 pin DDR3 memory DIMM board
Dimensions shown: 30 mm; 55 mm; 133 mm; 139 mm

Planned System: 2U rack unit
19” 2U Chassis w/ Combined Cooling & Power
– 128 compute node boards
– 1536 cores / 3072 threads
– 6 TB DRAM
– 1.28 Tbps Ethernet (@ 40 Gbps)
Datacenter-in-a-box
Expected 2U unit total power: 6 kW
Integrated mains power converter to 12V distribution: 12V / 500A
Each compute node has own 12V / 40W converter
Common Power Converter boards for all other supplies
High radix 10GbE / 40GbE switch boards (under construction)
Connects to Mains, Rack level Water, 32x 40 Gbps Ethernet
Hot-water cooled for efficiency and density
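The headline numbers for the 2U unit can be derived from the per-node figures given elsewhere in the deck. The Python sketch below does that arithmetic. The constant names are illustrative, and 48 GB per node is an assumption taken from the three 16GB DRAM blocks on the board diagram.

```python
# Derive the 2U unit totals from per-node figures; names are illustrative.
NODES = 128
CORES_PER_NODE = 12          # T4240: 12 cores
THREADS_PER_CORE = 2         # T4240: fully dual threaded
DRAM_GB_PER_NODE = 3 * 16    # assumption: three 16GB DRAM channels per node
EXTERNAL_PORTS = 32          # 32x 40 Gbps Ethernet
PORT_GBPS = 40
UNIT_POWER_W = 6000          # expected 2U unit total power
BUS_VOLTAGE_V = 12           # 12V distribution

cores = NODES * CORES_PER_NODE
threads = cores * THREADS_PER_CORE
dram_tb = NODES * DRAM_GB_PER_NODE / 1024
ethernet_tbps = EXTERNAL_PORTS * PORT_GBPS / 1000
bus_current_a = UNIT_POWER_W / BUS_VOLTAGE_V

print(cores, threads, dram_tb, ethernet_tbps, bus_current_a)
```

With these inputs the totals match the slide: 1536 cores, 3072 threads, 6 TB DRAM, 1.28 Tbps external Ethernet, and 500 A on the 12V bus.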

Planned network for 128 nodes with 40G external links
Diagram: compute nodes (N) attached to switches, with 6 x 40G and 4 x 40G links between switches
32 external 40G ports using Ethernet switches
1280 Gbps external BW


Performance Measurement Results
CPU2006 Benchmark: Test Environment
Freescale T4240 (12 cores; 24 threads; 28nm Bulk):
– System: T4240RDB-PB
– 1.666 GHz core clock
– 1.866 GT/s 6GB DRAM, 3 channels
– Fedora 20, Kernel 3.12.19
– GCC 4.7.2; gcc options: -O3 -mcpu=powerpc64
Intel Xeon E3-1230L v3 (4 cores; 8 threads; 22nm FinFET):
– System: Supermicro X10SAE
– 1.8 GHz core clock; Turbo disabled
– 1.666 GT/s 8 GB DRAM, 2 channels
– Fedora 19, Kernel 3.13.9
– GCC 4.8.2; gcc options: -O3 -march=native -mtune=native
Results (T4240 vs. Xeon E3-1230L v3):
– CINT-base, 1 thread: 6.86 vs. 20.7
– CINT-base, all threads: 109.34 (24 threads) vs. 77.6 (8 threads)
– Coremark, all threads: 188K (24 threads) vs. 65K (8 threads)
40% more performance @ 70% of node level energy consumption
2x more operations per Watt
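The two summary claims can be related to the benchmark numbers with a few lines of Python. The slide does not say which benchmark the "40%" refers to; matching it to the all-threads CINT-base scores is an assumption, as are the variable names.

```python
# Relate the summary claims to the benchmark scores; names are illustrative.
cint_all_threads = {"T4240": 109.34, "Xeon E3-1230L v3": 77.6}
coremark_all_threads = {"T4240": 188_000, "Xeon E3-1230L v3": 65_000}

# Assumption: "40% more performance" refers to CINT-base, all threads.
cint_speedup = cint_all_threads["T4240"] / cint_all_threads["Xeon E3-1230L v3"]
coremark_speedup = (coremark_all_threads["T4240"]
                    / coremark_all_threads["Xeon E3-1230L v3"])

# Slide: the T4240 node uses 70% of the Xeon node's energy.
ENERGY_RATIO = 0.70
ops_per_watt_gain = cint_speedup / ENERGY_RATIO

print(f"CINT-base speedup: {cint_speedup:.2f}x")
print(f"Coremark speedup: {coremark_speedup:.2f}x")
print(f"operations per Watt gain: {ops_per_watt_gain:.2f}x")
```

On this reading, about 1.4x the throughput at 0.7x the energy gives roughly 2x operations per Watt, which is the slide's second claim.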


Comparison
Photo labels: P8 memory DIMM?; DOME compute node

Power Measurement Results
Power measurement on rev 1 board #5, on 7-8 April 2015; PSoC firmware 2-Mar-15
Current measurements at 12V input of power converters, T4240 temp 65C
Table layout: one row per condition; per voltage domain, the current measured @ 12V input and the corresponding power
Conditions, in row order:
1. PSoC only power
2. T4240 power on, kept in reset
3. u-boot prompt (idle)
4. Linux prompt, idle system
5. BW MEM 512M, 24 thr
6. stream, 24 thread
7. BW MEM 512, 24 thr
8. idle at XFCE desktop
9. SpecInt PerlBench, 24 thr
10. SpecInt PerlBench, 12 thr
11. SpecInt gcc, 12 thr
1V8 I/O (mA), in row order: 3.4; 75; 77.6; 77.6; 77.3; 77.3; 77.7; 77.7; 77.8; 78; 78
Values under the "1V0 core" headers (W, A, W), in extraction order: 0.888, 0.0008, 0.0096; 1.824, 0.32, 3.84; 4.2, 1.48, 17.76; 3.78, 1.58, 18.96; 5.4, 1.65, 19.8; 5.64, 1.65, 19.8; 3.84, 2.53, 30.36; 3.84, 1.6, 19.2; 4.8, 2.63, 31.56; 4.26, 2.2, 26.4
Remaining values without a recoverable column: 0355416; 4.992; 6.3676; 35.1324; 23.9724; 37.2936; 31.596; 26.328
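Because all currents are measured at the 12V input of the power converters, each power value should equal 12 V times the measured current. The Python sketch below checks this for the (A, W) pairs that survive in the flattened table. Pairing the values this way is an interpretation of the extracted numbers, not something the slide states, and the names are illustrative.

```python
# Check P = V * I for the (A, W) pairs recovered from the slide.
# The pairing is an interpretation of the flattened table; names are illustrative.
INPUT_VOLTAGE_V = 12.0

core_current_and_power = [
    (0.0008, 0.0096),
    (0.32, 3.84),
    (1.48, 17.76),
    (1.58, 18.96),
    (1.65, 19.8),
    (1.65, 19.8),
    (2.53, 30.36),
    (1.6, 19.2),
    (2.63, 31.56),
    (2.2, 26.4),
]

def power_from_input_current(current_a, voltage_v=INPUT_VOLTAGE_V):
    """Power drawn by a converter, from the current at its 12V input."""
    return current_a * voltage_v

mismatches = [
    (amps, watts)
    for amps, watts in core_current_and_power
    if abs(power_from_input_current(amps) - watts) > 1e-6
]
print("pairs checked:", len(core_current_and_power))
print("mismatches:", mismatches)
```

An empty mismatch list supports reading the second W column as 12 V times the A column.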

Remarks
New Big-Data Metric: Memory BW density
– use raw memory BW available at SoC or CPU
– divide by volume of entire enclosure, incl. HDD, PCI slots
– DOME 128 node 2U rack unit: 159 GB/s/Liter (peak)
– P8 server S822L (dual socket): 13.9 GB/s/Liter (peak)
New era – perfect storm and Innovator's Dilemma
µServer is all about SoC and packaging
This is a serendipitous data point
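The metric is a plain ratio, so it can be sketched in a few lines of Python. The peak bandwidth per node below is an assumption: 3 DDR3 channels at 1.866 GT/s with 8 data bytes per transfer (the 64 data bits of each 72 bit channel). The enclosure volume is not given on the slides, so the sketch only backs out the volume implied by the quoted 159 GB/s/Liter. All names are illustrative.

```python
# Memory BW density = raw memory bandwidth / enclosure volume.
# Bandwidth inputs are assumptions based on the deck's DDR3 figures.

def memory_bw_density(bw_gb_per_s, volume_liters):
    """Raw memory bandwidth per liter of enclosure volume (GB/s/L)."""
    return bw_gb_per_s / volume_liters

CHANNELS_PER_NODE = 3
TRANSFER_RATE_GT_S = 1.866
BYTES_PER_TRANSFER = 8      # assumption: 64 data bits per 72 bit channel
NODES = 128
QUOTED_DENSITY = 159.0      # slide: GB/s/Liter (peak)

node_bw = CHANNELS_PER_NODE * TRANSFER_RATE_GT_S * BYTES_PER_TRANSFER
unit_bw = NODES * node_bw

# Volume implied by the quoted density; a derived value, not a slide figure.
implied_volume_l = unit_bw / QUOTED_DENSITY

print(f"peak BW per node: {node_bw:.1f} GB/s")
print(f"peak BW per 2U unit: {unit_bw:.0f} GB/s")
print(f"implied enclosure volume: {implied_volume_l:.1f} L")
```

Under these assumptions the quoted density corresponds to an enclosure of roughly 36 liters, which is a plausible size for a 19” 2U chassis.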

LIVE DEMO
We demonstrate a single node running:
– T4240
– Fedora 20
– XFCE Desktop
– Stream
– CPMD
– And live 1V domain current measurement
Photo labels: compute node; mini BaseBoard
Showing a revision-1 board T4240ZMS compute server:
– Larger than DOME form factor, same netlist
– All components on top side (saves bring-up time and expense)
– Air-cooled for single node operation

DEMO SETUP
T4240ZMS node:
– Revision 1 board
– slower 4240
– 24 HW threads
– 1625 MHz
Diagram labels: DRAM 16GB, 1500MT; Serial; JTAG; Power converter; DIMM connector; USB; SATA; mSATA; 1GbE 88E1111 PHY; Single node carrier board

Status and Plans
Until YE 2015
2016: a new compute node
Beyond 2016: H2020 proposals

1. Please provide a brief overview of the activities at your Institution that address the technical challenges in hardware and software architecture. These efforts can be in traditional scientific HPC, or in the area of "Big Data" and Data Analytics. You have an opportunity to highlight unique perspectives you can bring to the workshop as representatives of the broader International community.
– Analytics, HPC (alg; codes; arch), Accelerators, Security

2. A key goal of the BDEC workshop is to systematically map the opportunities for Big Data synergy with Extreme-Scale HPC. In recent decades, the HPC community has used HPC systems that were created from the integration of commodity computing components that were largely designed and developed for the much larger desktop and server markets. Moving forward, in an analogous manner it is very likely that future HPC systems will be created from the integration of commodity computing components that were originally designed for the much larger Big Data markets. Do you agree with this statement and from your perspective are there other synergies that can be leveraged?
– µServer is using embedded market commodity SoC: an example of other leveraged synergy

3. What are your priorities for international cooperation in designing and developing hardware and software architectures for both Big Data and Extreme-scale Computing? From the perspective of your Institution, do you have examples of successful cooperation or collaboration? Examples can be cited as workshops you hosted, successful open source technology collaborations, visiting researcher positions, joint papers, etc.
– Successful collaboration with ASTRON, influencing SKA. Great collaboration with FSL.
– DOME USER PLATFORM
– Have developed a low cost, high performance data aggregator to feed IoT data into HPC: opportunity here!

4. In what areas could you benefit from contributions provided by other institutions including industry vendors, academia, and government organizations? Whether open source or proprietary, what would you seek in the way of hardware and software components and tools, experimental results and findings, or driver computational challenges from the world-wide HPC and big data community to further your own goals in these emerging cooperative fields?
– The insights in this project (it’s the system design, stupid) tell us what SoC our community should build
– Looking for 100M to build better SoC – I have ideas.

Links
SKA: http://www.skatelescope.org
DOME: http://www.dome-exascale.nl
µServer: http://www.zurich.ibm.com/microserver
T4240 system: http://swissdutch.ch:6999
Wikipedia: https://en.wikipedia.org/wiki/Microserver
Twitter: https://twitter.com/ronaldgadget
Videos:
– Impossible µServer: http://t.co/4vEkEVEazO
– Innovators Dilemma: http://youtu.be/imweQe8NgnI
– DOME T4240 Fedora: http://youtu.be/D6da5DqcyQk

Literature
“Energy-Efficient Microserver Based on a 12-Core 1.8GHz 188K-CoreMark 28nm Bulk CMOS 64b SoC for Big-Data Applications with 159GB/s/L Memory Bandwidth System Density”, R. Luijten et al., ISSCC15, San Francisco, Feb 2015
“The DOME embedded 64 bit microserver demonstrator”, R. Luijten and A. Doering, ICICDT 2013, Pavia, Italy, May 2013
“Quantitative Analysis of the Berkeley Dwarfs' Parallelism and Data Movement Properties”, Victoria Caparros Cabezas, Phillip Stanley-Marbell, ACM CF 2011, May 2011
“Performance, Power, and Thermal Analysis of Low-Power Processors for Scale Out Systems”, Phillip Stanley-Marbell, Victoria Caparros Cabezas, IEEE HPPAC 2011, May 2011
“Pinned to the Walls—Impact of Packaging and Application Properties on the Memory and Power Walls”, Phillip Stanley-Marbell, Victoria Caparros Cabezas, Ronald P. Luijten, IEEE ISLPED 2011, Aug 2011

Acknowledgements
This work is the result of many people:
– Peter v. Ackeren, FSL
– Ed Swarthout, FSL Austin
– Dac Pham, FSL Austin
– Yvonne Chan, IBM Toronto
– Andreas Doering, IBM ZRL
– Alessandro Curioni, IBM ZRL
– Stephan Paredes, IBM ZRL
– Matteo Cossale, IBM ZRL
– James Nigel, FSL
– Boris Bialek, IBM Toronto
– Marco de Vos, Astron NL
– Vipin Patel, IBM Fishkill
And many more remain unnamed.
Companies: FSL Austin, Belgium & Germany; IBM worldwide; Transfer - NL

Questions?
PS. I like lightweight things
µServer website: www.swissdutch.ch
