The Science DMZ - GlobusWorld


The Science DMZ
With apologies to.
Building the Modern Research Data Portal
Eli Dart, Network Engineer
ESnet Science Engagement
Lawrence Berkeley National Laboratory
Yale University, October 12, 2016

Outline
Science DMZ in brief
Context – Science DMZ in the community
Science DMZ and Data

Science DMZ Design Pattern (Abstract)
[Figure: the WAN enters through a border router. One path goes through the enterprise border router/firewall to the site/campus LAN; a clean, high-bandwidth WAN path goes to the Science DMZ switch/router, which serves a high-performance Data Transfer Node with high-speed storage. Links are 10GE, perfSONAR hosts sit at several points, the Science DMZ has per-service security policy control points, and the site/campus has access to Science DMZ resources.]
ESnet Science Engagement (engage@es.net) - 1/25/17 - 2015, Energy Sciences Network
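The per-service security policy control points in this pattern replace a stateful firewall in the data path; they are commonly implemented as stateless ACLs on the Science DMZ switch/router. A minimal Python sketch of that idea, assuming a hypothetical rule table: the `Rule` structure, the documentation-range addresses, and the GridFTP-style ports (control 2811, data 50000-51000) are illustrative, not an actual ESnet or campus configuration. Anything not explicitly permitted is denied.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """One hypothetical per-service permit entry, in the style of a router ACL."""
    dst_host: ipaddress.IPv4Address  # the DTN this rule protects
    protocol: str                    # "tcp" or "udp"
    port_lo: int                     # first permitted destination port
    port_hi: int                     # last permitted destination port
    src_net: ipaddress.IPv4Network   # collaborator prefix allowed in

def permitted(rules, src: str, dst: str, protocol: str, port: int) -> bool:
    """Stateless first-match check with an implicit deny at the end, like an ACL."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for r in rules:
        if (dst_ip == r.dst_host and protocol == r.protocol
                and r.port_lo <= port <= r.port_hi and src_ip in r.src_net):
            return True
    return False

# Illustrative addresses from RFC 5737 documentation ranges, not real hosts.
DTN = ipaddress.ip_address("192.0.2.10")
PARTNER = ipaddress.ip_network("198.51.100.0/24")
RULES = [
    Rule(DTN, "tcp", 2811, 2811, PARTNER),    # hypothetical control channel
    Rule(DTN, "tcp", 50000, 51000, PARTNER),  # hypothetical data port range
]

print(permitted(RULES, "198.51.100.7", "192.0.2.10", "tcp", 50123))  # True
print(permitted(RULES, "203.0.113.5", "192.0.2.10", "tcp", 50123))   # False
print(permitted(RULES, "198.51.100.7", "192.0.2.10", "tcp", 22))     # False
```

Because the rules are stateless and specific to one service on one host, they can be enforced at line rate in the router, which is what lets the data path avoid the enterprise firewall.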

HPC Center Data Path
[Figure: border router and front-end switches with perfSONAR hosts connect the Data Transfer Nodes to the high-latency WAN path; the supercomputer uses a low-latency LAN path to the parallel filesystem.]
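The figure separates a high-latency WAN path (handled by the DTNs) from a low-latency LAN path (supercomputer to parallel filesystem). One concrete consequence of latency is TCP buffering: a single stream can only keep a path full if it has at least the bandwidth-delay product in flight. A small Python sketch of that arithmetic; the 10 Gbps rate and the round-trip times are illustrative assumptions, not figures from the slide.

```python
def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the path full."""
    return rate_bps * rtt_s / 8

RATE_BPS = 10e9  # a 10GE path, as in the diagrams (illustrative)

# Illustrative round-trip times; real values depend on the path.
for label, rtt_ms in [("LAN", 0.2), ("continental WAN", 50.0), ("intercontinental WAN", 150.0)]:
    window_mib = bdp_bytes(RATE_BPS, rtt_ms / 1000) / 2**20
    print(f"{label}: RTT {rtt_ms} ms needs about {window_mib:.1f} MiB in flight")
```

At 10 Gbps a 50 ms path needs roughly 60 MiB in flight, versus about a quarter MiB on a sub-millisecond LAN, which is one reason WAN-facing DTNs are given large TCP buffers.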

Context: Science DMZ Adoption
Initially deployed within DOE National Laboratories
Growing adoption among institutions of all sizes
NSF CC* programs have funded many Science DMZs
Other US agencies, e.g. NIH, USDA
International, e.g. Australia, Brazil, UK

Strategic Impacts
We are undergoing significant cyberinfrastructure upgrades...
...but enterprise networks need not be unduly perturbed.
Significantly enhanced capabilities compared to 3 years ago
– Terabyte-scale data movement is much easier
– Petabyte-scale data movement is possible outside the LHC experiments (about 3.1 Gbps sustained moves 1 PB/month; about 14 Gbps moves 1 PB/week)
– Widely deployed tools are much better (e.g. Globus)
Metcalfe's Law of Network Utility
– The value of the Science DMZ grows with the number of DMZs; whether it scales as n² or n log n doesn't matter, the effect is real
– Cyberinfrastructure value increases as we all upgrade
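The slide's rate figures and the Metcalfe scaling can both be checked with a few lines of Python. This sketch assumes a decimal petabyte (10^15 bytes) and a 30-day month; a binary PiB is about 12.6% larger. The pair count n(n-1)/2 is the usual Metcalfe-style measure of how many DMZ-to-DMZ paths exist; n log n is shown for comparison.

```python
import math

SECONDS_PER_DAY = 86_400
PB = 1e15  # decimal petabyte (assumption)

def sustained_gbps(volume_bytes: float, days: float) -> float:
    """Average rate in Gbps needed to move volume_bytes in the given number of days."""
    return volume_bytes * 8 / (days * SECONDS_PER_DAY) / 1e9

print(f"1 PB/month (30 days): {sustained_gbps(PB, 30):.1f} Gbps")  # 3.1
print(f"1 PB/week:            {sustained_gbps(PB, 7):.1f} Gbps")   # 13.2

def metcalfe_paths(n: int) -> int:
    """Distinct DMZ-to-DMZ pairs, n(n-1)/2, which grows as n^2."""
    return n * (n - 1) // 2

for n in (10, 100, 1000):
    print(f"n={n}: pairs={metcalfe_paths(n)}, n*log(n)={n * math.log(n):.0f}")
```

With decimal units the weekly figure comes out near 13.2 Gbps, so the slide's 14 Gbps is slightly above the computed minimum. Either growth curve rises much faster than n itself, which is the point of the Metcalfe bullet.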

Legacy Portal Design
Software components tightly coupled
Cannot move the entire portal to the Science DMZ because of security
Even if you could put it in a DMZ, many components aren't scalable
Performance improvement requires architectural change
[Figure: a single portal server in the enterprise network, behind the border router and firewall (perfSONAR at the border and inside), runs the web server, search, database, authentication, and data service on 10GE, backed by a filesystem (data store). The browsing, query, and data paths all take this route.]

Example of Architectural Change – CDN
CDNs are a widely deployed design pattern
Pervasive across the Internet (e.g. Netflix, Amazon)
Store static and dynamic content in separate locations
– Static content is simple (but often BIG)
– Application dynamics are complex (stateful, synchronous)
Separation of application and data service allows each to be optimized

Classical Web Server Model
Web browser fetches pages from web server
All content stored on the web server
Web applications run on the web server
Web server sends data to client browser over the network
Issue: latency increases time to page render
Issue: packet loss combined with latency causes problems for large static objects
[Figure: browser fetching everything from a web server over a long-distance, high-latency broadband path.]
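The packet-loss-plus-latency issue can be made concrete with the well-known Mathis et al. approximation for steady-state TCP throughput, rate ≤ (MSS/RTT) · (C/√p) with C ≈ 1.22. It models loss-based (Reno-style) congestion control and is an estimate, not a measurement; newer congestion-control algorithms behave differently. The MSS, loss rate, and RTTs below are illustrative assumptions.

```python
import math

def mathis_limit_bps(mss_bytes: int, rtt_s: float, loss_rate: float,
                     c: float = math.sqrt(1.5)) -> float:
    """Approximate upper bound on TCP throughput (bits/s) from the Mathis et al. model."""
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))

# Same loss rate (1 packet in 10,000), increasingly distant servers.
for rtt_ms in (1, 10, 100):
    limit = mathis_limit_bps(1460, rtt_ms / 1000, 1e-4)
    print(f"RTT {rtt_ms:>3} ms -> at most about {limit / 1e6:,.0f} Mbps")
```

For a fixed loss rate the ceiling falls in proportion to RTT, so the same small amount of loss that is harmless on a short path throttles a large download from a distant server.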

Solution: Place Large Static Objects Near Client
Reduced latency
– Faster page rendering
– Faster static content delivery
Reduced web server load
Significant win for overall performance
[Figure: a CDN data server holds the large static data a short distance / low latency from the browser, while the web server stays a long distance / high latency away.]

Client Simply Sees Increased Performance
Client doesn't see the CDN as a separate entity
Web content is all still viewed in a browser
– Browser fetches what the page tells it to fetch
– Different content comes from different places
– User doesn't know/care
CDNs provide an architectural solution to performance
[Figure: browsers pulling content across the 'Net from a web server and a CDN data server; labels contrast "Simple, Fast" and "Rich, Slow".]
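On the browser side this is just HTML: the page names where each object lives, and the browser fetches it from there. A minimal Python sketch, assuming hypothetical hosts `www.example.org` (web server, dynamic content) and `cdn.example.net` (CDN data server), plus a hypothetical suffix list for "large static" objects. It rewrites object references so big files come from the CDN while dynamic requests stay on the web server.

```python
from urllib.parse import urljoin

WEB_SERVER = "https://www.example.org/"  # hypothetical origin for dynamic content
CDN_SERVER = "https://cdn.example.net/"  # hypothetical CDN data server

STATIC_SUFFIXES = (".jpg", ".png", ".mp4", ".css", ".js")  # assumption

def object_url(path: str) -> str:
    """Send large static objects to the CDN; keep everything else on the web server."""
    base = CDN_SERVER if path.lower().endswith(STATIC_SUFFIXES) else WEB_SERVER
    return urljoin(base, path.lstrip("/"))

def render_page(title: str, video: str, search_endpoint: str) -> str:
    """Build a page; the browser simply follows whatever URLs it contains."""
    return (
        f"<h1>{title}</h1>\n"
        f'<video src="{object_url(video)}"></video>\n'
        f'<form action="{object_url(search_endpoint)}"></form>'
    )

print(render_page("Results", "/media/run42.mp4", "/search"))
```

The user sees one page; only the URLs inside it reveal that two servers are involved, which is why the client "doesn't see the CDN as a separate entity".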

Architectural Examination of Data Portals
Common data portal functions
– Search/query/discovery
– Data download method for data access
– GUI for browsing by humans
– API (ideally incorporates search/query/download)
Performance issues primarily in data download
– Rapid increase in data scale eclipsed legacy software stack
– Portal servers often stuck in enterprise network

Legacy Portal Design
[Figure: the same legacy layout, with one portal server behind the border router and firewall running the web server, search, database, authentication, and data service over a filesystem (data store), carrying the browsing, query, and data paths.]
Can we “disassemble” the portal and reassemble it for improved performance?
Use Science DMZ as a platform for the data piece
Avoid placing complex software in the Science DMZ

Modern Data Portal Leverages Science DMZ
[Figure: the portal server (web server, search, database, authentication) stays in the enterprise network behind the firewall and carries the browsing and query paths. The data transfer path goes from the border router over 10GE to a Science DMZ switch/router serving several DTNs and API DTNs (data access governed by portal), all on a filesystem (data store); perfSONAR hosts sit at the border, in the enterprise, and in the Science DMZ.]
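The API DTNs serve the bytes while the portal decides who may fetch them. One way to express "data access governed by portal", shown purely as an illustrative assumption and not as the mechanism of any specific product, is for the portal to issue short-lived URLs signed with a key it shares with the DTNs, which the DTN verifies without calling back to the portal. The key, host name, and `expires`/`sig` parameter names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SHARED_KEY = b"hypothetical-portal-dtn-secret"  # assumption: provisioned out of band
DTN_BASE = "https://dtn1.example.org"           # hypothetical API DTN

def _sign(path: str, expires: int) -> str:
    """HMAC-SHA256 over the object path and the expiry time."""
    msg = f"{path}\n{expires}".encode()
    return hmac.new(SHARED_KEY, msg, hashlib.sha256).hexdigest()

def portal_issue_url(path: str, ttl_s: int = 300, now: Optional[float] = None) -> str:
    """Portal side: after its own login and access checks, hand out a short-lived pointer."""
    t = time.time() if now is None else now
    expires = int(t + ttl_s)
    query = urlencode({"expires": expires, "sig": _sign(path, expires)})
    return f"{DTN_BASE}{path}?{query}"

def dtn_allows(url: str, now: Optional[float] = None) -> bool:
    """DTN side: check signature and expiry locally, with no call back to the portal."""
    parts = urlparse(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    t = time.time() if now is None else now
    if t > expires:
        return False
    return hmac.compare_digest(sig, _sign(parts.path, expires))

url = portal_issue_url("/climate/run42.nc", now=1_000_000)
print(dtn_allows(url, now=1_000_100))                            # True: within 300 s
print(dtn_allows(url, now=1_000_400))                            # False: expired
print(dtn_allows(url.replace("run42", "run43"), now=1_000_100))  # False: path altered
```

Keeping the check local to the DTN means the portal's complex software stays out of the Science DMZ while still controlling access.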

Separating data handling from portal logic
Portal GUI, search, etc. all function as before
Query returns pointers to data objects in Science DMZ
Portal freed from ties to data servers
Data handling is separate, and scalable
– High-performance DTNs in the Science DMZ
– Scale without modifying the portal software
Outsource data handling to computing centers
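"Query returns pointers" can be sketched as a search function that returns locations rather than bytes. In this Python sketch the catalog, the DTN host names, and the CRC32-based spreading of objects across DTNs are hypothetical; it assumes, as in the modern portal figure, that every DTN mounts the same data store, so any DTN can serve any object.

```python
import zlib
from dataclasses import dataclass

# Hypothetical pool of DTNs in the Science DMZ, all mounting the same data store.
DTN_POOL = ("dtn1.example.org", "dtn2.example.org", "dtn3.example.org")

@dataclass(frozen=True)
class Dataset:
    name: str
    path: str        # location on the shared filesystem
    size_bytes: int

# Hypothetical portal metadata; the portal never touches the data itself.
CATALOG = (
    Dataset("ocean-temp-2015", "/data/ocean/2015.nc", 40 * 10**9),
    Dataset("ocean-temp-2016", "/data/ocean/2016.nc", 42 * 10**9),
    Dataset("soil-moisture", "/data/land/soil.nc", 7 * 10**9),
)

def pick_dtn(path: str, pool: tuple[str, ...]) -> str:
    """Spread objects across DTNs deterministically using a CRC32 of the path."""
    return pool[zlib.crc32(path.encode()) % len(pool)]

def search(term: str, pool: tuple[str, ...] = DTN_POOL) -> list[dict]:
    """Portal query: match metadata, return pointers to data objects on DTNs."""
    return [
        {"name": d.name, "size_bytes": d.size_bytes,
         "url": f"https://{pick_dtn(d.path, pool)}{d.path}"}
        for d in CATALOG if term in d.name
    ]

for hit in search("ocean"):
    print(hit["name"], hit["url"])
```

Adding capacity means passing a longer DTN pool; the search logic itself is unchanged, which mirrors "scale without modifying the portal software".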

Links and Lists
– ESnet fasterdata knowledge base: http://fasterdata.es.net/
– Science DMZ paper: http://www.es.net/assets/pubs_presos/sc13sciDMZ-final.pdf
– Science DMZ email list: send mail to sympa@lists.lbl.gov with subject "subscribe esnet-sciencedmz"
– perfSONAR: http://www.perfsonar.net
– Globus: https://www.globus.org/

Thanks!
Eli Dart, dart@es.net
Energy Sciences Network (ESnet)
Lawrence Berkeley National Laboratory
http://fasterdata.es.net/
http://my.es.net/
http://www.es.net/
