Like what you hear? Tweet it using: #Sec360
HADOOP SECURITY
ABOUT ROBERT
School: UW Madison, U St. Thomas
Programming: 15 years, C, C++, Java
Security Work:
- Surescripts, Minneapolis (present)
- Big Retail Company, Minneapolis
- Big Healthcare Company, Minnetonka
OWASP Local Volunteer
CISSP, CISM, CISA, CHPS
Email: bob@confidentialsoftware.com
Twitter: @msp sullivan
HADOOP SECURITY
- History
- What is new?
- Common Applications
- Threats
- Security Architecture
- Secure Baseline and Testing
- Policy Impact
HADOOP HISTORY
- 2002: Doug Cutting & Mike Cafarella: Nutch, crawling and indexing hundreds of millions of pages
- 2003: Google File System paper released
- 2004: Google MapReduce paper released
- 2006: Yahoo formed Hadoop, 5 to 20 nodes
- 2008: Yahoo: Hadoop "behind every click"
- 2008: Cloudera founded by engineers from Google, Yahoo and Facebook; 2,000 Hadoop nodes
- 2008: Facebook open sourced Hive for Hadoop
- 2011: Yahoo spins out Hortonworks; Hadoop at Yahoo: 42,000 nodes, hundreds of petabytes
Derrick Harris, "The History of Hadoop: From 4 Nodes to the Future of Data", GigaOM (gigaom.com)
HADOOP IS
"The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models."
- Software Framework
- Distributed Processing
- Large Data Sets
- Clusters of Computers
- High Availability
- Scale to Thousands of Machines
MAPREDUCE IS NEW
[Diagram: MapReduce]
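For readers new to MapReduce, the map, shuffle and reduce phases can be simulated in plain Python on one machine with the classic word-count example. This is a sketch of the programming model only, not Hadoop's Java API; the function names `map_phase`, `shuffle` and `reduce_phase` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every emitted value by its key, sorted by key,
    as the framework does between the map and reduce phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: combine the values for each key (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


if __name__ == "__main__":
    docs = ["secure the bees", "secure the cluster"]
    print(reduce_phase(shuffle(map_phase(docs))))
    # prints {'bees': 1, 'cluster': 1, 'secure': 2, 'the': 2}
```

On a real cluster the framework runs many map tasks in parallel, preferably on the nodes that hold each input block, and moves the grouped intermediate data between nodes during the shuffle; that node-to-node traffic is part of what encryption in transit has to cover.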
HADOOP COMMON APPLICATIONS
1. Web Search
2. Advertising & recommendations
3. Security Threat Identification
4. Fraud Detection
5. Patient Record Search
Source: -yahoo-more-ever-54421.html
PATIENT MATCHING AT SURESCRIPTS
- Surescripts provides a Patient Matching service
- 230 million patients
- Over 1 billion matches last year
- Requirements:
  - Reliability and performance
  - Data protection at rest is required
  - Data protection in transit is required
  - Comprehensive security logging is needed
  - ISO 27001 & EHNAC audit accreditation status must be maintained
NOW WHAT?
SECURE THE BEES
HADOOP THREAT MODEL
1) Unauthorized data access (protected health information access)
2) Unauthorized data change
3) Unauthorized job submission, deletion or change
4) A task may access other tasks or local data
5) Rogue DataNode, NameNode or JobTracker
6) User spoofing to submit a workflow as another user
From: "Adding Security to Apache Hadoop", Das, O'Malley, Radia, Zhang, 1/10/securitydesign withCover-1.pdf
HADOOP SECURITY
- Network Security
- Authentication
- Authorization
- Auditing
- Data Protection
[Diagram: Data Nodes, Applications, Users; Enterprise Identity, Logging, Encryption, Key Management]
DATA PROTECTION
- Network Security
- Authentication
- Authorization
- Auditing
- Data Protection:
  - Encryption at rest: volume, file
  - Encryption in transit: HTTPS
[Diagram: Data Nodes, Applications, Users; Enterprise Identity, Logging, Encryption, Key Management]
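For the "encryption in transit: HTTPS" item, a client calling an HTTPS endpoint on the cluster (a web console or a REST gateway, for example) should verify the server certificate and hostname and refuse old protocol versions. A minimal sketch using Python's standard `ssl` module; the optional internal CA bundle path is a hypothetical parameter, and the TLS 1.2 floor is an assumed policy to adjust to your own standard.

```python
import ssl
from typing import Optional


def make_client_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context for HTTPS calls into the cluster.

    ca_file: optional path to an internal CA bundle (hypothetical); when None,
    the system's default trust store is used.
    """
    # create_default_context() turns on certificate verification
    # (CERT_REQUIRED) and hostname checking for server authentication.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Assumed policy floor: refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The context can then be passed to `urllib.request.urlopen(url, context=ctx)`. Server-side TLS for the Hadoop daemons themselves is configured in Hadoop's own configuration files, not by client code like this.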
SECURITY AUDITING
- Network Security
- Authentication
- Authorization
- Auditing:
  - Failed/successful authentication
  - System changes
  - Access to PHI
  - Application logs: HDFS, YARN, MapReduce
- Data Protection
[Diagram: Data Nodes, Applications, Users; Enterprise Identity, Logging, Encryption, Key Management]
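The "access to PHI" and "failed authentication" bullets can be approached by filtering the HDFS audit log. The sketch below assumes the audit lines carry tab-separated `key=value` fields (`allowed`, `ugi`, `ip`, `cmd`, `src`, ...), which is the layout HDFS commonly writes; verify it against your distribution. `PHI_PREFIX`, `parse_audit_line` and `phi_events` are hypothetical names.

```python
import re
from typing import Dict, Iterable, List, Optional

# Assumed layout: tab-separated key=value pairs after the logger prefix, e.g.
# "... FSNamesystem.audit: allowed=true<TAB>ugi=alice (auth:KERBEROS)<TAB>ip=/10.0.0.5<TAB>cmd=open<TAB>src=..."
FIELD_RE = re.compile(r"(\w+)=([^\t]*)")

# Hypothetical HDFS directory holding protected health information.
PHI_PREFIX = "/data/phi/"


def parse_audit_line(line: str) -> Optional[Dict[str, str]]:
    """Return the key=value fields of one audit line, or None if there are none."""
    fields = dict(FIELD_RE.findall(line))
    return fields or None


def phi_events(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Keep only events that touch PHI paths, with who, from where, what,
    and whether HDFS allowed it."""
    events = []
    for line in lines:
        f = parse_audit_line(line)
        if not f or not f.get("src", "").startswith(PHI_PREFIX):
            continue
        events.append({
            "user": f.get("ugi", "").split(" ")[0],  # drop the "(auth:...)" suffix
            "ip": f.get("ip", "").lstrip("/"),
            "cmd": f.get("cmd", ""),
            "path": f["src"],
            "allowed": f.get("allowed", ""),
        })
    return events
```

In many distributions the NameNode writes these lines to a dedicated file such as `hdfs-audit.log` through its log4j configuration. Running that file through `phi_events` and alerting on `allowed == "false"` covers the failed-access and PHI-access bullets above.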
AUTHORIZATION
- Network Security
- Authentication
- Authorization:
  - Limit user access to functions
  - Limit user access to objects
  - Manage delegation of access
- Auditing
- Data Protection
[Diagram: Data Nodes, Applications, Users; Enterprise Identity, Logging, Encryption, Key Management]
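"Limit user access to objects" maps most directly onto HDFS file permissions, which follow a POSIX-style owner/group/other model: if the user is the owner, only the owner bits are tested; otherwise, if the file's group is one of the user's groups, only the group bits; otherwise the "other" bits. The sketch below models that order. `FileStatus` and `is_permitted` are hypothetical names, the superuser name "hdfs" is an assumption (in HDFS it is whoever runs the NameNode), and HDFS extended ACLs and the superuser group are not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet

READ, WRITE, EXECUTE = 4, 2, 1  # same bit values as POSIX rwx


@dataclass(frozen=True)
class FileStatus:
    owner: str
    group: str
    mode: int  # octal permission bits, e.g. 0o640


def is_permitted(user: str, user_groups: FrozenSet[str], status: FileStatus,
                 access: int, superuser: str = "hdfs") -> bool:
    """Owner/group/other check: only the first matching class is consulted."""
    if user == superuser:
        return True  # the superuser bypasses permission checks
    if user == status.owner:
        bits = (status.mode >> 6) & 0o7
    elif status.group in user_groups:
        bits = (status.mode >> 3) & 0o7
    else:
        bits = status.mode & 0o7
    # Every requested bit must be present.
    return (bits & access) == access
```

With `mode=0o640`, the owner can read and write, members of the file's group can only read, and everyone else is denied. That is the usual shape for a directory of PHI extracts.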
AUTHENTICATION
- Network Security
- Authentication:
  - All users, all applications, all access paths
  - Apache Knox Gateway
- Authorization
- Auditing
- Data Protection
[Diagram: Data Nodes, Applications, Users; Enterprise Identity, Logging, Encryption, Key Management]
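Authenticating "all applications" in Hadoop's secure mode means every service runs as its own Kerberos principal, conventionally written `service/host@REALM` and backed by a keytab. Hadoop configuration usually writes the host part as the `_HOST` placeholder (for example in `dfs.namenode.kerberos.principal`), which each daemon replaces with its own fully qualified hostname. The sketch below only illustrates that naming scheme. `expand_principal`, `split_principal` and `Principal` are hypothetical helpers, and the realm and hostnames are invented.

```python
from typing import NamedTuple


class Principal(NamedTuple):
    service: str
    host: str
    realm: str


def expand_principal(pattern: str, fqdn: str) -> str:
    """Replace the _HOST placeholder with this machine's hostname.

    The hostname is lower-cased here because Kerberos host principals are
    conventionally lower case.
    """
    return pattern.replace("_HOST", fqdn.lower())


def split_principal(principal: str) -> Principal:
    """Split "service/host@REALM" into its parts; raise ValueError if malformed."""
    name, at, realm = principal.partition("@")
    service, slash, host = name.partition("/")
    if not (at and slash and service and host and realm):
        raise ValueError(f"not a service principal: {principal!r}")
    return Principal(service, host, realm)
```

A check like this is a cheap way to catch principals that were copied between hosts without the `_HOST` placeholder, a common cause of keytab login failures.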
NETWORK SECURITY
- Network Security
- Authentication
- Authorization
- Auditing
- Data Protection
[Diagram: Data Nodes, Applications, Users; Enterprise Identity, Logging, Encryption, Key Management]
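One common network control is to accept connections to the cluster's service ports only from approved subnets, such as an edge/gateway tier and the cluster nodes themselves, so that users come in through a gateway rather than reaching DataNodes directly. The allowlist check a firewall rule or a review script would enforce can be sketched with Python's `ipaddress` module. The subnets and the `source_allowed` helper are hypothetical.

```python
import ipaddress
from typing import Sequence, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical address plan; replace with your own.
ALLOWED_NETWORKS: Sequence[Network] = (
    ipaddress.ip_network("10.10.0.0/24"),  # edge nodes / gateway tier (assumed)
    ipaddress.ip_network("10.20.0.0/16"),  # Hadoop cluster nodes (assumed)
)


def source_allowed(source_ip: str, allowed: Sequence[Network] = ALLOWED_NETWORKS) -> bool:
    """True if the connecting address falls inside an approved subnet."""
    addr = ipaddress.ip_address(source_ip)
    # `in` returns False (no error) when the IP versions differ.
    return any(addr in net for net in allowed)
```

The same check can be run over the client addresses recorded in service logs to spot connections that bypassed the intended path.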
HADOOP SECURE MODE
Apache Hadoop Secure Mode: 2.6.0 (March '14)
Authentication
- Covers HDFS, YARN, MapReduce & Web Console
- Uses a central LDAP server or Active Directory
- Requires Kerberos keytabs for each application
Authorization
- Each Hadoop service has a list of users and groups
- Group permissions on HDFS filesystem components
Audit
- Hadoop log, YARN log, other logs
Data Protection
- Encryption in transit between Hadoop services & clients
- Encryption in transit between DataNodes
- Encryption in transit between web console & clients (HTTPS)
- Encryption at rest for HDFS files (transparent encryption zones)
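"Each Hadoop service has a list of users and groups" refers to service-level authorization. With `hadoop.security.authorization` set to `true` in `core-site.xml`, per-protocol ACLs in `hadoop-policy.xml` (for example `security.client.protocol.acl`) name who may call each service. Each value is written as comma-separated users, a space, then comma-separated groups, and `*` means everyone. The sketch below only parses and evaluates that string format to show its semantics. It is not Hadoop's implementation, and `ServiceAcl`, `parse_acl` and `is_allowed` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class ServiceAcl:
    allow_all: bool = False
    users: FrozenSet[str] = field(default_factory=frozenset)
    groups: FrozenSet[str] = field(default_factory=frozenset)


def parse_acl(value: str) -> ServiceAcl:
    """Parse "alice,bob analysts,etl": users, one space, then groups.

    A value starting with a space (" analysts") lists groups only.
    """
    if value.strip() == "*":
        return ServiceAcl(allow_all=True)
    user_part, _, group_part = value.partition(" ")
    users = frozenset(u.strip() for u in user_part.split(",") if u.strip())
    groups = frozenset(g.strip() for g in group_part.strip().split(",") if g.strip())
    return ServiceAcl(users=users, groups=groups)


def is_allowed(acl: ServiceAcl, user: str, user_groups: FrozenSet[str]) -> bool:
    """A caller is allowed if the ACL is "*", lists the user, or shares a group."""
    return acl.allow_all or user in acl.users or bool(acl.groups & user_groups)
```

Granting access by group rather than by individual user keeps these ACLs aligned with the central LDAP or Active Directory groups.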
HADOOP SECURE MODE
Apache Hadoop Secure Mode: 2.6.0 (March '14)
[Table: Secure Mode coverage of the threat model; surviving column labels: Task Access, Rogue Node, User Spoofing]
APACHE KNOX
The Apache Knox Gateway is a REST API gateway for interacting with Hadoop clusters. It provides a single access point for all REST interactions with Hadoop clusters.
Knox can provide:
- Authentication (LDAP and Active Directory Authentication Provider)
- Federation/SSO (HTTP Header Based Identity Federation)
- Authorization (Service Level Authorization)
- Auditing
Integrations:
- WebHDFS (HDFS), Templeton (HCatalog), Stargate (HBase), Oozie, Hive/JDBC
Status: Incubating
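Because Knox exposes cluster services over REST, a client needs only the gateway URL, a topology (cluster) name and credentials. A minimal sketch of building a WebHDFS call routed through Knox, assuming the commonly documented layout `https://{host}:8443/gateway/{topology}/webhdfs/v1/{path}?op=...` and HTTP Basic credentials that Knox checks against LDAP or Active Directory. The host, topology and user are hypothetical, and the URL layout should be confirmed against your gateway's configuration.

```python
import base64
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway: str, topology: str, hdfs_path: str, op: str) -> str:
    """Build a WebHDFS URL routed through a Knox gateway.

    gateway:  e.g. "https://knox.example.com:8443/gateway" (hypothetical host)
    topology: the Knox topology name, e.g. "default" (assumed)
    """
    path = quote(hdfs_path.lstrip("/"))  # quote() keeps "/" separators
    return f"{gateway.rstrip('/')}/{topology}/webhdfs/v1/{path}?{urlencode({'op': op})}"


def basic_auth_header(user: str, password: str) -> dict:
    """HTTP Basic credentials for the gateway to validate against LDAP/AD."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

A request would be built with `urllib.request.Request(url, headers=basic_auth_header(user, password))` and opened over a certificate-verifying TLS context, since Basic credentials are only base64-encoded, not encrypted.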
APACHE RANGER
A centralized security framework to manage fine-grained access control.
Status: Incubating
Authentication
- Kerberos in native Apache Hadoop
- Secured by the Apache Knox Gateway via the HTTP/REST API
Authorization
- On the folder and file level, via HDFS
- On the database, table and column level, via Hive
- On the table, column family and column level, via HBase
Audit
- User access auditing in HDFS, Hive and HBase with IP address, resource/resource type, timestamp, and access granted or denied
Data Protection
- Wire, volume and file/column encryption
- HDFS Transparent Encryption (TDE)
- Third-Party Partners (Hortonworks)
Administration
- Policy management, administration and
/HDP2/HDP-2.2.0/Ranger U Guide v22/index.html#Item1.1
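The combination Ranger advertises, fine-grained (down to column) authorization plus an audit record for every decision, can be illustrated with a toy deny-by-default policy check. This is a hypothetical model written for illustration, not Ranger's policy format or API. `ColumnPolicy` and `check_column_access` are invented names, group-based grants are an assumption, and the audit fields mirror the ones listed on the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class ColumnPolicy:
    database: str
    table: str
    column: str             # "*" matches any column
    groups: FrozenSet[str]  # groups granted by this policy
    access: str             # e.g. "select"


def _matches(pattern: str, value: str) -> bool:
    return pattern == "*" or pattern == value


def check_column_access(policies: List[ColumnPolicy], user: str,
                        user_groups: FrozenSet[str], client_ip: str,
                        resource: Tuple[str, str, str], access: str) -> Dict[str, object]:
    """Deny by default; grant only if some policy matches the resource,
    the access type and one of the user's groups. Always return an audit record."""
    database, table, column = resource
    granted = any(
        _matches(p.database, database) and _matches(p.table, table)
        and _matches(p.column, column) and p.access == access
        and bool(p.groups & user_groups)
        for p in policies
    )
    return {
        "user": user,
        "ip": client_ip,
        "resource": f"{database}/{table}/{column}",
        "resource_type": "column",
        "access": access,
        "result": "granted" if granted else "denied",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

Recording denied decisions as well as granted ones is what makes the audit trail useful for spotting probing of sensitive columns.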
HADOOP SECURITY POLICY
Authentication of processes:
- May go into existing application security policy
Security logging requirements:
- Which applications must be logged?
- Add node identifier to standard log records
De-anonymization issues:
- Sparse data can be de-anonymized through matching to public sources
- Could 200 days of tweets be matched to any of my de-identified data?
Key Management & Business Continuity
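The de-anonymization risk can be screened for before "de-identified" data leaves the cluster. If a combination of quasi-identifiers (ZIP code, birth year and sex, for example) is shared by only one record, that record can potentially be re-identified by joining against a public source. One common check is k-anonymity: count how many records share each combination and flag groups smaller than a chosen k. The column names, threshold and sample rows below are hypothetical, and passing this check does not by itself prove the data safe.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def group_sizes(records: Sequence[Record], quasi_identifiers: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def dataset_k(records: Sequence[Record], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group: the data is k-anonymous for this k (0 if empty)."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def risky_groups(records: Sequence[Record], quasi_identifiers: Sequence[str],
                 k: int) -> List[Tuple[Tuple[str, ...], int]]:
    """Combinations shared by fewer than k records: re-identification candidates."""
    sizes = group_sizes(records, quasi_identifiers)
    return sorted((combo, n) for combo, n in sizes.items() if n < k)
```

A release rule might then require `dataset_k(...) >= k` for an agreed k, generalizing values (for example truncating ZIP codes) until it holds.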
BUILD A SECURITY BASELINE
- Start with your vendor's distribution
- Add your company's sauce
- Review the Hadoop Security Benchmark project at the Center for Internet Security:
  - Apache Hadoop 2.6.0 Benchmark
  - Community discussion
  - Editors and members get free access to validation tools
  - Everyone gets free access to baselines
  - Registration is moderated: human registrants are approved and receive a welcome email
  - Link: http://tinyurl.com/HadoopSecurityBenchmark
HADOOP SECURITY REVIEW
1. Start with the threats
2. Choose your diagram
3. Ask the standard security questions:
   - Network Security
   - Authentication
   - Authorization
   - Security Audit
   - Data Protection
4. Update your policy
5. Build a security baseline
HADOOP SECURITY RESOURCES
1. Apache: "Hadoop in Secure Mode", http://tinyurl.com/hadoopSecureMode
2. Yahoo Hadoop l
3. Securosis: "Securing Big Data: Security Recommendations for Hadoop and NoSQL Environments", 10/12/2012, Adrian ecuringBigData FINAL.pdf
4. Cloudera: "Introduction to Hadoop
5. Hortonworks: "Security for Enterprise y/
6. Center for Internet Security: Hadoop Security Baseline, http://tinyurl.com/HadoopSecurityBenchmark
QUESTIONS?
Updates at http://www.confidentialsoftware.com