Like What You Hear? Tweet It Using: #Sec360

Transcription


HADOOP SECURITY

HADOOP SECURITY
About Robert:
- School: UW Madison, U St. Thomas
- Programming: 15 years, C, C++, Java
- Security Work:
  - Surescripts, Minneapolis (present)
  - Big Retail Company, Minneapolis
  - Big Healthcare Company, Minnetonka
- OWASP Local Volunteer
- CISSP, CISM, CISA, CHPS
- Email: bob@confidentialsoftware.com
- Twitter: @msp sullivan

HADOOP SECURITY
- History
- What is new?
- Common Applications
- Threats
- Security Architecture
- Secure Baseline and Testing
- Policy Impact

HADOOP HISTORY
- 2002: Doug Cutting & Mike Cafarella start Nutch to crawl and index hundreds of millions of pages
- 2003: Google File System paper released
- 2004: Google MapReduce paper released
- 2006: Yahoo formed Hadoop; 5 to 20 nodes
- 2008: Yahoo, Hadoop "behind every click"
- 2008: Cloudera founded by engineers from Google, Yahoo, Facebook and Oracle; 2,000 Hadoop nodes
- 2008: Facebook open sourced Hive for Hadoop
- 2011: Yahoo spins out Hortonworks; Hadoop at 42,000 nodes, hundreds of petabytes
Source: Derrick Harris, "The History of Hadoop: From 4 nodes to the future of data", gigaom.com

HADOOP IS
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers.
- Software Framework
- Distributed Processing
- Large Data Sets
- Clusters of Computers
- High Availability
- Scale to Thousands of Machines

MAPREDUCE IS NEW
(Diagram: MapReduce)
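The MapReduce diagram did not survive transcription. As a stand-in, here is a minimal single-process sketch of the map, shuffle and reduce phases using the classic word-count example. It is not Hadoop code: the function names are made up for illustration, and a real cluster would spread each phase across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of text records."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["secure the bees", "secure the cluster"]))
```

The security relevance is that map and reduce tasks run as user-submitted code on the nodes that hold the data, which is why task isolation and job submission show up in the threat model.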

HADOOP COMMON APPLICATIONS
1. Web Search
2. Advertising & recommendations
3. Security Threat Identification
4. Fraud Detection
5. Patient Record Search

Source: -yahoo-more-ever-54421.html

PATIENT MATCHING AT SURESCRIPTS
- Surescripts provides a Patient Matching service
- 230 Million Patients
- Over 1 Billion matches last year
- Requirements:
  - Reliability and performance
  - Data Protection at rest is required
  - Data Protection in transit is required
  - Comprehensive security logging is needed
  - ISO 27001 & EHNAC Audit Accreditation status must be maintained

NOW WHAT?
SECURE THE BEES

HADOOP THREAT MODEL
1) Unauthorized data access (protected health information access)
2) Unauthorized data change
3) Unauthorized job submission, deletion or change
4) A task may access other tasks or access local data
5) Rogue DataNode, NameNode or Job Tracker
6) User spoofing to submit a workflow as another user
From: "Adding Security to Apache Hadoop", Das, O'Malley, Radia, Zhang, 1/10/securitydesign withCover-1.pdf

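HADOOP SECURITY
- Network Security
- Authentication
- Authorization
- Auditing
- Data Protection
(Diagram labels: Data Nodes, Application, Users; Enterprise Identity, Logging, Encryption, Key Management)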

DATA PROTECTION
- Network Security
- Authentication
- Authorization
- Auditing
- Data Protection
  - Encryption at rest: volume, file
  - Encryption in transit: HTTPS
(Diagram labels: Data Nodes, Application, Users; Enterprise Identity, Logging, Encryption, Key Management)

SECURITY AUDITING
- Network Security
- Authentication
- Authorization
- Auditing
  - Failed/Successful Authn.
  - System changes
  - Access to PHI
  - Application logs: HDFS, YARN, MapReduce
- Data Protection
(Diagram labels: Data Nodes, Application, Users; Enterprise Identity, Logging, Encryption, Key Management)
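One way to act on the auditing items above is to scan the HDFS audit log for denied requests and for any access to PHI. The sketch below assumes the common tab-separated `key=value` layout written after the `FSNamesystem.audit:` marker; verify that against your own distribution's log format. The `/data/phi/` prefix and the function names are hypothetical.

```python
PHI_PREFIX = "/data/phi/"  # hypothetical location of PHI in HDFS


def parse_audit_line(line):
    """Split one HDFS audit record into a dict of its key=value fields.

    Assumes fields follow the 'FSNamesystem.audit: ' marker and are
    separated by tabs; returns an empty dict for any other line.
    """
    _, _, payload = line.partition("FSNamesystem.audit: ")
    fields = {}
    for part in payload.strip().split("\t"):
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = value
    return fields


def needs_review(fields):
    """Flag records that were denied or that touch the assumed PHI path."""
    denied = fields.get("allowed") == "false"
    touches_phi = fields.get("src", "").startswith(PHI_PREFIX)
    return denied or touches_phi


if __name__ == "__main__":
    sample = (
        "2015-05-12 10:15:32,123 INFO FSNamesystem.audit: allowed=false\t"
        "ugi=alice (auth:KERBEROS)\tip=/10.1.2.3\tcmd=open\t"
        "src=/data/phi/patients.csv\tdst=null\tperm=null"
    )
    record = parse_audit_line(sample)
    print(record["ugi"], record["cmd"], needs_review(record))
```

In practice these records would be shipped to the enterprise logging system instead of being parsed on the node, but the same fields drive the alerts.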

AUTHORIZATION
- Network Security
- Authentication
- Authorization
  - Limit user access to function
  - Limit user access to objects
  - Manage delegation of access
- Auditing
- Data Protection
(Diagram labels: Data Nodes, Application, Users; Enterprise Identity, Logging, Encryption, Key Management)
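To make "limit user access to function" concrete, the sketch below evaluates an access control string in the style used by Hadoop service-level authorization: a comma-separated user list, a space, then a comma-separated group list, with `*` meaning everyone. This is an illustration of the idea, not Hadoop's implementation; the function names are hypothetical.

```python
def parse_acl(acl):
    """Parse 'user1,user2 group1,group2' into (users, groups) sets.

    Returns None for the wildcard '*', which admits every caller.
    A leading space means the user list is empty and only groups apply.
    """
    if acl.strip() == "*":
        return None
    users_part, _, groups_part = acl.partition(" ")
    users = {u for u in users_part.split(",") if u}
    groups = {g for g in groups_part.strip().split(",") if g}
    return users, groups


def is_allowed(acl, user, user_groups):
    """Allow when the user is listed or belongs to any listed group."""
    parsed = parse_acl(acl)
    if parsed is None:
        return True
    users, groups = parsed
    return user in users or bool(groups & set(user_groups))


if __name__ == "__main__":
    job_submit_acl = "alice,bob analysts"  # example value, not a default
    print(is_allowed(job_submit_acl, "carol", ["analysts"]))
    print(is_allowed(job_submit_acl, "mallory", ["interns"]))
```

Access to objects (files, tables, columns) is a separate check made by HDFS permissions or by the component that owns the object.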

AUTHENTICATION
- Network Security
- Authentication
  - All users, all applications, all access paths
  - Apache Knox Gateway
- Authorization
- Auditing
- Data Protection
(Diagram labels: Data Nodes, Application, Users; Enterprise Identity, Logging, Encryption, Key Management)

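NETWORK SECURITY
- Network Security
- Authentication
- Authorization
- Auditing
- Data Protection
(Diagram labels: Data Nodes, Application, Users; Enterprise Identity, Logging, Encryption, Key Management)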

HADOOP SECURE MODE
Apache Hadoop Secure Mode: 2.6.0 (March '14)
Authentication
- Covers HDFS, YARN, MapReduce & Web Console
- Uses central LDAP Server or Active Directory
- Requires Kerberos keytabs for each application
Authorization
- Each Hadoop service has a list of users and groups
- Group permissions on HDFS filesystem components
Audit
- Hadoop log, YARN log, other logs
Data Protection
- Encryption in transit between Hadoop services & clients
- Encryption in transit between DataNodes
- Encryption in transit between web console & clients (HTTPS)
- Encryption at rest for HDFS columns
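Several of the secure mode items above map to properties in the cluster's XML configuration, so a baseline check can be scripted. The sketch below parses Hadoop-style `<configuration>` XML and reports properties that differ from an expected value. The `EXPECTED` values are an assumed baseline chosen for illustration, not a recommendation from the talk. In a real cluster these properties are split between `core-site.xml` and `hdfs-site.xml`; merging them into one dict is left to the caller.

```python
import xml.etree.ElementTree as ET

# Assumed baseline for illustration; adjust to your own policy.
EXPECTED = {
    "hadoop.security.authentication": "kerberos",  # authentication
    "hadoop.security.authorization": "true",       # service-level authorization
    "hadoop.rpc.protection": "privacy",            # encrypt RPC in transit
    "dfs.encrypt.data.transfer": "true",           # encrypt DataNode transfers
    "dfs.http.policy": "HTTPS_ONLY",               # web consoles over HTTPS
}


def load_properties(xml_text):
    """Read <property><name>/<value> pairs from Hadoop configuration XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def baseline_gaps(props, expected=EXPECTED):
    """Return {property: (expected, actual)} for every mismatch or omission."""
    return {
        name: (want, props.get(name))
        for name, want in expected.items()
        if props.get(name) != want
    }


if __name__ == "__main__":
    sample = """<configuration>
      <property><name>hadoop.security.authentication</name><value>kerberos</value></property>
      <property><name>hadoop.security.authorization</name><value>true</value></property>
      <property><name>hadoop.rpc.protection</name><value>authentication</value></property>
    </configuration>"""
    for name, (want, got) in sorted(baseline_gaps(load_properties(sample)).items()):
        print(f"{name}: expected {want!r}, found {got!r}")
```

A check like this only confirms that settings are present; it does not prove that Kerberos keytabs, certificates or key management are working.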

HADOOP SECURE MODE
Apache Hadoop Secure Mode: 2.6.0 (March '14)
(Table: secure mode coverage of the threat model; surviving labels: Task Access, Rogue Node, User Spoofing)

APACHE KNOX
The Apache Knox Gateway is a REST API Gateway for interacting with Hadoop clusters. The Knox Gateway provides a single access point for all REST interactions with Hadoop clusters.
Knox can provide:
- Authentication (LDAP and Active Directory Authentication Provider)
- Federation/SSO (HTTP Header Based Identity Federation)
- Authorization (Service Level Authorization)
- Auditing
Integrations:
- WebHDFS (HDFS), Templeton (HCatalog), Stargate (HBase), Oozie, Hive/JDBC
Status: Incubating
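The "single access point" means a client talks to the gateway URL instead of to NameNode or DataNode addresses. The sketch below builds a WebHDFS request URL and a Basic authentication header for a Knox gateway. It assumes Knox's usual URL layout (`/gateway/<topology>/webhdfs/v1/...`) and port 8443, so check both against your deployment. The host name, topology name and credentials are placeholders, and no network call is made.

```python
import base64
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host, topology, hdfs_path, op, port=8443):
    """Build the gateway URL for one WebHDFS operation on an HDFS path."""
    path = quote(hdfs_path.lstrip("/"))
    query = urlencode({"op": op})
    return f"https://{gateway_host}:{port}/gateway/{topology}/webhdfs/v1/{path}?{query}"


def basic_auth_header(username, password):
    """Return the HTTP Basic header Knox can check against LDAP or AD."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # Placeholder host, topology and credentials for illustration only.
    url = knox_webhdfs_url("knox.example.com", "default", "/data/reports", "LISTSTATUS")
    headers = basic_auth_header("alice", "not-a-real-password")
    print(url)
    print(sorted(headers))
```

Because every REST call passes through this one URL, the gateway is also a natural place to collect the audit records and to terminate HTTPS.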

APACHE RANGER
A centralized security framework to manage fine-grained access control.
Status: Incubating
Authentication
- Kerberos in native Apache Hadoop
- Secured by the Apache Knox Gateway via the HTTP/REST API
Authorization
- on the folder and file level, via HDFS
- on the database, table and column level, via Hive
- on the table, column family and column level, via HBase
Audit
- User access auditing in HDFS, Hive and HBase at IP address, Resource/resource type, Timestamp, Access granted or denied
Data Protection
- Wire, volume and file/column encryption
- HDFS Transparent Encryption (TDE)
- Third-Party Partners (Hortonworks)
Administration
- Policy management, administration and ...
Source: /HDP2/HDP-2.2.0/Ranger U Guide v22/index.html#Item1.1

HADOOP SECURITY POLICY
Authentication of processes:
- May go into existing application security policy
Security Logging requirements:
- Which applications must be logged?
- Add node identifier to standard log records
De-anonymization Issues
- Sparse data can be de-anonymized through matching to public sources
- Could 200 days of tweets be matched to any of my de-identified data?
Key Management & Business Continuity
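The "add node identifier to standard log records" requirement can be illustrated with Python's standard `logging` module. The sketch below attaches a filter that stamps every record with a node name before it is formatted. The log format, the logger name and the use of the host name as the identifier are assumptions; Hadoop's own services log through Java logging frameworks, so this only shows the idea for custom tooling.

```python
import io
import logging
import socket


class NodeIdFilter(logging.Filter):
    """Attach a node identifier to every record passing through a handler."""

    def __init__(self, node_id=None):
        super().__init__()
        # Default to the local host name when no identifier is supplied.
        self.node_id = node_id or socket.gethostname()

    def filter(self, record):
        record.node = self.node_id
        return True  # never drop the record, only annotate it


def build_logger(name, stream, node_id=None):
    """Create a logger whose lines carry 'node=<id>' after the timestamp."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s node=%(node)s %(levelname)s %(message)s")
    )
    handler.addFilter(NodeIdFilter(node_id))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger("security-audit", buffer, node_id="datanode-07")
    log.info("login failed user=alice")
    print(buffer.getvalue().strip())
```

With thousands of nodes writing the same message types, a consistent node field is what lets the central logging system tell where an event happened.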

BUILD A SECURITY BASELINE
- Start with your Vendor's distribution
- Add your company's sauce
- Review Hadoop Security Benchmark project at the Center For Internet Security:
  - Apache Hadoop 2.6.0 Benchmark
  - Community Discussion
  - Editors and members get free access to validation tools
  - Everyone gets free access to baselines
  - Registration is moderated. That means human registrants are approved and receive a welcome email.
  - Link: http://tinyurl.com/HadoopSecurityBenchmark

HADOOP SECURITY REVIEW
1. Start with the threats
2. Choose your diagram
3. Ask the standard security questions:
  - Network Security
  - Authentication
  - Authorization
  - Security Audit
  - Data Protection
4. Update your policy
5. Build a Security Baseline

HADOOP SECURITY RESOURCES
1. Apache: "Hadoop in Secure Mode", http://tinyurl.com/hadoopSecureMode
2. Yahoo Hadoop ...
3. Securosis: "Securing Big Data: Security Recommendations for Hadoop and NoSQL Environments", 10/12/2012, Adrian ... ecuringBigData FINAL.pdf
4. Cloudera: "Introduction to Hadoop ..."
5. Hortonworks: "Security for Enterprise ..."
6. Center for Internet Security: Hadoop Security Baseline, http://tinyurl.com/HadoopSecurityBenchmark

QUESTIONS?
Updates at http://www.confidentialsoftware.com
