Hortonworks Data Platform - System Administration Guides


docs.hortonworks.com

Hortonworks Data Platform: System Administration Guides
Oct 28, 2014

Copyright © 2012-2014 Hortonworks, Inc. Some rights reserved.

The Hortonworks Data Platform, powered by Apache Hadoop, is a massively scalable and 100% open source platform for storing, processing and analyzing large volumes of data. It is designed to deal with data from many sources and formats in a very quick, easy and cost-effective manner. The Hortonworks Data Platform consists of the essential set of Apache Hadoop projects, including MapReduce, Hadoop Distributed File System (HDFS), HCatalog, Pig, Hive, HBase, ZooKeeper and Ambari. Hortonworks is the major contributor of code and patches to many of these projects. These projects have been integrated and tested as part of the Hortonworks Data Platform release process, and installation and configuration tools have also been included.

Unlike other providers of platforms built using Apache Hadoop, Hortonworks contributes 100% of our code back to the Apache Software Foundation. The Hortonworks Data Platform is Apache-licensed and completely open source. We sell only expert technical support, training and partner-enablement services. All of our technology is, and will remain, free and open source.

Please visit the Hortonworks Data Platform page for more information on Hortonworks technology. For more information on Hortonworks services, please visit either the Support or Training page. Feel free to Contact Us directly to discuss your specific needs.

Except where otherwise noted, this document is licensed under the Creative Commons Attribution ShareAlike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/legalcode).

Table of Contents

1. ACLs on HDFS
   1.1. Configuring ACLs on HDFS
   1.2. Using CLI Commands to Create and List ACLs
   1.3. ACLs Examples
      1.3.1. Introduction: ACLs Versus Permission Bits
      1.3.2. Example 1: Granting Access to Another Named Group
      1.3.3. Example 2: Using a Default ACL for Automatic Application to New Children
      1.3.4. Example 3: Blocking Access to a Sub-Tree for a Specific User
   1.4. ACLs on HDFS Features
   1.5. Use Cases for ACLs on HDFS
      1.5.1. Multiple Users
      1.5.2. Multiple Groups
      1.5.3. Hive Partitioned Tables
      1.5.4. Default ACLs
      1.5.5. Minimal ACL/Permissions Only
      1.5.6. Block Access to a Sub-Tree for a Specific User
      1.5.7. ACLs with Sticky Bit
2. Capacity Scheduler
   2.1. Introduction
   2.2. Enabling Capacity Scheduler
   2.3. Setting up Queues
   2.4. Controlling Access to Queues with ACLs
   2.5. Managing Cluster Capacity with Queues
   2.6. Setting User Limits
   2.7. Application Reservations
   2.8. Starting and Stopping Queues
   2.9. Setting Application Limits
   2.10. Preemption
   2.11. Scheduler User Interface
3. Centralized Cache Management in HDFS
   3.1. Overview
   3.2. Caching Use Cases
   3.3. Caching Architecture
   3.4. Caching Terminology
   3.5. Configuring Centralized Caching
      3.5.1. Native Libraries
      3.5.2. Configuration Properties
      3.5.3. OS Limits
   3.6. Using Cache Pools and Directives
      3.6.1. Cache Pool Commands
      3.6.2. Cache Directive Commands
4. Configuring Rack Awareness on HDP
   4.1. Create a Rack Topology Script
   4.2. Add the Topology Script Property to core-site.xml
   4.3. Restart HDFS and MapReduce
   4.4. Verify Rack Awareness
5. Using DistCp to Copy Files
   5.1. Using DistCp
   5.2. Command Line Options
   5.3. Update and Overwrite
   5.4. DistCp and Security Settings
   5.5. DistCp and HDP Version
   5.6. DistCp Data Copy Matrix: HDP1/HDP2 to HDP2
   5.7. Copying Data from HDP-2.x to HDP-1.x Clusters
   5.8. DistCp Architecture
      5.8.1. DistCp Driver
      5.8.2. Copy-listing Generator
      5.8.3. InputFormats and MapReduce Components
   5.9. DistCp Frequently Asked Questions
   5.10. Appendix
      5.10.1. Map Sizing
      5.10.2. Copying Between Versions of HDFS
      5.10.3. MapReduce and Other Side-Effects
      5.10.4. SSL Configurations for HSFTP Sources
6. Decommissioning Slave Nodes
   6.1. Prerequisites
   6.2. Decommission DataNodes or NodeManagers
      6.2.1. Decommission DataNodes
      6.2.2. Decommission NodeManagers
   6.3. Decommission HBase RegionServers
7. Manually Add Slave Nodes to a HDP Cluster
   7.1. Prerequisites
   7.2. Add Slave Nodes
   7.3. Add HBase RegionServer
8. NameNode High Availability for Hadoop
   8.1. Architecture
   8.2. Hardware Resources
   8.3. Deploy NameNode HA Cluster
      8.3.1. Configure NameNode HA Cluster
      8.3.2. Deploy NameNode HA Cluster
      8.3.3. Deploy Hue with an HA Cluster
      8.3.4. Deploy Oozie with HA Cluster
   8.4. Operating a NameNode HA Cluster
   8.5. Configure and Deploy NameNode Automatic Failover
      8.5.1. Prerequisites
      8.5.2. Instructions
      8.5.3. Configuring Oozie Failover
   8.6. Appendix: Administrative Commands
9. ResourceManager High Availability for Hadoop
   9.1. Hardware Resources
   9.2. Deploy ResourceManager HA Cluster
      9.2.1. Configure Manual or Automatic ResourceManager Failover
      9.2.2. Deploy the ResourceManager HA Cluster
      9.2.3. Minimum Settings for Automatic ResourceManager HA Configuration
10. Hadoop Archives
   10.1. Introduction
   10.2. Hadoop Archive Components
   10.3. Creating a Hadoop Archive
   10.4. Looking Up Files in Hadoop Archives
   10.5. Hadoop Archives and MapReduce
11. High Availability for Hive Metastore
   11.1. Use Cases and Fail Over Scenarios
   11.2. Software Configuration
      11.2.1. Install HDP
      11.2.2. Update the Hive Metastore
      11.2.3. Validate configuration
12. Highly Available Reads with HBase
   12.1. Understanding HBase Concepts
      12.1.1. Region Servers and Secondary Mode
      12.1.2. Timeline and Strong Data Consistency
   12.2. Enabling HA Reads for HBase
   12.3. Creating Highly

1. ACLs on HDFS

This guide describes how to use Access Control Lists (ACLs) on the Hadoop Distributed File System (HDFS).
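As a quick orientation before the detailed sections that follow, here is a minimal sketch of the workflow this chapter covers: enable ACL support on the NameNode, then add and inspect an ACL entry from the command line. The directory /user/project and the user name maria are hypothetical placeholders; the dfs.namenode.acls.enabled property and the setfacl/getfacl subcommands are the standard HDFS mechanisms covered in Sections 1.1 and 1.2.

   # Prerequisite: in hdfs-site.xml, set dfs.namenode.acls.enabled to true
   # and restart the NameNode, since ACLs are disabled by default.

   # Grant the hypothetical user "maria" read and execute access to /user/project.
   hdfs dfs -setfacl -m user:maria:r-x /user/project

   # Display the directory's ACL to confirm the new entry.
   hdfs dfs -getfacl /user/project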