Training Catalog - Hortonworks


Training Catalog
Apache Hadoop Training from the Experts
Copyright 2016, Hortonworks, Inc. All rights reserved.
February 2016


Hortonworks University

Hortonworks University provides an immersive and valuable real-world experience with scenario-based training. Offerings include public, private on-site, and virtual instructor-led courses, a self-paced learning library, and an Academic Program. All courses include industry-leading lecture and hands-on labs.

Individualized Learning Paths

Hadoop Certification

Join an exclusive group of professionals with demonstrated skills and the qualifications to prove it. Hortonworks certified professionals are recognized as leaders in the field.

Hortonworks Certified Developer: HDP Certified Developer (HDPCD), HDP Certified Developer: Java (HDPCD: Java)
Hortonworks Certified Administrator: HDP Certified Administrator (HDPCA)


Table of Contents

Hortonworks University Self-Paced Learning Library
HDP Overview: Apache Hadoop Essentials
HDP Analyst: Apache HBase Essentials
HDP Analyst: Data Science
HDP Developer: Apache Pig and Hive
HDP Developer: Java
HDP Developer: Windows
HDP Developer: Custom YARN Applications
HDP Developer: Apache Spark using Python
HDP Developer: Apache Spark using Scala
HDP Developer: Storm and Trident Fundamentals
HDP Operations: Hadoop Administration 1
HDP Operations: Hadoop Administration 2
HDP Operations: Apache HBase Advanced Management
HDP Operations: Hortonworks Data Flow
HDP Operations: Security
HDP Operations: Migrating to the Hortonworks Data Platform
HDP Certified Administrator (HDPCA)
HDP Certified Developer (HDPCD)
HDP Certified Java Developer (HDPCD Java)
Hortonworks University Academic Program


Hortonworks University Self-Paced Learning Library

Overview
The Hortonworks University Self-Paced Learning Library is an on-demand, online learning repository that is accessed using a Hortonworks University account. Learners can view lessons anywhere, at any time, and complete lessons at their own pace. Lessons can be stopped and started as needed, and completion is tracked via the Hortonworks University Learning Management System. This learning library makes it easy for Hadoop Administrators, Data Analysts, and Developers to continuously learn and stay up-to-date on the Hortonworks Data Platform.

Hortonworks University courses are designed and developed by Hadoop experts and provide an immersive and valuable real-world experience. In our scenario-based training courses, we offer unmatched depth and expertise. We prepare you to be an expert with highly valued, practical skills and prepare you to successfully complete Hortonworks Technical Certifications. The Self-Paced Learning Library accelerates time to Hadoop competency. In addition, the learning library content is constantly being expanded, with new content added on an ongoing basis.

Self-Paced Learning Content
• HDP Overview: HDP Essentials
• HDP Developer Learning Path
  o Apache Pig and Hive
  o Windows
  o Developing Applications with Java
  o Developing Custom YARN Applications
  o Storm and Trident Fundamentals
  o Apache Spark using Python**
  o Apache Spark using Scala**
• HDP Operations
  o Hadoop Administration I
  o Hortonworks Data Flow**
  o Apache HBase Advanced Management
  o Migrating to HDP
  o Hadoop Administration II**
  o Hadoop Security**
• HDP Analyst
  o Apache Pig and Hive
  o Apache HBase Essentials
  o Data Science
** Coming soon!

Duration
Access to the Hortonworks University Self-Paced Learning Library is provided for a 12-month subscription period per individual named user. The subscription includes access to over 400 hours of individual lessons.

Accessing the Self-Paced Learning Library
Access to the Hortonworks Self-Paced Learning Library is included as part of the Hortonworks Enterprise, Enterprise Plus & Premiere Subscriptions for each named Support Contact. Additional Self-Paced Learning Library subscriptions can be purchased on a per-user basis for individuals who are not named Support Contacts.

Target Audience
The Hortonworks University Self-Paced Learning Library is designed for architects, developers, analysts, data scientists, and IT decision makers – as well as those new to Hadoop: essentially anyone with a need or desire to learn more about Apache Hadoop and the Hortonworks Data Platform framework.

Prerequisites
None.

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop deployments.

For more information contact: trainingops@hortonworks.com

About Hortonworks
Hortonworks develops, distributes and supports the only 100 percent open source distribution of Apache Hadoop explicitly architected, built and tested for enterprise-grade deployments.

US: 1.855.846.7866
International: 1.408.916.4121
www.hortonworks.com
5470 Great America Parkway, Santa Clara, CA 95054 USA

HDP Overview: Apache Hadoop Essentials

Overview
This course provides a technical overview of Apache Hadoop. It includes high-level information about concepts, architecture, operation, and uses of the Hortonworks Data Platform (HDP) and the Hadoop ecosystem. The course provides an optional primer for those who plan to attend a hands-on, instructor-led course.

Course Objectives
• Describe what makes data "Big Data"
• List data types stored and analyzed in Hadoop
• Describe how Big Data and Hadoop fit into your current infrastructure and environment
• Describe fundamentals of:
  o the Hadoop Distributed File System (HDFS)
  o YARN
  o MapReduce
  o Hadoop frameworks (Pig, Hive, HCatalog, Storm, Solr, Spark, HBase, Oozie, Ambari, ZooKeeper, Sqoop, Flume, and Falcon)
• Recognize use cases for Hadoop
• Describe the business value of Hadoop
• Describe new technologies like Tez and the Knox Gateway

Hands-On Labs
There are no labs for this course.

Duration
8 hours, online.

Target Audience
Data architects, data integration architects, managers, C-level executives, decision makers, technical infrastructure team, and Hadoop administrators or developers who want to understand the fundamentals of Big Data and the Hadoop ecosystem.

Prerequisites
No previous Hadoop or programming knowledge is required. Students will need browser access to the Internet.

Format
100% self-paced
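To give a flavor of the MapReduce fundamentals this overview introduces, here is a minimal word-count sketch in Python written for Hadoop Streaming. It is illustrative only and not part of the course; the file names mapper.py and reducer.py are hypothetical, and it assumes the input is plain text split on whitespace.

#!/usr/bin/env python
# mapper.py (hypothetical name): emit "word<TAB>1" for every word on stdin.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word, 1))

#!/usr/bin/env python
# reducer.py (hypothetical name): sum the counts for each word.
# Hadoop Streaming sorts the mapper output by key, so equal words arrive together.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))

Such scripts are typically launched with the Hadoop Streaming JAR, roughly: hadoop jar hadoop-streaming.jar -input <in> -output <out> -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py (paths shown here are placeholders).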

HDP Analyst: Apache HBase Essentials

Overview
This course is designed for big data analysts who want to use the HBase NoSQL database, which runs on top of HDFS to provide real-time read/write access to sparse datasets. Topics include HBase architecture, services, installation and schema design.

Course Objectives
• How HBase integrates with Hadoop and HDFS
• Architectural components and core concepts of HBase
• HBase functionality
• Installing and configuring HBase
• HBase schema design
• Importing and exporting data
• Backup and recovery
• Monitoring and managing HBase
• How Apache Phoenix works with HBase
• How HBase integrates with Apache ZooKeeper
• HBase services and data operations
• Optimizing HBase access

Hands-On Labs
• Using Hadoop and MapReduce
• Using HBase
• Importing Data from MySQL to HBase
• Using Apache ZooKeeper
• Examining Configuration Files
• Using Backup and Snapshot
• HBase Shell Operations
• Creating Tables with Multiple Column Families
• Exploring HBase Schema
• Blocksize and Bloom Filters
• Exporting Data
• Using a Java Data Access Object Application to Interact with HBase

Duration
2 days

Target Audience
Architects, software developers, and analysts responsible for implementing non-SQL databases in order to handle sparse data sets commonly found in big data use cases.

Prerequisites
Students must have basic familiarity with data management systems. Familiarity with Hadoop or databases is helpful but not required. Students new to Hadoop are encouraged to attend the HDP Overview: Apache Hadoop Essentials course.
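The course labs exercise HBase data operations through the HBase shell and a Java data access object. As a rough Python illustration of the same put/get pattern (not part of the course), here is a minimal sketch using the third-party happybase Thrift client, assuming an HBase Thrift server is running on localhost and a table named "user" with column family "info" already exists (both names are hypothetical).

# Illustrative only: basic HBase put/get via the third-party happybase client.
# Assumes an HBase Thrift server on localhost:9090 and an existing table
# 'user' with column family 'info' (hypothetical names).
import happybase

connection = happybase.Connection('localhost', port=9090)
table = connection.table('user')

# Store a sparse row: only the columns supplied are written.
table.put(b'row-00001', {b'info:name': b'Ada', b'info:city': b'Santa Clara'})

# Read the row back as a dict of column -> value.
print(table.row(b'row-00001'))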

HDP Analyst: Data Science

Overview
This course provides instruction on the processes and practice of data science, including machine learning and natural language processing. Included are: tools and programming languages (Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, scikit-learn), the Natural Language Toolkit (NLTK), and Spark MLlib.

Course Objectives
• Recognize use cases for data science on Hadoop
• Describe the Hadoop and YARN architecture
• Describe supervised and unsupervised learning differences
• Use Mahout to run a machine learning algorithm on Hadoop
• Describe the data science life cycle
• Use Pig to transform and prepare data on Hadoop
• Write a Python script
• Describe options for running Python code on a Hadoop cluster
• Write a Pig User-Defined Function in Python
• Use Pig streaming on Hadoop with a Python script
• Use machine learning algorithms
• Describe use cases for Natural Language Processing (NLP)
• Use the Natural Language Toolkit (NLTK)
• Describe the components of a Spark application
• Write a Spark application in Python
• Run machine learning algorithms using Spark MLlib
• Take data science into production

Hands-On Content
• Lab: Setting Up a Development Environment
• Demo: Block Storage
• Lab: Using HDFS Commands
• Demo: MapReduce
• Lab: Using Apache Mahout for Machine Learning
• Demo: Apache Pig
• Lab: Getting Started with Apache Pig
• Lab: Exploring Data with Pig
• Lab: Using the IPython Notebook
• Demo: The NumPy Package
• Demo: The pandas Library
• Lab: Data Analysis with Python
• Lab: Interpolating Data Points
• Lab: Defining a Pig UDF in Python
• Lab: Streaming Python with Pig
• Demo: Classification with Scikit-Learn
• Lab: Computing K-Nearest Neighbor
• Lab: Generating a K-Means Clustering
• Lab: POS Tagging Using a Decision Tree
• Lab: Using NLTK for Natural Language Processing
• Lab: Classifying Text Using Naive Bayes
• Lab: Using Spark Transformations and Actions
• Lab: Using Spark MLlib
• Lab: Creating a Spam Classifier with MLlib

Duration
3 days

Target Audience
Architects, software developers, analysts and data scientists who need to apply data science and machine learning on Hadoop.

Prerequisites
Students must have experience with at least one programming or scripting language, knowledge in statistics and/or mathematics, and a basic understanding of big data and Hadoop principles. Students new to Hadoop are encouraged to attend the HDP Overview: Apache Hadoop Essentials course.

Format
50% Lecture/Discussion
50% Hands-on Labs
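As a flavor of the scikit-learn material covered in labs such as "Generating a K-Means Clustering", here is a minimal K-means sketch in Python. It is illustrative only; the dataset and parameter values are made up and are not taken from the course labs.

# Minimal K-means clustering sketch with scikit-learn (illustrative; not course material).
import numpy as np
from sklearn.cluster import KMeans

# A tiny made-up 2-D dataset: two loose groups of points.
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [8.0, 8.0], [8.5, 7.5], [7.8, 8.3]])

# Fit two clusters, then inspect the assignments and centroids.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print(kmeans.labels_)           # cluster index for each point
print(kmeans.cluster_centers_)  # coordinates of the two centroids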

HDP Developer: Apache Pig and Hive

Overview
This course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include: Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition and using Pig and Hive to perform data analytics on Big Data. Labs are executed on a 7-node HDP cluster.

Course Objectives
• Describe Hadoop, YARN and use cases for Hadoop
• Describe Hadoop ecosystem tools and frameworks
• Describe the HDFS architecture
• Use the Hadoop client to input data into HDFS
• Transfer data between Hadoop and a relational database
• Explain YARN and MapReduce architectures
• Run a MapReduce job on YARN
• Use Pig to explore and transform data in HDFS
• Understand how Hive tables are defined and implemented
• Use Hive to explore and analyze data sets
• Use the new Hive windowing functions
• Explain and use the various Hive file formats
• Create and populate a Hive table that uses ORC file formats
• Use Hive to run SQL-like queries to perform data analysis
• Use Hive to join datasets using a variety of techniques, including Map-side joins and Sort-Merge-Bucket joins
• Write efficient Hive queries
• Create ngrams and context ngrams using Hive
• Perform data analytics like quantiles and page rank on Big Data using the DataFu Pig library
• Explain the uses and purpose of HCatalog
• Use HCatalog with Pig and Hive
• Define a workflow using Oozie
• Schedule a recurring workflow using the Oozie Coordinator

Hands-On Labs
• Lab: Starting an HDP 2.3 Cluster
• Demo: Block Storage
• Lab: Using HDFS Commands
• Lab: Importing and Exporting Data in HDFS
• Lab: Using Flume to Import Log Files into HDFS
• Demo: MapReduce
• Lab: Running a MapReduce Job
• Demo: Apache Pig
• Lab: Getting Started with Apache Pig
• Lab: Exploring Data with Apache Pig
• Lab: Splitting a Dataset
• Use Sqoop to transfer data between HDFS and a RDBMS
• Run MapReduce and YARN application jobs
• Explore and transform data using Pig
• Split and join a dataset using Pig
• Use Pig to transform and export a dataset for use with Hive
• Use HCatLoader and HCatStorer
• Use Hive to discover useful information in a dataset
• Describe how Hive queries get executed as MapReduce jobs
• Perform a join of two datasets with Hive
• Use advanced Hive features: windowing, views, ORC files
• Use Hive analytics functions
• Write a custom reducer in Python
• Analyze and sessionize clickstream data
• Compute quantiles of NYSE stock prices
• Use Hive to compute ngrams on Avro-formatted files
• Lab: Exploring Spark SQL
• Lab: Defining an Oozie Workflow

Duration
4 days

Target Audience
Software developers who need to understand and develop applications for Hadoop.

Prerequisites
Students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. No prior Hadoop knowledge is required.

Format
50% Lecture/Discussion
50% Hands-on Labs
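The course includes writing Pig user-defined functions and custom reducers in Python. As a minimal sketch of a Pig UDF written in Python (Jython), here is an illustrative example; the file name myudfs.py and the function to_upper are hypothetical and not taken from the labs.

# Illustrative Pig UDF in Python (Jython); myudfs.py is a hypothetical file name.
# When Pig loads this script with "USING jython", its script engine injects the
# outputSchema decorator into the module namespace, so no import is needed.
@outputSchema("upper_word:chararray")
def to_upper(word):
    """Return the input string in upper case (None stays None)."""
    if word is None:
        return None
    return word.upper()

From a Pig Latin script the UDF would be registered and applied along the lines of: REGISTER 'myudfs.py' USING jython AS myudfs; and then FOREACH words GENERATE myudfs.to_upper(word); (relation and field names here are placeholders).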

HDP Developer: Java

Overview
This advanced course provides Java programmers a deep-dive into Hadoop application development. Students will learn how to design and develop efficient and effective MapReduce applications for Hadoop using the Hortonworks Data Platform, including how to implement combiners, partitioners, secondary sorts, custom input and output formats, joining large datasets, unit testing, and developing UDFs for Pig and Hive. Labs are run on a 7-node HDP 2.1 cluster running in a virtual machine that students can keep for use after the training.

Course Objectives
• Describe Hadoop 2 and the Hadoop Distributed File System
• Describe the YARN framework
• Develop and run a Java MapReduce application on YARN
• Use combiners and in-map aggregation
• Write a custom partitioner to avoid data skew on reducers
• Perform a secondary sort
• Recognize use cases for built-in input and output formats
• Write a custom MapReduce input and output format
• Optimize a MapReduce job
• Configure MapReduce to optimize mappers and reducers
• Develop a custom RawComparator class
• Distribute files as LocalResources
• Describe and perform join techniques in Hadoop
• Perform unit tests using the MRUnit API
• Describe the basic architecture of HBase
• Write an HBase MapReduce application
• List use cases for Pig and Hive
• Write a simple Pig script to explore and transform big data
• Write a Pig UDF (User-Defined Function) in Java
• Write a Hive UDF in Java
• Use the JobControl class to create a MapReduce workflow
• Use Oozie to define and schedule workflows

Hands-On Labs
• Configuring a Hadoop Development Environment
• Putting data into HDFS using Java
• Write a distributed grep MapReduce application
• Write an inverted index MapReduce application
• Configure and use a combiner
• Writing custom combiners and partitioners
• Globally sort output using the TotalOrderPartitioner
• Writing a MapReduce job to sort data using a composite key
• Writing a custom InputFormat class
• Writing a custom OutputFormat class
• Compute a simple moving average of stock price data
• Use data compression
• Define a RawComparator
• Perform a map-side join
• Using a Bloom filter
• Unit testing a MapReduce job
• Importing data into HBase
• Writing an HBase MapReduce job
• Writing User-Defined Pig and Hive functions
• Defining an Oozie workflow

Duration
4 days

Target Audience
Experienced Java software engineers who need to develop Java MapReduce applications for Hadoop.

Prerequisites
Students must have experience developing Java applications and using a Java IDE. Labs are completed using the Eclipse IDE and Gradle. No prior Hadoop knowledge is required.

Format
50% Lecture/Discussion
50% Hands-on Labs
