Integrated Program in Big Data and Data Science


CONTINUING STUDIES

Table of Contents
About the Course
Key Features of the Integrated Program in Big Data and Data Science
Learning Path
Key Learning Objectives
Step 1: Data Science with R
Data Science with Python
Step 2: Big Data Hadoop and Spark Developer
Step 3: Tableau Desktop
Step 4: Machine Learning
Electives

http://learnmore.duke.edu

About the Course
The Big Data and Data Science program is a five-course, integrated, all-inclusive certificate program for Big Data and Data Science professionals. The curriculum is comprehensive and spans the major technologies in big data, data science, and reporting/visualization. Renowned industry experts and big data influencers designed the recommended learning path for this certificate program to maximize your learning potential. Each course in the program builds on the previous one, so concepts introduced early in the learning path will contribute to your proficiency in the later courses.

Resources such as live virtual teaching sessions, access to an instructor, and non-graded electives of your choosing reinforce this program's learning experience.

Key Features
Industry-recommended learning path
Access to 300 hours of content created by industry experts
Hands-on project execution on CloudLabs
Aligned with the Cloudera CCA175 Certification and the Tableau Desktop 10 Qualified Associate Certification
Duke University Certificate upon successful completion of the program
30 real-life industry projects in retail, insurance, healthcare, banking, telecommunications, airline, and social media

Learning Path
Step 1: Data Science with R and Data Science with Python
Step 2: Big Data Hadoop and Spark Developer
Step 3: Tableau Desktop
Step 4: Machine Learning

Key Learning Objectives
This learning path is designed for professionals interested in analytics who want to develop skills in both big data and data science.

Data Science with R
Learn the R programming language and the key statistical and predictive analytics concepts.

Data Science with Python
Learn to use Python packages such as NumPy, SciPy, Pandas, and Scikit-learn to perform data analysis.

Big Data Hadoop and Spark Developer
Learn the components of the Hadoop and Spark ecosystem. The course is aligned with the Cloudera CCA175 certification.

Tableau Desktop and Visualization Training
Learn the various aspects of Tableau. The course is aligned with the Tableau Desktop Qualified Associate certification.

Machine Learning
Gain an understanding of machine learning applications and algorithms. The course also covers deep learning and machine learning with Spark.

Step 1: Data Science with R
This course is designed to impart in-depth knowledge of the data analytics techniques that can be performed using R. It includes real-life projects, case studies, and R CloudLabs for practice.

Key Learning Objectives
Gain a foundational understanding of business analytics.
Learn R programming and how R statements are executed.
Gain an in-depth understanding of the data structures used in R, and learn to import and export data in R.
Define and use the various apply functions and dplyr functions.
Recognize and use the various graphics in R for data visualization.
Gain a basic understanding of key statistical concepts.
Understand hypothesis testing as a method to drive business decisions.
Become familiar with regression models and classification techniques.
Learn and use association rules and the Apriori algorithm.
Gain an understanding of clustering methods, including K-means, DBSCAN, and hierarchical clustering.

Step 1: Data Science with Python
Learn data analytics, machine learning, and web scraping using Python programming. Gain an in-depth understanding of Python packages such as NumPy, SciPy, Pandas, and Scikit-learn for performing data analysis, implementing machine learning models, and natural language processing (NLP).

Key Learning Objectives
Gain an in-depth understanding of data wrangling, data exploration, data visualization, hypothesis building, and testing.
Understand essential Python programming concepts such as data types, tuples, lists, dicts, basic operators, and functions.
Perform high-level mathematical computing using the NumPy package and its large library of mathematical functions.
Conduct scientific and technical computing using the SciPy package and its sub-packages, such as Integrate, Optimize, Statistics, IO, and Weave.
Perform data analysis and manipulation using the data structures and tools provided in the Pandas package.
Gain knowledge of machine learning using the Scikit-learn package.
Use Python's matplotlib library for data visualization.
Extract useful data from websites by performing web scraping.
Integrate Python with Hadoop, Spark, and MapReduce.

Step 2: Big Data Hadoop & Spark Developer
This course is designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. It contains real-life projects and case studies to be executed in CloudLabs and is aligned with the Cloudera CCA175 certification.

Key Learning Objectives
Understand the architecture of HDFS and YARN, and learn how to work with them for storage and resource management.
Understand MapReduce and its characteristics.
Get an overview of Sqoop and Flume and learn how to ingest data with them.
Create databases and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning.
Learn Flume architecture, sources, sinks, and configurations.
Understand HBase, its architecture, and its data storage.
Gain a working knowledge of Pig and its components.
Perform functional programming in Spark, understand RDDs, and build Spark applications.
Learn Spark SQL and how to create, transform, and query DataFrames.
The course is aligned with the Cloudera Big Data CCA175 certification.

Step 3: Tableau Desktop 10
This course focuses on helping you learn Tableau Desktop 10 skills such as building visualizations, analytics, and dashboards. It is also aligned with the Tableau Desktop 10 Qualified Associate exam.

Key Learning Objectives
Grasp the concepts of Tableau Desktop 10, learn Tableau statistics, and build interactive dashboards.
Learn about data connections and how to organize and simplify data.
Understand formatting, annotations, and spatial analysis.
Become familiar with special field types and Tableau-generated fields.
Review how to use charts such as Pareto, waterfall, Gantt, box plot, and sparkline charts, and perform market basket analysis.
Learn fundamental calculations along with automatic and custom splits, ad hoc analytics, and level of detail (LOD) calculations.
Understand the process of creating and using parameters, and gain command of mapping concepts such as custom geocoding and radial selections.

Step 4: Machine Learning
This course provides advanced-level training on machine learning applications and algorithms. It gives you hands-on experience with multiple highly sought-after machine learning skills in both supervised and unsupervised learning. You will learn to apply machine learning algorithms such as regression, clustering, classification, and recommendation. The unique case study approach ensures you work hands-on with data while you learn. You'll also receive training in deep learning and machine learning with Spark, skills that are in high demand today.

Key Learning Objectives
Classify the types of learning, including supervised and unsupervised learning.
Identify the various applications of machine learning algorithms.
Apply supervised learning techniques: linear and logistic regression.
Understand classification data and models.
Use unsupervised learning algorithms such as clustering, and explore deep learning and recommendation systems.
Gain experience using machine learning with Spark.

Elective Courses
Data Science with SAS
The Data Science with SAS training is designed to impart in-depth knowledge of the SAS programming language, SAS tools, and various advanced analytics techniques.

Apache Spark and Scala
In this Apache Spark and Scala course, you will learn essential skills such as Spark Streaming, Spark SQL, machine learning programming, GraphX programming, and shell scripting with Spark.

MongoDB Developer and Administrator
MongoDB training helps you learn data modeling, ingestion, querying, sharding, and data replication with MongoDB. It also covers installing, updating, and maintaining a MongoDB environment.

Cassandra
The Apache Cassandra training provides you with in-depth knowledge of Cassandra architecture, features, and configuration. It also covers the Hadoop ecosystem around this NoSQL database.

Business Analytics with Excel
Business Analytics with Excel training is designed to introduce you to the world of analytics using the most commonly used analytics tool, Microsoft Excel. The training will equip you with the concepts and hard skills required to work in this industry.

Apache Storm
Apache Storm training provides you with experience in Apache Storm, a stream processing technology for Big Data.

Impala: An Open Source SQL Engine for Hadoop Training Course
“Impala: An Open Source SQL Engine for Hadoop” is an ideal course package for individuals who want to understand the basic concepts of Impala, a massively parallel processing (MPP) SQL query engine that runs on Apache Hadoop. Upon completing this course, learners will be able to interpret the role of Impala in the Big Data ecosystem.

Apache Kafka
The Apache Kafka course guides participants through Kafka architecture, installation, interfaces, and configuration. Participants are also trained in the fundamental concepts of Big Data.

Tableau Server 10 Qualified Associate
The Tableau Server 10 Qualified Associate course is designed to impart the in-depth understanding and skills needed to implement, administer, and manage Tableau Server 10. This course is designed for Tableau Server users and administrators.

Big Data Hadoop Administrator
The Big Data and Hadoop Administrator course is aligned with Cloudera’s CCAH “CCA-500” certification. It covers the core Hadoop distributions: Apache Hadoop and the vendor-specific distribution CDH (Cloudera Distribution of Hadoop).

Instructors
Ronald Van Loon
Top 10 Big Data & Data Science Influencer, Director - Adversitement
Named by Onalytica as one of the three most influential people in Big Data, Ronald writes for a number of leading Big Data and Data Science websites, including Datafloq, Data Science Central, and The Guardian. He is a regular speaker at renowned events.

Sina Jamshidi
Big Data Lead at Bell Labs
Sina has over 10 years of experience in the technology field as a Big Data Architect at Bell Labs and as a Platinum-level trainer. Sina is passionate about building a Big Data education ecosystem and has contributed to a number of public and journal publications.

Simon Tavasoli
Analytics Lead at Cancer Care Ontario
Simon is a Data Scientist with 12 years of experience in healthcare analytics. He has a Master's in Biostatistics from the University of Western Ontario. Simon is passionate about teaching data science and has published several journal articles on preventive medicine analytics.

Alvaro Fuentes
Founder and Data Scientist at Quant Company
Alvaro is a Data Scientist who founded Quant Company. He has also worked as a lead economic analyst at the Central Bank of Guatemala. He holds an M.S. in Quantitative Economics and Applied Mathematics and is actively involved in consulting and training in the data science space.

Paul Sharkov
Data Scientist at BMO Financial Group, Member of SAS Canada Community
Paul is the lead SAS Data Scientist at Bank of Montreal. He is a SAS Certified Predictive Modeler, SAS Statistical Business Analyst, and SAS Certified Advanced Programmer. Paul is passionate about sharing his knowledge of how data science can support data-driven business decisions.

Live virtual classrooms are facilitated by qualified industry subject matter experts in alignment with the curriculum designed by the instructors listed above.

Duke Continuing Studies
Box 90700
Duke University East Campus
Durham, NC 27708-0700
(919)
