Big Data Course and Learning Model for Online Education (LMO) at the Laureate Online Education (University of Liverpool)
Yuri Demchenko (University of Amsterdam)
Emanuel Gruengard (Laureate Online Education)
RDA EDISON Workshop, 21 September 2014, Amsterdam

Outline
Need for professional education in Big Data
Big Data definition and Big Data Architecture Framework (BDAF)
Common Body of Knowledge in Big Data
Collaborative Online Learning Model Principles at Laureate Online Education (LOE)
Big Data and Data Analytics Course
Bloom’s Taxonomy and Andragogy
Summary and next steps
EDISON Wsh 21 Sept 2014, Big Data Education at LOE

Professional Education Objectives
An effective professional education needs to provide a professional level of knowledge that lets students achieve the following:
1) Master basic concepts and major application areas
2) Compare similar concepts (and their inter-relations) and alternatives, as well as application-specific areas
3) Appraise basic technologies and their relation to the basic concepts
Challenges: Big Data is a very wide technology domain
  – Compared to the still narrower Cloud Computing
New types of skills in Big Data
  – Analytics and research methods

Improved: 6 (5+1) V’s of Big Data
[Figure: the 6 Vs of Big Data, adopted in general by NIST BD-WG]
Volume: Terabytes, Records/Arch, Tables, Files, Distributed
Variety: Dynamic
Velocity: Batch, Real/near-time, Processes, Streams
Variability: Changing data, Changing model, Linkage
Veracity: Trustworthiness, Authenticity, Origin, Reputation, Availability, Accountability
Generic Big Data Properties (the commonly accepted 3V’s of Big Data): Volume, Variety, Velocity
Acquired Properties (after entering system): Value, Veracity, Variability
BDDAC2014 @CTS2014, Big Data Architecture Framework

Big Data Definition: From 6V to 5 Parts (1)
(1) Big Data Properties: 5V
  – Volume, Variety, Velocity, Value, Veracity
  – Additionally: Data Dynamicity (Variability)
(2) New Data Models
  – Data linking, provenance and referral integrity
  – Data Lifecycle and Variability/Evolution
(3) New Analytics
  – Real-time/streaming analytics, interactive and machine learning analytics
(4) New Infrastructure and Tools
  – High performance Computing, Storage, Network
  – Heterogeneous multi-provider services integration
  – New Data Centric (multi-stakeholder) service models
  – New Data Centric security models for trusted infrastructure and data processing and storage
(5) Source and Target
  – High velocity/speed data capture from a variety of sensors and data sources
  – Data delivery to different visualisation and actionable systems and consumers
  – Full digitised input and output, (ubiquitous) sensor networks, full digital control

Big Data Definition: From 6V to 5 Parts (2)
Refining the Gartner definition:
“Big data is (1) high-volume, high-velocity and high-variety information assets that demand (3) cost-effective, innovative forms of information processing for (5) enhanced insight and decision making”
Big Data (Data Intensive) Technologies aim to process (1) high-volume, high-velocity, high-variety data (sets/assets) to extract the intended data value and ensure high veracity of the original data and the obtained information. This demands cost-effective, innovative forms of data and information processing (analytics) for enhanced insight, decision making, and process control. All of these demand (should be supported by) new data models (supporting all data states and stages during the whole data lifecycle) and new infrastructure services and tools that also allow obtaining (and processing) data from a variety of sources (including sensor networks) and delivering data in a variety of forms to different data and information consumers and devices.
(1) Big Data Properties: 5V
(2) New Data Models
(3) New Analytics
(4) New Infrastructure and Tools
(5) Source and Target

Big Data Architecture Framework (BDAF)
(1) Data Models, Structures, Types
  – Data formats, non/relational, file systems, etc.
(2) Big Data Management
  – Big Data Lifecycle (Management) Model: Big Data transformation/staging
  – Provenance, Curation, Archiving
(3) Big Data Analytics and Tools
  – Big Data Applications: target use, presentation, visualisation
(4) Big Data Infrastructure (BDI)
  – Storage, Compute, (High Performance Computing,) Network
  – Sensor network, target/actionable devices
  – Big Data Operational support
(5) Big Data Security
  – Data security in-rest, in-move, trusted processing environments

Big Data Infrastructure and Analytics Tools
Big Data Infrastructure
  – Heterogeneous multi-provider inter-cloud infrastructure
  – Data management infrastructure
  – Collaborative Environment (user/group management)
  – Advanced high performance (programmable) network
  – Security infrastructure
Big Data Analytics
  – High Performance Computer Clusters (HPCC)
  – Analytics/processing: Real-time, Interactive, Batch, Streaming
  – Big Data Analytics tools and applications

Data Lifecycle/Transformation Model
[Figure: data lifecycle from Data Source to Consumer, with stage labels including Data Analytics Application and Delivery, Visualisation, and a Data Model label attached to each stage]
Common Data Model?
  – Data Variety and Variability
  – Semantic Interoperability
Data (inter)linking?
  – PID/OID, ORCID
  – Identification
  – Privacy, Opacity
Data repurposing, Analytics re-factoring, Secondary processing
Does the Data Model change along the lifecycle or data evolution?
Identifying and linking data
  – Persistent identifiers
  – Data ownership
  – Traceability vs Opacity
  – Referral integrity

Big Data and Data Science Skill Taxonomy
Data Science Competencies Taxonomy by HPC tencies/
  – Undergraduate Level Computational Science Competencies
  – Graduate Level Computational Science Competencies
  – Basic Data Driven Science Competencies
  – Advanced Data Driven Science Competencies
Analysing the Analysers. O’Reilly Strata Survey – Harris, Murphy & Vaisman, 2013
The task of RDA IG on Education and Training

Analysing the Analysers. O’Reilly Strata Survey (2013)

Common Body of Knowledge (CBK) in Big Data and Data Intensive Technologies
CBK refers to several domains or operational categories into which Big Data theory and practices break down
  – The scope is very wide; it needs to combine a few previously unconnected domains
  – This is one of the attempts, verified by practical course development
CBK Big Data and Data Intensive Technologies
1. Big Data Definition and Big Data Architecture Framework, Data driven and data centric applications model, Stakeholders and Roles
2. Big Data use cases and application domains taxonomy and requirements, Big Data in industry and science
3. Data structures, SQL and NoSQL databases
4. Data Analytics Methods and Tools, Knowledge Presentation
5. Big Data Management and curation, Big Data Lifecycle, Data Preservation and Sharing, Enterprise Data Warehouses, Agile Data Driven Enterprise
6. Cloud based Big Data infrastructure and computing platforms, Data Analytics application and new Data Scientist skills required
7. Computing models: High Performance Computing (HPC), Massively Parallel Computing (MPP), Grid, Cluster Computing
8. Big Data Security and Privacy, Certification and Compliance

Big Data Course at LOE/UoL Structure
Seminar 1: Introduction. Big Data technology domain definition, Big Data Architecture Framework
Seminar 2: Big Data use cases from science, industry and business
Seminar 3: Big Data Infrastructure components and platforms, Enterprise Data Warehouses, MapReduce and Hadoop, distributed file systems and database architectures, data structures, NoSQL databases
Seminar 4: Big Data analytic techniques, introduction to RapidMiner. Statistical techniques for modeling data
Seminar 5: Processes behind Big Data Analytics: Rule Extraction Algorithms and Cluster Analysis, Decision tree induction
Seminar 6: Classification and forecasting techniques: Machine Learning, Neural Networks and Support Vector Machines. Measurement techniques: Receiver Operating Characteristic (ROC) curves and Gains Charts
Seminar 7: Big Data Management, Enterprise Data Warehouses (EDW) and emerging Agile Data Driven Enterprise (ADDE), Big Data Service and platform providers
Seminar 8: Big Data Security and Privacy, data centric security models. Big Data privacy issues and regulations, Privacy Enhancement Techniques
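
Seminar 3 lists MapReduce among the infrastructure topics. As an illustration of the programming model only, the sketch below counts words in a small in-memory corpus using separate map, shuffle and reduce steps. It is not taken from the course materials: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample corpus are assumptions made for this example, and a real Hadoop job would distribute these steps across a cluster rather than run them in one process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially over an in-memory corpus."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    # Hypothetical two-document corpus, used only for illustration.
    corpus = ["big data needs big infrastructure", "data analytics needs data"]
    print(word_count(corpus))
```

Keeping the three steps as separate functions mirrors the way the model lets the map and reduce work be spread over many machines, with the shuffle as the only point where data for one key has to be brought together.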

Laureate Online Education (LOE)
Laureate Online Education (LOE), the online education partner of the University of Liverpool, provides a fully online teaching/education environment based on a customized Blackboard platform.
Laureate’s courses are designed to push the boundaries of access to higher education for students from different countries and cultural backgrounds, and with varying educational backgrounds.
The common method here is to push students beyond the boundaries of their customary thinking (i.e., to push them to think “outside of the box”) and stimulate their self-motivated learning.

Collaborative Online Learning Model Principles
a) Programs and courses are developed with input from nationally and internationally recognized Subject-Matter Experts (SME), leading practitioners, associations/professional groups, and international representatives.
  – Educational materials combine a strong conceptual foundation, technology basis and applied mechanisms, standardization, best practices and industry implementation.
  – Programs and courses fully leverage technological and media resources to optimize collaboration and communication.
b) Programs and courses are designed to create an inspiring and transforming student experience and promote collaborative student experiences.
  – Programs are future-oriented and forward thinking, both in providing course materials that reflect the current status and trends in the technology domain, and in facilitating students’ critical and analytical thinking.
  – Students are responsible for their learning and they exercise elements of control over their learning environment. They are inspired through opportunities to engage in reflection and critical thinking, and to connect theory to practice, their own experience and the educational group experience in the weekly classroom discussions.
  – Work on individual and group projects and hands-on assignments.
c) Laureate’s programs and courses are designed to expose students to diverse ideas, opinions, perspectives, and experiences, both brought by instructors and based on knowledge and experience exchange in the classroom.
d) The course undergoes a quality review process that includes a critical reader review and recommendations, and adaptation to the common learning model. The quality reviews continue throughout the course “life span”.
Difference from campus education
  – Top down vs bottom up approach in course development

Bloom’s Taxonomy in Online Education
The courses are developed using best practices for online education and applying Bloom’s taxonomy with strong emphasis not only on the Cognitive Domain but also on the Affective Domain to facilitate deep and self-motivated learning. This includes the following:
  – Present learning tasks in terms of problem solving, not only as demonstration of accumulated knowledge, and encourage multiple approaches to problem solving.
  – Provide opportunities for collaboration with others, including: discussions; sharing of experience, perceptions, and alternate viewpoints; and group activities.
  – Allow students to draw on their own experience as part of their learning and to incorporate their own goals into the work of the course.

Andragogy in Adult Education
Andragogy provides an effective approach to online higher education. The following principles of andragogy (adult learning) [19, 20] are incorporated:
  – Define a rationale for learning and make a case for the value of doing the work.
  – Create environments where self-directed skills are nurtured.
  – Recognize that learners have different experiences, backgrounds, learning styles, motivations, interests, and goals.
  – Take a life-centered orientation to learning; motivate students to learn the whole course knowledge domain and show its relevance to their professional or career needs.
  – Instruction should help the students perform tasks, deal with problems, and thrive in real-life situations.
  – Rely on internal motivation factors and provide such motivators as the satisfaction of mastering a subject and knowledge that widens their vision and general understanding.

Role of Final Dissertation
An important role belongs to the final dissertation module, where the formation of the future specialist is finalized
  – Duration 9 months
  – Supervised by Dissertation Advisor (DA)
The students learn the basics of the research methods and apply them to the dissertation development process, which includes
  – Hypothesis and hypothesis verification
  – Research questions
  – Scholarly contribution
  – Solution development and evaluation

Big Data Module Structure
The module consists of 8 weekly seminars, each including 2 Discussion Questions (DQ), a Hand-in Assignment (HA, or homework) and a project assignment.
  – Each seminar is provided with Lecture Notes and a textbook reading assignment.
Discussion questions and asynchronous discussion are the main form of educational activity.
  – DQ answers are submitted to the discussion forum and the students are required to contribute to the discussion.
  – The students benefit from the knowledge and experience sharing during discussion and learn how to defend their own answers.
  – The instructor plays the role of moderator and assessor of the students’ knowledge and activities.
Discussion questions are designed to stimulate the students’ higher cognitive activities, starting from a basic literature search and moving to analyzing and evaluating the collected material and applying it to problem solving.
  – There are no synchronous lectures, which also makes it possible to deliver education to countries and students with low Internet connectivity and bypasses time zone issues.
  – Recorded lectures and accompanying videos are planned for the future.
Practical use of Bloom’s taxonomy
Group project and hands-on assignments

Lessons Learnt
No books cover more than 1/3 of the course
  – Plenty on Data Analytics and Machine Learning
  – None on Big Data or Scientific Data Infrastructure
Some books on Data Warehouses, Scientific/HPC computing, NoSQL databases
Use of Cloud Computing and Big Data platforms on clouds is promising, but not for a student population with backgrounds as wide as at LOE
Instructor training for guiding and moderating online courses
Use of MOOC or video lectures: to be considered
Time needed to develop a good course

Summary and Further Development
Develop a Data Science (Big Data) Infrastructure course for campus education

Strata Survey Skills and Data Scientist Self-ID
Analysing the Analysers. O’Reilly Strata Survey – Harris, Murphy & Vaisman, 2013
  – Based on how data scientists think about themselves and their work
  – Identified four Data Scientist clusters

Skills and Self-ID
[Figure: skills and self-ID chart from the O’Reilly Strata Survey]
ML – Machine Learning
OR – Operations Research

Bloom’s Taxonomy – Cognitive Activities
Example: Cloud Computing
Knowledge
  – Exhibit memory of previously learned materials by recalling facts, terms, basic concepts and answers
  – Knowledge of specifics: terminology, specific facts
  – Knowledge of ways and means of dealing with specifics: conventions, trends and sequences, classifications and categories, criteria, methodology
  – Knowledge of the universals and abstractions in a field: principles and generalizations, theories and structures
  – Questions like: What are the main benefits of outsourcing a company’s IT services to the cloud?
Comprehension
  – Demonstrate understanding of facts and ideas by organizing, comparing, translating, interpreting, describing, and stating the main ideas
  – Translation, Interpretation, Extrapolation
  – Questions like: Compare the business and operational models of private clouds and hybrid clouds.
Application
  – Using new knowledge. Solve problems in new situations by applying acquired knowledge, facts, techniques and rules in a different way
  – Questions like: Which cloud service model is best suited for a medium size software development company, and why?
Analysis
  – Examine and break information into parts by identifying motives or causes. Make inferences and find evidence to support generalizations
  – Analysis of elements, relationships, organizational principles
  – Questions like: What cloud services are needed to support typical business processes of a web trading company? Give suggestions on how these services can be implemented with PaaS or IaaS clouds. Provide references to support your statements.
Synthesis
  – Compile information together in a different way by combining elements in a new pattern or proposing alternative solutions
  – Production of a unique communication, a plan, or proposed set of operations, derivation of a set of abstract relations
  – Questions like: Describe the main steps and tasks for migrating IT services of an example company to clouds. What services and data can be moved to clouds and which will remain at the enterprise premises?
Evaluation
  – Present and defend opinions by making judgments about information, validity of ideas or quality of work based on a set of criteria
  – Judgments in terms of internal evidence or external criteria
  – Questions like: Do you think that cloudification of the enterprise infrastructure creates benefits for enterprises, short term and long term?

Mapping Bloom’s Taxonomy from Cognitive Domain to Professional Activity Domain
[Figure: Bloom’s cognitive levels mapped to professional activities]
Cognitive levels include: Apply (Application), Analyse (Analysis), Evaluate (Evaluation), Create (Synthesis)
Professional activities:
  – Perform standard tasks, use API and Guidelines
  – Create own complex applications using standard API (simple engineering)
  – Integrate different systems/components, e.g. Cloud provider and enterprise (complex engineering)
  – Extend existing services, design new services
  – Develop new architecture and models, platforms and infrastructures

Pedagogy vs Andragogy
Pedagogy (child-leading) and Andragogy (man-leading)
On-campus and on-line education
Andragogy was developed by American educator Malcolm Knowles and is stated as six assumptions related to the motivation of adult learning:
  – Adults need to know the reason for learning something (Need to Know)
  – Experience (including error) provides the basis for learning activities (Foundation)
  – Adults need to be responsible for their decisions on education; involvement in the planning and evaluation of their instruction (Self-concept)
  – Adults are most interested in learning subjects having immediate relevance to their work and/or personal lives (Readiness)
  – Adult learning is problem-centered rather than content-oriented (Orientation)
  – Adults respond better to internal versus external motivators (Motivation)

Applying Andragogy to Self-Education and Online Training - Problems
The andragogy concept is widely used in on-line education, but it is
  – Based on active discussion activities guided/moderated by an instructor/moderator
  – Combined with Bloom’s taxonomy
Self-education (guided) and online training specifics
  – Course consistency in the sense of style, presentation/graphics, etc.
  – Requires the course workflow to be automated as much as possible
      Especially if coupled with certification or pre-certification
  – Less time to be devoted by the trainee
      Estimated 1 hour per lesson, maximum 3 lessons per topic
  – Knowledge control questionnaires at the end of lessons or topics