CS 626 Large Scale Data Science


CS 626 Large Scale Data Science
Lecture 12 - Apache HBase
Jun Zhang
March 10, 2020
Originally prepared by Dr. Licong Cui

Paper Presentation Requirements

Logistics
Time: 35 mins/paper
  30 mins presentation
  5 mins Q&A
Format
  Slides
  Email a copy to the instructor

Slides Requirements
Paper title/authors/affiliations
Presenter
Outline
Background
Motivation and problem statement
Related work

Slides Requirements (cont.)
Methods
Results
Summary of strengths and weaknesses
Potential improvements

Review: Hadoop Ecosystem – Layer Diagram

Outline
HBase History
HBase Data Model
HBase Architecture
Interacting with HBase
  HBase shell
  Java API

HDFS vs HBase
HDFS is good for batch processing, but
  Not good for record lookup
  Not good for incremental addition of small batches
  Not good for updates
HBase is designed to efficiently address the above points
  Fast record lookup
  Support for record-level insertion
  Support for updates (although not in-place)
HBase updates are done by creating new versions of values

HBase History
2006: Google releases paper on Bigtable
2007: First usable HBase
2010: HBase becomes an Apache top-level project

HBase Logo

HBase Use Cases
See more: http://wiki.apache.org/hadoop/Hbase/PoweredBy

Number of Companies Using HBase
https://enlyft.com/tech/products/apache-hbase

HBase
A part of Hadoop
Written in Java
Built on top of HDFS
Column-family-oriented real-time database
A sparse, distributed, multidimensional map
NOT relational
Does not support SQL

HBase Data Model
Data is stored in tables
Tables are made of rows and columns
Each row is identified by a unique key value
Row columns are grouped into column families
  column family:qualifier
  E.g., username:firstname
Tables are partitioned into regions
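The family:qualifier naming above can be illustrated with a small parser. This is only a sketch in plain Java; the class `ColumnKey` and its `parse` method are hypothetical names for illustration and are not part of the HBase client API. It assumes the family name contains no colon, so the split happens at the first colon.

```java
// Hypothetical helper: splits a column name such as "username:firstname"
// into its column family and qualifier parts.
class ColumnKey {
    final String family;
    final String qualifier;

    ColumnKey(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    // Assumption: the family never contains ':', so everything after the
    // first colon (including further colons) belongs to the qualifier.
    static ColumnKey parse(String column) {
        int i = column.indexOf(':');
        if (i <= 0) {
            throw new IllegalArgumentException("expected family:qualifier, got " + column);
        }
        return new ColumnKey(column.substring(0, i), column.substring(i + 1));
    }

    public static void main(String[] args) {
        ColumnKey key = ColumnKey.parse("username:firstname");
        System.out.println(key.family + " / " + key.qualifier);
    }
}
```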

HBase Data Model (cont.)

HBase Data Model (cont.)

HBase Data Model (cont.)
Relational Database
Column-family-oriented HBase
Indexed by table, row key, column key, and a timestamp
(Table, RowKey, Family, Column, Timestamp) → Value
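One way to read the index tuple above is as a nested sorted map for a single table. The following is a minimal sketch in plain Java, not HBase client code and not a description of the on-disk format. The class `SparseTableSketch` and its methods are hypothetical, and it assumes string keys and values for readability (HBase itself works with byte arrays).

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one table:
// rowKey -> family -> qualifier -> timestamp (newest first) -> value
class SparseTableSketch {
    private final NavigableMap<String,
            NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    // Writing a cell only creates the map entries that this cell needs.
    void put(String rowKey, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .computeIfAbsent(qualifier, k -> new TreeMap<>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest value of a cell, or null if the cell was never written.
    String get(String rowKey, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families =
                rows.get(rowKey);
        if (families == null) return null;
        NavigableMap<String, NavigableMap<Long, String>> qualifiers = families.get(family);
        if (qualifiers == null) return null;
        NavigableMap<Long, String> versions = qualifiers.get(qualifier);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Counts stored cell versions; columns that were never written contribute nothing.
    int cellCount() {
        return rows.values().stream()
                .flatMap(f -> f.values().stream())
                .flatMap(q -> q.values().stream())
                .mapToInt(v -> v.size())
                .sum();
    }

    public static void main(String[] args) {
        SparseTableSketch table = new SparseTableSketch();
        table.put("row1", "username", "firstname", 1L, "Ann");
        table.put("row1", "username", "firstname", 2L, "Anna");
        System.out.println(table.get("row1", "username", "firstname"));
        System.out.println(table.cellCount());
    }
}
```

In this sketch a missing column is simply an absent map entry, which is one way to picture why empty cells take no storage.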

Sparsely-populated Data
Missing values: cells remain empty and occupy no storage

HBase Versions
A cell can hold multiple versions of its value, each identified by a timestamp
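The idea of versioned cells can be sketched in plain Java as a map from timestamp to value that keeps only the newest few entries. The class `VersionedCell` and the `maxVersions` parameter are hypothetical names for illustration. Trimming old versions immediately on write is a simplification chosen here; it is not a claim about when HBase itself discards old versions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one cell that keeps at most maxVersions values.
class VersionedCell {
    private final int maxVersions;
    // Reverse order so that the first entry is always the newest version.
    private final NavigableMap<Long, String> versions =
            new TreeMap<>(Comparator.<Long>reverseOrder());

    VersionedCell(int maxVersions) {
        if (maxVersions < 1) {
            throw new IllegalArgumentException("maxVersions must be >= 1");
        }
        this.maxVersions = maxVersions;
    }

    // An "update" adds a new version instead of overwriting in place.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry = oldest timestamp
        }
    }

    // Newest value, or null if nothing has been written.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Up to n values, newest first.
    List<String> latest(int n) {
        List<String> result = new ArrayList<>();
        for (String value : versions.values()) {
            if (result.size() == n) break;
            result.add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(3);
        cell.put(1L, "engineer");
        cell.put(2L, "manager");
        System.out.println(cell.latest(3));
    }
}
```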

Regions of an HBase Table
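Since each region covers a contiguous range of row keys, finding the region for a row can be sketched as a floor lookup on the region start keys. This is a plain-Java illustration only: the class `RegionLookupSketch`, the region names, and the split points are made up for the example. It also assumes ASCII string keys, whereas HBase compares row keys as byte arrays.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: each region is identified by its inclusive start key;
// it ends where the next region starts.
class RegionLookupSketch {
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers row key " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "region-1");  // empty start key: first region
        table.addRegion("g", "region-2");
        table.addRegion("p", "region-3");
        System.out.println(table.regionFor("mike"));
    }
}
```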

HBase Architecture

HBase Architecture (cont.)

Differences between HBase and RDBMS

Differences between HBase, HDFS, & Hive

More Differences between HBase and Hive

Interacting with HBase
Interactive mode
  HBase shell: hbase shell
Java API
Please take a look at HBase Shell Tricks at hbase.apache.org

HBase Shell – Create & List Table
Syntax
  create 'table name', 'column family'
  list
Example

HBase Shell – Disable Table
Syntax
  disable 'table name'
  is_disabled 'table name'
Example

HBase Shell – Enable Table
Syntax
  enable 'table name'
Example

HBase Shell – Describe Table
Syntax
  describe 'table name'
Example

HBase Shell – Alter Table
Syntax
Change the maximum number of versions kept per cell in a column family
  alter 't1', NAME => 'f1', VERSIONS => 5
Set read-only
  alter 't1', READONLY => 'true'
Example

HBase Shell – Drop Table
Syntax
  disable 'table name'
  drop 'table name'
  disable_all 'regex'
  drop_all 'regex'

HBase Shell – Insert Data
Syntax
  put 'table name', 'row', 'colfamily:colname', 'value'
Example

HBase Shell – Update Data
Syntax
  put 'table name', 'row', 'colfamily:colname', 'new value'
Example

HBase Shell – Read Data
Syntax
  get 'table name', 'row'
  get 'table name', 'row', {COLUMN => 'colfamily:colname'}
Example

HBase Shell – Delete Data
Syntax
  delete 'table name', 'row', 'column name', timestamp
  deleteall 'table name', 'row'
Example

HBase Shell – Get More Versions
Syntax
  scan 'table name', {COLUMN => 'column name', VERSIONS => 3}
  get 'emp', '1', {COLUMN => 'professional:designation', VERSIONS => 3}
  get 'emp', '1', {COLUMN => 'professional:designation', TIMESTAMP => *****}

HBase Shell – Count & Truncate
Count
  count 'table name'
Truncate
  truncate 'table name'

Java API – Create Table
Step 1: Instantiate HBaseAdmin
Step 2: Create TableDescriptor
Step 3: Execute through Admin

Java API – List Table

Java API – Disable Table

Java API – Enable Table

Java API – Add a Column Family

Java API – Delete Table

Java API – Insert Data

Java API – Update Data

Java API – Read Data

Java API – Delete Data

References
HBase: The Definitive Guide (by Lars George)
HBase Essentials (by Nishant Garg)
http://wiki.apache.org/hadoop/Hbase/PoweredBy
qlcarol
b-maprtablessecuritymar2014
f-hive-and-hbase-12805463

References (cont.)
usecases.html
http://www.tutorialspoint.com/hbase/
