CS 626 Large Scale Data Science
Lecture 12 - Apache HBase
Jun Zhang
March 10, 2020
Originally prepared by Dr. Licong Cui
Paper Presentation Requirements
Logistics
Time: 35 minutes per paper (30-minute presentation plus 5 minutes of Q&A)
Format: slides; email a copy to the instructor
Slide Requirements
Paper title, authors, and affiliations; presenter; outline; background; motivation and problem statement; related work
Slide Requirements (cont.)
Methods; results; summary of strengths and weaknesses; potential improvements
Review: Hadoop Ecosystem – Layer Diagram
Outline: HBase history; HBase data model; HBase architecture; interacting with HBase (HBase shell, Java API)
HDFS vs HBase
HDFS is good for batch processing, but it is not good for record lookup, for incremental addition of small batches, or for updates.
HBase is designed to address these points efficiently: fast record lookup, support for record-level insertion, and support for updates (although not in place). HBase performs an update by writing a new version of the value.
HBase History
2006: Google publishes the Bigtable paper.
2007: First usable HBase.
2010: HBase becomes an Apache top-level project.
HBase Logo
HBase Use Cases
See more: http://wiki.apache.org/hadoop/Hbase/PoweredBy
Number of Companies Using HBase
Source: https://enlyft.com/tech/products/apache-hbase
HBase
HBase is part of the Hadoop ecosystem, is written in Java, and is built on top of HDFS.
It is a column-family-oriented, real-time database: a sparse, distributed, multidimensional map.
It is NOT relational and does not natively support SQL.
HBase Data Model
Data is stored in tables, which are made of rows and columns. Each row is identified by a unique row key.
Columns are grouped into column families, and a column is named family:qualifier, e.g., username:firstname.
Tables are partitioned into regions.
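To make the family:qualifier naming concrete, here is a minimal in-memory sketch built around a hypothetical ColumnSchema class (it is not part of the HBase client API). It mirrors one property of the model: column families are declared when the table is created, while qualifiers inside a family are free-form and need no schema change. The family names username and contact are illustrative only.

```java
import java.util.Set;

/** Hypothetical helper, not part of the HBase client API: checks "family:qualifier"
 *  column names against the families declared when the table was created. */
final class ColumnSchema {
    private final Set<String> families;

    ColumnSchema(Set<String> families) {
        this.families = Set.copyOf(families);
    }

    /** Splits "family:qualifier" at the first colon and rejects undeclared families. */
    String[] parse(String column) {
        int colon = column.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected family:qualifier, got " + column);
        }
        String family = column.substring(0, colon);
        String qualifier = column.substring(colon + 1);
        if (!families.contains(family)) {
            throw new IllegalArgumentException("unknown column family: " + family);
        }
        // No check on the qualifier: qualifiers are not part of the table schema.
        return new String[] {family, qualifier};
    }

    public static void main(String[] args) {
        ColumnSchema users = new ColumnSchema(Set.of("username", "contact"));
        String[] col = users.parse("username:firstname");
        System.out.println(col[0] + " / " + col[1]); // username / firstname
        users.parse("username:nickname"); // a new qualifier needs no schema change
        try {
            users.parse("address:city"); // "address" was never declared
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // unknown column family: address
        }
    }
}
```

Adding a new family, by contrast, is a schema change (the "Add a Column Family" operation in the Java API).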
HBase Data Model (cont.)
HBase Data Model (cont.)
HBase Data Model (cont.)
Relational database vs. column-family-oriented HBase
HBase cells are indexed by table, row key, column key, and timestamp: (Table, RowKey, Family, Column, Timestamp) → Value
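The coordinate view (Table, RowKey, Family, Column, Timestamp) → Value can be modeled, for a single table, as nested sorted maps. This is a sketch under stated assumptions: CellMap and its methods are hypothetical, and String keys stand in for the byte arrays HBase actually stores (for ASCII keys, String ordering matches HBase's lexicographic byte ordering). The row keys and values below are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical model of one table as nested sorted maps:
 *  row key -> "family:qualifier" -> timestamp -> value. Not an HBase class. */
final class CellMap {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            // Versions are kept newest first.
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    /** Full-coordinate lookup; returns null for a cell that was never written. */
    String get(String row, String column, long timestamp) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.get(timestamp);
    }

    /** Row keys in sorted order, the order in which a scan visits rows. */
    List<String> rowKeys() {
        return new ArrayList<>(rows.keySet());
    }

    public static void main(String[] args) {
        CellMap emp = new CellMap();
        emp.put("row2", "personal:name", 100L, "raju");
        emp.put("row10", "personal:name", 100L, "ravi");
        emp.put("row1", "professional:designation", 100L, "manager");
        System.out.println(emp.get("row2", "personal:name", 100L)); // raju
        // Sparse: row2 never got professional:designation, so nothing is stored for it.
        System.out.println(emp.get("row2", "professional:designation", 100L)); // null
        System.out.println(emp.rowKeys()); // [row1, row10, row2]
    }
}
```

Note the row order: keys sort lexicographically, so row10 comes before row2. This is why row-key design matters in HBase.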
Sparsely Populated Data
Missing values: cells remain empty and occupy no storage.
HBase Versions
A cell can hold multiple versions of its value, each identified by a timestamp.
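A sketch of per-cell versioning, assuming a hypothetical VersionedCell class: each write adds a timestamped version instead of overwriting, a plain read returns the newest version, and only the newest maxVersions values are kept (the role of a column family's VERSIONS setting). This is not HBase's storage code: HBase enforces the limit at read time and physically removes extra versions during compaction, whereas this sketch simply trims on write. The timestamps and values are illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical model of a single cell that keeps several timestamped versions. */
final class VersionedCell {
    private final int maxVersions;
    // Newest timestamp first, the order in which HBase returns versions.
    private final NavigableMap<Long, String> versions =
            new TreeMap<Long, String>(Comparator.reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    /** An update is just another put: it adds a version instead of overwriting in place. */
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry = oldest timestamp
        }
    }

    /** Default read: the newest version, or null if the cell is empty. */
    String get() {
        Map.Entry<Long, String> newest = versions.firstEntry();
        return newest == null ? null : newest.getValue();
    }

    /** Analogous to the shell's VERSIONS option: up to n values, newest first. */
    List<String> get(int n) {
        List<String> out = new ArrayList<>();
        for (String v : versions.values()) {
            if (out.size() == n) {
                break;
            }
            out.add(v);
        }
        return out;
    }

    /** Analogous to the shell's TIMESTAMP option: the value written at exactly that time. */
    String getAt(long timestamp) {
        return versions.get(timestamp);
    }

    public static void main(String[] args) {
        VersionedCell designation = new VersionedCell(3);
        designation.put(1L, "intern");
        designation.put(2L, "engineer");
        designation.put(3L, "senior engineer");
        designation.put(4L, "manager");            // 4th version: "intern" is dropped
        System.out.println(designation.get());     // manager
        System.out.println(designation.get(3));    // [manager, senior engineer, engineer]
        System.out.println(designation.getAt(1L)); // null
    }
}
```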
Regions of an HBase Table
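Regions split a table's sorted row-key space into contiguous ranges, each beginning at a start key (the first region starts at the empty key). The hypothetical RegionRangeMap below is only an illustration, not HBase's own region-lookup code. It finds the region that owns a row with TreeMap.floorEntry, that is, the region whose start key is the greatest one not exceeding the row key. The region names and split points are made up.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: a table's row-key space split into regions by start key.
 *  Each region covers [its start key, the next region's start key). */
final class RegionRangeMap {
    private final TreeMap<String, String> regionByStartKey = new TreeMap<>();

    RegionRangeMap(String firstRegion) {
        // The empty start key guarantees every row key falls into some region.
        regionByStartKey.put("", firstRegion);
    }

    void addRegion(String startKey, String regionName) {
        regionByStartKey.put(startKey, regionName);
    }

    /** The owning region is the one with the greatest start key <= rowKey. */
    String regionFor(String rowKey) {
        Map.Entry<String, String> e = regionByStartKey.floorEntry(rowKey);
        return e.getValue(); // never null: "" sorts before every string
    }

    public static void main(String[] args) {
        RegionRangeMap emp = new RegionRangeMap("region-A"); // rows from "" up to "g"
        emp.addRegion("g", "region-B");                      // rows from "g" up to "p"
        emp.addRegion("p", "region-C");                      // rows from "p" onward
        System.out.println(emp.regionFor("alice"));  // region-A
        System.out.println(emp.regionFor("george")); // region-B
        System.out.println(emp.regionFor("zoe"));    // region-C
    }
}
```

Because rows are sorted and each region holds a contiguous range, a single-row lookup touches exactly one region, which is part of what makes record lookup fast compared with scanning files in HDFS.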
HBase Architecture
HBase Architecture (cont.)
Differences between HBase and RDBMS
Differences among HBase, HDFS, and Hive
More Differences between HBase and Hive
Interacting with HBase
Interactive mode: the HBase shell, started with the command hbase shell
Programmatic access: the Java API
See also the HBase shell tricks at hbase.apache.org.
HBase Shell – Create & List Table
Syntax:
create 'table name', 'column family'
list
HBase Shell – Disable Table
Syntax:
disable 'table name'
is_disabled 'table name'
HBase Shell – Enable Table
Syntax:
enable 'table name'
HBase Shell – Describe Table
Syntax:
describe 'table name'
HBase Shell – Alter Table
Syntax:
Change the maximum number of versions a column family keeps per cell:
alter 't1', NAME => 'f1', VERSIONS => 5
Set the table to read-only:
alter 't1', READONLY => 'true'
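HBase Shell – Drop Table
Syntax:
disable 'table name'
drop 'table name'
disable_all 'regex'
drop_all 'regex'
A table must be disabled before it can be dropped; disable_all and drop_all apply to every table whose name matches the regular expression.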
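The disable, enable, and drop commands follow a simple lifecycle: a newly created table is enabled, and it must be disabled before it can be dropped. The sketch below models only those rules with a hypothetical TableCatalog class; it is not the HBase Admin API, and it does not model regions, data, or the errors HBase raises for other cases (such as enabling a table that is already enabled).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the lifecycle rules behind the shell commands
 *  create / disable / is_disabled / enable / drop / list. Not the HBase Admin API. */
final class TableCatalog {
    private final Map<String, Boolean> enabled = new HashMap<>(); // table name -> enabled?

    void create(String table) {
        if (enabled.containsKey(table)) {
            throw new IllegalStateException(table + " already exists");
        }
        enabled.put(table, true); // newly created tables start out enabled
    }

    void disable(String table) { require(table); enabled.put(table, false); }

    void enable(String table) { require(table); enabled.put(table, true); }

    boolean isDisabled(String table) { require(table); return !enabled.get(table); }

    void drop(String table) {
        require(table);
        if (enabled.get(table)) {
            throw new IllegalStateException(table + " must be disabled before it is dropped");
        }
        enabled.remove(table);
    }

    /** Table names in sorted order, like the shell's list command. */
    Set<String> list() { return new TreeSet<>(enabled.keySet()); }

    private void require(String table) {
        if (!enabled.containsKey(table)) {
            throw new IllegalArgumentException("no such table: " + table);
        }
    }

    public static void main(String[] args) {
        TableCatalog hbase = new TableCatalog();
        hbase.create("emp");
        try {
            hbase.drop("emp"); // refused: emp is still enabled
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // emp must be disabled before it is dropped
        }
        hbase.disable("emp");
        System.out.println(hbase.isDisabled("emp")); // true
        hbase.drop("emp");
        System.out.println(hbase.list()); // []
    }
}
```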
HBase Shell – Insert Data
Syntax:
put 'table name', 'row', 'colfamily:colname', 'value'
HBase Shell – Update Data
Syntax:
put 'table name', 'row', 'colfamily:colname', 'new value'
An update uses the same put command as an insert; HBase stores the new value as a new version of the cell.
HBase Shell – Read Data
Syntax:
get 'table name', 'row'
get 'table name', 'row', {COLUMN => 'colfamily:colname'}
HBase Shell – Delete Data
Syntax:
delete 'table name', 'row', 'column name', 'time stamp'
deleteall 'table name', 'row'
delete removes a single cell; deleteall removes all cells in a row.
HBase Shell – Get More Versions
Syntax:
scan 'table name', {COLUMNS => 'column name', VERSIONS => 3}
get 'emp', '1', {COLUMN => 'professional:designation', VERSIONS => 3}
get 'emp', '1', {COLUMN => 'professional:designation', TIMESTAMP => *****}
HBase Shell – Count & Truncate
Count: count 'table name'
Truncate: truncate 'table name'
truncate disables, drops, and recreates the table, removing all of its data.
Java API – Create Table
Step 1: Instantiate HBaseAdmin.
Step 2: Create a TableDescriptor.
Step 3: Create the table through the admin object.
Java API – List Table
Java API – Disable Table
Java API – Enable Table
Java API – Add a Column Family
Java API – Delete Table
Java API – Insert Data
Java API – Update Data
Java API – Read Data
Java API – Delete Data
References
HBase: The Definitive Guide, by Lars George
HBase Essentials, by Nishant Garg
http://wiki.apache.org/hadoop/Hbase/PoweredBy
qlcarol
b-maprtablessecuritymar2014
f-hive-and-hbase-12805463
usecases.html
http://www.tutorialspoint.com/hbase/