Friday, September 18, 2015

Learning HBase Shell

Notes:
    1. This article assumes Apache HBase has already been set up.
    2. In this article, one line of output represents a single column (cell); a record is a Row made up of multiple such lines.
    3. This article does not cover every command, only the most commonly used ones.
    4. For reference only.

Table of Contents

Getting Started

  • Enter the HBase shell
    $ hbase shell

List

  • List all tables
    hbase> list
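    list also accepts an optional regular expression to filter the returned table names (shown in the shell's built-in help):
    hbase> list 'abc.*'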

Create

  • Create a table
    Here is some help for this command:
    Creates a table. Pass a table name, and a set of column family specifications (at least one), and, optionally, table configuration. Column specification can be a simple string (name), or a dictionary (dictionaries are described below in main help output), necessarily including NAME attribute.
    Examples:
    Create a table with namespace=ns1 and table qualifier=t1
    • hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}
    Create a table with namespace=default and table qualifier=t1
    • hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
    • hbase> create 't1', 'f1', 'f2', 'f3'
    • hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
    • hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
    Table configuration options can be put at the end.
    Examples:
    • hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
    • hbase> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
    • hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
    • hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
    • hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
    • hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}
    You can also keep around a reference to the created table:
    • hbase> t1 = create 't1', 'f1'
    Which gives you a reference to the table named 't1', on which you can then call methods.
  • Training
    hbase> create 'test', 'cf'
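    To double-check the layout of the new table, describe (another built-in shell command) prints its column families:
    hbase> describe 'test'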

Put

  • Insert a column value (one cell)
    Here is some help for this command: Put a cell 'value' at specified table/row/column and optionally timestamp coordinates. To put a cell value into table 'ns1:t1' or 't1' at row 'r1' under column 'c1' marked with the time 'ts1', do:
    • hbase> put 'ns1:t1', 'r1', 'c1', 'value'
    • hbase> put 't1', 'r1', 'c1', 'value'
    • hbase> put 't1', 'r1', 'c1', 'value', ts1
    • hbase> put 't1', 'r1', 'c1', 'value', {ATTRIBUTES=>{'mykey'=>'myvalue'}}
    • hbase> put 't1', 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
    • hbase> put 't1', 'r1', 'c1', 'value', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
    The same commands also can be run on a table reference. Suppose you had a reference t to table 't1', the corresponding command would be:
    • hbase> t.put 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
  • Training
    hbase> put 'test','row0','cf:string','字串'
    hbase> put 'test','row0','cf:boolean',"\x01"
    hbase> put 'test','row0','cf:short',"\x00\x01"
    hbase> put 'test','row0','cf:int',"\x00\x00\x00\x01"
    hbase> put 'test','row0','cf:long',"\x00\x00\x00\x00\x00\x00\x00\x01"
    hbase> put 'test','row0','cf:float',"?\x80\x00\x00"
    hbase> put 'test','row0','cf:double',"?\xF0\x00\x00\x00\x00\x00\x00"
    hbase> put 'test','row1','cf:name','Cookie'
    hbase> put 'test','row1','cf:phone','0999123456'
    hbase> put 'test','row2','cf:name','Tom'
    hbase> put 'test','row3','cf:name','Mary'
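    A quick way to confirm the inserts is the built-in count command, which should report 4 rows for the data above:
    hbase> count 'test'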

Get

  • Get one row
    Here is some help for this command: Get row or cell contents; pass table name, row, and optionally a dictionary of column(s), timestamp, timerange and versions. Examples:
    • hbase> get 'ns1:t1', 'r1'
    • hbase> get 't1', 'r1'
    • hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
    • hbase> get 't1', 'r1', {COLUMN => 'c1'}
    • hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
    • hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
    • hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
    • hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
    • hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
    • hbase> get 't1', 'r1', 'c1'
    • hbase> get 't1', 'r1', 'c1', 'c2'
    • hbase> get 't1', 'r1', ['c1', 'c2']
    • hbase> get 't1','r1', {COLUMN => 'c1', ATTRIBUTES => {'mykey'=>'myvalue'}}
    • hbase> get 't1','r1', {COLUMN => 'c1', AUTHORIZATIONS => ['PRIVATE','SECRET']}
    Besides the default 'toStringBinary' format, 'get' also supports custom formatting by column. A user can define a FORMATTER by adding it to the column name in the get specification. The FORMATTER can be stipulated:
    1. either as an org.apache.hadoop.hbase.util.Bytes method name (e.g., toInt, toString)
    2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.
    Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:
    • hbase> get 't1', 'r1', {COLUMN => ['cf:qualifier1:toInt', 'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] }
    Note that you can specify a FORMATTER by column only (cf:qualifier). You cannot specify a FORMATTER for all columns of a column family.
    The same commands also can be run on a reference to a table (obtained via gettable or createtable). Suppose you had a reference t to table 't1', the corresponding commands would be:
    • hbase> t.get 'r1'
    • hbase> t.get 'r1', {TIMERANGE => [ts1, ts2]}
    • hbase> t.get 'r1', {COLUMN => 'c1'}
    • hbase> t.get 'r1', {COLUMN => ['c1', 'c2', 'c3']}
    • hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
    • hbase> t.get 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
    • hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
    • hbase> t.get 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
    • hbase> t.get 'r1', 'c1'
    • hbase> t.get 'r1', 'c1', 'c2'
    • hbase> t.get 'r1', ['c1', 'c2']
  • Training
    hbase> get 'test','row0'
    hbase> get 'test','row0',['cf:string','cf:boolean','cf:float']
    hbase> get 'test','row0', ['cf:string:toString','cf:boolean:toBoolean','cf:int:toInt','cf:float:toFloat']

Scan

Notes:
    * Commonly used scan filters include:
        1. RowFilter
        2. SingleColumnValueFilter
        3. ValueFilter
        4. PrefixFilter
    * Commonly used ByteArrayComparable comparators in FILTER strings include:
        1. binary
        2. substring
        3. regexstring
        4. binaryprefix
  • Scan a table (query multiple rows)
    Here is some help for this command: Scan a table; pass table name and optionally a dictionary of scanner specifications. Scanner specifications may include one or more of: TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH, COLUMNS, CACHE, RAW, or VERSIONS.
    If no columns are specified, all columns will be scanned. To scan all members of a column family, leave the qualifier empty as in 'col_family:'.
    The filter can be specified in two ways:
    1. Using a filterString - more information on this is available in the Filter Language document attached to the HBASE-4176 JIRA
    2. Using the entire package name of the filter.
    Some examples:
    • hbase> scan 'hbase:meta'
    • hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
    • hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
    • hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
    • hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
    • hbase> scan 't1', {REVERSED => true}
    • hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
    • hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
    For setting the Operation Attributes:
    • hbase> scan 't1', { COLUMNS => ['c1', 'c2'], ATTRIBUTES => {'mykey' => 'myvalue'}}
    • hbase> scan 't1', { COLUMNS => ['c1', 'c2'], AUTHORIZATIONS => ['PRIVATE','SECRET']}
    For experts, there is an additional option -- CACHE_BLOCKS -- which switches block caching for the scanner on (true) or off (false). By default it is enabled.
    Examples:
    • hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
    Also for experts, there is an advanced option -- RAW -- which instructs the scanner to return all cells (including delete markers and uncollected deleted cells). This option cannot be combined with requesting specific COLUMNS. Disabled by default. 
    Example:
    • hbase> scan 't1', {RAW => true, VERSIONS => 10}
    Besides the default 'toStringBinary' format, 'scan' supports custom formatting by column. A user can define a FORMATTER by adding it to the column name in the scan specification. The FORMATTER can be stipulated:
    1. either as an org.apache.hadoop.hbase.util.Bytes method name (e.g., toInt, toString)
    2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.
    Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:
    • hbase> scan 't1', {COLUMNS => ['cf:qualifier1:toInt', 'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] }
    Note that you can specify a FORMATTER by column only (cf:qualifier). You cannot specify a FORMATTER for all columns of a column family.
    Scan can also be used directly from a table, by first getting a reference to a table, like such:
    • hbase> t = get_table 't'
    • hbase> t.scan
    Note in the above situation, you can still provide all the filtering, columns, options, etc as described above.
  • Training
    hbase> scan 'test'
    hbase> scan 'test', {STARTROW => 'row1', STOPROW => 'row2~'}
    hbase> scan 'test', {COLUMNS => 'cf:name'}
    hbase> scan 'test', {COLUMNS => ['cf:string:toString','cf:short:toShort','cf:long:toLong']}
    hbase> scan 'test', {FILTER => "ValueFilter(=,'binary:Cookie')"}
    hbase> scan 'test', {FILTER => "SingleColumnValueFilter('cf','name',=,'substring:o')"}
    hbase> scan 'test', {FILTER => "RowFilter(=,'regexstring:[03]$')"}

Delete

  • Delete a column value (one cell)
    Here is some help for this command: Put a delete cell value at specified table/row/column and optionally timestamp coordinates. Deletes must match the deleted cell's coordinates exactly. When scanning, a delete cell suppresses older versions. To delete a cell from 't1' at row 'r1' under column 'c1' marked with the time 'ts1', do:
    • hbase> delete 'ns1:t1', 'r1', 'c1', ts1
    • hbase> delete 't1', 'r1', 'c1', ts1
    • hbase> delete 't1', 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
    The same command can also be run on a table reference. Suppose you had a reference t to table 't1', the corresponding command would be:
    • hbase> t.delete 'r1', 'c1', ts1
    • hbase> t.delete 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
  • Training
    hbase> delete 'test', 'row1', 'cf:phone'
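    Reading the row back should now show only the cf:name cell:
    hbase> get 'test', 'row1'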

Delete All

  • Delete a whole row
    Here is some help for this command:
    Delete all cells in a given row; pass a table name, row, and optionally a column and timestamp. 
    Examples:
    • hbase> deleteall 'ns1:t1', 'r1'
    • hbase> deleteall 't1', 'r1'
    • hbase> deleteall 't1', 'r1', 'c1'
    • hbase> deleteall 't1', 'r1', 'c1', ts1
    • hbase> deleteall 't1', 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
    The same commands also can be run on a table reference. Suppose you had a reference t to table 't1', the corresponding command would be:
    • hbase> t.deleteall 'r1'
    • hbase> t.deleteall 'r1', 'c1'
    • hbase> t.deleteall 'r1', 'c1', ts1
    • hbase> t.deleteall 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
  • Training
    hbase> deleteall 'test', 'row2'

Disable

  • Disable a table
    Here is some help for this command:
    Start disable of named table:
    • hbase> disable 't1'
    • hbase> disable 'ns1:t1'
  • Training
    hbase> disable 'test'

Enable

  • Enable a table
    Here is some help for this command:
    Start enable of named table:
    • hbase> enable 't1'
    • hbase> enable 'ns1:t1'
  • Training
    hbase> enable 'test'

Truncate

Note: the effect is exactly the same as running disable + drop + create in sequence.
  • Truncate a table
    Here is some help for this command:
    Disables, drops and recreates the specified table.
  • Training
    hbase> truncate 'test'

Drop

Note: make sure the table is disabled before running drop. (Run disable first.)
  • Drop a table
    Here is some help for this command:
    Drop the named table. Table must first be disabled:
    • hbase> drop 't1'
    • hbase> drop 'ns1:t1'
  • Training
    hbase> drop 'test'

Monday, September 14, 2015

Install R-3.1.3 tarball on Cloudera

Notes:
    1. This article covers the installation only.
    2. After installation, run "R -f {file path}/{file name}.R" to execute a script and obtain its results.
    3. Installation must be performed as root.

Download tarball
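
The R-3.1.3 source tarball can be fetched from the CRAN archive (one possible source; substitute your preferred mirror):
$ wget https://cran.r-project.org/src/base/R-3/R-3.1.3.tar.gz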

Install Dependency Packages

$ yum install gcc-gfortran  
$ yum install readline-devel 
$ yum install libXt-devel 

Extract

$ tar -zxvf R-3.1.3.tar.gz
$ cd R-3.1.3

Configure and Make Install

$ ./configure --enable-R-shlib
$ make
$ make install
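
To verify the build, a quick smoke test (the script path and contents here are only an example):
$ R --version
$ echo 'print(summary(rnorm(100)))' > /tmp/test.R
$ R -f /tmp/test.R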

Tuesday, September 1, 2015

Hadoop + HBase + Hive Setup Guide (Fully Distributed)

Notes:
    1. All commands are executed as root; for practice use only.
    2. The preliminary steps must be carried out on master, slaver1, and slaver2.
    3. Some characters may be garbled in the PDF version; check that symbols are entered correctly.

Table of Contents

Package List

Package          | Package Name                     | Installation Path | Version
Oracle Java      | jdk-7u79-linux-x64.rpm           | /usr/java/java    | 7u79
Apache Hadoop    | hadoop-2.4.1.tar.gz              | /opt/hadoop       | 2.4.1
Apache HBase     | hbase-0.98.13-hadoop2-bin.tar.gz | /opt/hbase        | 0.98.13
Apache Hive      | apache-hive-1.2.1-bin.tar.gz     | /opt/hive         | 1.2.1
Apache Zookeeper | zookeeper-3.4.6.tar.gz           | /opt/zookeeper    | 3.4.6

Environment

OS         | IP             | Host Name
CentOS 6.7 | 192.168.60.101 | master
CentOS 6.7 | 192.168.60.102 | slaver1
CentOS 6.7 | 192.168.60.103 | slaver2

Preliminary Steps

Install the JDK

$ rpm -ivh /tmp/jdk-7u79-linux-x64.rpm
$ ln -s /usr/java/jdk1.7.0_79 /usr/java/java

Edit profile

$ vim /etc/profile
Add the following:
export JAVA_HOME=/usr/java/java
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib/rt.jar
export PATH=$PATH:$JAVA_HOME/bin

Reload profile

$ source /etc/profile

Generate an SSH key

$ ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost exit
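
Note: start-dfs.sh and the scp steps later on also need password-less SSH from master to the slavers. One way to set this up (assuming root logins are permitted, and done after /etc/hosts is in place) is:
$ ssh-copy-id root@slaver1
$ ssh-copy-id root@slaver2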

Disable the SSH host-key prompt

$ vim /etc/ssh/ssh_config
Modify the following:
StrictHostKeyChecking no

Restart SSH

$ service sshd restart

Disable SELinux

$ setenforce 0

Disable permanently

$ vim /etc/selinux/config
Modify the following:
SELINUX=disabled

Stop the firewall

$ service iptables stop

Disable the firewall at boot

$ chkconfig iptables off

Apache Hadoop

Install Apache Hadoop

Extract and create a symlink

$ tar -zxvf /tmp/hadoop-2.4.1.tar.gz
$ mv hadoop-2.4.1 /opt
$ ln -s /opt/hadoop-2.4.1 /opt/hadoop

Create the Hadoop temp directory

$ mkdir -p /opt/hadoop/tmp

Edit hosts

$ vim /etc/hosts
Add the following:
192.168.60.101 master 
192.168.60.102 slaver1 
192.168.60.103 slaver2

Edit profile

$ vim /etc/profile
Add the following:
export HADOOP_HOME=/opt/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
## HADOOP-9450
export HADOOP_USER_CLASSPATH_FIRST=true
## Add 2016/03/14
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_PREFIX=$HADOOP_HOME

Reload profile

$ source /etc/profile

Edit slaves

$ vim $HADOOP_HOME/etc/hadoop/slaves
Replace the contents with:
slaver1
slaver2

Edit core-site.xml

$ vim $HADOOP_HOME/etc/hadoop/core-site.xml
Replace the contents with:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <property>
      <name>fs.defaultFS</name>
      <value>hdfs://master:9000</value>
   </property>
   <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/hadoop/tmp</value>
   </property>
</configuration>

Edit hdfs-site.xml

$ vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Replace the contents with:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.permissions</name>
      <value>false</value>
   </property>
</configuration>

Edit mapred-site.xml

$ vim $HADOOP_HOME/etc/hadoop/mapred-site.xml
Replace the contents with:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>

Edit yarn-site.xml

$ vim $HADOOP_HOME/etc/hadoop/yarn-site.xml
Replace the contents with:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>master</value>
   </property>
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
   <property>
      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
</configuration>

Edit hadoop-env.sh

$ vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Add or modify the following:
export JAVA_HOME=/usr/java/java
export HADOOP_LOG_DIR=/opt/hadoop/logs

Copy to slavers

$ scp -rp /opt/hadoop slaver1:/opt
$ scp -rp /opt/hadoop slaver2:/opt
$ scp -rp /etc/hosts slaver1:/etc
$ scp -rp /etc/hosts slaver2:/etc
$ scp -rp /etc/profile root@slaver1:/etc
$ scp -rp /etc/profile root@slaver2:/etc

Format the NameNode

$ hadoop namenode -format

Reboot all hosts

$ ssh slaver1 reboot
$ ssh slaver2 reboot
$ reboot

Start Apache Hadoop

$ start-dfs.sh
$ start-yarn.sh
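
To confirm the daemons are up, jps (shipped with the JDK) should list NameNode, SecondaryNameNode, and ResourceManager on master, with DataNode and NodeManager on each slaver:
$ jps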

Test

$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -write
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -clean
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 5

Apache Zookeeper

Install ZooKeeper

Extract and create a symlink

$ tar -zxvf /tmp/zookeeper-3.4.6.tar.gz
$ mv zookeeper-3.4.6 /opt
$ ln -s /opt/zookeeper-3.4.6 /opt/zookeeper

Edit profile

$ vim /etc/profile
Add the following:
export ZOOKEEPER_HOME=/opt/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin

Reload profile

$ source /etc/profile

Edit zoo.cfg

$ cp $ZOOKEEPER_HOME/conf/zoo_sample.cfg $ZOOKEEPER_HOME/conf/zoo.cfg
$ vim $ZOOKEEPER_HOME/conf/zoo.cfg
Add or modify the following:
dataDir=/opt/zookeeper
server.1=master:2888:3888

Edit myid

$ vim /opt/zookeeper/myid
Replace the contents with:
1

Start ZooKeeper

$ zkServer.sh start
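
To check that ZooKeeper is serving, use the built-in status subcommand:
$ zkServer.sh status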

Apache HBase

Install HBase

Extract and create a symlink

$ tar -zxvf /tmp/hbase-0.98.13-hadoop2-bin.tar.gz
$ mv hbase-0.98.13-hadoop2 /opt
$ ln -s /opt/hbase-0.98.13-hadoop2 /opt/hbase

Edit profile

$ vim /etc/profile
Add the following:
export HBASE_HOME=/opt/hbase
export PATH=$PATH:$HBASE_HOME/bin

Reload profile

$ source /etc/profile

Edit regionservers

$ vim $HBASE_HOME/conf/regionservers
Replace the contents with:
slaver1
slaver2

Edit hbase-env.sh

$ vim $HBASE_HOME/conf/hbase-env.sh
Add the following:
export JAVA_HOME=/usr/java/java
export HBASE_MANAGES_ZK=false

Edit hbase-site.xml

$ vim $HBASE_HOME/conf/hbase-site.xml
Replace the contents with:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <property>
      <name>hbase.rootdir</name>
      <value>hdfs://master:9000/hbase</value>
   </property>
   <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
   </property>
   <property>
      <name>hbase.zookeeper.quorum</name>
      <value>master</value>
   </property>
   <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/opt/zookeeper</value>
   </property>
</configuration>

Remove HBase's slf4j-log4j jar (workaround for duplicate SLF4J bindings)

$ rm -f $HBASE_HOME/lib/slf4j-log4j12-1.6.4.jar

Copy to slavers

$ scp -rp /opt/hbase root@slaver1:/opt
$ scp -rp /opt/hbase root@slaver2:/opt
$ scp -rp /etc/profile root@slaver1:/etc
$ scp -rp /etc/profile root@slaver2:/etc

Start HBase

$ start-hbase.sh
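
To verify, jps should now show HMaster on master and HRegionServer on each slaver; you can also check from the shell:
$ hbase shell
hbase> status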

Apache Hive

Install Hive

Extract and create a symlink

$ tar -zxvf /tmp/apache-hive-1.2.1-bin.tar.gz
$ mv apache-hive-1.2.1-bin /opt
$ ln -s /opt/apache-hive-1.2.1-bin /opt/hive

Edit profile

$ vim /etc/profile
Add the following:
export HIVE_HOME=/opt/hive
export PATH=$PATH:$HIVE_HOME/bin

Reload profile

$ source /etc/profile

Create directories on HDFS

$ hadoop fs -mkdir /tmp
$ hadoop fs -mkdir -p /user/hive/warehouse

Change directory permissions

$ hadoop fs -chmod -R 777 /tmp
$ hadoop fs -chmod -R 777 /user/hive/warehouse

Start hiveserver2

$ hiveserver2 &

Connect (via Beeline, the newer client)

$ beeline -u jdbc:hive2://master:10000
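
Once connected, a quick smoke test (standard HiveQL; the table name demo is just an example):
SHOW DATABASES;
CREATE TABLE demo (id INT);
SHOW TABLES;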