Friday, September 18, 2015

Learning HBase Shell

Notes:
    1. This article assumes Apache HBase has already been set up.
    2. In this article, one line of output represents a single column (cell); a record is a Row made up of multiple such lines.
    3. This article does not cover every command, only the most commonly used ones.
    4. For reference only.

Table of Contents

Getting Started

  • Enter the HBase shell
    $ hbase shell

List

  • List all tables
    hbase> list
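    list also accepts an optional regular expression to filter the returned table names (shown in the shell's built-in help):
    hbase> list 'abc.*'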

Create

  • Create a table
    Here is some help for this command:
    Creates a table. Pass a table name, and a set of column family specifications (at least one), and, optionally, table configuration. Column specification can be a simple string (name), or a dictionary (dictionaries are described below in main help output), necessarily including NAME attribute.
    Examples:
    Create a table with namespace=ns1 and table qualifier=t1
    • hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}
    Create a table with namespace=default and table qualifier=t1
    • hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
    • hbase> create 't1', 'f1', 'f2', 'f3'
    • hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
    • hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
    Table configuration options can be put at the end.
    Examples:
    • hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
    • hbase> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
    • hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
    • hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
    • hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
    • hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}
    You can also keep around a reference to the created table:
    • hbase> t1 = create 't1', 'f1'
    Which gives you a reference to the table named 't1', on which you can then call methods.
  • Training
    hbase> create 'test', 'cf'
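    To double-check the layout of the new table, describe (another built-in shell command) prints its column families:
    hbase> describe 'test'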

Put

  • Insert a column value (one cell)
    Here is some help for this command: Put a cell 'value' at specified table/row/column and optionally timestamp coordinates. To put a cell value into table 'ns1:t1' or 't1' at row 'r1' under column 'c1' marked with the time 'ts1', do:
    • hbase> put 'ns1:t1', 'r1', 'c1', 'value'
    • hbase> put 't1', 'r1', 'c1', 'value'
    • hbase> put 't1', 'r1', 'c1', 'value', ts1
    • hbase> put 't1', 'r1', 'c1', 'value', {ATTRIBUTES=>{'mykey'=>'myvalue'}}
    • hbase> put 't1', 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
    • hbase> put 't1', 'r1', 'c1', 'value', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
    The same commands also can be run on a table reference. Suppose you had a reference t to table 't1', the corresponding command would be:
    • hbase> t.put 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
  • Training
    hbase> put 'test','row0','cf:string','字串'
    hbase> put 'test','row0','cf:boolean',"\x01"
    hbase> put 'test','row0','cf:short',"\x00\x01"
    hbase> put 'test','row0','cf:int',"\x00\x00\x00\x01"
    hbase> put 'test','row0','cf:long',"\x00\x00\x00\x00\x00\x00\x00\x01"
    hbase> put 'test','row0','cf:float',"?\x80\x00\x00"
    hbase> put 'test','row0','cf:double',"?\xF0\x00\x00\x00\x00\x00\x00"
    hbase> put 'test','row1','cf:name','Cookie'
    hbase> put 'test','row1','cf:phone','0999123456'
    hbase> put 'test','row2','cf:name','Tom'
    hbase> put 'test','row3','cf:name','Mary'
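    A quick way to confirm the inserts is the built-in count command, which should report 4 rows for the data above:
    hbase> count 'test'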

Get

  • Get one row
    Here is some help for this command: Get row or cell contents; pass table name, row, and optionally a dictionary of column(s), timestamp, timerange and versions. Examples:
    • hbase> get 'ns1:t1', 'r1'
    • hbase> get 't1', 'r1'
    • hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
    • hbase> get 't1', 'r1', {COLUMN => 'c1'}
    • hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
    • hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
    • hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
    • hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
    • hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
    • hbase> get 't1', 'r1', 'c1'
    • hbase> get 't1', 'r1', 'c1', 'c2'
    • hbase> get 't1', 'r1', ['c1', 'c2']
    • hbase> get 't1','r1', {COLUMN => 'c1', ATTRIBUTES => {'mykey'=>'myvalue'}}
    • hbase> get 't1','r1', {COLUMN => 'c1', AUTHORIZATIONS => ['PRIVATE','SECRET']}
    Besides the default 'toStringBinary' format, 'get' also supports custom formatting by column. A user can define a FORMATTER by adding it to the column name in the get specification. The FORMATTER can be stipulated:
    1. either as an org.apache.hadoop.hbase.util.Bytes method name (e.g., toInt, toString)
    2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.
    Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:
    • hbase> get 't1', 'r1', {COLUMN => ['cf:qualifier1:toInt', 'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] }
    Note that you can specify a FORMATTER by column only (cf:qualifier). You cannot specify a FORMATTER for all columns of a column family.
    The same commands also can be run on a reference to a table (obtained via gettable or createtable). Suppose you had a reference t to table 't1', the corresponding commands would be:
    • hbase> t.get 'r1'
    • hbase> t.get 'r1', {TIMERANGE => [ts1, ts2]}
    • hbase> t.get 'r1', {COLUMN => 'c1'}
    • hbase> t.get 'r1', {COLUMN => ['c1', 'c2', 'c3']}
    • hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
    • hbase> t.get 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
    • hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
    • hbase> t.get 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
    • hbase> t.get 'r1', 'c1'
    • hbase> t.get 'r1', 'c1', 'c2'
    • hbase> t.get 'r1', ['c1', 'c2']
  • Training
    hbase> get 'test','row0'
    hbase> get 'test','row0',['cf:string','cf:boolean','cf:float']
    hbase> get 'test','row0', ['cf:string:toString','cf:boolean:toBoolean','cf:int:toInt','cf:float:toFloat']

Scan

Notes:
    * Commonly used scan filters include:
        1. RowFilter
        2. SingleColumnValueFilter
        3. ValueFilter
        4. PrefixFilter
    * Commonly used ByteArrayComparable comparators in FILTER strings include:
        1. binary
        2. substring
        3. regexstring
        4. binaryprefix
  • Scan a table (query multiple rows)
    Here is some help for this command: Scan a table; pass table name and optionally a dictionary of scanner specifications. Scanner specifications may include one or more of: TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH, COLUMNS, CACHE, RAW, or VERSIONS.
    If no columns are specified, all columns will be scanned. To scan all members of a column family, leave the qualifier empty as in 'col_family:'.
    The filter can be specified in two ways:
    1. Using a filterString - more information on this is available in the Filter Language document attached to the HBASE-4176 JIRA
    2. Using the entire package name of the filter.
    Some examples:
    • hbase> scan 'hbase:meta'
    • hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
    • hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
    • hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
    • hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
    • hbase> scan 't1', {REVERSED => true}
    • hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
    • hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
    For setting the Operation Attributes:
    • hbase> scan 't1', { COLUMNS => ['c1', 'c2'], ATTRIBUTES => {'mykey' => 'myvalue'}}
    • hbase> scan 't1', { COLUMNS => ['c1', 'c2'], AUTHORIZATIONS => ['PRIVATE','SECRET']}
    For experts, there is an additional option -- CACHE_BLOCKS -- which switches block caching for the scanner on (true) or off (false). By default it is enabled.
    Examples:
    • hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
    Also for experts, there is an advanced option -- RAW -- which instructs the scanner to return all cells (including delete markers and uncollected deleted cells). This option cannot be combined with requesting specific COLUMNS. Disabled by default. 
    Example:
    • hbase> scan 't1', {RAW => true, VERSIONS => 10}
    Besides the default 'toStringBinary' format, 'scan' supports custom formatting by column. A user can define a FORMATTER by adding it to the column name in the scan specification. The FORMATTER can be stipulated:
    1. either as an org.apache.hadoop.hbase.util.Bytes method name (e.g., toInt, toString)
    2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.
    Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:
    • hbase> scan 't1', {COLUMNS => ['cf:qualifier1:toInt', 'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] }
    Note that you can specify a FORMATTER by column only (cf:qualifier). You cannot specify a FORMATTER for all columns of a column family.
    Scan can also be used directly from a table, by first getting a reference to a table, like such:
    • hbase> t = get_table 't'
    • hbase> t.scan
    Note in the above situation, you can still provide all the filtering, columns, options, etc as described above.
  • Training
    hbase> scan 'test'
    hbase> scan 'test', {STARTROW => 'row1', STOPROW => 'row2~'}
    hbase> scan 'test', {COLUMNS => 'cf:name'}
    hbase> scan 'test', {COLUMNS => ['cf:string:toString','cf:short:toShort','cf:long:toLong']}
    hbase> scan 'test', {FILTER => "ValueFilter(=,'binary:Cookie')"}
    hbase> scan 'test', {FILTER => "SingleColumnValueFilter('cf','name',=,'substring:o')"}
    hbase> scan 'test', {FILTER => "RowFilter(=,'regexstring:[03]$')"}

Delete

  • Delete a column value (one cell)
    Here is some help for this command: Put a delete cell value at specified table/row/column and optionally timestamp coordinates. Deletes must match the deleted cell's coordinates exactly. When scanning, a delete cell suppresses older versions. To delete a cell from 't1' at row 'r1' under column 'c1' marked with the time 'ts1', do:
    • hbase> delete 'ns1:t1', 'r1', 'c1', ts1
    • hbase> delete 't1', 'r1', 'c1', ts1
    • hbase> delete 't1', 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
    The same command can also be run on a table reference. Suppose you had a reference t to table 't1', the corresponding command would be:
    • hbase> t.delete 'r1', 'c1', ts1
    • hbase> t.delete 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
  • Training
    hbase> delete 'test', 'row1', 'cf:phone'
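    Reading the row back should now show only the cf:name cell:
    hbase> get 'test', 'row1'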

Delete All

  • Delete a whole row
    Here is some help for this command:
    Delete all cells in a given row; pass a table name, row, and optionally a column and timestamp. 
    Examples:
    • hbase> deleteall 'ns1:t1', 'r1'
    • hbase> deleteall 't1', 'r1'
    • hbase> deleteall 't1', 'r1', 'c1'
    • hbase> deleteall 't1', 'r1', 'c1', ts1
    • hbase> deleteall 't1', 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
    The same commands also can be run on a table reference. Suppose you had a reference t to table 't1', the corresponding command would be:
    • hbase> t.deleteall 'r1'
    • hbase> t.deleteall 'r1', 'c1'
    • hbase> t.deleteall 'r1', 'c1', ts1
    • hbase> t.deleteall 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
  • Training
    hbase> deleteall 'test', 'row2'

Disable

  • Disable a table
    Here is some help for this command:
    Start disable of named table:
    • hbase> disable 't1'
    • hbase> disable 'ns1:t1'
  • Training
    hbase> disable 'test'

Enable

  • Enable a table
    Here is some help for this command:
    Start enable of named table:
    • hbase> enable 't1'
    • hbase> enable 'ns1:t1'
  • Training
    hbase> enable 'test'

Truncate

Note: the effect is exactly the same as running disable + drop + create in sequence.
  • Truncate a table
    Here is some help for this command:
    Disables, drops and recreates the specified table.
  • Training
    hbase> truncate 'test'

Drop

Note: make sure the table is disabled before running drop. (Run disable first.)
  • Drop a table
    Here is some help for this command:
    Drop the named table. Table must first be disabled:
    • hbase> drop 't1'
    • hbase> drop 'ns1:t1'
  • Training
    hbase> drop 'test'

Monday, September 14, 2015

Install R-3.1.3 tarball on Cloudera

Notes:
    1. This article covers the installation only.
    2. After installation, run "R -f {file path}/{file name}.R" to execute a script and obtain its results.
    3. Installation must be performed as root.

Download tarball
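
The R-3.1.3 source tarball can be fetched from the CRAN archive (one possible source; substitute your preferred mirror):
$ wget https://cran.r-project.org/src/base/R-3/R-3.1.3.tar.gz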

Install Dependency Packages

$ yum install gcc-gfortran  
$ yum install readline-devel 
$ yum install libXt-devel 

Extract

$ tar -zxvf R-3.1.3.tar.gz
$ cd R-3.1.3

Configure and Make Install

$ ./configure --enable-R-shlib
$ make
$ make install
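
To verify the build, a quick smoke test (the script path and contents here are only an example):
$ R --version
$ echo 'print(summary(rnorm(100)))' > /tmp/test.R
$ R -f /tmp/test.R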

Tuesday, September 1, 2015

Hadoop + HBase + Hive Setup Guide (Fully Distributed)

Notes:
    1. All commands are executed as root; for practice use only.
    2. The preliminary steps must be carried out on master, slaver1, and slaver2.
    3. Some characters may be garbled in the PDF version; check that symbols are entered correctly.

Table of Contents

Package List

Package          | Package Name                     | Installation Path | Version
Oracle Java      | jdk-7u79-linux-x64.rpm           | /usr/java/java    | 7u79
Apache Hadoop    | hadoop-2.4.1.tar.gz              | /opt/hadoop       | 2.4.1
Apache HBase     | hbase-0.98.13-hadoop2-bin.tar.gz | /opt/hbase        | 0.98.13
Apache Hive      | apache-hive-1.2.1-bin.tar.gz     | /opt/hive         | 1.2.1
Apache Zookeeper | zookeeper-3.4.6.tar.gz           | /opt/zookeeper    | 3.4.6

Environment

OS         | IP             | Host Name
CentOS 6.7 | 192.168.60.101 | master
CentOS 6.7 | 192.168.60.102 | slaver1
CentOS 6.7 | 192.168.60.103 | slaver2

Preliminary Steps

Install the JDK

$ rpm -ivh /tmp/jdk-7u79-linux-x64.rpm
$ ln -s /usr/java/jdk1.7.0_79 /usr/java/java

Edit profile

$ vim /etc/profile
Add the following:
export JAVA_HOME=/usr/java/java
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib/rt.jar
export PATH=$PATH:$JAVA_HOME/bin

Reload profile

$ source /etc/profile

Generate an SSH key

$ ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost exit
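
Note: start-dfs.sh and the scp steps later on also need password-less SSH from master to the slavers. One way to set this up (assuming root logins are permitted, and done after /etc/hosts is in place) is:
$ ssh-copy-id root@slaver1
$ ssh-copy-id root@slaver2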

Disable the SSH host-key prompt

$ vim /etc/ssh/ssh_config
Modify the following:
StrictHostKeyChecking no

Restart SSH

$ service sshd restart

Disable SELinux

$ setenforce 0

Disable permanently

$ vim /etc/selinux/config
Modify the following:
SELINUX=disabled

Stop the firewall

$ service iptables stop

Disable the firewall at boot

$ chkconfig iptables off

Apache Hadoop

Install Apache Hadoop

Extract and create a symlink

$ tar -zxvf /tmp/hadoop-2.4.1.tar.gz
$ mv hadoop-2.4.1 /opt
$ ln -s /opt/hadoop-2.4.1 /opt/hadoop

Create the Hadoop temp directory

$ mkdir -p /opt/hadoop/tmp

Edit hosts

$ vim /etc/hosts
Add the following:
192.168.60.101 master 
192.168.60.102 slaver1 
192.168.60.103 slaver2

Edit profile

$ vim /etc/profile
Add the following:
export HADOOP_HOME=/opt/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
## HADOOP-9450
export HADOOP_USER_CLASSPATH_FIRST=true
## Add 2016/03/14
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_PREFIX=$HADOOP_HOME

Reload profile

$ source /etc/profile

Edit slaves

$ vim $HADOOP_HOME/etc/hadoop/slaves
Replace the contents with:
slaver1
slaver2

Edit core-site.xml

$ vim $HADOOP_HOME/etc/hadoop/core-site.xml
Replace the contents with:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <property>
      <name>fs.defaultFS</name>
      <value>hdfs://master:9000</value>
   </property>
   <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/hadoop/tmp</value>
   </property>
</configuration>

Edit hdfs-site.xml

$ vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Replace the contents with:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.permissions</name>
      <value>false</value>
   </property>
</configuration>

Edit mapred-site.xml

$ vim $HADOOP_HOME/etc/hadoop/mapred-site.xml
Replace the contents with:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>

Edit yarn-site.xml

$ vim $HADOOP_HOME/etc/hadoop/yarn-site.xml
Replace the contents with:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>master</value>
   </property>
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
   <property>
      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
</configuration>

Edit hadoop-env.sh

$ vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Add or modify the following:
export JAVA_HOME=/usr/java/java
export HADOOP_LOG_DIR=/opt/hadoop/logs

Copy to slavers

$ scp -rp /opt/hadoop slaver1:/opt
$ scp -rp /opt/hadoop slaver2:/opt
$ scp -rp /etc/hosts slaver1:/etc
$ scp -rp /etc/hosts slaver2:/etc
$ scp -rp /etc/profile root@slaver1:/etc
$ scp -rp /etc/profile root@slaver2:/etc

Format the NameNode

$ hadoop namenode -format

Reboot all hosts

$ ssh slaver1 reboot
$ ssh slaver2 reboot
$ reboot

Start Apache Hadoop

$ start-dfs.sh
$ start-yarn.sh
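
To confirm the daemons are up, jps (shipped with the JDK) should list NameNode, SecondaryNameNode, and ResourceManager on master, with DataNode and NodeManager on each slaver:
$ jps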

Test

$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -write
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -clean
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 5

Apache Zookeeper

Install ZooKeeper

Extract and create a symlink

$ tar -zxvf /tmp/zookeeper-3.4.6.tar.gz
$ mv zookeeper-3.4.6 /opt
$ ln -s /opt/zookeeper-3.4.6 /opt/zookeeper

Edit profile

$ vim /etc/profile
Add the following:
export ZOOKEEPER_HOME=/opt/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin

Reload profile

$ source /etc/profile

Edit zoo.cfg

$ cp $ZOOKEEPER_HOME/conf/zoo_sample.cfg $ZOOKEEPER_HOME/conf/zoo.cfg
$ vim $ZOOKEEPER_HOME/conf/zoo.cfg
Add or modify the following:
dataDir=/opt/zookeeper
server.1=master:2888:3888

Edit myid

$ vim /opt/zookeeper/myid
Replace the contents with:
1

Start ZooKeeper

$ zkServer.sh start
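
To check that ZooKeeper is serving, use the built-in status subcommand:
$ zkServer.sh status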

Apache HBase

Install HBase

Extract and create a symlink

$ tar -zxvf /tmp/hbase-0.98.13-hadoop2-bin.tar.gz
$ mv hbase-0.98.13-hadoop2 /opt
$ ln -s /opt/hbase-0.98.13-hadoop2 /opt/hbase

Edit profile

$ vim /etc/profile
Add the following:
export HBASE_HOME=/opt/hbase
export PATH=$PATH:$HBASE_HOME/bin

Reload profile

$ source /etc/profile

Edit regionservers

$ vim $HBASE_HOME/conf/regionservers
Replace the contents with:
slaver1
slaver2

Edit hbase-env.sh

$ vim $HBASE_HOME/conf/hbase-env.sh
Add the following:
export JAVA_HOME=/usr/java/java
export HBASE_MANAGES_ZK=false

Edit hbase-site.xml

$ vim $HBASE_HOME/conf/hbase-site.xml
Replace the contents with:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <property>
      <name>hbase.rootdir</name>
      <value>hdfs://master:9000/hbase</value>
   </property>
   <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
   </property>
   <property>
      <name>hbase.zookeeper.quorum</name>
      <value>master</value>
   </property>
   <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/opt/zookeeper</value>
   </property>
</configuration>

Remove HBase's slf4j-log4j jar (workaround for duplicate SLF4J bindings)

$ rm -f $HBASE_HOME/lib/slf4j-log4j12-1.6.4.jar

Copy to slavers

$ scp -rp /opt/hbase root@slaver1:/opt
$ scp -rp /opt/hbase root@slaver2:/opt
$ scp -rp /etc/profile root@slaver1:/etc
$ scp -rp /etc/profile root@slaver2:/etc

Start HBase

$ start-hbase.sh
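
To verify, jps should now show HMaster on master and HRegionServer on each slaver; you can also check from the shell:
$ hbase shell
hbase> status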

Apache Hive

Install Hive

Extract and create a symlink

$ tar -zxvf /tmp/apache-hive-1.2.1-bin.tar.gz
$ mv apache-hive-1.2.1-bin /opt
$ ln -s /opt/apache-hive-1.2.1-bin /opt/hive

Edit profile

$ vim /etc/profile
Add the following:
export HIVE_HOME=/opt/hive
export PATH=$PATH:$HIVE_HOME/bin

Reload profile

$ source /etc/profile

Create directories on HDFS

$ hadoop fs -mkdir /tmp
$ hadoop fs -mkdir -p /user/hive/warehouse

Change directory permissions

$ hadoop fs -chmod -R 777 /tmp
$ hadoop fs -chmod -R 777 /user/hive/warehouse

Start hiveserver2

$ hiveserver2 &

Connect (via Beeline, the newer client)

$ beeline -u jdbc:hive2://master:10000
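
Once connected, a quick smoke test (standard HiveQL; the table name demo is just an example):
SHOW DATABASES;
CREATE TABLE demo (id INT);
SHOW TABLES;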