HBase Architecture:
------------------------------------------------------------------------------------------------------------------
Available as
1. Standalone mode
2. Pseudo-distributed mode
3. Fully distributed mode
Distributed
Distributed mode can be subdivided into pseudo-distributed,
where all daemons run on a single node,
and fully-distributed, where the daemons are spread across all
nodes in the cluster.
Distributed modes require an instance of the Hadoop
Distributed File System (HDFS).
A pseudo-distributed mode is simply a distributed
mode run on a single host.
First, setup your HDFS in pseudo-distributed mode.
Next, configure HBase. Below is an example conf/hbase-site.xml. This is the file into which
you add local customizations
Note that the hbase.rootdir property
points to the local HDFS instance.
Note
Let HBase create the hbase.rootdir directory.
If you don't, you'll get a warning saying HBase needs a migration run because the
directory is missing files expected by HBase (it'll create them if you let it).
Below is a sample pseudo-distributed hbase-site.xml for the
node h-24-30.sfo.stumble.net.
<configuration>
...
<property>
<name>hbase.rootdir</name>
<value>hdfs://h-24-30.sfo.stumble.net:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>h-24-30.sfo.stumble.net</value>
</property>
...
</configuration>
To start up the initial HBase cluster...
% bin/start-hbase.sh
To start up an extra backup master on the same
server, run...
% bin/local-master-backup.sh start 1
... the '1' means use ports 60001 & 60011, and
this backup master's logfile will be at logs/hbase-${USER}-1-master-${HOSTNAME}.log.
To start up multiple backup masters, run...
% bin/local-master-backup.sh start 2 3
You can start up to 9 backup masters (10 total).
To start up more regionservers...
% bin/local-regionservers.sh start 1
where '1' means use ports 60201 & 60301 and its
logfile will be at logs/hbase-${USER}-1-regionserver-${HOSTNAME}.log.
To add 4 more regionservers in addition to the one
you just started, run...
% bin/local-regionservers.sh start 2 3 4 5
This supports up to 99 extra regionservers (100
total).
Assuming you want to stop master backup # 1, run...
% cat /tmp/hbase-${USER}-1-master.pid | xargs kill -9
Note that bin/local-master-backup.sh stop 1 will
try to stop the cluster along with the master.
To stop an individual regionserver, run...
% bin/local-regionservers.sh stop 1
For running a fully-distributed operation on more
than one host, make the following configurations. In hbase-site.xml, add the property hbase.cluster.distributed and
set it to true and point the HBase hbase.rootdir at the
appropriate HDFS NameNode and location in HDFS where you would like HBase to
write data. For example, if your namenode were running at namenode.example.org
on port 8020 and you wanted to home your HBase in HDFS at /hbase, make the following configuration.
<configuration>
...
<property>
<name>hbase.rootdir</name>
<value>hdfs://namenode.example.org:8020/hbase</value>
<description>The directory shared by RegionServers.
</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false:
standalone and pseudo-distributed setups with managed Zookeeper
true:
fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
...
</configuration>
In addition, a fully-distributed mode requires that
you modify conf/regionservers.
The regionservers file lists all hosts that you would have
running HRegionServers, one host per line (this file in HBase is like the
Hadoop slaves file). All servers
listed in this file will be started and stopped when the HBase cluster start or
stop script is run.
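For example, a conf/regionservers file for a three-node cluster could look like this (hostnames are placeholders, not from these notes):
host1.example.org
host2.example.org
host3.example.org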
----------------------------------------------------------------------------------------------------------------
HBase has its own web UI.
--------------------------------------------------------------------------------------------------------------------
Distributed column-oriented database
- on top of HDFS
Row-oriented vs Column-oriented
Row-oriented:
1. OLTP
2. Single-row inserts
3. Small number of rows and columns
Column-oriented:
1. OLAP (all values of a single column are stored together)
2. Aggregation over columns
3. High compression
----------------------------------------------------------------------------------------------------------------
HBase vs RDBMS
HBase:
1. Schema-less
2. Wide tables
3. Denormalized
RDBMS:
1. Fixed schema
2. Thin tables
3. Normalized
-----------------------------------------------------------------------------------------------------------
HBase vs HDFS
HBase:
1. Low-latency access
2. Random access (e.g. Facebook)
HDFS:
1. High-latency access
2. No concept of random access
--------------------------------------------------------------------------------------------------------------
HBase Architecture
1. Master Server - assigns regions, handles load balancing, finds out where the data is, sharding
a) when a table gets bigger it is split at the middle and distributed uniformly across the region servers
b) when things get slow, just add more region servers
ZooKeeper - the component the master talks to
2. Region Server
Table
  Region
    Store
      MemStore (writes are held here first and then flushed to an HFile)
      HFile (store file)
Column family - for grouping similar data
Pseudo mode:
ZooKeeper is used for quorum management.
Download Apache HBase
1. From the Apache website
2. Unpack the tar
3. Move it to /usr/local/hbase
(with appropriate permissions for the HBase user)
Export HBASE_HOME and add $HBASE_HOME/bin to PATH, as in the sketch below.
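A minimal sketch of steps 2-3 and the environment setup, assuming the release tarball is already downloaded (the version, user, and group names are examples, not from these notes):
# unpack the downloaded release and move it into place
tar -xzf hbase-1.2.6-bin.tar.gz
sudo mv hbase-1.2.6 /usr/local/hbase
sudo chown -R hduser:hadoop /usr/local/hbase    # example user/group
# add to the HBase user's ~/.bashrc
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin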
Configuration:
vim /usr/local/hbase/conf/hbase-env.sh
edit JAVA_HOME
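For example (the JDK path is an assumption; point it at your installed JDK):
# in conf/hbase-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64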
vim /usr/local/hbase/conf/hbase-site.xml
hbase.rootdir=hdfs://hw1:10001/hbase - points HBase at HDFS; this directory is shared by all region servers
#hbase.zookeeper.quorum=zoo1,zoo2 - used to point at the ZooKeeper nodes
#hbase.cluster.distributed=false - for pseudo and standalone mode
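A pseudo-mode hbase-site.xml along these lines might look like this (hostname and port taken from the note above):
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hw1:10001/hbase</value>
  </property>
  <!-- hbase.cluster.distributed stays at its default of false for
       standalone/pseudo mode; hbase.zookeeper.quorum is only needed
       when pointing at external ZooKeeper nodes -->
</configuration>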
Region Server
vim /usr/local/hbase/conf/regionservers
localhost for pseudo mode, one IP/hostname per line for distributed mode
Command to start HBase: start-hbase.sh
Web UI: hw1:60010
Starting extra region servers - command is:
local-regionservers.sh start 1 2 3
Creating a TABLE
hbase shell
create 'htest','cf'
put 'htest','r1','cf:c1','v1'
put 'htest','r1','cf:c2','v2'
put 'htest','r1','cf:c3','v3'
scan 'htest'
(cells are versioned in an HBase table)
get 'htest','r1'
put 'htest','r1','cf:c2','v2updated'
get 'htest','r1'
delete 'htest','r1','cf:c3'
scan 'htest'
disable 'htest'
drop 'htest'
HBASE DATA-ACCESS
1. JAVA
2. HBASE SHELL
(both 1 & 2 use the client API)
3. REST - for text
4. AVRO - for binary
5. THRIFT - for both text and binary
6. HIVE
7. PIG
8. MapReduce
4, 5 are interactive clients; 6, 7, 8 are batch-processing clients.
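As a sketch of option 1, a put and a get against the 'htest' table via the Java client API could look like this (written against the older 0.9x-style HTable API; the class name and row/column values are just examples):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HTestClient {
    public static void main(String[] args) throws Exception {
        // picks up hbase-site.xml from the classpath
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "htest");

        // write one cell: row r1, column family cf, qualifier c1
        Put put = new Put(Bytes.toBytes("r1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("c1"), Bytes.toBytes("v1"));
        table.put(put);

        // read it back
        Result result = table.get(new Get(Bytes.toBytes("r1")));
        System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("c1"))));

        table.close();
    }
}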
Loading HBASE
1. ImportTsv - for tab-separated values
2. Complete bulk load
3. MapReduce
4. Pig and Hive
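As a sketch of option 1, ImportTsv is invoked roughly like this (the column mapping and HDFS input path are examples):
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:c1,cf:c2 \
  htest /user/hduser/htest.tsv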
Configuring Fully Distributed HBASE
1. vim /usr/local/hbase/conf/hbase-site.xml
Properties (combined into the sketch below):
hbase.rootdir - the same value on the region servers too
hbase.cluster.distributed - true
hbase.zookeeper.quorum - HNHBMaster
hbase.zookeeper.property.clientPort - 2181
hbase.zookeeper.property.dataDir - path to the directory where ZooKeeper stores its data
These configuration files have to be scp'd to all region servers too.
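Putting those properties together, the hbase-site.xml might look like this (the NameNode host/port reuse the values from the pseudo-mode note above, and the dataDir path is an example):
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hw1:10001/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>HNHBMaster</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/zookeeper/data</value>
  </property>
</configuration>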
start-hbase.sh fires up all the services.