An idempotent tool to easily create and maintain HBase tables.
When deploying an application in the Hadoop/HBase world, a common issue is creating the required tables.
This is usually achieved using scripts. But this can quickly become cumbersome and hard to maintain, and a nightmare when it comes to updating a running application.
'jdc' stands for 'Just DesCribe'. You define all the namespaces, tables, column families and properties of your HBase application in a simple YAML file, and jdchtable will take care of deploying all these objects on your cluster.
In case of schema evolution, just change the description file and apply it again: the appropriate modifications will be issued.
jdchtable is a convergence tool. Its aim is to align the real configuration with what is described in the source file, while applying only the strictly necessary modifications.
This makes jdchtable a fully idempotent tool, as all modern DevOps tools should be.
jdchtable is provided as rpm packages on the release pages (sorry, only this packaging is currently provided; contributions are welcome).
jdchtable MUST be used on a properly configured Hadoop client node (i.e. hbase shell must be functional).
Once installed, usage is the following:

    # jdchtable --inputFile yourDescription.yml

Where yourDescription.yml is a file containing your target HBase namespace and table description. jdchtable will then perform all the operations required to reach this target state.
Note that if the content of yourDescription.yml matches the current configuration, no operation will be performed.
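
The convergence idea can be illustrated with a small Python sketch. This is not jdchtable's actual code: the function name `properties_to_change` and the sample property values are assumptions chosen for illustration. It only shows why a second run against an already aligned table has nothing to do.

```python
def properties_to_change(desired: dict, current: dict) -> dict:
    """Return only the desired properties whose value differs from the current one."""
    return {key: value for key, value in desired.items() if current.get(key) != value}


# Hypothetical sample values: what the description file asks for,
# and what the cluster reports before the first run.
desired = {"regionReplication": 1, "durability": "ASYNC_WAL"}
before = {"regionReplication": 1, "durability": "USE_DEFAULT"}

first_run = properties_to_change(desired, before)

# Once the changes are applied, the same description yields an empty change set.
after = {**before, **first_run}
second_run = properties_to_change(desired, after)
```

Here `first_run` contains only the `durability` change, and `second_run` is empty, which is the behavior described above for a description that already matches the cluster.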
Here is a sample of such a description.yml file:

    namespaces:
    - name: testapp1
      tables:
      - name: testtable1
        properties:
          regionReplication: 1
          durability: ASYNC_WAL
        columnFamilies:
        - name: cf1
          properties:
            cacheBloomsOnWrite: true
            compressionType: NONE
        - name: cf2
          properties:
            maxVersions: 12
            minVersions: 2
            timeToLive: 200000
        presplit:
          keysAsString:
          - BENJAMIN
          - JULIA
          - MARTIN
          - PAUL
          - VALENTIN

- namespaces: This tag introduces a list of namespaces, each one with a name: attribute and hosting one or several tables under the tables: attribute. If you don't want to use namespaces, simply define one with name: default.
- Then, each table is described with:
  - A name: attribute, providing the table name.
  - A list of properties, allowing definition of table properties.
  - A list of columnFamilies, with, for each one, a name and a set of properties.
  - Optionally, a presplit: block, allowing an initial region split schema to be defined.
- You will find the definition of table properties in the HBase documentation. For a complete list, please refer to the Javadoc of the class org.apache.hadoop.hbase.HTableDescriptor of your HBase version. For the columnFamily properties, refer to the Javadoc of the class org.apache.hadoop.hbase.HColumnDescriptor.
Please note that, unlike all other definitions, presplitting is only effective at the initial table creation. If the table already exists, no modification is performed and the presplit: attribute is ignored.
Presplitting can be expressed with one of the following two methods:

    presplit:
      keys:
      - BENJAMIN
      - JULIA
      - MARTIN
      - PAUL
      - VALENTIN

or:

    presplit:
      startKey: "A"
      endKey: "Z"
      numRegion: 10

One can also use the notation "\xXX" to express binary values in the string. For example:

    presplit:
      keys:
      - "\x33"
      - "\x66"
      - "\x99"
      - "\xCC"

or:

    presplit:
      startKey: "\x00"
      endKey: "\xFF"
      numRegion: 5

Note that the result will be the same for these last two expressions.
Internally, all these strings are parsed using the function org.apache.hadoop.hbase.util.Bytes.toBytesBinary().
WARNING: All hexadecimal letters (A-F) must be upper case!
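
To make the "\xXX" notation and the equivalence of the two binary examples more concrete, here is a minimal Python sketch. It is an approximation written for this README, not the HBase implementation: `parse_binary_string` and `even_split_keys` are hypothetical names, invalid escapes are simply kept as literal characters (HBase may treat them differently), and the split logic assumes single-byte start and end keys that are evenly spaced.

```python
HEX_DIGITS = "0123456789ABCDEF"  # upper case only, as the warning above requires


def parse_binary_string(text: str) -> bytes:
    """Convert a string containing \\xXX escapes to bytes (simplified approximation)."""
    out = bytearray()
    i = 0
    while i < len(text):
        is_escape = (
            text[i] == "\\"
            and i + 3 < len(text)
            and text[i + 1] == "x"
            and text[i + 2] in HEX_DIGITS
            and text[i + 3] in HEX_DIGITS
        )
        if is_escape:
            # Two upper-case hex digits follow "\x": emit the corresponding byte.
            out.append(int(text[i + 2 : i + 4], 16))
            i += 4
        else:
            # Assumption: anything else is kept as a literal character.
            out.extend(text[i].encode("utf-8"))
            i += 1
    return bytes(out)


def even_split_keys(start: int, end: int, num_regions: int) -> list:
    """Return num_regions - 1 single-byte split keys evenly spaced between start and end."""
    step = (end - start) / num_regions
    return [bytes([start + round(step * k)]) for k in range(1, num_regions)]


explicit_keys = [parse_binary_string(s) for s in (r"\x33", r"\x66", r"\x99", r"\xCC")]
computed_keys = even_split_keys(0x00, 0xFF, 5)
```

Under these assumptions, `explicit_keys` and `computed_keys` hold the same four split points, which matches the statement that both expressions give the same result. A lower-case escape such as `\xcc` is not recognized by this sketch and stays a four-character literal.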
When launching the jdchtable command you may provide some optional parameters:
- --defaultState: sets the state value of every object for which it is not explicitly defined. See below.
- --configFile: adds a Hadoop properties configuration file (such as hdfs-site.xml) to the default set. This parameter can occur several times on the command line.
- --principal: specifies a principal for Kerberos authentication. If present, the --keytab parameter must also be defined.
- --keytab: specifies a keytab for Kerberos authentication. If present, the --principal parameter must also be defined.
- --clientRetries: specifies the number of connection attempts before failure (default: 6).
- --dumpConfigFile: for debugging purposes. The whole HBaseConfiguration will be dumped to this file.
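
When jdchtable is launched from a script, the rules above can be checked before the command is run. The following Python sketch assembles an argument list from the options documented in this section. The helper `build_jdchtable_command` is hypothetical and is not shipped with jdchtable; only the option names come from this README.

```python
def build_jdchtable_command(input_file, default_state=None, config_files=(),
                            principal=None, keytab=None, client_retries=None):
    """Build the argument list for a jdchtable invocation (hypothetical helper)."""
    # --principal and --keytab only make sense together.
    if (principal is None) != (keytab is None):
        raise ValueError("--principal and --keytab must be provided together")

    cmd = ["jdchtable", "--inputFile", input_file]
    if default_state is not None:
        cmd += ["--defaultState", default_state]
    # --configFile may be repeated, once per file.
    for path in config_files:
        cmd += ["--configFile", path]
    if principal is not None:
        cmd += ["--principal", principal, "--keytab", keytab]
    if client_retries is not None:
        cmd += ["--clientRetries", str(client_retries)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a node where jdchtable is installed.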
All namespaces, tables or columnFamilies not described in the description.yml file will be left untouched.
To allow deletion to be performed, all these objects have a state: attribute. When not defined, it defaults to present, or to the value provided by the --defaultState parameter. It can be set to absent to trigger the deletion of the corresponding entity.
For example:

    namespaces:
    - name: testapp1
      tables:
      - name: testtable1
        properties:
          regionReplication: 1
          durability: ASYNC_WAL
        columnFamilies:
        - name: cf1
          properties:
            cacheBloomsOnWrite: true
            compressionType: NONE
        - name: cf2
          state: absent

This will delete columnFamily cf2 (if it exists) from the previous configuration.
And:

    namespaces:
    - name: testapp1
      state: absent
      tables:
      - name: testtable1
        state: absent

This will remove all objects created by our previous example.
Note that, as a safety measure, no cascading deletion from namespace to table will be performed. Deletion of a namespace can only be effective if all the tables it hosts are explicitly deleted.
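
The no-cascade rule can be sketched in Python as follows. This is an illustration only, not jdchtable's implementation: the function `plan_namespace_deletion` and the operation names are hypothetical, the default state is assumed to be present, and raising an error when tables remain is an assumption (the tool may report the situation differently).

```python
def plan_namespace_deletion(namespace: dict, existing_tables: set) -> list:
    """Return the ordered operations that remove a namespace described with state: absent.

    namespace is one entry of the namespaces: list, as a dict.
    existing_tables is the set of table names currently hosted by that namespace.
    """
    declared_absent = {
        table["name"]
        for table in namespace.get("tables", [])
        if table.get("state", "present") == "absent"
    }
    # No cascading: every existing table must be explicitly marked as absent.
    remaining = existing_tables - declared_absent
    if remaining:
        raise ValueError(
            "namespace %r still hosts tables not marked absent: %s"
            % (namespace["name"], sorted(remaining))
        )
    operations = [("delete_table", name) for name in sorted(existing_tables)]
    operations.append(("delete_namespace", namespace["name"]))
    return operations


description = {
    "name": "testapp1",
    "state": "absent",
    "tables": [{"name": "testtable1", "state": "absent"}],
}
plan = plan_namespace_deletion(description, {"testtable1"})
```

With the description from the example above, the plan deletes testtable1 first and the namespace last. If the namespace hosted another table that the description does not mark as absent, this sketch refuses to build a plan.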
Sometimes, we don't want namespaces to be managed (created/deleted) by jdchtable, but we still need to refer to them in order to define tables in them.
In such a case, one may provide a managed: no flag:

    namespaces:
    - name: testapp1
      managed: no
      tables:
      - ....

When a namespace is not managed by jdchtable, the following applies:
- If state: present, then the namespace must exist when jdchtable is executed. Otherwise, an error is generated.
- If state: absent, then the namespace must not exist when jdchtable is executed. Otherwise, an error is generated.
This is useful, for example, to be able to create tables in the default namespace:

    namespaces:
    - name: default
      managed: no
      state: present
      tables:
      - ....

Note also the state: present attribute, which prevents an error from being generated if --defaultState is set to absent.
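
The two rules for unmanaged namespaces amount to a simple consistency check, sketched below in Python. The function `check_unmanaged_namespace` and its error messages are hypothetical and only illustrate the rule; they are not taken from jdchtable.

```python
def check_unmanaged_namespace(name: str, state: str, exists: bool) -> None:
    """Raise if an unmanaged namespace does not match its declared state."""
    if state == "present" and not exists:
        raise RuntimeError("namespace %r is not managed and must already exist" % name)
    if state == "absent" and exists:
        raise RuntimeError("namespace %r is not managed and must not exist" % name)


# The default namespace always exists, so declaring it present passes the check.
check_unmanaged_namespace("default", "present", exists=True)
```

Nothing is created or deleted in either case: for an unmanaged namespace, the declared state is only verified.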
If your Hadoop cluster is protected by Kerberos, you have two ways to provide authentication:
- Using the --principal and --keytab parameters.
- Issuing a kinit command before launching jdchtable. (You can then check your ticket with the klist command.)
In both cases, the operation will be performed on behalf of the owner of the ticket. Ensure this user has sufficient access privileges on HBase.
Thanks to its idempotency, jdchtable is very easy to orchestrate with the usual DevOps tools such as Chef, Puppet or Ansible.
You will find an Ansible role at this location.
This role can be used as follows:

    - hosts: sr1
      vars:
        jdchtable_rpm_url: https://github.com/BROADSoftware/jdchtable/releases/download/v0.2.0/jdchtable-0.2.0-1.noarch.rpm
        myDescription:
          namespaces:
          - name: testapp1
            tables:
            - name: testtable1
              properties:
                regionReplication: 1
                durability: ASYNC_WAL
              columnFamilies:
              - name: cf1
                properties:
                  cacheBloomsOnWrite: true
                  compressionType: NONE
      roles:
      - { role: hadoop/jdchtable, jdchtable_description: "{{myDescription}}" }

Just clone this repository and then:

    $ gradlew all

This should build everything. You should find the generated packages in the build/distribution folder.
Copyright (C) 2016 BROADSoftware
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.