ZooKeeper is an open source project that provides a configuration and synchronization service for cluster computing. Hadoop YARN uses ZooKeeper to support high availability of the ResourceManager (RM), and HBase, Storm, and other software use it to coordinate their clusters.
ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher level services for synchronization, configuration maintenance, and groups and naming. It is designed to be easy to program to and uses a data model styled after the familiar directory tree structure of file systems. It runs in Java and has bindings for both Java and C. (as quoted in ZooKeeper overview: http://zookeeper.apache.org/doc/trunk/zookeeperOver.html)
Figure 1. ZooKeeper Service
Chubby is a locking service with strong synchronization guarantees.
Zab is a leader-based atomic broadcast protocol used in ZooKeeper to guarantee that update operations satisfy linearizability.
ZooKeeper synchronizes every change to the tree of znodes across the ZooKeeper servers, which together form the ensemble. Sharing the information this way prevents the data from becoming inconsistent between servers. If one of the servers fails, the remaining servers still hold replicas of the state and the tree.
Znodes are data registers organized in a hierarchical namespace. Each znode is identified by a path that uses “/” as the delimiter, like a directory structure, so there are parent and child znodes.
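As a rough illustration of the parent and child relationship between znode paths, the following Python sketch works on plain strings only and does not contact a ZooKeeper server. The helper names parent_of and children_of and the example paths are hypothetical; they are not part of any ZooKeeper API.

```python
# Minimal sketch: model znode paths as "/"-delimited strings.
# No ZooKeeper server or client library is involved; all names are illustrative.

def parent_of(path):
    """Return the parent path of a znode path, or None for the root "/"."""
    if path == "/":
        return None
    head = path.rsplit("/", 1)[0]
    return head if head else "/"

def children_of(path, all_paths):
    """Return the sorted direct children of `path` among `all_paths`."""
    return sorted(p for p in all_paths if p != "/" and parent_of(p) == path)

# Hypothetical znode tree
znodes = ["/", "/app1", "/app1/p_1", "/app1/p_2", "/app2"]

print(parent_of("/app1/p_1"))        # /app1
print(children_of("/app1", znodes))  # ['/app1/p_1', '/app1/p_2']
```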
Apache Curator is a set of Java libraries for automatic ZooKeeper connection management with retries and easy development of new ZooKeeper recipes.
ZooKeeper runs in Java, release 1.6 or greater (JDK 6 or greater).
Java JDK
sudo apt-get update
sudo apt-get install openjdk-7-jdk
You can find other installation packages here: http://java.sun.com/javase/downloads/index.jsp
Download latest here: http://www.apache.org/dyn/closer.cgi/zookeeper/
Download 3.4.6
wget http://supergsego.com/apache/zookeeper/stable/zookeeper-3.4.6.tar.gz
tar xzf zookeeper*.tar.gz
ln -s zookeeper-3.4.6 zookeeper
zoo.cfg
cp zookeeper/conf/zoo_sample.cfg zookeeper/conf/zoo.cfg
nano zookeeper/conf/zoo.cfg
Confirm the following settings and update them as needed:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
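The settings above are plain key=value pairs (tickTime is given in milliseconds). As a minimal sketch of how such a file can be inspected, the following Python function reads them into a dictionary. The function name parse_settings and the sample text are hypothetical, and this is not ZooKeeper's own configuration loader.

```python
# Minimal sketch: read key=value settings from zoo.cfg-style text.
# Not ZooKeeper's own loader; blank lines and "#" comments are skipped.

def parse_settings(text):
    """Return a dict of the key=value pairs found in `text`."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

sample = """
# basic standalone settings
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
"""

cfg = parse_settings(sample)
print(cfg["dataDir"], int(cfg["clientPort"]))  # /var/lib/zookeeper 2181
```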
If you have multiple servers, zoo.cfg needs additional entries, for example:
server.1=10.0.0.2:2888:3888
server.2=10.0.0.3:2888:3888
server.3=10.0.0.4:2888:3888
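The format is server.id=host:port:port. The first port is used by followers to connect to the leader, and the second port is used for leader election.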
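To make the server.id=host:port:port format concrete, here is a small Python sketch that splits one such line into its parts. The function name parse_server_line and the dictionary keys are hypothetical names chosen for illustration; this is not ZooKeeper's own parser.

```python
# Minimal sketch: split a "server.id=host:port:port" line from zoo.cfg.
# This is not ZooKeeper's own parser; the dictionary keys are illustrative names.

def parse_server_line(line):
    """Parse one server entry into its id, host, and two ports."""
    key, value = line.strip().split("=", 1)
    prefix, server_id = key.split(".", 1)
    if prefix != "server":
        raise ValueError("not a server entry: " + line)
    host, peer_port, election_port = value.split(":")
    return {
        "id": int(server_id),
        "host": host,
        "peer_port": int(peer_port),          # followers connect to the leader
        "election_port": int(election_port),  # leader election
    }

entry = parse_server_line("server.1=10.0.0.2:2888:3888")
print(entry["id"], entry["host"], entry["peer_port"], entry["election_port"])
# 1 10.0.0.2 2888 3888
```

Lines that are not server entries, such as clientPort=2181, raise a ValueError in this sketch.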
Tip
Ensemble setup (multi-server) http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkMulitServerSetup
/var/lib/zookeeper
(For multi-server) The myid file, which resides in dataDir, contains the machine’s id. If you have 3 servers, the first server has 1 in its myid file, the second has 2, and the third has 3. The id must be unique within the ensemble and should have a value between 1 and 255.
node 1
mkdir -p /var/lib/zookeeper
echo "1" > /var/lib/zookeeper/myid
node 2
mkdir -p /var/lib/zookeeper
echo "2" > /var/lib/zookeeper/myid
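The same per-node step can be scripted. The following Python sketch checks that the id is in the 1 to 255 range before writing the myid file. The function name write_myid is hypothetical, and a temporary directory stands in for dataDir so that nothing under /var/lib is modified when the example runs.

```python
# Minimal sketch: validate a server id and write it to <dataDir>/myid.
# A temporary directory stands in for dataDir so nothing under /var/lib is touched.
import os
import tempfile

def write_myid(data_dir, server_id):
    """Create data_dir if needed and write server_id to its myid file."""
    if not 1 <= server_id <= 255:
        raise ValueError("id must be between 1 and 255")
    os.makedirs(data_dir, exist_ok=True)   # same effect as mkdir -p
    path = os.path.join(data_dir, "myid")
    with open(path, "w") as f:
        f.write(str(server_id) + "\n")
    return path

demo_dir = os.path.join(tempfile.mkdtemp(), "zookeeper")
myid_path = write_myid(demo_dir, 1)
with open(myid_path) as f:
    print(f.read().strip())  # 1
```

On a real node, data_dir would be the dataDir value from zoo.cfg, and server_id would match that node's server.id entry.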
Now that you have created the configuration file, you can start ZooKeeper:
zookeeper/bin/zkServer.sh start
zookeeper/bin/zkCli.sh
...
[zk: localhost:2181(CONNECTED) 0]
To connect to a ZooKeeper server on another node:
zookeeper/bin/zkCli.sh -server [node ip address]:2181
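Besides zkCli.sh, a server can be probed over its client port with a four-letter command. The sketch below sends "ruok" and returns the reply; a healthy server is expected to answer "imok". It assumes that the command is enabled on the server, and the function name four_letter_word is hypothetical.

```python
# Minimal sketch: send a four-letter command such as "ruok" to a ZooKeeper server.
# Assumes the command is enabled on the server; host and port are supplied by the caller.
import socket

def four_letter_word(host, port, command="ruok", timeout=5.0):
    """Send `command` to host:port and return the server's text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")

# Example call against a local server on the default client port:
# four_letter_word("localhost", 2181)
```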
Hunt, Patrick, et al. “ZooKeeper: Wait-free Coordination for Internet-scale Systems.” USENIX Annual Technical Conference. Vol. 8. 2010.
This lesson is adapted from the Apache ZooKeeper documentation: http://zookeeper.apache.org/doc/trunk/zookeeperOver.html