Stack¶
About BDS¶
BDS is a collection of Ansible playbooks to deploy a stack of data analytics software. The current development version of BDS can be found online here: https://github.com/futuresystems/big-data-stack/tree/unstable
BDS Requirements¶
- Python 2.7
- Virtualenv
- Pip
- Git
- ssh client
- ssh-keys in github (currently a bug and needs to be fixed)
- IP address of nodes to be controlled with privileged ssh user
Using BDS¶
BDS is not a Python library or program and therefore cannot be installed using pip or other tools. It currently works by:
- git clone the BDS repository
- ./mk-inventory with the IP addresses to create the inventory file
- ansible-playbook play-hadoop.yml addons/... to install hadoop and any addons
Integrating BDS with Cloudmesh Client¶
Proposed Cloudmesh Changes¶
Additional commands:
- cm stack
- cm hadoop
Additional yaml file dir:
.cloudmesh/stack.yml
cm stack¶
cm stack provides the low-level tools to manage the BDS. This includes:
- check: sanity-checking to ensure that all requirements are met
- cloning and updating the local cache of BDS
- creating and setting up a clone of BDS for the current project/deployment
- deploying software onto pre-configured nodes
cm hadoop¶
cm hadoop wraps several steps in order to deploy a virtual cluster. This includes:
- starting the machines on various providers (EC2, Chameleon, FutureSystems, etc)
- using cm stack to initialize, sanity check, and configure the current project
- deploying software using cm stack
.cloudmesh/stack.yml¶
This file identifies the stacks that may be installed and used. For example:
$ cat ~/.cloudmesh/stack.yml
stack:
  bds:
    repo: git://github.com/futuresystems/big-data-stack
    checkout: unstable
This will allow cm stack to easily learn about different
deployment stacks in the future.
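As a rough illustration of how cm stack might consume this file, the sketch below looks up a stack entry and builds a git clone command from it. The dictionary mirrors the YAML above by hand (a real implementation would presumably load the file with a YAML parser), and the function name clone_command and the destination path are hypothetical, not part of Cloudmesh or BDS.

```python
# Hand-written mirror of the stack.yml example above (assumption: the
# parsed file has exactly this nesting).
STACKS = {
    "stack": {
        "bds": {
            "repo": "git://github.com/futuresystems/big-data-stack",
            "checkout": "unstable",
        }
    }
}


def clone_command(config, name, dest):
    """Build (but do not run) a git clone command for the named stack."""
    entry = config["stack"][name]
    # `git clone --branch <name>` checks out the given branch after cloning.
    return ["git", "clone", "--branch", entry["checkout"], entry["repo"], dest]


command = clone_command(STACKS, "bds", "cache/bds.git")
```

The command is returned as an argument list so that a caller could hand it to subprocess without any shell quoting.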
Use Case: Hadoop with Spark, HBase, Drill¶
This should be achievable with a single line:
$ cm hadoop \
--nodes 5 \
--cloud chameleon \
--with spark hbase drill \
--define spark_version=1.7.0 spark_package_type=src
This will:
- start 5 nodes (--nodes 5) on the chameleon cloud (--cloud chameleon)
- install and configure hadoop
- install and configure the apache spark, hbase, and drill packages
- override ansible variables spark_version and spark_package_type (NOTE: the values passed must be supported by BDS).
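The command line above could be parsed roughly as follows. This is only a sketch using Python's standard argparse module; the actual Cloudmesh client may use a different parsing approach, and the helper names build_parser and parse_defines are hypothetical.

```python
import argparse


def build_parser():
    """Parser for the proposed `cm hadoop` options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="cm hadoop")
    parser.add_argument("--nodes", type=int, required=True)
    parser.add_argument("--cloud", required=True)
    # `with` is a Python keyword, so store the values under another name.
    parser.add_argument("--with", dest="addons", nargs="*", default=[])
    parser.add_argument("--define", nargs="*", default=[])
    return parser


def parse_defines(pairs):
    """Turn ["key=value", ...] into a dict of ansible variable overrides."""
    result = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError("expected key=value, got %r" % pair)
        result[key] = value
    return result


args = build_parser().parse_args(
    "--nodes 5 --cloud chameleon --with spark hbase drill "
    "--define spark_version=1.7.0 spark_package_type=src".split()
)
overrides = parse_defines(args.define)
```

Whether the override values are actually supported is left to BDS, as noted above; the parser only checks the key=value shape.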
Implementation Overview¶
This section describes possible implementation approaches.
Sanity Check cm stack check¶
Example success:
$ cm stack check
python.......OK
virtualenv...OK
pip..........OK
ansible......OK
git..........OK
ssh..........OK
github.......OK
Example failure:
$ cm stack check
python.......OK
virtualenv...OK
pip..........FAILED
ansible......FAILED
git..........OK
ssh..........OK
github.......FAILED
The following errors were detected:
* Pip is not installed correctly
> `pip` not found in $PATH
* Ansible is not installed correctly
> `ansible` related commands not found in $PATH
* Authentication to github.com failed
> did you add your public key to https://github.com/settings/ssh?
cm stack check MUST:
- verify that the python ecosystem and ansible are installed. Do this by ensuring that the following commands are in the $PATH and checking versions if applicable:
  - python (must be 2.7)
  - virtualenv
  - pip
  - ansible
  - ansible-playbook
  - ansible-vault
  - git
  - ssh
- verify that keys are added to github. Do this by ensuring that the following command exits with 1:
  $ ssh -T git@github.com
  Hi badi! You've successfully authenticated, but GitHub does not provide shell access.
  $ echo $?
  1
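One possible shape for these checks is sketched below. It assumes Python 3's shutil.which for the $PATH lookup (on Python 2.7 a substitute such as distutils.spawn.find_executable would be needed), and the function names are hypothetical. The lookup and the ssh invocation are passed in as parameters so the logic can be exercised without touching the real system.

```python
import shutil
import subprocess

# Commands named in the requirement list above.
REQUIRED_COMMANDS = [
    "python", "virtualenv", "pip", "ansible",
    "ansible-playbook", "ansible-vault", "git", "ssh",
]


def check_commands(commands, which=shutil.which):
    """Map each command name to True if it is found on $PATH."""
    return {name: which(name) is not None for name in commands}


def github_auth_ok(run=subprocess.call):
    """Per the text above, successful authentication exits with status 1."""
    return run(["ssh", "-T", "git@github.com"]) == 1


def format_report(results):
    """Render results in the dotted OK/FAILED style of the examples."""
    width = max(len(name) for name in results) + 3
    lines = []
    for name, ok in results.items():
        lines.append(name.ljust(width, ".") + ("OK" if ok else "FAILED"))
    return "\n".join(lines)
```

Version checks (for example, that python is 2.7) are not covered here and would need an additional step.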
Initialization cm stack init¶
Example:
$ cm stack init --branch unstable --user ubuntu 10.0.0.10 10.0.0.11 10.0.0.12
cm stack init MUST:
- accept --branch <branchname> to specify the branch name of the repository (eg master [default], unstable)
- accept --user <username> to specify the ssh-login username on the nodes. This user MUST have privileges to manage the node.
- accept a list of IP addresses as the nodes to control
- accept --name <project name> to specify the name of this project. If not given, a default one must be chosen or generated. This project name is referred to below as $PROJ
Note
.cloudmesh refers to $HOME/.cloudmesh or
$PWD/.cloudmesh, or wherever the .cloudmesh directory is
found.
Note
$BDS below refers to .cloudmesh/stack/bds
- clone BDS from github to a local cache directory. This should be in $BDS/cache/bds.git.
- clone $BDS/cache/bds.git to $BDS/projects/$PROJ and checkout the branch that $BDS/cache/bds.git was on (default) or switch to the branch specified by --branch.
- within $BDS/projects/$PROJ run ./mk-inventory -n $USER-$PROJ $IP1 $IP2 ... >inventory.txt where $IPN... is the list of ip addresses and $USER is the username of the owner of the local machine.
- write the following information to $BDS/projects/$PROJ/.cloudmesh.yml:
  - the parameter of --user
  - the list of ip addresses
  This will allow other programs to inspect properties about this specific project.
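The project-setup steps above could be planned as in the following sketch, which only computes paths, commands, and metadata without running anything. The function name init_plan and the metadata keys user and nodes are assumptions made for illustration; the text does not specify the layout of .cloudmesh.yml.

```python
import os


def init_plan(bds_root, project, local_user, ssh_user, ips, branch=None):
    """Return (project_dir, commands, metadata) for a new project.

    Assumes the cache at <bds_root>/cache/bds.git has already been cloned.
    """
    cache = os.path.join(bds_root, "cache", "bds.git")
    project_dir = os.path.join(bds_root, "projects", project)

    commands = [["git", "clone", cache, project_dir]]
    if branch is not None:
        # Without --branch the clone stays on the branch the cache was on.
        commands.append(["git", "-C", project_dir, "checkout", branch])

    # To be run inside project_dir with stdout redirected to inventory.txt.
    inventory_name = "%s-%s" % (local_user, project)
    commands.append(["./mk-inventory", "-n", inventory_name] + list(ips))

    # Hypothetical contents for <project_dir>/.cloudmesh.yml.
    metadata = {"user": ssh_user, "nodes": list(ips)}
    return project_dir, commands, metadata
```

Returning a plan instead of executing it keeps the path and naming rules easy to test; a caller would run each command in order and then write the metadata file.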
Listing Stacks cm stack list¶
Example:
$ cm stack list
Deployment Stacks
- BDS (<version or branchname>) ~/.cloudmesh/stack/bds/cache/bds.git
Projects
- > foo [<stack name eg BDS>] [<date created>] ~/.cloudmesh/stack/projects/foo
- test-1 [<stack name eg BDS>] [<date created>] ~/.cloudmesh/stack/projects/test-1
- p1 [<stack name eg BDS>] [<date created>] ~/.cloudmesh/stack/projects/p1
- p2 [<stack name eg BDS>] [<date created>] ~/.cloudmesh/stack/projects/p2
cm stack list provides an interface to list the deployment stacks (eg BDS or others) and all the projects using a stack.
cm stack list MUST:
- accept --sort <field> where field can be date, or stack, or name (default: date)
- accept --list <field,...> to list a subset of (stack, project)
- accept --json which will cause the output to be rendered using json so that other programs may easily parse the output
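A minimal sketch of the sorting and JSON options follows. The project records are made-up sample data, and the function name list_projects and the record keys are hypothetical; only the sort fields and the JSON requirement come from the text above.

```python
import json

SORT_FIELDS = ("date", "stack", "name")


def list_projects(projects, sort="date", as_json=False):
    """Sort project records and render them as text or JSON."""
    if sort not in SORT_FIELDS:
        raise ValueError("cannot sort by %r" % sort)
    ordered = sorted(projects, key=lambda project: project[sort])
    if as_json:
        return json.dumps(ordered, indent=2, sort_keys=True)
    return "\n".join(
        "- %s [%s] [%s]" % (p["name"], p["stack"], p["date"]) for p in ordered
    )


# Illustrative records only; the dates are invented for the example.
SAMPLE = [
    {"name": "test-1", "stack": "BDS", "date": "2016-01-02"},
    {"name": "foo", "stack": "BDS", "date": "2016-01-03"},
    {"name": "p1", "stack": "BDS", "date": "2016-01-01"},
]
```

For example, list_projects(SAMPLE, sort="name", as_json=True) would give other programs a JSON array ordered by project name.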
Switching Projects cm stack project¶
Example:
$ cm stack list --list project
Projects
- test-1 [<stack name eg BDS>] [<date created>] ~/.cloudmesh/stack/projects/test-1
- > p1 [<stack name eg BDS>] [<date created>] ~/.cloudmesh/stack/projects/p1
$ cm stack project
p1
$ cm stack project test-1
Switched to project `test-1`
$ cm stack project
test-1
$ cm stack list --list project
Projects
- > test-1 [<stack name eg BDS>] [<date created>] ~/.cloudmesh/stack/projects/test-1
- p1 [<stack name eg BDS>] [<date created>] ~/.cloudmesh/stack/projects/p1
Deploying Onto Nodes cm stack deploy¶
Example:
$ cm stack project
p1
$ cm stack deploy bds \
--plays play-hadoop.yml addons/spark.yml addons/hbase.yml \
--define spark_version=1.7.0
Verifying that nodes are reachable...........OK
Deploying play-hadoop.yml....................OK
Deploying addons/spark.yml...................OK
Deploying addons/hbase.yml...................OK
Done.
This corresponds to the following steps:
- os.chdir($BDS/projects/$PROJ)
- Verify nodes are reachable: until ansible all -m ping -u <username>; do sleep 5; done
- Deploy hadoop: ansible-playbook play-hadoop.yml -e spark_version=1.7.0
- Deploy spark: ansible-playbook addons/spark.yml -e spark_version=1.7.0
- Deploy hbase: ansible-playbook addons/hbase.yml -e spark_version=1.7.0
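The deploy steps can be sketched as two small helpers: one that retries a reachability check until it succeeds, and one that builds an ansible-playbook command per play. Both names are hypothetical, and the ping check is passed in as a callable (in practice it might wrap ansible all -m ping) so the retry logic can be tried without a cluster.

```python
import time


def wait_until_reachable(ping, delay=5, sleep=time.sleep):
    """Call ping() until it returns True; return the number of attempts.

    Mirrors the unbounded `until ...; do sleep 5; done` loop above.
    """
    attempts = 1
    while not ping():
        sleep(delay)
        attempts += 1
    return attempts


def deploy_commands(plays, defines):
    """Build one ansible-playbook command per play, in the given order."""
    extra = []
    for key in sorted(defines):
        # -e passes an extra variable to ansible-playbook as key=value.
        extra += ["-e", "%s=%s" % (key, defines[key])]
    return [["ansible-playbook", play] + extra for play in plays]
```

A real implementation would likely add a retry limit and stop at the first play that fails; the text does not say how failures should be handled.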
Deploying Hadoop with Addons cm hadoop¶
Example:
$ cm hadoop --nodes 5 --cloud chameleon --with spark hbase drill