=====
Stack
=====

.. sidebar:: Page Contents

   .. contents::
      :local:

Questions
==========

About BDS
=========

BDS is a collection of Ansible playbooks to deploy a stack of data
analytics software. The current development version of BDS can be found
online here: https://github.com/futuresystems/big-data-stack/tree/unstable

BDS Requirements
----------------

- Python 2.7
- Virtualenv
- Pip
- Git
- ssh client
- ssh keys added to GitHub (currently required due to a bug that needs to be fixed)
- IP addresses of the nodes to be controlled, with a privileged ssh user

Using BDS
---------

BDS is not a Python library or program and therefore cannot be installed
using pip or other tools. It currently works by:

#. ``git clone`` the BDS repository
#. ``./mk-inventory`` with the IP addresses to create the inventory file
#. ``ansible-playbook play-hadoop.yml addons/...`` to install Hadoop and any addons

Integrating BDS with Cloudmesh Client
=====================================

Proposed Cloudmesh Changes
--------------------------

Additional commands:

#. ``cm stack``
#. ``cm hadoop``

Additional YAML file:

- ``.cloudmesh/stack.yml``

``cm stack``
~~~~~~~~~~~~

``cm stack`` provides the low-level tools to manage BDS. This includes:

- ``check``: sanity checking to ensure that all requirements are met
- cloning and updating the local cache of BDS
- creating and setting up a clone of BDS for the current project/deployment
- deploying software onto pre-configured nodes

``cm hadoop``
~~~~~~~~~~~~~

``cm hadoop`` wraps several steps in order to deploy a virtual cluster.
This includes:

#. starting the machines on various providers (EC2, Chameleon, FutureSystems, etc.)
#. using ``cm stack`` to initialize, sanity check, and configure the current project
#. deploying software using ``cm stack``

``.cloudmesh/stack.yml``
~~~~~~~~~~~~~~~~~~~~~~~~

This file identifies the stacks that may be installed and used. For example::

   $ cat ~/.cloudmesh/stack.yml
   stack:
     bds:
       repo: git://github.com/futuresystems/big-data-stack
       checkout: unstable

This will allow ``cm stack`` to easily learn about different deployment
stacks in the future.

Use Case: Hadoop with Spark, HBase, Drill
-----------------------------------------

This should be achievable with a single line::

   $ cm hadoop \
       --nodes 5 \
       --cloud chameleon \
       --with spark hbase drill \
       --define spark_version=1.7.0 spark_package_type=src

This will:

- start 5 nodes (``--nodes 5``) on the Chameleon cloud (``--cloud chameleon``)
- install and configure Hadoop
- install and configure the Apache Spark, HBase, and Drill packages
- override the Ansible variables ``spark_version`` and ``spark_package_type``
  (NOTE: the values passed must be supported by BDS)

Implementation Overview
=======================

This section describes possible implementation approaches.

Sanity Check ``cm stack check``
----------------------------------

Example success::

   $ cm stack check
   python.......OK
   virtualenv...OK
   pip..........OK
   ansible......OK
   git..........OK
   ssh..........OK
   github.......OK

Example failure::

   $ cm stack check
   python.......OK
   virtualenv...OK
   pip..........FAILED
   ansible......FAILED
   git..........OK
   ssh..........OK
   github.......FAILED

   The following errors were detected:

   * Pip is not installed correctly
     > `pip` not found in $PATH
   * Ansible is not installed correctly
     > `ansible` related commands not found in $PATH
   * Authentication to github.com failed
     > did you add your public key to https://github.com/settings/ssh?

``cm stack check`` MUST:

- verify that the Python ecosystem and Ansible are installed.
  Do this by ensuring that the following commands are in the ``$PATH``,
  checking versions where applicable:

  - ``python`` (must be 2.7)
  - ``virtualenv``
  - ``pip``
  - ``ansible``
  - ``ansible-playbook``
  - ``ansible-vault``
  - ``git``
  - ``ssh``

- verify that keys are added to GitHub.
  Do this by ensuring that the following command exits with 1::

     $ ssh -T git@github.com
     Hi badi! You've successfully authenticated, but GitHub does not provide shell access.
     $ echo $?
     1
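
A possible implementation needs nothing more than the Python standard
library. The sketch below is illustrative only: the helper names, the use of
the ``which`` utility, and the output formatting are assumptions rather than
settled Cloudmesh design::

   import os
   import subprocess

   # commands that must be resolvable in $PATH
   # (version checks, e.g. python being 2.7, would be layered on top)
   REQUIRED = ['python', 'virtualenv', 'pip', 'ansible',
               'ansible-playbook', 'ansible-vault', 'git', 'ssh']

   def in_path(cmd):
       """True if `cmd` is found in $PATH (delegates to the `which` utility)."""
       with open(os.devnull, 'w') as null:
           return subprocess.call(['which', cmd], stdout=null, stderr=null) == 0

   def github_ok():
       """True if GitHub accepts our key: `ssh -T git@github.com` exits with 1."""
       with open(os.devnull, 'w') as null:
           return subprocess.call(['ssh', '-T', 'git@github.com'],
                                  stdout=null, stderr=null) == 1

   def check():
       results = [(cmd, in_path(cmd)) for cmd in REQUIRED]
       results.append(('github', github_ok()))
       for name, ok in results:
           print('{0:.<13}{1}'.format(name, 'OK' if ok else 'FAILED'))
       return all(ok for _, ok in results)
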
Initialization ``cm stack init``
--------------------------------

Example::

   $ cm stack init --branch unstable --user ubuntu 10.0.0.10 10.0.0.11 10.0.0.12

``cm stack init`` MUST:

- accept ``--branch <branch>`` to specify the branch name of the repository
  (e.g. ``master`` [default], ``unstable``)
- accept ``--user <user>`` to specify the name of this project. If not given,
  a default one must be chosen or generated. This project name is referred to
  below as ``$PROJ``.

.. note:: ``.cloudmesh`` refers to ``$HOME/.cloudmesh`` or ``$PWD/.cloudmesh``,
   or wherever the ``.cloudmesh`` directory is found.

.. note:: ``$BDS`` below refers to ``.cloudmesh/stack/bds``

- clone BDS from GitHub to a local cache directory. This should be in
  ``$BDS/cache/bds.git``.
- clone ``$BDS/cache/bds.git`` to ``$BDS/projects/$PROJ`` and check out the
  branch that ``$BDS/cache/bds.git`` was on (default) or switch to the branch
  specified by ``--branch``.
- within ``$BDS/projects/$PROJ`` run
  ``./mk-inventory -n $USER-$PROJ $IP1 $IP2 ... >inventory.txt``
  where ``$IPN...`` is the list of IP addresses and ``$USER`` is the username
  of the owner of the local machine.
- write the following information to ``$BDS/projects/$PROJ/.cloudmesh.yml``:

  - the parameter of ``--user``
  - the list of IP addresses

  This will allow other programs to inspect properties of this specific
  project.
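
A rough sketch of these steps in Python follows. It assumes PyYAML is
available, keeps the cache as a bare mirror clone, hard-codes
``$HOME/.cloudmesh`` for brevity, and drives ``git`` and ``mk-inventory``
through ``subprocess``; the function signature and the exact ``git`` options
are assumptions, not settled design::

   import getpass
   import os
   import subprocess
   import yaml  # PyYAML, assumed to be available

   # see the notes above; $HOME/.cloudmesh is assumed here for brevity
   BDS = os.path.expanduser('~/.cloudmesh/stack/bds')
   REPO = 'git://github.com/futuresystems/big-data-stack'

   def init(project, ips, user, branch='master'):
       cache = os.path.join(BDS, 'cache', 'bds.git')
       workdir = os.path.join(BDS, 'projects', project)

       # clone BDS into the local cache on first use (a bare mirror clone)
       if not os.path.isdir(cache):
           subprocess.check_call(['git', 'clone', '--mirror', REPO, cache])

       # clone the cache into the project directory on the requested branch
       subprocess.check_call(['git', 'clone', '--branch', branch, cache, workdir])

       # generate the Ansible inventory for the given nodes
       name = '{0}-{1}'.format(getpass.getuser(), project)
       with open(os.path.join(workdir, 'inventory.txt'), 'w') as inventory:
           subprocess.check_call(['./mk-inventory', '-n', name] + list(ips),
                                 cwd=workdir, stdout=inventory)

       # record project properties so that other programs can inspect them
       with open(os.path.join(workdir, '.cloudmesh.yml'), 'w') as f:
           yaml.safe_dump({'user': user, 'ips': list(ips)}, f,
                          default_flow_style=False)
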
Listing Stacks ``cm stack list``
--------------------------------

Example::

   $ cm stack list
   Deployment Stacks
   - BDS () ~/.cloudmesh/stack/bds/cache/bds.git
   Projects
   - > foo    []  []  ~/.cloudmesh/stack/projects/foo
   -   test-1 []  []  ~/.cloudmesh/stack/projects/test-1
   -   p1     []  []  ~/.cloudmesh/stack/projects/p1
   -   p2     []  []  ~/.cloudmesh/stack/projects/p2

``cm stack list`` provides an interface to list the deployment stacks
(e.g. BDS or others) and all the projects using a stack.

``cm stack list`` MUST:

- accept ``--sort <field>`` where ``field`` can be ``date``, ``stack``, or
  ``name`` (default: ``date``)
- accept ``--list <kind>`` to list a subset of (``stack``, ``project``)
- accept ``--json`` which will cause the output to be rendered using JSON so
  that other programs may easily parse the output

Switching Projects ``cm stack project``
---------------------------------------

Example::

   $ cm stack list --list project
   Projects
   -   test-1 []  []  ~/.cloudmesh/stack/projects/test-1
   - > p1     []  []  ~/.cloudmesh/stack/projects/p1

   $ cm stack project
   p1

   $ cm stack project test-1
   Switched to project `test-1`

   $ cm stack project
   test-1

   $ cm stack list --list project
   Projects
   - > test-1 []  []  ~/.cloudmesh/stack/projects/test-1
   -   p1     []  []  ~/.cloudmesh/stack/projects/p1

Deploying Onto Nodes ``cm stack deploy``
----------------------------------------

Example::

   $ cm stack project p1
   $ cm stack deploy bds \
       --plays play-hadoop.yml addons/spark.yml addons/hbase.yml \
       --define spark_version=1.7.0
   Verifying that nodes are reachable...........OK
   Deploying play-hadoop.yml....................OK
   Deploying addons/spark.yml...................OK
   Deploying addons/hbase.yml...................OK
   Done.

Internally, this proceeds as follows:

#. ``os.chdir($BDS/projects/$PROJ)``
#. Verify nodes are reachable:
   ``until ansible all -m ping -u <user>; do sleep 5; done``
#. Deploy Hadoop: ``ansible-playbook play-hadoop.yml -e spark_version=1.7.0``
#. Deploy Spark: ``ansible-playbook addons/spark.yml -e spark_version=1.7.0``
#. Deploy HBase: ``ansible-playbook addons/hbase.yml -e spark_version=1.7.0``

Deploying Hadoop with Addons ``cm hadoop``
------------------------------------------

Example::

   $ cm hadoop --nodes 5 --cloud chameleon --with spark hbase drill
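
To show how the pieces could fit together, here is a speculative end-to-end
sketch of ``cm hadoop`` in terms of the operations above. It reuses
``check``, ``init``, and ``BDS`` from the earlier sketches; ``boot_nodes`` is
a caller-supplied stand-in for whatever Cloudmesh call starts the virtual
machines on the chosen cloud, since that interface is not specified here::

   import os
   import subprocess
   import time

   def deploy(workdir, plays, defines=(), user='ubuntu'):
       """Run the Ansible plays from the project directory (cf. `cm stack deploy`)."""
       extra = ['-e', ' '.join(defines)] if defines else []
       # wait until every node in the inventory answers an Ansible ping
       while subprocess.call(['ansible', 'all', '-m', 'ping', '-u', user],
                             cwd=workdir) != 0:
           time.sleep(5)
       for play in plays:
           subprocess.check_call(['ansible-playbook', play] + extra, cwd=workdir)

   def hadoop(boot_nodes, nodes, cloud, addons=(), defines=(), user='ubuntu'):
       """Compose `cm hadoop`: provision nodes, then initialize and deploy a stack."""
       ips = boot_nodes(nodes, cloud)          # hypothetical provisioning callable
       if not check():                         # sketched under `cm stack check`
           raise RuntimeError('cm stack check failed')
       project = 'hadoop-{0}'.format(cloud)
       init(project, ips, user=user)           # sketched under `cm stack init`
       plays = ['play-hadoop.yml'] + ['addons/{0}.yml'.format(a) for a in addons]
       deploy(os.path.join(BDS, 'projects', project), plays, defines, user=user)

Under these assumptions, the example above would reduce to something like
``hadoop(boot_nodes, 5, 'chameleon', addons=['spark', 'hbase', 'drill'])``.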