=====
Stack
=====

.. sidebar:: Page Contents

   .. contents::
      :local:

Questions
==========

About BDS
=========

BDS is a collection of Ansible playbooks to deploy a stack of data
analytics software. The current development version of BDS can be found
online here: https://github.com/futuresystems/big-data-stack/tree/unstable

BDS Requirements
----------------

- Python 2.7
- Virtualenv
- Pip
- Git
- ssh client
- ssh keys added to GitHub (currently required due to a bug that needs to be fixed)
- IP addresses of the nodes to be controlled, with a privileged ssh user

Using BDS
---------

BDS is not a Python library or program and therefore cannot be installed
using pip or other tools. It currently works by:

#. ``git clone`` the BDS repository
#. ``./mk-inventory`` with the IP addresses to create the inventory file
#. ``ansible-playbook play-hadoop.yml addons/...`` to install Hadoop and any addons

Integrating BDS with Cloudmesh Client
=====================================

Proposed Cloudmesh Changes
--------------------------

Additional commands:

#. ``cm stack``
#. ``cm hadoop``

Additional YAML file:

- ``.cloudmesh/stack.yml``

``cm stack``
~~~~~~~~~~~~

``cm stack`` provides the low-level tools to manage BDS. This includes:

- ``check``: sanity checking to ensure that all requirements are met
- cloning and updating the local cache of BDS
- creating and setting up a clone of BDS for the current project/deployment
- deploying software onto pre-configured nodes

``cm hadoop``
~~~~~~~~~~~~~

``cm hadoop`` wraps several steps in order to deploy a virtual cluster.
This includes:

#. starting the machines on various providers (EC2, Chameleon, FutureSystems, etc.)
#. using ``cm stack`` to initialize, sanity check, and configure the current project
#. deploying software using ``cm stack``

``.cloudmesh/stack.yml``
~~~~~~~~~~~~~~~~~~~~~~~~

This file identifies the stacks that may be installed and used. For example::

   $ cat ~/.cloudmesh/stack.yml
   stack:
     bds:
       repo: git://github.com/futuresystems/big-data-stack
       checkout: unstable

This will allow ``cm stack`` to easily learn about different deployment
stacks in the future.

Use Case: Hadoop with Spark, HBase, Drill
-----------------------------------------

This should be achievable with a single line::

   $ cm hadoop \
       --nodes 5 \
       --cloud chameleon \
       --with spark hbase drill \
       --define spark_version=1.7.0 spark_package_type=src

This will:

- start 5 nodes (``--nodes 5``) on the Chameleon cloud (``--cloud chameleon``)
- install and configure Hadoop
- install and configure the Apache Spark, HBase, and Drill packages
- override the Ansible variables ``spark_version`` and ``spark_package_type``
  (NOTE: the values passed must be supported by BDS)

Implementation Overview
=======================

This section describes possible implementation approaches.

Sanity Check ``cm stack check``
----------------------------------

Example success::

   $ cm stack check
   python.......OK
   virtualenv...OK
   pip..........OK
   ansible......OK
   git..........OK
   ssh..........OK
   github.......OK

Example failure::

   $ cm stack check
   python.......OK
   virtualenv...OK
   pip..........FAILED
   ansible......FAILED
   git..........OK
   ssh..........OK
   github.......FAILED

   The following errors were detected:

   * Pip is not installed correctly
     > `pip` not found in $PATH
   * Ansible is not installed correctly
     > `ansible` related commands not found in $PATH
   * Authentication to github.com failed
     > did you add your public key to https://github.com/settings/ssh?

``cm stack check`` MUST:

- verify that the Python ecosystem and Ansible are installed.
  Do this by ensuring that the following commands are in the ``$PATH``,
  checking versions where applicable:

  - ``python`` (must be 2.7)
  - ``virtualenv``
  - ``pip``
  - ``ansible``
  - ``ansible-playbook``
  - ``ansible-vault``
  - ``git``
  - ``ssh``

- verify that keys are added to GitHub.
  Do this by ensuring that the following command exits with 1::

     $ ssh -T git@github.com
     Hi badi! You've successfully authenticated, but GitHub does not provide shell access.
     $ echo $?
     1
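
A possible implementation needs nothing more than the Python standard
library. The sketch below is illustrative only: the helper names, the use of
the ``which`` utility, and the output formatting are assumptions rather than
settled Cloudmesh design::

   import os
   import subprocess

   # commands that must be resolvable in $PATH
   # (version checks, e.g. python being 2.7, would be layered on top)
   REQUIRED = ['python', 'virtualenv', 'pip', 'ansible',
               'ansible-playbook', 'ansible-vault', 'git', 'ssh']

   def in_path(cmd):
       """True if `cmd` is found in $PATH (delegates to the `which` utility)."""
       with open(os.devnull, 'w') as null:
           return subprocess.call(['which', cmd], stdout=null, stderr=null) == 0

   def github_ok():
       """True if GitHub accepts our key: `ssh -T git@github.com` exits with 1."""
       with open(os.devnull, 'w') as null:
           return subprocess.call(['ssh', '-T', 'git@github.com'],
                                  stdout=null, stderr=null) == 1

   def check():
       results = [(cmd, in_path(cmd)) for cmd in REQUIRED]
       results.append(('github', github_ok()))
       for name, ok in results:
           print('{0:.<13}{1}'.format(name, 'OK' if ok else 'FAILED'))
       return all(ok for _, ok in results)
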
Initialization ``cm stack init``
--------------------------------

Example::

   $ cm stack init --branch unstable --user ubuntu 10.0.0.10 10.0.0.11 10.0.0.12

``cm stack init`` MUST:

- accept ``--branch <branch>`` to specify the branch name of the repository
  (e.g. ``master`` [default], ``unstable``)
- accept ``--user <user>`` to specify the name of this project. If not given,
  a default one must be chosen or generated. This project name is referred to
  below as ``$PROJ``.

.. note:: ``.cloudmesh`` refers to ``$HOME/.cloudmesh`` or ``$PWD/.cloudmesh``,
   or wherever the ``.cloudmesh`` directory is found.

.. note:: ``$BDS`` below refers to ``.cloudmesh/stack/bds``

- clone BDS from GitHub to a local cache directory. This should be in
  ``$BDS/cache/bds.git``.
- clone ``$BDS/cache/bds.git`` to ``$BDS/projects/$PROJ`` and check out the
  branch that ``$BDS/cache/bds.git`` was on (default) or switch to the branch
  specified by ``--branch``.
- within ``$BDS/projects/$PROJ`` run
  ``./mk-inventory -n $USER-$PROJ $IP1 $IP2 ... >inventory.txt``
  where ``$IPN...`` is the list of IP addresses and ``$USER`` is the username
  of the owner of the local machine.
- write the following information to ``$BDS/projects/$PROJ/.cloudmesh.yml``:

  - the parameter of ``--user``
  - the list of IP addresses

  This will allow other programs to inspect properties of this specific
  project.
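
A rough sketch of these steps in Python follows. It assumes PyYAML is
available, keeps the cache as a bare mirror clone, hard-codes
``$HOME/.cloudmesh`` for brevity, and drives ``git`` and ``mk-inventory``
through ``subprocess``; the function signature and the exact ``git`` options
are assumptions, not settled design::

   import getpass
   import os
   import subprocess
   import yaml  # PyYAML, assumed to be available

   # see the notes above; $HOME/.cloudmesh is assumed here for brevity
   BDS = os.path.expanduser('~/.cloudmesh/stack/bds')
   REPO = 'git://github.com/futuresystems/big-data-stack'

   def init(project, ips, user, branch='master'):
       cache = os.path.join(BDS, 'cache', 'bds.git')
       workdir = os.path.join(BDS, 'projects', project)

       # clone BDS into the local cache on first use (a bare mirror clone)
       if not os.path.isdir(cache):
           subprocess.check_call(['git', 'clone', '--mirror', REPO, cache])

       # clone the cache into the project directory on the requested branch
       subprocess.check_call(['git', 'clone', '--branch', branch, cache, workdir])

       # generate the Ansible inventory for the given nodes
       name = '{0}-{1}'.format(getpass.getuser(), project)
       with open(os.path.join(workdir, 'inventory.txt'), 'w') as inventory:
           subprocess.check_call(['./mk-inventory', '-n', name] + list(ips),
                                 cwd=workdir, stdout=inventory)

       # record project properties so that other programs can inspect them
       with open(os.path.join(workdir, '.cloudmesh.yml'), 'w') as f:
           yaml.safe_dump({'user': user, 'ips': list(ips)}, f,
                          default_flow_style=False)
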
Listing Stacks ``cm stack list``
--------------------------------

Example::

   $ cm stack list
   Deployment Stacks
   - BDS () ~/.cloudmesh/stack/bds/cache/bds.git
   Projects
   - > foo    []  []  ~/.cloudmesh/stack/projects/foo
   -   test-1 []  []  ~/.cloudmesh/stack/projects/test-1
   -   p1     []  []  ~/.cloudmesh/stack/projects/p1
   -   p2     []  []  ~/.cloudmesh/stack/projects/p2

``cm stack list`` provides an interface to list the deployment stacks
(e.g. BDS or others) and all the projects using a stack.

``cm stack list`` MUST:

- accept ``--sort <field>`` where ``field`` can be ``date``, ``stack``, or
  ``name`` (default: ``date``)
- accept ``--list <kind>`` to list a subset of (``stack``, ``project``)
- accept ``--json`` which will cause the output to be rendered using JSON so
  that other programs may easily parse the output

Switching Projects ``cm stack project``
---------------------------------------

Example::

   $ cm stack list --list project
   Projects
   -   test-1 []  []  ~/.cloudmesh/stack/projects/test-1
   - > p1     []  []  ~/.cloudmesh/stack/projects/p1

   $ cm stack project
   p1

   $ cm stack project test-1
   Switched to project `test-1`

   $ cm stack project
   test-1

   $ cm stack list --list project
   Projects
   - > test-1 []  []  ~/.cloudmesh/stack/projects/test-1
   -   p1     []  []  ~/.cloudmesh/stack/projects/p1

Deploying Onto Nodes ``cm stack deploy``
----------------------------------------

Example::

   $ cm stack project p1
   $ cm stack deploy bds \
       --plays play-hadoop.yml addons/spark.yml addons/hbase.yml \
       --define spark_version=1.7.0
   Verifying that nodes are reachable...........OK
   Deploying play-hadoop.yml....................OK
   Deploying addons/spark.yml...................OK
   Deploying addons/hbase.yml...................OK
   Done.

Internally, this proceeds as follows:

#. ``os.chdir($BDS/projects/$PROJ)``
#. Verify nodes are reachable:
   ``until ansible all -m ping -u <user>; do sleep 5; done``
#. Deploy Hadoop: ``ansible-playbook play-hadoop.yml -e spark_version=1.7.0``
#. Deploy Spark: ``ansible-playbook addons/spark.yml -e spark_version=1.7.0``
#. Deploy HBase: ``ansible-playbook addons/hbase.yml -e spark_version=1.7.0``

Deploying Hadoop with Addons ``cm hadoop``
------------------------------------------

Example::

   $ cm hadoop --nodes 5 --cloud chameleon --with spark hbase drill
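
To show how the pieces could fit together, here is a speculative end-to-end
sketch of ``cm hadoop`` in terms of the operations above. It reuses
``check``, ``init``, and ``BDS`` from the earlier sketches; ``boot_nodes`` is
a caller-supplied stand-in for whatever Cloudmesh call starts the virtual
machines on the chosen cloud, since that interface is not specified here::

   import os
   import subprocess
   import time

   def deploy(workdir, plays, defines=(), user='ubuntu'):
       """Run the Ansible plays from the project directory (cf. `cm stack deploy`)."""
       extra = ['-e', ' '.join(defines)] if defines else []
       # wait until every node in the inventory answers an Ansible ping
       while subprocess.call(['ansible', 'all', '-m', 'ping', '-u', user],
                             cwd=workdir) != 0:
           time.sleep(5)
       for play in plays:
           subprocess.check_call(['ansible-playbook', play] + extra, cwd=workdir)

   def hadoop(boot_nodes, nodes, cloud, addons=(), defines=(), user='ubuntu'):
       """Compose `cm hadoop`: provision nodes, then initialize and deploy a stack."""
       ips = boot_nodes(nodes, cloud)          # hypothetical provisioning callable
       if not check():                         # sketched under `cm stack check`
           raise RuntimeError('cm stack check failed')
       project = 'hadoop-{0}'.format(cloud)
       init(project, ips, user=user)           # sketched under `cm stack init`
       plays = ['play-hadoop.yml'] + ['addons/{0}.yml'.format(a) for a in addons]
       deploy(os.path.join(BDS, 'projects', project), plays, defines, user=user)

Under these assumptions, the example above would reduce to something like
``hadoop(boot_nodes, 5, 'chameleon', addons=['spark', 'hbase', 'drill'])``.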