Please read the information in the overview page at
After doing so please return to this page. Identify a project suitable for this class, propose it and work on it.
There are several categories of software projects, which are detailed in the sections below:
If you are doing a software project, you may propose a project in one of these categories.
Warning
These are non-trivial projects and involve substantial work. Many students vastly underestimate the difficulty and the amount of time required. This is why the project is assigned early in the semester, so that you have ample time to propose and work on it. If you start the project 2 weeks before December (note the early due date), we assume you may not finish.
All software projects must:
Be submitted via GitLab (a repository will be created for you)
Be reproducibly deployed
Assume you are given a username and a set of IP addresses. From this starting point, you should be able to deploy everything in a single command line invocation.
Warning
Do not assume that the username or IP address will be the ones you use during development and testing.
Provide a report in the docs/report directory. LaTeX or Word may be used. Include the original sources as well as a PDF called report.pdf.
(See Software Project for additional details on the report format. You will be using the two-column ACM format we have used before.)
Provide a properly formatted README.rst or README.md in the root directory.
The README should have the following sections:
Warning
In the past we received projects with 10 pages of installation instructions. That is certainly not good and will result in point deductions. The installation should be possible in a couple of lines. A nice example is the installation of the development software in the Ubuntu VM. Naturally you can use technologies other than Ansible. Shell scripts, makefiles, and Python scripts are all acceptable.
A LICENSE file (this should contain the Apache License, Version 2.0)
All figures should include labels with the following format: label (units).
For example:
distance (meters)
volume (liters)
cost (USD)
All figures should have a caption describing what the measurement is, and a summary of the conclusions drawn.
For example:
This shows how A changes with regard to B, indicating that under conditions X, Y, Z, Alpha is 42 times better than otherwise.
Deployment projects focus on automated software deployment on multiple nodes using automation tools such as Ansible, Chef, Puppet, Salt, or Juju. You are also allowed to use shell scripts, pdsh, Vagrant, or Fabric. For example, you could work on deploying Hadoop to a cluster of several machines. Use of Ansible is recommended and supported. Other tools, such as Chef or Puppet, will not be supported.
Note that it is not sufficient to merely deploy the software on the cluster. You must also demonstrate the use of the cluster by running a program on it and show the utilization of your entire cluster. You should also benchmark the deployment and the running of your demonstration on several cluster sizes (e.g., 1, 3, 6, 10 nodes; these numbers are only examples).
We expect to see figures showing the times for each (deployment, running) pair for each cluster size, with error bars. This means that you need to run each benchmark multiple times (at least three times) in order to get the error bars. You should also demonstrate cluster utilization for each cluster size.
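The repeat-and-summarize step can be sketched in Python as follows. This is a minimal sketch, assuming timings are collected in-process with `time.perf_counter()`; the function names `time_runs`, `summarize`, and `axis_label` are hypothetical. The sample standard deviation is one reasonable choice for the error bars, and `axis_label` simply applies the required "label (units)" convention.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple


def axis_label(name: str, unit: str) -> str:
    """Format an axis label in the required 'label (units)' style."""
    return f"{name} ({unit})"


def time_runs(step: Callable[[], None], repeats: int = 3) -> List[float]:
    """Run `step` `repeats` times; return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        step()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(timings: Dict[int, List[float]]) -> List[Tuple[int, float, float]]:
    """Return (cluster size, mean, sample standard deviation) rows sorted by size.

    Each cluster size must have at least three runs, as the assignment requires.
    """
    rows = []
    for size in sorted(timings):
        runs = timings[size]
        if len(runs) < 3:
            raise ValueError(f"need at least 3 runs for {size} nodes, got {len(runs)}")
        rows.append((size, statistics.mean(runs), statistics.stdev(runs)))
    return rows
```

For example, `summarize({1: [10.0, 12.0, 11.0], 3: [20.0, 22.0, 24.0]})` yields `[(1, 11.0, 1.0), (3, 22.0, 2.0)]`; the means and standard deviations can then be plotted against the cluster size with a plotting call such as matplotlib's `errorbar(x, y, yerr=...)`.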
The program used for demonstration can be simple and straightforward. This is not the focus of this type of project.
It is allowable to use
for your projects. Note that on powerful desktop machines even VirtualBox can run multiple VMs. Use of Docker is allowed, but you must make sure to use Docker properly. In the past we had students who used Docker, but not in the way it was designed to be used. Use of Docker Swarm is allowed.
Deployment projects must include a repeatable deployment framework that uses cmd5 and Ansible. When using Ansible, it should be called from a custom cmd5 program.
See also https://docs.google.com/document/d/1KylDsRBmVbCZSqGpRbzYwdzUGKFi92bkATwU03of5gw
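A custom command can hand off to Ansible by generating an inventory from the username and IP addresses it is given, so that nothing from the development environment is hard-coded. The sketch below assumes a playbook named `site.yml`, an inventory group named `cluster`, and hypothetical helper names; `-i` (inventory) and `-u` (remote user) are standard `ansible-playbook` options. How the cmd5 command calls `deploy` is not shown.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def write_inventory(path: Path, ips: Sequence[str], group: str = "cluster") -> Path:
    """Write a minimal INI-style Ansible inventory listing every node in one group."""
    lines = [f"[{group}]"] + list(ips)
    path.write_text("\n".join(lines) + "\n")
    return path


def playbook_command(inventory: Path, username: str, playbook: str) -> List[str]:
    """Build the ansible-playbook call: -i selects the inventory, -u the remote user."""
    return ["ansible-playbook", "-i", str(inventory), "-u", username, playbook]


def deploy(username: str, ips: Sequence[str], playbook: str = "site.yml",
           workdir: Path = Path(".")) -> int:
    """Single entry point: generate the inventory, run the playbook, return its exit code."""
    inventory = write_inventory(workdir / "inventory.ini", ips)
    result = subprocess.run(playbook_command(inventory, username, playbook))
    return result.returncode
```

Because the username and addresses are only parameters, the same invocation works unchanged when the graders supply different ones.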
Deployment projects must have an EASY installation setup, just as we demonstrated in the Ubuntu image.
A command to manage the deployment must be written using Python docopts; it starts your deployment and allows you to manage it. From within this command you can then call whatever other framework you use for management. The docopts manual page should be designed first and discussed within the team for completeness.
Using argparse or other Python command-line interface libraries is not allowed.
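A management command in this style could start from a docopt usage text, which doubles as the manual page the team designs first. In this sketch the command name `cluster`, its subcommands, and `dispatch` are hypothetical; `docopt(USAGE)` from the `docopt` package parses `sys.argv` against the usage text and returns a dictionary of commands, options, and arguments. Registering it as a cmd5 command is not shown.

```python
from typing import Any, Dict

# The usage text is the specification: docopt builds the parser from it.
USAGE = """Manage a hypothetical cluster deployment.

Usage:
  cluster deploy --user=USER IP...
  cluster status
  cluster teardown
  cluster (-h | --help)

Options:
  -h --help     Show this screen.
  --user=USER   Remote login name used on every node.
"""


def dispatch(args: Dict[str, Any]) -> str:
    """Map the dictionary returned by docopt to an action (here just a description)."""
    if args["deploy"]:
        return f"deploy as {args['--user']} to {', '.join(args['IP'])}"
    if args["status"]:
        return "status"
    if args["teardown"]:
        return "teardown"
    raise ValueError("no command given")


def main() -> None:
    # Imported here so that dispatch() can be exercised without docopt installed.
    from docopt import docopt
    print(dispatch(docopt(USAGE)))


if __name__ == "__main__":
    main()
```

Running `cluster deploy --user=alice 10.0.0.1 10.0.0.2` would reach the `deploy` branch; a real implementation would call the deployment framework there instead of returning a string.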
Deployment projects must not only deploy the framework, but also provide a sophisticated benchmark while performing a simple analysis with the deployed software.
Analytics projects focus on data exploration. For this type of project, you should focus on the analysis of a dataset (see datasets for starting points). The key here is to take a dataset and extract meaningful information from it using tools such as scikit-learn, mllib, or others. You should be able to provide graphs and descriptions for your graphs, and argue for the conclusions drawn from your analysis.
Your deployment should handle the process of downloading and installing the required datasets and pushing the analysis code to the remote node. You should provide instructions on how to run and interpret your analysis code in your README.
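The download step can be made repeatable by caching the dataset and verifying a checksum, so that rerunning the deployment does not fetch it again. This is a minimal sketch using only the standard library; the function names and the idea of publishing a SHA-256 checksum in the repository are assumptions, not part of the assignment.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path
from typing import Optional


def sha256sum(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_dataset(url: str, dest: Path, expected_sha256: Optional[str] = None) -> Path:
    """Download url to dest unless a (verified) copy already exists."""
    if dest.exists() and (expected_sha256 is None or sha256sum(dest) == expected_sha256):
        return dest  # reuse the cached copy so repeated deployments stay fast
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")  # never leave a half-written dest
    with urllib.request.urlopen(url) as response, tmp.open("wb") as out:
        shutil.copyfileobj(response, out)
    if expected_sha256 is not None and sha256sum(tmp) != expected_sha256:
        tmp.unlink()
        raise ValueError(f"checksum mismatch for {url}")
    tmp.replace(dest)
    return dest
```

The same function could run on the remote node after the analysis code has been pushed there, for example by the Ansible playbook that performs the deployment.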
An analytics project may focus on a sophisticated and academically correct analysis of data. It must be significant and not just a simple replication of what others have done before.
This project can also be executed as a bonus project to gather information about the feasibility of existing databases.
It would also be important to identify how to potentially merge these databases into a single world map and derive statistics from them. This project can be done on your local machines. No more than 6 people can work on it.
Identify someone who has experience with Android and/or iPhone programming. Design an application that preferably works on iPhone and Android and that allows a user while driving to
Make sure the app is ready early so others can test and use it and you can collect data.
Before starting the project, determine whether such an application already exists.
If more than 6 people sign up, we may form a second group doing something similar, perhaps for potholes.
Gregor would like to get this project or at least the database search query staffed.
Given millions of publications, how do we identify whether the author of paper 1 named Will Smith is the same person as the author of paper 2 named Will Smith, William Smith, or W. Smith? Author databases are provided either in BibTeX format or as a database that cannot be shared outside of this class. You may have to add information from IEEE Xplore, ResearchGate, ISI, or other online databases.
Identify further issues and discuss solutions to them. For example, an author's name changes, or the author changes institutions.
Do a comprehensive literature review
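A first filtering step for the name-matching problem could be a compatibility test that rules out pairs which cannot be the same person. This is a minimal sketch, assuming names arrive as plain strings in either "Given Family" or BibTeX-style "Family, Given" order; the tiny `NICKNAMES` table and all function names are hypothetical. A compatible pair is only a candidate: deciding that it is the same person still requires other evidence, such as co-authors, affiliations, or venues.

```python
import re
import unicodedata
from typing import List, Tuple

# Hypothetical, deliberately tiny nickname table; a real project needs a curated source.
NICKNAMES = {"will": "william", "bill": "william", "bob": "robert", "liz": "elizabeth"}


def _normalize(token: str) -> str:
    """Lower-case a token and strip accents and punctuation."""
    decomposed = unicodedata.normalize("NFKD", token)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z]", "", ascii_only.lower())


def parse_name(name: str) -> Tuple[List[str], str]:
    """Split 'Given Middle Family' or 'Family, Given' into (given tokens, family name)."""
    name = name.strip()
    if not name:
        raise ValueError("empty name")
    if "," in name:
        family, given = name.split(",", 1)
    else:
        parts = name.split()
        family, given = parts[-1], " ".join(parts[:-1])
    tokens = [_normalize(t) for t in re.split(r"[\s.]+", given)]
    return [t for t in tokens if t], _normalize(family)


def _given_compatible(x: str, y: str) -> bool:
    """Compare two given-name tokens; single letters are treated as initials."""
    if len(x) == 1 or len(y) == 1:
        initial, full = (x, y) if len(x) == 1 else (y, x)
        # An initial matches the first letter of the other token or of its canonical form.
        return initial in {full[0], NICKNAMES.get(full, full)[0]}
    return NICKNAMES.get(x, x) == NICKNAMES.get(y, y)


def compatible(a: str, b: str) -> bool:
    """True if two names could refer to the same person (necessary, not sufficient)."""
    given_a, family_a = parse_name(a)
    given_b, family_b = parse_name(b)
    if family_a != family_b:
        return False
    # zip() stops at the shorter list, so a missing middle name does not block a match.
    return all(_given_compatible(x, y) for x, y in zip(given_a, given_b))
```

With this sketch, "Will Smith", "William Smith", "W. Smith", and "Smith, Will" are all mutually compatible, while "John Smith" is not; a changed family name, one of the issues raised above, would be rejected outright and needs separate handling.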
Some ideas:
A possible good start is a previous project published at
There are also some screenshots available: