FAQ

What are the prerequisites for this class?

We have communicated the prerequisites to the university, but they may not have been added to the class listing. The prerequisites can be found in the appropriate class overview. Please review them carefully.

See:

I am a full-time student at IUPUI. Can I take the online version?

Yes you can.

If you are an international student, I suggest you verify this with the international student office and the registrar. There may be some restrictions for international students. Also, some degree programs may limit or not allow online classes. It is up to you to verify the requirements with the appropriate administrators.

I am a residential student at IU. Can I take the online version only?

We recommend you take the residential class.

If you are an international student or a student in a particular degree program, restrictions may be placed on whether and how many online courses you can take. It is up to you to contact the appropriate administrative departments, including the international student office, to verify what is allowed for you. International students in general have such restrictions. Please find out what they are and which section of the course is appropriate for you.

The class is full. What do I do?

  1. Make sure to put yourself on the waiting list.
  2. If you are a residential student, come to the first class in the specified lecture room. More likely than not, some students will have enrolled in more classes than they can handle, and places will free up. We will create a list and discuss with the registrar what to do.

Do I need to buy a textbook?

No. We cover a wide range of topics, and their subject matter is constantly changing, so a textbook would be out of date by the time it is published. The resources for every unit will be provided. However, we recommend that you identify books that can help you in this class, for example:

  1. Some O’Reilly books may come in handy.
  2. “Taming The Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics”, Bill Franks, Wiley, ISBN 978-1-118-20878-6
  3. “Doing Data Science: Straight Talk from the Frontline”, Cathy O’Neil, Rachel Schutt, O’Reilly Media, ISBN 978-1449358655
  4. “Big Data: The Next Frontier for Innovation, Competition, and Productivity”, McKinsey Global Institute

If you find good books, let us know; we would like to add them here.

Do I need a computer to participate in this class?

If you are an online student, you need access to a computer. If you are a residential student, the facilities provided by SOIC will be sufficient. However, as your studies involve computers, it is worth evaluating whether having your own computer would make your work easier.

As for which computer to buy, we do not have a specific recommendation, as this depends on your budget. A computer running Linux or OS X probably makes programming easier. A Windows computer has the advantage of also being able to run Word and PowerPoint (so does OS X). An inexpensive machine with multiple cores and sufficient memory (16 GB or more) is a good idea. An SSD makes data access fast, especially for large data.

For this reason, I myself use a Mac, but you can probably get much cheaper machines with similar specs elsewhere.

Other students bought themselves a cheap computer and installed Linux on it, so that it does not interfere with their work machines or with Windows. Given how inexpensive computers are these days, this may be a reasonable idea. However, do not go too cheap: have enough memory and use an SSD if you can.

Where is the official IU calendar for the Fall?

Please follow this link

How do I ask a question?

How do I write a research article in computer science?

  1. A good lecture on this topic is given by Simon Peyton Jones, Microsoft Research: https://www.youtube.com/watch?v=g3dkRsTqdDA

Other resources that may also inspire you:

  1. https://globaljournals.org/guidelines-tips/research-paper-publishing
  2. http://www.cs.columbia.edu/~hgs/etc/writing-style.html
  3. https://www.quora.com/How-do-I-write-a-research-paper-for-a-computer-science-journal

Which bibliography manager is required for the class?

We require that you use JabRef:

  1. http://www.jabref.org/

Can I use EndNote or other bibliography managers?

No. JabRef works best for us, and we require that you hand in all bibliographies in JabRef, cleaning them up and transferring them to JabRef if necessary. We will not accept any other bibliography tool, such as:

  1. http://endnote.com/
  2. http://libguides.utoledo.edu/c.php?g=284330&p=1895338
  3. https://www.mendeley.com/
  4. https://community.mendeley.com/guides/using-citation-editor/05-creating-bibliography
  5. https://www.zotero.org

How many hours per week will this course take?

This question cannot really be answered precisely. Typically there are 2-3 hours of video per week. Beyond that, it is difficult to give a real number, as the effort also depends on your background.

  • The programming load is modest, but it requires knowledge of Python and Linux systems, which you may have to learn outside of this class.
  • Some students have more experience than others, so some may put in 6 hours per week overall, while others may have to put in 12 hours, and yet others may enjoy this class so much that they spend many more hours.
  • We will certainly not stop you from spending time on the class. It is up to you to decide how much time you spend.
  • Please remember that procrastination will not pay off in this class.
  • The project or term paper will take a significant amount of time.

Is all class material final?

No. Class material can change. In a normal class, you are given several hours of lectures a week, released on a weekly basis. Here, instead, we release as much of the material as possible up front and correct it when we find improvements or additions necessary. Additionally, we integrate your feedback into the class. If you find errors on the class Web page or have additions you want to contribute, we would like to hear from you. You can submit pull requests, so that your contributions are acknowledged and rewarded as part of your grade.

What kinds of changes are made to the web page?

The changes we make typically fix errata or clarify content. We try to indicate when a major change is made.

When should I study which lectures?

The class is structured in lectures that you can listen to at any time. If you have difficulties organizing your own calendar, contact us and we will develop a sample calendar for you. However, we have undergraduates, graduates, residential, and online students. We even have students who can only work during part of the semester, using their vacation time. Hence, it is impossible for us to provide an exact calendar that satisfies all the different types of students. We therefore appeal to your organizational skills: during the first week of the semester, create a study plan that works for you.

We recommend that you complete the theory lectures as quickly as possible, but also start learning Ansible at the same time, as it will be part of your project. You will fail if you assume you can do the project in two weeks. You will need to work on it on a weekly basis all semester long, starting with learning how to use Ansible and cloud resources.

I524: Why do we write the papers?

Part of doing research is communicating your thoughts on topics and being able to analyze and evaluate technologies that may or may not be useful to you. Our goal in this class is, for the first time, to collect a significant portion of the technologies you hear about in class and are exposed to through the technology list into “proceedings” developed by all students in the class. The papers also serve the dual purpose of teaching you how to write a paper and use bibliographies.

I524: Why is there no homework to test my skills in Ansible or Python?

We used to assign smaller homework in previous classes to evaluate your skills. However, we found that it did not reflect real-world use cases. By focusing on the project instead, you are required to develop these skills.

However, we can provide you with additional ungraded homework that you can use to test your skills if you like. Please let us know if you would like to do that, and we can assign such homework to you.

I524: Why not use Chef or another DevOps framework?

We used to use Chef and other DevOps frameworks. However, we found that grading cannot be conducted uniformly in a class when too many frameworks are used. We also found that the value of learning how to contribute collaboratively as part of an open source class was diminished when a small group chose other technologies. These groups later complained that they had too much work and could not benefit from other students. Hence we keep it simple. All DevOps must be provided in Ansible. All programming must be provided in Python, unless there is an explicit reason to use another language or technology, such as R or Neo4j. However, all deployment must be done in Python and Ansible.

I am lost. What do I do?

Please contact the instructors for your class.

I do not like a Technology/Topic/Project/etc. What do I do?

Please contact the instructors for your class.

I am not able to attend the online hours

Typically we provide many different meeting times via Zoom. Within reason, we even schedule special sessions. However, all of them are held during reasonable hours in United States Eastern Standard Time.

Do I need to attend the online sessions?

No, but they are a place where you can ask any question you want. In previous classes, we found that some students clearly benefited from the online sessions. If you attend, make sure you have a working and tested microphone if possible.

What are the learning outcomes?

If you feel that they are not clearly stated as part of the course, please contact us so that we can clarify the material.

There are so many messages on Piazza that I cannot keep up.

Residential students typically participate in live lectures in which we discuss important aspects of a topic with each other. Because an online class may not have such lectures, the Piazza posts replace them. You are required to read the posts and decide which of them are relevant for you. This is similar to a lecture room, where one student asks a question and the professor answers it for the entire class.

I do not know Python. What do I do?

This class requires Python. Please learn it.

Tips: TechList.1 homework

Warning

Why is this not placed in techlist-hw.rst?

Citations

Do not name the authors of the works you cite in the text.

For example, do not say:

As Gregor von Laszewski pointed out with flowery words in an article published recently …. [1]

Instead, say: In [1] …

Naturally, you should use the cite command.

Spelling

  • use a space after periods and commas in a sentence
  • use a spellchecker
  • indent properly, as demonstrated in the examples (use a fixed-width font when editing RST to see the indentation more easily)

GitHub

  • when doing your pull request, make sure you do not have any conflicts; rebase if needed

Rubric

We have already commented on what a good entry looks like, so the rules are simple: avoid plagiarism, do not use subsections in the text, keep bullet lists minimal, be short but provide enough detail, do not just copy from the web page, and relate the technology to big data if you can.

  • write a good introduction to the technology that summarizes what it is (and, if possible, how it relates to big data)
  • include the most important references and prepare them in correct BibTeX format
  • check in your contribution (if you cannot do that, ask the TAs for help so you learn how to use git)
  • you get 50% of your points from the writeup and 50% of the points from the bibliography
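Because half of the points come from the bibliography, it can help to check your .bib file for duplicate entry keys before you check it in. The following is a minimal Python sketch of a hypothetical helper; it is not part of the class tooling and does not replace JabRef. It assumes each entry begins with `@type{key,` on one line, as JabRef typically writes it, and it does not validate the fields inside an entry.

```python
import re
from collections import Counter

# Matches the start of an entry such as "@Article{smith2016,".
# Assumption: the entry type and key appear together on one line.
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def entry_keys(bibtex_text):
    """Return (entry_type, key) pairs in the order they appear."""
    return [(m.group(1).lower(), m.group(2))
            for m in ENTRY_START.finditer(bibtex_text)]


def duplicate_keys(bibtex_text):
    """Return the sorted list of keys that occur more than once."""
    counts = Counter(key for _, key in entry_keys(bibtex_text))
    return sorted(key for key, n in counts.items() if n > 1)
```

For example, `duplicate_keys(open("refs.bib").read())` lists the keys you should make unique in JabRef; `refs.bib` is a placeholder file name.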

You are allowed to work in teams to improve your own submissions.

Timeliness

You will save yourself a lot of hassle if you check in your assignment early. On the last day there are typically many check-ins, which may require you to rebase. The sooner you check in, the easier it is for you.

Outdated Technology

One of the technologies assigned to me is ‘Ninefold’. It seems Ninefold shut down its cloud service on January 30, 2016. Should I write a tech summary for Ninefold, or should we remove it from the techlist, as it is no longer in operation?

Kindly refer to:

http://ninefold.com/

http://ninefold.com/news/

Note: Outdated and unnecessary technologies will be removed by the TAs.

Techlist 1 and Paper 1: Page Count

Techlist = a couple of paragraphs (so really short; see the Nagios example)

Paper 1 = 2 pages in the format we specified, not including images and references. See the end of the paper format for a suitable layout.

PS: If your paper is a bit longer or a paragraph short, that does not matter to us; what is important is the content.

Tips to Install VirtualBox

A video on how to install VirtualBox on Windows 10, made as part of an unrelated course, is available on YouTube at

https://www.youtube.com/watch?v=XvCUpZuHgvo

It is a bit wordy, as the presenter complains about the difficulty of recording videos on Windows 10 and talks about his course, so just ignore those portions. Naturally, use the newest version. Here is another video, for Windows 8, which also covers installing Ubuntu (use the video above for installing VirtualBox on Windows 10 and ignore that part of the video below):

https://www.youtube.com/watch?v=13GS1cLyk-E

Do I generate the SSH key on the Ubuntu VM?

I have installed Ubuntu (on VirtualBox) on my Windows 10 system. Should the SSH key be created on the Ubuntu VM? Yes, you need to generate an SSH key on the Ubuntu VM. Whether it is a VM or a real machine, you have to set up SSH in order to use SSH-based communication, which keeps your work secure when you use a service like GitHub.

You need to generate an SSH key no matter which operating system you are using or on which operating system you are running the VM.

First, let us revisit what an SSH key is for. A key pair consists of a public key and a private key. If a remote machine has your public key, you will be able to log in to that machine from the machine on which you created the key pair. Some services require key authentication. Such services include:

  1. logging in to any virtual machine
  2. using GitHub
  3. logging in to the login nodes of FutureSystems

Thus, if you want to access any of them, every computer from which you access them needs a key pair (or “key”, as we sometimes abbreviate it).

So if you want to access FutureSystems from your Ubuntu VM, you need one; if you want to access GitHub, you need one; if you want to log in to VMs on Chameleon Cloud, you need one; if you want to log in to VMs on Jetstream, you need one; if you want …, you need one.

So the answer is yes. Under no circumstances copy the private key to another computer, as that is a security violation. You may only copy the public key; that is why it is called public. On each machine from which you want to access these services, you need to create a different key and add its public key to the remote services or machines you want to access.
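The per-machine key setup described above can be sketched in Python as follows. This is a minimal sketch under assumptions: it calls the standard OpenSSH `ssh-keygen` tool, while the 4096-bit RSA key type and the default `~/.ssh/id_rsa` location are our choices for illustration, not class requirements. The helper names are hypothetical. Run it separately on each machine, and only ever upload the returned `.pub` file.

```python
import os
import subprocess


def default_key_paths(ssh_dir=None):
    """Return (private_key_path, public_key_path) inside the .ssh directory."""
    if ssh_dir is None:
        ssh_dir = os.path.join(os.path.expanduser("~"), ".ssh")
    private_key = os.path.join(ssh_dir, "id_rsa")
    return private_key, private_key + ".pub"


def keygen_command(private_key_path, comment):
    """Build an ssh-keygen command line for a 4096-bit RSA key pair."""
    return ["ssh-keygen", "-t", "rsa", "-b", "4096",
            "-C", comment, "-f", private_key_path]


def create_key_pair(comment, ssh_dir=None):
    """Create a key pair on THIS machine unless one already exists.

    ssh-keygen asks for a passphrase interactively. Returns the path of the
    public key, the only file you may copy to GitHub or other hosts.
    """
    private_key, public_key = default_key_paths(ssh_dir)
    key_dir = os.path.dirname(private_key)
    if not os.path.isdir(key_dir):
        # ssh expects the key directory to be accessible only by you.
        os.makedirs(key_dir, 0o700)
    if not os.path.exists(private_key):
        subprocess.check_call(keygen_command(private_key, comment))
    return public_key
```

For example, `create_key_pair("you@example.com")` on your Ubuntu VM creates the key pair there and returns the path of the public key, whose contents you can then add to GitHub; the e-mail address is only a placeholder comment.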

Ways to run Ubuntu on Windows 10

There are multiple ways to get Ubuntu onto Windows.

a) The recommended way is via VirtualBox, which works for most students but sometimes requires adjusting the BIOS settings. Naturally, we do not know what your BIOS settings are, so you need to look this up on the internet. However, in 99% of the cases VirtualBox works nicely. A student tip describes what needs to be done:

You need the VirtualBox software (https://www.virtualbox.org/wiki/Downloads) that corresponds to the operating system running on the physical machine in front of you. Then download the Ubuntu 16.04 .iso file (https://www.ubuntu.com/download/desktop) to your computer. Start VirtualBox. I think a wizard starts to guide you through setting up a new virtual machine when you choose “New”. Then browse to where you downloaded the .iso file and select it. Start the virtual machine, and the Ubuntu installation will begin. (Improve this description if something is not clear.)

b) The other way is to install bash on Windows as a subsystem, as documented by your fellow students. This may not fulfill the requirements for running Ansible, but it will help you get started quickly by running bash directly on your host. It is often referred to as “Ubuntu on Windows”.

http://www.howtogeek.com/249966/how-to-install-and-use-the-linux-bash-shell-on-windows-10

If you want to use only one method, use a).

How can I download the lecture slides?

Please refer to the following link. https://cloudmesh.github.io/classes/i524/lectures.html

Don’t use Anaconda

We use Python 2.7.13 for this class. It is better to use virtualenv and pip, and for the IDE you can use PyCharm. This is the open source way of doing Python; we use 2.7 because not everything is available in 3.5 yet. We do not recommend Anaconda or Canopy. In fact, we found issues with both, especially with Canopy: it was incompatible with libraries the open source community uses, and it negatively affected a student's system-wide Python install. We had to reinstall Python completely after uninstalling Canopy, which unfortunately cost us a lot of time. TAs will not provide any help if you use Anaconda or Canopy.
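To confirm which Python you are running and whether it is inside a virtualenv, you can run a short check like the following. It is a sketch that works on Python 2.7 and 3, assuming either the classic `virtualenv` tool (which sets `sys.real_prefix`) or Python 3's `venv` (which makes `sys.prefix` differ from `sys.base_prefix`); the function names are hypothetical.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter runs inside a virtualenv or venv."""
    # Classic virtualenv releases (common with Python 2.7) set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases keep the base install in base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def python_version():
    """Return the running interpreter version as 'major.minor.micro'."""
    return "{0}.{1}.{2}".format(*sys.version_info[:3])


if __name__ == "__main__":
    print("Python " + python_version())
    if not in_virtualenv():
        print("Warning: not running inside a virtualenv")
```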

Using SSH Key for Git Push

When you cloned your repository did you use SSH rather than HTTPS? Your clone command should look like this:

$ git clone git@github.com:YOUR_USERNAME/classes.git

You can use git remote set-url, as described here, to change from HTTPS to SSH: https://help.github.com/articles/changing-a-remote-s-url/

Changing the origin remote (as opposed to both origin and upstream) is sufficient, since it is the only one you push to.
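If you already cloned with HTTPS, the URL rewrite that `git remote set-url` needs can be sketched in Python. This is a minimal sketch, assuming a `https://github.com/OWNER/REPO` URL with an optional `.git` suffix; the helper names are hypothetical, and the second function only calls the standard `git config --get` and `git remote set-url` commands.

```python
import re
import subprocess

# Matches https://github.com/OWNER/REPO, optionally ending in ".git" or "/".
HTTPS_URL = re.compile(r"^https://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?$")


def https_to_ssh(url):
    """Convert a GitHub HTTPS clone URL into the equivalent SSH URL."""
    match = HTTPS_URL.match(url.strip())
    if match is None:
        raise ValueError("not a GitHub HTTPS URL: " + url)
    owner, repo = match.groups()
    return "git@github.com:{0}/{1}.git".format(owner, repo)


def switch_remote_to_ssh(remote="origin", repo_dir="."):
    """Rewrite the URL of `remote` in the clone at `repo_dir` to use SSH."""
    url = subprocess.check_output(
        ["git", "config", "--get", "remote.{0}.url".format(remote)],
        cwd=repo_dir).decode("utf-8").strip()
    if url.startswith("git@"):
        return url  # already an SSH URL, nothing to change
    ssh_url = https_to_ssh(url)
    subprocess.check_call(["git", "remote", "set-url", remote, ssh_url],
                          cwd=repo_dir)
    return ssh_url
```

For example, `https_to_ssh("https://github.com/YOUR_USERNAME/classes.git")` returns `git@github.com:YOUR_USERNAME/classes.git`, the same form as the clone command above. By default, `switch_remote_to_ssh()` changes only `origin`.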

Can I write the papers on OSX?

Yes, of course you can write papers on OS X. However, we only provide support for Ubuntu 16.04, because we consider it the main OS in this class. You can use a VM to install Ubuntu and use it for class work.

What is the nature of team collaboration on papers?

You can build teams of three, but you need to build the team yourself. As the web page states, there is no reduction in the number of papers you write: the total is the number of team members times 3, and papers cannot be combined.

What are the due dates for assignments?

Due dates are posted on the Web page calendar.

How to install Matplotlib?

Follow the installation instructions in the class documentation carefully.

Install the requirements:

$ pip install -r requirements.txt

Install matplotlib using pip:

$ pip install matplotlib

Install the Python Tkinter package:

$ sudo apt-get install python-tk
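After these steps, a quick way to confirm that matplotlib and Tkinter can actually be imported is the following sketch. The helper name is hypothetical; note that the Tkinter module is called `Tkinter` on Python 2 and `tkinter` on Python 3.

```python
import importlib
import sys


def missing_modules(names):
    """Return the names in `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Tkinter's module name differs between Python 2 and Python 3.
TK_MODULE = "Tkinter" if sys.version_info[0] == 2 else "tkinter"

if __name__ == "__main__":
    missing = missing_modules(["matplotlib", TK_MODULE])
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("matplotlib and Tkinter are installed")
```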

How to test whether your OS can install cloudmesh_client

When installing the Cloudmesh Client, you may need to install extra packages, because a few dependencies of cryptography may be missing.

The following advice comes from a Stack Overflow answer by one of the pyca/cryptography developers. It describes what you need to reliably install pyca/cryptography on the three major platforms.

Please note in all these cases it is highly recommended that you install into a virtualenv and not into the global package space. This is not specific to cryptography but rather is generic advice to keep your Python installation reliable. The global package space in OS provided Pythons is owned by the system and installing things via pip into it is asking for trouble.

Windows

Upgrade to the latest pip (8.1.2 as of June 2016) and run pip install cryptography.

cryptography and cffi are both shipped as statically linked wheels.

OS X

On OS X, you need to install Xcode.

Upgrade to the latest pip (8.1.2 as of June 2016) and run pip install cryptography.

cryptography and cffi are both shipped as statically linked wheels. This will work for pyenv Python, system Python, homebrew Python, etc. As long as you’re on the latest pip you won’t even need a compiler.

Linux

On Linux you’ll need a C compiler, libffi + its development headers, and openssl + its development headers.

Debian or Ubuntu derived distributions

$ sudo apt-get install build-essential libssl-dev libffi-dev python-dev

followed by:

$ pip install cryptography

Red Hat derived distributions:

$ sudo yum install gcc openssl-devel libffi-devel python-devel

followed by:

$ pip install cryptography
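The platform-specific advice above can be summarized in a small Python sketch that returns the system-package command to run before `pip install cryptography`. The package lists come from the text; the helper names and the detection of the distribution family through `/etc/debian_version` and `/etc/redhat-release` are assumptions for illustration, and the function only builds the command without running it.

```python
import os
import platform

# Build dependencies for pyca/cryptography, as listed above.
LINUX_PACKAGES = {
    "debian": ["apt-get", "install", "build-essential", "libssl-dev",
               "libffi-dev", "python-dev"],
    "redhat": ["yum", "install", "gcc", "openssl-devel", "libffi-devel",
               "python-devel"],
}


def detect_linux_family():
    """Guess the distribution family from marker files in /etc (assumption)."""
    if os.path.exists("/etc/debian_version"):
        return "debian"
    if os.path.exists("/etc/redhat-release"):
        return "redhat"
    return None


def dependency_command(system=None, family=None):
    """Return the command to run before `pip install cryptography`, or None.

    None means no system packages are needed: Windows and OS X get prebuilt
    wheels from the latest pip (on OS X, still install Xcode as noted above).
    """
    system = system or platform.system()
    if system in ("Windows", "Darwin"):
        return None
    if system == "Linux":
        family = family or detect_linux_family()
        if family not in LINUX_PACKAGES:
            raise ValueError("unknown Linux family: {0}".format(family))
        return ["sudo"] + LINUX_PACKAGES[family]
    raise ValueError("unsupported system: {0}".format(system))
```

On an Ubuntu 16.04 VM, for example, `dependency_command()` should return the Debian/Ubuntu package list above as a `sudo apt-get install` command list, which you could pass to `subprocess.check_call` before running `pip install cryptography` inside your virtualenv.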

Tips to write a Good Paper

To write a good paper, avoid section headings such as:

Why Technology xyz

(and making sure to include a ? ;)

Instead, give the section a good name that is not a question, such as

Introduction

Design

Architecture

Performance

Comparison

Big Data Use cases

Conclusion

And there are many more.

Make sure to avoid question headings like the one above whenever you start a subsection in your paper.