This page may be updated throughout Fall 2016; we recommend reviewing it weekly for changes.
The Big Data Applications and Analytics course is an overview course in Data Science and covers the applications and technologies (data analytics and clouds) needed to process the application data. It is organized around the rallying cry: Use Clouds running Data Analytics Collaboratively processing Big Data to solve problems in X-Informatics.
This course is offered for graduate and undergraduate students at Indiana University and as an online course. To register for university credit, please go to:
Please select the course that is most suitable for your program:
- INFO-I 423 - BIG DATA APPLS & ANALYTICS
  - 34954 online undergraduate students
  - 34955 Discussion Friday 9:30 - 10:45AM Informatics East (I2) 150
- INFO-I 523 - BIG DATA APPLS & ANALYTICS
  - 32863 online graduate students
  - 32864 Discussion Friday 9:30 - 10:45AM Informatics East (I2) 150
  - 32866 Data Science majors only
- ENGR-E 599 - TOPICS IN INTELLIGENT SYSTEMS ENGINEERING
  - 36362 online graduate engineering students
  - 36363 Discussion Friday 9:30 - 10:45AM Informatics East (I2) 150
Warning
Please note that all discussion sections for residential students have been merged to:
Please ignore postings in CANVAS and the REGISTRAR about this.
From Registrar (however with updated meeting times and location):
INFO-I 523 BIG DATA APPLS & ANALYTICS (3 CR)
CLSD ***** ARR ARR ARR Von Laszewski G 50 0 2
Above class open to graduates only
Above class taught online
Discussion (DIS)
CLSD 32864 09:30A-10:45A F I2 150 Von Laszewski G 50 0 2
Above class meets with INFO-I 423
INFO-I 523 BIG DATA APPLS & ANALYTICS (3 CR)
I 523 : P - Data Science majors only
32866 RSTR ARR ARR ARR Von Laszewski G 90 72 0
This is a 100% online class taught by IU Bloomington. No
on-campus class meetings are required. A distance education
fee may apply; check your campus bursar website for more
information
Above class for students not in residence on the Bloomington
campus
INFO-I 423 BIG DATA APPLS & ANALYTICS (3 CR)
CLSD ***** RSTR ARR ARR ARR Von Laszewski G 10 0 6
Above class open to undergraduates only
Above class taught online
Discussion (DIS)
CLSD 34955 RSTR 09:30A-10:45A F I2 150 Von Laszewski G 10 0 6
Above class meets with INFO-I 523
ENGR-E 599 TOPICS IN INTELL SYS ENGINEER (3 CR)
VT: BG DATA APPLICATNS & ANLYTCS ISE
***** RSTR ARR ARR ARR Von Laszewski G 25 25 0
Above class open to graduate engineering students only
Above class taught online
Discussion (DIS)
VT: BG DATA APPLICATNS & ANLYTCS ISE
36363 RSTR 01:00P-02:15P F HD TBA Von Laszewski G 25 25 0
Above class meets with INFO-I 523
The classes are published online. Residential students at Indiana University will participate in a discussion taking place at the following time:
For the 100% online students see the office hours.
Office hours will be held every week
These are live sessions that will allow you to interact in a group or one-on-one with either an instructor or a TA. Office hours sessions may be recorded. All important FAQs will be posted either on the Web page or in Piazza ASAP. During these times, we can be reached via Zoom with the following information for the call:
Join from PC, Mac, Linux, iOS or Android:
Or Telephone:
- However, as we will most likely be sharing documents, phone participation may not be very useful.
- Dial: +1 646 558 8656 (US Toll) or +1 408 638 0968 (US Toll)
- Meeting ID: 195 576 919
- International numbers available: https://IU.zoom.us/zoomconference?m=GUZ8CEVGWPB_312js4gnzkGM_QvcVUy3
Please use a headset with a microphone to improve sound quality.
Online discussions and communication will be conducted in Piazza at the following URL:
https://piazza.com/iu/fall2016/infoi523/home
Discussions are conducted in clearly marked folders/topics. For example, “Discussion d1” will be conducted in the Piazza folder “d1”. Students are responsible for posting their content to the right folder. No credit will be given if a post has been filed in the wrong folder.
Please note that communications to instructors can be seen by all instructors. For sensitive matters, please use gvonlasz@indiana.edu. Please never share your university ID number, your social security number, or any other sensitive information with us, either in e-mail or in the discussion lists.
All sessions refer to Sections, Discussions and Units
Assigned | Wk | Week | Descriptions |
08/22/2016 | 1 | W1 | Introduction (due in W1); Overview of Data Science (due in W1); d1 (due in W1); SURVEY1 (due in W1); Paper p1 (due in W2) |
08/26/2016 | 2 | W2 | |
09/02/2016 | 3 | W3 | |
09/05/2016 | 3 | Holiday | Labor Day |
09/09/2016 | 4 | W4 | Preparation: Documenting Scientific Research; Preparation: Software Projects; Programming prg1: Python (recom. 10/14) (3); Programming prg1: Python (due 12/02) |
09/16/2016 | 5 | W5 | |
09/23/2016 | 6 | W6 | |
09/30/2016 | 7 | W7 | |
10/07/2016 | 7 | No Lectures | No Lectures (1) |
10/08/2016 | 7 | No Lectures | No Lectures (1) |
10/09/2016 | 7 | No Lectures | No Lectures (1) |
10/07/2016 | 8 | W8 | |
10/14/2016 | 9 | W9 | |
10/21/2016 | 10 | W10 | |
10/28/2016 | 11 | W11 | |
11/04/2016 | 12 | W12 | |
11/09/2016 | 13 | W12 | |
11/11/2016 | 13 | W13 | |
11/20/2016 | 14 | No Lectures | Thanksgiving break starts (1) |
11/27/2016 | 14 | No Lectures | Thanksgiving break ends (1) |
12/02/2016 | 15 | Due Date | |
12/09/2016 | 15 | Due Date | PRG-GEO: Geolocation students with term paper |
12/12/2016 | 16 | Last Class | Last chance overdue homework due; Improve Project (5) |
12/16/2016 | 17 | Last Day | End Date of Semester |
The following sections will be replaced:
Projects may be executed on your local computer, a cloud or other resources you may have access to. This may include:
You have a choice to write a term paper or do a software project. This will constitute 50% of your class grade.
If you choose a project, your maximum grade could be an A+. However, an A+ project must be truly outstanding and include an exceptional project report. Such a project and report should be of a quality suitable for publication at a conference.
If you choose a term paper, your maximum grade for the entire class will be an A-.
Please note that a project also includes writing a project report/paper. However, the required length is somewhat shorter than for a term paper.
In case of a software project, we encourage a group project with up to three members. You can use the discussion forum in the folder project to form project teams, or just communicate privately with other class members to form a team. The following artifacts are part of the deliverables for a project:
A report must be produced using the format discussed in the Report Format section. The following length is required:
This document is only needed for team projects. It is a one-page PDF document describing who did what. It includes pointers to the git history and to statistics demonstrating that more than one student has worked on the project.
In addition, the graders will go into GitLab, which provides a history of check-ins, to verify that each team member has used GitLab to check in their contributions frequently. For example, if we find that one of the students has not checked in any code or documentation at all, this will be questioned.
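The per-member commit statistics mentioned above can be gathered with a short script. The following is a minimal sketch, not the graders' actual tooling: it assumes git is installed and that the project directory is a git repository with at least one commit. The function names `parse_shortlog` and `commit_counts` are hypothetical and chosen only for this illustration.

```python
import subprocess


def parse_shortlog(text):
    """Parse `git shortlog -s -n` output into a {author: commit_count} dict."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line has the form "<count><TAB><author name>".
        number, author = line.split(None, 1)
        counts[author] = int(number)
    return counts


def commit_counts(repo_dir="."):
    """Run git in repo_dir and return the number of commits per author.

    Assumption: repo_dir is a git repository and git is on the PATH.
    """
    result = subprocess.run(
        ["git", "shortlog", "-s", "-n", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_shortlog(result.stdout)
```

Calling `commit_counts()` inside the project directory returns one entry per author, which a team could cite in the one-page work breakdown. Commit counts alone are a coarse measure, so they complement rather than replace the description of who did what.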
A report must be produced using the format discussed in the Report Format section. The following length is required:
If you choose the term paper, you or your team will pick a topic relevant to the class. You will write a high-quality scholarly paper about this topic. The following artifacts are part of the deliverables for a term paper. A report must be produced using the format discussed in the Report Format section. The following length is required:
All reports will use the ACM proceedings format. The MS Word template can be found here:
paper-report.docx
A LaTeX version can be found at
however, you have to remove the ACM copyright notice in the LaTeX version.
There will be NO EXCEPTION to this format. If you are in a team, you can either use GitLab to develop the LaTeX document collaboratively or use Microsoft OneDrive, which supports collaborative editing. All bibliographical entries must be put into a bibliography manager such as JabRef, EndNote, or Mendeley. This helps ensure that you follow proper citation styles. You can use either ACM or IEEE reference styles. Your final submission will include the bibliography file as a separate document.
Documents that do not follow the ACM format, are not accompanied by references managed with JabRef or EndNote, or are not spell checked will be returned without review.
Report Checklist:
Code repositories are for code. If you need additional libraries, you must develop a script or use a DevOps framework to install such software. Thus zip files and .class and .o files are not permissible in the project. Each project must be reproducible with a simple script. An example is:
git clone ....
make install
make run
make view
This would use a simple makefile to install, run, and view the results. Naturally, you can also use Ansible or shell scripts. It is not permissible to use GUI-based, preinstalled DevOps frameworks. Everything must be installable from the command line.
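The rule that repositories hold only source, not archives or compiled artifacts, can be checked before submission. The sketch below is an illustration under stated assumptions, not an official course tool: the function name `find_forbidden_files` is hypothetical, and the suffix list contains only the file types named above.

```python
import os

# File types named in the rule above; extending this list is an assumption.
FORBIDDEN_SUFFIXES = (".zip", ".class", ".o")


def find_forbidden_files(root="."):
    """Return sorted paths (relative to root) that end in a forbidden suffix."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's internal directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if name.lower().endswith(FORBIDDEN_SUFFIXES):
                full_path = os.path.join(dirpath, name)
                hits.append(os.path.relpath(full_path, root))
    return sorted(hits)
```

Running `find_forbidden_files(".")` in a checkout and confirming that it returns an empty list is one way to catch stray binaries before the final commit.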
Python or Java experience is expected. The programming load is modest.
If you elect a programming project, we will assume that you are familiar with the programming languages required as part of the project you suggest. We will limit the languages to Python, plus JavaScript if you would like to do interactive visualization. If you do not know the required technologies, we will expect you to learn them outside of class. For example, Python has a reputation for being easy to learn, and those with a strong programming background in another general-purpose programming language (like C/C++, Java, Ruby, etc.) can learn it within a few hours to days, depending on experience level. Please consult the instructor if you have concerns about your programming background. In addition, we may encounter math of various kinds, including linear algebra, probability theory, and basic calculus. We expect that you know them on an elementary level. Students with limited math backgrounds may need to do additional reading outside of class.
If you are interested in further development of cloudmesh for big data, strong Python or JavaScript experience is needed.
You will also need a sufficiently modern and powerful computer to do the class work. If you expect to do the course only on your cell phone, iPad, or Windows 98 computer, this will not work. We recommend that you have a relatively new and updated computer with sufficient memory. In some cases it is easier not to use Windows and instead use, for example, Linux via VirtualBox, so your machine should have sufficient memory to run it comfortably. If you do not have such a machine, we are at this time trying to get virtual machines that you can use on our cloud. However, the runtime of these VMs is limited to 6 hours, after which they will be terminated. You can of course start new VMs. This is done to avoid resource “hogging” by idle VMs. In contrast to AWS, you are not paying for our VMs, so we enforce this rule to encourage proper community spirit and to avoid occupying resources that could be used by others. You can also use AWS or other clouds where you can run virtual machines, but in that case you need to pay for the usage yourself.
Please remember that this course does not have a required textbook; the money you save on this can be used to buy a new computer or upgrade your current one if needed.
Students will gain broad understanding of Big Data application areas and approaches used. This course is a good preparation for any student likely to be involved with Big Data in their future.
Homework submitted by the due date will be graded within a week of the due date. Homework submitted after the due date will be graded within 2-3 weeks of submission and will receive a 10% grade reduction. Some homework cannot be delivered late; it will be clearly marked, and 0 points will be given if it is late. These assignments are mostly related to setting up your account and communicating your account names to us.
It is the student’s responsibility to upload submissions well ahead of the deadline to avoid last minute problems with network connectivity, browser crashes, cloud issues, etc. It is a very good idea to make early submissions and then upload updates as the deadline approaches; we will grade the last submission received before the deadline.
Note that the paper and project will take a considerable amount of time, and proper time management is a must for this class. Avoid starting your project late. Procrastination does not pay off. Late projects or term papers will receive a 10% grade reduction.
Details about the assignments can be found in the Section Homework.
We take academic integrity very seriously. You are required to abide by the Indiana University policy on academic integrity, as described in the Code of Student Rights, Responsibilities, and Conduct, as well as the Computer Science Statement on Academic Integrity (http://www.soic.indiana.edu/doc/graduate/graduate-forms/Academic-Integrity-Guideline-FINAL-2015.pdf). It is your responsibility to understand these policies. Briefly summarized, the work you submit for course assignments, projects, quizzes, and exams must be your own or that of your group, if group work is permitted. You may use the ideas of others, but you must give proper credit. You may discuss assignments with other students, but you must acknowledge them in the reference section according to scholarly citation rules. Please also review the citation rules so that you know how to avoid plagiarizing text from other sources.
We will respond to acts of plagiarism and academic misconduct according to university policy. Sanctions typically involve a grade of 0 for the assignment in question and/or a grade of F in the course. In addition, University policy requires us to report the incident to the Dean of Students, who may apply additional sanctions, including expulsion from the university.
Students agree that by taking this course, papers and source code submitted to us may be subject to textual similarity review, for example by Turnitin.com. These submissions may be included as source documents in reference databases for the purpose of detecting plagiarism of such papers or codes.
The course presents lectures in online form given by the instructors listed below. Many others have helped make this material available and may not be listed here.
Gregor von Laszewski is an Assistant Director of Cloud Computing in the DSC. He held a position at Argonne National Laboratory from Nov. 1996 – Aug. 2009, where he was last a scientist and a fellow of the Computation Institute at the University of Chicago. During the last two years of that appointment he was on sabbatical and held a position as Associate Professor and the Director of a Lab at Rochester Institute of Technology focusing on Cyberinfrastructure. He received a Master's degree in 1990 from the University of Bonn, Germany, and a Ph.D. in computer science in 1996 from Syracuse University. He has been involved in Grid computing since the term was coined. He was the lead of the Java Commodity Grid Kit (http://www.cogkit.org), which to this day provides a basis for many Grid-related projects, including the Globus toolkit. His current research interests are in the areas of Cloud computing. He is leading the effort to develop a simple IaaS client, available as an open source project at http://cloudmesh.github.io/client/
His Web page is located at http://gregor.cyberaide.org. To contact him, please send mail to laszewski@gmail.com. For class-related e-mail, please use Piazza for this class.
In his free time he teaches Lego Robotics to high school students. In 2015 the team won the 2nd prize in programming design in Indiana. If you would like to volunteer to help in this effort, please contact him.
He also offers the opportunity to work with him on interesting independent studies. Current topics include but are not limited to
Please contact him if you are interested in this.
Fox received a Ph.D. in Theoretical Physics from Cambridge University and is now a Distinguished Professor of Informatics and Computing, and Physics at Indiana University, where he is Director of the Digital Science Center, Chair of the Department of Intelligent Systems Engineering, and Director of the Data Science program at the School of Informatics and Computing. He previously held positions at Caltech, Syracuse University, and Florida State University after being a postdoc at the Institute for Advanced Study at Princeton, Lawrence Berkeley Laboratory, and Peterhouse College Cambridge. He has supervised the PhD of 68 students and published around 1200 papers in physics and computer science with an index of 70 and over 26000 citations. He currently works on applying computer science, from infrastructure to analytics, in Biology, Pathology, Sensor Clouds, Earthquake and Ice-sheet Science, Image processing, Deep Learning, Manufacturing, Network Science, and Particle Physics. The infrastructure work is built around Software Defined Systems on Clouds and Clusters. The analytics focuses on scalable parallelism.
He is involved in several projects to enhance the capabilities of Minority Serving Institutions. He has experience in online education and its use in MOOCs for areas like Data and Computational Science. He is a Fellow of APS (Physics) and ACM (Computing).