Set up a virtual cluster running a distributed NoSQL database. Create Ansible scripts for deploying and managing the database while reusing cloudmesh. Develop new command-line tools with docopt and integrate them into cloudmesh, either as a plugin or as standalone commands named cm-*.
Given millions of publications, how do we determine whether the author of paper 1 named Will Smith is the same person as the author of paper 2 named Will Smith, William Smith, or W. Smith? Author databases are provided either in BibTeX format or as a database that cannot be shared outside this class. You may have to add information from IEEE Xplore, ResearchGate, ISI, or other online databases.
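A first step is usually a cheap name-compatibility filter. It can tell us that "Will Smith", "William Smith" and "W. Smith" *could* be the same person, but not that they are. The following is a minimal sketch, assuming names arrive as plain strings in either "Given Last" or BibTeX-style "Last, Given" form. The nickname table and all function names are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical nickname table; a real system would need a much larger,
# curated list of name variants.
NICKNAMES = {
    "will": "william",
    "bill": "william",
    "bob": "robert",
    "rob": "robert",
}


def parse_name(raw):
    """Split a name into (given-name tokens, last name).

    Handles both "Given Last" and the BibTeX-style "Last, Given" forms.
    """
    raw = raw.strip()
    if "," in raw:
        last, given = [part.strip() for part in raw.split(",", 1)]
    else:
        parts = raw.split()
        last, given = parts[-1], " ".join(parts[:-1])
    # Periods, spaces and hyphens separate tokens, e.g. "W." -> "w".
    tokens = [t for t in re.split(r"[.\s\-]+", given.lower()) if t]
    # Map known nicknames to a canonical form, e.g. "will" -> "william".
    return [NICKNAMES.get(t, t) for t in tokens], last.lower()


def token_compatible(a, b):
    """A full name matches itself; a single initial matches any name starting with it."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def names_compatible(x, y):
    """Return True if the two name strings could refer to the same person."""
    given_x, last_x = parse_name(x)
    given_y, last_y = parse_name(y)
    if last_x != last_y:
        return False
    # Only compare the given-name tokens that both strings provide.
    return all(token_compatible(a, b) for a, b in zip(given_x, given_y))
```

Note that "W. Smith" is also compatible with "Wendy Smith", so compatible pairs still have to be resolved with further evidence such as coauthors, venues, or affiliations.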
Identify further issues and discuss solutions to them, for example when an author's name changes or when the author moves to a different institution.
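A changed name defeats string matching, and a new institution defeats affiliation matching. One possible remedy is to give more weight to evidence that tends to persist across such changes, such as the coauthor network and research topics. The following is a minimal sketch, assuming each paper's author is represented by a hypothetical `AuthorRecord` with coauthor, affiliation, and keyword sets. The weights are arbitrary placeholders that would need tuning on labeled data.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorRecord:
    """Hypothetical per-paper view of an author; field names are illustrative."""
    name: str
    coauthors: set = field(default_factory=set)
    affiliations: set = field(default_factory=set)
    keywords: set = field(default_factory=set)


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_author_score(r1, r2, weights=(0.6, 0.2, 0.2)):
    """Weighted evidence score in [0, 1].

    The name string itself is deliberately ignored, so a name change does
    not lower the score; affiliation carries a small weight so that a move
    to another institution lowers the score only slightly.
    """
    w_co, w_aff, w_kw = weights
    return (w_co * jaccard(r1.coauthors, r2.coauthors)
            + w_aff * jaccard(r1.affiliations, r2.affiliations)
            + w_kw * jaccard(r1.keywords, r2.keywords))
```

With these placeholder weights, an author who changed institution but kept two of four distinct coauthors and two of three keywords still scores about 0.43. An unrelated record sharing nothing scores 0.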
Conduct a comprehensive literature review.
Some ideas:
There are also some screenshots available: