We are witnessing an explosive growth in the amount of data generated by scientific research, businesses, governments, social networks, etc. This course examines the techniques for distilling useful information from this massive flood of data. Topics include information retrieval, web search, data mining, as well as I/O-efficient, parallel, and distributed realization of large-scale data analysis tasks. Besides classic textbook materials on these topics, the course will also examine recent research developments, such as streaming data, social network analysis, and data-centric parallel programming languages (e.g., Google's Sawzall and Yahoo!'s Pig).
The course is designed to not overlap with CPS 216 in content.
Prerequisites: A good understanding of algorithms, data structures, and programming. No background in databases is assumed.
In addition to lectures, the course will have some seminar-style class meetings. Students will read recent research papers, present them, and lead discussions of them. There will also be an open-ended course project.
There will not be any exams or homework (except ungraded reading and presentation assignments). The grade will be based on class participation and the course project only.
Instructor: Jun Yang
Time and Place
1:30pm-4pm on Tuesdays; North 306
The above meeting schedule began 01/14. On 01/13 the class met 2:50pm-4:05pm. No class meeting on 01/15.
No textbook is required. There will be a reading list drawn from recent research literature. The list will be posted and updated regularly on the course Web site.
Web and Email
Most of the course materials, including the tentative schedule, lecture notes, reading list, etc., will be available through the course Web page (http://www.cs.duke.edu/courses/spring09/cps296.3/).
The email address firstname.lastname@example.org reaches everybody in the class as well as the instructor. Only announcements, questions/answers, and comments of general interest should be sent to this address. Specific questions should be directed to the instructor. Please check your email regularly, as important announcements and information will be sent via email.
Grading is done on an absolute scale (in other words, there is no curve). Anyone earning 90% or more of the total number of points available will receive a grade in the A range; 80% or more guarantees a grade in the B range; 70% or more guarantees a grade in the C range; 60% or more guarantees a grade in the D range.
Under the Duke Honor Code, you are expected to submit your own work in this course. On many occasions, it is useful to ask others for hints or help, or to search the Web for related resources (e.g., slides from the original authors of a paper you are presenting). Such activities are acceptable, but you must explicitly indicate any assistance you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code. In any event, you are responsible for understanding and being able to explain on your own all materials that you submit and present. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels.
Last updated Tue Jan 13 21:29:49 EST 2009