The amount of data generated by businesses, scientific research, the Web, and social
networks is growing at a very fast rate. This course will cover the
algorithms and database techniques required to extract useful
information from this flood of data. Data mining, which is the
automatic discovery of interesting patterns and relationships in data,
is a central focus of the course. Data mining topics covered
include association discovery, clustering, classification, and anomaly
detection. Special emphasis will be given to data warehousing
techniques for processing extremely large datasets (e.g., many terabytes).
The course also covers Web mining, including the analysis of Web pages
and links (as done by search engines like Google) and the analysis of large
social networks (like Facebook). Programming projects are required.
Class meets 2:50pm-4:05pm on Tuesdays and Thursdays in D106 LSRC.
The textbook for this class is:
Data Mining: Concepts and Techniques, 2nd ed., by Jiawei Han and Micheline Kamber.
Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar,
will be used as a reference.
Email: shivnath at cs dot my_univ. Replace my_univ with duke.edu.
Office: D338 LSRC, Phone: 919-660-6579
Office hours: After class on Tuesdays and Thursdays,
or by appointment. It is a good idea to let the
instructor know ahead of time, either in class or via email, that you will be coming
to office hours. Office hours will be held in the instructor's office (D338 LSRC).
There will be 3-4 written homework assignments.
Late homeworks will not be accepted unless
there is a documented excuse from a physician or dean.
There will be programming assignments (done
individually) and a longer
course project (done either individually or in groups
of at most two). Details will be presented in class.
Both the midterm and final exams are open-book and open-notes.
What are the prerequisites for the course?
A good understanding of algorithms, data structures,
and programming. CPS 100 or an equivalent course is sufficient.
If you are unsure, feel free to contact the instructor.
What is the course syllabus?
This course is new, so the syllabus will evolve as the
class progresses. Roughly 50% of the material that we cover
will overlap with related classes at other universities.
Is the course mainly about
learning the theory in depth (like CPS 130), or is it
more about learning basic concepts and then applying them
in projects (like CPS 116)?
The latter, similar to
CPS 116 taught by Prof. Jun Yang.
How will the class be graded?
The grade is divided among homeworks, exams, and programming projects roughly as follows:
15% for homeworks, 40% for projects, 20% for the midterm, and 25% for the final exam.
There will be a semester-long course project that involves
programming. The project may be split into smaller modules
for ease of grading.
What programming languages will I have to know?
You should know at least one programming language well enough
to complete the semester-long course project. Java, C++, a scripting
language such as Perl or Python, or Matlab will suffice.
What is the level of effort required?
Similar to that of CPS 116 taught by Prof. Jun Yang.
Under the Duke Honor
Code, you are expected to submit your own work in this course,
including homeworks, projects, and exams. On many occasions when
working on homeworks and projects, it is useful to ask others (the
instructor or other students) for hints or debugging help,
or to talk generally about the written problems or programming
strategies. Such activity is both acceptable and encouraged, but you
must indicate in your submission any assistance you received. Any
assistance received that is not given proper citation will be
considered a violation of the Honor Code. In any event, you are
responsible for understanding and being able to explain on your own
all written and programming solutions that you submit. The course
staff will aggressively pursue all suspected cases of Honor Code
violations, and they will be handled through official University