This course has concluded and all grades have been posted.
Reminder that we have our last test on March 25th, in class. The test is made up of 6 Fill in the Blank, 23 Multiple Choice, 4 True or False, and 4 Short Answer questions.
Lab Assignment 5 is now available.
The Checklist for Test 2 is now complete. The test will cover everything from slide deck 10 to 16 (and associated materials per the course outline) inclusive.
Lab Assignment 4 is now available.
As discussed, the due date for the project interim report is now 10 March at 23:59. By popular request, I have also added a few more days to the deadline for Lab 3. Happy competing!
Test 1 and Assignment 2 have now been marked. You should all have received an email with your score. The histogram for the test scores is here.
I have finalized Test 1, and it is composed as follows: Fill in the Blank (5 points), Multiple Choice (31 points), True or False (4 points), and Short Answer (10 points). The test is closed book.
Lab Assignment 3 is now also available.
The Test Checklist for test 1 is now available. I may tweak it slightly, depending on how much we cover in tomorrow's class.
As discussed in class, the Proposal delivery due date is now February 11. Please do note that you do need to be working on your project by then, given the due date for your Interim Report of February 27.
Lab Assignment 2 is now available.
I have added a link to the first chapter of the International Handbook of AI Law to the Resources for this course. You should find it a readable overview of much of what we cover in this course. You may recognize the author. Also note that there is no lab this week.
Due to an intervening personal event, I am not able to physically be in the lab this afternoon. I will briefly cover the lab assignment in class today. If you do have issues or questions with the lab, please reach out to me on Teams or by email. To compensate for my absence, I have extended the due date by one day. My apologies.
Lab Assignment 1 is now available.
There will be no lab on Thursday January 9th.
Please read this page with details applicable to all my courses.
Data mining, or knowledge discovery, is an interdisciplinary area of computer science with the goal of extracting new knowledge and insights from big and complex data sets.
This course begins with an overview of the data mining landscape, an introduction to similarity, and then introduces the knowledge discovery workflow, including data processing, cleaning, integration, and transformation, towards pattern recognition methodologies using various statistical and machine learning techniques. Additional theoretical dimensions will be introduced as we work our way through a number of different kinds of data mining problems.
(These are at a high level, and put together with Bloom's Taxonomy in mind)
Understand the core statistical and logical principles which guide knowledge discovery
Describe and apply various approaches to identifying similarity
Describe and apply the data mining process and demonstrate proficiency in preprocessing and visualizing data
Explain and apply the various data mining algorithms used for classification, clustering, and regression
Use Python to implement data mining algorithms in real world datasets
Evaluate the performance of data mining models
Explore applications of data mining in various domains
Communicate data mining findings and insights effectively
CS or INFO 1103, CS 2704, and (STAT 2593 or STAT 2793).
Lectures will be in-person as a default, but we may mix it up with an online lecture on occasion, or when warranted.
Class meetings will be on Tuesday and Thursday from 11:00 to 12:20, in Oland Hall 210.
A number of labs will be scheduled to take place on certain Thursdays from 14:30 to 15:20 in KC Irving Hall 101.
Please consult the CS 4403 Course Schedule page for details of the various meeting and deliverable deadline dates for the course. This page will be updated as we go along.
For the material to be covered during our meetings, I will primarily refer to various portions from the following free online books:
"MMD": Mining Massive Datasets (Jure Leskovec, Anand Rajaraman, Jeff Ullman), Stanford U.
"FDS": Foundations of Data Science (Avrim Blum, John Hopcroft, and Ravindran Kannan), Cornell U.
"DMML": Data Mining and Machine Learning: Fundamental Concepts and Algorithms (Mohammed J. Zaki, Wagner Meira, Jr.), Cambridge U.
For some select topics I may also provide extracts from other books or link to good online resources. These will be linked to on the Course Schedule page.
For some of the topics, the following additional resources may also help you:
Introduction to Statistical Learning by Gareth James et al. Freely available online.
The Art of Statistics by David Spiegelhalter. A highly readable and contextualized examination of how to do statistics right (and wrong). Has several chapters dedicated to the key statistical aspects of robust machine learning.
The Hundred Page Machine Learning Book by Andriy Burkov. A great algorithm focused reference resource for machine learning. I have a printed copy which I am happy to lend out.
You may also find Chapter 1 of the International Handbook of AI Law useful as a readable overview of what we generally cover in this course.
Please use Python 3.x for the assignments/labs and project.
For most of the in-class demos and labs, I will use Jupyter Notebooks, a great free platform for editing and running Python. I recommend it for your own use as well.
There are a number of ways to implement/access Jupyter Notebooks. I recommend one of the following:
Anaconda Navigator (available for Win, MacOS and Linux). It includes everything you will need to run Jupyter Notebooks on your own machine: https://docs.anaconda.com/navigator/install/
Google Colab (cloud based). You will need a Google account: https://colab.research.google.com/
PyCharm (available for Win, MacOS and Linux). Has a bit of a steeper learning curve, but with rewards.
For notebook storage, you can use GitHub or, if you are using Colab, Google Drive.
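Whichever option you choose, it may be worth confirming that the notebook is actually running Python 3.x before starting a lab. The following is a minimal sketch of such a check, not an official course script; the helper name `is_python3` is hypothetical and introduced here only for illustration.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True if the given version tuple belongs to Python 3.x.

    `is_python3` is a hypothetical helper for illustration; the first
    element of the version tuple is the major version number.
    """
    return version_info[0] == 3


# Show the interpreter version the notebook (or script) is running on.
print("Python version:", sys.version.split()[0])
print("Python 3.x:", is_python3())
```

Pasting this into the first cell of a notebook in Anaconda, Colab, or PyCharm should show which interpreter that environment is using.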
Lab based assignments (5): 25 points
In-class tests (2): 50 points
Term project: 25 points
Note that there is no final exam.
The Term project report will be due the last day of classes.
As noted above, each lab will have an accompanying assignment, which will usually be due a day or two after the lab. Details of each lab will be provided a few days before the lab's date.
Please consult the CS 4403 Course Schedule page for the due dates of assignments. Assignments must be submitted by email to the instructor, no later than 11:59 p.m. on the day they are due. Please ensure you use a descriptive subject line for the email (for example: CS 4403 Assignment 1).
All assignments must be done individually.
There will be two in-class tests. Details of how the tests are put together will be released as we get a little further along in the course. I will provide a checklist of things to know for each test.
Details and requirements are set out on the Term Project page.