Every student will complete a data mining term project, either individually or as a group. The requirements for completing this course component are set out below.
The goal for the term project is to complete a meaningful data mining task for an interesting dataset. Which dataset and which task you choose is largely up to you. Try to look for something interesting that is meaningful to you. Try not to repeat what someone else has already done. Doing something that has been done before, but in a different way, is OK.
Group Work Option: You may complete this project individually or in groups of 2-3 students. Group projects should demonstrate proportionally greater scope and depth than individual projects (see details below).
Completing a data mining project takes time. Please do not leave completion of the various steps to the last minute. This is a recipe for a poor result and thus a poor grade.
Start the various steps early and work on your project deliverables every week.
You will likely find that getting your data, exploring it for knowledge discovery strategies, and then transforming it into a usable format will take a lot of time. You can start this early, so please do so.
As you work on your project, you may find that you need to adapt your plan. This is fine, and even expected. Please discuss any significant deviation from your plan with me as soon as you become aware of the need to adapt.
For Groups:
Establish clear roles and responsibilities early.
You must use version control (GitHub) to track all contributions from the project group members. This is not required for individual projects.
Each group member must contribute to multiple aspects of the project: ideation, data work, analysis/modeling, reporting, etc.
All group members are expected to contribute substantively to all phases of the project.
The steps in completing your project will be as follows:
Identification of a domain to work in.
Formation of groups (if applicable) - submit a Project Group Proposal by 12 February 2026
Search for, exploration of, and selection of a dataset.
Data exploration.
Preparation and submission of a proposal.
Acceptance of the proposal.
Implementation/execution of the proposal.
Data organization, cleaning, feature engineering, algorithm selection and testing, validating, testing, validating, analysing, etc.
Reporting the results.
If you choose to work in a group:
Groups may consist of 2-3 students (no more, no less).
All documents, code and analysis must be tracked in a shared GitHub repository with meaningful commits from all members throughout the project timeline. You must provide me access to this repo.
Submit your Project Group Proposal by 12 February 2026. Your proposal must set out:
the names of the group members,
the clear roles and responsibilities for each member,
what communication tool or platform you will use,
a link to your shared GitHub repo, with read access provided to me (my GitHub account is jvanderlaan-unb), and
details of how you will handle disagreements.
All group members must agree to work together. You can either provide a signed piece of paper or send individual emails to me to that effect.
Group changes after the proposal deadline will only be permitted in exceptional circumstances.
Groups are expected to meet regularly (at least weekly) and maintain clear communication.
2-person groups: Should explore 2-3 approaches, or work with larger/more complex datasets requiring more extensive preprocessing.
3-person groups: Should explore 3-4 approaches, conduct comparative analysis, or tackle more ambitious problems requiring significant feature engineering.
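Comparing approaches generally means evaluating each one in the same way on the same data. The following is a minimal sketch of that idea, assuming a small in-memory classification dataset and using only the Python standard library. The toy data, the two classifiers, and all function names are hypothetical and for illustration only; they are not a required method, and in practice you would likely rely on an established library.

```python
from collections import Counter


def majority_classifier(train_X, train_y):
    """Baseline: always predict the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbour_classifier(train_X, train_y):
    """1-nearest-neighbour using squared Euclidean distance."""
    def predict(x):
        distances = [
            sum((a - b) ** 2 for a, b in zip(x, row)) for row in train_X
        ]
        return train_y[distances.index(min(distances))]
    return predict


def cross_validated_accuracy(make_classifier, X, y, k=3):
    """Mean accuracy over k folds; fold i holds out every k-th row from i."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(X), k))
        train_X = [row for i, row in enumerate(X) if i not in test_idx]
        train_y = [lab for i, lab in enumerate(y) if i not in test_idx]
        predict = make_classifier(train_X, train_y)
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


# Hypothetical toy data: two well-separated clusters (an assumption).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.3]]
y = ["low", "low", "low", "high", "high", "high"]

approaches = {
    "majority baseline": majority_classifier,
    "1-nearest neighbour": nearest_neighbour_classifier,
}
results = {
    name: cross_validated_accuracy(make, X, y)
    for name, make in approaches.items()
}
for name, accuracy in results.items():
    print(f"{name}: {accuracy:.2f}")
```

Including a trivial baseline alongside each real approach makes it easier to judge whether a more complex method is actually adding value.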
You must submit a short (one to two page) proposal for your project on the date set out in the course schedule. This proposal should include the following particulars:
What dataset do you plan to use, and where did you get it (or where will you get it)? State a date by which you plan to have your data together. Provide some comfort that your dataset is of sufficient size to enable effective mining.
What is the information/insight you want to mine from the dataset? What is the proposed use of that information?
How is that information or insight useful or interesting? What do you hope to learn from your mining effort?
A preliminary timeline for completing your project.
You must submit an interim report, demonstrating material progress on your project, on the date set out in the course schedule. The deliverables for this report are:
A summary of the data you have collected. What format is it in? How many data instances do you have?
Details of the progress you have made towards your project goal. Please note that a report on data collection is not sufficient for your interim report. You need to be working on the data at this point in time.
What data mining strategies have you explored so far, and with what results?
What are your proposed strategies for other or further approaches?
A summary of what is left to do to complete your project and your assessment of how you will do so within the remaining time.
For groups: Provide detail of each of the group members' contributions so far.
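The data summary requested for the interim report (format, number of instances) can often be produced with a few lines of code. The following is a minimal sketch, assuming your data fits in memory as a list of dictionaries, for example rows read with `csv.DictReader`. The function name `summarize_rows` and the sample rows are hypothetical and stand in for your own dataset.

```python
from collections import Counter


def summarize_rows(rows):
    """Return instance count, column names, and missing-value counts.

    `rows` is assumed to be a list of dicts, e.g. list(csv.DictReader(f)).
    A value counts as missing when it is None or a blank string.
    """
    columns = []
    missing = Counter()
    for row in rows:
        for name, value in row.items():
            if name not in columns:
                columns.append(name)
            if value is None or str(value).strip() == "":
                missing[name] += 1
    return {
        "instances": len(rows),
        "columns": columns,
        "missing": {name: missing[name] for name in columns},
    }


# Hypothetical sample rows, for illustration only.
sample = [
    {"station": "A", "temp_c": "4.5"},
    {"station": "B", "temp_c": ""},
    {"station": "C", "temp_c": "6.1"},
]
summary = summarize_rows(sample)
print(summary)
```

Reporting missing values per column alongside the instance count gives an early indication of how much cleaning the data will need.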
After we have concluded our 5 labs and each student or group has provided an interim report, the remaining lab time will be committed to one-on-one sessions with every student or group to discuss the progress of their project. This will help ensure that you deliver the best report you can. Details will be provided as we get a bit further along in the course.
Every individual or group will give a 5- to 10-minute presentation about their project at the end of the term. For group presentations, all members of the group must be present.
Each student or group will prepare and submit a final report of their project, which should focus on summarizing and describing efforts made, as well as the results of that effort.
The report must have the following content:
A catchy and descriptive title for your project. Have some fun with this.
For groups: Clearly identify all group members on the title page.
Explain the problem you have explored. Give a bit of background to provide context. Summarize and cite previous or related work on your problem.
Explain the motivation for your project. What question have you sought to answer?
Explain the data you have explored. Where did you get it? What did you do with/to the data to make it usable?
What is the key idea or anticipated insight that drove your project? What were you hoping to discover?
Discuss your efforts and results in sufficient detail. Did you prove something? Did you gain a new insight? Did you find a new pattern? What paths did you explore (even if they led nowhere)?
Summarize what you learned. Be specific. Illustrate your observations and insights with meaningful visualizations. A picture can be worth many words.
For groups only: Include an appendix with a contribution statement. Each member should write 1-2 paragraphs describing their specific contributions to the project (data collection, preprocessing, algorithm implementation, analysis, writing, etc.).
Also provide a link to your data and code.
View your report as a means to teach the reader about your work. Make it consumable for someone who may not know much about data mining or your topic.
The grade for your project will be determined as follows:
Group formation submission (groups only) - 0 points (required, but not graded)
Proposal - 10 points
Interim report, including successful data collection summary - 15 points
Presentation - 10 points
Final report - 65 points, distributed as follows:
Description of the problem and summary of background research - 5 points
"Interestingness" and uniqueness of your project - 5 points
Quality of data selection, processing and preparation - 20 points
Effort made and results obtained - 25 points
Effectiveness of articulation of your observations and results - 10 points
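The point breakdown above can be checked with a little arithmetic: the four graded components add up to 100 points, and the final report criteria add up to 65. The sketch below encodes the maximums listed above and totals a set of scores. The dictionary keys, the function name `project_total`, and the example scores are hypothetical and only illustrate the calculation.

```python
# Maximum points per graded component, taken from the breakdown above.
MAX_POINTS = {
    "proposal": 10,
    "interim report": 15,
    "presentation": 10,
    "final report": 65,
}

# Maximum points per final report criterion, taken from the breakdown above.
FINAL_REPORT_MAX = {
    "problem and background": 5,
    "interestingness and uniqueness": 5,
    "data selection, processing and preparation": 20,
    "effort and results": 25,
    "articulation of observations and results": 10,
}


def project_total(earned):
    """Sum earned points after checking each score against its maximum."""
    for name, points in earned.items():
        if not 0 <= points <= MAX_POINTS[name]:
            raise ValueError(f"{name}: {points} is outside 0..{MAX_POINTS[name]}")
    return sum(earned.values())


# Hypothetical scores, for illustration only.
example_scores = {
    "proposal": 8,
    "interim report": 12,
    "presentation": 9,
    "final report": 50,
}
total = project_total(example_scores)
print(f"{total} out of {sum(MAX_POINTS.values())}")
```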
For groups: All members will receive the same grade unless peer evaluations and contribution statements reveal significant disparities in effort, in which case individual grades may be adjusted up or down by up to 15%. I reserve the right to meet individually with any group member to discuss their understanding of the project. If a student cannot adequately explain the work bearing their name, their individual grade may be reduced regardless of peer evaluations.
Academic Integrity Note: All work submitted must be the original work of the group/individual submitting it. Proper citation of sources is required. Use of AI tools (ChatGPT, Copilot, etc.) should be disclosed and used appropriately as aids, not replacements for your own analysis and thinking.
Try to look for something Canada or New Brunswick related first:
UNB libraries' list of datasets (many from Canada and New Brunswick)
UNB Cybersecurity related datasets
Large Stanford University dataset library
The Google public data database
Open data on Amazon Web Services
A small phishing email dataset
The UCI Machine Learning Repository
https://www.projectpro.io/article/100-machine-learning-datasets-curated-for-you/407
Some sites which make suggestions for student projects:
https://favtutor.com/blogs/data-mining-projects
https://www.interviewbit.com/blog/data-mining-projects/
https://www.lovelycoding.org/data-mining-project-ideas-for-computer-science-student/
There are many others. Be creative!