Every student will complete a data mining term project. The requirements for completing this course component are set out below.
The goal for the term project is to complete a meaningful data mining task for an interesting dataset. Which dataset and which task you choose are largely up to you. Look for something interesting that is meaningful to you. Try not to repeat what someone else has already done, although tackling a previously studied problem in a different way is OK.
Completing a data mining project takes time. Please do not leave completion of the various steps to the last minute. This is a recipe for a poor result and thus a poor grade.
Start the various steps early and work on your project deliverables every week.
You will likely find that getting your data, exploring it for mining strategies, and then transforming it into a usable format will take a lot of time. You can start this early, so please do so.
As you work on your project, you may find that you need to adapt your plan. This is fine, and even expected. Please discuss any significant deviation from your plan with me as soon as you become aware of the need to adapt.
The steps in completing your project will be as follows:
Identification of a domain to work in.
Search for, exploration of, and selection of a dataset.
Data exploration.
Preparation and submission of a proposal.
Acceptance of the proposal.
Implementation/execution of the proposal.
Data organization, cleaning, feature engineering, algorithm selection, and repeated rounds of testing, validating, and analysing.
Reporting the results.
You must submit a short (one to two page) proposal for your project on the date set out in the course schedule. This proposal should include the following particulars:
What dataset do you plan to use, and where did you get it or where will you get it? State a date by which you plan to have your data together. Provide some comfort that your dataset is of sufficient size to enable effective mining.
What is the information/insight you want to mine from the dataset? What is the proposed use of that information?
How is that information or insight useful or interesting? What do you hope to learn from your mining effort?
A preliminary timeline for completing your project.
You must submit an interim report, demonstrating material progress on your project, on the date set out in the course schedule. The deliverables for this report are:
A summary of the data you have collected. What format is it in? How many data instances do you have?
Details of the progress you have made towards your project goal. Please note that a report on data collection is not sufficient for your interim report. You need to be working on the data at this point in time.
What data mining strategies have you explored so far, and with what results?
What are your proposed strategies for other or further approaches?
A summary of what is left to do to complete your project and your assessment of how you will do so within the remaining time.
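The data summary asked for in the interim report (format, number of instances) can often be produced with a few lines of code. The following is only a minimal sketch using Python's standard library, assuming your data is in CSV form. The inline sample, its column names, and the helper name summarize_csv are hypothetical and stand in for your own dataset and code.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a real dataset file.
SAMPLE_CSV = """id,city,temperature_c
1,Fredericton,21.5
2,Saint John,
3,Moncton,19.0
4,,18.2
"""


def summarize_csv(handle):
    """Return the row count, column names, and missing-value count per column."""
    reader = csv.DictReader(handle)
    missing = Counter()
    rows = 0
    for row in reader:
        rows += 1
        for name, value in row.items():
            if name is None:
                # Extra fields beyond the header are ignored in this sketch.
                continue
            if value is None or value.strip() == "":
                missing[name] += 1
    columns = list(reader.fieldnames or [])
    return {
        "instances": rows,
        "columns": columns,
        "missing": {name: missing[name] for name in columns},
    }


# For a real file, pass open("your_file.csv", newline="") instead.
summary = summarize_csv(io.StringIO(SAMPLE_CSV))
print(summary)
```

Counts like these only describe the raw data. The interim report still needs to show what you have done with the data beyond collecting it.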
You will prepare and submit a final report on your project, which should focus on summarizing and describing the effort you made and the results of that effort.
Your report should have the following content:
A catchy and descriptive title for your project. Have some fun with this.
Explain the problem you have explored. Give a bit of background to provide context. Summarize and cite previous or related work on your problem.
Explain the motivation for your project. What question have you sought to answer?
Explain the data you have explored. Where did you get it? What did you do with/to the data to make it usable?
What is the key idea or anticipated insight that drove your project? What were you hoping to discover?
Discuss your efforts and results in sufficient detail. Did you prove something? Did you gain a new insight? Did you find a new pattern? What paths did you explore (even if they led nowhere)?
Summarize what you learned. Be specific. Illustrate your observations and insights with meaningful visualizations. A picture can be worth many words.
Also provide a link to your data and code, preferably on GitHub or Google Drive.
View your report as a means to teach the reader about your work. Make it consumable for someone who may not know much about data mining or your topic.
The grade for your project will be determined as follows:
Proposal - 10 points
Interim report, including successful data collection summary - 25 points
Final report - 65 points, distributed as follows:
Selection and description of the problem - 5 points
"Interestingness" and uniqueness of your project - 5 points
Quality of data selection, processing and preparation - 20 points
Effort made and results obtained - 25 points
Effectiveness in articulating your observations and results - 10 points
Try to look for something Canada or New Brunswick related first:
UNB libraries' list of datasets (many from Canada and New Brunswick)
UNB Cybersecurity related datasets
Large Stanford University dataset library
The Google public data database
Open data on Amazon Web Services
A small phishing email dataset
The UCI Machine Learning Repository
https://www.projectpro.io/article/100-machine-learning-datasets-curated-for-you/407
Some sites which make suggestions for student projects:
https://favtutor.com/blogs/data-mining-projects
https://www.interviewbit.com/blog/data-mining-projects/
https://www.lovelycoding.org/data-mining-project-ideas-for-computer-science-student/