This second Lab Assignment gives you practice with basic cleaning and exploration of a dataset.
In the early part of the 20th century, a large ship hit an iceberg and sank in short order, with the loss of most of its passengers.
You have been asked to help explain why some passengers survived while others did not.
You are provided with a dataset at this link.
Load this file into a pandas data frame in a Jupyter Notebook.
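As a rough sketch of the loading step, assuming the file is a CSV: the in-memory text below is a hypothetical stand-in for the real download, and its column names other than ticket_class are made up for illustration. In your notebook you would pass the actual path or URL to pd.read_csv instead.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the downloaded file (assumed to be CSV).
# In the notebook, replace sample_csv with the real path or URL.
sample_csv = StringIO(
    "passenger_id,ticket_class,age,survived\n"
    "1,3,22,0\n"
    "2,1,38,1\n"
    "3,3,,1\n"
)

df = pd.read_csv(sample_csv)

# Quick sanity checks: size of the frame and the first few rows.
print(df.shape)
print(df.head())
```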
Explore the dataset, examine each of the data columns, and identify any columns with potential data entry or other errors. List the issues you identify in a Markdown cell.
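One possible way to start the exploration is sketched below. The toy data frame and its column names (age, gender, fare) are assumptions made for illustration and may not match the real dataset. The calls shown are standard pandas tools for spotting missing values, implausible numbers, and inconsistent category labels.

```python
import numpy as np
import pandas as pd

# Toy data with deliberately planted problems; column names are hypothetical.
df = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 350.0],
    "gender": ["male", "Female", "female", "M"],
    "fare": [7.25, 71.28, -8.05, 53.10],
})

# Column types and non-null counts.
df.info()

# Number of missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Summary statistics: look for impossible minimums or maximums.
print(df.describe())

# Distinct labels in a text column: look for spelling or case variants.
print(df["gender"].value_counts(dropna=False))
```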
Assume your computer is limited in what it can handle, so you need to delete one column from the dataset. Choose the column based primarily on the quality of its data, as well as its likely relevance to your analysis. Identify this column and delete it. Explain your decision in a Markdown cell.
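A minimal sketch of how the decision might be supported with numbers, assuming the share of missing values is one of your quality criteria. The column named notes is hypothetical and only illustrates the mechanics; which column to drop in the real dataset is your call.

```python
import numpy as np
import pandas as pd

# Toy data; "notes" is a hypothetical, mostly empty column.
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0],
    "notes": [np.nan, "C85", np.nan, np.nan],
    "fare": [7.25, 71.28, 7.92, 53.10],
})

# Fraction of missing values per column, worst first.
missing_share = df.isna().mean().sort_values(ascending=False)
print(missing_share)

# Drop the chosen column by name.
df = df.drop(columns=["notes"])
print(df.columns.tolist())
```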
Now review the remaining columns, identify any data inconsistencies, and correct them. For each correction, explain your approach and rationale in the code comments or in a header Markdown cell.
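Two common kinds of correction are sketched below on toy data: normalizing text labels, and replacing implausible numbers with missing values. The column names and the 0 to 120 age range are assumptions for illustration, not rules taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy data with inconsistent labels and impossible ages (hypothetical columns).
df = pd.DataFrame({
    "gender": [" male", "Female", "FEMALE ", "male"],
    "age": [22.0, 38.0, 350.0, -1.0],
})

# Normalize stray whitespace and mixed case so each category has one spelling.
df["gender"] = df["gender"].str.strip().str.lower()

# Mark ages outside an assumed plausible range as missing instead of guessing.
implausible = ~df["age"].between(0, 120)
df.loc[implausible, "age"] = np.nan

print(df)
```

Setting bad values to missing keeps the row available for other columns; whether to impute or drop those rows afterwards is a separate decision worth explaining.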
Convert the ticket_class column into a one-hot encoded representation with appropriate names. Once done, delete the ticket_class column.
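One way this could be done is with pandas get_dummies, sketched here on toy data. The values 1, 2, 3 for ticket_class and the prefix used for the new column names are assumptions; choose names that fit the actual values in your data.

```python
import pandas as pd

# Toy data; the ticket_class values are assumed to be 1, 2, 3.
df = pd.DataFrame({"ticket_class": [1, 3, 2, 3], "age": [38, 22, 26, 35]})

# One indicator column per distinct class, stored as 0/1 integers.
dummies = pd.get_dummies(df["ticket_class"], prefix="ticket_class", dtype=int)

# Attach the indicator columns, then remove the original column.
df = pd.concat([df, dummies], axis=1).drop(columns=["ticket_class"])

print(df.columns.tolist())
```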
We need a new column called likely_parent, populated with a binary value using appropriate logic drawn from the other columns. Create a function to perform this logic and use a lambda function to apply it to each row of your dataset to add the new column. Explain your approach in a Markdown cell.
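The function-plus-lambda pattern can be sketched as follows. The rule used here (an adult travelling with at least one parent or child) and the column names age and parents_children are assumptions chosen only to show the mechanics; your own logic and columns may differ.

```python
import pandas as pd


def likely_parent(row):
    """Return 1 for an adult travelling with a parent or child, else 0.

    This rule is an illustrative assumption, not the required answer.
    """
    is_adult = row["age"] >= 18           # a missing age compares as False
    has_family = row["parents_children"] > 0
    return int(is_adult and has_family)


# Toy data with hypothetical column names.
df = pd.DataFrame({
    "age": [40.0, 8.0, 30.0, None],
    "parents_children": [2, 2, 0, 1],
})

# axis=1 passes each row to the lambda, which delegates to the function.
df["likely_parent"] = df.apply(lambda row: likely_parent(row), axis=1)

print(df)
```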
Now that you have organized the dataset to a degree, explore it a bit by comparing correlations between the various columns (features). Learn a bit about the very useful pandas corr() function here. You can visualize correlations with the seaborn package. Example here. Try to identify patterns you might be able to model with this dataset (or one with more data). Identify two potential patterns, and explain your rationale.
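A short sketch of the correlation step on toy data with hypothetical columns. The corr() call computes pairwise Pearson correlations by default. The helper plot_correlation_heatmap is an illustrative name, not part of pandas or seaborn, and it assumes seaborn and matplotlib are installed.

```python
import pandas as pd

# Toy data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0],
    "fare": [7.25, 71.28, 53.10, 8.05, 30.00, 7.90],
    "age": [22, 38, 35, 54, 4, 30],
})

# Pairwise Pearson correlations between the numeric columns.
corr = df.corr(numeric_only=True)
print(corr.round(2))


def plot_correlation_heatmap(corr_matrix):
    """Draw an annotated heatmap of a correlation matrix (hypothetical helper)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_title("Pairwise Pearson correlations")
    plt.show()
    return ax

# In a notebook cell, call: plot_correlation_heatmap(corr)
```

Keep in mind that a correlation only measures linear association between two columns, so a weak value does not rule out a pattern that a model could still pick up.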
Download your Jupyter Notebook and send it to me as an attachment to an email with the subject line: CS 4403 - Lab Assignment 2.
Submit this no later than 23:59 on Sunday, January 25, 2026.