Ndcharles Nweke

Data | Operations | Research | Growth

An engineering graduate with experience in data, operations, research, and growth. My interest is in real-life applications of data to business and everyday problems.

Let's Connect
LinkedIn | Twitter | Blog | View my Resume

Exploratory Data Analysis on Titanic Dataset


Exploratory data analysis (EDA) is a pillar of data science and a necessary step in any project, regardless of the type of data you are working with. It gives us a sense of what additional work is needed to quantify and extract insights from the data.

In this project I went through the general data analysis process using pandas, numpy, seaborn and matplotlib.

Method

The data, provided by Kaggle, contains information on passengers aboard the RMS Titanic, which unfortunately was shipwrecked. The data description is given below:

The first step after importing the data was to preview it: checking the columns, previewing the head of the data, and taking a five-number statistical summary.
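The preview step can be sketched roughly as follows. This is a minimal illustration rather than the project's actual notebook: the small `df` frame is a hypothetical stand-in with a few Titanic-style columns and made-up values, where the real project would load the Kaggle file instead.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle data (illustrative values only).
# In the real project this would be loaded from the downloaded CSV file.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.28, 7.92, 13.0],
})

# 1. Check the columns.
columns = list(df.columns)

# 2. Preview the first rows (head() returns up to 5 rows by default).
preview = df.head()

# 3. Five-number summary of the numeric columns: describe() also reports
#    count, mean and std, so keep only min, quartiles, median and max.
five_number = df.describe().loc[["min", "25%", "50%", "75%", "max"]]

print(columns)
print(preview)
print(five_number)
```

Note that `describe()` summarises only numeric columns by default, so `Sex` does not appear in `five_number`.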

Univariate Analysis

There are 9 categorical variables and 3 numerical variables. A bar plot function was created to plot the categorical variables and a histogram function for the numerical variables.
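As a rough sketch of what such helper functions might look like: the names `bar_plot` and `plot_hist`, their parameters, and the `sample` frame are all assumptions made for illustration, not the project's actual code. The sketch uses plain matplotlib; seaborn could be swapped in for the drawing calls.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def bar_plot(df, column):
    """Plot the frequency of each category in `column` (hypothetical helper)."""
    counts = df[column].value_counts()
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.bar(counts.index.astype(str), counts.values)
    ax.set_ylabel("Frequency")
    ax.set_title(column)
    return counts, fig


def plot_hist(df, column, bins=20):
    """Plot the distribution of a numerical `column` (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.hist(df[column].dropna(), bins=bins)  # drop NaNs before binning
    ax.set_xlabel(column)
    ax.set_ylabel("Frequency")
    ax.set_title(column)
    return fig


# Illustrative data only; the real project would pass the Titanic DataFrame.
sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
})

sex_counts, sex_fig = bar_plot(sample, "Sex")
age_fig = plot_hist(sample, "Age", bins=5)
plt.close("all")
```

Wrapping each plot in a function means the same call can be looped over a list of categorical or numerical column names instead of repeating the plotting code.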

[Figure: bar_plot]

[Figure: histogram]


Bivariate Analysis

The aim here is to dig deeper into the dataset and answer questions such as:

[Figure: bivariate]

From the above figure (left to right) we can deduce that:

Missing Values

A quick check for missing values shows that Age is missing 177 values, Cabin is missing 687, and Embarked is missing only 2. The exact counts were obtained using .isnull().sum().
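A minimal sketch of the missing-value check, using a hypothetical four-row frame rather than the real dataset (so the counts here are not the 177, 687 and 2 reported for the Titanic data):

```python
import numpy as np
import pandas as pd

# Illustrative frame with gaps in the same columns as the Titanic data.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", np.nan, "S"],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

# isnull() flags each missing cell; sum() counts the flags per column.
missing = df.isnull().sum()

# Keep only columns that actually have gaps, largest first.
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
```

Filtering out the zero counts keeps the report focused on the columns that need treatment.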

[Figure: missing_values]

The following approaches were taken to treat the missing values:

What we have done

[View project on GitHub]


[Back to Portfolio]


[View my GitHub profile] | [Read the Blog]