net work

Titanic Dataset Analysis

Introduction

The Titanic dataset, a well-known dataset in the data science and machine learning communities, consists of information about the passengers aboard the RMS Titanic.

RMS Titanic was a British ocean liner that sank on 15 April 1912 after striking an iceberg on her maiden voyage from Southampton, England, to New York City, United States. Of the estimated 2,224 passengers and crew aboard, about 1,500 died (estimates vary), making it one of the deadliest peacetime sinkings of a single ship up to that time. Titanic, operated by the White Star Line, carried some of the wealthiest people in the world, as well as hundreds of emigrants from the British Isles, Scandinavia, and elsewhere in Europe who were seeking a new life in the United States and Canada. The disaster drew public attention, spurred major changes in maritime safety regulations, and left a lasting legacy in popular culture.

This dataset includes various features such as passenger class, name, gender, age, number of siblings/spouses aboard, number of parents/children aboard, ticket number, fare, cabin, and port of embarkation. The primary objective of this analysis is to extract initial insights regarding the factors influencing passenger survival and identify patterns, trends, or anomalies within the data.
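A first pass for spotting anomalies is usually a count of missing values per column. The sketch below is only illustrative: it assumes pandas and the column names of the widely used Kaggle `train.csv` (the post does not show its file), and the small `sample` frame and the helper `missing_summary` are made up for the example.

```python
import pandas as pd

# Tiny made-up sample that borrows the Kaggle Titanic column names
# (an assumption); the values are illustrative, not real passenger records.
sample = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 0],
        "Pclass": [3, 1, 3, 2],
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, 38.0, None, 35.0],
        "Fare": [7.25, 71.28, 7.92, 13.0],
        "Cabin": [None, "C85", None, None],
        "Embarked": ["S", "C", "S", "S"],
    }
)


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and share of missing values for each column."""
    missing = df.isna().sum()
    summary = pd.DataFrame({"missing": missing, "share": missing / len(df)})
    # Columns with the most gaps come first.
    return summary.sort_values("missing", ascending=False)


summary = missing_summary(sample)
print(summary)
```

With the real data, `sample` would be replaced by something like `pd.read_csv("train.csv")`, where the path is hypothetical.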

This blog post fulfils one of the acceptance criteria for the HNG Internship. For more information about the HNG Internship program, please visit HNG Internship and HNG Hire.

Observations

After a quick review of the dataset, here are some initial observations:

  1. Survival Rate by Passenger Class:

[Plot: survival rate by passenger class]

From the plot above, it can be observed that:

  • First Class: Approximately 63% survived.

  • Second Class: Approximately 47% survived.

  • Third Class: Approximately 24% survived.

Socio-economic status, as indicated by passenger class, was strongly associated with survival: first-class passengers survived at roughly two and a half times the rate of third-class passengers.
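Percentages like these can be computed as the mean of a 0/1 survival flag within each group. This is a minimal sketch rather than the notebook's actual code: it assumes pandas and the Kaggle-style column names `Pclass` and `Survived`, the helper `survival_rate_by` is hypothetical, and the nine-row `sample` is invented, so its numbers differ from the figures above.

```python
import pandas as pd


def survival_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Percentage of survivors within each value of `column`.

    Assumes `Survived` holds 1 for survivors and 0 otherwise, so the
    group mean is the survival fraction.
    """
    return (df.groupby(column)["Survived"].mean() * 100).round(1)


# Made-up rows for illustration only (not real Titanic records).
sample = pd.DataFrame(
    {
        "Pclass": [1, 1, 1, 2, 2, 3, 3, 3, 3],
        "Survived": [1, 1, 0, 1, 0, 0, 0, 1, 0],
    }
)

rates = survival_rate_by(sample, "Pclass")
print(rates)
```

The same helper works for any categorical column, which keeps the per-feature comparisons consistent.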

  2. Survival Rate by Gender:

[Plot: survival rate by gender]

From the plot above, it can be observed that:

  • Females: Approximately 74% survived.
  • Males: Approximately 19% survived.

Female passengers survived at nearly four times the rate of male passengers.
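One way to see both outcomes side by side is a row-normalised cross-tabulation, where each row sums to 100%. The sketch assumes pandas and the Kaggle-style columns `Sex` and `Survived`; the `sample` rows are invented for illustration and do not reproduce the percentages above.

```python
import pandas as pd

# Made-up rows for illustration only (not real Titanic records).
sample = pd.DataFrame(
    {
        "Sex": ["female"] * 4 + ["male"] * 5,
        "Survived": [1, 1, 1, 0, 0, 0, 0, 0, 1],
    }
)

# normalize="index" divides each row by its own total, so every row
# shows the share of that sex that died or survived.
table = pd.crosstab(sample["Sex"], sample["Survived"], normalize="index") * 100
table = table.rename(columns={0: "died", 1: "survived"}).round(1)
print(table)
```

Unlike a single survival rate, this layout makes it easy to check that the two shares for each group add up to 100.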

  3. Survival Rate by Age:

[Plot: survival rate by age]

Younger passengers, particularly children, had higher survival rates, and survival tended to decline with age.
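Because age is continuous, a comparison like this usually starts by bucketing ages into groups. The following is a sketch under stated assumptions: pandas, Kaggle-style columns `Age` and `Survived`, and bin edges and labels (`AGE_BINS`, `AGE_LABELS`) chosen arbitrarily for the example rather than taken from the post. Rows with a missing age are dropped before grouping, and the `sample` rows are invented.

```python
import pandas as pd

# Hypothetical bin edges and labels; the post does not state which it used.
AGE_BINS = [0, 12, 18, 40, 60, 120]
AGE_LABELS = ["child", "teen", "adult", "middle-aged", "senior"]


def survival_rate_by_age_group(df: pd.DataFrame) -> pd.Series:
    """Percentage of survivors per age group, ignoring rows without an age."""
    known = df.dropna(subset=["Age"])
    # right=False makes each bin include its lower edge: [0, 12), [12, 18), ...
    groups = pd.cut(known["Age"], bins=AGE_BINS, labels=AGE_LABELS, right=False)
    return (known.groupby(groups, observed=False)["Survived"].mean() * 100).round(1)


# Made-up rows for illustration only (not real Titanic records).
sample = pd.DataFrame(
    {
        "Age": [4.0, 8.0, 15.0, 25.0, 30.0, 45.0, 70.0, None],
        "Survived": [1, 1, 0, 1, 0, 0, 0, 1],
    }
)

age_rates = survival_rate_by_age_group(sample)
print(age_rates)
```

The choice of bin edges affects the picture, so it is worth trying a few alternatives before drawing conclusions about age.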

Conclusion

The initial analysis of the Titanic dataset reveals that passenger class, gender, and age were significant factors influencing survival. Higher-class passengers, females, and younger passengers had better survival rates. These findings provide a foundation for more sophisticated predictive modelling and deeper data analysis.
Here is the link to the Colab notebook where I did this brief analysis.
