Exploratory Data Analysis (EDA)

Beytullah Soylev
3 min read · Aug 7, 2023


Exploratory data analysis (EDA) is a process of summarizing and visualizing data to understand its main characteristics. EDA can be used to identify patterns and relationships in the data, and to formulate hypotheses about the data. EDA is an iterative process, and you may need to go back and forth between the steps as you learn more about your data.

Here are some of the benefits of using EDA:

It can help you to understand your data better.
It can help you to identify patterns and relationships in the data.
It can help you to identify errors in the data.
It can help you to choose the right statistical methods for further analysis.
It can help you make informed decisions and formulate strategies for real business problems, because you know what information the dataset actually contains.
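The error-finding benefit in the list above can be sketched on a small made-up table. This is only an illustration: the column names mirror the Instagram metrics analyzed in this post, but the values and the `quality_report` helper are hypothetical and not part of the original analysis.

```python
import pandas as pd

# Hypothetical toy data: the values are invented for illustration only.
toy = pd.DataFrame({
    "Impressions": [3920, 5394, 5394, None, 4528],
    "Likes": [162, 224, 224, 131, -5],
})

def quality_report(df):
    """Count a few common data problems (illustrative helper, not a library API)."""
    numeric = df.select_dtypes("number")
    return {
        "missing_cells": int(df.isnull().sum().sum()),      # NaN / None entries
        "duplicate_rows": int(df.duplicated().sum()),       # exact repeated rows
        "negative_values": int((numeric < 0).sum().sum()),  # impossible for counts
    }

report = quality_report(toy)
print(report)
```

Checks like these are cheap to run before any plotting, and a non-zero count is a prompt to look at the offending rows rather than a verdict that the data is wrong.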

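import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

# Use a clean white theme for every figure
pio.templates.default = "plotly_white"

# latin-1 avoids decode errors if the file contains non-UTF-8 characters
data = pd.read_csv("Instagramdata.csv", encoding='latin-1')

print(data.head())          # first five rows
print(data.columns)         # column names
data.info()                 # dtypes and non-null counts (info() prints by itself)
print(data.describe())      # summary statistics for numeric columns
print(data.isnull().sum())  # missing values per column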
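# Histogram of impressions across posts
fig = px.histogram(data,
                   x="Impressions",
                   nbins=10,
                   title="Distribution of Impressions")
fig.show()

# The row index stands in for time, assuming rows are in chronological order
fig = px.line(data, x=data.index,
              y="Impressions",
              title="Impressions Over Time")
fig.show()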

fig = go.Figure()

# One line per engagement metric, plotted against the row index
fig.add_trace(go.Scatter(x=data.index, y=data["Likes"], name="Likes"))
fig.add_trace(go.Scatter(x=data.index, y=data["Saves"], name="Saves"))
fig.add_trace(go.Scatter(x=data.index, y=data["Follows"], name="Follows"))

fig.update_layout(title="Metrics Over Time",
                  xaxis_title="Date",
                  yaxis_title="Count")
fig.show()
[Figures: Distribution of Impressions and Impressions Over Time]
[Figure: Metrics Over Time]
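# numeric_only=True restricts the correlation to numeric columns;
# pandas 2.0 and later raise an error on text columns otherwise
corr_matrix = data.corr(numeric_only=True)

fig = go.Figure(data=go.Heatmap(
    x=corr_matrix.columns,
    y=corr_matrix.index,
    z=corr_matrix.values,
    colorscale='RdBu',
    zmin=-1,
    zmax=1))

fig.update_layout(title='Correlation Matrix',
                  xaxis_title='Features',
                  yaxis_title='Features')

fig.show()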
[Figure: Correlation Matrix]
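A heatmap shows every pair at once, but it is often handy to read one column of the matrix as a ranked list. The sketch below does that for `Impressions` on a hypothetical toy table (the values are invented, and the real dataset will give different numbers); it sorts by absolute value so that strong negative relationships are not buried at the bottom.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
toy = pd.DataFrame({
    "Impressions": [1000, 2000, 3000, 4000, 5000],
    "Likes": [110, 190, 310, 390, 510],
    "Saves": [50, 40, 30, 20, 10],
    "Caption": ["a", "b", "c", "d", "e"],  # text column, skipped by numeric_only
})

corr = toy.corr(numeric_only=True)

# Correlation of each other numeric column with Impressions,
# ordered from strongest to weakest by absolute value
ranked = (
    corr["Impressions"]
    .drop("Impressions")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(ranked)
```

Keep in mind that a correlation coefficient only measures linear association, so a ranking like this is a starting point for further plots, not a conclusion about cause.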
# Running totals of likes and impressions per hashtag
hashtag_likes = {}
hashtag_impressions = {}

for index, row in data.iterrows():
    # str() guards against non-string cells; note that a missing value
    # becomes the literal token "nan"
    hashtags = str(row['Hashtags']).split()
    for hashtag in hashtags:
        hashtag = hashtag.strip()
        if hashtag not in hashtag_likes:
            hashtag_likes[hashtag] = 0
            hashtag_impressions[hashtag] = 0
        # A post counts fully toward every hashtag it carries
        hashtag_likes[hashtag] += row['Likes']
        hashtag_impressions[hashtag] += row['Impressions']

likes_distribution = pd.DataFrame(list(hashtag_likes.items()), columns=['Hashtag', 'Likes'])

impressions_distribution = pd.DataFrame(list(hashtag_impressions.items()), columns=['Hashtag', 'Impressions'])

fig_likes = px.bar(likes_distribution, x='Hashtag', y='Likes',
                   title='Likes Distribution for Each Hashtag')

fig_impressions = px.bar(impressions_distribution, x='Hashtag',
                         y='Impressions',
                         title='Impressions Distribution for Each Hashtag')

fig_likes.show()
fig_impressions.show()
[Figures: Likes Distribution for Each Hashtag and Impressions Distribution for Each Hashtag]
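The per-hashtag totals can also be computed without an explicit loop. The sketch below is one possible alternative using `str.split`, `explode` and `groupby` on a hypothetical toy table (the posts and numbers are invented). It also computes a per-post mean, which is an addition of this sketch rather than part of the analysis above: totals favor hashtags that simply appear on more posts, so the mean can be a fairer comparison.

```python
import pandas as pd

# Hypothetical toy posts; the values are invented for illustration only.
toy = pd.DataFrame({
    "Hashtags": ["#data #python", "#python #ai", "#data"],
    "Likes": [100, 200, 50],
    "Impressions": [1000, 3000, 500],
})

# One row per (post, hashtag) pair; posts without hashtags are dropped
exploded = (
    toy.assign(Hashtag=toy["Hashtags"].fillna("").str.split())
       .explode("Hashtag")
       .dropna(subset=["Hashtag"])
)

grouped = exploded.groupby("Hashtag")[["Likes", "Impressions"]]
totals = grouped.sum()   # same quantity as the loop-based totals
means = grouped.mean()   # average per post carrying the hashtag

print(totals)
print(means)
```

As in the loop version, a post with several hashtags contributes its full likes and impressions to each of them.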

Click For Project

Click For Source

“AI is the new electricity. Just as electricity transformed almost everything 100 years ago, today I actually have a hard time thinking of an industry that I don’t think AI will transform in the next several years.”

Andrew Ng, Founder of DeepLearning.AI / Founder & CEO of Landing AI

Happy reading! :D
