<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[100 days of Data]]></title><description><![CDATA[An online program to up skill students and professionals in all the fields related to Data.]]></description><link>https://www.100daysofdata.com</link><generator>RSS for Node</generator><lastBuildDate>Wed, 20 May 2026 20:19:14 GMT</lastBuildDate><atom:link href="https://www.100daysofdata.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[10 data analytics dashboards with Matplotlib]]></title><description><![CDATA[If your data doesn't provide your business actionable insights, it's useless!
Anyone can show numbers and statistics on a graph. But what significance do these number have for the business? What makes those numbers interesting, is a relevant story be...]]></description><link>https://www.100daysofdata.com/data-analytics-with-matplotlib</link><guid isPermaLink="true">https://www.100daysofdata.com/data-analytics-with-matplotlib</guid><category><![CDATA[Python]]></category><category><![CDATA[Programming Blogs]]></category><category><![CDATA[Beginner Developers]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[#data visualisation]]></category><dc:creator><![CDATA[Lenin Mishra]]></dc:creator><pubDate>Mon, 11 Jul 2022 05:43:35 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1657517617587/YYm1gPZ2r.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If your data doesn't provide your business actionable insights, <strong>it's useless!</strong></p>
<p>Anyone can show numbers and statistics on a graph. But what significance do these numbers have for the business? What makes those numbers interesting <strong>is the relevant story behind them.</strong> Every aspiring Data Analyst or Business Intelligence Developer needs to learn <strong>the art of storytelling</strong>.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://twitter.com/pylenin/status/1546367353679536134">https://twitter.com/pylenin/status/1546367353679536134</a></div>
<p>In this article, we will focus on 10 commonly used visualizations, or <strong>plots</strong>, built with Matplotlib in Python. These plots are not mere graphs! Each plot tells a story about a real-life scenario and corresponds to a dashboard commonly used by Data Analysts and management teams in various companies to draw actionable insights.</p>
<h1 id="heading-10-data-analytics-dashboard-examples-with-matplotlib">10 Data Analytics dashboard examples with Matplotlib</h1>
<p>The 10 plots and problem scenarios discussed in this article are:</p>
<ol>
<li><a class="post-section-overview" href="#heading-line-chart">Line Chart</a> - How has the news paradigm in India shifted over the last 10 years?</li>
<li><a class="post-section-overview" href="#heading-stacked-area-chart">Stacked Area Chart</a> - What are the total sales generated by an MNC across all its markets during the last year?</li>
<li><a class="post-section-overview" href="#heading-bar-chart">Bar Chart</a> - What is the YoY (Year-on-Year) monthly sales comparison for a company?</li>
<li><a class="post-section-overview" href="#heading-pie-chart">Pie Chart</a> - What was the approval % of a bill introduced in the winter session of Parliament?</li>
<li><a class="post-section-overview" href="#heading-scatter-plot">Scatter Plot</a> - How does the rent of a house vary with the house size?</li>
<li><a class="post-section-overview" href="#heading-bubble-chart">Bubble chart</a> - How deadly (fatality rate) and widespread (number of fatalities) is a particular disease?</li>
<li><a class="post-section-overview" href="#heading-candlestick">Candlestick</a> - How has Nifty 50 performed on the National Stock Exchange in the month of October?</li>
<li><a class="post-section-overview" href="#heading-timeseries">Timeseries</a> - What is the distribution of "Close" value of Nifty 50 for the last 1 year?</li>
<li><a class="post-section-overview" href="#heading-histograms">Histograms</a> - What is the gender-wise distribution of students' height in a school?</li>
<li><a class="post-section-overview" href="#heading-heatmap">Heatmap</a> - What is the Monthly Recurring Revenue(MRR) retention of a company?</li>
</ol>
<blockquote>
<p>Attention!
This article is only intended to show readers different concepts and tricks for plotting useful graphs in Python using the Matplotlib library. The data shown in the following graphs is fictional and is not intended to depict the truth on the ground.</p>
</blockquote>
<h1 id="heading-line-chart">Line chart</h1>
<p>A line chart displays information as a series of data points connected by a straight line. It allows you to track changes in the value of an entity over time.</p>
<p>Line charts are useful for showing <strong>trends</strong>, that is, how a quantity changes over a period. The example below uses a line chart to show <strong>how the primary source of news has changed</strong> among Indians over the last decade.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1636824339765/iQjZD_igk.png" alt="line_chart.png" /></p>
<p><strong>Important points</strong></p>
<ol>
<li>No y-axis labels are shown in the graph - Use <code>ax.yaxis.set_visible(False)</code>.</li>
<li>The first and last data points for every news medium are annotated - Use the <code>ax.text()</code> function.</li>
</ol>
<p>Check out the code below to build this line chart.</p>
<pre><code class="lang-python3">import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5, 4), 
                   constrained_layout=True)

# Sets y-axis visibility to False              
ax.yaxis.set_visible(False)

xData = [
  [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021],
  [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021],
  [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021],
  [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021]
]

yData = [
  [74, 82, 80, 74, 73, 72, 74, 70, 70, 66, 66],
  [45, 42, 50, 46, 36, 36, 34, 35, 32, 31, 31],
  [13, 14, 20, 24, 20, 24, 24, 40, 35, 41, 43],
  [18, 21, 18, 21, 16, 14, 13, 18, 17, 16, 19]
]

labels = ['Television', 'Newspaper', 'Internet', 'Radio']

colors = ['#434343', '#737373', '#3182bd', '#bdbdbd']

font_style = dict(size=12, color='black')

for data in zip(xData, yData, labels, colors):
    ax.plot(data[0], 
            data[1], 
            label=data[2], 
            color=data[3], 
            linewidth=3)

    # Annotate first and last data point        
    ax.text(data[0][0] - 0.3, 
            data[1][0], 
            str(data[1][0])+'%', 
            **font_style)

    ax.text(data[0][-1], 
            data[1][-1], 
            str(data[1][-1])+'%', 
            **font_style)

plt.legend(fontsize='x-large')
plt.title('Source of news in India for last 10 years', 
          fontsize='x-large')
plt.ylabel('% of respondents', fontsize='x-large')
plt.show()
</code></pre>
<h1 id="heading-stacked-area-chart">Stacked Area Chart</h1>
<p>A stacked area chart displays the change in a KPI for different groups of a dataset. The groups are stacked on top of each other, making it easy to read not only the total value, but also the contribution of each group.</p>
<p>For example, an important analysis could be measuring and comparing a company's sales across all the countries it sells in. In such scenarios, a <strong>grid layout</strong> is useful for reading off the approximate sales numbers for each country.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1636888152384/ZAnDUC9Y0.png" alt="Screenshot from 2021-11-14 16-38-45.png" /></p>
<p>To reproduce the above graph or create a similar graph, use the code below.</p>
<pre><code class="lang-python3">import numpy as np
import matplotlib.pyplot as plt

# Create data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
india_sales = [1, 4, 6, 8, 9, 7, 8, 5, 9, 11, 12, 13]
uk_sales = [2, 2, 7, 10, 12, 4, 8, 8, 10, 12, 14, 10]
usa_sales = [2, 8, 5, 10, 6, 10, 12, 7, 9, 8, 10, 13]

COLORS = ["#74A9CF", "#2B8CBE", "#045A8D"]

# Basic stacked area chart.
plt.stackplot(months, india_sales, uk_sales, usa_sales, colors=COLORS, labels=['India','UK','USA'])
plt.legend(loc='upper left', fontsize='x-large')
plt.grid(True)

plt.xlabel('Month 2020')
plt.ylabel('Sales(Million $)')
plt.title('Sales of an MNC in 3 countries')

plt.show()
</code></pre>
<h1 id="heading-bar-chart">Bar chart</h1>
<p>A bar chart shows the relationship between a numerical and a categorical variable. The categorical variable is represented as a bar. The size of the bar represents its numerical value. </p>
<p>The below example uses bar charts to compare <strong>Year-on-Year</strong> monthly sales of a company.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1636855122935/eewEYnOGm.png" alt="Screenshot from 2021-11-14 07-28-00.png" /></p>
<p>You can recreate the above graph using the code below. Pay attention to how the <strong>width of the bars</strong> is fixed in the code and how the x-axis labels are <strong>aligned to the centre</strong> of each pair of bars.</p>
<pre><code class="lang-python3">import matplotlib.pyplot as plt
import numpy as np

months = ['Jan', 'Feb', 'Mar', 'Apr', 
          'May', 'Jun', 'Jul', 'Aug', 
          'Sep', 'Oct', 'Nov', 'Dec']

sales_2020 = [19, 14, 22, 14, 16, 19, 15, 14, 10, 12, 12, 16]
sales_2021 = [20, 14, 25, 16, 18, 22, 19, 15, 12, 16, 14, 17]

x = np.arange(len(months))  # the label locations
width = 0.35  # the width of the bars

fig, ax = plt.subplots(figsize=(5, 4), constrained_layout=True)

font_style = dict(size=12, color='black')

rects1 = ax.bar(x - width/2, sales_2021, width, 
          label='Sales 2021', color='#3182BD')
rects2 = ax.bar(x + width/2, sales_2020, width, 
          label='Sales 2020', color='#CCCCCC')

ax.set_xticks(x)
ax.set_xticklabels(months, fontsize='x-large')
plt.legend(fontsize='x-large')
plt.title('YOY monthly sales comparison', fontsize='x-large')
plt.ylabel('Total Sales(Million $)', fontsize='x-large')
plt.show()
</code></pre>
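<p>The bars show absolute sales, but a YoY comparison is often quoted as a percentage change. As a minimal sketch (the <code>yoy_growth</code> name is an assumption for illustration and not part of the original chart code), the growth for each month can be derived from the same two lists:</p>
<pre><code class="lang-python3">import numpy as np

months = ['Jan', 'Feb', 'Mar', 'Apr',
          'May', 'Jun', 'Jul', 'Aug',
          'Sep', 'Oct', 'Nov', 'Dec']

sales_2020 = [19, 14, 22, 14, 16, 19, 15, 14, 10, 12, 12, 16]
sales_2021 = [20, 14, 25, 16, 18, 22, 19, 15, 12, 16, 14, 17]

prev = np.array(sales_2020, dtype=float)
curr = np.array(sales_2021, dtype=float)

# YoY growth (%) = (this year - last year) / last year * 100
yoy_growth = (curr - prev) / prev * 100

for month, pct in zip(months, yoy_growth):
    print(f"{month}: {pct:+.1f}%")
</code></pre>
<p>These percentages could be written above each pair of bars with <code>ax.text()</code>, in the same way the line chart annotates its end points.</p>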
<h1 id="heading-pie-chart">Pie chart</h1>
<p>A pie chart is a circle divided into wedges, one per category, with each wedge sized according to that category's share of the whole. Pie charts are <strong>not the best plotting choice</strong> if you want to read off the exact percentage of each entity, especially when you are plotting a lot of entities, but they do provide a general understanding of each entity's contribution to the whole.</p>
<p>The below graph provides an analysis of the <strong>approval % of a new bill introduced in the winter session of parliament</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1636856242914/zpO0kNYoh.png" alt="pie_chart.png" /></p>
<p>Notice how the approval share has been <strong>exploded out</strong> for better clarity. To reproduce the above pie chart or create a similar plot, use the code below.</p>
<pre><code class="lang-python3">import matplotlib.pyplot as plt
import numpy as np

# pie chart parameters
ratios = [.27, .56, .17]
labels = ['Approve', 'Disapprove', 'Undecided']
explode = [0.1, 0, 0]
# rotate so that first wedge is split by the x-axis
angle = -180 * ratios[0]
plt.pie(ratios, autopct='%1.1f%%', startangle=angle,
        labels=labels, explode=explode)
plt.title("Bill Approval Stats for Parliament Winter session")
plt.show()
</code></pre>
<h1 id="heading-scatter-plot">Scatter Plot</h1>
<p>A scatter plot shows the relationship between 2 numerical variables. You can use any kind of marker to create a scatter plot.</p>
<p>The below graph shows the <strong>relationship between the size of a house and its rent</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1636880998667/JWYwQw-sA.png" alt="scatter_plot.png" /></p>
<p>The plot also shows a straight line fitted through linear regression. You can use the <code>numpy.polyfit()</code> function to compute the regression line. Use the code below to recreate the above plot.</p>
<pre><code class="lang-python3">import matplotlib.pyplot as plt
import numpy as np

#random number generator
seed = np.random.default_rng(1234)

x = seed.uniform(0, 10, size=100)
y = x + seed.normal(size=100)

# Initialize layout
fig, ax = plt.subplots(figsize = (9, 9))

# Add scatterplot
# Use the marker parameter to choose
# an appropriate marker
ax.scatter(x, y, s=60, alpha=0.7, edgecolors="k")

# Fit linear regression via least squares with numpy.polyfit
# It returns a slope (b) and intercept (a)
# deg=1 means linear fit
b, a = np.polyfit(x, y, deg=1)

# Create sequence of 100 numbers from 0 to 10
xseq = np.linspace(0, 10, num=100)

# Plot regression line
ax.plot(xseq, a + b * xseq, color="k", lw=2.5)
plt.title('Rent variation with number of rooms', fontsize='x-large')
plt.ylabel('Rent(k)', fontsize='x-large')
plt.xlabel('Number of rooms', fontsize='x-large')

plt.show()
</code></pre>
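<p>One detail that is easy to get wrong is the order of the values returned by <code>numpy.polyfit()</code>: coefficients come back from the highest degree down, so for <code>deg=1</code> the slope comes first and the intercept second. The following small check, using made-up points that lie exactly on the line y = 2x + 1, illustrates this:</p>
<pre><code class="lang-python3">import numpy as np

# Four points that lie exactly on y = 2x + 1 (illustrative data)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# For deg=1, polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)

print("slope:", round(slope, 6))
print("intercept:", round(intercept, 6))
</code></pre>
<p>This matches the unpacking <code>b, a = np.polyfit(x, y, deg=1)</code> used in the scatter plot code, where <code>b</code> is the slope and <code>a</code> is the intercept.</p>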
<h1 id="heading-bubble-chart">Bubble chart</h1>
<p>A bubble chart is a kind of scatter plot in which the size of each bubble is determined by a third numerical variable. This shows the weight of that variable for each data point.</p>
<p>The plot below compares the <strong>fatality rate (deadliness) vs the total number of fatalities</strong> for different diseases.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1636868839884/YhPwno_V7.png" alt="Screenshot from 2021-11-14 11-16-21.png" /></p>
<blockquote>
<p>Above data is borrowed from <a target="_blank" href="https://www.informationisbeautiful.net/visualizations/the-microbescope-infectious-diseases-in-context/">Microbescope</a>. Data Chef is not responsible for the authenticity of this data.</p>
</blockquote>
<p>To recreate the above plot, use the code below.</p>
<pre><code class="lang-python3">import matplotlib.pyplot as plt
import numpy as np

# Diseases with their case fatality rates
# (Disease Name, Fatality Rate, Total Fatalities)

bacterial_diseases = [ ('Diphtheria', 7.5, 2600), 
                       ('Meningitis', 45, 127000), 
                       ('Syphilis', 33, 79000),
                       ('MRSA', 20, 11000) ]

viral_diseases = [ ('Ebola', 50, 4555), 
                   ('Bird Flu', 58, 20), 
                   ('Dengue Fever', 22, 47000), 
                   ('Hepatitis A', 1, 5200) ]

parasite_diseases = [ ('Sleeping Sickness', 40, 2300), 
                      ('Malaria', 1.5, 150000) ]

fig, ax = plt.subplots(figsize=(10,8))

ax.scatter(
        [x[1] for x in bacterial_diseases], 
        [x[2] for x in bacterial_diseases], 
        label='Bacteria',
        s=[x[1]*500 for x in bacterial_diseases], 
        color='#7570B3', alpha=0.7
    )

ax.scatter(
        [x[1] for x in viral_diseases], 
        [x[2] for x in viral_diseases], 
        label='Virus',
        s=[x[1]*500 for x in viral_diseases], 
        color='#1B9E77', alpha=0.7
    )

ax.scatter(
        [x[1] for x in parasite_diseases], 
        [x[2] for x in parasite_diseases],
        s=[x[1]*500 for x in parasite_diseases],
        label='Parasite', color='#D95F02', alpha=0.7 
    )

all_diseases = bacterial_diseases + viral_diseases + parasite_diseases

for data in all_diseases:
    disease, x, y = data
    plt.annotate(disease, (x, y))

ax.ticklabel_format(useOffset=False, style='plain', axis='y')

lgnd = plt.legend(loc="right", fontsize=10)

# Use the same small marker size for all three legend entries.
# Matplotlib 3.7+ exposes legend_handles; older releases use legendHandles.
handles = getattr(lgnd, 'legend_handles', None) or lgnd.legendHandles
for handle in handles:
    handle.set_sizes([30])
plt.title('Fatalities vs Fatality Rate for diseases', fontsize='x-large')
plt.ylabel('Total Fatalities', fontsize='x-large')
plt.xlabel('Fatality Rate', fontsize='x-large')

plt.show()
</code></pre>
<h1 id="heading-candlestick">Candlestick</h1>
<p>A candlestick is similar to a box plot. A candlestick shows the market's open, high, low, and close price for the day. </p>
<p>The below plot shows the <strong>daily statistics of NIFTY50 index</strong> for the month of October.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1636883749946/gDpVacU22.png" alt="candlestick.png" /></p>
<p>The data has been downloaded from <a target="_blank" href="https://www1.nseindia.com/products/content/equities/indices/historical_index_data.htm">NSE website</a> and stored as <code>october_2021_nse.csv</code>.</p>
<p>To recreate the above graph or a similar graph, use the code below.</p>
<pre><code class="lang-python3">import matplotlib.pyplot as plt
from mplfinance.original_flavor import candlestick_ohlc
import pandas as pd
import matplotlib.dates as mpl_dates

plt.style.use('ggplot')

# Extracting Data for plotting
data = pd.read_csv('october_2021_nse.csv')
ohlc = data.loc[:, ['Date', 'Open', 'High', 'Low', 'Close']]
ohlc['Date'] = pd.to_datetime(ohlc['Date'])
ohlc['Date'] = ohlc['Date'].apply(mpl_dates.date2num)
ohlc = ohlc.astype(float)

# Creating Subplots
fig, ax = plt.subplots()

candlestick_ohlc(ax, ohlc.values, width=0.6, colorup='green', colordown='red', alpha=0.8)

# Setting labels &amp; titles
ax.set_xlabel('Date')
ax.set_ylabel('Price')
fig.suptitle('October 2021 Candlestick Chart of NIFTY50')

# Formatting Date
date_format = mpl_dates.DateFormatter('%d-%m-%Y')
ax.xaxis.set_major_formatter(date_format)
fig.autofmt_xdate()

fig.tight_layout()

plt.show()
</code></pre>
<h1 id="heading-timeseries">Timeseries</h1>
<p>Timeseries charts represent the evolution of a numeric value over time. They are used in the field of statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting etc.</p>
<p>The below plot shows the <strong>Close price of NIFTY50 index</strong> over the last one year period.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1636884570318/kIH_8nY1J.png" alt="Screenshot from 2021-11-14 15-38-41.png" /></p>
<p>To create the above timeseries plot, use the code below.</p>
<pre><code class="lang-python3">import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.dates as mpl_dates

plt.style.use('ggplot')

# Extracting Data for plotting
data = pd.read_csv('last_year_nse.csv')
data["Date"] = pd.to_datetime(data["Date"])

date = data["Date"]
value = data["Close"]

fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(date, value)
plt.title("NSE Close price for the past 1 year", fontsize="x-large")
plt.xlabel("Date", fontsize="x-large")
plt.ylabel("Close Price on NSE(INR)", fontsize="x-large")
plt.show()
</code></pre>
<h1 id="heading-histograms">Histograms</h1>
<p>A histogram shows the frequency distribution of a given variable, for example, the distribution of the heights of students in a school. The values are split into bins, and each bin is represented as a bar whose height is the number of values that fall in it.</p>
<p>The below histogram plot goes further to compare the <strong>height distribution between male and female students</strong> of a school. </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1636886406520/v3FcgAMaf.png" alt="Screenshot from 2021-11-14 16-09-37.png" /></p>
<p>To recreate the above plot or to create a similar one, use the code below.</p>
<pre><code class="lang-python3">import numpy as np
import matplotlib.pyplot as plt

total_samples = 300

#female dataset
muf, sigmaf = 155, 4
xf = np.random.normal(muf, sigmaf, total_samples).astype(int)

# the histogram of the data
n, bins, patches = plt.hist(xf, 20, facecolor='#ff6466', alpha=0.75, label='Female')

#male dataset
mum, sigmam = 168, 6
xm = np.random.normal(mum, sigmam, total_samples).astype(int)

# the histogram of the data
n, bins, patches = plt.hist(xm, 20, facecolor='#64c866', alpha=0.75, label='Male')


plt.xlabel('Height(cm)')
plt.ylabel('Number of students')
plt.title('Distribution of student heights of a school')
plt.grid(True)
plt.legend()
plt.show()
</code></pre>
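<p>In the code above, each call to <code>plt.hist()</code> picks its own 20 bins, so the female and male bars do not share the same edges. When the two groups should be compared bar for bar, one option is to compute a common set of bin edges first. This is a sketch under assumptions: the fixed seed and the variable names are illustrative, and the samples are kept as floats.</p>
<pre><code class="lang-python3">import numpy as np

# Illustrative data with the same parameters as the example above
rng = np.random.default_rng(42)
xf = rng.normal(155, 4, 300)   # female heights (cm)
xm = rng.normal(168, 6, 300)   # male heights (cm)

# 20 bins spanning both datasets give 21 shared edges
bin_edges = np.histogram_bin_edges(np.concatenate([xf, xm]), bins=20)

# Count both groups against the same edges
female_counts, _ = np.histogram(xf, bins=bin_edges)
male_counts, _ = np.histogram(xm, bins=bin_edges)

print(len(bin_edges), female_counts.sum(), male_counts.sum())
</code></pre>
<p>The same <code>bin_edges</code> array can be passed as the <code>bins</code> argument of both <code>plt.hist()</code> calls so that the two histograms line up.</p>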
<h1 id="heading-heatmap">Heatmap</h1>
<p>A heatmap is a graphical representation of data where each value of a matrix is represented as a color. It shows magnitude of a KPI as color in two dimensions.</p>
<p>For example, the below heatmap shows the <strong>cohort analysis of Stripe Monthly Recurring Revenue(MRR)</strong> for a company. Each square represents the revenue retained in successive months from the starting month mentioned on the y axis.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1636891410691/JBTaynWk5.png" alt="Screenshot from 2021-11-14 17-32-35.png" /></p>
<p>To recreate a similar plot, use the code below. </p>
<pre><code class="lang-python3">import numpy as np
import matplotlib
import matplotlib.pyplot as plt

dates = ["Jan 2021", "Feb 2021", "Mar 2021", "Apr 2021",
              "May 2021", "June 2021", "July 2021"]

month_num = ["Month 1", "Month 2", "Month 3", "Month 4", "Month 5", "Month 6", "Month 7"]

stripe_cohort = np.array([[0.8, 0.76, 0.7, 0.5, 0.45, 0.43, 0.4],
                    [0.95, 0.9, 0.87, 0.83, 0.79, 0.78, 0.0],
                    [1, 0.93, 0.92, 0.9, 0.83, 0.0, 0.0],
                    [0.82, 0.77, 0.76, 0.7, 0.0, 0.0 , 0.0],
                    [0.93, 0.9, 0.87, 0.0, 0.0, 0.0, 0.0],
                    [0.9, 0.88, 0.0, 0.0, 0.0, 0.0, 0.0],
                    [0.95, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])


fig, ax = plt.subplots()
im = ax.imshow(stripe_cohort, cmap="YlGnBu")

#Show all ticks
ax.set_xticks(np.arange(len(month_num)))
ax.set_yticks(np.arange(len(dates)))
#Show all labels
ax.set_xticklabels(month_num)
ax.set_yticklabels(dates)

# Rotate the tick labels and set their alignment.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
         rotation_mode="anchor")

# Loop over data dimensions and create text annotations.
for i in range(len(dates)):
    for j in range(len(month_num)):
        text = ax.text(j, i, stripe_cohort[i, j],
                       ha="center", va="center", color="w")

ax.set_title("Cohort Analysis of a company's revenue")
fig.tight_layout()
plt.show()
</code></pre>
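<p>In the cohort matrix, <code>0.0</code> stands for months that have not happened yet rather than for zero retention. One way to keep those cells out of any calculation is a NumPy masked array. The sketch below uses a smaller, made-up 3x3 cohort and assumes that a genuine retention value is never exactly zero:</p>
<pre><code class="lang-python3">import numpy as np

# Illustrative 3-month cohort; 0.0 marks months without data yet
cohort = np.array([[0.80, 0.76, 0.70],
                   [0.95, 0.90, 0.00],
                   [1.00, 0.00, 0.00]])

# Hide the placeholder zeros
masked = np.ma.masked_equal(cohort, 0.0)

# Number of observed months for each starting month
observed_months = masked.count(axis=1)

# Average retention per month number, ignoring masked cells
avg_retention = masked.mean(axis=0)

print(observed_months)
print(avg_retention)
</code></pre>
<p>A masked array can also be passed to <code>ax.imshow()</code>, in which case the masked cells are left uncoloured instead of being drawn in the lowest colour of the colormap.</p>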
<h1 id="heading-conclusion">Conclusion</h1>
<p>I hope this article has helped improve your plotting skills in Matplotlib. If you want to start out as a Business Intelligence Specialist or a Data Analyst, such visualization skills will help you a lot in progressing your career.</p>
<p><strong>Remember</strong> - Numbers are boring. What makes them interesting is the story behind them.</p>
]]></content:encoded></item><item><title><![CDATA[How to invoke a Lambda function using an S3 event notification trigger?]]></title><description><![CDATA[In this article, we will learn to invoke a lambda function using an AWS Simple Storage Service(S3) event notification trigger.
To follow along this article, you need to have an  AWS account and some knowledge about the Python programming language.
Yo...]]></description><link>https://www.100daysofdata.com/trigger-lambda-from-s3</link><guid isPermaLink="true">https://www.100daysofdata.com/trigger-lambda-from-s3</guid><category><![CDATA[Beginner Developers]]></category><category><![CDATA[AWS]]></category><category><![CDATA[Programming Blogs]]></category><category><![CDATA[Tutorial]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Lenin Mishra]]></dc:creator><pubDate>Thu, 23 Jun 2022 06:46:04 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1655966604816/XqA1u7o3B.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this article, we will learn to invoke a lambda function using an AWS Simple Storage Service(S3) event notification trigger.</p>
<p>To follow along with this article, you need to have an <a target="_blank" href="https://aws.amazon.com/">AWS account</a> and some knowledge of the <a target="_blank" href="https://www.youtube.com/pylenin">Python</a> programming language.</p>
<p>You should also have a basic understanding of AWS Lambda and how it works. Check out this <a target="_blank" href="https://www.100daysofdata.com/aws-lambda-for-beginners">visual guide</a> on 100 days of data.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.100daysofdata.com/aws-lambda-for-beginners">https://www.100daysofdata.com/aws-lambda-for-beginners</a></div>
<p>You don't have to <strong>reinvent the wheel</strong>, unless it is for an educational purpose! AWS Lambda comes with the <code>s3-get-object-python</code> blueprint, a template Lambda function that already has sample code and function configuration presets for a given runtime.</p>
<p><strong>Note -</strong> This blueprint's permissions are set to allow you to get objects from the S3 bucket. They <strong>don't let you write to the S3 bucket</strong>.</p>
<h2 id="heading-step-1-create-an-s3-bucket">Step 1 - Create an S3 bucket</h2>
<ol>
<li>Open the Amazon S3 console and choose <code>Create bucket</code>.</li>
<li>Enter a unique and <strong>descriptive</strong> name for your bucket. For example - <code>nse50</code>, a bucket to store the top 50 performing stocks from the National Stock Exchange.</li>
<li>Next, you have to choose an AWS region. <strong>Note</strong> - Your Lambda function should be created in the same Region.</li>
<li>Choose <code>Create bucket</code>.</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1637569580023/dnE4nU38q.png" alt="create-bucket-s3.png" /></p>
<p>After creating the bucket, Amazon S3 opens the Buckets page, which displays a list of all buckets in your account in the current Region.</p>
<p>To upload a test object using the Amazon S3 console</p>
<ol>
<li>On the Buckets page of the Amazon S3 console, choose the name of the bucket that you created.</li>
<li>On the Objects tab, choose <code>Upload</code>.</li>
<li>Drag a test file from your local machine to the Upload page.</li>
<li>Choose <code>Upload</code>.</li>
</ol>
<h2 id="heading-step-2-create-a-lambda-function">Step 2 - Create a Lambda function</h2>
<p>To create a Lambda function from a blueprint in the console</p>
<ol>
<li>Go to the Lambda Functions page and choose <code>Create function</code>.</li>
<li>On the Create function page, choose <code>Use a blueprint</code>.</li>
<li>Choose <code>s3-get-object-python</code> for a Python function or <code>s3-get-object</code> for a Node.js function. Choose Configure.</li>
<li>Enter a function name of your choice. For example - <code>s3_audit_function</code>.</li>
<li>For Execution role, choose <code>Create a new role from AWS policy templates</code> and enter a role name of your choice. For example - <code>s3_audit_role</code>.</li>
<li><p>Under S3 trigger, choose the S3 bucket that you created previously.</p>
<p>When you configure an S3 trigger using the Lambda console, the console modifies your function's resource-based policy to allow Amazon S3 to invoke the function.</p>
</li>
<li><p>Choose Create function.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1637554235904/SVBjmA6AX.png" alt="Step 2.png" />
<strong>Open the above image in new tab for better viewing experience.</strong></p>
<p>Pay attention to <strong>Step 5</strong> in the above image. It shows that this Lambda function has <strong>read-only</strong> permissions. So you can read from S3, but <strong>you cannot write to it</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1637565662106/154aM4uxU.png" alt="step-5.png" /></p>
<h2 id="heading-step-3-testing-the-function">Step 3 - Testing the function</h2>
<p>As mentioned earlier, this blueprint comes with its own sample code. </p>
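<p>The blueprint's own code is not reproduced here. As a rough sketch of the idea behind it, a handler for an S3 trigger typically starts by pulling the bucket name and object key out of the incoming event. The helper name <code>extract_s3_object</code> and the sample event below are assumptions made for illustration; judging by the <code>CONTENT TYPE</code> line in the logs shown later in this step, the actual blueprint additionally fetches the object from S3 and logs its content type.</p>
<pre><code class="lang-python3">import urllib.parse


def extract_s3_object(event):
    """Return (bucket, key) for the first record of an S3 event."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded (a space becomes '+'), so decode them
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"],
                                    encoding="utf-8")
    return bucket, key


def lambda_handler(event, context):
    bucket, key = extract_s3_object(event)
    print(f"New object: s3://{bucket}/{key}")
    return {"bucket": bucket, "key": key}


# Local dry run with a trimmed-down, hypothetical S3 put event
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "nse50"},
                "object": {"key": "Happy+Face.jpg"}}}
    ]
}
result = lambda_handler(sample_event, None)
</code></pre>
<p>Keeping the event parsing in a small function of its own makes it possible to check that part locally, without deploying anything or calling AWS.</p>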
<p>Before putting your code into production, you need to test it. AWS Lambda lets you configure different types of test events from different services to help you test your code.</p>
<ol>
<li><p>On the Code tab, under Code source, choose the drop down arrow next to Test, and then choose <code>Configure test events</code> from the dropdown list.</p>
</li>
<li><p>In the Configure test event window, do the following:</p>
<ul>
<li>Choose <code>Create new test event</code>.</li>
<li>From Event template, choose <code>Amazon S3 Put (s3-put)</code>. This is similar to the event triggered in S3 when you upload a file.</li>
<li>For Event name, enter a name for the test event.</li>
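<li>In the test event JSON, replace the S3 bucket name (the <code>name</code> field under <code>bucket</code>) and the object key (the <code>key</code> field under <code>object</code>) with your bucket name and test file name. Your test event should look similar to the following: </li>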
</ul>
<pre><code class="lang-python3">{
"Records": [
 {
   "eventVersion": "2.0",
   "eventSource": "aws:s3",
   "awsRegion": "us-west-2",
   "eventTime": "1970-01-01T00:00:00.000Z",
   "eventName": "ObjectCreated:Put",
   "userIdentity": {
     "principalId": "EXAMPLE"
   },
   "requestParameters": {
     "sourceIPAddress": "127.0.0.1"
   },
   "responseElements": {
     "x-amz-request-id": "EXAMPLE123456789",
     "x-amz-id-2": "EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH"
   },
   "s3": {
     "s3SchemaVersion": "1.0",
     "configurationId": "testConfigRule",
     "bucket": {
       "name": "nse50",
       "ownerIdentity": {
         "principalId": "EXAMPLE"
       },
       "arn": "arn:aws:s3:::example-bucket"
     },
     "object": {
       "key": "HappyFace.jpg",
       "size": 1024,
       "eTag": "0123456789abcdef0123456789abcdef",
       "sequencer": "0A1B2C3D4E5F678901"
     }
   }
 }
]
}
</code></pre>
<ul>
<li>Choose <code>Create</code>.</li>
</ul>
</li>
<li><p>To invoke the function with your test event, under Code source, choose <code>Test</code>. The Execution results tab displays the response, function logs, and request ID, similar to the following: </p>
</li>
</ol>
<pre><code>Test Event Name
put-event

Response
"image/jpeg"

Function Logs
START RequestId: ca820e7b-0e24-465a-97be-edce7f43ace2 Version: $LATEST
CONTENT TYPE: image/jpeg
END RequestId: ca820e7b-0e24-465a-97be-edce7f43ace2
REPORT RequestId: ca820e7b-0e24-465a-97be-edce7f43ace2    Duration: 141.62 ms    Billed Duration: 142 ms    Memory Size: 128 MB    Max Memory Used: 74 MB

Request ID
ca820e7b-0e24-465a-97be-edce7f43ace2
</code></pre><h1 id="heading-final-result">Final result</h1>
<p>Now, your lambda function will be invoked every time you upload a new file to your bucket.</p>
<ol>
<li><p>On the Buckets page of the Amazon S3 console, choose the name of the source bucket that you created earlier.</p>
</li>
<li><p>On the Upload page, upload any file of your choice to the bucket.</p>
</li>
<li><p>Open the Functions page on the Lambda console.</p>
</li>
<li><p>Choose the name of your function (for example, <code>s3_audit_function</code>).</p>
</li>
<li><p>To verify that the function ran once for each file that you uploaded, choose the Monitor tab. This page shows graphs for the metrics that Lambda sends to <strong>Amazon CloudWatch</strong>. The count in the Invocations graph should match the number of files that you uploaded to the Amazon S3 bucket.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1637569938927/La005nUw3.png" alt="Screenshot from 2021-11-22 14-00-56.png" /></p>
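<p>To make the trigger more tangible, here is a minimal sketch of a handler that reads the bucket name and object key from the S3 notification it receives. The bucket name and file name in <code>sample_event</code> are made up for illustration, and the event is trimmed down to only the fields the sketch reads.</p>
<pre><code>import urllib.parse


def lambda_handler(event, context):
    # One notification can carry several records, so collect one entry per object.
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (a space becomes "+"), so decode them.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        print(f"New object: s3://{bucket}/{key}")
        uploaded.append({"bucket": bucket, "key": key})
    return uploaded


# Hypothetical, trimmed-down event for a local dry run (not a full S3 payload)
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-source-bucket"},
                "object": {"key": "holiday+photo.jpg"},
            }
        }
    ]
}

result = lambda_handler(sample_event, None)
</code></pre>
<p>Running the handler locally with a hand-written event like this is a quick way to check your parsing logic before you upload a real file.</p>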
]]></content:encoded></item><item><title><![CDATA[A visual guide to AWS Lambda for beginners]]></title><description><![CDATA[Amazon Web Services deserve all the credit for beginning the age of serverless computing with the launch of AWS Lambda in 2014. Since its launch, Lambda functions have found innumerable uses in the field of File Processing, ETL, Data Analytics, deplo...]]></description><link>https://www.100daysofdata.com/aws-lambda-for-beginners</link><guid isPermaLink="true">https://www.100daysofdata.com/aws-lambda-for-beginners</guid><category><![CDATA[AWS]]></category><category><![CDATA[aws lambda]]></category><category><![CDATA[Python]]></category><category><![CDATA[Python 3]]></category><category><![CDATA[data analysis]]></category><dc:creator><![CDATA[Lenin Mishra]]></dc:creator><pubDate>Sun, 19 Jun 2022 15:10:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1655651140789/GbFgZubYO.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Amazon Web Services deserve all the credit for beginning the age of serverless computing with the launch of <strong>AWS Lambda</strong> in 2014. Since its launch, Lambda functions have found innumerable uses in the field of File Processing, ETL, Data Analytics, deployment of web and mobile applications, and much more. </p>
<p>In this blog, you will learn to create and invoke your first Lambda function. Along the way, you will learn about the basic configuration options and the importance of <strong>event</strong> and <strong>context</strong> in a Lambda function, and you will get familiar enough with the AWS Lambda dashboard that it no longer feels intimidating. All you need is an Amazon Web Services account.</p>
<h2 id="heading-benefits-of-using-aws-lambda">Benefits of using AWS Lambda</h2>
<ol>
<li>No servers to manage. </li>
<li>Better software development by decoupling architecture from code.</li>
<li>1 million free requests per month. </li>
<li>Testing feature that allows for code validation before putting it in production.</li>
</ol>
<h2 id="heading-your-first-lambda-function">Your first Lambda function</h2>
<p>Once you have logged in to AWS, this is what your AWS console should look like.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1623548801308/acID8lbJs.png" alt="aws-console.PNG" /></p>
<p>If you have used AWS Lambda before, it should appear in your <code>Recently visited services</code>. Otherwise, expand the <code>All services</code> dropdown and click on Lambda. This should take you to the AWS Lambda console.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1623549172085/r52hpU4Fc.png" alt="aws-console-lambda.PNG" /></p>
<p>Once you are in the AWS Lambda console, click on the <code>Create function</code> button.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1623549438335/D9_T3LEET.png" alt="lambda-console.PNG" /></p>
<p>On the <code>Create function</code> page, you have to fill in some basic information about your Lambda function:</p>
<ol>
<li><strong>Starting point for your function</strong> - For this article, we will create a function from scratch.</li>
<li><strong>Name of your function</strong> - Remember to use only letters, numbers, hyphens, or underscores with no spaces. Otherwise, you might see an error.</li>
<li><strong>Runtime</strong> - The programming language you want to use to write your Lambda function. For this article, we will use <strong>Python 3.8</strong>.</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1623550221640/hqA4oKct0.png" alt="my-first-lambda.PNG" /></p>
<p>You will also see a few other things like <strong>Permissions</strong>, <strong>Change default execution role</strong>, and <strong>Advanced settings</strong>. Don't bother yourself with these settings for now. We will learn about them in future articles and videos.</p>
<p>Now click on <code>Create function</code>. After a short wait, you should be redirected to the console page of your newly created Lambda function.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1623550581389/foTgAdQMZ.png" alt="my-firs-lambda-landing-page.PNG" /></p>
<p>Now scroll down to the code editor section.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1623550836669/DcRVlDJTe.png" alt="my-first-lambda-ce.PNG" /></p>
<p>Carefully notice the name of the <code>.py</code> file and the name of the Python function defined in it. </p>
<p>Filename - <code>lambda_function.py</code></p>
<p>Python Function name - <code>lambda_handler</code></p>
<p>This is the handler that AWS Lambda invokes every time the function is triggered. You can confirm this by scrolling further down to the <code>Runtime settings</code> section.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1623551419688/62lVgrygV.png" alt="runtime-settings.PNG" /></p>
<p>So if you were to change the name of the function or create a different file to hold your function, you need <strong>to update them in the Runtime settings</strong>.</p>
<p>As you can see, there is already some placeholder code in our <code>lambda_function.py</code> inside the <code>lambda_handler</code> function. Let's just try and run it as it is. Click on the <code>Test</code> button. </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1623551759179/ffF31iw3R.png" alt="test-lambda-1.PNG" /></p>
<p>You will be asked to <code>Configure test event</code>. A test event is a way of mimicking a real event that will act as an input to your Lambda function in the form of a <a target="_blank" href="https://youtu.be/7GFXm8HUD7s">JSON payload</a>. Test events are a great way to validate your Lambda function before putting it into production. You can choose any event you want to mimic from the <code>Event template</code> dropdown list. In this article, we will use the simple <strong>hello-world</strong> template.</p>
<p>Replace all the key-value pairs in the JSON payload with some meaningful data. I am using my information in the payload. Then click on <code>Create</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1623552355603/uwI9hmn1z.png" alt="test-event-configure-new.PNG" /></p>
<p>Once the test event is created, go ahead and press on <code>Test</code> again in the Lambda console. You should get a similar execution result.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1623552578036/GKEBwyGZz.png" alt="test-lambda-results.PNG" /></p>
<p><strong>Congratulations! You have deployed your first Lambda function.</strong> It may not look like much, but believe me, it is the first successful step into the world of Serverless!</p>
<h2 id="heading-significance-of-event-and-context-in-lambda-function">Significance of event and context in Lambda function</h2>
<p>Let us inspect the execution result closely.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1623554050553/S3OI_BLaY.png" alt="execution-result.PNG" /></p>
<p>You can see the <code>Test event name</code>, <code>Response</code>, <code>Function logs</code>, and <code>Request ID</code>. Each of these fields provides some information about your Lambda function. </p>
<p>Look at the <code>Response</code> field. It is the JSON payload that the Lambda function returns.</p>
<pre><code>Response
{
  "statusCode": <span class="hljs-number">200</span>,
  "body": "\"Hello <span class="hljs-keyword">from</span> Lambda!\""
}
</code></pre><p>But this is a very generic response. Let's go ahead and personalize it. If you remember, we changed the values of the JSON payload in our configured test event. Let's try to use that test event. <strong>How do you think we can access that event?</strong></p>
<p>The answer lies in the arguments passed into your <code>lambda_handler</code> function - <code>event</code> and <code>context</code>. Let us understand what they are.</p>
<p>When your Lambda function is invoked, the <code>event</code> and <code>context</code> arguments are passed into the function. The <code>event</code> object contains the JSON payload that is to be processed by the function. Hence, the payload we created in our test event is present in the <code>event</code> object. When passed into the function, the event is converted to a native data type - <strong>usually a dictionary</strong> (if you are using Python). However, it could also be a <code>list</code>, <code>str</code>, <code>int</code>, <code>float</code> or <code>None</code>.</p>
<p>The second argument is <code>context</code>. This object exposes methods and properties that describe the invocation, function, and runtime environment.</p>
<p>You can verify these two arguments by printing them inside the function.</p>
<pre><code><span class="hljs-keyword">import</span> json

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">lambda_handler</span>(<span class="hljs-params">event, context</span>):</span>
    <span class="hljs-comment"># Newly added print statements</span>
    print(event)
    print(dir(context))

    <span class="hljs-keyword">return</span> {
        <span class="hljs-string">'statusCode'</span>: <span class="hljs-number">200</span>,
        <span class="hljs-string">'body'</span>: json.dumps(<span class="hljs-string">'Hello from Lambda!'</span>)
    }
</code></pre><p>Since you have changed the contents of the file, you need to save and deploy it. Click on <code>Deploy</code> to apply the changes, then click on the <code>Test</code> button.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1623556290522/EZZZ-Eg3c.png" alt="deploy-code-changes.PNG" /></p>
<p>You should get a similar execution result.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1623556392301/4bSlV8UM6.png" alt="deploy-code-changes-result.PNG" /></p>
<p>As you can see, the <code>event</code> object contains the key-value pairs that we defined in our test event. For the <code>context</code> object, we used the <code>dir()</code> function to list all of its properties and methods.</p>
<p>Let us now use the <code>event</code> object to personalize the result of our Lambda function. Change the code inside <code>lambda_function.py</code> to the code below.</p>
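<p>Rather than listing everything with <code>dir()</code>, you can also read individual properties of the <code>context</code> object. The sketch below reads a few of them. To keep it runnable outside AWS, it uses a stand-in object called <code>fake_context</code> with made-up values, which is only an assumption for local testing. Inside Lambda, the real <code>context</code> object is supplied for you.</p>
<pre><code>from types import SimpleNamespace


def lambda_handler(event, context):
    # Collect a few details about the current invocation.
    return {
        "function_name": context.function_name,
        "request_id": context.aws_request_id,
        "memory_limit_mb": context.memory_limit_in_mb,
        "time_left_ms": context.get_remaining_time_in_millis(),
    }


# Stand-in for the real context object; every value here is invented.
fake_context = SimpleNamespace(
    function_name="my-first-lambda",
    aws_request_id="local-test",
    memory_limit_in_mb=128,
    get_remaining_time_in_millis=lambda: 3000,
)

info = lambda_handler({}, fake_context)
print(info)
</code></pre>
<p>The remaining time is handy in longer-running functions, where you may want to stop work cleanly before the timeout is reached.</p>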
<pre><code><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">lambda_handler</span>(<span class="hljs-params">event, context</span>):</span>
    name = event.get(<span class="hljs-string">"name"</span>, <span class="hljs-literal">None</span>)
    twitter_handle = event.get(<span class="hljs-string">"twitter"</span>, <span class="hljs-literal">None</span>)

    <span class="hljs-keyword">return</span> <span class="hljs-string">f"Welcome to <span class="hljs-subst">{name}</span>. "</span>\
           <span class="hljs-string">f"Join us on Twitter - <span class="hljs-subst">{twitter_handle}</span>"</span>
</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1623556954360/sWJtI3xGb.png" alt="new-code.PNG" /></p>
<p>Click on <code>Deploy</code> to deploy the changes and then click on <code>Test</code>. You should see a similar execution result.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1623557112222/nG9qde2m0.png" alt="new-code-result.PNG" /></p>
<p>The <code>Response</code> field now contains the newly returned statement, built from the data passed in the test event.</p>
<p>I hope you enjoyed this visual guide and that it helped you take your first step towards understanding AWS Lambda. In future articles, we will gradually explore other features of AWS Lambda and build some real-world applications to show you its advantages.</p>
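<p>One thing to keep in mind is that a real event may not contain every key you expect. With <code>event.get("name", None)</code>, a missing key would produce the text "Welcome to None". The sketch below is one possible way to handle that. The fallback name and the sample events are assumptions chosen only for illustration.</p>
<pre><code>def lambda_handler(event, context):
    # Fall back to a default name when the key is missing (the default is made up).
    name = event.get("name", "100 days of Data")
    twitter_handle = event.get("twitter")

    message = f"Welcome to {name}."
    # Only mention Twitter when a handle was actually provided.
    if twitter_handle:
        message += f" Join us on Twitter - {twitter_handle}"
    return message


# Local dry runs with hypothetical events; context is not used, so None is fine.
full = lambda_handler({"name": "Pylenin", "twitter": "@pylenin"}, None)
partial = lambda_handler({}, None)
print(full)
print(partial)
</code></pre>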
<blockquote>
<p>Our objective is to present important Data Engineering and Machine Learning concepts in a lucid format. For this purpose, your feedback is important! 
Do share what you like and dislike about the articles in the comment section.</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Connecting Python to Google Sheets]]></title><description><![CDATA[Google Sheets
Google Sheets is a free web-based spreadsheet application provided by Google that allows users to create, manage and format spreadsheets online. It also allows users to collaborate with other users.

Why connect Python to Google Sheets?...]]></description><link>https://www.100daysofdata.com/connect-python-to-google-sheets</link><guid isPermaLink="true">https://www.100daysofdata.com/connect-python-to-google-sheets</guid><category><![CDATA[Python]]></category><category><![CDATA[Python 3]]></category><category><![CDATA[Programming Blogs]]></category><category><![CDATA[python projects]]></category><category><![CDATA[Data Science]]></category><dc:creator><![CDATA[Lenin Mishra]]></dc:creator><pubDate>Thu, 02 Jun 2022 14:27:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1654180888871/Fr3m6Tofp.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-google-sheets">Google Sheets</h2>
<p><strong>Google Sheets is a free web-based spreadsheet application provided by Google that allows users to create, manage and format spreadsheets online. It also allows users to collaborate with other users.</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654020199894/SjYHryrXn.png" alt="Google Sheets" /></p>
<h2 id="heading-why-connect-python-to-google-sheets">Why connect Python to Google Sheets?</h2>
<p><strong>As the volume of data increases, Python proves much more powerful and practical for performing data analysis and machine learning on your Google Sheets data to provide actionable intelligence. It does that with the help of powerful libraries and a huge online support community.</strong></p>
<p>It is easy to set up and perform data analysis in Google Sheets. However, as the volume of data increases, you want more computational flexibility and power. </p>
<p>Today, Python is the go-to programming language for data analysis and machine learning, thanks to third-party libraries like pandas, NumPy, scikit-learn and Matplotlib. The presence of an active online community also helps. </p>
<p>It is easier to draw on the power of those libraries and build actionable business intelligence around your data in Python.  </p>
<h2 id="heading-how-to-connect-python-to-google-sheets">How to connect Python to Google Sheets?</h2>
<p><strong>You can connect Python to Google Sheets by creating a service account in Google Cloud Console which allows you to make authorized API calls to the Google Sheets API.</strong> </p>
<p>Follow the steps below.</p>
<h3 id="heading-create-a-new-project-in-google-cloud-console">Create a new project in Google Cloud Console</h3>
<ol>
<li>Log in to <a target="_blank" href="https://console.cloud.google.com">Google Cloud Console</a> in your browser.</li>
<li><p>Click on the <strong>Menu</strong> &gt; <strong>IAM &amp; Admin</strong> &gt; <strong>Create a Project</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654132038852/-r7BdL5f8.png" alt="Google Cloud Console" /></p>
</li>
<li><p>Provide a <strong>Project name</strong> and click <strong>Create</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654132070538/aVGeur5E3.png" alt="New project in Google Cloud Console" /></p>
</li>
</ol>
<h3 id="heading-enable-google-drive-api">Enable Google Drive API</h3>
<ol>
<li><p>Click on <strong>Menu</strong> &gt; <strong>APIs &amp; Services</strong> &gt; <strong>Enabled APIs &amp; Services</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654133489663/AZoBlI20b.png" alt="Google Cloud APIs &amp; Services" /></p>
</li>
<li><p>Click on the <strong>+ ENABLE APIS AND SERVICES</strong> button in the top middle of the page.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654133644955/em21T7LKC.png" alt="Enable an API in Google Cloud" /></p>
</li>
<li><p>Search for the <strong>Google Drive API</strong> and click on it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654134150935/IP1jPpEfT.png" alt="Google Drive API" /></p>
</li>
<li><p>Enable the Google Drive API.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654134251867/_vix8a4hM.png" alt="Enable the Google Drive API" /></p>
</li>
<li>Search for the <strong>Google Sheets API</strong> and enable it.</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654172949852/6lPE1m_Sr.png" alt="Screenshot from 2022-06-02 17-58-04.png" /></p>
<p>Once you enable the <strong>Google Drive and Google Sheets APIs</strong>, you will be redirected to the API's page. To start using these APIs, <strong>you have to create credentials</strong>.</p>
<h3 id="heading-create-a-service-account">Create a Service account</h3>
<ol>
<li><p>Click on <strong>Create Credentials</strong> on the Google Drive API page.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654134970330/AEQgSP52z.png" alt="Create credentials for Google Drive API" /></p>
</li>
<li><p>On the <strong>Create Credentials</strong> page, fill in the necessary details and click on <strong>Done</strong>. You will be directed to the <strong>Service accounts</strong> page.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654135437740/cHVnLLx07.png" alt="Credentials Page Google Cloud Console" /></p>
</li>
<li><p>Provide a Service account name and description. Click on <strong>CREATE AND CONTINUE</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654136364943/EvkMeZUrv.png" alt="Service Account page in Google Cloud" /></p>
</li>
<li><p>The email address of the service account will be displayed on the screen. Copy that email address for later use.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654137437905/v83-pEyqF.png" alt="Service account email for Google Drive API" /></p>
</li>
<li><p>On the same page, click on the <strong>Keys</strong> tab. Choose <strong>Add Key</strong> &gt; <strong>Create a new key</strong>, then select JSON to create a private key.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654138461239/_LHLYmrf6.png" alt="Private key in JSON for Google Drive access" /></p>
<p>Once the private key is downloaded, rename it to <code>credentials.json</code>. You will use it later to perform OAuth2 authentication with the Google APIs.</p>
<h3 id="heading-share-the-google-sheet-with-client-email">Share the Google Sheet with client email</h3>
<p>Create a Google Sheet to experiment with. Here I have created a <strong>To do list</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654175889651/qCu82kWnf.png" alt="My to do list in Google Sheets" /></p>
<p>Click on the <strong>Share</strong> button and share the spreadsheet with the <strong>client email</strong> you saved in Step 4 of the <a class="post-section-overview" href="#create-a-service-account">previous section</a>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654177081665/86RbAmhSo.png" alt="lambda(11).png" /></p>
<p>You can now use the third-party <code>gspread</code> library to interact with the sheet.</p>
<h2 id="heading-reading-and-writing-to-google-sheet-with-python">Reading and writing to Google Sheet with Python</h2>
<ol>
<li><p>Install the necessary libraries.</p>
<pre><code class="lang-python3">pip install gspread
pip install --upgrade google-api-python-client oauth2client
</code></pre>
</li>
<li><p>Create a new Python file. Import the following libraries into the file.</p>
<pre><code class="lang-python3">import gspread
import pandas as pd
from oauth2client.service_account import ServiceAccountCredentials
</code></pre>
</li>
<li><p>Copy the <code>credentials.json</code> into the same directory as your file and perform authorization.</p>
<pre><code class="lang-python3"># defining the scope of the application
scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive'] 

# load the service account credentials
cred = ServiceAccountCredentials.from_json_keyfile_name('credentials.json', scope)

# authorize the gspread client
client = gspread.authorize(cred)
</code></pre>
</li>
<li><p>Read the data from the spreadsheet using the gspread <code>get_all_values()</code> method.</p>
<pre><code class="lang-python3"># Provide the Google Sheet Id
gs1 = client.open_by_key('18ZG9iEJN4c2SRdhWxo3o05ch6_TF4r2-7joAGtOeKG0')
ws1 = gs1.sheet1
print(ws1.get_all_values())
</code></pre>
<p>You should get all the rows from the sheet.</p>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">[[<span class="hljs-string">'\n To Do'</span>, <span class="hljs-string">''</span>, <span class="hljs-string">'0/3 completed  '</span>], [<span class="hljs-string">''</span>, <span class="hljs-string">''</span>, <span class="hljs-string">''</span>], [<span class="hljs-string">'✓'</span>, <span class="hljs-string">'Date'</span>, <span class="hljs-string">'Task'</span>], [<span class="hljs-string">'FALSE'</span>, <span class="hljs-string">'2/6'</span>, <span class="hljs-string">'Finish blog on connecting Python to Google Sheets'</span>], [<span class="hljs-string">'FALSE'</span>, <span class="hljs-string">'2/7'</span>, <span class="hljs-string">'Go book shopping'</span>], [<span class="hljs-string">'FALSE'</span>, <span class="hljs-string">'2/8'</span>, <span class="hljs-string">'Conduct meeting with Jesus'</span>]]
</code></pre>
</li>
<li><p>You can also write to the Google Sheet. For example, the code below creates a new worksheet.</p>
<pre><code class="lang-python3"># create a new worksheet
new_ws = gs1.add_worksheet(title="Trial Worksheet", rows=10, cols=20)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1654179667856/W3fiACVJl.png" alt="Creating new sheet with gspread in Python" /></p>
</li>
</ol>
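<p>The list of lists returned by <code>get_all_values()</code> is easier to work with once each row is paired with its column headers. Here is a minimal sketch in plain Python. The helper name <code>rows_to_records</code> is hypothetical, and the sketch assumes the header row sits at index 2, as it does in the output shown above. The <code>rows</code> variable is a copy of that output, so no Google credentials are needed to try it.</p>
<pre><code class="lang-python3">def rows_to_records(rows, header_index):
    """Pair every row below the header row with the header names."""
    header = rows[header_index]
    records = []
    for row in rows[header_index + 1:]:
        records.append(dict(zip(header, row)))
    return records


# Copy of the output shown above (what ws1.get_all_values() returned)
rows = [
    ['\n To Do', '', '0/3 completed  '],
    ['', '', ''],
    ['✓', 'Date', 'Task'],
    ['FALSE', '2/6', 'Finish blog on connecting Python to Google Sheets'],
    ['FALSE', '2/7', 'Go book shopping'],
    ['FALSE', '2/8', 'Conduct meeting with Jesus'],
]

tasks = rows_to_records(rows, header_index=2)

# Tasks whose checkbox column is still unticked
pending = [task['Task'] for task in tasks if task['✓'] == 'FALSE']
print(pending)
</code></pre>
<p>Each entry in <code>tasks</code> is a dictionary keyed by column name, which is also a convenient shape for loading into a pandas DataFrame later.</p>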
<p>I hope this article provides a clear <strong>step-by-step guide</strong> to connect to Google Sheets from Python.
In the next article, we will leverage the power of Python and perform some data analysis on the spreadsheet data.</p>
]]></content:encoded></item><item><title><![CDATA[Working with JSON data in Python]]></title><description><![CDATA[What is JSON?
JSON stands for Javascript Object Notation. It is a format for structuring data.
It is one of the most popular formats to be used for exchanging information between servers and browsers.
Below is an example of JSON.
{
  "name": "Lenin",...]]></description><link>https://www.100daysofdata.com/python-json</link><guid isPermaLink="true">https://www.100daysofdata.com/python-json</guid><category><![CDATA[Python]]></category><category><![CDATA[Python 3]]></category><category><![CDATA[python beginner]]></category><category><![CDATA[Programming Blogs]]></category><category><![CDATA[Data Science]]></category><dc:creator><![CDATA[Lenin Mishra]]></dc:creator><pubDate>Sat, 28 May 2022 16:27:03 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1653721870223/ACw_XWJiu.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-what-is-json">What is JSON?</h2>
<p><strong>JSON stands for JavaScript Object Notation.</strong> It is a format for structuring data.</p>
<p>It is one of the most popular formats for exchanging information between servers and browsers.</p>
<p>Below is an example of JSON.</p>
<pre><code class="lang-bash">{
  <span class="hljs-string">"name"</span>: <span class="hljs-string">"Lenin"</span>,
  <span class="hljs-string">"age"</span>: 30,
  <span class="hljs-string">"twitter"</span>: <span class="hljs-string">"@pylenin"</span>,
  <span class="hljs-string">"website"</span>: <span class="hljs-string">"www.100daysofdata.com"</span>
}
</code></pre>
<p>It looks similar to <a target="_blank" href="https://www.pylenin.com/blogs/python-dictionary/">Python dictionaries</a>. Data is represented as <strong>key-value pairs</strong>, where the key and value are separated by a colon <code>:</code>.</p>
<p>However, there is a fundamental difference between JSON and a dictionary. <strong>A dictionary is a data type, whereas JSON is a data format.</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1653717803878/oPzJ0DZIQ.png" alt="JSON.png" class="image--center mx-auto" /></p>
<p>If you want to send the dictionary data over a network connection as an HTTP request (see image above), it needs to be converted into a series of bytes. This is called <strong>Serialization.</strong> It saves the state of the data so that it can be recreated when needed.</p>
<p>Conversely, converting the series of bytes you receive as a response from the server back into a usable data type is called <strong>Deserialization</strong>.</p>
<p><strong>JSON is a set of rules used to convert such data types into series of bytes and vice-versa.</strong></p>
<p>Python has a built-in module called <code>json</code> that helps you work with JSON data.</p>
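<p>To make the round trip concrete, here is a small sketch that serializes a dictionary to bytes and then restores it. It uses <code>json.dumps()</code> and <code>json.loads()</code> from the standard library. The UTF-8 encoding step stands in for what an HTTP client would normally do for you, and the variable names are only illustrative.</p>
<pre><code class="lang-python3">import json

payload = {"name": "Lenin", "age": 30, "twitter": "@pylenin"}

# Serialization: dict to JSON text, then to bytes that can travel over a network
json_text = json.dumps(payload)
wire_bytes = json_text.encode("utf-8")

# Deserialization: bytes back to JSON text, then back to a dict
restored = json.loads(wire_bytes.decode("utf-8"))

print(type(wire_bytes).__name__)
print(restored == payload)
</code></pre>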
<h2 id="heading-what-is-json-serialization">What is JSON serialization?</h2>
<p>As explained above, <strong>Serialization is the process of encoding native data types to JSON format.</strong></p>
<p>In Python, each data type maps to a specific JSON type when serialized.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Python object</td><td>JSON equivalent</td></tr>
</thead>
<tbody>
<tr>
<td>dict</td><td>object</td></tr>
<tr>
<td>list, tuple</td><td>array</td></tr>
<tr>
<td>str</td><td>string</td></tr>
<tr>
<td>int, float</td><td>number</td></tr>
<tr>
<td>True</td><td>true</td></tr>
<tr>
<td>False</td><td>false</td></tr>
<tr>
<td>None</td><td>null</td></tr>
</tbody>
</table>
</div><p>The <code>json</code> module in Python has two methods for serializing Python objects into JSON format.</p>
<ol>
<li><p><code>json.dump()</code> - writes Python data type to a file-like object in JSON format. </p>
</li>
<li><p><code>json.dumps()</code> - writes Python data to a string in JSON format.</p>
</li>
</ol>
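<p>The conversion table and the two functions can be checked with a short experiment. The sketch below serializes one value of each Python type with <code>json.dumps()</code>, then writes the same data with <code>json.dump()</code> into an in-memory <code>io.StringIO</code> buffer, which stands in for a real file here. The key names are made up for illustration.</p>
<pre><code class="lang-python3">import io
import json

sample = {
    "a_dict": {"k": "v"},
    "a_list": [1, 2],
    "a_tuple": (3, 4),
    "a_str": "text",
    "an_int": 7,
    "a_float": 1.5,
    "yes": True,
    "no": False,
    "nothing": None,
}

# json.dumps() returns the JSON as a string
as_string = json.dumps(sample)
print(as_string)

# json.dump() writes the same JSON into a file-like object
buffer = io.StringIO()
json.dump(sample, buffer)
same_text = buffer.getvalue() == as_string
print(same_text)
</code></pre>
<p>Note how the tuple comes out as a JSON array, and how <code>True</code>, <code>False</code> and <code>None</code> become <code>true</code>, <code>false</code> and <code>null</code>.</p>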
<h3 id="heading-writing-json-to-a-file-with-jsondump">Writing JSON to a file with json.dump()</h3>
<p>To write JSON to a file, you can use the <code>json.dump()</code> function. It takes two required arguments - the data you want to serialize and the file object you are writing into (opened in write mode), not just the file name.</p>
<p><strong>Example 1</strong></p>
<pre><code class="lang-python3">import json

data = {
  "name": "Lenin Mishra",
  "age": 30,
  "hobby": ["Biking", "Blogging", "Cooking"],
  "websites": [
    {
    "url": "https://www.pylenin.com",
    "Total blogs": "88",
    "description": "Everything about Python"
    },
    {
    "url": "https://www.100daysofdata.com",
    "Total blogs": "3",
    "description": "Everything about Data"          
    }]
}

with open('details.json', 'w') as file:
  json.dump(data, file)
</code></pre>
<p>The above code will transform the dictionary object into a JSON string and write it to a file named <code>details.json</code>.</p>
<p>If <code>details.json</code> doesn't exist, the above code will create a new file with the same name. To learn more about file operations in Python, check out <a target="_blank" href="https://www.100daysofdata.com/python-file-io">this article</a>.</p>
<h3 id="heading-writing-python-object-to-a-json-string-with-jsondumps">Writing Python object to a JSON string with json.dumps()</h3>
<p>To convert the same dictionary to a string representation of JSON, you can use the <code>json.dumps()</code> function. Since you are writing to a string in memory, you only have to pass in the Python object as an argument.</p>
<p><strong>Example 2</strong></p>
<pre><code class="lang-python3">import json

data = {
  "name": "Lenin Mishra",
  "age": 30,
  "hobby": ["Biking", "Blogging", "Cooking"],
  "websites": [
    {
    "url": "https://www.pylenin.com",
    "Total blogs": "88",
    "description": "Everything about Python"
    },
    {
    "url": "https://www.100daysofdata.com",
    "Total blogs": "3",
    "description": "Everything about Data"          
    }]
}

data_string_json = json.dumps(data)
print(type(data_string_json))
print(data_string_json)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">&lt;class <span class="hljs-string">'str'</span>&gt;
{<span class="hljs-string">"name"</span>: <span class="hljs-string">"Lenin Mishra"</span>, <span class="hljs-string">"age"</span>: 30, <span class="hljs-string">"hobby"</span>: [<span class="hljs-string">"Biking"</span>, <span class="hljs-string">"Blogging"</span>, <span class="hljs-string">"Cooking"</span>], <span class="hljs-string">"websites"</span>: [{<span class="hljs-string">"url"</span>: <span class="hljs-string">"https://www.pylenin.com"</span>, <span class="hljs-string">"Total blogs"</span>: <span class="hljs-string">"88"</span>, <span class="hljs-string">"description"</span>: <span class="hljs-string">"Everything about Python"</span>}, {<span class="hljs-string">"url"</span>: <span class="hljs-string">"https://www.100daysofdata.com"</span>, <span class="hljs-string">"Total blogs"</span>: <span class="hljs-string">"3"</span>, <span class="hljs-string">"description"</span>: <span class="hljs-string">"Everything about Data"</span>}]}
</code></pre>
<h3 id="heading-how-to-pretty-print-json-in-python">How to pretty print JSON in Python?</h3>
<p>When you printed out the JSON string in the above example, the output must have looked messy. There are a few arguments you can use to make the JSON look prettier!</p>
<h4 id="heading-indent">indent</h4>
<p>The <code>indent</code> argument makes the output more readable, whether you are producing a JSON string or writing JSON to a file.</p>
<p><strong>Example 3</strong></p>
<pre><code class="lang-python3">import json

data = {
  "name": "Lenin Mishra",
  "age": 30,
  "hobby": ["Biking", "Blogging", "Cooking"],
  "websites": [
    {
    "url": "https://www.pylenin.com",
    "Total blogs": "88",
    "description": "Everything about Python"
    },
    {
    "url": "https://www.100daysofdata.com",
    "Total blogs": "3",
    "description": "Everything about Data"          
    }]
}

data_string_json = json.dumps(data, indent=4)
print(data_string_json)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">{
    <span class="hljs-string">"name"</span>: <span class="hljs-string">"Lenin Mishra"</span>,
    <span class="hljs-string">"age"</span>: 30,
    <span class="hljs-string">"hobby"</span>: [
        <span class="hljs-string">"Biking"</span>,
        <span class="hljs-string">"Blogging"</span>,
        <span class="hljs-string">"Cooking"</span>
    ],
    <span class="hljs-string">"websites"</span>: [
        {
            <span class="hljs-string">"url"</span>: <span class="hljs-string">"https://www.pylenin.com"</span>,
            <span class="hljs-string">"Total blogs"</span>: <span class="hljs-string">"88"</span>,
            <span class="hljs-string">"description"</span>: <span class="hljs-string">"Everything about Python"</span>
        },
        {
            <span class="hljs-string">"url"</span>: <span class="hljs-string">"https://www.100daysofdata.com"</span>,
            <span class="hljs-string">"Total blogs"</span>: <span class="hljs-string">"3"</span>,
            <span class="hljs-string">"description"</span>: <span class="hljs-string">"Everything about Data"</span>
        }
    ]
}
</code></pre>
<p>You can pass in different values for the <code>indent</code> argument.</p>
<ol>
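<li>If indent is a non-negative integer or string, then JSON array elements and object members will be pretty-printed with that indent level.</li>
<li>If indent is <code>0</code>, negative, or an empty string, only newlines are inserted. <code>None</code> (the default) selects the most compact representation.</li>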
</ol>
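<p>A minimal sketch of how different <code>indent</code> values change the output. The <code>data</code> dictionary and the variable names are made up for illustration.</p>
<pre><code class="lang-python3">import json

data = {"name": "Lenin", "hobby": ["Biking", "Blogging"]}

# None (the default) gives the most compact, single-line output
compact = json.dumps(data)

# A positive integer indents each level by that many spaces
two_spaces = json.dumps(data, indent=2)

# A string is used as the indent for each level
tabbed = json.dumps(data, indent="\t")

# 0 inserts newlines but no leading whitespace
newlines_only = json.dumps(data, indent=0)

print(compact)
print(two_spaces)
print(tabbed)
print(newlines_only)
</code></pre>
<p>Whatever value you choose, the result is still the same JSON document. Only the whitespace differs.</p>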
<h4 id="heading-sortkeys">sort_keys</h4>
<p>If set to <code>True</code>, the <code>sort_keys</code> argument sorts the output JSON according to its keys.</p>
<p><strong>Example 4</strong></p>
<pre><code class="lang-python3">import json

data = {
  "name": "Lenin Mishra",
  "2022":"hello",
  "age": 30,
  "hobby": ["Biking", "Blogging", "Cooking"],
  "websites": [
    {
    "url": "https://www.pylenin.com",
    "Total blogs": "88",
    "description": "Everything about Python"
    },
    {
    "url": "https://www.100daysofdata.com",
    "Total blogs": "3",
    "description": "Everything about Data"          
    }]
}

data_string_json = json.dumps(data, indent=4, sort_keys=True)
print(data_string_json)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">{
    <span class="hljs-string">"2022"</span>: <span class="hljs-string">"hello"</span>,
    <span class="hljs-string">"age"</span>: 30,
    <span class="hljs-string">"hobby"</span>: [
        <span class="hljs-string">"Biking"</span>,
        <span class="hljs-string">"Blogging"</span>,
        <span class="hljs-string">"Cooking"</span>
    ],
    <span class="hljs-string">"name"</span>: <span class="hljs-string">"Lenin Mishra"</span>,
    <span class="hljs-string">"websites"</span>: [
        {
            <span class="hljs-string">"Total blogs"</span>: <span class="hljs-string">"88"</span>,
            <span class="hljs-string">"description"</span>: <span class="hljs-string">"Everything about Python"</span>,
            <span class="hljs-string">"url"</span>: <span class="hljs-string">"https://www.pylenin.com"</span>
        },
        {
            <span class="hljs-string">"Total blogs"</span>: <span class="hljs-string">"3"</span>,
            <span class="hljs-string">"description"</span>: <span class="hljs-string">"Everything about Data"</span>,
            <span class="hljs-string">"url"</span>: <span class="hljs-string">"https://www.100daysofdata.com"</span>
        }
    ]
}
</code></pre>
<p>As you can see, the keys have been sorted alphabetically at every level, including inside the nested <code>websites</code> objects.</p>
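<p>The ordering is a plain string comparison, so keys starting with digits come before uppercase letters, and uppercase letters come before lowercase ones. That is why <code>"Total blogs"</code> appears before <code>"description"</code> above. A small sketch, using a made-up <code>record</code> dictionary:</p>
<pre><code class="lang-python3">import json

# Hypothetical record with a digit, an uppercase and a lowercase key
record = {"name": "Lenin", "Age": 30, "2022": "hello"}

sorted_json = json.dumps(record, sort_keys=True)
print(sorted_json)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">{"2022": "hello", "Age": 30, "name": "Lenin"}
</code></pre>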
<h3 id="heading-difference-between-jsondump-and-jsondumps">Difference between json.dump() and json.dumps()</h3>
<p>If you want to dump the JSON into a file, then you should use <code>json.dump()</code>. If you only need it as a string, then use <code>json.dumps()</code>.</p>
<p><strong>Tip to remember</strong> - If the method ends with an <code>s</code>, it converts to string.</p>
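<p>The sketch below puts the two side by side. It assumes a temporary directory for the output file, so the file path is only illustrative.</p>
<pre><code class="lang-python3">import json
import os
import tempfile

data = {"name": "Lenin", "age": 30}

# json.dumps() returns the JSON document as a string
json_string = json.dumps(data)

# json.dump() writes the same document to an open file object
file_path = os.path.join(tempfile.mkdtemp(), "details.json")
with open(file_path, "w") as f:
    json.dump(data, f)

# Read the file back to compare it with the string
with open(file_path) as f:
    file_contents = f.read()

print(json_string == file_contents)
</code></pre>
<p>Both calls produce the same text. The only difference is where it ends up.</p>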
<h2 id="heading-what-is-json-deserialization">What is JSON deserialization?</h2>
<p>JSON deserialization is the process of decoding JSON data into a native data type in Python. Unless the data is something very simple, the deserialization methods will most likely return a Python dictionary or list containing the deserialized data.</p>
<p>The <code>json</code> module has two methods for deserializing JSON.</p>
<ol>
<li><code>json.load()</code> - loads JSON data from a file-like object.</li>
<li><code>json.loads()</code> - loads JSON data from a string containing JSON-encoded data. </li>
</ol>
<h3 id="heading-parsing-a-json-string-in-python-using-jsonloads">Parsing a JSON string in Python using json.loads()</h3>
<p>To parse a JSON string and convert it to a Python dictionary, use the <code>json.loads()</code> method.</p>
<p><strong>Example 1</strong></p>
<pre><code class="lang-python3">import json

data = '{"name": "Lenin", "website": "100daysofdata.com", "age":30}'

json_dict = json.loads(data)
print(type(json_dict))
print(json_dict)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">&lt;class <span class="hljs-string">'dict'</span>&gt;
{<span class="hljs-string">'name'</span>: <span class="hljs-string">'Lenin'</span>, <span class="hljs-string">'website'</span>: <span class="hljs-string">'100daysofdata.com'</span>, <span class="hljs-string">'age'</span>: 30}
</code></pre>
<p>If the data being deserialized is not a valid JSON document, a <code>JSONDecodeError</code> will be raised.</p>
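<p>A minimal sketch of handling that error. The <code>bad_data</code> string is a made-up example of invalid JSON, because JSON requires double quotes around keys and strings.</p>
<pre><code class="lang-python3">import json

# Invalid JSON: keys and strings must use double quotes
bad_data = "{'name': 'Lenin'}"

try:
    parsed = json.loads(bad_data)
except json.JSONDecodeError as error:
    # The exception carries a message and the position of the problem
    parsed = None
    print(f"Could not parse JSON: {error.msg} (line {error.lineno}, column {error.colno})")
</code></pre>
<p>Catching <code>json.JSONDecodeError</code> lets your program report the bad input instead of crashing.</p>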
<p><br /></p>
<h3 id="heading-parsing-a-json-file-in-python-using-jsonload">Parsing a JSON file in Python using json.load()</h3>
<p>You can use the <code>json.load()</code> method to read a file containing a JSON object.</p>
<p><strong>Example 2</strong></p>
<pre><code class="lang-python3">import json

file_name = 'details.json'

with open(file_name, 'r') as f:
  data = json.load(f)

print(type(data))
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">&lt;class <span class="hljs-string">'dict'</span>&gt;
</code></pre>
<h3 id="heading-difference-between-jsonload-and-jsonloads">Difference between json.load() and json.loads()</h3>
<p>As <a class="post-section-overview" href="#heading-difference-between-jsondump-and-jsondumps">mentioned earlier</a>, <code>s</code> stands for <strong>string</strong>. </p>
<p><code>json.load()</code> converts JSON read from a file object into a Python object (usually a dictionary), whereas <code>json.loads()</code> does the same for a JSON string.</p>
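<p>A short sketch of the two calls on the same content. Here <code>io.StringIO</code> stands in for a real file, which is an assumption made only to keep the snippet self-contained.</p>
<pre><code class="lang-python3">import io
import json

json_text = '{"name": "Lenin", "age": 30}'

# json.loads() parses a string directly
from_string = json.loads(json_text)

# json.load() reads from any file-like object
from_file = json.load(io.StringIO(json_text))

print(from_string == from_file)

# The result is not always a dictionary: a JSON array becomes a list
numbers = json.loads("[1, 2, 3]")
print(numbers)
</code></pre>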
<p></p><hr /><p></p>
<h3 id="heading-how-to-unpack-json-data-in-python">How to unpack JSON data in Python?</h3>
<p>Deserialization helps to decode JSON values in Python. Once JSON is deserialized and converted to a dictionary, you can easily go through the keys and values of the dictionary using <code>dict.items()</code> and extract the necessary data.</p>
<p><strong>Example 3</strong></p>
<pre><code class="lang-python3">import json

file_name = 'details.json'

with open(file_name, 'r') as f:
  data = json.load(f)

for key, value in data.items():
  print(key, value)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">name Lenin Mishra
age 30
hobby [<span class="hljs-string">'Biking'</span>, <span class="hljs-string">'Blogging'</span>, <span class="hljs-string">'Cooking'</span>]
websites [{<span class="hljs-string">'url'</span>: <span class="hljs-string">'https://www.pylenin.com'</span>, <span class="hljs-string">'Total blogs'</span>: <span class="hljs-string">'88'</span>, <span class="hljs-string">'description'</span>: <span class="hljs-string">'Everything about Python'</span>}, {<span class="hljs-string">'url'</span>: <span class="hljs-string">'https://www.100daysofdata.com'</span>, <span class="hljs-string">'Total blogs'</span>: <span class="hljs-string">'3'</span>, <span class="hljs-string">'description'</span>: <span class="hljs-string">'Everything about Data'</span>}]
</code></pre>
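<p>The loop above only walks the top-level keys. Nested values can be reached by indexing step by step. The sketch below inlines a shortened copy of the same structure instead of reading <code>details.json</code>, so it runs on its own.</p>
<pre><code class="lang-python3">import json

# Shortened copy of the structure used above, inlined for the example
raw = '''
{
  "name": "Lenin Mishra",
  "websites": [
    {"url": "https://www.pylenin.com", "Total blogs": "88"},
    {"url": "https://www.100daysofdata.com", "Total blogs": "3"}
  ]
}
'''

data = json.loads(raw)

# Index into the nested list, then into each dictionary
urls = [site["url"] for site in data["websites"]]

# "Total blogs" is stored as a string, so convert it before adding
total_blogs = sum(int(site["Total blogs"]) for site in data["websites"])

print(urls)
print(total_blogs)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">['https://www.pylenin.com', 'https://www.100daysofdata.com']
91
</code></pre>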
]]></content:encoded></item><item><title><![CDATA[Data analysis from a CSV file in Python]]></title><description><![CDATA[What is a CSV file?
CSV stands for comma separated value. 
You might have come across this file format while downloading data from an excel spreadsheet or a database. CSV files are convenient for storing tabular data.
It should be clear from the name...]]></description><link>https://www.100daysofdata.com/python-csv-data-analysis</link><guid isPermaLink="true">https://www.100daysofdata.com/python-csv-data-analysis</guid><category><![CDATA[Python]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[python projects]]></category><category><![CDATA[Python 3]]></category><category><![CDATA[data analysis]]></category><dc:creator><![CDATA[Lenin Mishra]]></dc:creator><pubDate>Wed, 25 May 2022 05:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1653414132916/ZJkgb9SX2.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-what-is-a-csv-file">What is a CSV file?</h2>
<p><strong>CSV stands for comma-separated values.</strong></p>
<p>You might have come across this file format while downloading data from an Excel spreadsheet or a database. CSV files are convenient for storing tabular data.</p>
<p>It should be clear from the name that values in a CSV file are separated by a comma (by default).</p>
<p>Below is an example of CSV file containing information about a family.</p>
<p><strong>my_family.csv</strong></p>
<pre><code class="lang-csv">name,age,height(cm),weight(kg)
Lenin,30,188,90
Phil,42,178,76
Claire,40,165,54
Alex,18,140,46
</code></pre>
<p>Usually the first line in a CSV file is the <strong>Header</strong>, which identifies the column names. Every row after the header is a data record.</p>
<p>From the above example, you can see that each value (whether part of the header or a data record) is separated by a comma. <strong>This separator character is called a Delimiter.</strong> A CSV file may use delimiters other than a comma.</p>
<p>Examples of other delimiters:</p>
<ol>
<li>tab <code>\t</code></li>
<li>colon <code>:</code></li>
<li>semicolon <code>;</code></li>
<li>pipe <code>|</code></li>
</ol>
<p><strong>In this article, you will learn to work with CSV files using the <code>csv</code> module and the <code>pandas</code> library.</strong></p>
<p></p><hr /><p></p>
<h2 id="heading-how-to-read-csv-files-using-the-csv-module">How to read CSV files using the csv module?</h2>
<p>Reading from a CSV file is done with the <code>csv.reader</code> object. You can open the CSV file as a text file with Python’s <a target="_blank" href="https://www.100daysofdata.com/python-file-io#heading-how-to-open-and-read-a-file-in-python">built-in open() function</a>.</p>
<p><strong>Example 1</strong></p>
<pre><code class="lang-python3">import csv

with open('my_family.csv', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Header row - {", ".join(row)}')
            line_count += 1
        else:
            print(f'{row[0]} is {row[1]} years old, {row[2]} cm tall and {row[3]} kg heavy')
            line_count += 1
    print(f'Total: {line_count} lines')
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">Header row - name, age, height(cm), weight(kg)
Lenin is 30 years old, 188 cm tall and 90 kg heavy
Phil is 42 years old, 178 cm tall and 76 kg heavy
Claire is 40 years old, 165 cm tall and 54 kg heavy
Alex is 18 years old, 140 cm tall and 46 kg heavy
Total: 5 lines
</code></pre>
<p>Since the first row is the header row (<code>line_count</code> is 0), it is treated differently. <strong>You can also skip the header row while reading the CSV.</strong>
<br /></p>
<h3 id="heading-how-to-skip-the-header-row-in-csv-with-python">How to skip the header row in CSV with Python?</h3>
<p>Since the <code>csv.reader</code> object is an <a target="_blank" href="https://www.pylenin.com/blogs/python-iterators/">iterator</a>, you can call <code>next(csv_reader, None)</code> to consume the header row and skip over it. The second argument, <code>None</code>, is returned instead of raising <code>StopIteration</code> if the file is empty.</p>
<p><strong>Example 2</strong></p>
<pre><code class="lang-python3">import csv

with open('my_family.csv', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    next(csv_reader, None) #ignore the header
    for row in csv_reader:
        print(f'{row[0]} is {row[1]} years old, {row[2]} cm tall and {row[3]} kg heavy')
        line_count += 1
    print(f'Total: {line_count} lines')
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">Lenin is 30 years old, 188 cm tall and 90 kg heavy
Phil is 42 years old, 178 cm tall and 76 kg heavy
Claire is 40 years old, 165 cm tall and 54 kg heavy
Alex is 18 years old, 140 cm tall and 46 kg heavy
Total: 4 lines
</code></pre>
<p><br /></p>
<h3 id="heading-reading-csv-files-as-a-dictionary">Reading CSV files as a dictionary</h3>
<p>You can read the CSV file as a dictionary by using the <code>csv.DictReader</code> object.</p>
<p><strong>An advantage of using the <code>DictReader</code> object is that it turns each row into a dictionary, which makes accessing the fields a little easier.</strong></p>
<p><strong>Example 3</strong></p>
<pre><code class="lang-python3">import csv

with open('my_family.csv', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    for row in csv_reader:
        print(f'{row["name"]} is {row["age"]} years old, {row["height(cm)"]} cm tall and {row["weight(kg)"]} kg heavy')
    print(f'Total: {csv_reader.line_num} lines')
</code></pre>
<p><strong>The <code>csv_reader.line_num</code> attribute holds the number of lines read from the file so far. After the loop has finished, it equals the total number of lines in the file, including the header.</strong></p>
<p>For the <code>csv.DictReader</code> object, Python uses the column names from the header row as the dictionary keys. <strong>The header row itself is not returned as a data record.</strong></p>
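<p>A small sketch that shows where the header row ends up. The CSV text is inlined through <code>io.StringIO</code> (an assumption made so the snippet does not depend on a file on disk), and the two-column data is a shortened, made-up version of the family file.</p>
<pre><code class="lang-python3">import csv
import io

csv_text = "name,age\nLenin,30\nPhil,42\n"

csv_reader = csv.DictReader(io.StringIO(csv_text))

# The header row is consumed to build the keys and is exposed as fieldnames
fieldnames = csv_reader.fieldnames
records = list(csv_reader)

print(fieldnames)
print(records[0]["name"], records[0]["age"])
print(len(records))
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">['name', 'age']
Lenin 30
2
</code></pre>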
<p></p><hr /><p></p>
<h2 id="heading-how-to-write-to-csv-files-using-the-csv-module">How to write to CSV files using the csv module?</h2>
<p>You can write to a CSV file using the <code>csv.writer</code> object. Be careful to open the file in <a target="_blank" href="https://www.100daysofdata.com/python-file-io#heading-how-to-write-to-a-file-in-python">writing mode</a>.</p>
<p><strong>Example 1</strong></p>
<pre><code class="lang-python3">import csv

header = ['Name', 'Age', 'Height(cm)', 'Weight(kg)']

data = [ ['Phil', 42, 178, 76],
        ['Alex', 18, 140, 46],
        ['Claire', 40, 165, 54] ]

filename = "my_family.csv"

# newline='' prevents blank lines between rows on Windows
with open(filename, 'w', newline='') as output:
    csvwriter = csv.writer(output)

    # Write a single list
    csvwriter.writerow(header)

    # Writing a list of lists
    csvwriter.writerows(data)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">Name,Age,Height(cm),Weight(kg)
Phil,42,178,76
Alex,18,140,46
Claire,40,165,54
</code></pre>
<p><strong>The <code>writerow</code> method writes a single list of values as one row, whereas <code>writerows</code> writes multiple rows from an iterable of rows, such as a list of lists.</strong></p>
<h3 id="heading-using-the-delimiter-parameter">Using the delimiter parameter</h3>
<p>Notice that no delimiter has been mentioned while creating the <code>csv.writer</code> object. In such cases, comma <code>,</code> is used as the default delimiter. You can also use a different delimiter by passing the <code>delimiter</code> parameter.</p>
<p><strong>Example 2</strong></p>
<pre><code class="lang-python3">import csv

header = ['Name', 'Age', 'Height(cm)', 'Weight(kg)']

data = [ ['Phil', 42, 178, 76],
        ['Alex', 18, 140, 46],
        ['Claire', 40, 165, 54] ]

filename = "my_family.csv"

with open(filename, 'w', newline='') as output:
    csvwriter = csv.writer(output, delimiter = '|')

    # Write a single list
    csvwriter.writerow(header)

    # Writing a list of lists
    csvwriter.writerows(data)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">Name|Age|Height(cm)|Weight(kg)
Phil|42|178|76
Alex|18|140|46
Claire|40|165|54
</code></pre>
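<p>To read such a file back, pass the same <code>delimiter</code> to <code>csv.reader</code>. The sketch below inlines a shortened copy of the pipe-delimited content through <code>io.StringIO</code> so that it runs without a file on disk.</p>
<pre><code class="lang-python3">import csv
import io

# Shortened copy of the pipe-delimited content written above
pipe_text = "Name|Age|Height(cm)|Weight(kg)\nPhil|42|178|76\nAlex|18|140|46\n"

rows = list(csv.reader(io.StringIO(pipe_text), delimiter="|"))

print(rows[0])
print(rows[1])
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">['Name', 'Age', 'Height(cm)', 'Weight(kg)']
['Phil', '42', '178', '76']
</code></pre>
<p>Without the <code>delimiter</code> argument, each line would come back as a single field, because no commas are present.</p>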
<h3 id="heading-writing-a-dictionary-to-a-csv-file">Writing a dictionary to a CSV file</h3>
<p>You can write dictionaries to a CSV file using the <code>csv.DictWriter</code> class. Its <code>fieldnames</code> parameter is required: it lists the dictionary keys to write and fixes the column order, and <code>writeheader()</code> writes it out as the header row.</p>
<p><strong>Example 3</strong></p>
<pre><code class="lang-python3">import csv

header = ['Name', 'Age', 'Height(cm)', 'Weight(kg)']

data = [
    {"Name":"Phil", "Age": 42, "Height(cm)":178, "Weight(kg)":76},
    {"Name":"Claire", "Age": 40, "Height(cm)":165, "Weight(kg)":54},
    {"Name":"Alex", "Age": 18, "Height(cm)":140, "Weight(kg)":46}
]

filename = "my_family.csv"

with open(filename, 'w', newline='') as output:
  csvwriter = csv.DictWriter(output, fieldnames=header)
  csvwriter.writeheader()
  for row in data:
    csvwriter.writerow(row)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">Name,Age,Height(cm),Weight(kg)
Phil,42,178,76
Claire,40,165,54
Alex,18,140,46
</code></pre>
<p>You can also use <code>writerows</code> to write all the dictionaries to the CSV file at once.</p>
<p><strong>Example 4</strong></p>
<pre><code class="lang-python3">import csv

header = ['Name', 'Age', 'Height(cm)', 'Weight(kg)']

data = [
    {"Name":"Phil", "Age": 42, "Height(cm)":178, "Weight(kg)":76},
    {"Name":"Claire", "Age": 40, "Height(cm)":165, "Weight(kg)":54},
    {"Name":"Alex", "Age": 18, "Height(cm)":140, "Weight(kg)":46}
]

filename = "my_family.csv"

with open(filename, 'w', newline='') as output:
  csvwriter = csv.DictWriter(output, fieldnames=header)
  csvwriter.writeheader()
  csvwriter.writerows(data)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">Name,Age,Height(cm),Weight(kg)
Phil,42,178,76
Claire,40,165,54
Alex,18,140,46
</code></pre>
<h2 id="heading-playing-with-additional-parameters-in-csv-module">Playing with additional parameters in csv module</h2>
<h3 id="heading-quotechar">quotechar</h3>
<p>It is the one-character string used to quote fields that contain special characters, such as the delimiter. It defaults to <code>"</code>.</p>
<p>For example, suppose the delimiter of your CSV file is a comma and you have an address column whose values may contain commas. Check out the example below.</p>
<p><strong>my_family.csv</strong></p>
<pre><code class="lang-bash">Name,Age,Height(cm),Weight(kg),Address
Phil,42,178,76,<span class="hljs-string">'Gryffindor room, Hogwarts'</span>
Claire,40,165,54,<span class="hljs-string">'Snapes room, Hogwarts'</span>
Alex,18,140,46,<span class="hljs-string">'4 Private Drive, Little Whinging'</span>
</code></pre>
<p>The above CSV file uses single quotes to enclose the address field of each data record. You can pass this character as the <code>quotechar</code> value.</p>
<p><strong>Example 1</strong></p>
<pre><code class="lang-python3">import csv

filename = "my_family.csv"

with open(filename, 'r', newline='') as csv_file:
  csvreader = csv.reader(csv_file, quotechar="'")
  for row in csvreader:
    print(row)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">[<span class="hljs-string">'Name'</span>, <span class="hljs-string">'Age'</span>, <span class="hljs-string">'Height(cm)'</span>, <span class="hljs-string">'Weight(kg)'</span>, <span class="hljs-string">'Address'</span>]
[<span class="hljs-string">'Phil'</span>, <span class="hljs-string">'42'</span>, <span class="hljs-string">'178'</span>, <span class="hljs-string">'76'</span>, <span class="hljs-string">'Gryffindor room, Hogwarts'</span>]
[<span class="hljs-string">'Claire'</span>, <span class="hljs-string">'40'</span>, <span class="hljs-string">'165'</span>, <span class="hljs-string">'54'</span>, <span class="hljs-string">'Snapes room, Hogwarts'</span>]
[<span class="hljs-string">'Alex'</span>, <span class="hljs-string">'18'</span>, <span class="hljs-string">'140'</span>, <span class="hljs-string">'46'</span>, <span class="hljs-string">'4 Private Drive, Little Whinging'</span>]
</code></pre>
<h3 id="heading-quoting">quoting</h3>
<p>The <code>quoting</code> argument controls when quotes should be generated by the writer or recognized by the reader. The four most common values are:</p>
<ol>
<li><code>csv.QUOTE_MINIMAL</code> - It adds quotes only when required, for example when a field contains the delimiter (default).</li>
<li><code>csv.QUOTE_ALL</code> - It quotes everything regardless of the field type.</li>
<li><code>csv.QUOTE_NONNUMERIC</code> - It quotes every non-numeric field. While reading, unquoted fields are converted to floats.</li>
<li><code>csv.QUOTE_NONE</code> - It does not quote anything on output. While reading, quote characters get no special treatment and stay part of the field values.</li>
</ol>
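<p>The sketch below writes the same row with three of these modes so you can compare the results. The <code>write_row</code> helper is hypothetical and exists only for this comparison. It writes to an in-memory <code>io.StringIO</code> buffer instead of a file.</p>
<pre><code class="lang-python3">import csv
import io

row = ["Phil", 42, "Gryffindor room, Hogwarts"]

def write_row(quoting):
    """Write one row with the given quoting mode and return the resulting line."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=quoting).writerow(row)
    return buffer.getvalue().strip()

minimal = write_row(csv.QUOTE_MINIMAL)
everything = write_row(csv.QUOTE_ALL)
non_numeric = write_row(csv.QUOTE_NONNUMERIC)

print(minimal)
print(everything)
print(non_numeric)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">Phil,42,"Gryffindor room, Hogwarts"
"Phil","42","Gryffindor room, Hogwarts"
"Phil",42,"Gryffindor room, Hogwarts"
</code></pre>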
<p><strong>Example 1</strong></p>
<pre><code class="lang-python3">import csv

filename = "my_family.csv"

header = ['Name','Age','Height(cm)','Weight(kg)','Address']

data = [
  ['Phil',42,178,76,'Gryffindor room, Hogwarts'],
  ['Claire',40,165,54,'Snapes room, Hogwarts'],
  ['Alex',18,140,46,'4 Private Drive, Little Whinging']
]

with open(filename, 'w', newline='') as output:
  csvwriter = csv.writer(output, quotechar="'", quoting=csv.QUOTE_ALL)
  csvwriter.writerow(header)
  csvwriter.writerows(data)
</code></pre>
<p>The above code uses <code>csv.QUOTE_ALL</code> as the quoting argument. This ensures that every field is wrapped in single quotes when it is written to the CSV.</p>
<p><strong>my_family.csv</strong></p>
<pre><code class="lang-bash"><span class="hljs-string">'Name'</span>,<span class="hljs-string">'Age'</span>,<span class="hljs-string">'Height(cm)'</span>,<span class="hljs-string">'Weight(kg)'</span>,<span class="hljs-string">'Address'</span>
<span class="hljs-string">'Phil'</span>,<span class="hljs-string">'42'</span>,<span class="hljs-string">'178'</span>,<span class="hljs-string">'76'</span>,<span class="hljs-string">'Gryffindor room, Hogwarts'</span>
<span class="hljs-string">'Claire'</span>,<span class="hljs-string">'40'</span>,<span class="hljs-string">'165'</span>,<span class="hljs-string">'54'</span>,<span class="hljs-string">'Snapes room, Hogwarts'</span>
<span class="hljs-string">'Alex'</span>,<span class="hljs-string">'18'</span>,<span class="hljs-string">'140'</span>,<span class="hljs-string">'46'</span>,<span class="hljs-string">'4 Private Drive, Little Whinging'</span>
</code></pre>
<h3 id="heading-escapechar">escapechar</h3>
<p>Let's say, you don't want any quotation in your CSV file while executing the above code. So you use <code>csv.QUOTE_NONE</code> as the quoting argument.</p>
<p><strong>Example 1</strong></p>
<pre><code class="lang-python3">import csv

filename = "my_family.csv"

header = ['Name','Age','Height(cm)','Weight(kg)','Address']

data = [
  ['Phil',42,178,76,'Gryffindor room, Hogwarts'],
  ['Claire',40,165,54,'Snapes room, Hogwarts'],
  ['Alex',18,140,46,'4 Private Drive, Little Whinging']
]

with open(filename, 'w', newline='') as output:
  csvwriter = csv.writer(output, quotechar="'", quoting=csv.QUOTE_NONE)
  csvwriter.writerow(header)
  csvwriter.writerows(data)
</code></pre>
<p>The above code raises an error.</p>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">Traceback (most recent call last):
  File <span class="hljs-string">"main.py"</span>, line 16, <span class="hljs-keyword">in</span> &lt;module&gt;
    csvwriter.writerows(data)
_csv.Error: need to escape, but no escapechar <span class="hljs-built_in">set</span>
</code></pre>
<p>The problem is that the address field contains commas. Since the quoting argument is set to <code>csv.QUOTE_NONE</code>, the <code>csv</code> module doesn't know how to escape the commas properly.</p>
<p>For this purpose, you can use the <code>escapechar</code> argument. It takes a one-character string that is used to escape the delimiter when quoting is turned off.</p>
<p>The below code escapes the comma using a backslash <code>\</code>.</p>
<p><strong>Example 2</strong></p>
<pre><code class="lang-python3">import csv

filename = "my_family.csv"

header = ['Name','Age','Height(cm)','Weight(kg)','Address']

data = [
  ['Phil',42,178,76,'Gryffindor room, Hogwarts'],
  ['Claire',40,165,54,'Snapes room, Hogwarts'],
  ['Alex',18,140,46,'4 Private Drive, Little Whinging']
]

with open(filename, 'w', newline='') as output:
  csvwriter = csv.writer(output, quotechar="'", quoting=csv.QUOTE_NONE, escapechar='\\')
  csvwriter.writerow(header)
  csvwriter.writerows(data)
</code></pre>
<p><strong>my_family.csv</strong></p>
<pre><code class="lang-bash">Name,Age,Height(cm),Weight(kg),Address
Phil,42,178,76,Gryffindor room\, Hogwarts
Claire,40,165,54,Snapes room\, Hogwarts
Alex,18,140,46,4 Private Drive\, Little Whinging
</code></pre>
<p>Notice how the commas have been escaped with backslash <code>\</code> and no error is thrown.</p>
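<p>When you read such a file back, the reader needs the same <code>escapechar</code>, otherwise the escaped comma is treated as a delimiter again. A minimal round-trip sketch, using an in-memory <code>io.StringIO</code> buffer in place of <code>my_family.csv</code>:</p>
<pre><code class="lang-python3">import csv
import io

row = ["Phil", 42, "Gryffindor room, Hogwarts"]

# Write one row without quoting, escaping the comma with a backslash
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_NONE, escapechar="\\").writerow(row)
escaped_line = buffer.getvalue()

# Read it back with the same settings so the escaped comma stays inside the field
buffer.seek(0)
read_back = next(csv.reader(buffer, quoting=csv.QUOTE_NONE, escapechar="\\"))

print(escaped_line.strip())
print(read_back)
</code></pre>
<p>Note that the age comes back as the string <code>'42'</code>, because the <code>csv</code> module reads every field as text.</p>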
<h3 id="heading-skipinitialspace">skipinitialspace</h3>
<p>It controls whether whitespace immediately following the delimiter is ignored. If <code>True</code>, that leading whitespace is removed from each field. It defaults to <code>False</code>.</p>
<p><strong>my_family.csv</strong></p>
<pre><code class="lang-bash">Name, Age, Height(cm), Weight(kg), Address
Phil, 42, 178, 76, <span class="hljs-string">'Gryffindor room, Hogwarts'</span>
Claire, 40, 165, 54, <span class="hljs-string">'Snapes room, Hogwarts'</span>
Alex, 18, 140, 46, <span class="hljs-string">'4 Private Drive, Little Whinging'</span>
</code></pre>
<p>The above CSV file has spaces after every delimiter. If you read it without the <code>skipinitialspace</code> argument, there will be white spaces in your data points.</p>
<p><strong>Example 1</strong></p>
<pre><code class="lang-python3">import csv

with open('my_family.csv', 'r') as f:
    csv_reader = csv.reader(f, quotechar="'")

    for line in csv_reader:
        print(line)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">[<span class="hljs-string">'Name'</span>, <span class="hljs-string">' Age'</span>, <span class="hljs-string">' Height(cm)'</span>, <span class="hljs-string">' Weight(kg)'</span>, <span class="hljs-string">' Address'</span>]
[<span class="hljs-string">'Phil'</span>, <span class="hljs-string">' 42'</span>, <span class="hljs-string">' 178'</span>, <span class="hljs-string">' 76'</span>, <span class="hljs-string">" 'Gryffindor room"</span>, <span class="hljs-string">" Hogwarts'"</span>]
[<span class="hljs-string">'Claire'</span>, <span class="hljs-string">' 40'</span>, <span class="hljs-string">' 165'</span>, <span class="hljs-string">' 54'</span>, <span class="hljs-string">" 'Snapes room"</span>, <span class="hljs-string">" Hogwarts'"</span>]
[<span class="hljs-string">'Alex'</span>, <span class="hljs-string">' 18'</span>, <span class="hljs-string">' 140'</span>, <span class="hljs-string">' 46'</span>, <span class="hljs-string">" '4 Private Drive"</span>, <span class="hljs-string">" Little Whinging'"</span>]
</code></pre>
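<p>Notice that the addresses were also split in two: a quote character that comes after a space is not recognized as the start of a quoted field, so the comma inside the address is treated as a delimiter. To fix both problems, set the <code>skipinitialspace</code> argument to <code>True</code>.</p>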
<p><strong>Example 2</strong></p>
<pre><code class="lang-python3">import csv

with open('my_family.csv', 'r') as f:
    csv_reader = csv.reader(f, quotechar="'", skipinitialspace=True)

    for line in csv_reader:
        print(line)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">[<span class="hljs-string">'Name'</span>, <span class="hljs-string">'Age'</span>, <span class="hljs-string">'Height(cm)'</span>, <span class="hljs-string">'Weight(kg)'</span>, <span class="hljs-string">'Address'</span>]
[<span class="hljs-string">'Phil'</span>, <span class="hljs-string">'42'</span>, <span class="hljs-string">'178'</span>, <span class="hljs-string">'76'</span>, <span class="hljs-string">'Gryffindor room, Hogwarts'</span>]
[<span class="hljs-string">'Claire'</span>, <span class="hljs-string">'40'</span>, <span class="hljs-string">'165'</span>, <span class="hljs-string">'54'</span>, <span class="hljs-string">'Snapes room, Hogwarts'</span>]
[<span class="hljs-string">'Alex'</span>, <span class="hljs-string">'18'</span>, <span class="hljs-string">'140'</span>, <span class="hljs-string">'46'</span>, <span class="hljs-string">'4 Private Drive, Little Whinging'</span>]
</code></pre>
<p></p><hr /><p></p>
<h2 id="heading-how-to-read-csv-files-using-the-pandas-module">How to read CSV files using the pandas module?</h2>
<p>Reading CSV files into a pandas DataFrame is very straightforward. A pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array, or a table with rows and columns.</p>
<p><strong>Example 1</strong></p>
<pre><code class="lang-python3">import pandas as pd

df = pd.read_csv('my_family.csv')
print(df)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">     Name  Age  Height(cm)  Weight(kg)
0    Phil   42         178          76
1  Claire   40         165          54
2    Alex   18         140          46
</code></pre>
<p>Notice the following points:-</p>
<ol>
<li>It used the first line of CSV as column names automatically.</li>
<li>It displays the CSV data like a spreadsheet, thus making it easy to perform data analysis.</li>
<li>Pandas automatically converted the datatype for Age, Height(cm) and Weight(kg) columns to integer. </li>
</ol>
<p><strong>Example 2</strong></p>
<pre><code class="lang-python3">import pandas as pd

df = pd.read_csv('my_family.csv')

print(type(df['Age'][0]))
print(type(df['Height(cm)'][0]))
print(type(df['Weight(kg)'][0]))
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">&lt;class <span class="hljs-string">'numpy.int64'</span>&gt;
&lt;class <span class="hljs-string">'numpy.int64'</span>&gt;
&lt;class <span class="hljs-string">'numpy.int64'</span>&gt;
</code></pre>
<h3 id="heading-pandas-trick-to-deal-with-csvs-without-header">Pandas trick to deal with CSVs without header</h3>
<p>If your CSV is missing the header row, use the <code>names</code> argument of the <code>pd.read_csv()</code> function.</p>
<p><strong>my_family.csv</strong></p>
<pre><code class="lang-bash">Phil,42,178,76
Claire,40,165,54
Alex,18,140,46
</code></pre>
<p><strong>Example 3</strong></p>
<pre><code class="lang-python3">import pandas as pd

df = pd.read_csv('my_family.csv', 
            index_col='Name', 
            names=['Name', 'Age', 'Height(cm)', 'Weight(kg)']
                    )
print(df)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">        Age  Height(cm)  Weight(kg)
Name                               
Phil     42         178          76
Claire   40         165          54
Alex     18         140          46
</code></pre>
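<p>The <code>names</code> argument can be combined with <code>sep</code> when the file uses another delimiter. A minimal sketch, assuming a headerless, pipe-delimited copy of the family data that is inlined through <code>io.StringIO</code>:</p>
<pre><code class="lang-python3">import io
import pandas as pd

# Headerless, pipe-delimited data inlined for the example
raw = "Phil|42|178|76\nClaire|40|165|54\nAlex|18|140|46\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep="|",
    names=["Name", "Age", "Height(cm)", "Weight(kg)"],
)

print(df)
print(df.dtypes)
</code></pre>
<p>Because <code>names</code> is passed, pandas treats the first line as data instead of a header.</p>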
<p></p><hr /><p></p>
<h2 id="heading-how-to-write-to-csv-files-using-the-pandas-module">How to write to CSV files using the pandas module?</h2>
<p>To write a pandas DataFrame to a CSV file, use the <code>df.to_csv()</code> method.</p>
<p><strong>Example 1</strong></p>
<pre><code class="lang-python3">import pandas as pd

df = pd.read_csv('my_family.csv', 
            index_col='Name', 
            names=['Name', 'Age', 'Height(cm)', 'Weight(kg)']
                    )
df.to_csv('my_new_family.csv')
</code></pre>
<p><strong>my_new_family.csv</strong></p>
<pre><code class="lang-bash">Name,Age,Height(cm),Weight(kg)
Phil,42,178,76
Claire,40,165,54
Alex,18,140,46
</code></pre>
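<p>By default, <code>to_csv()</code> also writes the DataFrame index as the first column. The sketch below uses a small made-up DataFrame and relies on the fact that <code>to_csv()</code> returns the CSV text when no path is given, which makes the effect of <code>index=False</code> easy to see.</p>
<pre><code class="lang-python3">import pandas as pd

# Small made-up DataFrame with the default integer index
df = pd.DataFrame({
    "Name": ["Phil", "Claire"],
    "Age": [42, 40],
})

# With no path, to_csv() returns the CSV text instead of writing a file
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)
</code></pre>
<p>Pass <code>index=False</code> when the index is just a row counter that you do not want in the file.</p>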
<p></p><hr /><p></p>
<h2 id="heading-data-analyst-project-analyze-titanic-data-from-kaggle">Data Analyst Project: Analyze Titanic data from Kaggle</h2>
<p>The famous Titanic challenge by Kaggle is to build a machine learning model that predicts which passengers survived the Titanic shipwreck.</p>
<p>However, in this section you are going to do simple data analysis on the <a target="_blank" href="https://www.kaggle.com/competitions/titanic">train.csv</a> file and figure out the answers to the following questions:</p>
<ol>
<li>How many male and female passengers were onboard the Titanic?</li>
<li>How many male and female passengers survived the Titanic shipwreck?</li>
<li>What is the median age of each sex?</li>
</ol>
<h3 id="heading-male-to-female-ratio-on-the-titanic">Male to Female ratio on the Titanic</h3>
<pre><code class="lang-python3">import pandas as pd

#load the csv file
df = pd.read_csv('train.csv')

# Column Names
print(df.columns)

# Count unique values in Sex column
print(df['Sex'].value_counts())

# Percentage of male and female passengers
print(df['Sex'].value_counts(normalize=True))
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">Index([<span class="hljs-string">'PassengerId'</span>, <span class="hljs-string">'Survived'</span>, <span class="hljs-string">'Pclass'</span>, <span class="hljs-string">'Name'</span>, <span class="hljs-string">'Sex'</span>, <span class="hljs-string">'Age'</span>, <span class="hljs-string">'SibSp'</span>,
       <span class="hljs-string">'Parch'</span>, <span class="hljs-string">'Ticket'</span>, <span class="hljs-string">'Fare'</span>, <span class="hljs-string">'Cabin'</span>, <span class="hljs-string">'Embarked'</span>],
      dtype=<span class="hljs-string">'object'</span>)
male      577
female    314
Name: Sex, dtype: int64
male      0.647587
female    0.352413
Name: Sex, dtype: float64
</code></pre>
<p><strong>The above analysis shows that 65% of the passengers in <code>train.csv</code> were male and 35% were female.</strong></p>
<h3 id="heading-surviving-male-to-female-ratio-on-the-titanic">Surviving male to female ratio on the Titanic</h3>
<pre><code class="lang-python3">import pandas as pd

#load the csv file
df = pd.read_csv('train.csv')

# Column Names
print(df.columns)

# Count surviving passengers by sex
print(df[df["Survived"] == 1]['Sex'].value_counts())

# Percentage of surviving male and female passengers
print(df[df["Survived"] == 1]['Sex'].value_counts(normalize=True))
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">Index([<span class="hljs-string">'PassengerId'</span>, <span class="hljs-string">'Survived'</span>, <span class="hljs-string">'Pclass'</span>, <span class="hljs-string">'Name'</span>, <span class="hljs-string">'Sex'</span>, <span class="hljs-string">'Age'</span>, <span class="hljs-string">'SibSp'</span>,
       <span class="hljs-string">'Parch'</span>, <span class="hljs-string">'Ticket'</span>, <span class="hljs-string">'Fare'</span>, <span class="hljs-string">'Cabin'</span>, <span class="hljs-string">'Embarked'</span>],
      dtype=<span class="hljs-string">'object'</span>)
female    233
male      109
Name: Sex, dtype: int64
female    0.681287
male      0.318713
Name: Sex, dtype: float64
</code></pre>
<p>In the above code, you first filter the dataframe for surviving passengers and then use the <code>value_counts()</code> method to count the male and female passengers among them.</p>
<p><strong>The above analysis shows that 68% of the surviving passengers in <code>train.csv</code> were female.</strong></p>
<h3 id="heading-median-age-of-each-sex">Median age of each sex</h3>
<pre><code class="lang-python3">import pandas as pd

#load the csv file
df = pd.read_csv('train.csv')

# median age of each sex
median_age_men=df[df['Sex']=='male']['Age'].median()
median_age_women=df[df['Sex']=='female']['Age'].median()

print(f"The median age of men is {median_age_men}")
print(f"The median age of women is {median_age_women}")
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">The median age of men is 29.0
The median age of women is 27.0
</code></pre>
<p><strong>The above analysis shows that the median age of male passengers was 29, whereas the median age of female passengers was 27.</strong></p>
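<p>The two filtered medians can also be computed in one step with <code>groupby()</code>. The snippet below is only a sketch: it uses a small, made-up dataframe called <code>toy</code> (not rows from <code>train.csv</code>) so that it runs without the Kaggle file.</p>

```python
import pandas as pd

# Toy data for illustration only; these are not rows from train.csv.
toy = pd.DataFrame({
    "Sex": ["male", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, None, 26.0, 30.0],
})

# Group the rows by sex, then take the median age of each group.
# Missing ages (NaN) are skipped by default.
median_age = toy.groupby("Sex")["Age"].median()

print(median_age["male"])    # median of 22.0 and 30.0
print(median_age["female"])  # median of 26.0 and 38.0
```

<p>With the real file, replacing <code>toy</code> with the <code>df</code> loaded from <code>train.csv</code> should give the same medians as the filtered version above.</p>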
]]></content:encoded></item><item><title><![CDATA[Python File I/O]]></title><description><![CDATA[Introduction to Python I/O
I/O stands for Input/Output.
There are some commonly used built-in functions in Python like input() and print(), that help with input and output operations respectively.
The input() function reads user input into memory  wh...]]></description><link>https://www.100daysofdata.com/python-file-io</link><guid isPermaLink="true">https://www.100daysofdata.com/python-file-io</guid><category><![CDATA[Python]]></category><category><![CDATA[Programming Blogs]]></category><category><![CDATA[Python 3]]></category><category><![CDATA[Beginner Developers]]></category><category><![CDATA[python beginner]]></category><dc:creator><![CDATA[Lenin Mishra]]></dc:creator><pubDate>Mon, 23 May 2022 04:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1653230383942/ZWpVJriCl.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction-to-python-io">Introduction to Python I/O</h2>
<p><strong>I/O stands for Input/Output.</strong></p>
<p>There are some commonly used built-in functions in Python like <code>input()</code> and <code>print()</code>, that help with input and output operations respectively.</p>
<p><strong>The <code>input()</code> function reads user input from standard input (<code>sys.stdin</code>), and the <code>print()</code> function sends data to standard output (<code>sys.stdout</code>), which is usually the display.</strong></p>
<p><br /></p>
<h3 id="heading-input-operation-with-input">Input operation with input()</h3>
<p>To read input from the user, you can use the built-in <code>input()</code> function.</p>
<p><strong>All code examples are tested with Python 3.8.2.</strong></p>
<p><strong>Example 1</strong></p>
<pre><code class="lang-python3">user_name = input('Enter your name--&gt; ')  
print(f"User name is {user_name}")
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">Enter your name--&gt; Pylenin
User name is Pylenin
</code></pre>
<p>The <code>input()</code> function reads a line from input, converts it to a string (stripping a trailing newline), and returns the string.</p>
<p><strong>Example 2</strong></p>
<pre><code class="lang-python3">value = input("Enter anything==&gt; ")
print(type(value))
print("Input received from the user is: ", value)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">Enter anything==&gt; [1, 2]
&lt;class <span class="hljs-string">'str'</span>&gt;
Input received from the user is:  [1, 2]
</code></pre>
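<p>Since <code>input()</code> always returns a string, numeric input has to be converted explicitly. Here is a minimal sketch of one way to do that. The helper name <code>parse_age</code> is hypothetical, and string literals stand in for what <code>input()</code> would return, so the snippet runs without a keyboard.</p>

```python
def parse_age(text):
    """Convert text such as the return value of input() to an int.

    Returns None when the text is not a whole number.
    """
    try:
        return int(text.strip())
    except ValueError:
        return None


# The literals below stand in for what input() would return.
print(parse_age("42"))      # 42
print(parse_age(" 7 "))     # 7
print(parse_age("[1, 2]"))  # None
```

<p>In a real program you could call it as <code>parse_age(input("Enter your age--> "))</code>.</p>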
<p>To learn more about the <code>input()</code> function, check out <a target="_blank" href="https://www.pylenin.com/blogs/how-input-works-in-python/">this article</a> written by <a target="_blank" href="https://youtube.com/pylenin">Pylenin</a>.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.pylenin.com/blogs/how-input-works-in-python/">https://www.pylenin.com/blogs/how-input-works-in-python/</a></div>
<p><br /></p>
<h3 id="heading-output-operation-with-print">Output operation with print()</h3>
<p>The <code>print()</code> function provides an interface to the <strong>standard output (sys.stdout)</strong> object. When you use <code>print()</code>, you are asking the Python interpreter to echo your message to standard output.</p>
<p><strong>Example 1</strong></p>
<pre><code class="lang-python3">print('Python', 3, 'Rocks', sep='|')
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">Python|3|Rocks
</code></pre>
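<p>Besides <code>sep</code>, <code>print()</code> also accepts <code>end</code> and <code>file</code> arguments. The sketch below sends the output to an in-memory buffer instead of <code>sys.stdout</code>. The buffer is only there to make the result easy to inspect.</p>

```python
import io

# An in-memory text buffer stands in for a file or terminal.
buffer = io.StringIO()

# sep joins the arguments, end replaces the trailing newline,
# and file redirects the output away from sys.stdout.
print("Python", 3, "Rocks", sep="|", end="!\n", file=buffer)

captured = buffer.getvalue()
print(repr(captured))
```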
<p>Check out this <a target="_blank" href="https://www.pylenin.com/blogs/python-print/">deep-dive article</a> on <code>print()</code> to learn more about it.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.pylenin.com/blogs/python-print/">https://www.pylenin.com/blogs/python-print/</a></div>
<hr />

<h2 id="heading-how-to-open-and-read-a-file-in-python">How to open and read a file in Python?</h2>
<p>When you want to read from or write to a file, you need to open it first. Once you have performed the necessary operations, it needs to be closed so that the changes are saved properly.</p>
<p>Hence, in Python, a file operation takes place in the following order:</p>
<ol>
<li>Open a file</li>
<li>Perform your operation</li>
<li>Close the file</li>
</ol>
<p><strong>To open a file in Python, use the built-in <code>open()</code> function and specify the mode, which represents the purpose for opening the file.</strong></p>
<p>Below is an example of a program that opens a CSV file in <strong>reading mode</strong>.</p>
<p><strong>Example 1</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">for</span> data <span class="hljs-keyword">in</span> open(<span class="hljs-string">'test.csv'</span>, <span class="hljs-string">'r'</span>, encoding=<span class="hljs-string">'utf-8'</span>):
  print(data, end=<span class="hljs-string">''</span>)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">name, age
Pylenin, 2
Hello, 4
</code></pre>
<p><strong>Note</strong> - The reading mode is also the default mode. So if you don't specify a mode, Python opens the file in reading mode.</p>
<p>Below is the table showing the other useful modes.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Mode</td><td>Description</td></tr>
</thead>
<tbody>
<tr>
<td>r</td><td>Opens a file in reading mode.</td></tr>
<tr>
<td>w</td><td>Opens a file for writing. Creates a new file if it does not exist or truncates the file if it exists.</td></tr>
<tr>
<td>a</td><td>Opens the file in append mode. If the file already exists, the file pointer is placed at the end of it, so new data is written after the existing content. If no file with that name exists, a new file is created.</td></tr>
<tr>
<td>b</td><td>Opens a file in binary mode. It is combined with another mode, for example <code>rb</code> or <code>wb</code>.</td></tr>
<tr>
<td>r+</td><td>Opens a file for both reading and writing.</td></tr>
</tbody>
</table>
</div><p><br /></p>
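<p>To see how the <code>w</code>, <code>a</code> and <code>r</code> modes from the table behave, here is a small sketch. It works inside a temporary directory, and the file name <code>modes_demo.txt</code> is just a placeholder chosen for this example.</p>

```python
import os
import tempfile

# A throwaway directory keeps the example from touching real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    # 'w' creates the file, or truncates it if it already exists.
    with open(path, "w", encoding="utf-8") as f:
        f.write("first\n")

    # Opening with 'w' again erases the previous content.
    with open(path, "w", encoding="utf-8") as f:
        f.write("second\n")

    # 'a' keeps the existing content and writes at the end.
    with open(path, "a", encoding="utf-8") as f:
        f.write("third\n")

    # 'r' is the default mode, so it can be omitted.
    with open(path, encoding="utf-8") as f:
        content = f.read()

print(content)
```

<p>Only <code>second</code> and <code>third</code> remain in the file, because the second <code>w</code> call wiped out <code>first</code>.</p>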
<h4 id="heading-do-you-need-to-use-encoding-with-open-method">Do you need to use encoding with the open() function?</h4>
<p>Normally, files are opened in text mode, which means that you read strings from and write strings to the file, and those strings are encoded in a specific encoding. If you omit <code>encoding</code>, Python falls back to a platform-dependent default. <strong>Because UTF-8 is the modern de facto standard, <code>encoding='utf-8'</code> is recommended unless you know that you need to use a different encoding.</strong></p>
<p>Appending a <code>b</code> to the mode opens the file in binary mode. In binary mode, data is read and written as bytes objects. <strong>You cannot specify an encoding when opening a file in binary mode in Python.</strong></p>
<p><strong>Example 2</strong></p>
<pre><code class="lang-python3">for data in open('test.csv','rb'): #binary mode
  print(data)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">b<span class="hljs-string">'name, age\n'</span>
b<span class="hljs-string">'Pylenin, 2\n'</span>
b<span class="hljs-string">'Hello, 4'</span>
</code></pre>
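<p>The following sketch shows both points from this section: bytes read in binary mode can be decoded into a string by hand, and passing <code>encoding</code> together with a binary mode raises a <code>ValueError</code>. The CSV content is recreated in a temporary directory so the snippet is self-contained.</p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write("name, age\nPylenin, 2\n")

    # Binary mode returns bytes; decode them to get a str.
    with open(path, "rb") as f:
        raw = f.read()
    text = raw.decode("utf-8")

    # Passing an encoding together with binary mode is rejected.
    try:
        open(path, "rb", encoding="utf-8")
        error_message = None
    except ValueError as exc:
        error_message = str(exc)

print(type(raw).__name__, type(text).__name__)
print(error_message)
```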
<hr />

<h2 id="heading-how-to-write-to-a-file-in-python">How to write to a file in Python?</h2>
<p>To write to a file in Python without appending, open the file in write (<code>w</code>) mode.</p>
<p>Be careful with the <code>w</code> mode, as it overwrites the file if it already exists and all the previous data is erased.</p>
<p>If the file doesn't exist, a new file is created.</p>
<p><strong>Example 1</strong></p>
<pre><code class="lang-python3">with open("my_essay.txt",'w',encoding = 'utf-8') as f:
   f.write("My name is Pylenin.")
   f.write("I am a Modern Data Architect.")
   f.write("Follow me on Twitter @pylenin.")
</code></pre>
<p>The above code will create a file called <code>my_essay.txt</code> with all three sentences written on a single line.</p>
<p><strong>my_essay.txt</strong></p>
<pre><code class="lang-my_essay.txt">My name is Pylenin.I am a Modern Data Architect.Follow me on Twitter @pylenin.
</code></pre>
<p>To write each line to a new line, use a <code>\n</code> at the end of each line.</p>
<p><strong>Example 2</strong></p>
<pre><code class="lang-python3">with open("my_essay.txt",'w',encoding = 'utf-8') as f:
   f.write("My name is Pylenin.\n")
   f.write("I am a Modern Data Architect.\n")
   f.write("Follow me on Twitter @pylenin.\n")
</code></pre>
<p>The above code will create a file called <code>my_essay.txt</code> with each sentence written on its own line.</p>
<p><strong>my_essay.txt</strong></p>
<pre><code class="lang-my_essay.txt">My name is Pylenin.
I am a Modern Data Architect.
Follow me on Twitter @pylenin.
</code></pre>
<hr />

<h2 id="heading-how-to-append-to-a-file-in-python">How to append to a file in Python?</h2>
<p>To add more data to an existing file without overwriting, use the append <code>a</code> mode. If the file doesn't exist, it creates a new file.</p>
<p><strong>Example 1</strong></p>
<pre><code class="lang-python3">with open("my_essay.txt",'a',encoding = 'utf-8') as f:
   f.write("My new website is www.100daysofdata.com.\n")
</code></pre>
<p>The above code will append a new line to the <code>my_essay.txt</code> file.</p>
<p><strong>my_essay.txt</strong></p>
<pre><code class="lang-bash">My name is Pylenin.
I am a Modern Data Architect.
Follow me on Twitter @pylenin.
My new website is www.100daysofdata.com.
</code></pre>
<hr />

<h2 id="heading-how-to-close-a-file-in-python">How to close a file in Python?</h2>
<p>When you are done performing operations on the file, you need to close it properly. To close a file in Python, use the <code>close()</code> method.</p>
<p>Even if you don't close the file explicitly, Python's garbage collector eventually cleans up unreferenced file objects.</p>
<p>However, relying on this is not good practice! Closing a file frees up the resources that are tied to it.</p>
<p><strong>Example 1</strong></p>
<pre><code class="lang-python3">f = open("my_essay.txt",'a',encoding = 'utf-8')
try:
   # perform file operations
   pass
finally:
   # runs even if an exception was raised above
   f.close()
</code></pre>
<p>Notice the use of <a target="_blank" href="https://www.pylenin.com/blogs/python-error-handling/">try and finally</a> blocks to handle opening and closing the file. The <code>finally</code> block ensures that the file is closed even if an exception is raised while working with it.</p>
<p>To learn more about handling exceptions in Python, check out <a target="_blank" href="https://www.pylenin.com/tags/exceptions/">this list of articles</a>.</p>
<h3 id="heading-with-statement-in-python">with statement in Python</h3>
<p><strong>If you are using the with statement while opening files, you don't need to use the <code>close()</code> method.</strong></p>
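<p>The <strong>with statement</strong> in Python wraps a block of code in a context manager and is often used in place of <code>try</code> and <code>finally</code>. When you open a file with it, Python ensures that the file is automatically closed and its resources released once the block is done executing, even if an exception is raised.</p>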
<p><strong>Example 2</strong></p>
<pre><code class="lang-python3">with open("my_essay.txt",'a',encoding = 'utf-8') as f:
    pass  # perform file operations
</code></pre>
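<p>To check that the file really is closed, you can inspect the <code>closed</code> attribute of the file object. The sketch below raises an error on purpose inside the <code>with</code> block. The <code>RuntimeError</code> is only a stand-in for any failure that might happen while working with the file.</p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_essay.txt")

    try:
        with open(path, "a", encoding="utf-8") as f:
            f.write("My name is Pylenin.\n")
            # Simulate a failure in the middle of the file operations.
            raise RuntimeError("something went wrong")
    except RuntimeError:
        pass

    # The with statement closed the file even though the block raised.
    closed_after_error = f.closed

print(closed_after_error)  # True
```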
<hr />

<h2 id="heading-how-to-read-and-write-to-a-file-at-the-same-time">How to read and write to a file at the same time?</h2>
<p>In order to perform simultaneous read/write operations, use the <code>r+</code> mode.</p>
<p><strong>Example</strong></p>
<pre><code class="lang-python3">with open("my_essay.txt",'r+',encoding = 'utf-8') as f:
   lines = f.read()
   print(lines)
   f.write("Hello Data community!\n")


with open("my_essay.txt",'r+',encoding = 'utf-8') as new_data:
  print(new_data.read())
</code></pre>
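<p>In the above example, the <code>r+</code> mode is used to read from the <code>my_essay.txt</code> file and then write a new line to it. Because <code>read()</code> leaves the cursor at the end of the file, the new line is added after the existing content.</p>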
<p><strong>Output</strong></p>
<pre><code class="lang-bash">My name is Pylenin.
I am a Modern Data Architect.
Follow me on Twitter @pylenin.
My new website is www.100daysofdata.com.

My name is Pylenin.
I am a Modern Data Architect.
Follow me on Twitter @pylenin.
My new website is www.100daysofdata.com.
Hello Data community!
</code></pre>
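<p>With <code>r+</code> you can also re-read the file through the same file object, as long as you move the cursor back to the start first. Here is a minimal sketch of that idea, using a temporary copy of <code>my_essay.txt</code> with a single line in it.</p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_essay.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("My name is Pylenin.\n")

    with open(path, "r+", encoding="utf-8") as f:
        before = f.read()                   # the cursor is now at the end
        f.write("Hello Data community!\n")  # so this write appends
        f.seek(0)                           # move back to the start
        after = f.read()                    # re-read the whole file

print(before, end="")
print(after, end="")
```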
<hr />

<h2 id="heading-python-file-methods-with-examples">Python File Methods (with examples)</h2>
<p><br /></p>
<h3 id="heading-readline">readline()</h3>
<p>This method reads from the cursor position up to and including the next newline character, and returns that line as a string.</p>
<p><strong>Example</strong></p>
<pre><code class="lang-python3">with open("my_essay.txt",'r',encoding = 'utf-8') as f:
   lines = f.readline()
   print(lines)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">My name is Pylenin.
</code></pre>
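<p>Calling <code>readline()</code> repeatedly walks through the file one line at a time, and it returns an empty string once the end of the file is reached. The loop below is a sketch of that pattern on a temporary two-line file.</p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_essay.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("My name is Pylenin.\nI am a Modern Data Architect.\n")

    collected = []
    with open(path, "r", encoding="utf-8") as f:
        while True:
            line = f.readline()
            if line == "":  # an empty string signals the end of the file
                break
            collected.append(line.rstrip("\n"))

print(collected)
```

<p>In everyday code, looping over the file object directly with <code>for line in f:</code> is usually simpler and gives you the same lines.</p>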
<p><br /></p>
<h3 id="heading-readlines">readlines()</h3>
<p>This method reads all the remaining lines and returns them as a list of strings.</p>
<p><strong>Example</strong></p>
<pre><code class="lang-python3">with open("my_essay.txt",'r',encoding = 'utf-8') as f:
   lines = f.readlines()
   print(lines)
</code></pre>
<p><strong>Output</strong></p>
<pre><code class="lang-bash">[<span class="hljs-string">'My name is Pylenin.\n'</span>, <span class="hljs-string">'I am a Modern Data Architect.\n'</span>, <span class="hljs-string">'Follow me on Twitter @pylenin.\n'</span>, <span class="hljs-string">'My new website is www.100daysofdata.com.\n'</span>]
</code></pre>
<p><br /></p>
<h3 id="heading-writelines">writelines()</h3>
<p>This method writes a list of strings to a file. It does not add newline characters, so each string must carry its own <code>\n</code>.</p>
<p><strong>Example</strong></p>
<pre><code class="lang-python3">lines = ['My name is Pylenin.\n', 'I am a Modern Data Architect.\n', 'Follow me on Twitter @pylenin.\n', 'My new website is www.100daysofdata.com.\n']

with open("my_essay.txt",'w',encoding = 'utf-8') as f:
   f.writelines(lines)
</code></pre>
<p><strong>my_essay.txt</strong></p>
<pre><code class="lang-bash">My name is Pylenin.
I am a Modern Data Architect.
Follow me on Twitter @pylenin.
My new website is www.100daysofdata.com.
</code></pre>
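<p>If your strings don't end with <code>\n</code> yet, you can add it while writing. The sketch below does this with a generator expression. The <code>sentences</code> list and the temporary file are only there for illustration.</p>

```python
import os
import tempfile

sentences = ["My name is Pylenin.", "I am a Modern Data Architect."]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_essay.txt")

    # writelines() writes the strings back to back, so the newline
    # has to be added to each item explicitly.
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(sentence + "\n" for sentence in sentences)

    with open(path, "r", encoding="utf-8") as f:
        written = f.readlines()

print(written)
```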
<p><br /></p>
<h3 id="heading-seek-and-read">seek() and read()</h3>
<p>These are two interesting methods in Python. The <code>seek(position)</code> method moves the cursor to the specified position. The <code>read(size)</code> method reads up to <code>size</code> characters (or bytes, in binary mode) starting from the cursor position. If no argument is passed, it reads till the end of the file.</p>
<p><strong>Example</strong></p>
<pre><code class="lang-python3">with open("my_essay.txt",'r',encoding = 'utf-8') as f:
   f.seek(10)
   data = f.read()
   print(data)
</code></pre>
<p>The above code moves the cursor to offset 10, skipping the first 10 characters of the file. When you then call the <code>read()</code> method, you only get the rest of the data.</p>
<p><strong>Output</strong></p>
<pre><code class="lang-bash"> Pylenin.
I am a Modern Data Architect.
Follow me on Twitter @pylenin.
My new website is www.100daysofdata.com.
</code></pre>
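<p>You can also pass a size to <code>read()</code> and use <code>tell()</code> to remember a cursor position that you want to come back to later. This is a small sketch on a temporary one-line file. The variable names are made up for this example.</p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_essay.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("My name is Pylenin.\n")

    with open(path, "r", encoding="utf-8") as f:
        prefix = f.read(11)     # read the first 11 characters
        bookmark = f.tell()     # remember the current cursor position
        name = f.read(7)        # read the next 7 characters
        f.seek(bookmark)        # jump back to the remembered position
        name_again = f.read(7)  # the same 7 characters again

print(repr(prefix))
print(name, name_again)
```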
<p>For any doubts and queries, use the comments section or tweet to me <a target="_blank" href="https://twitter.com/pylenin">@pylenin</a>.</p>
]]></content:encoded></item></channel></rss>