100 days of Data

10 data analytics dashboards with Matplotlib

Lenin Mishra — Mon, 11 Jul 2022 05:43:35 GMT

If your data doesn't provide your business actionable insights, it's useless!

Anyone can show numbers and statistics on a graph. But what significance do these numbers have for the business? What makes those numbers interesting is the relevant story behind them. Every aspiring Data Analyst or Business Intelligence Developer needs to learn the art of storytelling.

https://twitter.com/pylenin/status/1546367353679536134

In this article, we will focus on 10 commonly used visualizations, or plots, built with Matplotlib in Python. These plots are not mere graphs! Each plot tells a story about a real-life scenario and corresponds to a common dashboard used by Data Analysts and management teams in various companies to derive actionable insights.

10 Data Analytics dashboard examples with Matplotlib

The 10 plots and problem scenarios discussed in this article are:

Line Chart - How has the news paradigm in India shifted over the last 10 years?
Stacked Area Chart - What are the total sales generated by an MNC across all its markets during the last year?
Bar Chart - What is the YoY (Year-on-Year) monthly sales comparison for a company?
Pie Chart - What was the approval % of a bill introduced in the winter session of Parliament?
Scatter Plot - How does the rent of a house vary with the house size?
Bubble chart - How deadly (fatality rate) and widespread (number of fatalities) is a particular disease?
Candlestick - How has Nifty 50 performed on the National Stock Exchange in the month of October?
Timeseries - What is the distribution of "Close" value of Nifty 50 for the last 1 year?
Histograms - What is the gender-wise distribution of students' height in a school?
Heatmap - What is the Monthly Recurring Revenue(MRR) retention of a company?

Attention! This article is only intended to show readers different concepts and tricks for plotting useful graphs in Python using the Matplotlib library. The data shown in the following graphs is not real and is not intended to depict the truth on the ground.

Line chart

A line chart displays information as a series of data points connected by straight line segments. It allows you to track changes in the value of an entity over time.

Line charts are useful to show trends of how a certain thing changes over a period. The below example uses line charts to show how the primary source of news has changed among Indians over the last decade.

Important points

No y-axis labels are shown in the graph - Use the ax.yaxis.set_visible() method.
The first and last data points for every news medium are shown - Use the ax.text() method.

Check out the code below to build this line chart.

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5, 4), constrained_layout=True)

# Sets y-axis visibility to False
ax.yaxis.set_visible(False)

xData = [
  [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021],
  [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021],
  [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021],
  [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021]
]
yData = [
  [74, 82, 80, 74, 73, 72, 74, 70, 70, 66, 66],
  [45, 42, 50, 46, 36, 36, 34, 35, 32, 31, 31],
  [13, 14, 20, 24, 20, 24, 24, 40, 35, 41, 43],
  [18, 21, 18, 21, 16, 14, 13, 18, 17, 16, 19]
]
labels = ['Television', 'Newspaper', 'Internet', 'Radio']
colors = ['#434343', '#737373', '#3182bd', '#bdbdbd']
font_style = dict(size=12, color='black')

for data in zip(xData, yData, labels, colors):
    ax.plot(data[0],
            data[1],
            label=data[2],
            color=data[3],
            linewidth=3)
    # Annotate first and last data point
    ax.text(data[0][0] - 0.3,
            data[1][0],
            str(data[1][0])+'%',
            **font_style)
    ax.text(data[0][-1],
            data[1][-1],
            str(data[1][-1])+'%',
            **font_style)

plt.legend(fontsize='x-large')
plt.title('Source of news in India for last 10 years',
          fontsize='x-large')
plt.ylabel('% of respondents', fontsize='x-large')
plt.show()

Stacked Area Chart

A stacked area chart displays the change in a KPI for different groups of a dataset. The groups are displayed on top of each other, making it easy to deduce not only the total value, but also the contribution of each group.

For example, an important analysis could be measuring and comparing a company's sales across all its marketing countries. In such scenarios, having a grid layout could be useful in figuring out the approximate sales numbers for each country.

To reproduce the above graph or create a similar graph, use the code below.

import numpy as np
import matplotlib.pyplot as plt

# Create data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
india_sales = [1, 4, 6, 8, 9, 7, 8, 5, 9, 11, 12, 13]
uk_sales = [2, 2, 7, 10, 12, 4, 8, 8, 10, 12, 14, 10]
usa_sales = [2, 8, 5, 10, 6, 10, 12, 7, 9, 8, 10, 13]
COLORS = ["#74A9CF", "#2B8CBE", "#045A8D"]

# Basic stacked area chart.
plt.stackplot(months, india_sales, uk_sales, usa_sales,
              colors=COLORS, labels=['India', 'UK', 'USA'])
plt.legend(loc='upper left', fontsize='x-large')
plt.grid(True)
plt.xlabel('Month 2020')
plt.ylabel('Sales(Million $)')
plt.title('Sales of an MNC in 3 countries')
plt.show()
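To see how the stacking works, it can help to compute the band boundaries by hand. The sketch below is a plain-Python illustration, not part of Matplotlib: the helper name stack_boundaries is hypothetical. It accumulates the three series from the example so that the top edge of the last band equals the monthly total across all countries.

```python
from itertools import accumulate

india_sales = [1, 4, 6, 8, 9, 7, 8, 5, 9, 11, 12, 13]
uk_sales = [2, 2, 7, 10, 12, 4, 8, 8, 10, 12, 14, 10]
usa_sales = [2, 8, 5, 10, 6, 10, 12, 7, 9, 8, 10, 13]


def stack_boundaries(*series):
    """Return the upper boundary of each stacked band, month by month."""
    # zip(*series) groups the values of one month across all countries;
    # accumulate() turns them into running totals (the band tops).
    per_month = [list(accumulate(values)) for values in zip(*series)]
    # Transpose back so that each row is the top edge of one band.
    return [list(band) for band in zip(*per_month)]


tops = stack_boundaries(india_sales, uk_sales, usa_sales)
monthly_totals = tops[-1]
print(monthly_totals)
```

The first band starts at zero, so its top edge is simply the India series, while each later band sits on the one below it.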

Bar chart

A bar chart shows the relationship between a numerical and a categorical variable. The categorical variable is represented as a bar. The size of the bar represents its numerical value.

The below example uses bar charts to compare Year-on-Year monthly sales of a company.

You can recreate the above graph using the code below. Pay attention to how the width of the bars is fixed in the code and how the x-axis labels are aligned to the centre.

import matplotlib.pyplot as plt
import numpy as np

months = ['Jan', 'Feb', 'Mar', 'Apr',
          'May', 'Jun', 'Jul', 'Aug',
          'Sep', 'Oct', 'Nov', 'Dec']
sales_2020 = [19, 14, 22, 14, 16, 19, 15, 14, 10, 12, 12, 16]
sales_2021 = [20, 14, 25, 16, 18, 22, 19, 15, 12, 16, 14, 17]

x = np.arange(len(months))  # the label locations
width = 0.35  # the width of the bars

# constrained_layout already handles the spacing,
# so no separate tight_layout() call is needed
fig, ax = plt.subplots(figsize=(5, 4), constrained_layout=True)
font_style = dict(size=12, color='black')

# Shift each series by half a bar width so the pair is centred on the tick
rects1 = ax.bar(x - width/2, sales_2021, width,
                label='Sales 2021', color='#3182BD')
rects2 = ax.bar(x + width/2, sales_2020, width,
                label='Sales 2020', color='#CCCCCC')

ax.set_xticks(x)
ax.set_xticklabels(months, fontsize='x-large')

plt.legend(fontsize='x-large')
plt.title('YOY monthly sales comparison', fontsize='x-large')
plt.ylabel('Total Sales(Million $)', fontsize='x-large')
plt.show()

Pie chart

A pie chart is a circle divided into wedges, one per category, each representing its value as a percentage of the whole. Pie charts are not the best choice if you need to read off the exact percentage of each entity, especially when you are plotting a lot of entities, but they do provide a general understanding of each entity's contribution to the whole.

The below graph provides an analysis of the approval % of a new bill introduced in the winter session of parliament.

Notice how the approval share has been exploded out for better clarity. To reproduce the above pie chart or create a similar plot, use the code below.

import matplotlib.pyplot as plt
import numpy as np

# pie chart parameters
ratios = [.27, .56, .17]
labels = ['Approve', 'Disapprove', 'Undecided']
explode = [0.1, 0, 0]

# rotate so that first wedge is split by the x-axis
angle = -180 * ratios[0]
plt.pie(ratios, autopct='%1.1f%%', startangle=angle,
        labels=labels, explode=explode)
plt.title("Bill Approval Stats for Parliament Winter session")
plt.show()
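The startangle arithmetic is easier to follow with numbers. As a rough sketch (plain Python, no plotting, with illustrative variable names): a wedge with ratio r spans 360 * r degrees, so starting it at -180 * r centres it on the positive x-axis, given that Matplotlib draws wedges counterclockwise by default.

```python
ratios = [.27, .56, .17]

# Each wedge spans its share of the full 360 degrees.
spans = [360 * r for r in ratios]

# Start the first wedge half of its span below the x-axis ...
start = -180 * ratios[0]
# ... so that it ends the same distance above the x-axis.
end = start + spans[0]

print(round(start, 1), round(end, 1))
```

The Approve wedge therefore runs from roughly -48.6 to 48.6 degrees, which is why it appears split evenly by the x-axis.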

Scatter Plot

A scatter plot shows the relationship between 2 numerical variables. You can use any kind of marker to create a scatter plot.

The below graph shows the relationship between the size of a house and its rent.

You can also see a straight line fitted through the points using linear regression. You can use the numpy.polyfit() function to compute the regression line. Use the code below to recreate the above plot.

import matplotlib.pyplot as plt
import numpy as np

# random number generator
seed = np.random.default_rng(1234)
x = seed.uniform(0, 10, size=100)
y = x + seed.normal(size=100)

# Initialize layout
fig, ax = plt.subplots(figsize = (9, 9))

# Add scatterplot
# Use the marker parameter to choose
# an appropriate marker
ax.scatter(x, y, s=60, alpha=0.7, edgecolors="k")

# Fit linear regression via least squares with numpy.polyfit
# It returns a slope (b) and intercept (a)
# deg=1 means linear fit
b, a = np.polyfit(x, y, deg=1)

# Create sequence of 100 numbers from 0 to 10
xseq = np.linspace(0, 10, num=100)

# Plot regression line
ax.plot(xseq, a + b * xseq, color="k", lw=2.5)

plt.title('Rent variation with number of rooms', fontsize='x-large')
plt.ylabel('Rent(k)', fontsize='x-large')
plt.xlabel('Number of rooms', fontsize='x-large')
plt.show()
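For a straight line, the least-squares fit has a closed form. The following plain-Python sketch computes the slope and intercept that minimise the squared error. The helper fit_line and the sample numbers are hypothetical and only for illustration; numpy.polyfit(x, y, deg=1) solves the same problem and returns the slope first, then the intercept.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up sizes and rents lying exactly on y = 2 + 3x
sizes = [1, 2, 3, 4, 5]
rents = [5, 8, 11, 14, 17]
slope, intercept = fit_line(sizes, rents)
print(slope, intercept)
```

Because the made-up points lie exactly on a line, the fit recovers that line; with noisy data such as the scatter above, the result is the line closest to the points in the squared-error sense.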

Bubble chart

A bubble chart is a kind of scatter plot in which the size of each bubble is determined by a third numerical variable. This shows the weight of that particular variable in the dataset.

The plot below compares the fatality rate(deadliness) vs the total number of fatalities for different diseases.

Above data is borrowed from Microbescope. Data Chef is not responsible for the authenticity of this data.

To recreate the above plot, use the code below.

import matplotlib.pyplot as plt
import numpy as np

# Diseases with their case fatality rates
# (Disease Name, Fatality Rate, Total Fatalities)
bacterial_diseases = [('Diphtheria', 7.5, 2600),
                      ('Meningitis', 45, 127000),
                      ('Syphilis', 33, 79000),
                      ('MRSA', 20, 11000)]
viral_diseases = [('Ebola', 50, 4555),
                  ('Bird Flu', 58, 20),
                  ('Dengue Fever', 22, 47000),
                  ('Hepatitis A', 1, 5200)]
parasite_diseases = [('Sleeping Sickness', 40, 2300),
                     ('Malaria', 1.5, 150000)]

fig, ax = plt.subplots(figsize=(10, 8))

ax.scatter(
    [x[1] for x in bacterial_diseases],
    [x[2] for x in bacterial_diseases],
    label='Bacteria',
    s=[x[1]*500 for x in bacterial_diseases],
    color='#7570B3', alpha=0.7
)
ax.scatter(
    [x[1] for x in viral_diseases],
    [x[2] for x in viral_diseases],
    label='Virus',
    s=[x[1]*500 for x in viral_diseases],
    color='#1B9E77', alpha=0.7
)
ax.scatter(
    [x[1] for x in parasite_diseases],
    [x[2] for x in parasite_diseases],
    s=[x[1]*500 for x in parasite_diseases],
    label='Parasite', color='#D95F02', alpha=0.7
)

# Label every bubble with the disease name
all_diseases = bacterial_diseases + viral_diseases + parasite_diseases
for data in all_diseases:
    disease, x, y = data
    plt.annotate(disease, (x, y))

ax.ticklabel_format(useOffset=False, style='plain', axis='y')
lgnd = plt.legend(loc="right", fontsize=10)

# Shrink the legend markers for all three groups.
# The attribute is legend_handles in Matplotlib 3.7+ and legendHandles in older releases.
handles = getattr(lgnd, 'legend_handles', None) or lgnd.legendHandles
for handle in handles:
    handle.set_sizes([30])

plt.title('Fatalities vs Fatality Rate for diseases', fontsize='x-large')
plt.ylabel('Total Fatalities', fontsize='x-large')
plt.xlabel('Fatality Rate', fontsize='x-large')
plt.show()

Candlestick

A candlestick is similar to a box plot. A candlestick shows the market's open, high, low, and close price for the day.

The below plot shows the daily statistics of NIFTY50 index for the month of October.

The data has been downloaded from NSE website and stored as october_2021_nse.csv.

To recreate the above graph or a similar graph, use the code below.

import matplotlib.pyplot as plt
from mplfinance.original_flavor import candlestick_ohlc
import pandas as pd
import matplotlib.dates as mpl_dates

plt.style.use('ggplot')

# Extracting Data for plotting
data = pd.read_csv('october_2021_nse.csv')
ohlc = data.loc[:, ['Date', 'Open', 'High', 'Low', 'Close']]
ohlc['Date'] = pd.to_datetime(ohlc['Date'])
ohlc['Date'] = ohlc['Date'].apply(mpl_dates.date2num)
ohlc = ohlc.astype(float)

# Creating Subplots
fig, ax = plt.subplots()
candlestick_ohlc(ax, ohlc.values, width=0.6, colorup='green', colordown='red', alpha=0.8)

# Setting labels & titles
ax.set_xlabel('Date')
ax.set_ylabel('Price')
fig.suptitle('October 2021 Candlestick Chart of NIFTY50')

# Formatting Date
date_format = mpl_dates.DateFormatter('%d-%m-%Y')
ax.xaxis.set_major_formatter(date_format)
fig.autofmt_xdate()
fig.tight_layout()
plt.show()
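How a candle is coloured follows from its open and close values. The sketch below is an illustration only: it uses made-up OHLC rows rather than NSE data, and the helper candle_colour is hypothetical. It mirrors the colorup and colordown arguments passed above, on the assumption that a close at or above the open counts as an "up" day.

```python
# (date, open, high, low, close) - made-up values for illustration
rows = [
    ("2021-10-01", 100.0, 105.0, 98.0, 104.0),
    ("2021-10-04", 104.0, 106.0, 99.0, 101.0),
    ("2021-10-05", 101.0, 103.0, 100.0, 101.0),
]


def candle_colour(open_price, close_price, colorup="green", colordown="red"):
    """Pick the body colour of a candle from its open and close."""
    return colorup if close_price >= open_price else colordown


colours = [candle_colour(o, c) for _, o, _, _, c in rows]
print(colours)
```

The high and low values do not affect the colour; they only set the length of the thin wick drawn above and below the candle body.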

Timeseries

Timeseries charts represent the evolution of a numeric value over time. They are used in the field of statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting etc.

The below plot shows the Close price of NIFTY50 index over the last one year period.

To create the above timeseries plot, use the code below.

import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.dates as mpl_dates

plt.style.use('ggplot')

# Extracting Data for plotting
data = pd.read_csv('last_year_nse.csv')
data["Date"] = pd.to_datetime(data["Date"])
date = data["Date"]
value = data["Close"]

fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(date, value)

plt.title("NSE Close price for the past 1 year", fontsize="x-large")
plt.xlabel("Date", fontsize="x-large")
plt.ylabel("Close Price on NSE(INR)", fontsize="x-large")
plt.show()

Histograms

A histogram shows the frequency distribution of a given variable, for example the distribution of the heights of students in a school. The values are split into bins, and each bin is represented as a bar.

The below histogram plot goes further to compare the height distribution between male and female students of a school.

To recreate the above plot or to create a similar one, use the code below.

import numpy as np
import matplotlib.pyplot as plt

total_samples = 300

# female dataset
muf, sigmaf = 155, 4
xf = np.random.normal(muf, sigmaf, total_samples).astype(int)

# the histogram of the data
n, bins, patches = plt.hist(xf, 20, facecolor='#ff6466', alpha=0.75, label='Female')

# male dataset
mum, sigmam = 168, 6
xm = np.random.normal(mum, sigmam, total_samples).astype(int)

# the histogram of the data
n, bins, patches = plt.hist(xm, 20, facecolor='#64c866', alpha=0.75, label='Male')

plt.xlabel('Height(cm)')
plt.ylabel('Number of students')
plt.title('Distribution of student heights of a school')
plt.grid(True)
plt.legend()
plt.show()
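To make the binning step concrete, here is a small plain-Python sketch of equal-width binning. The helper bin_counts and the sample heights are hypothetical, and the sketch assumes the values are not all identical. plt.hist does this work for you and, like this sketch, places a value equal to the maximum in the last bin.

```python
def bin_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value belongs to the last bin, not to a new one.
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Made-up heights in cm
heights = [150, 152, 155, 155, 158, 160, 161, 165, 170]
counts = bin_counts(heights, 4)
print(counts)
```

Each count becomes the height of one bar, so the counts always add up to the number of values plotted.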

Heatmap

A heatmap is a graphical representation of data where each value of a matrix is represented as a color. It shows magnitude of a KPI as color in two dimensions.

For example, the below heatmap shows the cohort analysis of Stripe Monthly Recurring Revenue(MRR) for a company. Each square represents the revenue retained in successive months from the starting month mentioned on the y axis.

To recreate a similar plot, use the code below.

import numpy as np
import matplotlib
import matplotlib.pyplot as plt

dates = ["Jan 2021", "Feb 2021", "Mar 2021", "Apr 2021",
         "May 2021", "June 2021", "July 2021"]
month_num = ["Month 1", "Month 2", "Month 3", "Month 4",
             "Month 5", "Month 6", "Month 7"]
stripe_cohort = np.array([[0.8, 0.76, 0.7, 0.5, 0.45, 0.43, 0.4],
                          [0.95, 0.9, 0.87, 0.83, 0.79, 0.78, 0.0],
                          [1, 0.93, 0.92, 0.9, 0.83, 0.0, 0.0],
                          [0.82, 0.77, 0.76, 0.7, 0.0, 0.0, 0.0],
                          [0.93, 0.9, 0.87, 0.0, 0.0, 0.0, 0.0],
                          [0.9, 0.88, 0.0, 0.0, 0.0, 0.0, 0.0],
                          [0.95, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])

fig, ax = plt.subplots()
im = ax.imshow(stripe_cohort, cmap="YlGnBu")

# Show all ticks
ax.set_xticks(np.arange(len(month_num)))
ax.set_yticks(np.arange(len(dates)))

# Show all labels
ax.set_xticklabels(month_num)
ax.set_yticklabels(dates)

# Rotate the tick labels and set their alignment.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
         rotation_mode="anchor")

# Loop over data dimensions and create text annotations.
for i in range(len(dates)):
    for j in range(len(month_num)):
        text = ax.text(j, i, stripe_cohort[i, j],
                       ha="center", va="center", color="w")

ax.set_title("Cohort Analysis of a company's revenue")
fig.tight_layout()
plt.show()

Conclusion

I hope this article has helped improve your plotting skills in Matplotlib. If you want to start out as a Business Intelligence Specialist or a Data Analyst, such visualization skills will help you a lot in progressing your career.

Remember - Numbers are boring. What makes them interesting is the story behind them.

How to invoke a Lambda function using an S3 event notification trigger?

Lenin Mishra — Thu, 23 Jun 2022 06:46:04 GMT

In this article, we will learn to invoke a Lambda function using an AWS Simple Storage Service (S3) event notification trigger.

To follow along with this article, you need an AWS account and some knowledge of the Python programming language.

You should also have a basic understanding of AWS Lambda and how it works. Check out this visual guide on 100 days of data.

https://www.100daysofdata.com/aws-lambda-for-beginners

You don't have to reinvent the wheel, unless it is for an educational purpose! AWS Lambda comes with the s3-get-object-python blueprint, a Lambda function that already has sample code and function configuration presets for a certain runtime.

Note - This blueprint's permissions are set to allow you to get objects from the S3 bucket. They don't let you write to the S3 bucket.

Step 1 - Create an S3 bucket

Open the Amazon S3 console and choose Create bucket.
Enter a unique and descriptive name for your bucket. For example - nse50, a bucket to store the top 50 performing stocks from the National Stock Exchange.
Next, you have to choose an AWS region. Note - Your Lambda function should be created in the same Region.
Choose Create bucket.

After creating the bucket, Amazon S3 opens the Buckets page, which displays a list of all buckets in your account in the current Region.

To upload a test object using the Amazon S3 console

On the Buckets page of the Amazon S3 console, choose the name of the bucket that you created.
On the Objects tab, choose Upload.
Drag a test file from your local machine to the Upload page.
Choose Upload.

Step 2 - Create a Lambda function

To create a Lambda function from a blueprint in the console

Go to the Lambda Functions page and choose Create function.
On the Create function page, choose Use a blueprint.
Choose s3-get-object-python for a Python function or s3-get-object for a Node.js function. Choose Configure.
Enter a function name of your choice. For example - s3_audit_function.
For Execution role, choose Create a new role from AWS policy templates and enter a role name of your choice. For example - s3_audit_role.
Under S3 trigger, choose the S3 bucket that you created previously.
When you configure an S3 trigger using the Lambda console, the console modifies your function's resource-based policy to allow Amazon S3 to invoke the function.
Choose Create function.

Open the above image in new tab for better viewing experience.

Pay attention to Step 5 of the above image. This shows that this Lambda function has read-only permissions. So you can read from S3, but you cannot write to it.

Step 3 - Testing the function

As mentioned earlier, this blueprint comes with its own sample code.

Before putting your code into production, you need to test it. AWS Lambda lets you configure different types of test events from different services to help you test your code.

On the Code tab, under Code source, choose the drop down arrow next to Test, and then choose Configure test events from the dropdown list.

In the Configure test event window, do the following:

Choose Create new test event.
From Event template, choose Amazon S3 Put (s3-put). This is similar to the event triggered in S3 when you upload a file.
For Event name, enter a name for the test event.
In the test event JSON, replace the S3 bucket name and object key with your bucket name and test file name. Your test event should look similar to the following:

(JSON does not allow comments. Replace nse50 with your bucket name and HappyFace.jpg with the name of your test file.)

{
  "Records": [
    {
      "eventVersion": "2.0",
      "eventSource": "aws:s3",
      "awsRegion": "us-west-2",
      "eventTime": "1970-01-01T00:00:00.000Z",
      "eventName": "ObjectCreated:Put",
      "userIdentity": {
        "principalId": "EXAMPLE"
      },
      "requestParameters": {
        "sourceIPAddress": "127.0.0.1"
      },
      "responseElements": {
        "x-amz-request-id": "EXAMPLE123456789",
        "x-amz-id-2": "EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH"
      },
      "s3": {
        "s3SchemaVersion": "1.0",
        "configurationId": "testConfigRule",
        "bucket": {
          "name": "nse50",
          "ownerIdentity": {
            "principalId": "EXAMPLE"
          },
          "arn": "arn:aws:s3:::example-bucket"
        },
        "object": {
          "key": "HappyFace.jpg",
          "size": 1024,
          "eTag": "0123456789abcdef0123456789abcdef",
          "sequencer": "0A1B2C3D4E5F678901"
        }
      }
    }
  ]
}

Choose Create.

To invoke the function with your test event, under Code source, choose Test. The Execution results tab displays the response, function logs, and request ID, similar to the following:

Test Event Name
put-event

Response
"image/jpeg"

Function Logs
START RequestId: ca820e7b-0e24-465a-97be-edce7f43ace2 Version: $LATEST
CONTENT TYPE: image/jpeg
END RequestId: ca820e7b-0e24-465a-97be-edce7f43ace2
REPORT RequestId: ca820e7b-0e24-465a-97be-edce7f43ace2    Duration: 141.62 ms    Billed Duration: 142 ms    Memory Size: 128 MB    Max Memory Used: 74 MB

Request ID
ca820e7b-0e24-465a-97be-edce7f43ace2
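The event passed to the handler has the Records layout shown in the test event. The following is a minimal sketch of how a handler might pull the bucket name and object key out of it. It is not the blueprint code itself: the helper name extract_s3_objects is hypothetical, and it makes no call to S3. Object keys arrive URL-encoded in S3 notifications, hence the unquote_plus call.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs for every record in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Keys are URL-encoded in the notification (spaces become '+').
        key = unquote_plus(s3_info["object"]["key"], encoding="utf-8")
        objects.append((bucket, key))
    return objects


# Trimmed-down version of the test event, with a key containing a space
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "nse50"},
                "object": {"key": "Happy+Face.jpg", "size": 1024}}}
    ]
}
print(extract_s3_objects(sample_event))
```

Looping over all records, rather than reading only the first one, keeps the sketch correct if a single event ever carries more than one record.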

Final result

Now, your lambda function will be invoked every time you upload a new file to your bucket.

On the Buckets page of the Amazon S3 console, choose the name of the source bucket that you created earlier.
On the Upload page, upload any file of your choice to the bucket.
Open the Functions page on the Lambda console.
Choose the name of your function (for example, s3_audit_function).
To verify that the function ran once for each file that you uploaded, choose the Monitor tab. This page shows graphs for the metrics that Lambda sends to AWS CloudWatch. The count in the Invocations graph should match the number of files that you uploaded to the Amazon S3 bucket.

A visual guide to AWS Lambda for beginners

Lenin Mishra — Sun, 19 Jun 2022 15:10:32 GMT

Amazon Web Services deserves all the credit for beginning the age of serverless computing with the launch of AWS Lambda in 2014. Since its launch, Lambda functions have found innumerable uses in file processing, ETL, data analytics, the deployment of web and mobile applications, and much more.

In this blog, you will learn to create and fire your first Lambda function. Conceptually, you will also learn about the basic configurations and the importance of event and context in a Lambda function, and get familiar with the AWS Lambda dashboard so that it no longer feels intimidating. All you need is an account on Amazon Web Services.

Benefits of using AWS Lambda

No servers to manage.
Better software development by decoupling architecture from code.
1 million free requests per month.
Testing feature that allows for code validation before putting it in production.

Your first Lambda function

Once you have logged in to AWS, this is what your AWS console should look like.

If you have used AWS Lambda before, it should appear in your Recently visited services. Otherwise, expand the All services dropdown and click on Lambda. This should take you to the AWS Lambda Console.

Once you are in the AWS Lambda console, click on the Create function button.

On the Create function page, you have to fill in some basic information about your Lambda function. For example:

Starting point for your function - For this article, we will create a function from scratch.
Name of your function - Remember to use only letters, numbers, hyphens, or underscores, with no spaces. Otherwise, you might see an error.
Runtime - Which programming language do you want to use to write your Lambda function? For this article, we will use Python 3.8.
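The naming rule can be checked before you ever open the console. This is a small sketch using a regular expression; the helper is_valid_function_name is hypothetical and only encodes the rule stated above (letters, numbers, hyphens, underscores), not any other limit AWS may enforce, such as a maximum length.

```python
import re

# Letters, digits, hyphens and underscores only; at least one character.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_function_name(name):
    """Check a candidate name against the rule described above."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_function_name("s3_audit_function"))
print(is_valid_function_name("my first function"))
```

The first name passes and the second fails because of the spaces.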

You will also see a few other things like Permissions, Change default execution role, and Advanced settings. Don't bother yourself with these settings for now. We will learn about them in future articles and videos.

Now press on Create Function. Within a few minutes, you should be redirected to your newly created Lambda function console.

Now scroll down to the code editor section.

Carefully notice the name of the .py file and the name of the python function defined in it.

Filename - lambda_function.py

Python Function name - lambda_handler

This is the handler method that is invoked by AWS Lambda, every time it is triggered. You can confirm this by scrolling further down to the Runtime settings section.

So if you were to change the name of the function or create a different file to hold your function, you need to update them in the Runtime settings.
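The handler setting joins those two names with a dot. As a rough illustration (the helper split_handler is hypothetical), the value lambda_function.lambda_handler breaks down like this:

```python
def split_handler(handler):
    """Split a handler setting into (file name, function name)."""
    # Everything before the last dot is the module, the rest is the function.
    module_name, _, function_name = handler.rpartition(".")
    return module_name + ".py", function_name


print(split_handler("lambda_function.lambda_handler"))
```

If you renamed the file to app.py and the function to main, the setting would need to become app.main.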

As you can see, there is already a placeholder code in our lambda_function.py inside the lambda_handler function. Let's just try and run it as it is. Press on the Test button.

You will be asked to Configure test event. A test event is a way of mimicking a real event that will act as an input to your Lambda function in the form of a JSON payload. Test events are a great way to validate your Lambda function before putting it into production. You can choose any event you want to mimic from the Event template dropdown list. In this article, we will use the simple hello-world template.

Replace all the key-value pairs in the JSON payload with some meaningful data. I am using my information in the payload. Then press on Create.

Once the test event is created, go ahead and press on Test again in the Lambda console. You should get a similar execution result.

Congratulations! You have deployed your first Lambda function. It may not look like much, but believe me, it is the first successful step into the world of Serverless!

Significance of event and context in Lambda function

Let us inspect the execution result closely.

You can see the Test event name, Response, Function logs, and Request ID. Each of these metrics provides some information about your lambda function.

Look at the Response metric. It is basically the JSON payload that the Lambda function returns.

Response
{
  "statusCode": 200,
  "body": "\"Hello from Lambda!\""
}

But it is a very generic response. Let's go ahead and personalize it. If you remember, we changed the values of the JSON payload in our configured test event. Let's try to use that test event. How do you think we can access that event?

The answer lies in the arguments passed into your lambda_handler function - event and context. Let us understand what they are.

When your Lambda function is invoked, event and context arguments are passed into the function. The event object contains the JSON payload that is to be processed by the function. Hence, the payload we created in our test event, is present in the event object. When passed into the function, the event is converted to a native data type - usually a dictionary(if you are using Python). However, it could also be list, str, int, float or the NoneType type.
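This conversion behaves like parsing JSON. The sketch below is a local illustration only: it uses the standard json module on a few made-up payloads to show which Python type each one becomes, on the assumption that this mirrors what the Lambda runtime does for you.

```python
import json

# Made-up payloads, written as the raw JSON text a caller might send
payloads = ['{"name": "Pylenin"}', '[1, 2, 3]', '"hello"', '42', '3.14', 'null']

for raw in payloads:
    event = json.loads(raw)
    print(raw, "->", type(event).__name__)

type_names = [type(json.loads(raw)).__name__ for raw in payloads]
```

In practice most triggers send a JSON object, which is why a handler can usually treat event as a dictionary.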

The second argument is context. This object provides methods and properties that provide information about the invocation, function, and runtime environment.

You can verify these two arguments by printing them inside the function.

import json

def lambda_handler(event, context):
    # Newly added print statements
    print(event)
    print(dir(context))
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

Since you have changed the contents of the file, you need to save it. Press on Deploy to save the changes in the file. Click on the Test button.

You should get a similar execution result.

As you can see, the event object is returning the key-value pairs that we had mentioned in our test event. For the context object, we can use the dir() function to return all the properties and functions of the object.

Let us now use the event object to personalize the result of our Lambda function. Change the code inside the lambda_function.py to the below-mentioned code.

def lambda_handler(event, context):
    name = event.get("name", None)
    twitter_handle = event.get("twitter", None)
    return f"Welcome to {name}. "\
           f"Join us on Twitter - {twitter_handle}"

Click on Deploy to deploy the changes and then click on Test. You should see a similar execution result.

The Response metric now contains the newly returned statement using the data passed in the test event.
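Because the handler is an ordinary Python function, you can also try it on your own machine before deploying. A minimal sketch, assuming a made-up test event and passing None for context since this handler never uses it:

```python
def lambda_handler(event, context):
    name = event.get("name", None)
    twitter_handle = event.get("twitter", None)
    return f"Welcome to {name}. "\
           f"Join us on Twitter - {twitter_handle}"


# Made-up event standing in for the configured test event
test_event = {"name": "100 days of data", "twitter": "@pylenin"}
result = lambda_handler(test_event, None)
print(result)
```

A missing key does not raise an error here, because event.get falls back to None; the returned text would then contain the word None.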

Hope you enjoyed this visual guide and taking your first step towards understanding AWS Lambda. In future articles, we will gradually explore other features of AWS Lambda and build some real-world applications to show you its advantages.

Our objective is to present important Data Engineering and Machine Learning concepts in a lucid format. For this purpose, your feedback is important! Do share what you like and dislike about the articles in the comment section.

Connecting Python to Google Sheets

Lenin Mishra — Thu, 02 Jun 2022 14:27:33 GMT

Google Sheets

Google Sheets is a free web-based spreadsheet application provided by Google that allows users to create, manage and format spreadsheets online. It also allows users to collaborate with other users.

Why connect Python to Google Sheets?

As the volume of data increases, Python proves much more powerful and practical for performing data analysis and machine learning on your Google Sheets data to provide actionable intelligence. It does that with the help of powerful libraries and a huge online support community.

It is easy to set up and perform data analysis in Google Sheets. However, as the volume of data increases, you want more computational flexibility and power.

Today, Python is the go-to programming language for data analysis and machine learning, thanks to third-party libraries like pandas, NumPy, scikit-learn and Matplotlib. The presence of an active online support community also helps.

It is easier to draw the power of those libraries and build actionable business intelligence around your data in Python.

How to connect Python to Google Sheets?

You can connect Python to Google Sheets by creating a service account in Google Cloud Console which allows you to make authorized API calls to the Google Sheets API.

Follow the steps below.

Create a new project in Google Cloud Console

Log in to Google Cloud Console in your browser.
Click on the Menu > IAM & Admin > Create a Project.
Provide a Project name and click Create.

Enable the Google Drive and Google Sheets APIs

Click on Menu > APIs & Services > Enabled APIs & Services.
Click on the + ENABLE APIS AND SERVICES button in the top middle of the page.
Search for the Google Drive API and click on it.
Enable the Google Drive API.
Search for the Google Sheets API and enable it.

Once you enable an API, you will be redirected to its page. To start using the Google Drive and Google Sheets APIs, you have to create credentials.

Create a Service account

Click on Create Credentials on the Google Drive API page.
On the Create Credentials page, fill in the necessary details and click on Done. You will be directed to the Service accounts page.
Provide a Service account name and description. Click on CREATE AND CONTINUE.
The email address of the service account will be shown on the screen. Copy that email address for later use.
On the same page, click on the Keys tab. Choose Add Key > Create new key. Click on JSON to create a private key.

Once the private key is downloaded, rename it to credentials.json for use later to perform OAuth2 authentication with Google APIs.

Create a Google Sheet to experiment with. Here I have created a to-do list.

Click on the Share button and share the spreadsheet with the service account email address you copied while creating the service account.

You can use the gspread third party library to interact with the sheet.

Reading and writing to Google Sheet with Python

Install the necessary libraries.

pip install gspread
pip install --upgrade google-api-python-client oauth2client

Create a new Python file. Import the following libraries into the file.

import gspread
import pandas as pd
from oauth2client.service_account import ServiceAccountCredentials

Copy the credentials.json into the same directory as your file and perform authorization.

# defining the scope of the application
scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive']

# credentials to the account
cred = ServiceAccountCredentials.from_json_keyfile_name('credentials.json', scope)

# authorize the client
client = gspread.authorize(cred)

Read the data from the spreadsheet using gspread get_all_values() method.

# Provide the Google Sheet Id
gs1 = client.open_by_key('18ZG9iEJN4c2SRdhWxo3o05ch6_TF4r2-7joAGtOeKG0')
ws1 = gs1.sheet1
print(ws1.get_all_values())

You should get all the rows from the sheet.

Output

[['\n To Do', '', '0/3 completed  '], ['', '', ''], ['', 'Date', 'Task'], ['FALSE', '2/6', 'Finish blog on connecting Python to Google Sheets'], ['FALSE', '2/7', 'Go book shopping'], ['FALSE', '2/8', 'Conduct meeting with Jesus']]
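The list of lists returned by get_all_values() is easier to work with once each task row becomes a dictionary. Below is a minimal sketch, assuming the sheet layout shown in the output above (column labels in the third row, tasks after it). The rows are pasted in by hand so the snippet runs without network access, and the names rows and tasks are only illustrative.

```python
# Rows copied from the sample output above; in a real script this would be
# rows = ws1.get_all_values()
rows = [
    ['\n To Do', '', '0/3 completed  '],
    ['', '', ''],
    ['', 'Date', 'Task'],
    ['FALSE', '2/6', 'Finish blog on connecting Python to Google Sheets'],
    ['FALSE', '2/7', 'Go book shopping'],
    ['FALSE', '2/8', 'Conduct meeting with Jesus'],
]

# The first three rows are title, spacer and column labels; tasks start at index 3.
tasks = [
    {"done": done == "TRUE", "date": date, "task": task}
    for done, date, task in rows[3:]
]

for t in tasks:
    status = "x" if t["done"] else " "
    print(f"[{status}] {t['date']}  {t['task']}")
```

Every cell arrives as a string, so the checkbox column has to be compared with the text "TRUE" rather than used as a boolean directly.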

You can also write to the Google Sheet. For example, the code below creates a new worksheet.

# create a new worksheet
new_ws = gs1.add_worksheet(title="Trial Worksheet", rows=10, cols=20)

I hope this article provides a clear step-by-step guide to connecting to Google Sheets from Python. In the next article, we will leverage the power of Python and perform some data analysis on the spreadsheet data.

Working with JSON data in Python

Lenin Mishra — Sat, 28 May 2022 16:27:03 GMT

What is JSON?

JSON stands for JavaScript Object Notation. It is a format for structuring data.

It is one of the most popular formats for exchanging information between servers and browsers.

Below is an example of JSON.

{
  "name": "Lenin",
  "age": 30,
  "twitter": "@pylenin",
  "website": "www.100daysofdata.com"
}

It looks similar to a Python dictionary. Data is represented as key-value pairs, where the key and value are separated by a colon :.

However, there is a fundamental difference between JSON and a dictionary. Dictionary is a data type whereas JSON is a data format.

If you want to send the dictionary data over a network connection as an HTTP request (see image above), it needs to be converted into a series of bytes. This is called serialization. It saves the state of the data so that it can be recreated when needed.

Similarly, converting the series of bytes you get as a response from the server back into a usable data type is called deserialization.

JSON is a set of rules used to convert such data types into a series of bytes and vice versa.

Python has a module called json that helps you work with JSON data.

What is JSON serialization?

As explained above, serialization is the process of encoding native data types to JSON format.

In Python, different data types convert to different JSON types when serialized.

Python object	JSON equivalent
dict	object
list, tuple	array
str	string
int, float	number
True	true
False	false
None	null
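The table can be checked with a short experiment. The sketch below serializes one value of each type with json.dumps() and decodes it again; the dictionary sample and its key names are made up for illustration.

```python
import json

# One value of each Python type from the table above.
sample = {
    "a_dict": {"k": "v"},
    "a_list": [1, 2],
    "a_tuple": (3, 4),
    "a_str": "text",
    "an_int": 7,
    "a_float": 1.5,
    "yes": True,
    "no": False,
    "nothing": None,
}

encoded = json.dumps(sample)
print(encoded)

# Decoding does not restore the tuple: JSON arrays always come back as lists.
decoded = json.loads(encoded)
print(type(decoded["a_tuple"]))
```

Note that the mapping is not symmetric. A tuple becomes a JSON array, and an array is always decoded as a list.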

The json module in Python has two methods for serializing Python objects into JSON format.

json.dump() - writes Python data type to a file-like object in JSON format.
json.dumps() - writes Python data to a string in JSON format.

Writing JSON to a file with json.dump()

To write JSON to a file, you can use the json.dump() method. You have to pass in two arguments to the method - the data you want to serialize and the file object you are writing into.

Example 1

import json

data = {
    "name": "Lenin Mishra",
    "age": 30,
    "hobby": ["Biking", "Blogging", "Cooking"],
    "websites": [
        {
            "url": "https://www.pylenin.com",
            "Total blogs": "88",
            "description": "Everything about Python"
        },
        {
            "url": "https://www.100daysofdata.com",
            "Total blogs": "3",
            "description": "Everything about Data"
        }
    ]
}

with open('details.json', 'w') as file:
    json.dump(data, file)

The above code will transform the dictionary object into a JSON string and write it to a file named details.json.

If details.json doesn't exist, the above code will create a new file with the same name. To learn more about file operations in Python, check out this article.

https://www.100daysofdata.com/python-file-io

Writing Python object to a JSON string with json.dumps()

To convert the same dictionary to just a string representation of JSON, you can use the json.dumps() method. Since you are writing to a string in memory, you just have to pass in the Python object as an argument.

Example 2

import json

data = {
    "name": "Lenin Mishra",
    "age": 30,
    "hobby": ["Biking", "Blogging", "Cooking"],
    "websites": [
        {
            "url": "https://www.pylenin.com",
            "Total blogs": "88",
            "description": "Everything about Python"
        },
        {
            "url": "https://www.100daysofdata.com",
            "Total blogs": "3",
            "description": "Everything about Data"
        }
    ]
}

data_string_json = json.dumps(data)
print(type(data_string_json))
print(data_string_json)

Output

<class 'str'>
{"name": "Lenin Mishra", "age": 30, "hobby": ["Biking", "Blogging", "Cooking"], "websites": [{"url": "https://www.pylenin.com", "Total blogs": "88", "description": "Everything about Python"}, {"url": "https://www.100daysofdata.com", "Total blogs": "3", "description": "Everything about Data"}]}

How to pretty print JSON in Python?

When you printed out the JSON string in the above example, the output must have looked messy. There are a few arguments you can use to make the JSON look prettier!

indent

The indent argument makes the JSON more readable, whether it is printed as a string or written to a file.

Example 3

import json

data = {
    "name": "Lenin Mishra",
    "age": 30,
    "hobby": ["Biking", "Blogging", "Cooking"],
    "websites": [
        {
            "url": "https://www.pylenin.com",
            "Total blogs": "88",
            "description": "Everything about Python"
        },
        {
            "url": "https://www.100daysofdata.com",
            "Total blogs": "3",
            "description": "Everything about Data"
        }
    ]
}

data_string_json = json.dumps(data, indent=4)
print(data_string_json)

Output

{
    "name": "Lenin Mishra",
    "age": 30,
    "hobby": [
        "Biking",
        "Blogging",
        "Cooking"
    ],
    "websites": [
        {
            "url": "https://www.pylenin.com",
            "Total blogs": "88",
            "description": "Everything about Python"
        },
        {
            "url": "https://www.100daysofdata.com",
            "Total blogs": "3",
            "description": "Everything about Data"
        }
    ]
}

You can pass in different values for the indent argument.

If indent is a non-negative integer or string, then JSON array elements and object members will be pretty-printed with that indent level.

sort_keys

If set to True, the sort_keys argument sorts the output JSON according to its keys.

Example 4

import json

data = {
    "name": "Lenin Mishra",
    "2022": "hello",
    "age": 30,
    "hobby": ["Biking", "Blogging", "Cooking"],
    "websites": [
        {
            "url": "https://www.pylenin.com",
            "Total blogs": "88",
            "description": "Everything about Python"
        },
        {
            "url": "https://www.100daysofdata.com",
            "Total blogs": "3",
            "description": "Everything about Data"
        }
    ]
}

data_string_json = json.dumps(data, indent=4, sort_keys=True)
print(data_string_json)

Output

{
    "2022": "hello",
    "age": 30,
    "hobby": [
        "Biking",
        "Blogging",
        "Cooking"
    ],
    "name": "Lenin Mishra",
    "websites": [
        {
            "Total blogs": "88",
            "description": "Everything about Python",
            "url": "https://www.pylenin.com"
        },
        {
            "Total blogs": "3",
            "description": "Everything about Data",
            "url": "https://www.100daysofdata.com"
        }
    ]
}

As you can see, the keys have been sorted in alphabetical order, with the key "2022" first because digits sort before letters.

Difference between json.dump() and json.dumps()

If you want to dump the JSON into a file, then you should use json.dump(). If you only need it as a string, then use json.dumps().

Tip to remember - If the method ends with an s, it converts to string.

What is JSON deserialization?

JSON deserialization is the process of decoding JSON data into a native data type in Python. Unless the data is something very simple, these methods will most likely return a Python dictionary or list containing the deserialized data.

The json module has two methods for deserializing JSON.

json.load() - loads JSON data from a file-like object.
json.loads() - loads JSON data from a string containing JSON-encoded data.

Parsing a JSON string in Python using json.loads()

To parse a JSON string and convert it to a Python dictionary, use the json.loads() method.

Example 1

import json

data = '{"name": "Lenin", "website": "100daysofdata.com", "age":30}'
json_dict = json.loads(data)
print(type(json_dict))
print(json_dict)

Output

<class 'dict'>
{'name': 'Lenin', 'website': '100daysofdata.com', 'age': 30}

If the data being deserialized is not a valid JSON document, a JSONDecodeError will be raised.
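When the JSON comes from outside your program, it is worth catching that error. The sketch below wraps json.loads() in a small helper; parse_or_none is a hypothetical name chosen for this example, not part of the json module.

```python
import json

def parse_or_none(text):
    """Return the decoded value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.msg, err.lineno and err.colno describe where decoding failed
        print(f"Invalid JSON: {err.msg} (line {err.lineno}, column {err.colno})")
        return None

good = parse_or_none('{"name": "Lenin"}')
bad = parse_or_none("{'name': 'Lenin'}")   # single quotes are not valid JSON
```

One caveat of this design is that the valid JSON document null also decodes to None, so use a different sentinel if you need to tell the two cases apart.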

Parsing a JSON file in Python using json.load()

You can use the json.load() method to read a file containing a JSON object.

Example 2

import json

file_name = 'details.json'
with open(file_name, 'r') as f:
    data = json.load(f)

print(type(data))

Output

<class 'dict'>

Difference between json.load() and json.loads()

As mentioned earlier, s stands for string.

json.load() is used to convert a JSON file into a dictionary, whereas json.loads() is used to convert a JSON string into a Python dictionary.

How to unpack JSON data in Python?

Deserialization helps to decode JSON values in Python. Once JSON is deserialized and converted to a dictionary, you can easily go through the keys and values of the dictionary using dict.items() and extract the necessary data.

Example 3

import json

file_name = 'details.json'
with open(file_name, 'r') as f:
    data = json.load(f)

for key, value in data.items():
    print(key, value)

Output

name Lenin Mishra
age 30
hobby ['Biking', 'Blogging', 'Cooking']
websites [{'url': 'https://www.pylenin.com', 'Total blogs': '88', 'description': 'Everything about Python'}, {'url': 'https://www.100daysofdata.com', 'Total blogs': '3', 'description': 'Everything about Data'}]
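Looping over dict.items() only reaches the top level. To pull values out of the nested lists and objects, you index step by step. Here is a minimal sketch using a shortened copy of the details.json structure, inlined as a string so the snippet is self-contained. The variable names are illustrative.

```python
import json

# Same structure as details.json, inlined so the snippet is self-contained.
raw = '''
{
  "name": "Lenin Mishra",
  "hobby": ["Biking", "Blogging", "Cooking"],
  "websites": [
    {"url": "https://www.pylenin.com", "Total blogs": "88"},
    {"url": "https://www.100daysofdata.com", "Total blogs": "3"}
  ]
}
'''
data = json.loads(raw)

first_hobby = data["hobby"][0]                       # index into a list
urls = [site["url"] for site in data["websites"]]    # one field from every object
total_blogs = sum(int(site["Total blogs"]) for site in data["websites"])
twitter = data.get("twitter", "not provided")        # key may be absent

print(first_hobby, urls, total_blogs, twitter)
```

Using dict.get() with a default avoids a KeyError when an optional key is missing from the document.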

Data analysis from a CSV file in Python

Lenin Mishra — Wed, 25 May 2022 05:30:00 GMT

What is a CSV file?

CSV stands for comma-separated values.

You might have come across this file format while downloading data from an Excel spreadsheet or a database. CSV files are convenient for storing tabular data.

It should be clear from the name that values in a CSV file are separated by a comma(by default).

Below is an example of CSV file containing information about a family.

my_family.csv

name,age,height(cm),weight(kg)
Lenin,30,188,90
Phil,42,178,76
Claire,40,165,54
Alex,18,140,46

Usually the first line in a CSV file is called the header, which identifies the column names. Every row after the header is a data record.

From the above example, you can see that each value (whether part of the header or a data record) is separated by a comma. This separator character is called a delimiter. A CSV file may use delimiters other than a comma.

Examples of other delimiters -

tab \t
colon :
semi colon ;
pipe |
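To see why the delimiter matters, here is a small sketch that parses pipe-separated text with Python's csv module, first with the matching delimiter and then with the default comma. The sample text is a shortened, made-up variant of the family data, read from an in-memory string instead of a file.

```python
import csv
import io

# The family data from above, but separated by pipes instead of commas.
pipe_text = "name|age\nLenin|30\nPhil|42\n"

rows = list(csv.reader(io.StringIO(pipe_text), delimiter="|"))
print(rows)

# With the default comma delimiter, each line is treated as one single field.
unsplit = list(csv.reader(io.StringIO(pipe_text)))
print(unsplit)
```

If your parsed rows contain only one long string each, a mismatched delimiter is the first thing to check.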

In this article, you will learn to work with CSV files using the csv module and the pandas library.

How to read CSV files using the csv module?

Reading from a CSV file is done with the csv.reader object. You can open the CSV file as a text file with Python's built-in open() function.

Example 1

import csv

with open('my_family.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Header row - {", ".join(row)}')
            line_count += 1
        else:
            print(f'{row[0]} is {row[1]} years old, {row[2]} cm tall and {row[3]} kg heavy')
            line_count += 1
    print(f'Total: {line_count} lines')

Output

Header row - name, age, height(cm), weight(kg)
Lenin is 30 years old, 188 cm tall and 90 kg heavy
Phil is 42 years old, 178 cm tall and 76 kg heavy
Claire is 40 years old, 165 cm tall and 54 kg heavy
Alex is 18 years old, 140 cm tall and 46 kg heavy
Total: 5 lines

Since the first row is the header row(line_count will be 0), it is treated differently. You can also skip the header row while reading the CSV.

How to skip the header row in CSV with Python?

Since the csv.reader object is an iterator, you can call next(reader_object, None) to consume the header row and skip over it.

Example 2

import csv

with open('my_family.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    next(csv_reader, None)  # ignore the header
    for row in csv_reader:
        print(f'{row[0]} is {row[1]} years old, {row[2]} cm tall and {row[3]} kg heavy')
        line_count += 1
    print(f'Total: {line_count} lines')

Output

Lenin is 30 years old, 188 cm tall and 90 kg heavy
Phil is 42 years old, 178 cm tall and 76 kg heavy
Claire is 40 years old, 165 cm tall and 54 kg heavy
Alex is 18 years old, 140 cm tall and 46 kg heavy
Total: 4 lines

Reading CSV files as a dictionary

You can read the CSV file as a dictionary by using the csv.DictReader object.

An advantage of using the DictReader object is that it turns each row into a dictionary, which makes accessing the fields a little easier.

Example 3

import csv

with open('my_family.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    for row in csv_reader:
        print(f'{row["name"]} is {row["age"]} years old, {row["height(cm)"]} cm tall and {row["weight(kg)"]} kg heavy')
    print(f'Total: {csv_reader.line_num} lines')

The csv_reader.line_num attribute holds the number of lines read from the file so far. Once the loop has finished, that is the total number of lines in the CSV file, including the header.

For the csv.DictReader object, Python uses the column names from the header row as keys. The rows yielded by the csv.DictReader object do not include the header row.
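The behaviour of the header row and of line_num can be checked on a tiny in-memory file. This is a sketch with made-up sample text; io.StringIO stands in for an opened file.

```python
import csv
import io

csv_text = "name,age\nLenin,30\nPhil,42\n"
reader = csv.DictReader(io.StringIO(csv_text))

first = next(reader)
print(first)               # the row as a dict keyed by the header names
print(reader.fieldnames)   # the header row is kept here, not yielded as a row
lines_after_first = reader.line_num   # header + first record have been read

remaining = list(reader)
lines_at_end = reader.line_num        # every line of the input has been read
```

Because line_num counts physical lines, it includes the header and can be larger than the number of records.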

How to write to CSV files using the csv module?

You can write to a CSV file using the csv.writer object. Be careful to open the file in writing mode.

Example 1

import csv

header = ['Name', 'Age', 'Height(cm)', 'Weight(kg)']
data = [
    ['Phil', 42, 178, 76],
    ['Alex', 18, 140, 46],
    ['Claire', 40, 165, 54]
]
filename = "my_family.csv"

# newline='' prevents extra blank lines between rows on Windows
with open(filename, 'w', newline='') as output:
    csvwriter = csv.writer(output)
    # Write a single list
    csvwriter.writerow(header)
    # Writing a list of lists
    csvwriter.writerows(data)

Output

Name,Age,Height(cm),Weight(kg)
Phil,42,178,76
Alex,18,140,46
Claire,40,165,54

The writerow method writes a list of values into a single row, whereas writerows writes multiple rows from an iterable that contains one or more lists.

Using the delimiter parameter

Notice that no delimiter has been mentioned while creating the csv.writer object. In such cases, comma , is used as the default delimiter. You can also use a different delimiter by passing the delimiter parameter.

Example 2

import csv

header = ['Name', 'Age', 'Height(cm)', 'Weight(kg)']
data = [
    ['Phil', 42, 178, 76],
    ['Alex', 18, 140, 46],
    ['Claire', 40, 165, 54]
]
filename = "my_family.csv"

with open(filename, 'w', newline='') as output:
    csvwriter = csv.writer(output, delimiter='|')
    # Write a single list
    csvwriter.writerow(header)
    # Writing a list of lists
    csvwriter.writerows(data)

Output

Name|Age|Height(cm)|Weight(kg)
Phil|42|178|76
Alex|18|140|46
Claire|40|165|54

Writing a dictionary to a CSV file

You can write dictionaries into a CSV file using the csv.DictWriter class. The fieldnames parameter is required and supplies the header information.

Example 3

import csv

header = ['Name', 'Age', 'Height(cm)', 'Weight(kg)']
data = [
    {"Name": "Phil", "Age": 42, "Height(cm)": 178, "Weight(kg)": 76},
    {"Name": "Claire", "Age": 40, "Height(cm)": 165, "Weight(kg)": 54},
    {"Name": "Alex", "Age": 18, "Height(cm)": 140, "Weight(kg)": 46}
]
filename = "my_family.csv"

with open(filename, 'w', newline='') as output:
    csvwriter = csv.DictWriter(output, fieldnames=header)
    csvwriter.writeheader()
    for row in data:
        csvwriter.writerow(row)

Output

Name,Age,Height(cm),Weight(kg)
Phil,42,178,76
Claire,40,165,54
Alex,18,140,46

You can also use writerows to write all the dictionaries to the CSV file at once.

Example 4

import csv

header = ['Name', 'Age', 'Height(cm)', 'Weight(kg)']
data = [
    {"Name": "Phil", "Age": 42, "Height(cm)": 178, "Weight(kg)": 76},
    {"Name": "Claire", "Age": 40, "Height(cm)": 165, "Weight(kg)": 54},
    {"Name": "Alex", "Age": 18, "Height(cm)": 140, "Weight(kg)": 46}
]
filename = "my_family.csv"

with open(filename, 'w', newline='') as output:
    csvwriter = csv.DictWriter(output, fieldnames=header)
    csvwriter.writeheader()
    csvwriter.writerows(data)

Output

Name,Age,Height(cm),Weight(kg)
Phil,42,178,76
Claire,40,165,54
Alex,18,140,46

Playing with additional parameters in csv module

quotechar

It refers to the one-character string that is used to quote values when special characters or delimiters appear inside a field. It defaults to ".

For example, suppose the delimiter of your CSV file is a comma and you have an address column that may have commas in its values. Check out the example below.

my_family.csv

Name,Age,Height(cm),Weight(kg),Address
Phil,42,178,76,'Gryffindor room, Hogwarts'
Claire,40,165,54,'Snapes room, Hogwarts'
Alex,18,140,46,'4 Private Drive, Little Whinging'

The above CSV file uses single quotes to enclose the address field of each data record. You can pass this character as the quotechar value.

Example 1

import csv

filename = "my_family.csv"
with open(filename, 'r') as csv_file:
    csvreader = csv.reader(csv_file, quotechar="'")
    for row in csvreader:
        print(row)

Output

['Name', 'Age', 'Height(cm)', 'Weight(kg)', 'Address']
['Phil', '42', '178', '76', 'Gryffindor room, Hogwarts']
['Claire', '40', '165', '54', 'Snapes room, Hogwarts']
['Alex', '18', '140', '46', '4 Private Drive, Little Whinging']

quoting

The quoting argument controls when quotes should be generated by the writer or recognized by the reader. The commonly used values are:

csv.QUOTE_MINIMAL - It adds quotes only when required, for example when a field contains the delimiter or the quotechar (default).
csv.QUOTE_ALL - It quotes every field regardless of its type.
csv.QUOTE_NONNUMERIC - It quotes every field except integers and floats.
csv.QUOTE_NONE - It does not quote anything on output. While reading, quote characters are not treated specially and stay in the field values.
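The difference between the first three modes is easiest to see side by side. The sketch below writes the same row with each mode into an in-memory buffer; write_row is a hypothetical helper defined only for this comparison, and lineterminator="\n" is set just to keep the printed output tidy.

```python
import csv
import io

row = ["Phil", 42, "Gryffindor room, Hogwarts"]

def write_row(quoting):
    """Write one row with the given quoting mode and return the text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=quoting, lineterminator="\n")
    writer.writerow(row)
    return buffer.getvalue()

minimal = write_row(csv.QUOTE_MINIMAL)
everything = write_row(csv.QUOTE_ALL)
non_numeric = write_row(csv.QUOTE_NONNUMERIC)

print(minimal, everything, non_numeric, sep="")
```

With QUOTE_MINIMAL only the address is quoted, because it is the only field that contains the delimiter.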

Example 1

import csv

filename = "my_family.csv"
header = ['Name', 'Age', 'Height(cm)', 'Weight(kg)', 'Address']
data = [
    ['Phil', 42, 178, 76, 'Gryffindor room, Hogwarts'],
    ['Claire', 40, 165, 54, 'Snapes room, Hogwarts'],
    ['Alex', 18, 140, 46, '4 Private Drive, Little Whinging']
]

with open(filename, 'w', newline='') as output:
    csvwriter = csv.writer(output, quotechar="'", quoting=csv.QUOTE_ALL)
    csvwriter.writerow(header)
    csvwriter.writerows(data)

The above code uses csv.QUOTE_ALL as the quoting argument. This ensures that every field is wrapped in single quotes when it is written to the CSV.

my_family.csv

'Name','Age','Height(cm)','Weight(kg)','Address'
'Phil','42','178','76','Gryffindor room, Hogwarts'
'Claire','40','165','54','Snapes room, Hogwarts'
'Alex','18','140','46','4 Private Drive, Little Whinging'

escapechar

Let's say, you don't want any quotation in your CSV file while executing the above code. So you use csv.QUOTE_NONE as the quoting argument.

Example 1

import csv

filename = "my_family.csv"
header = ['Name', 'Age', 'Height(cm)', 'Weight(kg)', 'Address']
data = [
    ['Phil', 42, 178, 76, 'Gryffindor room, Hogwarts'],
    ['Claire', 40, 165, 54, 'Snapes room, Hogwarts'],
    ['Alex', 18, 140, 46, '4 Private Drive, Little Whinging']
]

with open(filename, 'w', newline='') as output:
    csvwriter = csv.writer(output, quotechar="'", quoting=csv.QUOTE_NONE)
    csvwriter.writerow(header)
    csvwriter.writerows(data)

The above code will throw an error.

Output

Traceback (most recent call last):
  File "main.py", line 16, in <module>
    csvwriter.writerows(data)
_csv.Error: need to escape, but no escapechar set

The problem is that the address field contains commas. Since the quoting argument is set to csv.QUOTE_NONE, the csv module doesn't know how to escape the commas properly.

For this purpose, you can use the escapechar argument. It takes a one-character string that is used to escape the delimiter when quoting is turned off.

The below code escapes the comma using a backslash \.

Example 2

import csv

filename = "my_family.csv"
header = ['Name', 'Age', 'Height(cm)', 'Weight(kg)', 'Address']
data = [
    ['Phil', 42, 178, 76, 'Gryffindor room, Hogwarts'],
    ['Claire', 40, 165, 54, 'Snapes room, Hogwarts'],
    ['Alex', 18, 140, 46, '4 Private Drive, Little Whinging']
]

with open(filename, 'w', newline='') as output:
    csvwriter = csv.writer(output, quotechar="'", quoting=csv.QUOTE_NONE, escapechar='\\')
    csvwriter.writerow(header)
    csvwriter.writerows(data)

my_family.csv

Name,Age,Height(cm),Weight(kg),Address
Phil,42,178,76,Gryffindor room\, Hogwarts
Claire,40,165,54,Snapes room\, Hogwarts
Alex,18,140,46,4 Private Drive\, Little Whinging

Notice how the commas have been escaped with a backslash \ and no error is thrown.
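A file written this way has to be read back with the same escapechar, otherwise the escaped comma splits the address into two fields. Here is a minimal round-trip sketch using an in-memory buffer instead of my_family.csv; the single sample row is taken from the data above.

```python
import csv
import io

row = ["Phil", "42", "Gryffindor room, Hogwarts"]

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_NONE, escapechar="\\",
                    lineterminator="\n")
writer.writerow(row)
written = buffer.getvalue()
print(written)

# Read it back with the same settings so the backslash is interpreted
# as an escape instead of being kept as data.
buffer.seek(0)
reader = csv.reader(buffer, quoting=csv.QUOTE_NONE, escapechar="\\")
read_back = next(reader)
print(read_back)
```

The reader removes the backslash and keeps the comma as part of the address, so the row comes back exactly as it was written.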

skipinitialspace

It skips the space following the delimiter. If True, the initial white spaces will be removed. It defaults to False.

my_family.csv

Name, Age, Height(cm), Weight(kg), Address
Phil, 42, 178, 76, 'Gryffindor room, Hogwarts'
Claire, 40, 165, 54, 'Snapes room, Hogwarts'
Alex, 18, 140, 46, '4 Private Drive, Little Whinging'

The above CSV file has spaces after every delimiter. If you read it without the skipinitialspace argument, there will be white spaces in your data points.

Example 1

import csv

with open('my_family.csv', 'r') as f:
    csv_reader = csv.reader(f, quotechar="'")
    for line in csv_reader:
        print(line)

Output

['Name', ' Age', ' Height(cm)', ' Weight(kg)', ' Address']
['Phil', ' 42', ' 178', ' 76', " 'Gryffindor room", " Hogwarts'"]
['Claire', ' 40', ' 165', ' 54', " 'Snapes room", " Hogwarts'"]
['Alex', ' 18', ' 140', ' 46', " '4 Private Drive", " Little Whinging'"]

To get rid of the whitespaces, set the skipinitialspace argument to True.

Example 2

import csv

with open('my_family.csv', 'r') as f:
    csv_reader = csv.reader(f, quotechar="'", skipinitialspace=True)
    for line in csv_reader:
        print(line)

Output

['Name', 'Age', 'Height(cm)', 'Weight(kg)', 'Address']
['Phil', '42', '178', '76', 'Gryffindor room, Hogwarts']
['Claire', '40', '165', '54', 'Snapes room, Hogwarts']
['Alex', '18', '140', '46', '4 Private Drive, Little Whinging']

How to read CSV files using the pandas module?

Reading CSV files into a pandas DataFrame is very straightforward. A pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array, or a table with rows and columns.

Example 1

import pandas as pd

df = pd.read_csv('my_family.csv')
print(df)

Output

     Name  Age  Height(cm)  Weight(kg)
0    Phil   42         178          76
1  Claire   40         165          54
2    Alex   18         140          46

Notice the following points:-

It used the first line of CSV as column names automatically.
It displays the CSV data like a spreadsheet, thus making it easy to perform data analysis.
Pandas automatically converted the datatype for Age, Height(cm) and Weight(kg) columns to integer.

Example 2

import pandas as pd

df = pd.read_csv('my_family.csv')
print(type(df['Age'][0]))
print(type(df['Height(cm)'][0]))
print(type(df['Weight(kg)'][0]))

Output

<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>

Pandas trick to deal with CSVs without header

If your CSV is missing the header row, use the names argument of the pd.read_csv() function.

my_family.csv

Phil,42,178,76
Claire,40,165,54
Alex,18,140,46

Example 3

import pandas as pd

df = pd.read_csv('my_family.csv',
                 index_col='Name',
                 names=['Name', 'Age', 'Height(cm)', 'Weight(kg)'])
print(df)

Output

        Age  Height(cm)  Weight(kg)
Name
Phil     42         178          76
Claire   40         165          54
Alex     18         140          46

How to write to CSV files using the pandas module?

To write a pandas DataFrame to a CSV file, use the df.to_csv() method.

Example 1

import pandas as pd

df = pd.read_csv('my_family.csv',
                 index_col='Name',
                 names=['Name', 'Age', 'Height(cm)', 'Weight(kg)'])
df.to_csv('my_new_family.csv')

my_new_family.csv

Name,Age,Height(cm),Weight(kg)
Phil,42,178,76
Claire,40,165,54
Alex,18,140,46

Data Analyst Project: Analyze Titanic data from Kaggle

The famous Titanic challenge by Kaggle is to build a machine learning model that predicts which passengers survived the Titanic shipwreck.

However, in this section you are going to do simple data analysis on the train.csv file and figure out the answers to the following questions:-

How many male and female passengers were onboard the Titanic?
How many male and female members survived the Titanic shipwreck?
What is the median age of each sex?

Male to Female ratio on the Titanic

import pandas as pd

# load the csv file
df = pd.read_csv('train.csv')

# Column Names
print(df.columns)

# Count unique values in Sex column
print(df['Sex'].value_counts())

# Percentage of male and female passengers
print(df['Sex'].value_counts(normalize=True))

Output

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')
male      577
female    314
Name: Sex, dtype: int64
male      0.647587
female    0.352413
Name: Sex, dtype: float64

The above analysis shows that 65% of the passengers in train.csv were male and 35% were female.

Surviving male to female ratio on the Titanic

import pandas as pd

# load the csv file
df = pd.read_csv('train.csv')

# Column Names
print(df.columns)

# Count male and female survivors
print(df[df["Survived"] == 1]['Sex'].value_counts())

# Percentage of surviving male and female passengers
print(df[df["Survived"] == 1]['Sex'].value_counts(normalize=True))

Output

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')
female    233
male      109
Name: Sex, dtype: int64
female    0.681287
male      0.318713
Name: Sex, dtype: float64

In the above code, you first filter the DataFrame for surviving passengers and then use the value_counts() method to count the male and female survivors.

The above analysis shows that 68% of the surviving passengers in train.csv were female.

Median age of each sex

import pandas as pd

# load the csv file
df = pd.read_csv('train.csv')

# median age of each sex
median_age_men = df[df['Sex'] == 'male']['Age'].median()
median_age_women = df[df['Sex'] == 'female']['Age'].median()

print(f"The median age of men is {median_age_men}")
print(f"The median age of women is {median_age_women}")

Output

The median age of men is 29.0
The median age of women is 27.0

The above analysis shows that the median age of male passengers was 29, whereas the median age of female passengers was 27.
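If you want to convince yourself what the pandas one-liners are doing, the same three questions can be answered with only the standard library. The sketch below is not the pandas approach used above but a plain-Python equivalent, and its five rows are made up for illustration. They are not taken from train.csv.

```python
import csv
import io
from collections import Counter
from statistics import median

# Made-up rows for illustration only; they are NOT taken from train.csv.
sample = """Sex,Survived,Age
male,0,22
female,1,38
female,1,26
male,1,35
male,0,
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Equivalent of df['Sex'].value_counts()
by_sex = Counter(r["Sex"] for r in rows)

# Equivalent of df[df["Survived"] == 1]['Sex'].value_counts()
survivors_by_sex = Counter(r["Sex"] for r in rows if r["Survived"] == "1")

# Equivalent of the per-sex median; rows with a missing age are skipped,
# just as pandas ignores NaN values when computing a median.
male_ages = [float(r["Age"]) for r in rows if r["Sex"] == "male" and r["Age"]]
median_age_men = median(male_ages)

print(by_sex, survivors_by_sex, median_age_men)
```

Filtering first and counting second is the same two-step pattern as df[df["Survived"] == 1]['Sex'].value_counts().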

Python File I/O

Lenin Mishra — Mon, 23 May 2022 04:30:00 GMT

Introduction to Python I/O

I/O stands for Input/Output.

There are some commonly used built-in functions in Python, like input() and print(), that help with input and output operations respectively.

The input() function reads user input from standard input (sys.stdin), and the print() function sends data to standard output (sys.stdout), which is usually the display.

Input operation with input()

To read input from the user, you can use the built-in input() function.

Note - All code examples were tested with Python 3.8.2.

Example 1

user_name = input('Enter your name--> ')
print(f"User name is {user_name}")

Output

Enter your name--> Pylenin
User name is Pylenin

The input() function reads a line from input, converts it to a string (stripping a trailing newline), and returns the string.

Example 2

value = input("Enter anything==> ")
print(type(value))
print("Input received from the user is: ", value)

Output

Enter anything==> [1, 2]
<class 'str'>
Input received from the user is:  [1, 2]
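Because input() always returns a string, numbers have to be converted explicitly. Below is a minimal sketch of a safe conversion; to_int is a hypothetical helper name, and the raw strings are passed in directly so the snippet runs without a keyboard.

```python
def to_int(raw):
    """Convert text typed by a user to an int, or return None if it is not a whole number."""
    try:
        return int(raw.strip())
    except ValueError:
        return None

# In an interactive script the argument would come from input(), e.g.
# age = to_int(input("Enter your age--> "))
print(to_int(" 30 "))
print(to_int("thirty"))
```

Returning None for bad input lets the calling code decide whether to ask again or fall back to a default.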

To learn more about the input() function, check out this article written by Pylenin.

https://www.pylenin.com/blogs/how-input-works-in-python/

Output operation with print()

The print() function provides an interface to the standard output(sys.stdout) object. When you use print, you are asking your Python interpreter to echo your message to standard output.

Example 1

print('Python', 3, 'Rocks', sep='|')

Output

Python|3|Rocks
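Besides sep, print() also accepts end and file arguments. The sketch below redirects the output into an in-memory buffer to show that print() simply writes text to a stream, which is sys.stdout by default. The names buffer and captured are illustrative.

```python
import io
import sys

buffer = io.StringIO()

# file= redirects the output; by default it is sys.stdout
print("Python", 3, "Rocks", sep="|", end="!\n", file=buffer)
captured = buffer.getvalue()

# print() ultimately calls write() on the target stream, so this line
# produces the same text on the screen.
sys.stdout.write(captured)
```

The same file argument works with any object that has a write() method, including files opened with open().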

Check out this deep-dive article on print() to learn more about it.

https://www.pylenin.com/blogs/python-print/

How to open and read a file in Python?

When you want to read from or write to a file, you need to open it first. Once you have performed the necessary operations, it needs to be closed so that the changes are saved properly.

Hence, in Python, a file operation takes place in the following order:

Open a file
Perform your operation
Close the file

To open a file in Python, use Python's built-in function open() and specify the mode, which represents the purpose for opening the file.

Below is an example of a program that opens a csv file in reading mode

Example 1

for data in open('test.csv', 'r', encoding='utf-8'):
    print(data)

Output

name, age

Pylenin, 2

Hello, 4

Note - The reading mode is also the default mode. So if you don't specify any mode, Python treats it as a reading mode.

Below is the table showing the other useful modes.

Mode	Description
r	Opens a file in reading mode.
w	Opens a file for writing. Creates a new file if it does not exist or truncates the file if it exists.
a	Opens the file in append mode. The file pointer is placed at the end of the file if it already exists. It creates a new file if no file exists with the same name.
b	Opens a file in binary mode.
r+	Opens a file for both reading and writing.
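The difference between the w and a modes is easy to verify. Here is a small sketch that works inside a temporary directory so that no real file is touched; the file name modes.txt and the variable names are made up for the example.

```python
import os
import tempfile

# Work in a temporary directory so no real file is touched.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "modes.txt")

    with open(path, "w", encoding="utf-8") as f:   # creates the file
        f.write("first\n")

    with open(path, "a", encoding="utf-8") as f:   # keeps "first", adds to the end
        f.write("second\n")

    with open(path, "r", encoding="utf-8") as f:
        after_append = f.read()

    with open(path, "w", encoding="utf-8") as f:   # truncates: old content is gone
        f.write("third\n")

    with open(path, "r", encoding="utf-8") as f:
        after_rewrite = f.read()

print(repr(after_append))
print(repr(after_rewrite))
```

After the second w open, only the newly written text remains, which is why w should be used with care on existing files.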

Do you need to use encoding with open() method?

Normally, files are opened in text mode, that means, you read and write strings from and to the file, which are encoded in a specific encoding. Because UTF-8 is the modern de-facto standard, encoding='utf-8' is recommended unless you know that you need to use a different encoding.

Appending a b to the mode opens the file in binary mode. Binary mode data is read and written as bytes objects. You cannot specify an encoding when opening a file in binary mode in Python.

Example 2

for data in open('test.csv', 'rb'):  # binary mode
    print(data)

Output

b'name, age\n'
b'Pylenin, 2\n'
b'Hello, 4'
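In binary mode, turning bytes into text is your job. The sketch below writes a small file containing a non-ASCII character into a temporary directory, reads it back as bytes, decodes it manually, and confirms that open() rejects an encoding argument in binary mode. The file content is made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "test.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("name, age\nJos\u00e9, 2\n")

    with open(path, "rb") as f:          # binary mode returns bytes
        raw = f.read()
    text = raw.decode("utf-8")           # decoding is done by hand in binary mode

    try:
        open(path, "rb", encoding="utf-8")
        encoding_allowed = True
    except ValueError:
        # binary mode does not accept an encoding argument
        encoding_allowed = False

print(raw)
print(len(raw), len(text), encoding_allowed)
```

The byte string is one element longer than the decoded text because the accented character takes two bytes in UTF-8.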

How to write to a file in Python?

To only write(not append) into a file in Python, you have to open the file in write w mode.

Be careful with the w mode, as it overwrites the file if it already exists and all the previous data is erased.

If the file doesn't exist, a new file is created.

Example 1

with open("my_essay.txt", 'w', encoding='utf-8') as f:
    f.write("My name is Pylenin.")
    f.write("I am a Modern Data Architect.")
    f.write("Follow me on Twitter @pylenin.")

The above code will create a file called my_essay.txt with all the sentences written on the same line.

my_essay.txt

My name is Pylenin.I am a Modern Data Architect.Follow me on Twitter @pylenin.

To write each line to a new line, use a \n at the end of each line.

Example 2

with open("my_essay.txt", 'w', encoding='utf-8') as f:
    f.write("My name is Pylenin.\n")
    f.write("I am a Modern Data Architect.\n")
    f.write("Follow me on Twitter @pylenin.\n")

The above code will create a file called my_essay.txt with each sentence written on its own line.

my_essay.txt

My name is Pylenin.
I am a Modern Data Architect.
Follow me on Twitter @pylenin.

How to append to a file in Python?

To add more data to an existing file without overwriting, use the append a mode. If the file doesn't exist, it creates a new file.

Example 1

with open("my_essay.txt",'a',encoding = 'utf-8') as f:   f.write("My new website is www.100daysofdata.com.\n")

The above code will create a new line in my_essay.txt file.

my_essay.txt

My name is Pylenin.
I am a Modern Data Architect.
Follow me on Twitter @pylenin.
My new website is www.100daysofdata.com.
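Both behaviours of a mode (creating a missing file and keeping existing data) can be sketched as follows. The file name notes.txt and the temporary directory are assumptions made for this illustration.

```python
import os
import tempfile

# Hypothetical path to a file that does not exist yet.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
existed_before = os.path.exists(path)

# The first open in 'a' mode creates the file.
with open(path, "a", encoding="utf-8") as f:
    f.write("first\n")

# The second open in 'a' mode keeps the old data and writes after it.
with open(path, "a", encoding="utf-8") as f:
    f.write("second\n")

with open(path, "r", encoding="utf-8") as f:
    appended = f.read()

print(existed_before)
print(appended)
```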

How to close a file in Python?

When you are done with performing operations on the file, you need to properly close the file. To close a file in Python, use the close() method.

Even if you don't close the file explicitly, Python has a garbage collector which cleans up unreferenced objects.

However, relying on this is not good practice! Closing a file frees up the resources that are tied to it.

Example 1

f = open("my_essay.txt", 'a', encoding='utf-8')
try:
    # perform file operations
    pass
finally:
    f.close()

Notice the use of the try and finally blocks. The finally block guarantees that the file is closed even if an exception is raised while working with the file.

To learn more about handling exceptions in Python, check out this list of articles.

with statement in Python

If you are using the with statement while opening files, you don't need to use the close() method.

The with statement in Python wraps a block of code in a context manager. When used with open(), Python ensures that the file is automatically closed and its resources are released once the block is done executing, even if an exception is raised inside it.

Example 2

with open("my_essay.txt", 'a', encoding='utf-8') as f:
    # perform file operations
    pass
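To check that the with statement really closes the file, you can inspect the closed attribute of the file object. The sketch below assumes a scratch file in a temporary directory and a deliberately raised RuntimeError (both are made up for illustration).

```python
import os
import tempfile

# Hypothetical scratch file for this sketch.
path = os.path.join(tempfile.mkdtemp(), "my_essay.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("Hello\n")
closed_after_with = f.closed  # True once the block has finished

# The file is also closed when the block exits because of an exception.
closed_after_error = None
try:
    with open(path, "a", encoding="utf-8") as g:
        raise RuntimeError("simulated failure")
except RuntimeError:
    closed_after_error = g.closed

print(closed_after_with, closed_after_error)
```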

How to read and write to a file at the same time?

In order to perform simultaneous read/write operations, use the r+ mode.

Example

with open("my_essay.txt", 'r+', encoding='utf-8') as f:
    lines = f.read()
    print(lines)
    f.write("Hello Data community!\n")

with open("my_essay.txt", 'r+', encoding='utf-8') as new_data:
    print(new_data.read())

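In the above example, we are using the r+ mode to read from the my_essay.txt file and write a new line to it through the same file object. Because read() leaves the cursor at the end of the file, the new line is added after the existing content.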

Output

My name is Pylenin.
I am a Modern Data Architect.
Follow me on Twitter @pylenin.
My new website is www.100daysofdata.com.

My name is Pylenin.
I am a Modern Data Architect.
Follow me on Twitter @pylenin.
My new website is www.100daysofdata.com.
Hello Data community!
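Where the data lands in r+ mode depends on the cursor position. Here is a rough sketch, assuming a small scratch file with made-up contents: writing after a full read() adds to the end, while writing straight after opening overwrites from the start of the file.

```python
import os
import tempfile

# Hypothetical scratch file with two short lines.
path = os.path.join(tempfile.mkdtemp(), "my_essay.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("line one\nline two\n")

# After read(), the cursor sits at the end, so write() adds to the end.
with open(path, "r+", encoding="utf-8") as f:
    f.read()
    f.write("line three\n")

# Without a read first, the cursor is at position 0 and write() overwrites.
with open(path, "r+", encoding="utf-8") as f:
    f.write("LINE")

with open(path, "r", encoding="utf-8") as f:
    result = f.read()

print(result)
```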

Python File Methods (with examples)

readline()

This method reads from the current cursor position up to and including the next newline character.

Example

with open("my_essay.txt", 'r', encoding='utf-8') as f:
    lines = f.readline()
    print(lines)

Output

My name is Pylenin.
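Each call to readline() continues from where the previous one stopped, and it returns an empty string once the end of the file is reached. A short sketch, assuming a two-line scratch file (the path and contents are hypothetical):

```python
import os
import tempfile

# Hypothetical two-line scratch file.
path = os.path.join(tempfile.mkdtemp(), "my_essay.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first\nsecond\n")

with open(path, "r", encoding="utf-8") as f:
    first = f.readline()    # includes the trailing newline
    second = f.readline()   # continues from where the last call stopped
    at_eof = f.readline()   # empty string signals the end of the file

print(repr(first), repr(second), repr(at_eof))
```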

readlines()

The method reads all the lines and returns a list object.

Example

with open("my_essay.txt", 'r', encoding='utf-8') as f:
    lines = f.readlines()
    print(lines)

Output

['My name is Pylenin.\n', 'I am a Modern Data Architect.\n', 'Follow me on Twitter @pylenin.\n', 'My new website is www.100daysofdata.com.\n']

writelines()

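The method writes a list of strings to a file. It does not add line separators, so each string needs its own \n if it should end up on a separate line.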

Example

lines = [
    'My name is Pylenin.\n',
    'I am a Modern Data Architect.\n',
    'Follow me on Twitter @pylenin.\n',
    'My new website is www.100daysofdata.com.\n',
]

with open("my_essay.txt", 'w', encoding='utf-8') as f:
    f.writelines(lines)

my_essay.txt

My name is Pylenin.
I am a Modern Data Architect.
Follow me on Twitter @pylenin.
My new website is www.100daysofdata.com.
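Despite its name, writelines() writes the strings exactly as given. The sketch below assumes a scratch file and a made-up list called items. It compares writing the list as-is with writing it with a newline added to each item.

```python
import os
import tempfile

# Hypothetical scratch file and sample data.
path = os.path.join(tempfile.mkdtemp(), "items.txt")
items = ["alpha", "beta", "gamma"]

# writelines() writes the strings back to back, without separators.
with open(path, "w", encoding="utf-8") as f:
    f.writelines(items)
with open(path, "r", encoding="utf-8") as f:
    joined = f.read()

# Add the newline yourself if every item should be on its own line.
with open(path, "w", encoding="utf-8") as f:
    f.writelines(item + "\n" for item in items)
with open(path, "r", encoding="utf-8") as f:
    separated = f.read()

print(repr(joined))
print(repr(separated))
```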

seek() and read()

These are two interesting methods in Python. The seek(position) method moves the cursor to the specified position. The read(size) method reads up to size characters, starting from the cursor position. If no argument is passed, it reads till the end of the file.

Example

with open("my_essay.txt", 'r', encoding='utf-8') as f:
    f.seek(10)
    data = f.read()
    print(data)

The above code moves the cursor to position 10 (counting from 0), which skips the first 10 characters, "My name is". When you call the read() method from there, you only get the rest of the file.

Output

 Pylenin.
I am a Modern Data Architect.
Follow me on Twitter @pylenin.
My new website is www.100daysofdata.com.
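Passing a size to read() lets you pick out a slice of the file. The sketch below assumes a one-line scratch file and opens it in binary mode, because in text mode Python only guarantees seek() offsets that are 0 or were returned by tell(). For plain ASCII content like this, the byte offset matches the character position.

```python
import os
import tempfile

# Hypothetical one-line scratch file, written as bytes.
path = os.path.join(tempfile.mkdtemp(), "my_essay.txt")
with open(path, "wb") as f:
    f.write(b"My name is Pylenin.\n")

with open(path, "rb") as f:
    f.seek(11)          # skip b"My name is "
    name = f.read(7)    # read exactly 7 bytes from the cursor
    rest = f.read()     # no size: read till the end of the file

print(name.decode("utf-8"))
print(repr(rest))
```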

For any doubts and queries, use the comments section or tweet to me @pylenin.
