Python Foundations, Lab 4: Practice with Pandas
Instructor: Wesley Beckner
Contact: wesleybeckner@gmail.com
Solved: notebook
In this lab we will continue to practice manipulating pandas DataFrames.
import pandas as pd
import numpy as np
🐼 L4 Q1
Convert the two series into the columns of a DataFrame
ser1 = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
ser2 = pd.Series(range(26))
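One possible approach, offered as a sketch rather than the reference solution: build the DataFrame from a dict (which also names the columns) or concatenate the two series along the column axis. The column names col1 and col2 are placeholders chosen for illustration.

```python
import pandas as pd

ser1 = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
ser2 = pd.Series(range(26))

# Option 1: a dict maps (hypothetical) column names to the series
df = pd.DataFrame({'col1': ser1, 'col2': ser2})

# Option 2: concatenate side by side; columns are labeled 0 and 1
df_concat = pd.concat([ser1, ser2], axis=1)

print(df.head())
```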
🐼 L4 Q2
Convert the series into a DataFrame with 7 rows and 5 columns
ser = pd.Series(np.random.randint(1, 10, 35))
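A minimal sketch of one way to do this: pull out the underlying NumPy array, reshape it to 7 by 5, and wrap the result in a DataFrame. The values are filled row by row, so the first five elements of the series become the first row.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.random.randint(1, 10, 35))

# 35 values reshape cleanly into 7 rows x 5 columns (row-major order)
df = pd.DataFrame(ser.to_numpy().reshape(7, 5))

print(df)
```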
🐼 L4 Q3
Compute the difference of differences between consecutive numbers in a series using ser.diff()
ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])
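As a sketch, the "difference of differences" can be obtained by chaining ser.diff() twice. Each call leaves a NaN where there is no previous element to subtract, so the result starts with two NaN values.

```python
import pandas as pd

ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])

first_diff = ser.diff()          # differences between consecutive numbers
second_diff = first_diff.diff()  # differences of those differences

print(first_diff.tolist())
print(second_diff.tolist())
```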
🐼 L4 Q4
Convert a series of dates to datetime format using pd.to_datetime()
ser = pd.Series(['01 Jan 2010', '02-02-2011', '20120303', '2013/04/04', '2014-05-05', '2015-06-06T12:20'])
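Because every string here uses a different format, one hedged approach is to parse each element on its own with ser.apply(pd.to_datetime). This is a sketch, not the only option: in pandas 2.0 and later, pd.to_datetime(ser, format='mixed') is an alternative, while older versions may accept pd.to_datetime(ser) directly.

```python
import pandas as pd

ser = pd.Series(['01 Jan 2010', '02-02-2011', '20120303',
                 '2013/04/04', '2014-05-05', '2015-06-06T12:20'])

# Parse each string independently so the format is inferred per element
parsed = ser.apply(pd.to_datetime)

print(parsed)
```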
🐼 L4 Q5
Compute the mean of weights grouped by fruit
fruit = pd.Series(np.random.choice(['apple', 'banana', 'carrot'], 10))
weights = pd.Series(np.linspace(1, 10, 10))
print(weights.tolist())
print(fruit.tolist())
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
['banana', 'banana', 'banana', 'carrot', 'carrot', 'carrot', 'carrot', 'carrot', 'banana', 'banana']
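A sketch of one approach: group the weights by the fruit series and take the mean of each group. To keep the example reproducible, it hard-codes the fruit values printed above instead of drawing a new random sample, so a fresh run of the lab cell will give different groups.

```python
import numpy as np
import pandas as pd

# Fixed copy of the sample printed above (the lab draws this at random)
fruit = pd.Series(['banana', 'banana', 'banana', 'carrot', 'carrot',
                   'carrot', 'carrot', 'carrot', 'banana', 'banana'])
weights = pd.Series(np.linspace(1, 10, 10))

# The two series share an index, so fruit can serve as the grouping key
means = weights.groupby(fruit).mean()

print(means)
```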
🐼 L4 Q6
Compute the Euclidean distance between vectors p and q.
Euclidean distance is calculated as the square root of the sum of the squared differences between the two vectors.
It is equal to the L2 norm of the difference vector p - q.
import math
p = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
q = pd.Series([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
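The definition above translates almost word for word into code. The sketch below computes the distance from the definition and, as a cross-check, with np.linalg.norm, which returns the L2 norm by default for a 1-D input.

```python
import math

import numpy as np
import pandas as pd

p = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
q = pd.Series([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])

# Square root of the sum of squared element-wise differences
dist = math.sqrt(((p - q) ** 2).sum())

# Same quantity expressed as the L2 norm of the difference vector
dist_norm = np.linalg.norm(p - q)

print(dist, dist_norm)
```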
🐼 L4 Q7
Fill in the missing value with the previous date's value using ser.ffill(). (ser.bfill() fills from the next date instead, so it cannot fill a trailing NaN.)
ser = pd.Series([1,10,3,np.nan], index=pd.to_datetime(['2000-01-01', '2000-01-03', '2000-01-06', '2000-01-08']))
print(ser)
2000-01-01 1.0
2000-01-03 10.0
2000-01-06 3.0
2000-01-08 NaN
dtype: float64
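A short sketch comparing the two methods on this series: ffill() carries the last valid observation forward, while bfill() looks at the next one, and there is no next value after 2000-01-08.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 10, 3, np.nan],
                index=pd.to_datetime(['2000-01-01', '2000-01-03',
                                      '2000-01-06', '2000-01-08']))

forward = ser.ffill()   # 2000-01-08 takes the value from 2000-01-06
backward = ser.bfill()  # nothing follows 2000-01-08, so it stays NaN

print(forward)
print(backward)
```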
🐼 L4 Q8
Check if there are missing values in each column of a DataFrame using .isnull() and .any()
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')
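To show the pattern without a network download, the sketch below uses a small made-up DataFrame (toy, with illustrative column names and values that are not taken from the Cars93 file). The same chained call should apply unchanged to df.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data, not the real Cars93 contents
toy = pd.DataFrame({'Price': [12.9, np.nan, 29.1],
                    'Type': ['Small', 'Midsize', 'Compact']})

# isnull() marks each cell; any() then collapses each column to one boolean
has_missing = toy.isnull().any()

print(has_missing)
```

Replacing .any() with .sum() would count the missing values per column instead of only flagging them.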
🐼 L4 Q9
Grab the first column and return it as a DataFrame rather than as a series
df = pd.DataFrame(np.arange(20).reshape(-1, 5), columns=list('abcde'))
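One way to sketch this: indexing with a list of labels (or a list of positions) keeps the result two-dimensional, whereas a single label returns a Series.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(-1, 5), columns=list('abcde'))

as_series = df['a']           # single label -> Series
as_frame = df[['a']]          # list of labels -> DataFrame
by_position = df.iloc[:, [0]]  # list of positions -> DataFrame

print(type(as_series).__name__, type(as_frame).__name__)
```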
🐼 L4 Q10
In df, interchange columns 'a' and 'c'.
df = pd.DataFrame(np.arange(20).reshape(-1, 5), columns=list('abcde'))
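A minimal sketch, assuming "interchange" means swapping the positions of the two columns while each keeps its own values: swap the two labels in a copy of the column list, then select the columns in that new order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(-1, 5), columns=list('abcde'))

# Swap the positions of 'a' and 'c' in the list of column labels
cols = list(df.columns)
i, j = cols.index('a'), cols.index('c')
cols[i], cols[j] = cols[j], cols[i]

swapped = df[cols]  # columns now ordered c, b, a, d, e

print(swapped)
```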