Python Foundations, Session 7: NumPy
Instructor: Wesley Beckner
Contact: wesleybeckner@gmail.com
Recording: Video (32 min)
Today, we will jump into the NumPy package.
numpy: Numerical Python
NumPy is short for "Numerical Python" and contains tools for efficiently manipulating arrays of data. If you have used other computational tools like IDL or MATLAB, NumPy should feel very familiar.
Import Libraries
# for numpy section
import numpy as np
np.random.seed(42)
7.1 NumPy Arrays
7.1.1 Creating NumPy Arrays
When we worked with lists, we saw that we could fill them with all sorts of datatypes. NumPy arrays, in contrast, hold elements of a single datatype:
# these will all be ints
np.array([1, 2, 3, 6, 5, 4])
array([1, 2, 3, 6, 5, 4])
# these will all be floats
np.array([1, 2, 3.14, 6, 5, 4])
array([1. , 2. , 3.14, 6. , 5. , 4. ])
We can check the data types in the standard way:
arr = np.array([1, 2, 3, 6, 5, 4])
for i in arr:
    print(type(i))
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
NumPy upcasts all elements to the most general type present in the array. For instance, because 3.14 is a float, all the other numbers in the array become floats as well:
for i in np.array([1, 2, 3.14, 6, 5, 4]):
    print(type(i))
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
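The same promotion rule extends past ints and floats. The short sketch below uses made-up arrays (the variable names are illustrative only) to show NumPy choosing one dtype per array, moving up the chain bool, int, float, complex, and falling back to a string dtype when text is mixed in. The exact integer width can differ by platform, so the sketch checks only the dtype kind for integers.

```python
import numpy as np

# One dtype per array, promoted to the most general type present
ints = np.array([1, 2, 3])
floats = np.array([1, 2, 3.14])
complexes = np.array([1, 2.5, 3j])
mixed_bool = np.array([True, 2])   # the bool is promoted to an integer

print(ints.dtype.kind)        # kind i: signed integer (width is platform dependent)
print(floats.dtype)           # float64
print(complexes.dtype)        # complex128
print(mixed_bool.dtype.kind)  # kind i again

# Mixing numbers with text falls back to a Unicode string dtype
strings = np.array([1, 'a'])
print(strings.dtype.kind)     # kind U: Unicode string
print(strings)                # ['1' 'a']
```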
We can also specify the datatypes in the array:
np.array([1, 2, 3.14, 6, 5, 4], dtype='float32')
array([1. , 2. , 3.14, 6. , 5. , 4. ], dtype=float32)
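Besides passing dtype at creation time, an existing array can be converted with its astype method. A minimal sketch, assuming we start from the default float64 array above: astype returns a new array (a copy by default), and the smaller float32 type uses half the bytes per element.

```python
import numpy as np

arr64 = np.array([1, 2, 3.14, 6, 5, 4])   # default float64
arr32 = arr64.astype('float32')           # new array with the requested dtype

print(arr64.dtype, arr64.itemsize)  # float64 8
print(arr32.dtype, arr32.itemsize)  # float32 4

# astype copies by default, so changing arr32 leaves arr64 untouched
arr32[0] = 100
print(arr64[0])  # 1.0
```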
🏋️ Exercise 1: Specify datatype
Create an array of 5 numbers whose datatype is a 16-bit integer. Make one of the numbers not a whole number. What happens to that number when it is stored in the 16-bit integer array?
# Cell for Exercise 1
7.1.2 Creating Arrays from NumPy Methods
# create an array of 10 zeros
# how can we specify the datatype?
np.zeros(10)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
# create an array of 10 1's
np.ones(10)
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
# fill an array of the following dimensions
# with value 42
np.full((2,3), 42)
array([[42, 42, 42],
[42, 42, 42]])
# arange: from start (inclusive) to stop (exclusive)
# in increments of step
np.arange(1, 10, 2)
array([1, 3, 5, 7, 9])
# create an array of num evenly spaced values
# between start and stop (both inclusive)
# here num=5, giving 4 intervals of 2.5
np.linspace(0, 10, 5)
array([ 0. , 2.5, 5. , 7.5, 10. ])
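To make the difference between linspace and arange concrete, here is a small sketch using documented numpy.linspace options: retstep=True also returns the spacing, and endpoint=False drops stop so the spacing becomes (stop - start) / num. The variable names are illustrative only.

```python
import numpy as np

# With endpoint=True (the default), spacing is (stop - start) / (num - 1)
values, step = np.linspace(0, 10, 5, retstep=True)
print(values)   # values 0, 2.5, 5, 7.5, 10
print(step)     # 2.5

# endpoint=False excludes stop, so spacing is (stop - start) / num
no_end = np.linspace(0, 10, 5, endpoint=False)
print(no_end)   # values 0, 2, 4, 6, 8

# arange with a float step gives a similar grid, but stop is always excluded
stepped = np.arange(0, 10, 2.5)
print(stepped)  # values 0, 2.5, 5, 7.5 (no 10)
```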
# create an array of values drawn from a
# uniform distribution over [0, 1)
np.random.random(5)
array([0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864])
# create an array of values from a normal distribution
np.random.normal(loc=0, scale=1, size=(5,5))
array([[ 0.61764085, 1.2170708 , 0.22628827, 0.84740143, 0.17483301],
[-1.21685489, 1.04934739, 1.32510566, 0.73450106, -0.95449726],
[-0.75117942, -1.13042805, 0.76997736, 1.26838952, 0.42448624],
[ 0.94053558, -0.86764109, 0.14586185, -1.36987106, -0.77178075],
[ 0.87867355, -0.23959451, 1.20938197, 0.53796 , 2.73442216]])
# create a 2x2 array of random integers from 5 to 10
# (low=5 is inclusive, high=11 is exclusive)
np.random.randint(5, 11, (2,2))
array([[5, 9],
[8, 9]])
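The cells above use NumPy's legacy global functions (np.random.seed, np.random.random, and so on). Newer NumPy code often uses a Generator object from np.random.default_rng instead; the sketch below is that swapped-in API, not the one used elsewhere in this session, and it will not reproduce the numbers printed above even with the same seed. Note that Generator.integers excludes high by default, and endpoint=True makes it inclusive.

```python
import numpy as np

# A seeded Generator gives reproducible draws without touching global state
rng = np.random.default_rng(42)

uniform = rng.random(5)                           # floats in [0, 1)
normal = rng.normal(loc=0, scale=1, size=(5, 5))  # normal draws, mean 0, std 1
# endpoint=True makes the upper bound inclusive: integers from 5 to 10
ints = rng.integers(5, 10, size=(2, 2), endpoint=True)

print(uniform)
print(normal.shape)  # (5, 5)
print(ints)
```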
🏋️ Exercise 2: Creating Arrays
a. Create a 5x5 array of ones with datatype int16
# Cell for Exercise 2a
b. Create an array of 10 numbers drawn from a uniform distribution between 0 and 1
# Cell for Exercise 2b
c. Create an array of 10 numbers drawn from a normal distribution centered at 80 with a standard deviation of 5
# Cell for Exercise 2c
d. Create an array of 10 integers drawn from a uniform distribution between 5 and 10 inclusive
# Cell for Exercise 2d
7.2 NumPy Array Attributes
Common array attributes are shape, dtype, size, nbytes, and itemsize:
my_arr = np.random.randint(low=5, high=10, size=(5,5))
print(my_arr)
[[7 8 7 5 5]
[8 8 9 9 7]
[8 5 9 9 5]
[9 7 8 5 8]
[9 9 5 7 6]]
my_arr.shape
(5, 5)
my_arr.dtype
dtype('int64')
my_arr.size
25
my_arr.nbytes
200
my_arr.itemsize
8
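Two related facts are worth checking on a second array: ndim counts the dimensions, size is the product of the entries in shape, and itemsize depends only on the dtype. A minimal sketch, using hypothetical arrays named grid and small:

```python
import numpy as np

grid = np.zeros((3, 4), dtype='float32')   # hypothetical 3x4 float32 array
small = np.zeros(3, dtype='int16')         # hypothetical int16 array

print(grid.ndim)      # 2 dimensions
print(grid.shape)     # (3, 4)
print(grid.size)      # 12, the product 3 * 4
print(grid.itemsize)  # 4 bytes per float32 element
print(small.itemsize) # 2 bytes per int16 element
```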
🏋️ Exercise 3: Conditional Check on Array Attributes
Write a conditional that checks that the total number of bytes of the array object my_arr divided by the size of each item (in bytes) is equal to the number of items in the array (hint: we covered these attributes above).
# Cell for exercise 3
True
7.3 NumPy Array Slicing, Copying, Setting
Array slicing operates much the same way as with Python lists:
my_arr
array([[7, 8, 7, 5, 5],
[8, 8, 9, 9, 7],
[8, 5, 9, 9, 5],
[9, 7, 8, 5, 8],
[9, 9, 5, 7, 6]])
# grab the first row
my_arr[0]
array([7, 8, 7, 5, 5])
# grab the first element of the first row
# instead of this
print(my_arr[0][0])
# we do this
print(my_arr[0, 0])
7
7
We can time these...
%%timeit
my_arr[0][0]
The slowest run took 38.40 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 381 ns per loop
%%timeit
my_arr[0, 0]
The slowest run took 60.67 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 5: 170 ns per loop
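Indexing with a single tuple, my_arr[0, 0], is roughly twice as fast here because my_arr[0][0] first creates an intermediate view of row 0 and then indexes into it.
We can use the same slicing notation as with lists
my_arr[start:stop:step]
for n-dimensional arrays, with one start:stop:step per dimension:
my_arr[start_1:stop_1:step_1, start_2:stop_2:step_2, ..., start_n:stop_n:step_n]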
# with arrays, we simply separate each dimension with a comma
my_arr[:2, :2]
array([[7, 8],
[8, 8]])
Slices are views, not copies: they share memory with the original array, so we can set slices of arrays to new values and the original object will change:
my_arr[:2, :2] = 0
my_arr
array([[0, 0, 7, 5, 5],
[0, 0, 9, 9, 7],
[8, 5, 9, 9, 5],
[9, 7, 8, 5, 8],
[9, 9, 5, 7, 6]])
my_arr[-2:, -2:] = 1
my_arr
array([[0, 0, 7, 5, 5],
[0, 0, 9, 9, 7],
[8, 5, 9, 9, 5],
[9, 7, 8, 1, 1],
[9, 9, 5, 1, 1]])
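One way to confirm that a slice is a view is np.shares_memory. The sketch below, built on a small hypothetical array named base, also shows a contrast worth knowing: indexing with a list of indices ("fancy" or advanced indexing) returns a copy rather than a view, so writing to it does not affect the original.

```python
import numpy as np

base = np.arange(9).reshape(3, 3)

# Basic slicing returns a view onto the same memory
view = base[:2, :2]
print(np.shares_memory(base, view))    # True
view[0, 0] = 99
print(base[0, 0])                      # 99, the change shows up in base

# Indexing with a list of row indices returns a copy instead
picked = base[[0, 2]]
print(np.shares_memory(base, picked))  # False
picked[0, 0] = -1
print(base[0, 0])                      # still 99
```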
Step through an array slice
# remember that we can use steps in slicing
my_arr[:, ::2] # the last number after :: is the step size
array([[0, 7, 5],
[0, 9, 7],
[8, 9, 5],
[9, 8, 1],
[9, 5, 1]])
We can use negative step sizes the way we do with lists. A negative step walks through a dimension backward (start defaults to the last element), so ::-1 is a convenient way to reverse the order of one or more dimensions of an array:
# reverse the rows
my_arr[::-1]
array([[9, 9, 5, 1, 1],
[9, 7, 8, 1, 1],
[8, 5, 9, 9, 5],
[0, 0, 9, 9, 7],
[0, 0, 7, 5, 5]])
# reverse the columns
my_arr[:, ::-1]
array([[5, 5, 7, 0, 0],
[7, 9, 9, 0, 0],
[5, 9, 9, 5, 8],
[1, 1, 8, 7, 9],
[1, 1, 5, 9, 9]])
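NumPy also provides np.flip, which reverses an array along a given axis and, like the negative-step slices above, returns a view. A small sketch comparing the two on a hypothetical 2x3 array m:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

rows_reversed = np.flip(m, axis=0)   # same as m[::-1]
cols_reversed = np.flip(m, axis=1)   # same as m[:, ::-1]
both_reversed = np.flip(m)           # no axis: every dimension is reversed

print(both_reversed)
# [[5 4 3]
#  [2 1 0]]
```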
Sometimes we want to create a copy of an array, despite the default slicing behavior. We can do this with the .copy() method:
new_arr = my_arr.copy()
new_arr[:,:] = 0
print(my_arr)
[[0 0 7 5 5]
[0 0 9 9 7]
[8 5 9 9 5]
[9 7 8 1 1]
[9 9 5 1 1]]
🏋️ Exercise 4: Array Setting and Slicing
Set all the odd columns of my_arr to 0 and all the even columns to 1 (count the first column as column 1 and the last as column 5, i.e. don't index from 0 when thinking of each column as even or odd!)
Example Output:
array([[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0]])
# Cell for Exercise 4
array([[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0]])
7.4 NumPy Array Reshaping, Concatenation, and Splitting
Reshaping is going to be a common task for us:
arr = np.arange(9)
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
# reshape into a 3x3 array
arr.reshape(3,3) # rows then columns
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
The product of the new dimensions has to equal the total number of elements. Passing -1 for one dimension lets NumPy infer it from the other dimensions and the total number of elements.
# arr.reshape(4,2) # raises a ValueError: 4 * 2 != 9 elements
arr = np.arange(12)
arr.reshape(4,3)
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
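A short sketch of -1 inference on the same 12-element array: NumPy fills in the missing dimension, but only when the known dimensions divide the total evenly; otherwise reshape raises a ValueError. The variable names are illustrative only.

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size (12 elements)
inferred_rows = arr.reshape(-1, 4)   # 12 / 4 -> 3 rows
inferred_cols = arr.reshape(2, -1)   # 12 / 2 -> 6 columns
print(inferred_rows.shape, inferred_cols.shape)  # (3, 4) (2, 6)

# 12 is not divisible by 5, so there is no valid inferred dimension
error_message = None
try:
    arr.reshape(5, -1)
except ValueError as err:
    error_message = str(err)
print(error_message)
```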
A common manipulation in numpy is to convert a 1 dimensional array into a 2 dimensional array. You will see this frequently when working with test/train datasets in machine learning.
arr = np.arange(9)
# reshape into 2 dimensions
arr.reshape(-1,1)
array([[0],
[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8]])
# back to one dimension
arr.reshape(9)
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
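Two common alternatives to the reshape calls above: indexing with np.newaxis adds a length-1 axis (giving the same column shape as reshape(-1, 1)), and ravel() flattens back to one dimension without hard-coding the length. A minimal sketch:

```python
import numpy as np

arr = np.arange(9)

# np.newaxis inserts a new axis of length 1, turning 9 values into a column
col = arr[:, np.newaxis]
print(col.shape)   # (9, 1)

# ravel() flattens to 1-D regardless of the original shape
flat = col.ravel()
print(flat.shape)  # (9,)
```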
We can also concatenate arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print(arr1, arr2)
[1 2 3] [4 5 6]
# now a single array
np.concatenate((arr1, arr2))
array([1, 2, 3, 4, 5, 6])
vstack, or vertical stack, will place the two arrays on top of each other:
np.vstack((arr1,arr2))
array([[1, 2, 3],
[4, 5, 6]])
hstack will place them side by side (for 1-D arrays this gives the same result as np.concatenate):
np.hstack((arr1,arr2))
array([1, 2, 3, 4, 5, 6])
arr1 = np.array([[1, 2, 3],[7,8,9]])
arr2 = np.array([4, 5, 6])
print(arr1)
print(arr2)
print(arr1.shape)
[[1 2 3]
[7 8 9]]
[4 5 6]
(2, 3)
np.vstack((arr1, arr2))
array([[1, 2, 3],
[7, 8, 9],
[4, 5, 6]])
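The stacking helpers can also be expressed with np.concatenate and an explicit axis. One difference: vstack first turns 1-D inputs into single-row 2-D arrays, which is why the example above could stack a (3,) array under a (2, 3) one, while np.concatenate requires inputs with the same number of dimensions. In this sketch the hypothetical arrays bottom and side are therefore written as 2-D.

```python
import numpy as np

top = np.array([[1, 2, 3], [7, 8, 9]])   # shape (2, 3)
bottom = np.array([[4, 5, 6]])           # shape (1, 3): already 2-D
side = np.array([[10], [11]])            # shape (2, 1)

# axis=0 joins along rows, matching vstack for 2-D inputs
rows = np.concatenate((top, bottom), axis=0)
print(rows.shape)  # (3, 3)

# axis=1 joins along columns, matching hstack for 2-D inputs
# (the number of rows must agree)
cols = np.concatenate((top, side), axis=1)
print(cols)
# [[ 1  2  3 10]
#  [ 7  8  9 11]]
```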
Lastly, we can also split arrays by giving the indices where each split should be performed:
arr = np.random.randint(5, 11, (10,10))
arr
array([[ 5, 6, 10, 6, 7, 10, 6, 10, 6, 7],
[ 6, 6, 6, 5, 5, 5, 7, 10, 9, 6],
[ 6, 7, 6, 5, 9, 8, 6, 5, 8, 9],
[ 8, 5, 10, 10, 8, 7, 8, 6, 6, 10],
[ 7, 5, 6, 10, 9, 10, 6, 6, 10, 5],
[ 8, 6, 10, 10, 7, 8, 9, 5, 9, 8],
[ 8, 8, 9, 8, 10, 9, 8, 10, 7, 8],
[ 9, 6, 8, 6, 10, 7, 5, 7, 8, 6],
[ 6, 9, 6, 9, 10, 5, 8, 9, 5, 6],
[ 6, 5, 6, 10, 10, 5, 9, 9, 5, 9]])
a, b = np.split(arr, [5])
print(a)
print(b)
[[ 5 6 10 6 7 10 6 10 6 7]
[ 6 6 6 5 5 5 7 10 9 6]
[ 6 7 6 5 9 8 6 5 8 9]
[ 8 5 10 10 8 7 8 6 6 10]
[ 7 5 6 10 9 10 6 6 10 5]]
[[ 8 6 10 10 7 8 9 5 9 8]
[ 8 8 9 8 10 9 8 10 7 8]
[ 9 6 8 6 10 7 5 7 8 6]
[ 6 9 6 9 10 5 8 9 5 6]
[ 6 5 6 10 10 5 9 9 5 9]]
np.vsplit(arr, [2,4,6,8])
[array([[ 5, 6, 10, 6, 7, 10, 6, 10, 6, 7],
[ 6, 6, 6, 5, 5, 5, 7, 10, 9, 6]]),
array([[ 6, 7, 6, 5, 9, 8, 6, 5, 8, 9],
[ 8, 5, 10, 10, 8, 7, 8, 6, 6, 10]]),
array([[ 7, 5, 6, 10, 9, 10, 6, 6, 10, 5],
[ 8, 6, 10, 10, 7, 8, 9, 5, 9, 8]]),
array([[ 8, 8, 9, 8, 10, 9, 8, 10, 7, 8],
[ 9, 6, 8, 6, 10, 7, 5, 7, 8, 6]]),
array([[ 6, 9, 6, 9, 10, 5, 8, 9, 5, 6],
[ 6, 5, 6, 10, 10, 5, 9, 9, 5, 9]])]
np.hsplit(arr, [5])
[array([[ 5, 6, 10, 6, 7],
[ 6, 6, 6, 5, 5],
[ 6, 7, 6, 5, 9],
[ 8, 5, 10, 10, 8],
[ 7, 5, 6, 10, 9],
[ 8, 6, 10, 10, 7],
[ 8, 8, 9, 8, 10],
[ 9, 6, 8, 6, 10],
[ 6, 9, 6, 9, 10],
[ 6, 5, 6, 10, 10]]), array([[10, 6, 10, 6, 7],
[ 5, 7, 10, 9, 6],
[ 8, 6, 5, 8, 9],
[ 7, 8, 6, 6, 10],
[10, 6, 6, 10, 5],
[ 8, 9, 5, 9, 8],
[ 9, 8, 10, 7, 8],
[ 7, 5, 7, 8, 6],
[ 5, 8, 9, 5, 6],
[ 5, 9, 9, 5, 9]])]
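The second argument to np.split can be either a list of split indices, as above, or a single integer giving the number of equal pieces (which must divide the length evenly). For uneven pieces, np.array_split is the tolerant variant. A sketch on a small 1-D array:

```python
import numpy as np

arr = np.arange(10)

# An integer splits into that many equal parts
halves = np.split(arr, 2)
print([h.tolist() for h in halves])  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]

# A list gives the indices where each new piece starts
pieces = np.split(arr, [3, 7])
print([p.tolist() for p in pieces])  # [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]

# np.split(arr, 3) would raise, since 10 is not divisible by 3;
# array_split allows it and makes the first pieces one element longer
uneven = np.array_split(arr, 3)
print([len(u) for u in uneven])  # [4, 3, 3]
```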
🏋️ Exercise 5: Reshaping and Concatenating
We'll practice a few of the methods we've learned:
- make arr2 match the shape of arr1 using reshape
- stack arr1 on top of arr2 using vstack and call this new array arr
- replace all the even columns of arr with zeros (count the first column as column 1, so these are the columns at indices 1, 3, 5, 7, and 9)
- return the sum of arr using arr.sum()
starting code:
np.random.seed(42)
arr1 = np.random.randint(5, 11, (5,10))
arr2 = np.random.randint(5, 11, (10,5))
expected output:
374
np.random.seed(42)
arr1 = np.random.randint(5, 11, (5,10))
arr2 = np.random.randint(5, 11, (10,5))
print(arr1,end='\n\n')
print(arr2)
[[ 8 9 7 9 9 6 7 7 7 9]
[ 8 7 10 9 6 8 10 10 6 8]
[ 9 5 8 6 10 9 8 5 5 7]
[ 7 6 8 8 10 10 10 7 8 8]
[ 5 7 9 7 9 5 6 8 5 8]]
[[10 6 6 5 6]
[ 9 6 8 8 8]
[ 8 9 7 10 5]
[ 8 6 8 6 10]
[10 10 6 8 10]
[ 9 6 6 8 6]
[ 6 10 8 10 10]
[ 8 5 10 9 9]
[ 6 9 6 5 8]
[ 8 8 9 5 9]]
# Cell for Exercise 5
374
7.5 Additional Exercises
🏋️ Exercise 6: Boolean Array
Create a 3x3 array of all True values (booleans)
# Cell for Exercise 6
🏋️ Exercise 7: Index on Conditional
Extract all numbers in arr that are divisible by 3
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Cell for Exercise 7
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
7.5.3 np.where
There is a nifty tool, np.where(). The syntax is np.where(condition, x, y): wherever condition is true it returns the element from x, and otherwise it returns the element from y.
# Example
a = np.arange(10)
np.where(a<5, a, a*10)
array([ 0, 1, 2, 3, 4, 50, 60, 70, 80, 90])
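np.where also works element-wise on multi-dimensional arrays, and x and y can be scalars that broadcast against the condition. A sketch with made-up scores (the data and the threshold of 75 are hypothetical):

```python
import numpy as np

scores = np.array([[72, 95, 88],
                   [64, 81, 79]])   # hypothetical example data

# Scalars 1 and 0 broadcast to the shape of the condition
passed = np.where(scores >= 75, 1, 0)
print(passed)
# [[0 1 1]
#  [0 1 1]]

# np.where builds a new array; scores itself is unchanged
print(scores[0, 0])  # 72
```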
7.5.4 np.argwhere
A similar but slightly different tool is np.argwhere, which returns the indices of the array where the condition is true:
# Example
np.argwhere(a<5)
array([[0],
[1],
[2],
[3],
[4]])
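For a 2-D array, each row of the np.argwhere result is a (row, column) pair. The closely related np.nonzero returns the same positions grouped as one index array per dimension, which can be used directly to index the original array. A sketch on a hypothetical 2x3 array m:

```python
import numpy as np

m = np.array([[0, 7, 2],
              [9, 1, 8]])

# One (row, column) pair per element where the condition holds
idx = np.argwhere(m > 5)
print(idx)
# [[0 1]
#  [1 0]
#  [1 2]]

# np.nonzero gives the same positions as separate row and column arrays
rows, cols = np.nonzero(m > 5)
print(rows, cols)     # [0 1 1] [1 0 2]
print(m[rows, cols])  # [7 9 8]
```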
🏋️ Exercise 8: Edit a Copy, not a View
Replace all odd numbers in arr with -1 without changing arr (return a new array using np.where).
# Cell for Exercise 8
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
🏋️ Exercise 9: Read NumPy Documentation
Create the following array without hard coding (i.e. don't write any of the values in your code):
a = np.array([1,2,3])
# desired output:
#> array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])
Hints:
# Cell for Exercise 9
a = np.array([1,2,3])
# desired output:
#> array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])
🏋️ Exercise 10: More Slicing
swap columns 2 and 3 in arr1
# Cell for Exercise 10