Introduction to NumPy Arrays¶
Goals:¶
Learn the basics of NumPy arrays in Python
References:
http://www.numpy.org/
https://docs.scipy.org/doc/numpy/user/quickstart.html
https://www.datacamp.com/community/tutorials/python-numpy-tutorial
https://blog.thedataincubator.com/2018/02/numpy-and-pandas/
https://medium.com/@ericvanrees/pandas-series-objects-and-numpy-arrays-15dfe05919d7
https://www.machinelearningplus.com/python/numpy-tutorial-part1-array-python-examples/
https://towardsdatascience.com/a-hitchhiker-guide-to-python-numpy-arrays-9358de570121
McKinney, Wes. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O’Reilly Media. Kindle Edition
What is NumPy?¶
NumPy is short for “Numerical Python”, and it is a fundamental Python package for scientific computing.
It is built around a high-performance data structure, the n-dimensional array (ndarray), which supports efficient computation on arrays and matrices.
What is an Array?¶
Python arrays are data structures that store data much like a list, except that the type of the objects stored in them is constrained.
Elements of an array are all of the same type and are indexed by non-negative integers, starting at 0.
Python's array module lets you specify the element type when the array is created by using a type code, which is a single character. You can read more about each type code here: https://docs.python.org/3/library/array.html?highlight=array#module-array
import array
array_one = array.array('i',[1,2,3,4])
type(array_one)
array.array
type(array_one[0])
int
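To make the type constraint concrete, the short sketch below tries to append a float to an array created with type code 'i'. The variable names are illustrative only; the point is that the array rejects a value that does not match its type code.

```python
import array

# 'i' is the type code for signed C integers
int_array = array.array('i', [1, 2, 3, 4])
int_array.append(5)            # an int matches the type code

try:
    int_array.append(2.5)      # a float does not match type code 'i'
    rejected = False
except TypeError:
    rejected = True            # the array refuses the float

print(int_array.typecode)      # the type code chosen at creation time
print(list(int_array))         # the failed append left the contents unchanged
print(rejected)
```

A plain Python list would have accepted the float without complaint, which is the difference the section describes.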
What is a NumPy N-Dimensional Array (ndarray)?¶
It is an efficient multidimensional array providing fast array-oriented arithmetic operations.
Like any other array, an ndarray is a container for homogeneous data (elements of the same type).
In NumPy, an ndarray is usually referred to simply as an array.
As with other container objects in Python, the contents of an ndarray can be accessed and modified by indexing or slicing operations.
For numerical data, NumPy arrays are more efficient for storing and manipulating data than Python's built-in data structures.
import numpy as np
np.__version__
'1.19.2'
list_one = [1,2,3,4,5]
numpy_array = np.array(list_one)
type(numpy_array)
numpy.ndarray
numpy_array
array([1, 2, 3, 4, 5])
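As a small illustration of what "homogeneous" means in practice, the sketch below builds an array from a list that mixes ints and a float and then inspects its dtype, ndim, and shape attributes. The variable names are made up for this example.

```python
import numpy as np

# A list mixing ints and a float: NumPy picks one common element type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)               # float64, the ints were converted to floats
print(mixed.ndim, mixed.shape)   # number of dimensions and size along each one

# The element type can also be requested explicitly at creation time
as_floats = np.array([1, 2, 3], dtype=np.float64)
print(as_floats.tolist())        # -> [1.0, 2.0, 3.0]
```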
Advantages of NumPy Arrays¶
Vectorized Operations¶
The key difference between an array and a list is that arrays are designed to handle vectorized operations, while a Python list is not.
NumPy operations perform complex computations on entire arrays without the need for Python for loops.
In other words, an operation or function applied to an array is carried out on each element, rather than on the array object as a whole.
With a Python list, you have to loop over the elements yourself.
list_two = [1,2,3,4,5]
# The following will throw an error:
list_two + 2
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-03923fe34c76> in <module>
1 list_two = [1,2,3,4,5]
2 # The following will throw an error:
----> 3 list_two + 2
TypeError: can only concatenate list (not "int") to list
Performing a loop to add 2 to every integer in the list
for index, item in enumerate(list_two):
    list_two[index] = item + 2
list_two
[3, 4, 5, 6, 7]
With a NumPy array, you can do the same simply by doing the following:
numpy_array
array([1, 2, 3, 4, 5])
numpy_array + 2
array([3, 4, 5, 6, 7])
Any arithmetic or comparison operation between equal-size arrays is applied element-wise:
numpy_array_one = np.array([1,2])
numpy_array_two = np.array([4,6])
numpy_array_one + numpy_array_two
array([5, 8])
numpy_array_one > numpy_array_two
array([False, False])
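The same element-wise behavior applies to NumPy's mathematical functions. The sketch below, with illustrative variable names, compares an explicit Python loop over a list with the equivalent single call on an array.

```python
import math
import numpy as np

numbers = [1.0, 4.0, 9.0, 16.0]

# Plain Python: visit every element explicitly
roots_list = [math.sqrt(n) for n in numbers]

# NumPy: one call, applied to every element
values = np.array(numbers)
roots = np.sqrt(values)
squares = values * values      # element-wise product, not a matrix product
mask = values > 5              # element-wise comparison

print(roots.tolist())          # -> [1.0, 2.0, 3.0, 4.0]
print(squares.tolist())        # -> [1.0, 16.0, 81.0, 256.0]
print(mask.tolist())           # -> [False, False, True, True]
```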
Memory¶
NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects.
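NumPy arrays take significantly less memory than Python lists holding the same numbers. The comparison below counts only element storage: the size of one Python int object per list item versus the size of one array item.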
import numpy as np
import sys
python_list = [1,2,3,4,5,6]
python_list_size = sys.getsizeof(1) * len(python_list)
python_list_size
168
python_numpy_array = np.array([1,2,3,4,5,6])
python_numpy_array_size = python_numpy_array.itemsize * python_numpy_array.size
python_numpy_array_size
48
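A slightly fuller accounting is sketched below. It assumes CPython, where sys.getsizeof on a list reports only the list object and its table of references, so the int objects are added separately. The dtype is fixed to int64 so the array figure does not depend on the platform default. Exact list sizes vary by Python version and platform, so none are shown.

```python
import sys
import numpy as np

python_list = [1, 2, 3, 4, 5, 6]

# The list object (header plus references) and each int object it points to
list_bytes = sys.getsizeof(python_list) + sum(sys.getsizeof(x) for x in python_list)

# Six 8-byte integers stored side by side in one buffer
int64_array = np.array(python_list, dtype=np.int64)
data_bytes = int64_array.nbytes    # same as itemsize * size

print(list_bytes)                  # varies by Python version and platform
print(data_bytes)                  # -> 48
```

The array object also carries a fixed header of its own, so the saving matters most for large arrays, where the per-element cost dominates.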
Basic Indexing and Slicing¶
One Dimensional Array¶
For indexing and slicing, one-dimensional arrays use the same syntax as Python lists:
numpy_array
array([1, 2, 3, 4, 5])
numpy_array[1]
2
numpy_array[1:4]
array([2, 3, 4])
You can slice the array and assign the slice to a variable. Unlike a list slice, a NumPy slice is a view onto the original array's data, not a copy.
Any change you make through the slice is therefore also made to the original array object.
numpy_array_slice = numpy_array[1:4]
numpy_array_slice
array([2, 3, 4])
numpy_array_slice[1] = 10
numpy_array_slice
array([ 2, 10, 4])
numpy_array
array([ 1, 2, 10, 4, 5])
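If you need a slice that can be modified without touching the original, one option is to copy it explicitly. The sketch below uses illustrative names and contrasts a plain slice with a slice followed by .copy(), using np.shares_memory to check whether the two arrays use the same data.

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])

view = original[1:4]                  # a view: shares data with original
independent = original[1:4].copy()    # an explicit copy: owns its own data

view[0] = 20           # also changes original[1]
independent[1] = 30    # changes only the copy

print(original.tolist())                         # -> [1, 20, 3, 4, 5]
print(independent.tolist())                      # -> [2, 30, 4]
print(np.shares_memory(original, view))          # -> True
print(np.shares_memory(original, independent))   # -> False
```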
Two-Dimensional Array¶
In a two-dimensional array, the element at each index along the first axis is a one-dimensional array (a row):
numpy_two_dimensional_array = np.array([[1,2,3],[4,5,6],[7,8,9]])
numpy_two_dimensional_array
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
numpy_two_dimensional_array[1]
array([4, 5, 6])
To access an individual element, you can index the row and then the column, or pass both indices at once, separated by a comma. The two forms return the same value:
numpy_two_dimensional_array[1][2]
6
numpy_two_dimensional_array[1,2]
6
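The two indexing forms can be compared side by side, as in the sketch below (the name grid is illustrative). The comma form is a single indexing operation, while the chained form first builds the row and then indexes into it. The comma form also works for assignment and with negative indices.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

chained = grid[1][2]    # take row 1, then element 2 of that row
direct = grid[1, 2]     # one operation with a (row, column) pair
print(chained, direct)  # -> 6 6

grid[0, 0] = 100        # the same syntax works for assignment
last = grid[-1, -1]     # negative indices count from the end
print(grid[0].tolist()) # -> [100, 2, 3]
print(last)             # -> 9
```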
Slicing two-dimensional arrays is a little different from slicing one-dimensional ones: a single slice selects rows, and a second, comma-separated slice selects columns.
numpy_two_dimensional_array
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
numpy_two_dimensional_array[:1]
array([[1, 2, 3]])
numpy_two_dimensional_array[:2]
array([[1, 2, 3],
       [4, 5, 6]])
numpy_two_dimensional_array[:3]
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
numpy_two_dimensional_array[:2,1:]
array([[2, 3],
       [5, 6]])
numpy_two_dimensional_array[:2,:1]
array([[1],
       [4]])
numpy_two_dimensional_array[2][1:]
array([8, 9])
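One detail worth seeing in code is how mixing an integer index with a slice changes the number of dimensions of the result. The sketch below uses illustrative names and selects a column in two ways, then compares the resulting shapes.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

middle_column = grid[:, 1]      # all rows, integer column index: 1-D result
column_kept_2d = grid[:, 1:2]   # slices on both axes: result stays 2-D
row_part = grid[2, 1:]          # integer row, sliced columns, like grid[2][1:]

print(middle_column.tolist(), middle_column.shape)    # -> [2, 5, 8] (3,)
print(column_kept_2d.tolist(), column_kept_2d.shape)  # -> [[2], [5], [8]] (3, 1)
print(row_part.tolist())                              # -> [8, 9]
```

Using a slice on an axis keeps that axis in the result, while an integer index removes it.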