NumPy

NumPy is a Python package used for numerical calculations, working with arrays of homogeneous values, and scientific computing. This section introduces NumPy and arrays, then explains the difference between Python lists and NumPy arrays.

Python Lists and NumPy Arrays

In previous chapters, some of the functions and methods that NumPy provides were used. NumPy is used to construct homogeneous arrays and perform mathematical operations on arrays. A NumPy array is different from a Python list: the values stored in a Python list can each have a different data type.

python_list = [1, -0.038, 'gear', True]

The Python list above contains four different data types: 1 is an integer, -0.038 is a float, 'gear' is a string, and True is a boolean.

The code below prints the data type of each value stored in python_list.

In [1]:
python_list = [1, -0.038, 'gear', True]
for item in python_list:
    print(type(item))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>

The values stored in a NumPy array must all share the same data type. Consider the NumPy array below:

np.array([1.0, 3.1, 5e-04, 0.007])

All four values stored in the NumPy array above share the same data type: 1.0, 3.1, 5e-04, and 0.007 are all floats.

The code below prints the data type of each value stored in the NumPy array above.

In [2]:
import numpy as np

np_array = np.array([1.0, 3.1, 5e-04, 0.007])
for value in np_array:
    print(type(value))

<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
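Because every element shares one data type, the type can also be read from the array itself rather than from each value. The short sketch below uses the array's dtype attribute; the variable name np_array is reused from the cell above for illustration.

```python
import numpy as np

np_array = np.array([1.0, 3.1, 5e-04, 0.007])

# dtype describes the single data type shared by every element.
print(np_array.dtype)  # float64
```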

If the same four elements stored in the previous Python list are stored in a NumPy array, NumPy forces all four items to conform to the same data type.

In the next code section, all four items are converted to type '<U32', which is a string data type in NumPy (the U refers to Unicode strings, and 32 is the maximum number of characters each element can hold; all strings in Python are Unicode by default).

In [3]:
np.array([1, -0.038, 'gear', True])

Out[3]:
array(['1', '-0.038', 'gear', 'True'], dtype='<U32')
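The conversion to strings happens only because one of the items is a string. As a minimal sketch of the same behavior (the variable names numbers and mixed are chosen here for illustration), leaving out 'gear' lets NumPy settle on a numeric type instead:

```python
import numpy as np

# Integer, float, and boolean values are promoted to a common float type.
numbers = np.array([1, -0.038, True])
print(numbers)        # True is stored as 1.0
print(numbers.dtype)  # float64

# Adding a string forces every element to become a string.
mixed = np.array([1, -0.038, 'gear', True])
print(mixed.dtype.kind)  # U (Unicode string)
```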

NumPy arrays can also be two-dimensional, three-dimensional, or up to n-dimensional. Array size is limited only by computer resources, but every value in an array must have the same data type.

NumPy arrays are useful because mathematical operations can be run on an entire array in a single step. If numbers are stored in a regular Python list and the list is multiplied by a scalar, the list extends and repeats instead of multiplying each number in the list by the scalar.

In [4]:
lst = [1, 2, 3, 4]
lst*2

Out[4]:
[1, 2, 3, 4, 1, 2, 3, 4]

To multiply each element of a Python list by the scalar number 2, a loop can be used:

In [5]:
lst = [1, 2, 3, 4]
for i, item in enumerate(lst):
    lst[i] = lst[i]*2
lst

Out[5]:
[2, 4, 6, 8]
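As an aside, the same result can be written more compactly in plain Python with a list comprehension. This is only a sketch of an alternative spelling (the name doubled is introduced here for illustration); it still processes the elements one at a time in Python, just like the for loop.

```python
lst = [1, 2, 3, 4]

# Build a new list with every element doubled; the original list is unchanged.
doubled = [2 * item for item in lst]
print(doubled)  # [2, 4, 6, 8]
```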

The loop above is relatively cumbersome and also computationally expensive. A computationally expensive operation is one that takes a lot of processing time or memory (RAM).

Another way of completing the same action as the loop above is to use a NumPy array.

An entire NumPy array can be multiplied by a scalar in one step. The scalar multiplication operation below produces an array with each element multiplied by the scalar 2.

In [6]:
nparray = np.array([1, 2, 3, 4])
2*nparray

Out[6]:
array([2, 4, 6, 8])
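The same one-step multiplication also applies to arrays with more than one dimension. The sketch below assumes a small two-dimensional array built from a nested list; the name matrix and its values are made up for illustration.

```python
import numpy as np

# A 2 x 3 array built from a nested list (values chosen for illustration).
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.ndim)   # 2
print(matrix.shape)  # (2, 3)

# Every element is doubled, and the shape stays the same.
print(2 * matrix)
```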

If we have a very long list of numbers, we can compare the amount of time it takes each of the two computation methods above to complete the same operation. We'll compare the Python list calculation to the NumPy array calculation.

Jupyter notebooks have a built-in way to time how long a line of code takes to execute. In a Jupyter notebook, when a line starts with %timeit followed by code, the notebook runs the line of code multiple times and outputs the average time it takes to execute.
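Outside of a notebook, a similar measurement can be approximated with the timeit module from Python's standard library. The sketch below is an assumption about how to mimic the "several runs, many loops per run" pattern; the statement being timed, the run count, and the loop count are all chosen here for illustration, and the result will not match %timeit output exactly.

```python
import statistics
import timeit

LOOPS = 1000  # how many times the statement executes in each run

# Each entry in runs is the total time (in seconds) for LOOPS executions.
runs = timeit.repeat(
    stmt="[2 * x for x in data]",
    setup="data = list(range(1000))",
    repeat=7,       # number of runs
    number=LOOPS,   # loops per run
)

# Convert each run to a per-loop time, then summarize across runs.
per_loop = [total / LOOPS for total in runs]
mean = statistics.mean(per_loop)
std = statistics.pstdev(per_loop)
print(f"{mean * 1e6:.1f} µs ± {std * 1e6:.1f} µs per loop")
```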

We can use %timeit to compare a mathematical operation on a Python list using a for loop to the same mathematical operation on a NumPy array.

In [7]:
lst = list(range(10000))
%timeit for i, item in enumerate(lst): lst[i] = lst[i]*2

4.47 ms ± 1.41 ms per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [8]:
nparray = np.arange(0, 10000, 1)
%timeit 2*nparray

11.9 µs ± 155 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

With 10,000 integers, the Python list and for loop take about 4.5 milliseconds on average, while the NumPy array completes the same operation in about 12 microseconds. In this run, NumPy is well over 100 times faster (roughly 375x; 1 millisecond = 1000 microseconds).

For larger lists of numbers, the speed increase using NumPy is considerable.
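One way to explore how the gap changes with size is to repeat the comparison for several lengths. The sketch below is a rough illustration, not a rigorous benchmark: the helper compare, the sizes, and the loop count are all assumptions made here, and the timings depend on the machine. It builds a new list on every loop rather than doubling in place, so the integers do not grow larger from one loop to the next.

```python
import timeit

import numpy as np


def compare(n, number=20):
    """Return (list_seconds, numpy_seconds) per loop for doubling n integers."""
    lst = list(range(n))
    arr = np.arange(n)

    # Build a new list each time so the values do not grow between loops.
    list_time = timeit.timeit(lambda: [2 * x for x in lst], number=number) / number
    numpy_time = timeit.timeit(lambda: 2 * arr, number=number) / number
    return list_time, numpy_time


for n in (1_000, 10_000, 100_000):
    list_time, numpy_time = compare(n)
    print(f"n={n:>7}: list {list_time * 1e6:9.1f} µs, "
          f"NumPy {numpy_time * 1e6:9.1f} µs, "
          f"ratio {list_time / numpy_time:6.1f}x")
```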