= [1, 2, 3]
my_list type(my_list)
list
The NumPy package is the key building block of the Python scientific ecosystem.
In this chapter we introduce a few of the key concepts. You should refer to the documentation for details. As with any mature software ecosystem, you should first assume that what you want to achieve can be achieved in a highly optimised way within the existing framework, and only resort to creating your own solution if and when you satisfy yourself that this is not the case.
There are a huge number of resources for learning NumPy online. Here are a couple of good ones:
NumPy: the absolute basics for beginners from the NumPy documentation is pretty great.
This tutorial from a machine learning course at Stanford is also particularly nice and compact.
Everything in Python is an object. For example [1,2,3]
is a list
:
= [1, 2, 3]
my_list type(my_list)
list
You can think of an object as a container for properties and methods, the latter being functions associated with the object. Properties and methods are accessed with the .
syntax. For example, lists have the append
method, which adds an element to the end of the list:
"boop")
my_list.append( my_list
[1, 2, 3, 'boop']
With IPython you can see all the available methods by hitting tab:
import numpy as np
The fundamental object in NumPy is the Array (or ndarray
), which you can think of as a multidimensional version of a list. If we were representing two dimensional data — or a matrix — in plain old Python you would use a list of lists.
= [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]] data
Here data[i]
represents each row:
1] data[
[4, 5, 6]
If I wanted to multiply every element by a number I would have to do something like this:
for row in data:
for j, _ in enumerate(row):
*= 2
row[j] data
[[2, 4, 6], [8, 10, 12], [14, 16, 18], [20, 22, 24]]
If the details here are a bit unfamiliar don’t worry. The point I want to emphasize is: don’t do this.
NumPy provides us with the tools to perform this kind of operation with minimum code and maximum efficiency. First we create our data as an array. There are numerous NumPy functions that produce arrays. The simplest is numpy.array. It takes data in the “Pythonic” list-of-lists(-of-lists-of… etc.) form and produces the corresponding ndarray
= np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
my_array type(my_array)
numpy.ndarray
Now, how do we multiply the whole array by a number? Like this, of course
2 * my_array
array([[ 2, 4, 6],
[ 8, 10, 12],
[14, 16, 18],
[20, 22, 24]])
It even prints nicely
print(my_array)
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
Arrays can be indexed, similar to lists
print(my_array[0], my_array[1], my_array[3][1])
[1 2 3] [4 5 6] 11
but — different from a ordinary list of lists — the last one can be much more pleasantly achieved with the syntax
3,1] my_array[
11
We also have a generalization of the slice syntax
1:, 1:] my_array[
array([[ 5, 6],
[ 8, 9],
[11, 12]])
Slicing can be mixed with integer indexing
1:, 1] my_array[
array([ 5, 8, 11])
NumPy offers all sorts of fancy indexing options for slicing and dicing your data: see the documentation for details.
A fundamental property of an array is its shape
:
# [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
my_array.shape
(4, 3)
The way to read off the shape of an array is as follows. To begin with you encounter a number of [
corresponding to the rank of the array (two in the above example). You then scan over a number of entries that give the rightmost (innermost) dimension in the shape tuple before closing ]
(3 here). After a number of 1D arrays [...]
equal to the next innermost dimension (4 here), we have another closing ]
, and so on.
It’s definitely something that will take a bit of time getting used to!
Notice that slicing does not change the rank of the array
1:, 1:].shape my_array[
(3, 2)
but integer indexing does
1:, 1].shape my_array[
(3,)
It wouldn’t be very convenient if you always had to create your arrays from nested lists using np.array
. Fortunately NumPy has lots of methods to create arrays with a given shape and populated in different ways:
= np.zeros((2,2))
a print(a)
= np.ones((2,2))
b print(b)
= np.full((2,2), 5)
c print(c)
= np.random.random((2,2)) # random numbers uniformly in [0.0, 1.0)
d print(d)
[[0. 0.]
[0. 0.]]
[[1. 1.]
[1. 1.]]
[[5 5]
[5 5]]
[[0.09186552 0.47712133]
[0.16653795 0.87510057]]
There are also lots of methods to change the shape of arrays, for example
numpy.reshape to change the shape of an array.
numpy.expand_dims to insert new axes of length one.
numpy.squeeze (the opposite) to remove new axes of length one.
Let’s see an example of the first one:
2, 2, 3) my_array.reshape(
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[ 7, 8, 9],
[10, 11, 12]]])
Reshaping only works if the shapes are compatible. In this case it’s OK because the original shape was \((4,3)\) and \(4\times 3 = 2\times 2\times 3\). If the shapes aren’t compatible, we’ll get an error
2, 3, 3) my_array.reshape(
ValueError: cannot reshape array of size 12 into shape (2,3,3)
A NumPy array has a dtype
property that gives the datatype. If the array was created from data, this will be inferred
my_array.dtype
dtype('int64')
Functions that construct arrays also have an optional argument to specify the datatype
= np.array([1,2,3], dtype=np.float64)
my_float_array my_float_array.dtype
dtype('float64')
Importantly, complex numbers are supported
= np.array([1.1 + 2.3j,2.2,3.6])
my_float_array my_float_array.dtype
dtype('complex128')
It’s not hard to come up with examples of array-like data in physics. The position, velocity, or acceleration of a particle will be three dimensional vectors, so have shape (3,)
. If we had \(N\) particles you could either put them in a \(3N\) dimensional vector, but for reasons that will become clear it would be better to store them in an array of shape (N,3)
, so that the first index indexes the particle number and the second the particle coordinate.
An \(N\times M\) matrix has shape (N,M)
. The Riemann curvature tensor in General Relativity \(R_{abcd}\) has shape (4,4,4,4)
.
Often we deal with fields which are functions of space and time e.g. the electric potential \(\phi(\mathbf{r},t)\). On a computer we often approximate these using a grid of space-time points \(N_x\times N_y \times N_z\times N_t\). Thus the values of a scalar field could be stored in an array of shape (N_x,N_y,N_z,N_t)
. If we had a vector field like \(\mathbf{E}(\mathbf{r},t)\) this would be (N_x,N_y,N_z,N_t,3)
. You get the idea.
A very useful method to create a grid of coordinate values (at which you can evaluate a function, say) is as follows
# Grid of x, y points
= 64, 64
nx, ny = np.linspace(-2, 2, nx)
x = np.linspace(-2, 2, ny)
y = np.meshgrid(x, y) X, Y
# Example from https://scipython.com/blog/visualizing-a-vector-field-with-matplotlib/
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Circle
def E(q, r0, x, y):
"""Return the electric field vector E=(Ex,Ey) due to charge q at r0."""
= np.hypot(x-r0[0], y-r0[1])**3
den return q * (x - r0[0]) / den, q * (y - r0[1]) / den
# Dipole
= [(1, (1, 0)), (-1, (-1, 0))]
charges
# Electric field vector, E=(Ex, Ey), as separate components
= np.zeros((ny, nx)), np.zeros((ny, nx))
Ex, Ey for charge in charges:
= E(*charge, x=X, y=Y)
ex, ey += ex
Ex += ey
Ey
= plt.figure()
fig = fig.add_subplot(111)
ax
# Plot the streamlines with an appropriate arrow style
= 2 * np.log(np.hypot(Ex, Ey))
color =color, linewidth=1,
ax.streamplot(x, y, Ex, Ey, color=2, arrowstyle='->', arrowsize=1.5)
density
# Add filled circles for the charges themselves
= {True: '#aa0000', False: '#0000aa'}
charge_colors for q, pos in charges:
0.05, color=charge_colors[q>0]))
ax.add_artist(Circle(pos,
'$x$')
ax.set_xlabel('$y$')
ax.set_ylabel(-2,2)
ax.set_xlim(-2,2)
ax.set_ylim('equal')
ax.set_aspect( plt.show()
I’ve already hinted at the payoff from all of this. On lists, multiplication by an integer concatentates multiple copies
2 * [1, 2, 3]
[1, 2, 3, 1, 2, 3]
which is sometimes useful. But in numerical applications what we really want is this
2 * np.array([1, 2, 3])
array([2, 4, 6])
This illustrates a general feature of NumPy that all mathematical operations are performed elementwise on arrays!
print(np.array([1, 2, 3]) + np.array([4, 5, 6]))
print(np.array([1, 2, 3])**2)
print(np.sqrt(np.array([1, 2, 3])))
[5 7 9]
[1 4 9]
[1. 1.41421356 1.73205081]
This avoids the need to write nested loops to perform some operation on each element of some multidimensional data. Of course, the loops are still there, it’s just that NumPy handles them in highly optimized C rather than Python. Code which operates in this way – rather than with explicit loops – is often described as vectorized, and in NumPy-speak vectorized functions are called ufuncs, short for universal functions (you can write your own if you need to). As a basic principle you should never use a Python loop to access your data in NumPy code. Loops may appear at a high level in stepping through time steps in a simulation, for example.
Vectorization is even more versatile than the above examples might suggest. Broadcasting is a powerful protocol that allows us to combine arrays of different shapes. Thus we can add a number to an array
1, 2, 3]) + 2.3 np.array([
array([3.3, 4.3, 5.3])
More generally, elementwise operations can be performed on two arrays of the same rank if in each dimension the sizes either match or one array has size 1.
# These have shape (2, 3) and (1, 3)
1, 2, 3], [4, 5, 6]]) + np.array([[4, 3, 2]]) np.array([[
array([[5, 5, 5],
[8, 8, 8]])
In fact, we can simplify this last example
# These have shape (2, 3) and (3,)
1, 2, 3], [4, 5, 6]]) + np.array([4, 3, 2]) np.array([[
array([[5, 5, 5],
[8, 8, 8]])
Broadcasting two arrays follows these rules:
If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
The arrays can be broadcast together if they are compatible in all dimensions. After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.
The documentation has more detail.
Recall our earlier example of an \(N\)-particle system described by a position array of shape (N,3)
. If we want to shift the entire system by a vector, we can just add a vector of shape (3,)
and broadcasting will ensure that this applied correctly to each particle.
There are various specialized Python plotting libraries but the entry-level option is the catchily named Matplotlib. The pyplot
module provides a plotting system that is similar to MATLAB (I’m told)
import matplotlib.pyplot as plt
This is probably the second most common import you will make. Here’s a simple example of the plot
function, used to plot 2D data
# Compute the x and y coordinates for points on a sine curve
= np.arange(0, 3 * np.pi, 0.1)
x = np.sin(x)
y
# Plot the points using matplotlib
plt.plot(x, y) plt.show()
Note: you must call plt.show() to make graphics appear. Here’s a fancier example with some labelling
# Compute the x and y coordinates for points on sine and cosine curves
= np.arange(0, 3 * np.pi, 0.1)
x = np.sin(x)
y_sin = np.cos(x)
y_cos
# Plot the points using matplotlib
plt.plot(x, y_sin)
plt.plot(x, y_cos)'x axis label')
plt.xlabel('y axis label')
plt.ylabel('Sine and Cosine')
plt.title('Sine', 'Cosine'])
plt.legend([ plt.show()
Often you’ll want to make several related plots and present them together, which can be achieved using the subplot
function
import matplotlib.pyplot as plt
# Compute the x and y coordinates for points on sine and cosine curves
= np.arange(0, 3 * np.pi, 0.1)
x = np.sin(x)
y_sin = np.cos(x)
y_cos
# Set up a subplot grid that has height 2 and width 1,
# and set the first such subplot as active.
2, 1, 1)
plt.subplot(
# Make the first plot
plt.plot(x, y_sin)'Sine')
plt.title(
# Set the second subplot as active, and make the second plot.
2, 1, 2)
plt.subplot(
plt.plot(x, y_cos)'Cosine')
plt.title(
# Show the figure.
plt.show()
Pixels in an image are encoded as a triple of RGB values in the range [0,255] i.e. 8 bits of type uint8
(the “u” is for “unsigned”). Tinting an image gives us a nice example of broadcasting
= plt.imread('../assets/lucian.jpeg')
img
= img * [1, 0.55, 1]
img_tinted
# Show the original image
1, 2, 1)
plt.subplot(
plt.imshow(img)"Lucian")
plt.title(
# Show the tinted image
1, 2, 2)
plt.subplot("Pink Panther")
plt.title(# Having multiplied by floats,
# we must cast the image to uint8 before displaying it.
plt.imshow(np.uint8(img_tinted))
plt.show() img.shape, img.dtype
((4032, 3024, 3), dtype('uint8'))
This is a standard 12 megapixel image.
In the course of your work you are likely to produce, as well as consume lots of data. While it’s good practice to keep notebooks capable of reproducing any of your analyses, this could be time consuming and resource heavy for larger computations. Thus at some point you’ll probably want to save and load data. For example, after saving the data of a large scale simulation you’d like to load it and perform some analysis.
NumPy comes with its own save and load functions and associated binary format .npy
. The benefit of using these is that after loading you get back a NumPy array ready to be used.
A related function savez allows several arrays to be saved and then loaded as a dictionary-like object.
= np.random.rand(4, 4)
random_matrix_1 = np.random.rand(4, 4)
random_matrix_2 "../assets/my-matrices", first_matrix=random_matrix_1, second_matrix=random_matrix_2)
np.savez(%ls ../assets
fibonacci.png ising.js metropolis.png tab-complete.png
hard-spheres.png ising.py my-matrices.npz
ia-question.png lucian.jpeg tab-complete.gif
= np.load("../assets/my-matrices.npz")
my_matrix_file 'first_matrix'] my_matrix_file[
array([[0.74247601, 0.5018754 , 0.94228006, 0.42209143],
[0.71437331, 0.54953203, 0.50018677, 0.87390299],
[0.70960527, 0.43038589, 0.18670472, 0.08482399],
[0.15953534, 0.67654528, 0.38712039, 0.24182652]])