NumPy package is the key building block of the Python scientific ecosystem.
Assume that what you want to achieve can be achieved in a highly optimised way within the existing framework
Only resort to your own solution if this is not the case
Many resources for learning NumPy online (see links in notes)
Everything in Python is an object
For example [1,2,3]
is a list
:
Object is container for properties and methods (functions associated with object), accessed with .
syntax.
e.g. lists have append
method:
dir
:['__add__', '__class__', '__class_getitem__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
len(my_list)
is actually calling my_list.__len__
which does job of actually finding length.Fundamental object in NumPy is Array (or ndarray
), multidimensional version of a list
In plain old Python a matrix would be a list of lists.
data[i]
represents each row:[[2, 4, 6], [8, 10, 12], [14, 16, 18], [20, 22, 24]]
First create data as array
Numerous NumPy functions produce arrays
Simplest is numpy.array: takes data in “Pythonic” list-of-lists(-of-lists-of… etc.) form and produces ndarray
NumPy has all sorts of fancy indexing options
Indexing with integer arrays, with boolean arrays, etc.
See the documentation
shape
:First a number of [
corresponding to the rank of the array (two in the above example)
Then number of entries giving rightmost (innermost) dimension in shape before closing ]
(3 here)
After a number of 1D arrays [...]
equal to the next innermost dimension (4 here), we have another closing ]
, and so on
(3,)
is tuple giving the shape while (3)
is just the number 3 in bracketsa = np.zeros((2,2))
print(a)
b = np.ones((2,2))
print(b)
c = np.full((2,2), 5)
print(c)
d = np.random.random((2,2)) # random numbers uniformly in [0.0, 1.0)
print(d)
eye = np.eye(2) # Identity matrix
print(eye)
[[0. 0.]
[0. 0.]]
[[1. 1.]
[1. 1.]]
[[5 5]
[5 5]]
[[0.88033089 0.21886361]
[0.82283125 0.64978079]]
[[1. 0.]
[0. 1.]]
numpy.reshape to change the shape of an array
numpy.expand_dims to insert new axes of length one.
numpy.squeeze (the opposite) to remove new axes of length one.
reshape
Only works if the shapes are compatible. Here it’s OK because the original shape was \((4,3)\) and \(4\times 3 = 2\times 2\times 3\)
If shapes aren’t compatible, we’ll get an error
dtype
Arrays have dtype
property that gives datatype
If array was created from data, this will be inferred
dtype
Position, velocity, or acceleration of particle will be three dimensional vectors, so have shape (3,)
With \(N\) particles could use a \(3N\) dimensional vector
Better: an array of shape (N,3)
. First index indexes particle number and second particle coordinate.
\(N\times M\) matrix has shape (N,M)
Riemann curvature tensor in General Relativity \(R_{abcd}\) has shape (4,4,4,4)
Fields are functions of space and time e.g. the electric potential \(\phi(\mathbf{r},t)\)
Approximate these using a grid of space-time points \(N_x\times N_y \times N_z\times N_t\)
Scalar field can be stored in an array of shape (N_x,N_y,N_z,N_t)
A vector field like \(\mathbf{E}(\mathbf{r},t)\) would be (N_x,N_y,N_z,N_t,3)
# Grid of x, y points
nx, ny = 64, 64
x = np.linspace(-2, 2, nx)
y = np.linspace(-2, 2, ny)
X, Y = np.meshgrid(x, y)
X.shape
(64, 64)
print(np.array([1, 2, 3]) + np.array([4, 5, 6]))
print(np.array([1, 2, 3])**2)
print(np.sqrt(np.array([1, 2, 3])))
[5 7 9]
[1 4 9]
[1. 1.41421356 1.73205081]
Avoids need to write nested loops
Loops are still there, but written in C
This style of code is often described as vectorized
In NumPy-speak vectorized functions are called ufuncs
As a basic principle never use a Python loop to access your data in NumPy code
array([[5, 5, 5],
[8, 8, 8]])
Recall example of an \(N\)-particle system described by a position array of shape (N,3)
If we want to shift the entire system by a vector, just add a vector of shape (3,)
and broadcasting will ensure that this applied correctly to each particle.
Broadcasting two arrays follows these rules:
The documentation has more detail
Broadcasting takes some time to get used to but is immensely powerful!
Various specialized Python plotting libraries
“entry-level” option is Matplotlib
pyplot
module provides a plotting system that is similar to MATLAB (I’m told)
plot
function# Compute the x and y coordinates for points on a sine curve
x = np.arange(0, 3 * np.pi, 0.1)
y = np.sin(x)
# Plot the points using matplotlib
plt.plot(x, y)
plt.show()
# Compute the x and y coordinates for points on sine and cosine curves
x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)
# Plot the points using matplotlib
plt.plot(x, y_sin)
plt.plot(x, y_cos)
plt.xlabel('x axis label')
plt.ylabel('y axis label')
plt.title('Sine and Cosine')
plt.legend(['Sine', 'Cosine'])
plt.show()
import matplotlib.pyplot as plt
# Compute the x and y coordinates for points on sine and cosine curves
x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)
# Set up a subplot grid that has height 2 and width 1,
# and set the first such subplot as active.
plt.subplot(2, 1, 1)
# Make the first plot
plt.plot(x, y_sin)
plt.title('Sine')
# Set the second subplot as active, and make the second plot.
plt.subplot(2, 1, 2)
plt.plot(x, y_cos)
plt.title('Cosine')
# Show the figure.
plt.show()
Pixels in an image encoded as a triple of RGB values in the range [0,255] i.e. 8 bits of type uint8
(the “u” is for “unsigned”)
Tinting an image gives a nice example of broadcasting
img = plt.imread('../assets/lucian.jpeg')
img_tinted = img * [1, 0.55, 1]
# Show the original image
plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Lucian")
# Show the tinted image
plt.subplot(1, 2, 2)
plt.title("Pink Panther")
# Having multiplied by floats,
# we must cast the image to uint8 before displaying it.
plt.imshow(np.uint8(img_tinted))
plt.show()
img.shape, img.dtype
((4032, 3024, 3), dtype('uint8'))
This is a standard 12 megapixel image
random_matrix_1 = np.random.rand(4, 4)
random_matrix_2 = np.random.rand(4, 4)
np.savez("../assets/my-matrices", first_matrix=random_matrix_1, second_matrix=random_matrix_2)
%ls ../assets
fibonacci.png ising.js metropolis.png tab-complete.png
hard-spheres.png ising.py my-matrices.npz
ia-question.png lucian.jpeg tab-complete.gif
array([[0.43303463, 0.46625546, 0.75395349, 0.69456282],
[0.5452065 , 0.15507252, 0.87227907, 0.93961307],
[0.10746136, 0.52692358, 0.0624021 , 0.21338092],
[0.62688037, 0.14867568, 0.04505824, 0.19230437]])