Numerical Linear Algebra

Jump to: navigation, search

Much of scientific programming involves handling numerical linear algebra. This is because a huge number of numerical problems which occur in science can be reduced to linear algebra problems. It is very important to know how to formulate these linear algebra problems, and how to solve them numerically.



Array representations of vectors, matrices, and tensors

Thus far, we have discussed simple "one-dimensional" (1D) arrays, which are linear sequences of numbers. In linear algebra terms, 1D arrays represent vectors. The array length corresponds to the "vector dimension" (e.g., a 1D array of length 3 corresponds to a 3-vector). In accordance with Scipy terminology, we will use the work "dimension" to refer to the dimensionality of the array (called the rank in linear algebra), and not the vector dimension.

You are probably familiar with the fact that vectors can be represented using index notation, which is pretty similar to Python's notation for addressing 1D arrays. Consider a length-d vector

\vec{x} = \begin{bmatrix}x_0\\x_1\\\vdots\\x_{d-1}\end{bmatrix}.

The jth element can be written as

x_j \quad \leftrightarrow\quad \texttt{x[j]},

where j=0,1,\dots,d-1. The notation on the left is mathematical index notation, and the notation on the right is Python's array notation. Note that we are using 0-based indexing, so that the first element has index 0 and the last element has index d-1.

A matrix is a collection of numbers organized using two indices, rather than a single index like a vector. Under 0-based indexing, the elements of an m\times n matrix are:

\mathbf{M} = \begin{bmatrix}M_{00} & M_{01} & \cdots & M_{0,n-1} \\ M_{10} & M_{11}& \cdots & M_{1,n-1} \\ \vdots & \vdots &\ddots&\vdots \\ M_{m-1,0} & M_{m-1,1} & \cdots & M_{m-1,n-1} \end{bmatrix}.

More generally, numbers that are organized using multiple indices are collectively referred to as tensors. Tensors can have more than two indices. For example, vector cross products are computed using the Levi-Civita tensor \varepsilon, which has three indices:

\left(\vec{A} \times \vec{B}\right)_i = \sum_{jk} \varepsilon_{ijk} A_j B_k.

Multi-dimensional arrays

In Python, tensors are represented by multi-dimensional arrays, which are similar to 1D arrays except that they are addressed using more than one index. For example, matrices are represented by 2D arrays, and the (i,j)th component of an m\times n matrix is written in Python notation as follows:

M_{ij} \quad \leftrightarrow\quad \texttt{M[i,j]} \quad\quad \mathrm{for} \;\;i=0,\dots,m-1, \;\;j=0,\dots,n-1.
Fig. 1: Memory model of a 2D array.

The way multi-dimensional arrays are laid out in memory is very similar to the memory layout of 1D arrays. There is a book-keeping block, which is associated with the array name, and which stores information about the array size (including the number of indices and the size of each index), as well as the memory location of the array contents. The elements lie in a sequence of storage blocks, in a specific order (depending on the array size). This arrangement is shown schematically in Fig. 1.

When Python needs to access any element of of a multi-dimensional array, it knows exactly which memory location the element is stored in. The location can be worked out from the size of the multi-dimensional array, and the memory location of the first element. In Fig. 1, for example, M is a 2\times 3 array, containing 6 storage blocks laid out in a specific sequence. If we need to access M[1,1], Python knows that it needs to jump to the storage block four blocks down from the (0,0) block. Hence, reading/writing the elements of a multi-dimensional aray is an O(1) operation, just like for 1D arrays.

In the following subsections, we will describe how multi-dimensional arrays can be created and manipulated in Python code.

Note: There is also a special Scipy class called matrix which can be used to represent matrices. Don't use this. It's a layer on top of Scipy's multi-dimensional array facilities, mostly intended as a crutch for programmers transitioning from Matlab. Arrays are better to use, and more consistent with the rest of Scipy.

Creating multi-dimensional arrays

You can create a 2D array with specific elements using the array command, with an input consisting of a list of lists:

>>> x = array([[1., 2., 3.], [4., 5., 6.]])
>>> x
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

The above code creates a 2D array (i.e. a matrix) named x. It is a 2\times3 array, containing the elements x_{00} = 1, x_{01} = 2, x_{02} = 3, etc. Similarly, you can create a 3D array by supplying an input consisting of a list of lists of lists; and so forth.

It is more common, however, to create multi-dimensional arrays using ones or zeros. These functions return arrays whose elements are all initialized to 0.0 and 1.0, respectively. (You can then assign values to the elements as desired.) To do this, instead of specifying a number as the input (which would create a 1D array of that size), you should specify a tuple as the input. For example,

>>> x = zeros((2,3))
>>> x
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
>>> y = ones((3,2))
>>> y
array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

There are many more ways to create multi-dimensional arrays, which we'll discuss when needed.

Basic array operations

To check on the dimension of an array, consult its ndim slot:

>>> x = zeros((5,4))
>>> x.ndim

To determine the exact shape of the array, use the shape slot (the shape is stored in the form of a tuple):

>>> x.shape
(3, 5)

To access the elements of a multi-dimensional array, use square-bracket notation: M[2,3], T[0,4,2], etc. Just remember that each component is zero-indexed.

Multi-dimensional arrays can be sliced, similar to 1D arrays. There is one important feature of multi-dimensional slicing: if you specify a single value as one of the indices, the slice results in an array of smaller dimensionality. For example:

>>> x = array([[1., 2., 3.], [4., 5., 6.]])
>>> x[:,0]
array([ 1.,  4.])

In the above code, x is a 2D array of size 2\times3. The slice x[:,0] specifies the value 0 for index 1, so the result is a 1D array containing the elements [x_{00}, x_{10}].

If you don't specify all the indices of a multi-dimensional array, the omitted indices implicitly included, and run over their entire range. For example, for the above x array,

>>> x[1]
array([ 4.,  5.,  6.])

This is also equivalent to x[1,:].

Arithmetic operations

The basic arithmetic operations can all be performed on multi-dimensional arrays, and act on the arrays element-by-element. For example,

>>> x = ones((2,3))
>>> y = ones((2,3))
>>> z = x + y
>>> z
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])

You can think of this in terms of index notation:

z_{ij} = x_{ij} + y_{ij}.

What is the runtime for performing such arithmetic operations on multi-dimensional arrays? With a bit of thinking, we can convince ourselves that the runtime scales linearly with the number of elements in the multi-dimensional array, because the arithmetic operation is performed on each individual index. For example, the runtime for adding a pair of M\times N matrices scales as O(MN).

Important note: The multiplication operator * also acts element-by-element. It does not refer to matrix multiplication! For example,

>>> x = ones((2,3))
>>> y = ones((2,3))
>>> z = x * y
>>> z
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In index notation, we can think of the * operator as doing this:

z_{ij} = x_{ij} y_{ij}.

By contrast, matrix multiplication is z_{ij} = \sum_k x_{ik} y_{kj}. We'll see how this is accomplished in the next subsection.

The dot operation

The most commonly-used function for array multiplication is the dot function, which takes two array inputs x and y and returns their "dot product". It constructs a product by summing over the last index of array x, and over the next-to-last index of array y (or over its last index, if y is a 1D array). This may sound like a complicated rule, but you should be able to convince yourself that it corresponds to the appropriate type of multiplication operation for the most common cases encountered in linear algebra:

  • If x and y are both 1D arrays (vectors), then dot corresponds to the usual dot product between two vectors:
        \texttt{z} = \texttt{dot(x,y)} \;\;\; \leftrightarrow \;\;\; z = \sum_{k} x_k\, y_k
  • If x is a 2D array and y is a 1D array, then dot corresponds to right-multiplying a matrix by a vector:
        \texttt{z} = \texttt{dot(x,y)} \;\;\; \leftrightarrow \;\;\; z_i = \sum_{k} x_{ik}\, y_k
  • If x is a 1D array and y is a 2D array, then dot corresponds to left-multiplication:
        \texttt{z} = \texttt{dot(x,y)} \;\;\; \leftrightarrow \;\;\; z_i = \sum_{k} x_{k} \,y_{ki}
  • If x and y are both 2D arrays, dot corresponds to matrix multiplication:
        \texttt{z} = \texttt{dot(x,y)} \;\;\; \leftrightarrow \;\;\; z_{ij} = \sum_{k} x_{ik}\, y_{kj}
  • The rule applies to higher-dimensional arrays as well. For example, two rank-3 tensors are multiplied together in this way:
        \texttt{z} = \texttt{dot(x,y)} \;\;\; \leftrightarrow \;\;\; z_{ijpq} = \sum_{k} x_{ijk} \,y_{pkq}

Should you need to perform more general products than what the dot function provides, you can use the tensordot function. This takes two array inputs, x and y, and a tuple of two integers specifying which components of x and y to sum over. For example, if x and y are 2D arrays,

\texttt{z} = \texttt{tensordot(x, y, (0,1))} \;\;\; \leftrightarrow \;\;\; z_{ij} = \sum_{k} x_{ki} \,y_{jk}

What is the runtime for dot and tensordot? Consider a simple case: matrix multiplication of an M\times N matrix with an N\times P matrix. In index notation, this has the form

C_{ij} = \sum_{k=0}^{N-1} A_{ik} \, B_{kj},\quad \mathrm{for}\;i\in\{0,\dots,M-1\}, \;\;j\in\{0,\dots,P-1\}.

The resulting matrix has a total of (M\times P) indices to be computed. Each of these calculations requires a sum involving O(N) arithmetic operations. Hence, the total runtime scales as O(MNP). By similar reasoning, we can figure out the runtime scaling for any tensor product between two tensors: it is the product of the sizes of the unsummed indices, times the size of the summed index. For example, for a tensordot product between an M\times N\times P tensor and a Q\times S\times P tensor, summing over the last index of each tensor, the runtime would scale as O(MNPQS).

Linear equations

In physics, we are often called upon to solve linear equations of the form

\mathbf{A} \vec{x} = \vec{b},

where \mathbf{A} is some N\times N matrix, and both \vec{x} and \vec{b} are vectors for length N. Given \mathbf{A} and \vec{b}, the goal is to solve for \vec{x}.

It's an important and useful skill to recognize linear systems of equations when they arise in physics problems. Such equations can arise in many diverse contexts; we will give a couple of simple examples below.

Example Suppose there is a set of N electrically charged point particles at positions \{\vec{R}_0, \vec{R}_1, \dots, \vec{R}_{N-1}\}. We do not know the value of the electric charges, but we able to measure the electric potential at any point \vec{r}. The electric potential is given by
    \phi(\vec{r}) = \sum_{j=0}^{N-1} \frac{q_j}{|\vec{r}-\vec{R}_j|}.
If we measure the potential at N positions, \{\vec{r}_0, \vec{r}_1, \dots, \vec{r}_{N-1}\}, how can the charges \{q_0,\dots,q_{N-1}\} be deduced?
To do this, let us write the equation for the electric potential at point \vec{r}_i as:
    \phi(\vec{r}_i) = \sum_{j=0}^{N-1} \left[\frac{1}{|\vec{r}_i-\vec{R}_j|}\right] \, q_j.
This has the form \mathbf{A} \vec{x} = \vec{b}, where \mathbf{A}_{ij} \equiv \frac{1}{|\vec{r}_i-\vec{R}_j|}, \vec{b}_i \equiv \phi(\vec{r}_i), and the unknowns are \vec{x}_j = q_j.

Example Linear systems of equations commonly appear in circuit theory. For example, consider the following parallel circuit of N power supplies and resistances:
Assume the voltage on the right-hand side of the circuit is V_0 = 0. Given the resistances \{R_0, \dots, R_{N-1}\} and the EMFs \{\mathcal{E}_0, \dots, \mathcal{E}_{N-1}\}, how do we find the left-hand voltage V and the currents \{\mathcal{I}_0, \dots, \mathcal{I}_{N-1}\}?
We follow the usual laws of circuit theory. Each branch of the parallel circuit obeys Ohm's law,
    \mathcal{I}_j R_j + V = \mathcal{E}_j.
Furthermore, the currents obey Kirchoff's law (conservation of current), so
    \sum_{j=0}^{N-1} \mathcal{I}_j = 0.
We can combine these N+1 equations into a matrix equation of the form \mathcal{A}\,\vec{x} = \vec{b}:
   \begin{bmatrix}R_0 & 0 & \cdots & 0 & 1 \\ 0 & R_1 & \cdots & 0 & 1 \\ \vdots & \vdots &\ddots & \vdots & \vdots \\ 0& 0& \cdots & R_{N-1} & 1 \\ 1& 1& \cdots & 1& 0\end{bmatrix} \begin{bmatrix}\mathcal{I}_0 \\ \mathcal{I}_1 \\ \vdots \\ \mathcal{I}_{N-1} \\ V\end{bmatrix} = \begin{bmatrix}\mathcal{E}_0 \\ \mathcal{E}_1 \\ \vdots \\ \mathcal{E}_{N-1} \\ 0\end{bmatrix}
Here, the unknown vector \vec{x} consists of the N currents passing through the branches of the circuit, and the potential V.

Direct solution

Faced with a system of linear equations, one's first instinct is usually to solve for \vec{x} by inverting the matrix \mathbf{A}:

\mathbf{A} \vec{x} = \vec{b} \quad\Rightarrow\quad \vec{x} = \mathbf{A}^{-1}\, \vec{b}.

Don't do this. It is mathematically correct, but numerically inefficient. As we'll see, computing the matrix inverse \mathbf{A}^{-1}, and then right-multiplying by \vec{b}, involves more steps than simply solving the equation directly

To solve a system of linear equations, use the solve function from the scipy.linalg module. (You will need to import scipy.linalg explicitly, because it is a submodule of scipy and does not get imported by our usual from scipy import * statement.) Here is an example:

>>> A = array([[1., 2., 3.], [2., 4., 0.], [1., 3., 9.]])
>>> b = array([6., 6., 9.])
>>> import scipy.linalg as lin
>>> x = lin.solve(A, b)
>>> x
array([ 9., -3.,  1.])

We can verify that this is indeed the solution:

>>> dot(A, x)              # This should equal b.
array([ 6.,  6.,  9.])

The direct solver uses an algorithm known as Gaussian elimination, which we'll discuss in the next article. The runtime of Gaussian elimination is O(N^3), where N is the size of the linear algebra problem.

The reason we avoid solving linear equations by inverting the matrix \mathbf{A} is that the matrix inverse is itself calculated using the Gaussian elimination algorithm! If you are going to use Gaussian elimination anyway, it is far better to apply the algorithm directly on the desired \mathbf{A} and b. Solving by calculating \mathbf{A}^{-1} involves about twice as many computational steps.


  1. Write Python code to construct a 3D array of size 3\times3\times3 corresponding to the Levi-Civita tensor,
       \varepsilon_{ijk} = \begin{cases} +1 & \text{if } (i,j,k) \text{ is } (1,2,3), (2,3,1) \text{ or } (3,1,2), \\ -1 & \text{if } (i,j,k) \text{ is } (3,2,1), (1,3,2) \text{ or } (2,1,3), \\ \;\;\,0 & \text{if }i=j \text{ or } j=k \text{ or } k=i \end{cases}
    Then, using the tensordot function, verify the identity \sum_i \varepsilon_{ijk} \varepsilon_{imn}=\delta_{jm}\delta_{kn} - \delta_{jn}\delta_{km}.

Computational physics notes  |  Prev: Numbers, arrays, and scaling      Next: Gaussian elimination