13  Random number generation using Numpy

Open in Google Colab | Download notebook


Code
# Colab setup ------------------
import os, sys, subprocess
if "google.colab" in sys.modules:
    cmd = "pip install --upgrade iqplot watermark"
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
# ------------------------------
import numpy as np
import scipy.stats as st

import iqplot

import bokeh.io
import bokeh.plotting
bokeh.io.output_notebook()
Loading BokehJS ...

A good portion of the random number generation functionality you will need is in the np.random module. It allows for draws of independent random numbers for many convenient named distributions. The scipy.stats module offers even more distributions, but for most applications, Numpy’s generators suffice and are typically faster than using Scipy, which has more overhead.

13.1 Uniform random numbers

Let’s start by generating random numbers from a Uniform distribution.

np.random.uniform(low=0, high=1, size=10)
array([0.40312242, 0.04051189, 0.03851483, 0.41999136, 0.99686689,
       0.86482983, 0.57146661, 0.71342451, 0.20395884, 0.32728968])

The function uniform() in the np.random module generates random numbers on the interval [low, high) from a Uniform distribution. The size keyword argument is how many random numbers you wish to generate, and is a keyword argument in all Numpy’s functions to draw from specific distributions. The random numbers are returned as a Numpy array.

We can check to make sure it is appropriately drawing random numbers out of the Uniform distribution by plotting the cumulative distribution function. We’ll generate 1,000 random numbers and plot them along with the CDF of a Uniform distribution.

# Generate random numbers
x = np.random.uniform(low=0, high=1, size=1000)

# Plot the ECDF of randomly generated numbers
p = iqplot.ecdf(x, marker_kwargs={"fill_color": None},)

p.line(
    x=[0, 1], y=[0, 1], line_width=2, line_color="orange",
)

bokeh.io.show(p)

So, it looks like our random number generator is doing a good job.

Generating random numbers on the uniform interval is one of the most commonly used RNG applications. For example, you can simulate flipping a biased (unfair) coin by drawing from a Uniform distribution and then asking if the random number if less than the bias.

# Generate 20 random numbers on uniform interval
x = np.random.uniform(low=0, high=1, size=20)

# Make the coin flips (< 0.7 means we have a 70% chance of heads)
heads = x < 0.7

# Show which were heads, and count the number of heads
print(heads)
print("\nThere were", np.sum(heads), "heads.")
[ True  True False  True  True  True  True False  True  True  True  True
  True  True  True False False False  True  True]

There were 15 heads.

Of course, you could also do this by drawing out of a Binomial distribution.

print(f"There were {np.random.binomial(20, 0.7)} heads.")
There were 12 heads.

13.2 Choice of generator

As of version 1.23 of Numpy, the algorithm under the hood of calls to functions like np.random.uniform() is the Mersenne Twister Algorithm for generating random numbers. It is a very widely used and reliable method for generating random numbers. However, starting with version 1.17, the numpy.random module offers random number generators with better speed and statistical performance, including a 64-bit permuted congruential generator (PCG64). Going forward, the preferred approach to doing random number generation is to first instantiate a generator of your choice, and then use its methods to generate numbers out of probability distributions.

Let’s set up a PCG64 generator, which is Numpy’s default (though this will soon be updated to the PCG64 DXSM, which works better for massively parallel generation, per Numpy’s documentation).

rng = np.random.default_rng()

Now that we have the generator, we can use it to draw numbers out of distributions. The syntax is the same as before, except rng replaces np.random.

rng.uniform(low=0, high=1, size=20)
array([0.83950893, 0.18995913, 0.7455994 , 0.65977687, 0.50251107,
       0.3920528 , 0.98069072, 0.43620495, 0.42139065, 0.67577344,
       0.44188521, 0.42143143, 0.64836288, 0.29192271, 0.17708979,
       0.92594815, 0.04850722, 0.06206417, 0.09710099, 0.75790207])

Or, for the Binomial,

rng.binomial(20, 0.7)
18

13.3 Seeding random number generators

Now, just to demonstrate that random number generation is deterministic, we will explicitly seed the random number generator (which is usually seeded with a number representing the date/time to avoid repeats) to show that we get the same random numbers.

# Instantiate generator with a seed
rng = np.random.default_rng(seed=3252)

# Draw random numbers
rng.uniform(size=10)
array([0.18866535, 0.04418857, 0.02961285, 0.22083971, 0.43341773,
       0.13166813, 0.42112164, 0.43507845, 0.61380912, 0.30627603])

If we reinstantiate with the same seed, we get the same sequence of random numbers.

# Re-seed the RNG
rng = np.random.default_rng(seed=3252)

# Draw random numbers
rng.uniform(size=10)
array([0.18866535, 0.04418857, 0.02961285, 0.22083971, 0.43341773,
       0.13166813, 0.42112164, 0.43507845, 0.61380912, 0.30627603])

The random number sequence is exactly the same. If we choose a different seed, we get totally different random numbers.

rng = np.random.default_rng(seed=3253)
rng.uniform(size=10)
array([0.31390226, 0.73012457, 0.05800998, 0.01557021, 0.29825701,
       0.10106784, 0.06329107, 0.58614237, 0.52023168, 0.52779988])

If you are writing tests, it is often useful to seed the random number generator to get reproducible results. Otherwise, it is best to use the default seed, based on the date and time, so that you get a new set of random numbers in your applications each time do computations.

13.4 Drawing random numbers out of other distributions

Say we wanted to draw random samples from a Normal distribution with mean μ and standard deviation σ.

# Set parameters
mu = 10
sigma = 1

# Draw 100000 random samples
x = rng.normal(mu, sigma, size=100000)

# Plot the histogram
p = iqplot.histogram(x, rug=False, density=True, y_axis_label="approximate PDF",)

bokeh.io.show(p)

It looks Normal, but, again, comparing the resulting ECDF is a better way to look at this. We’ll check out the ECDF with 1000 samples so as not to choke the browser with the display. I will also make use of the theoretical CDF for the Normal distribution available from the scipy.stats module.

# Compute theoretical CDF
x_theor = np.linspace(6, 14, 400)
y_theor = st.norm.cdf(x_theor, mu, sigma)

# Plot the ECDF of randomly generated numbers
p = iqplot.ecdf(x, marker_kwargs={"fill_color": None},)

p.line(
    x=x_theor, y=y_theor, line_width=2, line_color="orange",
)

bokeh.io.show(p)

Yup, right on!

13.5 Choosing elements from an array

It is often useful to randomly choose elements from an existing array. (Actually, this is probably the functionality we will use the most, since it is used in bootstrapping.) The rng.choice() function does this. You equivalently could do this using rng.integers(), where the integers represent indices in the array, except rng.choice() has a great keyword argument, replace, which allows random draws with or without replacement. For example, say you had 50 samples that you wanted to send to a facility for analysis, but you can only afford to send 20. If we used rng.integers(), we might have a problem.

rng = np.random.default_rng(seed=126969234)
rng.integers(0, 51, size=20)
array([12, 31, 47, 26,  3,  5, 46, 49, 26, 38, 24, 17, 46, 26,  6, 17, 35,
        4, 13, 29])

Sample 17 was selected twice and sample 26 was selected thrice. This is not unexpected. We can use rng.choice() instead.

rng.choice(np.arange(51), size=20, replace=False)
array([27, 34,  0, 46,  2, 48, 35, 50, 40, 12, 28, 19, 37, 38, 11, 23, 45,
       15, 29, 32])

Now, because we chose replace=False, we do not get any repeats.

13.6 Shuffling an array

Similarly, the rng.permutation() function is useful. It takes the entries in an array and shuffles them! Let’s shuffle a deck of cards.

rng.permutation(np.arange(53))
array([12, 18,  2, 34, 27, 10,  0, 30, 49,  7,  5, 35, 11, 23, 37, 17,  4,
       44, 15, 28, 14,  8, 40, 21, 39, 36, 46, 24, 33, 20, 22,  1, 41, 45,
       50, 26, 16, 42, 52,  3,  9, 48, 38, 25, 43, 51, 19, 47, 32,  6, 13,
       29, 31])

13.7 When do we need RNG?

Answer: VERY OFTEN! We will use random number generator extensively as we explore probability distributions.

In many ways, probability is the language of biology. Molecular processes have energetics that are comparable to the thermal energy, which means they are always influenced by random thermal forces. The processes of the central dogma, including DNA replication, are no exceptions. This gives rise to random mutations, which are central to understanding how evolution works. And of course neuronal firing is also probabilistic. If we want to understand how biology works, it is often useful to use random number generators to model the processes.

13.8 Computing environment

%load_ext watermark
%watermark -v -p numpy,scipy,bokeh,iqplot,jupyterlab
Python implementation: CPython
Python version       : 3.12.9
IPython version      : 9.1.0

numpy     : 2.1.3
scipy     : 1.15.2
bokeh     : 3.6.2
iqplot    : 0.3.7
jupyterlab: 4.3.6