samplespace.algorithms - General Sampling Algorithms


This module implements several general-purpose sampling algorithms.


samplespace.algorithms.sample_discrete_roulette(randfloat: Callable[], float], cum_weights: Sequence[float])int

Sample from a given discrete distribution given its cumulative weights, using binary search / the roulette wheel selection algorithm.

Parameters
  • randfloat (Callable[[], float]) – A function returning a random float in [0.0, 1.0), e.g. random.random(). Will only be called once.

  • cum_weights (Sequence[float]) – The list of cumulative weights. These do not need to be normalized, so cum_weights[-1] does not necessarily have to be 1.0.

Returns

An index into the population from 0 to len(cum_weights) - 1.

class samplespace.algorithms.AliasTable(probability: Sequence[float], alias: Sequence[int])

Sample from a given discrete distribution given its relative weights, using the alias table algorithm.

Alias tables are slow to construct, but offer increased sampling speed compared to the roulette wheel algorithm.

Construct a table using from_weights().

probability

The probability row of the table

Type

List[float]

alias

The alias row of the table

Type

List[int]

classmethod from_weights(weights: Sequence[float])

Construct an alias table using a list of relative weights.

The provided weights need not sum to 1.0.

sample(randbelow: Callable[[int], int], randchance: Callable[[float], bool])int

Sample from the alias table.

Parameters
  • randbelow (Callable[[int], int]) – A function returning a random int in [0, arg), e.g. random.randrange(). Will only be called once per sample.

  • randchance (Callable[[float], bool) – A function returning True with chance arg and False otherwise, e.g. lambda arg: random.random() < arg. Will only be called once per sample.

Returns

An index into the population from 0 to len(weights) - 1.

Examples

Sampling from a discrete categorical distribution using the Roulette Wheel Selection algorithm:

import random
import itertools
from samplespace import algorithms


population = 'abcde'
weights = [0.1, 0.4, 0.2, 0.3, 0.5]

cum_weights = list(itertools.accumulate(weights))
indices = [algorithms.sample_discrete_roulette(random.random, cum_weights)
           for _ in range(25)]  # [0, 4, 4, ...]
samples = [population[index] for index in indices]  # ['a', 'e', 'e', ...]

Sampling from a discrete categorical distribution using the Alias Table algorithm and a repeatable sequence:

from samplespace import algorithms, RepeatableRandomSequence

population = 'abcde'
weights = [0.1, 0.4, 0.2, 0.3, 0.5]

rrs = RepeatableRandomSequence(seed=12345)
at = algorithms.AliasTable.from_weights(weights)
indices = [at.sample(rrs.randrange, rrs.chance) for _ in range(25)]
samples = [population[index] for index in indices]  # ['e', 'e', 'c', ...]

# Note that rrs.index == 50 at this point, since two random values
# were required for each sample.

# It is also possible to use AliasTable within a cascade:
rrs.reset()
indices = [0] * 25
for i in range(len(indices)):
    with rrs.cascade():
        indices[i] = at.sample(rrs.randrange, rrs.chance)
samples = [population[index] for index in indices]  # ['e', 'd', 'b', ...]

# Because cascades were used, rrs.index == 25.
# Note that the values produced are different than the previous
# attempt, because a different number of logical samples were made.