SampleSpace: Cross-Platform Tools for Generating Random Numbers¶

SampleSpace is a cross-platform library for describing and sampling from random distributions.
While SampleSpace is primarily intended for creating procedurally-generated content, it is also useful for Monte-Carlo simulations, unit testing, and anywhere that flexible, repeatable random numbers are required.
Installation is simple:
$ pip install pysamplespace
SampleSpace’s only dependency is xxHash, though it optionally offers additional functionality if PyYAML is installed.
SampleSpace was created by Coriander V. Pines and is available under the BSD 3-Clause License.
The source is available on GitLab.
samplespace.repeatablerandom
- Repeatable Random Sequences¶
RepeatableRandomSequence
allows for generating repeatable,
deterministic random sequences. It is compatible with the built-in
random
module as a drop-in replacement.
A key feature of RepeatableRandomSequence
is its ability to
get, serialize, and restore internal state. This is especially useful
when generating procedural content from a fixed seed.
A RepeatableRandomSequence
can also be used for unit testing
by replacing the built-in random
module. Because each random
sequence is deterministic and repeatable for a given seed, expected
values can be recorded and compared against within unit tests.
RepeatableRandomSequence
produces high-quality pseudo-random
values. See Random Generator Quality for results from randomness tests.
-
class
samplespace.repeatablerandom.
RepeatableRandomSequence
(seed=None)¶ A deterministic and repeatable random number generator compatible with Python’s builtin
random
module.
-
RepeatableRandomSequence.
BLOCK_SIZE_BITS
: int = 64¶ The number of bits generated for each unique index.
-
RepeatableRandomSequence.
BLOCK_MASK
: int = 18446744073709551615¶ A bitmask corresponding to
(1 << BLOCK_SIZE_BITS) - 1
Bookkeeping functions¶
-
RepeatableRandomSequence.
seed
(value=None) → None¶ Re-initialize the random generator with a new seed. Resets the sequence to its first value.
Caution
This method cannot be called from within
cascade()
, and will raise aRuntimeError
if attempted.- Parameters
value (int, str, bytes, bytearray) – The value to seed with. If this is a sequence type, the sequence is first hashed with seed 0.
- Raises
ValueError – If the seed value is not a supported type.
-
RepeatableRandomSequence.
getseed
()¶ Returns: the original value passed to
RepeatableRandomSequence()
orseed()
.
-
RepeatableRandomSequence.
getstate
() → samplespace.repeatablerandom.RepeatableRandomSequenceState¶ Returns an opaque object representing the sequence’s current state, used for saving, restoring, and serializing the sequence. This object can be passed to
setstate()
to restore the sequence’s state.
-
RepeatableRandomSequence.
setstate
(state: samplespace.repeatablerandom.RepeatableRandomSequenceState) → None¶ Restore the sequence’s state from previous call to
getstate()
.
-
RepeatableRandomSequence.
reset
() → None¶ Reset the sequence to the beginning, setting
index
to 0.Caution
This method cannot be called from within
cascade()
, and will raise aRuntimeError
if attempted.
-
RepeatableRandomSequence.
index
¶ The sequence’s current index. Generating random values will always increase the index.
Tip
Prefer
getstate()
andsetstate()
over index manipulation when saving and restoring state.Caution
The index cannot be read or written within
cascade()
, and will raise aRuntimeError
if attempted.- Type
-
RepeatableRandomSequence.
getnextblock
() → int¶ Get a block of random bits from the random generator.
Tip
Calling this method will advance the sequence exactly one time. The resulting index depends on whether or not the call occurs within a cascade.
- Returns
A block of
BLOCK_SIZE_BITS
random bits as an int
-
RepeatableRandomSequence.
getrandbits
(k: int) → int¶ Generate an int with k random bits.
This method may call
getnextblock()
multiple times if k is greater thanBLOCK_SIZE_BITS
; regardless of the number of calls, the index will only advance once.Tip
Note that
k == 0
is a valid input, which will advance the sequence even though no elements are chosen.- Parameters
k (int) – The number of random bits to generate.
- Returns
A random integer in [0,
max(2^k, 1)
)
-
RepeatableRandomSequence.
cascade
()¶ Returns a context manager that defines a generation cascade.
Cascades are not required for most applications, but may be useful for certain advanced procedural generation techniques.
A cascade allows multiple random samples to be treated as part the same logical unit.
For example, generating a random position in space can be treated as a single “transaction” by grouping the fields into a single cascade:
>>> print(rrs.index) 10 >>> with rrs.cascade(): ... x = rrs.random() # index is 10 ... y = rrs.random() # index is the previously-generated random block ... z = rrs.random() # index is the previously generated random block ... >>> print(rrs.index) 11
The use of cascades ensure that random values are repeatable and based only on the number of previously-generated values, not the type of the values themselves. This is often useful when creating procedurally-generated content using pre-defined seeds.
While cascading, subsequent calls to random generation functions use the result of the most recently generated random data rather than incrementing the index. When a cascade completes, the index is set to one more than the index when the cascade began.
-
class
samplespace.repeatablerandom.
RepeatableRandomSequenceState
(_seed: Any, _hash_input: bytes, _index: int)¶ An object representing a
RepeatableRandomSequence
’s internal state.-
as_dict
()¶ Return the sequence state as a dictionary for serialization.
-
classmethod
from_dict
(as_dict)¶ Construct a new sequence state from a dictionary, as returned by
as_dict()
.Examples
>>> rrs = RepeatableRandomSequence() >>> >>> state_as_dict = rrs.getstate().as_dict() >>> state_as_dict {'seed': 0, 'hash_input': 'NMlqzcrbG7s=', 'index': 0} >>> >>> new_state = RepeatableRandomSequenceState.from_dict(state_as_dict) >>> rrs.setstate(new_state)
-
Integer distributions¶
-
RepeatableRandomSequence.
randrange
(start: int, stop: Optional[int] = None, step: int = 1) → int¶ Generate a random integer from
range(start[, stop][, step]).
If only one argument is provided, samples from
range(start)
.- Parameters
- Raises
TypeError – if any arguments are not integral.
ValueError – if step is 0
-
RepeatableRandomSequence.
randint
(a: int, b: int) → int¶ Generate a random integer in the range [a, b].
This is an alias for
randrange(a, b + 1)
, included for compatibility with the builtinrandom
module.
-
RepeatableRandomSequence.
randbytes
(num_bytes) → bytes¶ Generate a sequence of random bytes.
The index will only increment once, regardless of the number of bytes in the sequence.
This method produces similar results to
with rrs.cascade(): result = bytes([rrs.randrange(256) for _ in range(num_bytes])
but offers significantly-improved performance and does not discard excess random bits.
- Returns
A
bytes
object with num_bytes random integers in [0, 255].
-
RepeatableRandomSequence.
geometric
(mean: float, include_zero: bool = False) → int¶ Generate integers according to a geometric distribution.
If include_zero is
False
, returns integers from 1 to infinity according to \(\text{Pr}(x = k) = p {(1 - p)}^{k - 1}\) where \(p = \frac{1}{\mathit{mean}}\).If include_zero is
True
, returns integers from 0 to infinity according to \(\text{Pr}(x = k) = p {(1 - p)}^{k}\) where \(p = \frac{1}{\mathit{mean} + 1}\).- Parameters
- Raises
ValueError – if mean is less than 1 if include_zero is
False
, or less than 0 if include_zero isTrue
.
-
RepeatableRandomSequence.
finitegeometric
(s: float, n: int)¶ Generate a random integer according to a geometric-like distribution with exponent s and finite support {1, …, n}.
The finite geometric distribution is defined by the equation
\[\text{Pr}(x=k) = \frac{s^{k}}{\sum_{i=1}^{N} s^{i}}\]- Raises
ValueError – if n is not at least 1
-
RepeatableRandomSequence.
zipfmandelbrot
(s: float, q: float, n: int)¶ Generate a random integer according to a Zipf-Mandelbrot distribution with exponent s, offset q, and support {1, …, n}.
The Zipf-Mandelbrot distribution is defined by the equation
\[\text{Pr}(x=k) = \frac{(k + q)^{-s}}{\sum_{i=1}^{N} (i+q)^{-s}}\]- Raises
ValueError – if n is not at least 1
Categorical distributions¶
-
RepeatableRandomSequence.
choice
(sequence: Sequence)¶ Choose a single random element from within a sequence.
- Raises
IndexError – if the sequence is empty.
-
RepeatableRandomSequence.
choices
(population: Sequence, weights: Optional[Sequence[float]] = None, *, cum_weights: Optional[Sequence[float]] = None, k: int = 1) → Sequence¶ Choose k elements from a population, with replacement.
Either relative (via weights) or cumulative (via cum_weights) weights may be specified. If no weights are specified, selections are made uniformly with equal probability.
Tip
Note that
k == 0
is a valid input, which returns an empty list an advances the sequence once even though no elements are chosen.- Parameters
population (sequence) – The population to sample from, which may include duplicate elements.
weights (sequence[float], optional) – Relative weights for each element in the population. Need not sum to 1.
cum_weights (sequence[float], optional) – Cumulative weights for each element in the population, as calculated by something like
list(accumulate(weights))
.k (int) – The number of elements to choose.
- Raises
IndexError – The population is empty.
ValueError – The length of weights or cum_weights does not match the population size.
TypeError – Both weights and cum_weights are specified.
-
RepeatableRandomSequence.
shuffle
(sequence: Sequence) → None¶ Shuffle a sequence in place.
The index is only incremented once.
-
RepeatableRandomSequence.
sample
(population, k: int) → Sequence¶ Choose k unique random elements from a population, without replacement.
Elements are returned in selection order, so all subsets of the returned list are valid samples.
If the population includes duplicate values, each occurrence is as distinct possible selection in the result.
Tip
Note that
k == 0
is a valid input, which returns an empty list an advances the sequence once even though no elements are chosen.- Parameters
- Raises
IndexError – if population is empty.
ValueError – if k is greater than the population size.
Continuous distributions¶
-
RepeatableRandomSequence.
uniform
(a: float, b: float) → float¶ Return a random float uniformly distributed in [a, b).
-
RepeatableRandomSequence.
triangular
(low: float = 0.0, high: float = 1.0, mode: Optional[float] = None) → float¶ Sample from a triangular distribution with lower limit low, upper limit low, and mode mode.
The triangular distribution is defined by
\[\begin{split}\text{P}(x) = \begin{cases} 0 & \text{for } x \lt l, \\ \frac{2(x-l)}{(h-l)(m-l)} & \text{for }l\le x \lt h, \\ \frac{2}{h-l} & \text{for } x = m, \\ \frac{2(h-x)}{(h-l)(h-m)} & \text{for } m \lt x \le h, \\ 0 & \text{for } h \lt x \end{cases}\end{split}\]- Raises
ValueError – if the mode is not in [low, high].
-
RepeatableRandomSequence.
uniformproduct
(n: int) → float¶ Sample from a distribution whose values are the product of N uniformly distributed variables.
This distribution has the following PDF
\[\begin{split}\text{P}(x) = \begin{cases} \frac{(-1)^{n-1} log^{n-1}(x)}{(n - 1)!} & \text{for } x \in [0, 1) \\ 0 & \text{otherwise} \end{cases}\end{split}\]- Raises
ValueError – if n is not at least 1.
-
RepeatableRandomSequence.
gauss
(mu: float, sigma: float) → float¶ Sample from a Gaussian distribution with parameters mu and sigma.
The Gaussian, or Normal distribution is defined by
\[\text{P}(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2} \left(\frac{x - \mu}{\sigma}\right)^2}\]Tip
If multiple normally-distributed values are required, consider using
gausspair()
for improved performance.
-
RepeatableRandomSequence.
gausspair
(mu: float, sigma: float) → Tuple[float, float]¶ Return a pair of independent samples from a Gaussian distribution with parameters mu and sigma.
This method produces similar results to
with rrs.cascade(): result = rrs.gauss(mu, sigma), rrs.gauss(mu, sigma)
but offers improved performance.
-
RepeatableRandomSequence.
lognormvariate
(mu: float, sigma: float) → float¶ Sample from a log-normal distribution with parameters mu and sigma.
The logarithms of values sampled from a log-normal distribution are normally distributed with mean mu and standard deviation sigma. The distribution is defined by
\[\text{P}(x) = \frac{1}{x \sigma \sqrt{2 \pi}} \exp{\left(- \frac{(\ln{x} - \mu)^2}{2 \sigma ^2}\right)}\]
-
RepeatableRandomSequence.
expovariate
(lambd: float) → float¶ Sample from an exponential distribution with rate lambd.
The probability density function for the exponential distribution is defined by \(\text{P}(x)= \lambda e^{-\lambda x}\) where \(x\ge 0\)
-
RepeatableRandomSequence.
vonmisesvariate
(mu: float, kappa: float) → float¶ Sample from a von Mises distribution with parameters mu and kappa.
Samples from a von Mises distribution represent randomly-chosen angles clustered around a mean angle. This distribution is an approximation to the wrapped normal distribution and is defined by
\[\text{P}(x) = \frac{e^{\kappa \cos{(x - \mu)}}}{2 \pi I_0(\kappa)}\]where \(I_0(\kappa)\) is the modified Bessel function of order 0.
-
RepeatableRandomSequence.
gammavariate
(alpha: float, beta: float) → float¶ Sample from a gamma distribution with parameters alpha and beta. Not to be confused with
math.gamma()
!The gamma distribution is defined as
\[\text{P}(x) = \frac{x^{\alpha - 1} e^{-\frac{x}{\beta}}} {\Gamma(\alpha) \beta^{\alpha}}\]Caution
This implementation defines its parameters to match
random.gammavariate()
. The parametrization differs from most common definitions of the gamma distribution, as defined on Wikipedia, et al. Take care when setting alpha and beta!- Raises
ValueError – if either alpha or beta is not greater than 0.
-
RepeatableRandomSequence.
betavariate
(alpha: float, beta: float) → float¶ Sample from a beta distribution with parameters alpha and beta.
The beta distribution is defined by
\[\text{P}(x) = x^{\alpha - 1} (1 - x)^{\beta - 1} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\]- Returns
A random, beta-distributed value in [0.0, 1.0]
- Raises
ValueError – if either alpha or beta is not greater than 0.
-
RepeatableRandomSequence.
paretovariate
(alpha: float) → float¶ Sample from a Pareto distribution with shape parameter alpha and minimum value 1.
The Pareto distribution has PDF
\[\text{P}(x) = \frac{\alpha}{x^{\alpha + 1}}\]- Raises
ValueError – if alpha is zero.
-
RepeatableRandomSequence.
weibullvariate
(alpha: float, beta: float) → float¶ Sample from a Weibull distribution with scale parameter alpha and shape parameter beta.
The distribution is defined by
\[\text{P}(x) = \frac{\beta}{\alpha} \left(\frac{x}{\alpha}\right)^{k-1} e^{-(x/\alpha)^k}\]where \(x \ge 0\).
- Raises
ValueError – if alpha is zero.
Examples¶
Generating random values:
import samplespace
rrs = samplespace.RepeatableRandomSequence(seed=1234)
samples = [rrs.randrange(30) for _ in range(10)]
print(samples)
# Will always print:
# [21, 13, 28, 19, 16, 29, 28, 24, 29, 25]
Distinct seeds produce unique results:
import samplespace
# Each seed will generate a unique sequence of values
for seed in range(5):
rrs = samplespace.RepeatableRandomSequence(seed=seed)
samples = [rrs.random() for _ in range(5)]
print('Seed: {0}\tResults: {1}'.format(
seed,
' '.join('{:.3f}'.format(x) for x in samples)))
Results depend only on number of previous calls, not their type:
import samplespace
rrs = samplespace.RepeatableRandomSequence(seed=1234)
dummy = rrs.random()
x1 = rrs.random()
rrs.reset()
dummy = rrs.gauss(6.0, 2.0)
x2 = rrs.random()
assert x2 == x1
rrs.reset()
dummy = rrs.randrange(50)
x3 = rrs.random()
assert x3 == x1
rrs.reset()
dummy = list('abcdefg')
rrs.shuffle(dummy)
x4 = rrs.random()
assert x4 == x1
rrs.reset()
with rrs.cascade():
dummy = [rrs.random() for _ in range(10)]
x5 = rrs.random()
assert x5 == x1
Replay random sequences using RepeatableRandomSequence.reset()
:
import samplespace
rrs = samplespace.RepeatableRandomSequence(seed=1234)
samples = [rrs.random() for _ in range(10)]
print(' '.join(samples))
# Using reset() returns the sequence to its initial state
rrs.reset()
samples2 = [rrs.random() for _ in range(10)]
print(' '.join(samples2))
assert samples == sample2
Replay random sequences using RepeatableRandomSequence.getstate()
/
RepeatableRandomSequence.setstate()
:
import samplespace
rrs = samplespace.RepeatableRandomSequence(seed=12345)
# Generate some random values to advance the state
[rrs.random() for _ in range(100)]
# Save the state for later recall
state = rrs.getstate()
print(rrs.random())
# Will print 0.2736967629462168
# Generate some more values
[rrs.random() for _ in range(100)]
# Return the sequence to the saved state. The next value will match
# the value following when the state was saved.
rrs.setstate(state)
print(rrs.random())
# Will also print 0.2736967629462168
Replay random sequences using RepeatableRandomSequence.index()
:
import samplespace
rrs = samplespace.RepeatableRandomSequence(seed=5)
# Generate a sequence of values
samples = [rrs.randrange(10) for _ in range(15)]
print(samples)
# Will print
# [0, 2, 2, 7, 9, 4, 1, 5, 5, 6, 7, 1, 7, 6, 8]
# Rewind the sequence by 5
rrs.index -= 5
# Generate a new sequence, will overlap by 5 elements
samples = [rrs.randrange(10) for _ in range(15)]
print(samples)
# Will print
# [7, 1, 7, 6, 8, 6, 6, 6, 3, 7, 6, 9, 5, 2, 7]
Serialize sequence state as simple data types:
import samplespace
import samplespace.repeatablerandom
import json
rrs = samplespace.RepeatableRandomSequence(seed=12345)
# Generate some random values to advance the state
[rrs.random() for _ in range(100)]
# Save the state for later recall
# State can be serialzied to a dict and serialized as JSON
state = rrs.getstate()
state_as_dict = state.as_dict()
state_as_json = json.dumps(state_as_dict)
print(state_as_json)
# Prints {"seed": 12345, "hash_input": "gxzNfDj4Ypc=", "index": 100}
print(rrs.random())
# Will print 0.2736967629462168
# Generate some more values
[rrs.random() for _ in range(100)]
# Return the sequence to the saved state. The next value will match
# the value following when the state was saved.
new_state_as_dict = json.loads(state_as_json)
new_state = samplespace.repeatablerandom.RepeatableRandomSequenceState.from_dict(new_state_as_dict)
rrs.setstate(new_state)
print(rrs.random())
# Will also print 0.2736967629462168
samplespace.distributions
- Serializable Probability Distributions¶
This module implements a number of useful probability distributions.
Each distribution can be sampled using any random number generator
providing at least the same functionality as the random
module;
this includes samplespace.repeatablerandom.RepeatableRandomSequence
.
The classes in this module are primarily intended for storing information
on random distributions in configuration files using
Distribution.as_dict()
/distribution_from_dict()
or
Distribution.as_list()
/distribution_from_list()
.
See the Examples section for examples on how to do this.
Integer distributions¶
-
class
samplespace.distributions.
DiscreteUniform
(min_val: int, max_val: int)¶ Represents a discrete uniform distribution of integers in [min_value, max_value).
-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
max_val
¶ Read-only property for the distribution’s upper limit.
-
property
min_val
¶ Read-only property for the distribution’s lower limit.
-
-
class
samplespace.distributions.
Geometric
(mean: float, include_zero: bool = False)¶ Represents a geometric distribution.
If include_zero is
False
, returns integers from 1 to infinity according to \(\text{Pr}(x = k) = p {(1 - p)}^{k - 1}\) where \(p = \frac{1}{\mathit{mean}}\).If include_zero is
True
, returns integers from 0 to infinity according to \(\text{Pr}(x = k) = p {(1 - p)}^{k}\) where \(p = \frac{1}{\mathit{mean} + 1}\).-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
include_zero
¶ Read-only property for whether or not the distribution’s support includes zero.
-
property
mean
¶ Read-only property for the distribution’s mean.
-
-
class
samplespace.distributions.
FiniteGeometric
(s: float, n: int)¶ Represents a geometric-like distribution with exponent s and finite support {1, …, n}.
The finite geometric distribution is defined by the equation
\[\text{Pr}(x=k) = \frac{s^{k}}{\sum_{i=1}^{N} s^{i}}\]The distribution is defined such that each result is s times as likely to occur as the previous; i.e. \(\text{Pr}(x=k) = s \text{Pr}(x=k-1)\) over the support.
-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
n
¶ Read-only property for the number of values in the distribution’s support.
-
property
s
¶ Read-only property for the distribution’s s exponent.
-
-
class
samplespace.distributions.
ZipfMandelbrot
(s: float, q: float, n: int)¶ Represents a Zipf-Mandelbrot distribution with exponent s, offset q, and support {1, …, n}.
The Zipf-Mandelbrot distribution is defined by the equation
\[\text{Pr}(x=k) = \frac{(k+q)^{-s}}{\sum_{i=1}^{N} (i+q)^{-s}}\]When
q == 0
, the distribution becomes the Zipf distribution, and as n increases, it approaches the Zeta distribution.-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
n
¶ Read-only property for the number of values in the distribution’s support.
-
property
q
¶ Read-only property for the distribution’s q offset.
-
property
s
¶ Read-only property for the distribution’s s exponent.
-
-
class
samplespace.distributions.
Bernoulli
(p: float)¶ Represents a Bernoulli distribution with parameter p.
Returns
True
with probability p, andFalse
otherwise.-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
p
¶ Read-only property for the distribution’s p parameter.
-
Categorical distributions¶
-
class
samplespace.distributions.
WeightedCategorical
(items: Optional[Sequence[Tuple[float, Any]]] = None, population: Optional[Sequence] = None, weights: Optional[Sequence[float]] = None, *, cum_weights: Optional[Sequence[float]] = None)¶ Represents a categorical distribution defined by a population and a list of relative weights.
Either items, population and weights, or population and cum_weights should be provided, not all three.
- Parameters
items (Sequence[Tuple[Any]]) – A sequence of tuples in the format (weight, relative value).
population (Sequence) – A sequence of possible values.
weights (Sequence[float], optional) – A sequence of relative weights corresponding to each item in the population. Must be the same length as the population list.
cum_weights (Sequence[float], optional) – A sequence of cumulative weights corresponding to each item in the population. Must be the same length as the population list. Only one of weights and cum_weights should be provided.
-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
cum_weights
¶ A read-only property for the distribution’s cumulative weights.
-
property
items
¶ A read-only property returning a sequence of tuples in the format (weight, relative value).
-
property
population
¶ A read-only property for the distribution’s population.
-
sample
(rand)¶ Sample from the distribution.
- Parameters
rand – The random generator used to generate the sample.
-
class
samplespace.distributions.
UniformCategorical
(population: Sequence)¶ Represents a uniform categorical distribution over a given population.
-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
population
¶ A read-only property for the distribution’s population.
-
sample
(rand)¶ Sample from the distribution.
- Parameters
rand – The random generator used to generate the sample.
-
-
class
samplespace.distributions.
FiniteGeometricCategorical
(population: Sequence, s: float)¶ Represents a categorical distribution with weights corresponding to a finite geometric-like distribution with exponent s.
The finite geometric distribution is defined by the equation
\[\text{Pr}(x=k) = \frac{s^{k}}{\sum_{i=1}^{N} s^{i}}\]The distribution is defined such that each result is s times as likely to occur as the previous; i.e. \(\text{Pr}(x=k) = s \text{Pr}(x=k-1)\) over the support.
-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
cum_weights
¶ A read-only property for the distribution’s cumulative weights.
-
property
items
¶ A read-only property returning a sequence of tuples in the format (weight, relative value).
-
property
n
¶ Read-only property for the number of values in the distribution’s support.
-
property
population
¶ A read-only property for the distribution’s population.
-
property
s
¶ Read-only property for the distribution’s s exponent.
-
sample
(rand)¶ Sample from the distribution.
- Parameters
rand – The random generator used to generate the sample.
-
-
class
samplespace.distributions.
ZipfMandelbrotCategorical
(population: Sequence, s: float, q: float)¶ Represents a categorical distribution with weights corresponding to a Zipf-Mandelbrot distribution with exponent s and offset q.
The Zipf-Mandelbrot distribution is defined by the equation
\[\text{Pr}(x=k) = \frac{(k+q)^{-s}}{\sum_{i=1}^{N} (i+q)^{-s}}\]When
q == 0
, the distribution becomes the Zipf distribution, and as n increases, it approaches the Zeta distribution.-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
cum_weights
¶ A read-only property for the distribution’s cumulative weights.
-
property
items
¶ A read-only property returning a sequence of tuples in the format (weight, relative value).
-
property
n
¶ Read-only property for the number of values in the distribution’s support.
-
property
population
¶ A read-only property for the distribution’s population.
-
property
q
¶ Read-only property for the distribution’s q offset.
-
property
s
¶ Read-only property for the distribution’s s exponent.
-
sample
(rand)¶ Sample from the distribution.
- Parameters
rand – The random generator used to generate the sample.
-
Continuous distributions¶
-
class
samplespace.distributions.
Constant
(value)¶ Represents a distribution that always returns a constant value.
-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
sample
(rand)¶ Sample from the distribution.
- Parameters
rand – The random generator used to generate the sample.
-
property
value
¶ Read-only property for the distribution’s constant value.
-
-
class
samplespace.distributions.
Uniform
(min_val: float = 0.0, max_val: float = 1.0)¶ Represents a continuous uniform distribution with support [min_val, max_val).
The uniform distribution is defined as
\[\begin{split}\text{P}(x) = \begin{cases} \frac{1}{b - a} & \text{for } x \in [a, b) \\ 0 & \text{otherwise} \end{cases}\end{split}\]-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
max_val
¶ Read-only property for the distribution’s upper limit.
-
property
min_val
¶ Read-only property for the distribution’s lower limit.
-
-
class
samplespace.distributions.
Gamma
(alpha: float, beta: float)¶ Represents a gamma distribution with parameters alpha and beta.
The gamma distribution is defined as
\[\text{P}(x) = \frac{x^{\alpha - 1} e^{-\frac{x}{\beta}}} {\Gamma(\alpha) \beta^{\alpha}}\]Caution
This implementation defines its parameters to match
random.gammavariate()
. The parametrization differs from most common definitions of the gamma distribution, as defined on Wikipedia, et al. Take care when setting alpha and beta!-
property
alpha
¶ Read-only property for the distribution’s alpha parameter.
-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
beta
¶ Read-only property for the distribution’s beta parameter.
-
property
-
class
samplespace.distributions.
Triangular
(low: float = 0.0, high: float = 1.0, mode: Optional[float] = None)¶ Represents a triangular distribution with lower limit low, upper limit low, and mode mode.
The triangular distribution is defined by
\[\begin{split}\text{P}(x) = \begin{cases} 0 & \text{for } x \lt l, \\ \frac{2(x-l)}{(h-l)(m-l)} & \text{for }l\le x \lt h, \\ \frac{2}{h-l} & \text{for } x = m, \\ \frac{2(h-x)}{(h-l)(h-m)} & \text{for } m \lt x \le h, \\ 0 & \text{for } h \lt x \end{cases}\end{split}\]-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
high
¶ Read-only property for the distribution’s upper bound.
-
property
low
¶ Read-only property for the distribution’s lower bound.
-
property
mode
¶ Read-only property for the distribution’s mode, if specified, otherwise
None
.
-
-
class
samplespace.distributions.
UniformProduct
(n: int)¶ Represents a distribution whose values are the product of N uniformly distributed variables.
This distribution has the following PDF
\[\begin{split}\text{P}(x) = \begin{cases} \frac{(-1)^{n-1} log^{n-1}(x)}{(n - 1)!} & \text{for } x \in [0, 1) \\ 0 & \text{otherwise} \end{cases}\end{split}\]-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
n
¶ Read-only property for the number of uniformly distributed variables to multiply.
-
-
class
samplespace.distributions.
LogNormal
(mu: float = 0.0, sigma: float = 1.0)¶ Represents a log-normal distribution with parameters mu and sigma.
The logarithms of values sampled from a log-normal distribution are normally distributed with mean mu and standard deviation sigma. The distribution is defined by
\[\text{P}(x) = \frac{1}{x \sigma \sqrt{2 \pi}} \exp{\left(- \frac{(\ln{x} - \mu)^2}{2 \sigma ^2}\right)}\]-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
mu
¶ Read-only property for the distribution’s mu parameter.
-
sample
(rand) → float¶ Sample from the distribution.
- Parameters
rand – The random generator used to generate the sample.
-
property
sigma
¶ Read-only property for the distribution’s sigma parameter.
-
-
class
samplespace.distributions.
Exponential
(lambd: float)¶ Represents an exponential distribution with rate lambd.
The probability density function for the exponential distribution is defined by \(\text{P}(x)= \lambda e^{-\lambda x}\) where \(x\ge 0\)
-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
lambd
¶ Read-only property for the distribution’s rate parameter.
-
-
class
samplespace.distributions.
VonMises
(mu: float, kappa: float)¶ Represents a von Mises distribution with parameters mu and kappa.
Samples from a von Mises distribution represent randomly-chosen angles clustered around a mean angle. This distribution is an approximation to the wrapped normal distribution and is defined by
\[\text{P}(x) = \frac{e^{\kappa \cos{(x - \mu)}}}{2 \pi I_0(\kappa)}\]where \(I_0(\kappa)\) is the modified Bessel function of order 0.
-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
kappa
¶ Read-only property for the distribution’s kappa parameter.
-
property
mu
¶ Read-only property for the distribution’s mu parameter.
-
-
class
samplespace.distributions.
Beta
(alpha: float, beta: float)¶ Represents a beta distribution with parameters alpha and beta.
The beta distribution is defined by
\[\text{P}(x) = x^{\alpha - 1} (1 - x)^{\beta - 1} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\]-
property
alpha
¶ Read-only property for the distribution’s alpha parameter.
-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
beta
¶ Read-only property for the distribution’s beta parameter.
-
property
-
class
samplespace.distributions.
Pareto
(alpha: float)¶ Represents a Pareto distribution with shape parameter alpha and minimum value 1.
The Pareto distribution has PDF
\[\text{P}(x) = \frac{\alpha}{x^{\alpha + 1}}\]-
property
alpha
¶ Read-only property for the distribution’s alpha parameter.
-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
-
class
samplespace.distributions.
Weibull
(alpha: float, beta: float)¶ Represents a Weibull distribution with scale parameter alpha and shape parameter beta.
The distribution is defined by
\[\text{P}(x) = \frac{\beta}{\alpha} \left(\frac{x}{\alpha}\right)^{k-1} e^{-(x/\alpha)^k}\]where \(x \ge 0\).
-
property
alpha
¶ Read-only property for the distribution’s alpha parameter.
-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
beta
¶ Read-only property for the distribution’s beta parameter.
-
property
-
class
samplespace.distributions.
Gaussian
(mu: float = 0.0, sigma: float = 1.0)¶ Represents a Gaussian distribution with parameters mu and sigma.
The Gaussian, or Normal distribution is defined by
\[\text{P}(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2} \left(\frac{x - \mu}{\sigma}\right)^2}\]-
as_dict
() → Dict¶ Return a representation of the distribution as a dict.
The ‘distribution’ key is the name of the distribution, and the remaining keys are the distribution’s kwargs.
-
as_list
() → List¶ Return a representation of the distribution as a list.
The first element if the list is the name of the distribution, and the subsequent elements are ordered parameters.
-
property
mu
¶ Read-only property for the distribution’s mu parameter.
-
sample
(rand)¶ Sample from the distribution.
- Parameters
rand – The random generator used to generate the sample.
-
property
sigma
¶ Read-only property for the distribution’s sigma parameter.
-
Serialization functions¶
-
samplespace.distributions.
distribution_from_list
(as_list)¶ Build a distribution using a list of parameters.
Expects a list of values in the same form return by a call to
Distribution.as_list()
; i.e.['name', arg0, arg1, ...]
.Examples
>>> tri = Triangular(2.0, 5.0) >>> tri_as_list = tri.as_list() >>> tri_as_list ['triangular', 2.0, 5.0] >>> new_tri = distribution_from_list(tri_as_list) >>> assert tri == new_tri
-
samplespace.distributions.
distribution_from_dict
(as_dict)¶ Build a distribution using a dictionary of keyword arguments.
Expects a dict in the same form as returned by a call to
Distribution.as_dict()
; i.e.{'distribution':'name', 'arg0':'value', 'arg1':'value', ...}
.Examples
>>> gauss = Gaussian(0.8, 5.0) >>> gauss_as_dict = gauss.as_dict() >>> gauss_as_dict {'distribution': 'gaussian', 'mu': 0.8, 'sigma': 5.0} >>> new_gauss = distribution_from_dict(gauss_as_dict) >>> assert gauss == new_gauss
-
classmethod
samplespace.distribution.Distribution.
from_list
()¶ An alias for
distribution_from_list()
.
-
classmethod
samplespace.distribution.Distribution.
from_dict
()¶ An alias for
distribution_from_dict()
.
Examples¶
Sampling from a distribution:
import random
import statistics
from samplespace.distributions import Gaussian, FiniteGeometric
gauss = Gaussian(15.0, 2.0)
samples = [gauss.sample(random) for _ in range(100)]
print('Mean:', statistics.mean(samples))
print('Standard deviation:', statistics.stdev(samples))
geo = FiniteGeometric(['one', 'two', 'three', 'four', 'five'], 0.7)
samples = [geo.sample(random) for _ in range(10)]
print(' '.join(samples))
Using other random generators:
from samplespace import distributions, RepeatableRandomSequence
exponential = distributions.Exponential(0.8)
rrs = RepeatableRandomSequence(seed=12345)
print([exponential.sample(rrs) for _ in range(5)])
# Will always print:
# [1.1959827296976795, 0.6056492468915003, 0.9155454941988664, 0.5653478889068511, 0.6500080335986231]
Representations of distributions:
from samplespace.distributions import Pareto, DiscreteUniform, UniformCategorical
pareto = Pareto(2.5)
print('Pareto as dict:', pareto.as_dict()) # {'distribution': 'pareto', 'alpha': 2.5}
print('Pareto as list:', pareto.as_list()) # ['pareto', 2.5]
discrete = DiscreteUniform(3, 8)
print('Discrete uniform as dict:', discrete.as_dict()) # {'distribution': 'discreteuniform', 'min_val': 3, 'max_val': 8}
print('Discrete uniform as list:', discrete.as_list()) # ['discreteuniform', 3, 8]
cat = UniformCategorical(['string', 4, {'a':'dict'}])
print('Uniform categorical as dict:', cat.as_dict()) # {'distribution': 'uniformcategorical', 'population': ['string', 4, {'a': 'dict'}]}
print('Uniform categorical as list:', cat.as_list()) # ['uniformcategorical', ['string', 4, {'a': 'dict'}]]
Storing distributions as in config files as lists:
import random
from samplespace import distributions
...
skeleton_config = {
'name': 'Skeleton',
'starting_hp': ['gaussian', 50.0, 5.0],
'coins_dropped': ['geometric', 0.8, True],
}
...
class Skeleton(object):
def __init__(self, name, starting_hp, coins_dropped_dist):
self.name = name
self.starting_hp = starting_hp
self.coins_dropped_dist = coins_dropped_dist
def drop_coins(self):
return self.coins_dropped_dist.sample(random)
...
class SkeletonFactory(object):
def __init__(self, config):
self.name = config['name']
self.starting_hp_dist = distributions.distribution_from_list(config['starting_hp'])
self.coins_dropped_dist = distributions.distribution_from_list(config['coins_dropped'])
def make_skeleton(self):
return Skeleton(
self.name,
int(self.starting_hp_dist.sample(random)),
self.coins_dropped_dist)
Storing distributions in config files as dictionaries:
from samplespace import distributions, RepeatableRandomSequence
city_config = {
"building_distribution": {
"distribution": "weightedcategorical",
"items": [
["house", 0.2],
["store", 0.4],
["tree", 0.8],
["ground", 5.0]
]
}
}
rrs = RepeatableRandomSequence()
building_dist = distributions.distribution_from_dict(city_config['building_distribution'])
buildings = [[building_dist.sample(rrs) for col in range(20)] for row in range(5)]
for row in buildings:
for building_type in row:
if building_type == 'house':
print('H', end='')
elif building_type == 'store':
print('S', end='')
elif building_type == 'tree':
print('T', end='')
else:
print('.', end='')
print()
samplespace.algorithms
- General Sampling Algorithms¶
This module implements several general-purpose sampling algorithms.
-
samplespace.algorithms.
sample_discrete_roulette
(randfloat: Callable[], float], cum_weights: Sequence[float]) → int¶ Sample from a given discrete distribution given its cumulative weights, using binary search / the roulette wheel selection algorithm.
- Parameters
- Returns
An index into the population from 0 to
len(cum_weights) - 1
.
-
class
samplespace.algorithms.
AliasTable
(probability: Sequence[float], alias: Sequence[int])¶ Sample from a given discrete distribution given its relative weights, using the alias table algorithm.
Alias tables are slow to construct, but offer increased sampling speed compared to the roulette wheel algorithm.
Construct a table using
from_weights()
.-
classmethod
from_weights
(weights: Sequence[float])¶ Construct an alias table using a list of relative weights.
The provided weights need not sum to 1.0.
-
sample
(randbelow: Callable[[int], int], randchance: Callable[[float], bool]) → int¶ Sample from the alias table.
- Parameters
randbelow (Callable[[int], int]) – A function returning a random int in [0, arg), e.g.
random.randrange()
. Will only be called once per sample.randchance (Callable[[float], bool) – A function returning
True
with chance arg andFalse
otherwise, e.g.lambda arg: random.random() < arg
. Will only be called once per sample.
- Returns
An index into the population from 0 to
len(weights) - 1
.
-
classmethod
Examples¶
Sampling from a discrete categorical distribution using the Roulette Wheel Selection algorithm:
import random
import itertools
from samplespace import algorithms
population = 'abcde'
weights = [0.1, 0.4, 0.2, 0.3, 0.5]
cum_weights = list(itertools.accumulate(weights))
indices = [algorithms.sample_discrete_roulette(random.random, cum_weights)
for _ in range(25)] # [0, 4, 4, ...]
samples = [population[index] for index in indices] # ['a', 'e', 'e', ...]
Sampling from a discrete categorical distribution using the Alias Table algorithm and a repeatable sequence:
from samplespace import algorithms, RepeatableRandomSequence
population = 'abcde'
weights = [0.1, 0.4, 0.2, 0.3, 0.5]
rrs = RepeatableRandomSequence(seed=12345)
at = algorithms.AliasTable.from_weights(weights)
indices = [at.sample(rrs.randrange, rrs.chance) for _ in range(25)]
samples = [population[index] for index in indices] # ['e', 'e', 'c', ...]
# Note that rrs.index == 50 at this point, since two random values
# were required for each sample.
# It is also possible to use AliasTable within a cascade:
rrs.reset()
indices = [0] * 25
for i in range(len(indices)):
with rrs.cascade():
indices[i] = at.sample(rrs.randrange, rrs.chance)
samples = [population[index] for index in indices] # ['e', 'd', 'b', ...]
# Because cascades were used, rrs.index == 25.
# Note that the values produced are different than the previous
# attempt, because a different number of logical samples were made.
samplespace.pyyaml_support
- YAML serialization support¶
When enabled, this module provides YAML serialization support for
classes in samplespace.repeatablerandom
,
samplespace.distributions
, and samplespace.algorithms
.
The yaml
module provided by
PyYaml is required to use this
module, and an exception will be thrown if it is not available.
Tip
Once the module is imported, enable_yaml_support()
must be
called to enable YAML support.
-
samplespace.pyyaml_support.
enable_yaml_support
(_force=False)¶ Add YAML serialization support. This function only needs to be called once per application.
Examples¶
Repeatable random sequence:
>>> import yaml
>>> from samplespace import RepeatableRandomSequence
>>> import samplespace.pyyaml_support
>>> samplespace.pyyaml_support.enable_yaml_support()
>>>
>>> rrs = RepeatableRandomSequence(seed=678)
>>> [rrs.randrange(10) for _ in range(5)]
[7, 7, 0, 8, 3]
>>>
>>> # Serialize the sequence as YAML
...
>>> as_yaml = yaml.dump(rrs)
>>> as_yaml
'!samplespace.rrs\nhash_input: s1enBV+SSXk=\nindex: 5\nseed: 678\n'
>>>
>>> # Generate some random values to compare against later
...
>>> [rrs.randrange(10) for _ in range(5)]
[0, 5, 1, 3, 9]
>>> [rrs.randrange(10) for _ in range(5)]
[7, 1, 1, 0, 5]
>>> [rrs.randrange(10) for _ in range(5)]
[2, 9, 1, 4, 8]
>>>
>>> # Restore the saved sequence
...
>>> rrs2 = yaml.load(as_yaml, Loader=yaml.FullLoader)
>>>
>>> # Verify that values match those at time of serialization
...
>>> [rrs2.randrange(10) for _ in range(5)]
[0, 5, 1, 3, 9]
>>> [rrs2.randrange(10) for _ in range(5)]
[7, 1, 1, 0, 5]
>>> [rrs2.randrange(10) for _ in range(5)]
[2, 9, 1, 4, 8]
Distributions:
>>> import yaml
>>> from samplespace import distributions
>>> import samplespace.pyyaml_support
>>> samplespace.pyyaml_support.enable_yaml_support()
>>>
>>> gamma = distributions.Gamma(5.0, 3.0)
>>> gamma_as_yaml = yaml.dump(gamma)
>>> print(gamma_as_yaml)
!samplespace.distribution
alpha: 5.0
beta: 3.0
distribution: gamma
>>> assert yaml.load(gamma_as_yaml, Loader=yaml.FullLoader) == gamma
>>>
>>> zipf = distributions.Zipf(0.9, 10)
>>> zipf_as_yaml = yaml.dump(zipf)
>>> print(zipf_as_yaml)
!samplespace.distribution
distribution: zipf
n: 10
s: 0.9
>>> assert yaml.load(zipf_as_yaml, Loader=yaml.FullLoader) == zipf
>>>
>>> wcat = distributions.WeightedCategorical(
... population='abcde',
... cum_weights=[0.1, 0.4, 0.6, 0.7, 1.1])
>>> wcat_as_yaml = yaml.dump(wcat)
>>> print(wcat_as_yaml)
!samplespace.distribution
distribution: weightedcategorical
items:
- - a
- 0.1
- - b
- 0.30000000000000004
- - c
- 0.19999999999999996
- - d
- 0.09999999999999998
- - e
- 0.40000000000000013
>>> assert yaml.load(wcat_as_yaml, Loader=yaml.FullLoader) == wcat
Algorithms:
>>> import yaml
>>> from samplespace.algorithms import AliasTable
>>> import samplespace.pyyaml_support
>>> samplespace.pyyaml_support.enable_yaml_support()
>>>
>>> at = AliasTable.from_weights([0.1, 0.3, 0.5, 0.2, 0.6, 0.7])
>>> at.alias
[4, 5, 0, 5, 2, 4]
>>> at.probability
[0.25, 0.7499999999999998, 1.0, 0.5, 0.7499999999999991, 0.9999999999999996]
>>>
>>> at_as_yaml = yaml.dump(at)
>>> print(at_as_yaml)
!samplespace.aliastable
alias:
- 4
- 5
- 0
- 5
- 2
- 4
probability:
- 0.25
- 0.7499999999999998
- 1.0
- 0.5
- 0.7499999999999991
- 0.9999999999999996
>>> at2 = yaml.load(at_as_yaml, Loader=yaml.FullLoader)
>>> assert at2.alias == at.alias
>>> assert at2.probability == at.probability
Random Generator Quality¶
RepeatableRandomSequence
produces high-quality psuedo-random data
regardless of seed or cascade size.
The script tests/generate_random_bytes.py
can be used to generate
files filled with random bytes for testing with external testing programs,
or tests/stream_random_bytes.py [seed]
can be used to stream an
unlimited number of random bytes to stdout.
Randomness Test Results¶
Results from ENT:
$ python stream_random_bytes.py | head -c 8388608 | ent
Entropy = 7.999979 bits per byte.
Optimum compression would reduce the size
of this 8388608 byte file by 0 percent.
Chi square distribution for 8388608 samples is 244.76, and randomly
would exceed this value 66.64 percent of the times.
Arithmetic mean value of data bytes is 127.4945 (127.5 = random).
Monte Carlo value for Pi is 3.141415391 (error 0.01 percent).
Serial correlation coefficient is 0.000034 (totally uncorrelated = 0.0).
Results from Dieharder:
#=============================================================================#
# dieharder version 3.31.1 Copyright 2003 Robert G. Brown #
#=============================================================================#
test_name |ntup| tsamples |psamples| p-value |Assessment
#=============================================================================#
diehard_birthdays| 0| 100| 100|0.46260921| PASSED
diehard_operm5| 0| 1000000| 100|0.30499458| PASSED
diehard_rank_32x32| 0| 40000| 100|0.84557704| PASSED
diehard_rank_6x8| 0| 100000| 100|0.75986634| PASSED
diehard_bitstream| 0| 2097152| 100|0.26336497| PASSED
diehard_opso| 0| 2097152| 100|0.09991336| PASSED
diehard_oqso| 0| 2097152| 100|0.47106073| PASSED
diehard_dna| 0| 2097152| 100|0.09032458| PASSED
diehard_count_1s_str| 0| 256000| 100|0.93473326| PASSED
diehard_count_1s_byt| 0| 256000| 100|0.84493493| PASSED
diehard_parking_lot| 0| 12000| 100|0.33164281| PASSED
diehard_2dsphere| 2| 8000| 100|0.90268828| PASSED
diehard_3dsphere| 3| 4000| 100|0.87557333| PASSED
diehard_squeeze| 0| 100000| 100|0.75415145| PASSED
diehard_sums| 0| 100| 100|0.15694874| PASSED
diehard_runs| 0| 100000| 100|0.19158234| PASSED
diehard_runs| 0| 100000| 100|0.27969502| PASSED
diehard_craps| 0| 200000| 100|0.99326664| PASSED
diehard_craps| 0| 200000| 100|0.37802538| PASSED
marsaglia_tsang_gcd| 0| 40000| 100|0.44158442| PASSED
marsaglia_tsang_gcd| 0| 40000| 100|0.38455344| PASSED
sts_monobit| 1| 100000| 100|0.90249899| PASSED
sts_runs| 2| 100000| 100|0.77696296| PASSED
sts_serial| 1| 100000| 100|0.90249899| PASSED
sts_serial| 2| 100000| 100|0.59050254| PASSED
sts_serial| 3| 100000| 100|0.90871975| PASSED
sts_serial| 3| 100000| 100|0.99336098| PASSED
sts_serial| 4| 100000| 100|0.34637953| PASSED
sts_serial| 4| 100000| 100|0.23240191| PASSED
sts_serial| 5| 100000| 100|0.65698384| PASSED
sts_serial| 5| 100000| 100|0.20441284| PASSED
sts_serial| 6| 100000| 100|0.11455355| PASSED
sts_serial| 6| 100000| 100|0.10947486| PASSED
sts_serial| 7| 100000| 100|0.33236940| PASSED
sts_serial| 7| 100000| 100|0.90065107| PASSED
sts_serial| 8| 100000| 100|0.71691528| PASSED
sts_serial| 8| 100000| 100|0.72658614| PASSED
sts_serial| 9| 100000| 100|0.05947803| PASSED
sts_serial| 9| 100000| 100|0.35266839| PASSED
sts_serial| 10| 100000| 100|0.25645609| PASSED
sts_serial| 10| 100000| 100|0.96328452| PASSED
sts_serial| 11| 100000| 100|0.26614777| PASSED
sts_serial| 11| 100000| 100|0.75690153| PASSED
sts_serial| 12| 100000| 100|0.72460595| PASSED
sts_serial| 12| 100000| 100|0.59228401| PASSED
sts_serial| 13| 100000| 100|0.43108810| PASSED
sts_serial| 13| 100000| 100|0.39814551| PASSED
sts_serial| 14| 100000| 100|0.93222323| PASSED
sts_serial| 14| 100000| 100|0.10847133| PASSED
sts_serial| 15| 100000| 100|0.97030212| PASSED
sts_serial| 15| 100000| 100|0.74232610| PASSED
sts_serial| 16| 100000| 100|0.30293298| PASSED
sts_serial| 16| 100000| 100|0.74289769| PASSED
rgb_bitdist| 1| 100000| 100|0.52609114| PASSED
rgb_bitdist| 2| 100000| 100|0.35992730| PASSED
rgb_bitdist| 3| 100000| 100|0.80654719| PASSED
rgb_bitdist| 4| 100000| 100|0.84084274| PASSED
rgb_bitdist| 5| 100000| 100|0.54266728| PASSED
rgb_bitdist| 6| 100000| 100|0.04443687| PASSED
rgb_bitdist| 7| 100000| 100|0.22723549| PASSED
rgb_minimum_distance| 4| 10000| 1000|0.23472614| PASSED
rgb_permutations| 5| 100000| 100|0.68681017| PASSED
rgb_lagged_sum| 0| 1000000| 100|0.57837513| PASSED
rgb_kstest_test| 0| 10000| 1000|0.52944830| PASSED
Distribution Validity Tests¶
The scripts tests/scripts/calculate_*_metrics.py
can be used to
check that Repeatable Random Sequences produce values with the correct
probability distributions.
Bear in mind that as limited samples are taken from each distribution, some false negatives may occur. If anomalous results are present, try re-running the scripts.