[personal profile] fanf

https://dotat.at/@/2023-06-23-random-double.html

Here are a couple of algorithms for generating uniformly distributed floating point numbers 0.0 <= n < 1.0 using an unbiased random bit generator and IEEE 754 double precision arithmetic. Both of them depend on details of how floating point numbers work, so before getting into the algorithms I'll review IEEE 754.

The first algorithm uses bit hacking and type punning. The second uses a hexadecimal floating point literal. They are both fun and elegant in their own ways, but these days the second one is better.

ieee 754

A floating point number has three parts:

  • a sign bit
  • an exponent
  • a significand aka mantissa aka fraction

[figure: bit layout of IEEE 754 double precision]

It works like a binary version of decimal scientific notation, so its value is

    +/- 1.fff...fff * 2ᴱ

exponent

Instead of using two's complement, the exponent is stored as a biased integer,

    E = eeeeeeeeeee - 1023

The maximum value of the 11 exponent bits is 2047, and the bias is half that (rounded down), so 2⁰ is represented as 1023 in the exponent bits.

(A consequence of using biased integers instead of two's complement is that IEEE 754 floating point numbers sort correctly if you treat them as sign-magnitude integers.)
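
It's easy to convince yourself of this for non-negative numbers with a few lines of C, reinterpreting each double's bits as an unsigned integer and comparing those instead. This is just a sketch (negative numbers need their bits twiddled to undo the sign-magnitude representation, which I'm skipping), and double_bits() is a throwaway helper using the same kind of type punning loophole that appears in the bithacking code below.

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* reinterpret the bits of a double as an unsigned integer */
    static uint64_t double_bits(double d) {
        uint64_t u;
        memcpy(&u, &d, sizeof(u));
        return(u);
    }

    int main(void) {
        /* integer comparisons agree with floating point comparisons */
        assert(double_bits(0.5) < double_bits(0.75));
        assert(double_bits(1.0) < double_bits(2.0));
        assert(double_bits(1e-300) < double_bits(1e300));
        return(0);
    }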

fraction

In decimal scientific notation, a number is normalized by adjusting its exponent so its leftmost non-zero digit (most significant digit) is immediately to the left of the decimal point.

Similarly, a floating point number has its leftmost non-zero bit immediately to the left of its binary point. Because a non-zero bit is always 1, there is no need to store it explicitly. So double-precision numbers actually have 53 significant bits, of which 52 appear in their representation.
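
To make that concrete, here's a quick sketch that takes a normal double apart into its three fields and puts the value back together by hand, showing the exponent bias and the implicit leading 1 in action. (It uses the same memcpy-style punning as the bithacking code below.)

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        double d = 0.15625;    /* 1.01 in binary, times 2^-3 */
        uint64_t u;
        memcpy(&u, &d, sizeof(u));

        uint64_t sign = u >> 63;
        int64_t E = (int64_t)((u >> 52) & 0x7FF) - 1023;   /* remove the bias */
        uint64_t frac = u & ((UINT64_C(1) << 52) - 1);

        /* value = (-1)^sign * (1 + frac / 2^52) * 2^E for normal numbers */
        double value = (sign ? -1.0 : 1.0) *
            ldexp(1.0 + ldexp((double)frac, -52), (int)E);

        printf("sign=%d E=%d frac=%013llx value=%g\n",
               (int)sign, (int)E, (unsigned long long)frac, value);
        /* prints: sign=0 E=-3 frac=4000000000000 value=0.15625 */
        return(0);
    }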

the problem

To get a uniform distribution of random floating point numbers 0.0 <= n < 1.0:

  • half of the random results must be 0.5 <= n < 1.0, with E = -1;
  • a quarter of the results must be 0.25 <= n < 0.5, with E = -2;
  • an eighth must be 0.125 <= n < 0.25, with E = -3;
  • and so on.

bithacking solution

Observe that floating point numbers 1.0 <= n < 2.0 cover an interval the same size as the one we want, but all the numbers have the same exponent E = 0.

So the idea is to use bitwise integer operations to put together a number with,

  • sign bit = 0
  • eeeeeeeeeee = 1023 = 0x3FF so that E = 0
  • fff...fff filled with unbiased random bits

Then the bits of the integer are reinterpreted as the bits of a floating point number, which lies between 1.0 and 2.0; subtracting 1.0 gives the result.

However, C's strict aliasing rules make it difficult to do this kind of reinterpretation. For example, it is often done using a union, but the C standard says that is undefined behaviour. But there is a loophole in the rules.

    double pcg64_bithack_double(pcg64_t *rng) {
        double d;
        uint64_t u = pcg64_random(rng);
        /* 52 bits of randomness in the little end */
        u = u >> 12;
        /* set exponent to 0 + bias = 1023 */
        u = u | ((uint64_t)1023 << 52);
        /* reinterpret as floating point */
        memmove(&d, &u, sizeof(d));
        return(d - 1.0);
    }

Modern compilers will inline the memmove() so the function ends up being compiled correctly and efficiently.

In practice, using a union is OK because too much code would break if compilers were as strict as they could be, and a union will be more efficient with older compilers.
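
For what it's worth, the union version might look something like this (the function name is made up for this sketch):

    double pcg64_bithack_double_union(pcg64_t *rng) {
        union { uint64_t u; double d; } pun;
        /* 52 bits of randomness in the little end */
        pun.u = pcg64_random(rng) >> 12;
        /* set exponent to 0 + bias = 1023 */
        pun.u |= (uint64_t)1023 << 52;
        /* read the other member of the union to reinterpret the bits */
        return(pun.d - 1.0);
    }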

hexadecimal solution

C99 added support for hexadecimal floating point literals, which look like

    0xHHHH.HHHHp+dddd

Slightly weirdly, while the fraction is written in hex, the exponent (p for power of 2) is written in decimal.

Hex floating literals are useful when it's important to get exact control over the bit pattern in the number, which is much harder if you have to go via decimal.
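
For example, here is a tiny sketch with a few hex floating literals and their decimal values; printf()'s %a conversion prints a double back out in the same notation, which is handy for inspecting bit patterns.

    #include <stdio.h>

    int main(void) {
        printf("%g\n", 0x1.8p+1);    /* 1.5 * 2^1 = 3 */
        printf("%g\n", 0x1.0p-3);    /* 1.0 * 2^-3 = 0.125 */
        printf("%a\n", 0.1);         /* 0x1.999999999999ap-4, showing that
                                        0.1 is not exactly representable */
        return(0);
    }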

The idea is to convert a random N-bit integer to floating point,

    /* 53 bits of randomness in the little end */
    uint64_t u = pcg64_random(rng) >> 11;

then divide it by the maximum range of the integer, 2ᴺ, so the result is between 0.0 and 1.0. Or get the same effect by multiplying by 2⁻ᴺ.

    /* convert and scale to correct range */
    double d = (double)u * 0x1.0p-53;

Note that this solution returns 53 bits of randomness because it is able to make use of the implicit most-significant bit, whereas the bithack solution only has 52 bits of randomness.

As well as using a hex floating literal, this solution assumes that double precision multiplication is fast. Which is true, because it has been the subject of important benchmarks for decades.

Putting it together gives us this beautifully lucid one-liner.

    double pcg64_random_double(pcg64_t *rng) {
        return((double)(pcg64_random(rng) >> 11) * 0x1.0p-53);
    }
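
As a quick sanity check of the distribution requirement above, something like this sketch buckets a million outputs by binary exponent using frexp() and should show each count roughly halving; it assumes a pcg64_t that has been initialized somewhere else, and check_exponents() is just a name for the sketch.

    #include <math.h>
    #include <stdio.h>

    void check_exponents(pcg64_t *rng) {
        long counts[8] = { 0 };
        for (long i = 0; i < 1000000; i++) {
            double d = pcg64_random_double(rng);
            int e;
            frexp(d, &e);   /* for d > 0, d is in the range 2^(e-1) <= d < 2^e */
            if (d > 0.0 && e <= 0 && e >= -7)
                counts[-e]++;
        }
        for (int k = 0; k < 8; k++)
            printf("%g <= n < %g: %ld\n",
                   ldexp(1.0, -k - 1), ldexp(1.0, -k), counts[k]);
    }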

Date: 2023-06-24 09:35 (UTC)
From: [personal profile] simont
“generate FP bit patterns with the same probability distribution as you'd get from rounding genuinely random real numbers to double”

Ugh, though, thinking about this, that probability distribution is actually pretty weird, because of the default round-to-nearest semantics! The value 1.0 is a viable output because things can round up to it; weird things happen at exponent boundaries, where half as large a range of reals can round up to each smaller power of 2 as can round down to it; denormals complicate things even further. If I've thought it through correctly, we can get 1.0 with probability 2⁻⁵⁴; each float strictly between 0.5 and 1.0 with probability 2⁻⁵³; 0.5 itself with probability 3 × 2⁻⁵⁵, then the next set of nonzero ones with prob 2⁻⁵⁴, and so on, so that each power of 2 occurs with a probability that's the arithmetic mean of the probabilities of the numbers next to it. And finally denormals are treated as an extension of the lowest exponent range, so that there's no glitch anywhere near the smallest normalised number, but 0.0 itself can occur with exactly half the probability of the denormal and exponent-1 values just above it.

(At least for this purpose we can ignore the round-to-even tie-breaking case, because it occurs with probability literally zero.)

For all that this is the Platonically-right interpretation of the normal IEEE 754 semantics, it's clearly too horrible to use! Although spigot can implement it already if you pipe random input into it, along the lines of

    perl -e 'print "0."; printf "%02x", unpack "C", $a while read STDIN, $a, 1' < /dev/urandom | spigot -D -d0 --rn base16stdin
