fanf: (Default)
[personal profile] fanf

https://dotat.at/@/2023-06-23-random-double.html

Here are a couple of algorithms for generating uniformly distributed floating point numbers 0.0 <= n < 1.0 using an unbiased random bit generator and IEEE 754 double precision arithmetic. Both of them depend on details of how floating point numbers work, so before getting into the algorithms I'll review IEEE 754.

The first algorithm uses bit hacking and type punning. The second uses a hexadecimal floating point literal. They are both fun and elegant in their own ways, but these days the second one is better.

ieee 754

A floating point number has three parts:

  • a sign bit
  • an exponent
  • a significand aka mantissa aka fraction

bit layout of IEEE 754 double precision

It works like a binary version of decimal scientific notation, so its value is

    +/- 1.fff...fff * 2ᴱ

exponent

Instead of using two's complement, the exponent is stored as a biased integer,

    E = eeeeeeeeeee - 1023

The maximum value of the exponent bits is 2047, and the bias is half that, so 2⁰ is represented as 1023 in the exponent bits.

(A consequence of using biased integers instead of two's complement is that IEEE 754 floating point numbers sort correctly if you treat them as sign-magnitude integers.)

fraction

In decimal scientific notation, a number is normalized by adjusting its exponent so its leftmost non-zero digit (most significant digit) is immediately to the left of the decimal point.

Similarly, a floating point number has its leftmost non-zero bit immediately to the left of its binary point. Because a non-zero bit is always 1, there is no need to store it explicitly. So double-precision numbers actually have 53 significant bits, of which 52 appear in their representation.

the problem

To get a uniform distribution of random floating point numbers 0.0 <= n < 1.0:

  • half of the random results must be 0.5 <= n < 1.0, with E = -1;
  • a quarter of the results must be 0.25 <= n < 0.5, with E = -2;
  • an eighth must be 0.125 <= n < 0.25, with E = -3;
  • etc. usw.

bithacking solution

Observe that floating point numbers 1.0 <= n < 2.0 span the same range of values as we want, but all the numbers have the same exponent E = 0.

So the idea is to use bitwise integer operations to put together a number with,

  • sign bit = 0
  • eeeeeeeeeee = 1023 = 0x3FF so that E = 0
  • fff...fff filled with unbiased random bits

Then the bits of the integer are reinterpreted as the bits of a floating point number, subtract 1.0, and that's the result.

However, C's strict aliasing rules make it difficult to do this kind of reinterpretation. For example, it is often done using a union, but the C standard says that is undefined behaviour. But there is a loophole in the rules.

    double pcg64_bithack_double(pcg64_t *rng) {
        double d;
        uint64_t u = pcg64_random(rng);
        /* 52 bits of randomness in the little end */
        u = u >> 12;
        /* set exponent to 0 + bias = 1023 */
        u = u | (1023 << 52);
        /* reinterpret as floating point */
        memmove(&d, &u, sizeof(d));
        return(d - 1.0);
    }

Modern compilers will inline the memmove() so the function ends up being compiled correctly and efficiently.

In practice, using a union is OK because too much code would break if compilers were as strict as they could be, and a union will be more efficient with older compilers.

hexadecimal solution

C99 added support for hexadecimal floating point literals, which look like

    0xHHHH.HHHHp+dddd

Slightly weirdly, while the fraction is written in hex, the exponent (p for power of 2) is written in decimal.

Hex floating literals are useful when it's important to get exact control over the bit pattern in the number, which is much harder if you have to go via decimal.

The idea is to convert a random N-bit integer to floating point,

    /* 53 bits of randomness in the little end */
    uint64_t u = pcg64_random(rng) >> 11;

then divide it by the maximum range of the integer, 2N, so the result is between 0.0 and 1.0. Or get the same effect by multiplying by 2-N.

    /* convert and scale to correct range */
    double d = (double)u * 0x1.0p-53;

Note that this solution returns 53 bits of randomness because it is able to make use of the implicit most-significant bit, whereas the bithack solution only has 52 bits of randomness.

As well as using a hex floating literal, this solution assumes that double precision multiplication is fast. Which is true, because it has been the subject of important benchmarks for decades.

Putting it together gives us this beautifully lucid one-liner.

    double pcg64_random_double(pcg64_t *rng) {
        return((double)(pcg64_random(rng) >> 11) * 0x1.0p-53);
    }

June 2025

S M T W T F S
1234567
891011121314
15161718192021
22232425262728
2930     

Most Popular Tags

Page Summary

Style Credit

Expand Cut Tags

No cut tags
Page generated 2025-06-09 08:55
Powered by Dreamwidth Studios