![[personal profile]](https://www.dreamwidth.org/img/silk/identity/user.png)
https://dotat.at/@/2023-06-23-random-double.html
Here are a couple of algorithms for generating uniformly distributed
floating point numbers 0.0
<=
n <
1.0
using an unbiased
random bit generator and IEEE 754 double precision arithmetic. Both of
them depend on details of how floating point numbers work, so before
getting into the algorithms I'll review IEEE 754.
The first algorithm uses bit hacking and type punning. The second uses a hexadecimal floating point literal. They are both fun and elegant in their own ways, but these days the second one is better.
ieee 754
A floating point number has three parts:
- a sign bit
- an exponent
- a significand aka mantissa aka fraction
It works like a binary version of decimal scientific notation, so its value is
+/- 1.fff...fff * 2ᴱ
exponent
Instead of using two's complement, the exponent is stored as a biased integer,
E = eeeeeeeeeee - 1023
The maximum value of the exponent bits is 2047, and the bias is half
that, so 2⁰
is represented as 1023 in the exponent bits.
(A consequence of using biased integers instead of two's complement is that IEEE 754 floating point numbers sort correctly if you treat them as sign-magnitude integers.)
fraction
In decimal scientific notation, a number is normalized by adjusting its exponent so its leftmost non-zero digit (most significant digit) is immediately to the left of the decimal point.
Similarly, a floating point number has its leftmost non-zero bit immediately to the left of its binary point. Because a non-zero bit is always 1, there is no need to store it explicitly. So double-precision numbers actually have 53 significant bits, of which 52 appear in their representation.
the problem
To get a uniform distribution of random
floating point numbers 0.0
<=
n <
1.0
:
- half of the random results must be
0.5
<=
n<
1.0
, withE = -1
; - a quarter of the results must be
0.25
<=
n<
0.5
, withE = -2
; - an eighth must be
0.125
<=
n<
0.25
, withE = -3
; - etc. usw.
bithacking solution
Observe that floating point numbers 1.0
<=
n <
2.0
span the same range of values as we want,
but all the numbers have the same exponent E = 0
.
So the idea is to use bitwise integer operations to put together a number with,
- sign bit = 0
eeeeeeeeeee = 1023 = 0x3FF
so thatE = 0
fff...fff
filled with unbiased random bits
Then the bits of the integer are reinterpreted as the bits of a floating point number, subtract 1.0, and that's the result.
However, C's strict aliasing rules make it difficult to do this kind
of reinterpretation. For example, it is often done using a union
,
but the C standard says that is undefined behaviour. But there is a
loophole in the rules.
double pcg64_bithack_double(pcg64_t *rng) {
double d;
uint64_t u = pcg64_random(rng);
/* 52 bits of randomness in the little end */
u = u >> 12;
/* set exponent to 0 + bias = 1023 */
u = u | (1023 << 52);
/* reinterpret as floating point */
memmove(&d, &u, sizeof(d));
return(d - 1.0);
}
Modern compilers will inline the memmove()
so the function ends up
being compiled correctly and efficiently.
In practice, using a union
is OK because too much code would break
if compilers were as strict as they could be, and a union
will be
more efficient with older compilers.
hexadecimal solution
C99 added support for hexadecimal floating point literals, which look like
0xHHHH.HHHHp+dddd
Slightly weirdly, while the fraction is written in hex, the exponent
(p
for power of 2) is written in decimal.
Hex floating literals are useful when it's important to get exact control over the bit pattern in the number, which is much harder if you have to go via decimal.
The idea is to convert a random N-bit integer to floating point,
/* 53 bits of randomness in the little end */
uint64_t u = pcg64_random(rng) >> 11;
then divide it by the maximum range of the integer, 2N, so the result is between 0.0 and 1.0. Or get the same effect by multiplying by 2-N.
/* convert and scale to correct range */
double d = (double)u * 0x1.0p-53;
Note that this solution returns 53 bits of randomness because it is able to make use of the implicit most-significant bit, whereas the bithack solution only has 52 bits of randomness.
As well as using a hex floating literal, this solution assumes that double precision multiplication is fast. Which is true, because it has been the subject of important benchmarks for decades.
Putting it together gives us this beautifully lucid one-liner.
double pcg64_random_double(pcg64_t *rng) {
return((double)(pcg64_random(rng) >> 11) * 0x1.0p-53);
}
no subject
Date: 2023-06-24 09:35 (UTC)Ugh, though, thinking about this, that probability distribution is actually pretty weird, because of the default round-to-nearest semantics! The value 1.0 is a viable output because things can round up to it; weird things happen at exponent boundaries, where half as large a range of reals can round up to each smaller power of 2 as can round down to it; denormals complicate things even further. If I've thought it through correctly, we can get 1.0 with probability 2−54; each float strictly between 0.5 and 1.0 with probability 2−53; 0.5 itself with probability 3 × 2−55, then the next set of nonzero ones with prob 2−54, and so on, so that each power of 2 occurs with a probability that's the arithmetic mean of the probabilities of the numbers next to it. And finally denormals are treated as an extension of the lowest exponent range, so that there's no glitch anywhere near the smallest normalised number, but 0.0 itself can occur with exactly half the probability of the denormal and exponent-1 values just above it.
(At least for this purpose we can ignore the round-to-even tie-breaking case, because it occurs with probability literally zero.)
For all that this is the Platonically-right interpretation of the normal IEEE 754 semantics, it's clearly too horrible to use! Although
spigot
can implement it already if you pipe random input into it, along the lines ofno subject
Date: 2023-06-24 12:26 (UTC)Great observations, thanks :-) In fact a few others made similar comments so I learned some more about random floating point numbers and posted a note https://dotat.at/@/2023-06-23-random-more.html which I have not yet cross-posted here.