The implementation is needlessly obfuscated. It's not immediately clear that
MODMULT is a simple modular multiplication, despite its name. Specifically it's
not clear which of the parameters is the second factor.
Furthermore the stated period is off-by-one: A value of `0` is part of its own
chain, thus it may not be included in the period of the underlying generators.
As all the input bits and pieces are mixed with SHA-1, cross-architecture
compatibility is not required and we can just mix in whatever they may look
like in memory, instead of going through the `write_*()` helpers that were
created for a previous in-development version that first filled a buffer that
was then hashed (allowing for easy inspection of the input data, but making it
harder to safely add values without checking for buffer overflows all the
time).
This change should also fix a build error on macOS ZTS: The thread ID is an
opaque type and not guaranteed to be arithmetic as per IEEE Std 1003.1-2017.
And indeed macOS defines it as a pointer to a structure, failing due to the
implicit pointer to integer conversion.
Now that the CombinedLCG is no longer used within GENERATE_SEED(), we can
safely use the CSPRNG with a php_random_generate_fallback_seed() fallback to
seed the CombinedLCG.
* random: Make Mt19937's `mode` field an enum
* random: Reorder the `php_random_status_state_mt19937` struct
Empirical testing did not show any differences in performance, but it makes
sense to me to put the `count` field (which is accessed for every invocation of
Mt19937) at the beginning of the struct, keeping it near the values from the
state array that are returned first, resulting in only a single cache line load
if only a small amount of numbers are requested.
It naturally follows to also put the `mode` field there and move the
humongous state array to the end.
* random: Remove the `MT_N` constant
`MT_N` is an awfully generic name that bleeds into every file including
`php_random.h`. As it's an implementation detail, remove it entirely to keep
`php_random.h` clean.
To prevent the state struct from diverging from the implementation, the size of
the state vector is statically verified. Furthermore there are phpt tests
verifying the Mt19937 output across a reload, revealing when the state vector
is reloaded too early or too late.
These are always dynamically allocated in GINIT, thus always take up memory. By
embedding them here we can avoid the dynamic allocation and additional pointer
indirection accessing them.
The test script:
<?php
for ($i = 0; $i < 9999999; $i++) mt_rand(1, 100);
Appears to run slightly faster with this change applied: Before this change it
always ran in just over 3 seconds, after this change I was also seeing times
below 3 seconds. Howver results are too close and too jittery to state this
performance improvement as a fact.
PHP 8.1 and below interpreted unknown modes as `MT_RAND_MT19937`, but PHP 8.2+
interprets them as `MT_RAND_PHP`.
Align the behavior with PHP 8.1 and below, because folks should be steered
towards the standard mode.
* random: Expose xoshiro256**'s seeding functions
* random: Expose pcgoneseq128xslrr64's seeding functions
* random: Expose Mt19937's seeding functions
* random: Expose CombinedLCG's seeding functions
* random: Call php_random_mt19937_seed32 to seed the global Mt19937
This avoids the function pointer indirection and improves type safety.
* random: NULL the generic seeding function
Different engines work quite differently, it is not useful to attempt to seed
them in a generic way using a 64 bit integer. As an example Mt19937 completely
ignores the upper 32 bits.
* random: Remove the `seed` member from `php_random_algo`
See the explanation in the previous commit for the reasoning. This member is
unused since the previous commit and was not consistently available even before
that (specifically for the Secure engine).
* UPGRADING.INTERNALS
* random: Remove useless cast in `php_mt_srand()`
* random: Remove `php_random_status`
Since 162e1dce98, the `php_random_status` struct
contains just a single `void*`, resulting in needless indirection when
accessing the engine state and thus decreasing readability because of the
additional non-meaningful `->state` references / the local helper variables.
There is also a small, but measurable performance benefit:
<?php
$e = new Random\Engine\Xoshiro256StarStar(0);
$r = new Random\Randomizer($e);
for ($i = 0; $i < 15; $i++)
var_dump(strlen($r->getBytes(100000000)));
goes from roughly 3.85s down to 3.60s.
The names of the `status` variables have not yet been touched to keep the diff
small. They will be renamed to the more appropriate `state` in a follow-up
cleanup commit.
* Introduce `php_random_algo_with_state`
This allows consumers of just the CSPRNG to include a much smaller header. It
also allows to verify at a glance whether a source file might use non-secure
randomness.
This commit includes the new header wherever the CSPRNG is used, possibly
replacing the inclusion of php_random.h if nothing else is used, but also
includes it in the main php_random.h header for compatibility.
Somewhat related to 45f8cfaf10,
2b30f18708, and
b14dd85dca.
As the `__construct()` implementation is engine-specific anyway, we know what
engine were dealing with and can just call the seeding function directly
instead of going through a function pointer.
This likely improves construction performance a little, but I did not measure.
These internal-only functions accepted a `php_random_status`, just to extract
the internal state from the `state` pointer. Make them take the state struct
directly to improve type safety.
The implementation of `php_random_uint128_*` exists specifically for
pcgoneseq128xslrr66 and takes up a third of php_random.h. Split it into its own
header to keep php_random.h focused on the functionality directly related to
randomness.
Instead of returning the generated `uint64_t` and providing the size (i.e. the
number of bytes of the generated value) out-of-band via the
`last_generated_size` member of the `php_random_status` struct, the `generate`
function is now expected to return a new `php_random_result` struct containing
both the `size` and the `result`.
This has two benefits, one for the developer:
It's no longer possible to forget setting `last_generated_size` to the correct
value, because it now happens at the time of returning from the function.
and the other benefit is for performance:
The `php_random_result` struct will be returned as a register pair, thus the
`size` will be directly available without reloading it from main memory.
Checking a simplified version of `php_random_range64()` on Compiler Explorer
(“Godbolt”) with clang 17 shows a single change in the resulting assembly
showcasing the improvement (https://godbolt.org/z/G4WjdYxqx):
- add rbp, qword ptr [r14]
+ add rbp, rdx
Empirical testing confirms a measurable performance increase for the
`Randomizer::getBytes()` method:
<?php
$e = new Random\Engine\Xoshiro256StarStar(0);
$r = new Random\Randomizer($e);
var_dump(strlen($r->getBytes(100000000)));
goes from 250ms (before the change) to 220ms (after the change). While
generating 100 MB of random data certainly is not the most common use case, it
confirms the theoretical improvement in practice.
The reference implementation of the "Drawing Random Floating-Point Numbers from
an Interval" paper contains two mistakes that will result in erroneous values
being returned under certain circumstances:
- For large values of `g` the multiplication of `k * g` might overflow to
infinity.
- The value of `ceilint()` might exceed 2^53, possibly leading to a rounding
error when promoting `k` to double within the multiplication of `k * g`.
This commit updates the implementation based on Prof. Goualard suggestions
after reaching out to him. It will correctly handle inputs larger than 2^-1020
in absolute values. This limitation will be documented and those inputs
possibly be rejected in a follow-up commit depending on performance concerns.
This macro is no longer used within php-src since
60ace13f9c, it invokes undefined behavior
depending on the input and the corresponding MT_RAND_PHP mode was deprecated in
PHP 8.3.
Thus remove this macro. Any remaining non-php-src user should just inline it
into their code, but should ideally migrate to a non-biased scaler. In any case
the undefined behavior of the original implementation should be accounted for.
`PHPAPI` is defined in `php.h`. It appears that without the explicit include,
recent versions of Visual Studio Code’s intellisense (rightfully) no longer
detect `PHPAPI`. This will then lead to a misparsing of the file, because
`PHPAPI` is assumed to be the return type and the actual return type is assumed
to be the function name, thus expecting a semicolon after the actual return
type. This in turn colors the entire header in red due to the detected syntax
error(s), making development very hard / impossible.
This did not cause issues outside of the IDE use case, because apparently all
users of `php_random.h` include `php.h` before including `php_random.h`.
The CSPRNG is a delicate and security relevant piece of code and having it in
the giant random.c makes it much harder to verify changes to it. Split it into
a separate file.
* random: Add `max_offset` local to Randomizer::getBytesFromString()
* random: Use branchless implementation for mask generation in Randomizer::getBytesFromString()
This was benchmarked against clzl with a standalone script with random inputs
and is slightly faster. clzl requires an additional branch to handle the
source_length = 1 / max_offset = 0 case.
* Improve comment for masking in Randomizer::getBytesFromString()
With a single byte we can choose offsets between 0x00 and 0xff, thus 0x100
different offsets. We only need to use the slow path for sources of more than
0x100 bytes.
The previous version was correct with regard to the output expectations, it was
just slower than necessary. Better fix this now while we still can before being
bound by our BC guarantees with regard to emitted sequences.
This also adds a test to verify the behavior: For powers of two we never reject
any values during rejection sampling, we just need to mask off the unneeded
bits. Thus we can specifically verify that the number of calls to the engine
match the expected amount. We also verify that all the possible values are
emitted to make sure the masking does not remove any required bits. For inputs
longer than 0x100 bytes we need trust the `range()` implementation to be
unbiased, but still verify the number of engine calls and perform a basic
output check.