Both macros are supposed to be defined in limits.h (C99), so providing
fallback definitions is superfluous. Even worse, these fallback definitions
are confusing, because they did not cater to the LP64, ILP64 and SILP64 data
models (and perhaps some rarer ones) but simply assumed ILP32.
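The two macros are not named here. Purely as a hypothetical illustration, assume one of them is `LONG_MAX`. A hardcoded ILP32 value only agrees with `<limits.h>` where `long` is 32 bits wide (ILP32, and also LLP64). On LP64, ILP64 and SILP64 systems `long` is 64 bits, so the fallback would be wrong. `HYPOTHETICAL_ILP32_LONG_MAX` is invented for comparison, and the sketch assumes two's complement without padding bits.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical fallback in the ILP32 style: it hardcodes a 32-bit long. */
#define HYPOTHETICAL_ILP32_LONG_MAX 0x7fffffffL

/* Largest value of a two's complement signed type that is `bits` wide,
   computed in unsigned arithmetic so it cannot overflow. */
static unsigned long long signed_max_for_width(unsigned bits)
{
    return (1ULL << (bits - 1)) - 1;
}

/* True only where long really is 32 bits; false on LP64, ILP64, SILP64. */
static bool ilp32_fallback_is_correct(void)
{
    return (unsigned long long)LONG_MAX ==
           (unsigned long long)HYPOTHETICAL_ILP32_LONG_MAX;
}
```

The value from `<limits.h>` always matches the actual width of `long`. A fixed fallback cannot, which is why relying on the header is both simpler and correct.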
* Added a squaring function to improve the performance of power calculation
* Aligned backslashes
* Removed unnecessary comments
* Extracted common part of multiplication and square functions
* Added comment to bc_fast_square
* Improved wording of bc_mul_finish_from_vector
* Reused new function name
* Replaced macro with function
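The following sketch shows why a dedicated squaring routine can be faster than general multiplication. It assumes products are accumulated in a vector of `uint32_t` limbs that each hold 4 decimal digits, least significant limb first. The limb width and both function names are hypothetical; this is not bcmath's `bc_fast_square`. In a square, every cross product a[i]*a[j] with i < j appears twice, so it can be computed once and doubled. That needs n(n+1)/2 multiplications instead of n * n. Both routines leave the same unnormalized vector, so the carry propagation and the conversion back to digits can be shared between them.

```c
#include <stddef.h>
#include <stdint.h>

/* Schoolbook multiplication into n1 + n2 - 1 unnormalized slots;
   partial product a[i] * b[j] is accumulated into slot i + j. */
static void mul_vectors(const uint32_t *a, size_t n1,
                        const uint32_t *b, size_t n2, uint64_t *prod)
{
    for (size_t k = 0; k < n1 + n2 - 1; k++) {
        prod[k] = 0;
    }
    for (size_t i = 0; i < n1; i++) {
        for (size_t j = 0; j < n2; j++) {
            prod[i + j] += (uint64_t)a[i] * b[j];
        }
    }
}

/* Squaring: each diagonal term once, each cross term computed once
   and doubled. Produces exactly the same vector as mul_vectors(a, a). */
static void square_vector(const uint32_t *a, size_t n, uint64_t *prod)
{
    for (size_t k = 0; k < 2 * n - 1; k++) {
        prod[k] = 0;
    }
    for (size_t i = 0; i < n; i++) {
        prod[2 * i] += (uint64_t)a[i] * a[i];
        for (size_t j = i + 1; j < n; j++) {
            prod[i + j] += 2 * ((uint64_t)a[i] * a[j]);
        }
    }
}
```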
Fixed the incorrect scale used when dividing by 1, that is, when comparing
the divisor with 1 to check for equality.
Additionally, increased the number of test cases in bcdiv_by_pow_10.phpt.
In the original implementation, the scale of the bc_num was modified
directly before the comparison.
This becomes a problem once objects are supported, so the comparison is
changed to work without modifying the bc_num.
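The following is a simplified sketch of comparing two decimals up to a given scale without modifying either operand. `dec_t` is a hypothetical stand-in for `bc_num`: values are non-negative, the integer part has no leading zeros, and each byte holds one digit value. Missing fractional digits are treated as zeros, instead of padding or truncating the operands in place.

```c
#include <stddef.h>

/* Hypothetical decimal: int_len integer digits followed by frac_len
   fractional digits, most significant first, each a value 0-9. */
typedef struct {
    const unsigned char *digits;
    size_t int_len;
    size_t frac_len;
} dec_t;

/* Compare a and b using only the first `scale` fractional digits.
   Neither operand is modified. Returns -1, 0 or 1. */
static int dec_compare(const dec_t *a, const dec_t *b, size_t scale)
{
    if (a->int_len != b->int_len) {
        return a->int_len > b->int_len ? 1 : -1;
    }
    for (size_t i = 0; i < a->int_len; i++) {
        if (a->digits[i] != b->digits[i]) {
            return a->digits[i] > b->digits[i] ? 1 : -1;
        }
    }
    for (size_t i = 0; i < scale; i++) {
        /* A digit beyond an operand's own scale counts as 0. */
        unsigned da = i < a->frac_len ? a->digits[a->int_len + i] : 0;
        unsigned db = i < b->frac_len ? b->digits[b->int_len + i] : 0;
        if (da != db) {
            return da > db ? 1 : -1;
        }
    }
    return 0;
}
```

With this approach, 1.0001 compares equal to 1 at scale 2 and greater at scale 4, while the operand's own `frac_len` is left untouched.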
The original calculation of prod_arr_size was only approximate and could
overestimate the size, which could increase the number of plain loop
iterations (without byte tricks) at the end of the calculation, when the
result is converted back to a bc_num.
The new method calculates the size exactly, so the number of loop iterations
no longer increases unnecessarily.
Multiplication is performed after converting the digits to uint32_t/uint64_t,
which makes the calculation faster.
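The following sketches chunked multiplication with an exactly sized product vector. It assumes 4 decimal digits per `uint32_t` limb with `uint64_t` accumulators. The chunk width and all names are assumptions, and the real code may use a different width. With n1 and n2 limbs, partial product a[i]*b[j] lands in slot i + j. The largest index is (n1 - 1) + (n2 - 1), so exactly n1 + n2 - 1 slots are needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed chunk width for this sketch: 4 decimal digits per limb. */
#define CHUNK_DIGITS 4

/* ceil(len / CHUNK_DIGITS): limbs needed for a len-digit operand. */
static size_t limb_count(size_t len)
{
    return (len + CHUNK_DIGITS - 1) / CHUNK_DIGITS;
}

/* Split a digit string (most significant first) into limbs,
   least significant limb first. */
static void digits_to_limbs(const char *s, size_t len, uint32_t *limbs)
{
    size_t n = limb_count(len);
    for (size_t k = 0; k < n; k++) {
        size_t end = len - k * CHUNK_DIGITS; /* exclusive end of chunk */
        size_t start = end > CHUNK_DIGITS ? end - CHUNK_DIGITS : 0;
        uint32_t v = 0;
        for (size_t i = start; i < end; i++) {
            v = v * 10 + (uint32_t)(s[i] - '0');
        }
        limbs[k] = v;
    }
}

/* Exact number of unnormalized product slots. */
static size_t prod_slots(size_t n1, size_t n2)
{
    return n1 + n2 - 1;
}

/* Multiply limb vectors; prod must hold prod_slots(n1, n2) entries. */
static void mul_limbs(const uint32_t *a, size_t n1,
                      const uint32_t *b, size_t n2, uint64_t *prod)
{
    for (size_t k = 0; k < prod_slots(n1, n2); k++) {
        prod[k] = 0;
    }
    for (size_t i = 0; i < n1; i++) {
        for (size_t j = 0; j < n2; j++) {
            prod[i + j] += (uint64_t)a[i] * b[j];
        }
    }
}
```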
---------
Co-authored-by: Niels Dossche <7771979+nielsdos@users.noreply.github.com>
Co-authored-by: Gina Peter Banyard <girgias@php.net>
The code for _bc_do_add and _bc_do_sub was written slightly differently even
though both perform similar processing (and add was slower than sub), so I
changed add to follow the same structure as sub.
Also, _bc_do_add now uses SIMD to speed up the calculation when possible.
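The following is a portable SWAR (SIMD within a register) sketch of adding eight decimal digits at once. It illustrates the technique and is not the actual code in `_bc_do_add`; all names are hypothetical. Each byte holds one digit, with the least significant digit in the lowest byte. Adding a bias of 0xF6 (256 - 10) to every byte makes a byte overflow into the next one exactly when its digit sum is 10 or more, so the ordinary 64-bit addition propagates the decimal carries. Bytes that did not overflow still contain the bias and have their high bit set, so the bias is subtracted from them afterwards. The pack/unpack loops only keep the sketch endian-neutral; an optimized version would load and byte-swap the bytes directly.

```c
#include <stdint.h>

/* Replicate one byte value into all 8 bytes of a uint64_t. */
#define REPEAT8(x) ((uint64_t)(x) * 0x0101010101010101ULL)

/* Pack 8 digits (most significant first) so the least significant digit
   ends up in the lowest byte. */
static uint64_t pack_digits(const uint8_t d[8])
{
    uint64_t w = 0;
    for (int i = 0; i < 8; i++) {
        w = (w << 8) | d[i];
    }
    return w;
}

static void unpack_digits(uint64_t w, uint8_t d[8])
{
    for (int i = 7; i >= 0; i--) {
        d[i] = (uint8_t)(w & 0xFF);
        w >>= 8;
    }
}

/* Add two 8-digit blocks plus carry_in (0 or 1); returns the carry out. */
static int add8_swar(const uint8_t a[8], const uint8_t b[8],
                     int carry_in, uint8_t out[8])
{
    /* Per byte, a + b + 0xF6 (+ incoming carry) is 246..265, so a byte
       overflows exactly when its digit sum is >= 10. */
    uint64_t sum = pack_digits(a) + REPEAT8(0xF6) + pack_digits(b)
                 + (uint64_t)carry_in;

    /* Non-overflowing bytes are >= 0xF6 (high bit set); overflowing bytes
       already hold the correct digit 0..9 (high bit clear). */
    uint64_t no_carry = (sum & REPEAT8(0x80)) >> 7;
    int carry_out = (int)(1 - (no_carry >> 56));

    /* Remove the bias from non-overflowing bytes; this never borrows. */
    sum -= no_carry * 0xF6;

    unpack_digits(sum, out);
    return carry_out;
}
```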
When converting a string to a bc_num structure, trailing zeros are now
counted using SIMD where possible.
Removed unnecessary pointer resetting.
Added UNEXPECTED to some branches.
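The following sketch counts trailing zeros 8 bytes at a time. It uses a plain `uint64_t` comparison (SWAR) as a portable stand-in for SIMD intrinsics, and the function name is hypothetical. Because the whole word is compared against eight `'0'` characters, byte order does not matter.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count trailing '0' characters in s[0..len), scanning backwards
   8 bytes per step, then finishing byte by byte. */
static size_t count_trailing_zeros(const char *s, size_t len)
{
    const uint64_t zeros = 0x3030303030303030ULL; /* eight '0' bytes */
    const char *end = s + len;

    while (end - s >= 8) {
        uint64_t w;
        memcpy(&w, end - 8, sizeof(w)); /* unaligned-safe load */
        if (w != zeros) {
            break;
        }
        end -= 8;
    }
    while (end > s && end[-1] == '0') {
        end--;
    }
    return (size_t)(s + len - end);
}
```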
This simplifies the code and might also indirectly improve performance by
reducing instruction cache pressure, although the latter effect is probably
negligible.
This works because 0x30 ('0') shares no set bits with any value in [0, 9],
so a bitwise operation gives the same result as adding or subtracting 0x30.
Also avoid some memsets where we do call bc_new_num.
After:
```
1.2066178321838
1.5389559268951
1.6050860881805
```
Before:
```
1.3858470916748
1.6806011199951
1.9091980457306
```
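The following sketches the bitwise conversion, assuming the change turns digit values into characters; the function name is hypothetical. Since 0x30 shares no set bits with 0 to 9, `d | 0x30` equals `d + 0x30`. Because OR never carries between bytes, eight digits can be converted with a single 64-bit OR, regardless of byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replicate one byte value into all 8 bytes of a uint64_t. */
#define REPEAT8(x) ((uint64_t)(x) * 0x0101010101010101ULL)

/* Convert digit values 0-9 to ASCII '0'-'9'. */
static void digits_to_chars(const unsigned char *digits, char *out, size_t len)
{
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, digits + i, sizeof(w));
        w |= REPEAT8('0'); /* same as adding '0' to each byte */
        memcpy(out + i, &w, sizeof(w));
    }
    for (; i < len; i++) {
        out[i] = (char)(digits[i] | '0');
    }
}
```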
Use SIMD to accelerate the validation.
Benchmarked with the benchmark from #14076.
After:
```
1.3504369258881
1.6206321716309
1.6845638751984
```
Before:
```
1.4750170707703
1.9039781093597
1.9632289409637
```
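The following is a portable sketch of validating that a string consists only of decimal digits, eight bytes per step. In place of the SIMD intrinsics the change refers to, it uses a widely used SWAR check; the function names are hypothetical. For an ASCII byte c, c + 0x46 has its high bit set when c > '9', and c - 0x30 wraps around and sets it when c < '0'. Bytes of 0x80 and above are caught by one of the two expressions as well. The lowest non-digit byte receives no carry or borrow from the digit bytes below it, so the word consists of eight digits exactly when no high bit is set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if all 8 bytes of w are ASCII '0'..'9'. */
static bool eight_digits(uint64_t w)
{
    return (((w + 0x4646464646464646ULL) | (w - 0x3030303030303030ULL))
            & 0x8080808080808080ULL) == 0;
}

/* Validate s[0..len): 8 bytes per step, then a scalar tail. */
static bool all_digits(const char *s, size_t len)
{
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, s + i, sizeof(w));
        if (!eight_digits(w)) {
            return false;
        }
    }
    for (; i < len; i++) {
        if (s[i] < '0' || s[i] > '9') {
            return false;
        }
    }
    return true;
}
```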