Both macros are supposed to be defined in limits.h (C99) and as such it
is superfluous to provide fallback definitions. Even worse, because
these fallback definitions didn't cater to LP64, ILP64 and SILP64 data
models (and maybe some rather uncommon ones), but just assumed ILP32,
they are confusing.
* Added function for squaring to improve performance of power calculation
* Aligned backslashes
* Removed unnecessary comments
* Extracted common part of multiplication and square functions
* Added comment to bc_fast_square
* Improved wording of bc_mul_finish_from_vector
* Reused new function name
* Replaced macro with function
Fixed the incorrect scale used when checking for division by 1, that is,
when comparing the divisor with 1 for equality.
Additionally, increased the number of test cases in bcdiv_by_pow_10.phpt.
In the original implementation, the scale of bc_num was changed directly
before comparing.
This becomes a problem once objects are supported, so the comparison now
leaves bc_num unchanged.
The original calculation method for prod_arr_size allowed for some error,
which could have increased the number of simple loops without byte tricks
at the end of the calculation when converting to bc_num.
The new method calculates the size accurately, so the number of loops does
not increase unnecessarily.
Multiplication is performed after converting to uint32_t/uint64_t, making calculations faster.
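
As a rough illustration of why working on wider integers helps, the sketch below packs decimal digits into base-10^4 chunks held in uint32_t, multiplies the chunks with uint64_t accumulators, and unpacks the result. The chunk width, the MAX_CHUNKS limit and the names pack_chunks and mul_digits are assumptions made for this example and are not taken from the actual change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_DIGITS 4       /* hypothetical chunk width */
#define CHUNK_BASE   10000u  /* 10^CHUNK_DIGITS */
#define MAX_CHUNKS   64      /* hypothetical size limit for this sketch */

/* Pack digits (values 0..9, most significant first) into base-10^4 chunks,
 * least significant chunk first. Returns the number of chunks written. */
static size_t pack_chunks(const unsigned char *digits, size_t len, uint32_t *out)
{
    size_t n = 0;
    while (len > 0) {
        size_t take = len < CHUNK_DIGITS ? len : CHUNK_DIGITS;
        uint32_t v = 0;
        for (size_t i = len - take; i < len; i++) {
            v = v * 10 + digits[i];
        }
        out[n++] = v;
        len -= take;
    }
    return n;
}

/* Multiply two digit arrays. Writes exactly alen + blen digits to prod,
 * most significant first, zero padded on the left. */
static void mul_digits(const unsigned char *a, size_t alen,
                       const unsigned char *b, size_t blen,
                       unsigned char *prod)
{
    uint32_t ca[MAX_CHUNKS], cb[MAX_CHUNKS];
    uint64_t acc[2 * MAX_CHUNKS] = {0};

    assert(alen > 0 && blen > 0);
    assert(alen <= MAX_CHUNKS * CHUNK_DIGITS && blen <= MAX_CHUNKS * CHUNK_DIGITS);

    size_t na = pack_chunks(a, alen, ca);
    size_t nb = pack_chunks(b, blen, cb);

    /* Schoolbook multiplication on chunks; carries are deferred. */
    for (size_t i = 0; i < na; i++) {
        for (size_t j = 0; j < nb; j++) {
            acc[i + j] += (uint64_t) ca[i] * cb[j];
        }
    }

    /* Propagate carries so that every chunk ends up below CHUNK_BASE. */
    for (size_t i = 0; i + 1 < na + nb; i++) {
        acc[i + 1] += acc[i] / CHUNK_BASE;
        acc[i] %= CHUNK_BASE;
    }

    /* Unpack chunks back into single digits, least significant first. */
    size_t pos = alen + blen;
    for (size_t i = 0; i < na + nb && pos > 0; i++) {
        uint64_t v = acc[i];
        for (int d = 0; d < CHUNK_DIGITS && pos > 0; d++) {
            prod[--pos] = (unsigned char) (v % 10);
            v /= 10;
        }
    }
}
```

With 4-digit chunks, each inner-loop step handles what would otherwise take up to 16 single-digit multiplications, which is the kind of saving the conversion is aiming for.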
---------
Co-authored-by: Niels Dossche <7771979+nielsdos@users.noreply.github.com>
Co-authored-by: Gina Peter Banyard <girgias@php.net>
The code for _bc_do_add and _bc_do_sub was written slightly differently
despite doing similar processing (and add was slower than sub), so I rewrote
add to resemble sub.
Also, _bc_do_add has been changed to use SIMD to perform faster calculations
when possible.
When converting a string to a bc_num structure, trailing zeros are now
counted using SIMD where possible.
Removed unnecessary pointer resetting.
Added UNEXPECTED to some branches.
This simplifies the code and might also indirectly improve performance
due to a decrease in instruction cache pressure, although the latter is
probably negligible.
This works because 0x30 has no overlapping bits with [0, 9].
Also avoid some memsets where we do call bc_new_num.
After:
```
1.2066178321838
1.5389559268951
1.6050860881805
```
Before:
```
1.3858470916748
1.6806011199951
1.9091980457306
```
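
To illustrate the note above that 0x30 has no overlapping bits with [0, 9]: because the digit values only occupy the low four bits, OR-ing with 0x30 gives the same result as adding '0', and the same holds for several digits packed into one word. The helper names below are hypothetical and the sketch only demonstrates the bit property, not the code from the change.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a digit value 0..9 to its ASCII character using OR. */
static char digit_to_char_or(unsigned char digit)
{
    assert(digit <= 9);
    return (char) (digit | 0x30); /* same result as digit + '0' for 0..9 */
}

/* Reference version using addition, for comparison. */
static char digit_to_char_add(unsigned char digit)
{
    assert(digit <= 9);
    return (char) (digit + '0');
}

/* Convert eight digit values packed one per byte into eight ASCII
 * characters with a single OR. */
static uint64_t digits8_to_chars(uint64_t packed)
{
    return packed | UINT64_C(0x3030303030303030);
}
```

Since no carry can occur between bytes, the packed form stays correct no matter which digit values the individual bytes hold.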
Using SIMD to accelerate the validation.
Using the benchmark from #14076.
After:
```
1.3504369258881
1.6206321716309
1.6845638751984
```
Before:
```
1.4750170707703
1.9039781093597
1.9632289409637
```
Since freeing can deal with NULL, we can avoid calling bc_init_num and
avoid resetting the number during parsing.
Using benchmark from #14076.
Before:
```
1.544440984726
2.0288550853729
2.092139005661
```
After:
```
1.5324399471283
1.9081380367279
2.065819978714
```
On my i7-4790 with benchmark from #14076, on top of #14101 I obtain the
following results:
before (with #14101):
```
1.672737121582
2.3618471622467
2.3474779129028
```
after (with #14101 + this):
```
1.5878579616547
2.0568618774414
2.0204811096191
```
Since the two allocations are tied together anyway, we can just use a
single allocation. Moreover, this actually seemed like the intention
because the bc_struct allocation already accounted for the length and
scale.
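
A minimal sketch of the single-allocation idea, assuming a simplified header with a C99 flexible array member. The type demo_num and its helper functions are hypothetical stand-ins and do not reproduce the real bc_struct layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a number header; field names are illustrative. */
typedef struct demo_num {
    size_t len;    /* digits before the decimal point */
    size_t scale;  /* digits after the decimal point */
    char digits[]; /* digit storage lives in the same allocation */
} demo_num;

/* Allocate header and digit storage with one malloc call. */
static demo_num *demo_new_num(size_t len, size_t scale)
{
    /* Reject sizes whose sum would overflow size_t. */
    if (scale > SIZE_MAX - sizeof(demo_num) ||
        len > SIZE_MAX - sizeof(demo_num) - scale) {
        return NULL;
    }
    demo_num *num = malloc(sizeof(demo_num) + len + scale);
    if (num == NULL) {
        return NULL;
    }
    num->len = len;
    num->scale = scale;
    memset(num->digits, 0, len + scale);
    return num;
}

/* Read one digit, checking the index against the stored sizes. */
static char demo_digit_at(const demo_num *num, size_t i)
{
    assert(num != NULL && i < num->len + num->scale);
    return num->digits[i];
}

/* One free releases both header and digits; free(NULL) is a no-op. */
static void demo_free_num(demo_num *num)
{
    free(num);
}
```

Besides saving one allocator call per number, this keeps the header and the digits adjacent in memory.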
power is a copy of base and returns early if base->n_scale is non-zero. Since
scale is size_t, it is always greater than or equal to 0, so rscale is always
the value of scale.