This test has been failing sporadically on macOS for a while. We've tried
a couple of things (using integer arithmetic, normalizing to gettimeofday),
but the issue remains. I'm adding some tolerance to the test to
avoid CI failures.
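
For illustration only, a tolerance check of this kind might look like the sketch below. It is written in Python rather than as the actual test, and the names sleep_until and woke_on_time as well as the 5 ms default tolerance are assumptions, not values taken from the test.

```python
import time


def sleep_until(target):
    """Sleep until the wall-clock time `target` (seconds since the epoch).

    Returns False without sleeping if `target` is already in the past.
    """
    remaining = target - time.time()
    if remaining <= 0:
        return False
    time.sleep(remaining)
    return True


def woke_on_time(target, tolerance=0.005):
    """Check that the wake-up is no earlier than `target - tolerance`.

    The 5 ms default is a hypothetical value chosen for this sketch.
    """
    sleep_until(target)
    return time.time() >= target - tolerance
```

With a tolerance of zero, the check would fail whenever the clock reads slightly less than the target at wake-up, which is the kind of sporadic failure described above.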
The use of gettimeofday in time_sleep_until is technically incorrect. It is not possible
to use gettimeofday in this way reliably on any platform: it relies on operating system
global structures, which may be modified by any other process on the system at any time.
While users may ignore this flaw in practice, the result depends entirely on the other software
running on the system where the application is deployed. There is no way to write
a test that will always pass on every system, so it must be marked XFAIL.
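
As a rough illustration of the point about the wall clock, the Python sketch below measures the same sleep with two clocks. The helper name measure_sleep is hypothetical. time.time() reads the adjustable system clock, much like gettimeofday, while time.monotonic() is documented as unaffected by system clock updates.

```python
import time


def measure_sleep(seconds):
    """Sleep for `seconds` and return (wall_elapsed, monotonic_elapsed)."""
    wall_start = time.time()        # system clock, can be changed externally
    mono_start = time.monotonic()   # cannot go backwards
    time.sleep(seconds)
    wall_elapsed = time.time() - wall_start
    mono_elapsed = time.monotonic() - mono_start
    return wall_elapsed, mono_elapsed
```

If the system clock is adjusted while the call sleeps, only the first value is distorted, so a test that asserts on it can fail for reasons outside its control.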
Particularly on slower VMs, the sporadic failures can still happen.
The timing is kept in an uncritical range, but allows the tests
to pass there. Maybe it would make sense to introduce a new group for
this kind of test, so that tests requiring exact time measurement
can be skipped in unsuitable environments.