NVHPC compiler has several issues that make it impossible to
build liblzma:
- the compiler cannot handle unions that contain pointers that
are not the first members;
- the compiler cannot handle the assembler code in range_decoder.h
(LZMA_RANGE_DECODER_CONFIG has to be set to zero);
- the compiler fails to produce valid code for delta_decode if the
vectorization is enabled, which results in failed tests.
This introduces NVHPC-specific workarounds that address the issues.
The x32 port has a x86-64 ABI in term of all registers but uses only
32bit pointer like x86-32. The assembly optimisation fails to compile on
x32. Given the state of x32 I suggest to exclude it from the
optimisation rather than trying to fix it.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
It's used only for basic bittrees and fixed-size reverse bittree
because those showed a clear benefit on x86-64 with GCC and Clang.
The other methods were more mixed and thus are commented out but
they should be tested on other archs.
The new "safe" range decoder mode is the same as old range decoder, but
now the default behavior of the range decoder will not check if there is
enough input or output to complete the operation. When the buffers are
close to fully consumed, the "safe" operations must be used instead. This
will improve speed because it will reduce the number of branches needed
for most of the range decoder operations.
broken. API has changed a lot and it will still change a
little more here and there. The command line tool doesn't
have all the required changes to reflect the API changes, so
it's easy to get "internal error" or trigger assertions.