If the cache variable TUKLIB_FAST_UNALIGNED_ACCESS is already set,
the autodetection result isn't needed because the option() command
does nothing when the cache variable is already set.
This is largely white space change to indent the if...endif block.
CMake >= 4.1 sets CMAKE_<LANG>_COMPILER_ARCHITECTURE_ID on many
platforms. The list of possible values are documented. Use this
variable when available. On older CMake versions CMAKE_SYSTEM_PROCESSOR
is still used, thus the regexes have to include values like ^amd64 still.
With old CMake versions, checking CMAKE_C_COMPILER_ARCHITECTURE_ID
is somewhat useful with MSVC because CMAKE_SYSTEM_PROCESSOR might
not match the target architecture.
On ARM64, support for fast unaligned memory access was autodetected by
checking if __ARM_FEATURE_UNALIGNED is defined. However, at least GCC
versions up to 15.2.0 define the macro even when -mstrict-align has
been specified. Thus, autodetection with GCC doesn't work correctly,
and binaries built using -mstrict-align can be much slower than they
need to be, unless the user also passes --disable-unaligned-access
to configure or -DTUKLIB_FAST_UNALIGNED_ACCESS=OFF to cmake.
See the GCC bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111555
Workaround the issue by using heuristics with GCC on ARM64.
With Clang, the detection using __ARM_FEATURE_UNALIGNED works.
It also works with GCC on 32-bit ARM.
Fixes: e5f13a6656 ("tuklib_integer: Autodetect support for unaligned access on ARM.")
There are no strings to translate in that file now, but it's good to
list it anyway in case translatable strings are added in the future.
Fixes: 374868d81d ("xz: Move sandboxing code to sandbox.c and improve Landlock sandbox.")
This was already the case in practice because I had forgotten to list
src/xz/sandbox.c in po/POTFILES.in. However, it seems better to never
translate this particular error message. It should almost never occur
and if it does, an untranslated message is should make it easier to
find bug reports about it.
This still relies on CMAKE_SYSTEM_PROCESSOR. CMake 4.1 added more
CMAKE_<LANG>_COMPILER_ARCHITECTURE_ID values to detect the arch in
a more defined manner, but 4.1 is too new to require for now.
Thanks-to: Li Chenggang <lichenggang@deepin.org>
Closes: https://github.com/tukaani-project/xz/pull/186
According to [1] sections 7.4, 8.1, and 8.2, desktop and server
processors support fast unaligned access, but embedded systems likely
don't.
It's important that TUKLIB_FAST_UNALIGNED_ACCESS isn't defined when
-mstrict-align is in use because it will result in slower binaries
even if running on a processor that supports fast unaligned access.
It's because compilers will translate multibyte memcpy() to multiple
byte-by-byte instructions instead of wider loads and stores. The
compression times from [2] show this well:
Unaligned access CFLAGS Compression time
enabled -O2 -mno-strict-align 66.1 s
disabled -O2 -mno-strict-align 79.5 s
disabled -O2 -mstrict-align 79.9 s
enabled -O2 -mstrict-align 129.1 s
There currently (GCC 15.2) is no preprocessor macro on LoongArch
to detect if -mstrict-align or -mno-strict-align is in effect (the
default is -mno-strict-align). Use heuristics to detect which of the
flags is in effect.
[1] https://github.com/loongson/la-softdev-convention/blob/v0.2/la-softdev-convention.adoc
[2] https://github.com/tukaani-project/xz/pull/186#issuecomment-3494570304
Thanks-to: Li Chenggang <lichenggang@deepin.org>
Thanks-to: Xi Ruoyao
See: https://github.com/tukaani-project/xz/pull/186
lzma_lzma_decoder_memusage() returns UINT64_MAX if lc/lp/pb aren't
valid. alone_decoder.c and lzip_decoder.c didn't check the return
value because in both it is known that lc/lp/pb are valid. Make them
call the _nocheck() variant instead which skips the validation (it
already existed for LZMA2's internal use).
Fixes: Coverity CID 464658
Fixes: Coverity CID 897069
The partial_update_mode enumeration had three states, _DISABLED,
_START, and _ENABLED. Main thread changed it from _DISABLED to _START
while holding a mutex. Once set to _START, worker thread changed it
to _ENABLED without a mutex. Later main thread read it without a mutex,
so it could see either _START or _ENABLED. However, it made no
difference because the main thread checked for != _DISABLED, so
it didn't matter if it saw _START or _ENABLED.
Nevertheless, such things must not be done. It's clear it was a mistake
because there were two comments that directly contradicted each
other about how the variable was accessed.
Split the enumeration into two booleans:
- partial_update_enabled: A worker thread locks the mutex to read
this variable and the main thread locks the mutex to change the
value. Because only the main thread modifies the variable, the
main thread can read the value without locking the mutex.
This variable replaces the _DISABLED -> _START transition.
- partial_update_started is for worker thread's internal use and thus
needs no mutex. This replaces the _START -> _ENABLED transition.
Fixes: Coverity CID 456025
Fixes: bd93b776c1 ("liblzma: Fix a deadlock in threaded decoder.")
In short, sort the names with this command (-k1,1 isn't needed because
the lines with names start with " -"):
LC_ALL=en_US.UTF-8 sort -k2,2 -k3,3 -k4,4 -k5,5
When THANKS was created, I wrote the names as "First Last" and attempted
to keep them sorted by last name / surname / family name. This works
with many names in THANKS, but it becomes complicated with names that
don't fit that pattern. For example, names that are written as
"Last First" can be manually sorted by family name, but only if one
knows which part of the name is the family name.[*] And of course,
the concept of first/last name doesn't apply to all names.
[*] xz had a co-maintainer who could help me with such names,
but fortunately he isn't working on the project anymore.
Adding the names in chronological order could have worked too, although
if something is contributed by multiple people, one would still have to
decide how to sort the names within the batch. Another downside would
be that if THANKS is updated in more than one work-in-progress branch,
merge conflicts would occur more often.
Don't attempt to sort by last name. Let's be happy that people tend to
provide names that can be expressed in a reasonable number of printable
Unicode characters. In practice, people have been even nicer: if the
native language doesn't use a Latin script alphabet, people often provide
a transliterated name (only or in addition to the original spelling),
which is very much appreciated by those who don't know the native script.
Treat the names as opaque strings or space-separated strings for sorting
purposes. This means that most names will now be sorted by first name.
There still are many choices how to sort:
(1) LC_ALL=en_US.UTF-8 sort
The project is in English, so this may sound like a logical choice.
However, spaces have a lower weight than letters, which results in
this order:
- A Ba
- Ab C
- A Bc
- A Bd
(2) LC_ALL=en_US.UTF-8 sort -k2,2
This first sorts by the first word and then by the rest of the
string. It's -k2,2 instead of -k1,1 to skip the leading dash.
- A Ba
- A Bc
- A Bd
- Ab C
I like this more than (1). One could add -k3,3 -k4,4 -k5,5 ... too.
With current THANKS it makes no difference but it might some day.
NOTE: The ordering in en_US.UTF-8 can differ between libc versions
and operating systems. Luckily it's not a big deal in THANKS.
(3) LC_ALL=en_US.UTF-8 sort -f -k2,2
Passing -f (--ignore-case) to sort affects sorting of single-byte
characters but not multibyte characters (GNU coreutils 9.9):
No -f With -f LC_ALL=C
Aa A.A A.A
A.A Aa Aa
Ää Ää Ä.Ä
Ä.Ä Ä.Ä Ää
In GNU coreutils, the THANKS file is sorted using "sort -f -k1,1".
There is also a basic check that the en_US.UTF-8 locale is
behaving as expected.
(4) LC_ALL=C sort
This sorts by byte order which in UTF-8 is the same as Unicode
code point order. With the strings in (1) and (2), this produces
the same result as in (2). The difference in (3) can be seen above.
The results differ from en_US.UTF-8 when a name component starts
with a lower case ASCII letter (like "von" or "de"). Worse, any
non-ASCII characters sort after ASCII chars. These properties might
look weird in English language text, although it's good to remember
that en_US.UTF-8 sorting can appear weird too if one's native
language isn't English.
The choice between (2) and (4) was difficult but I went with (2).
;-)
To make a non-threaded liblzma-only build work with WASI SDK, <signal.h>
and mythread_sigmask() were omitted from mythread.h in the commit
81db3b8898. This broke non-threaded full build with Emscripten because
src/xz/signals.c needs mythread_sigmask() (liblzma-only build was fine).
If __wasm__ is defined, omit <signal.h> and mythread_sigmask() in
non-threaded builds only when __EMSCRIPTEN__ isn't defined.
Reported-by: Marcus Tillmanns
Thanks-to: ChanTsune
Fixes: https://github.com/tukaani-project/xz/issues/161
Fixes: 81db3b8898 ("mythread.h: Disable signal functions in builds targeting Wasm + WASI.")
getauxval() can be available even if HWCAP_CRC32 isn't #defined, so
both have to be checked. HWCAP_CRC32 was added in glibc 2.24 (2016).
Fixes: https://github.com/tukaani-project/xz/issues/190
When no memory usage limits have been set by the user, the default
for multithreaded mode has been 1/4 of total RAM. If this limit is
too high and memory allocation fails, liblzma (and xz) fail. Perhaps
liblzma should handle it better by reducing the number of threads
and continuing with the amount of memory it can allocate, but currently
that isn't the case.
If resource limits were set to about 1/4 of RAM or lower, then xz
could fail for the above reason. This commit makes xz look at
RLIMIT_DATA, RLIMIT_AS, and RLIMIT_VMEM when they are available,
and set the limit 64 MiB below the lowest of those limits. This is
more or less a hack just like the 1/4-of-RAM method is, but this is
simple and quick to implement.
On Linux, there are other limits like cgroup v2 memory.max which
can still make xz fail. The same is likely possible with FreeBSD's
rctl(8).
Co-authored-by: Lasse Collin <lasse.collin@tukaani.org>
Thanks-to: Fangrui Song
Fixes: https://github.com/tukaani-project/xz/issues/195
Closes: https://github.com/tukaani-project/xz/pull/196
GitHub has removed the runner image.
A breakage with CLMUL CRC code occurred with VS 2019 but not 2022,
see b5a5d9e3f7. MS supports VS 2019 for a few more years, so it's
unfortunate that it can no longer be tested on GitHub.
It feels better to fix the docs than change the code because this
way newly-written applications will be forced to be compatible with
the lzma_allocator behavior of old liblzma versions. It can matter
if someone builds the application against an older liblzma version.
Fixes: https://github.com/tukaani-project/xz/issues/183
Set also shell because the xz*.in files start with '#!@POSIX_SHELL@'.
SC1003 and SC2016 are only info messages, not warnings. Several other
shellcheck info messages remain. They are safe to ignore, but I didn't
want to disable them now.
Partially-fixes: https://github.com/tukaani-project/xz/issues/174
Sometimes the VM workflows (like FreeBSD VM on Ubuntu) get stuck
and the default timeout is six hours. While at it, set a sensible
timeout for all workflows.
Compared to the file in the Translation Project, I still had to apply
a few fixes that were needed with the previous (5.7.1-dev1) version too:
- Remove two extra '<' characters that break the build with po4a.
- Don't translate XZ_DEFAULTS and XZ_OPT environment variable names.
It was (probably) intentionally without the null terminator, but the test
works with null terminator too (the test still fails with xz <= 5.0.3),
so simply omit one character to silence the warning.
tests/test_bcj_exact_size.c:30:32: error: initializer-string for array of ‘unsigned char’ truncates NUL terminator but destination lacks ‘nonstring’ attribute (17 chars into 16 available) [-Werror=unterminated-string-initialization]
30 | const uint8_t in[16] = "0123456789ABCDEF";
| ^~~~~~~~~~~~~~~~~~
Fixes: d8db706acb ("liblzma: Fix possibility of incorrect LZMA_BUF_ERROR.")
Fixes: https://github.com/tukaani-project/xz/issues/176
cap_rights_clear() with no additional arguments acts as a no-op, so
instead of removing all capability rights from STDIN_FILENO, the same
rights were allowed for STDIN_FILENO as were allowed for src_fd.
Fixes: a0eecc235d ("xz: Make Capsicum sandbox more strict with stdin and stdout.")
(The commit message says "stdout". It should have said "stderr".)
Use --trace-children-skip instead of --trace-children-skip-by-arg
so that the skipping is only done based on the executable names.
(--trace-children-skip-by-arg can match other args than argv[0].)
Update the list of executables to skip to match what the scripts run.
Do not skip bash or sh. If Valgrind didn't trace the shell, then the
xz and xzdec programs run by the shell wouldn't be analyzed either.
Fixes: 7e99856f66 ("CI: Speed up Valgrind job by using --trace-children-skip-by-arg=...")
If shared liblzma is built, tests/test_* and src/xz/xz are wrapper
scripts created by Libtool. The wrappers set library search path
so that the freshly-built shared library is found.
With a static liblzma, no wrapper scripts are needed, and Libtool
places the real executables to the aforementioned locations. This
speeds up the tests under Valgrind dramatically.
Fixes: 6c095a98fb ("ci: test Valgrind")
Debian and Ubuntu have a patch that changes the upstream default to
HAVE_DOT = YES. Undo it to have more consistent results across distros.
This was noticed in Ubuntu CI runner where "doxygen" tried to run "dot"
but that failed due to "dot" not being installed. "doxygen" still
finished with exit status 0 until the commit that turned warnings to
errors with WARN_AS_ERROR = FAIL_ON_WARNINGS.
Re-enable CLANG64 environment. Add CLANGARM64. Don't add MINGW64
to slightly reduce the number of runner VMs needed.
Install the required packages using the setup-msys2 action instead
of running the commands separately.
Test Autotools and CMake in the same job to reduce the number of VMs.
This doesn't slow it down too much because the msys2-setup step is
needed by both. However, do only the full builds on ARM64 because
those runners seem to be slower.
Test fewer build configurations. The point of testing on MSYS2 is to
catch Windows-related issues. It should be enough that the more unusual
build configurations are tested in ci.yml.
Run the build commands directly instead of using ci_build.bash. This
makes it easier to see what commands are run even if it is a little
more verbose now.
Run the workflow automatically when commit are pushed to master.
With the fewer build variants it's not too slow.
Don't say that the .lz format allows trailing data. According to the
lzip 1.25 manual, trailing data isn't part of the file format at all.
However, tools are still expected to behave as usefully as possible
when there is trailing data.
Fix the description of lzip >= 1.20 behavior when some of the first
bytes of trailing data match the magic bytes. While the lzip 1.25 manual
recommends that none of the first four bytes in trailing data should
match the magic bytes, the default behavior of lzip 1.25 treats
trailing data as a corrupt member header only if two or three bytes
match the magic bytes; one matching byte isn't enough.
Reported-by: Antonio Diaz Diaz
Link: https://www.mail-archive.com/xz-devel@tukaani.org/msg00702.html
If <getopt.h> had optreset but not getopt_long, xz used optreset while
the replacement getopt_long doesn't support optreset. I'm not aware of
any relevant system where bug is possible. Autotools build didn't have
this bug.
Fixes: af66cd5859 ("CMake: Add support for replacement getopt_long (lib/getopt*).")
Using eight runners seems wasteful. Using only two runners isn't
much slower due to the runner startup overhead.
Also add a comment about the test that fails without b5a5d9e3f7.
On GitHub runners, VS 2019 16.11 (MSVC 19.29.30158) results in
test failures. VS 2022 17.13 (MSVC 19.43.34808) works.
In xz 5.6.x there was a #pragma-based workaround for MSVC builds for
32-bit x86. Another method was thought to work with the new rewritten
CLMUL CRC. Apparently it doesn't. Keep it simple and disable CLMUL CRC
with any non-recent MSVC when building for 32-bit x86.
Fixes: 54eaea5ea4 ("liblzma: x86 CLMUL CRC: Rewrite")
Fixes: https://github.com/tukaani-project/xz/issues/171
Reported-by: Andrew Murray
This makes it easy to crash fuzz_decode_stream_mt when tested
against the code from 5.8.0.
Obviously this might make it harder to reach some other code path now.
The previous code has been in use since 2018 when fuzzing was added
in 106d1a663d ("Tests: Add a fuzz test program and a config file
for OSS-Fuzz.").
It doesn't seem possible to trigger the CVE-2025-31115 bug with this
fuzzing target at the moment. It's because the code in fuzz_common.h
passes the whole input buffer to lzma_code() at once.
Single-shot decoding means calling lzma_code() by giving it the whole
input at once and enough output buffer space to store the uncompressed
data, and combining this with LZMA_FINISH and no timeout
(lzma_mt.timeout = 0). This way the file is decoded with a single
lzma_code() call if possible.
The bug prevented the decoder from starting more than one worker thread
in single-shot mode. The issue was noticed when reviewing the code;
there are no bug reports. Thus maybe few have tried this mode.
Fixes: 64b6d496dc ("liblzma: Threaded decoder: Always wait for output if LZMA_FINISH is used.")
Don't set thr->in_size = 0 when returning the thread to the stack of
available threads. Not only is it useless, but the main thread may
read the value in SEQ_BLOCK_THR_RUN. With valid inputs, it made
no difference if the main thread saw the original value or 0. With
invalid inputs (when worker thread stops early), thr->in_size was
no longer modified after the previous commit with the security fix
("Don't free the input buffer too early").
So while the bug appears harmless now, it's important to fix it because
the variable was being modified without proper locking. It's trivial
to fix because there is no need to change the value. Only main thread
needs to set the value in (in SEQ_BLOCK_THR_INIT) when starting a new
Block before the worker thread is activated.
Fixes: 4cce3e27f5 ("liblzma: Add threaded .xz decompressor.")
Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
The input buffer must be valid as long as the main thread is writing
to the worker-specific input buffer. Fix it by making the worker
thread not free the buffer on errors and not return the worker thread to
the pool. The input buffer will be freed when threads_end() is called.
With invalid input, the bug could at least result in a crash. The
effects include heap use after free and writing to an address based
on the null pointer plus an offset.
The bug has been there since the first committed version of the threaded
decoder and thus affects versions from 5.3.3alpha to 5.8.0.
As the commit message in 4cce3e27f5 says, I had made significant
changes on top of Sebastian's patch. This bug was indeed introduced
by my changes; it wasn't in Sebastian's version.
Thanks to Harri K. Koskinen for discovering and reporting this issue.
Fixes: 4cce3e27f5 ("liblzma: Add threaded .xz decompressor.")
Reported-by: Harri K. Koskinen <x64nop@nannu.org>
Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
The main thread can directly set THR_IDLE in threads_stop() which is
called when errors are detected. threads_stop() won't return the stopped
threads to the pool or free the memory pointed by thr->in anymore, but
it doesn't matter because the existing workers won't be reused after
an error. The resources will be cleaned up when threads_end() is
called (reinitializing the decoder always calls threads_end()).
Reviewed-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Thanks-to: Sam James <sam@gentoo.org>
Oracle Developer Studio 12.6 on Solaris 10 claims C11 support in
__STDC_VERSION__ and supports _Alignas. However, <stdalign.h> is missing.
We only need alignas, so define it to _Alignas with C11/C17 compilers.
If something included <stdalign.h> later, it shouldn't cause problems.
Thanks to Ihsan Dogan for reporting the issue and testing the fix.
Fixes: c0e7eaae8d
POT-Creation-Date is set to match the timestamp in 5.7.2beta which
in the Translation Project is known as 5.8.0-pre1. The strings
haven't changed since 5.7.1alpha but a few comments have.
This is a very noisy commit, but this helps keeping the PO files
similar between the Git repository and stable release tarballs.
Also remove the trivial obsolete messages like man page dates.
This is a noisy commit, but this helps keeping the PO files similar
between the Git repository and stable release tarballs.
Names of environment variables and some other strings must be present
in the original form. The translator couldn't be reached so I'm
changing some of the strings myself. In the "Robot mode" section,
occurrences in the middle of sentences weren't changed to reduce
the chance of grammar breakage, but I kept the translated strings in
parenthesis in the headings. It's not ideal, but now people shouldn't
need to look at the English man page to find the English strings.
SSE2 is supported on every x86-64 processor. The SSE2 code is used on
32-bit x86 if compiler options permit unconditional use of SSE2.
dict_repeat() copies short random-sized unaligned buffers. At least
on glibc, FreeBSD, and Windows (MSYS2, UCRT, MSVCRT), memcpy() is
clearly faster than byte-by-byte copying in this use case. Compared
to the memcpy() version, the new SSE2 version reduces decompression
time by 0-5 % depending on the machine and libc. It should never be
slower than the memcpy() version.
However, on musl 1.2.5 on x86-64, the memcpy() version is the slowest.
Compared to the memcpy() version:
- The byte-by-version takes 6-7 % less time to decompress.
- The SSE2 version takes 16-18 % less time to decompress.
The numbers are from decompressing a Linux kernel source tarball in
single-threaded mode on older AMD and Intel systems. The tarball
compresses well, and thus dict_repeat() performance matters more
than with some other files.
This doesn't make any difference in practice because compilers can
already see that writing through the dict->buf pointer cannot modify
the contents of *dict itself: The LZMA decoder makes a local copy of
the lzma_dict structure, and even if it didn't, the pointer to
lzma_dict in the LZMA decoder is already "restrict".
It's nice to add "restrict" anyway. uint8_t is typically unsigned char
which can alias anything. Without the above conditions or "restrict",
compilers could need to assume that writing through dict->buf might
modify *dict. This would matter in dict_repeat() because the loops
refer to dict->buf and dict->pos instead of making local copies of
those members for the duration of the loops. If compilers had to
assume that writing through dict->buf can affect *dict, then compilers
would need to emit code that reloads dict->buf and dict->pos after
every write through dict->buf.
Revert back to a macro so that list(APPEND CMAKE_REQUIRED_DEFINITIONS)
will affect the calling scope. I had forgotten that while CMake
functions inherit the variables from the parent scope, the changes
to them are local unless using set(... PARENT_SCOPE).
This also means that the commit message in 5bb77d0920 is wrong. The
commit itself is still fine, making it clearer that -DHAVE_SYS_PARAM_H
is only needed for specific check_c_source_compiles() calls.
Fixes: c1ea7bd0b6
Define NetBSD and Darwin/macOS feature test macros. Autoconf defines
these too (and a few others).
Define the macros on Windows except with MSVC. The _GNU_SOURCE macro
makes a difference with mingw-w64.
Use a function instead of a macro. Don't take the TARGET_OR_ALL argument
because there's always global effect because the global variable
CMAKE_REQUIRED_DEFINITIONS is modified.
All translators know that --command-line-options must not be translated.
With some other strings it's not obvious when the untranslated string
must be preserved. These comments hopefully help.
The deprecated aliases are lzcmp, lzdiff, lzless, lzmore,
lzgrep, lzegrep, and lzfgrep. The commands that start with
the xz prefix have identical behavior, for example, both
lzgrep and xzgrep handle all supported file formats.
This doesn't affect lzma, unlzma, lzcat, lzmadec, or lzmainfo.
The last release of LZMA Utils was made in 2008, but the lzma
compatibility alias for the gzip-like tool is still in common use.
Deprecating it would cause unnecessary breakage.
Nowaways $(top_builddir)/lib/getopt.h depends on headers in
$(top_srcdir)/lib, so both have to be in the include path.
CMake-based build already did this.
Fixes: 7e884c00d0
Now one can pass gl_replace_getopt=yes to configure to force the use
of GNU getopt_long from the lib directory. This only checks that the
value of gl_replace_getopt is non-empty, so one cannot force the
replacement to be disabled.
Closes: https://github.com/tukaani-project/xz/pull/166
Make it clearer that translations cannot be accepted if they don't
come via the Translation Project.
Column headings have been handled automatically for years and now --help
is autowrapped too, so the related instructions can be removed.
Make it work for out-of-tree builds without requiring one to specify
the location of the xz executable.
Add xz --filters-help.
Make the output shorter by reducing the number of xz -lvv test files.
Show the value of LANGUAGE environment variable.
Show the xz.git version using git describe --abbrev=8 instead of =4.
Since there are no spaces between words, the unsophisticated automatic
word wrapping code needs some help. Compared to the version in the
Translation Project, I added a few \t characters which the word
wrapping code interprets as zero width spaces (hopefully they are
placed correctly). These edits can be seen with this command:
grep -v ^# po/zh_TW.po | grep --color -F '\t'
Tabs have been converted to spaces and a "serial" number has been
added. The previous version was from 2008/2009. There are no functional
changes since then but now it's clearer that the copy in XZ Utils
isn't outdated.
The new file was picked from the Gnulib commit
81a4c1e3b7692e95c0806d948cbab9148ad85ef2. A later commit adds
a warranty disclaimer to the license, which obviously is fine,
but I didn't find a SPDX license identifier for the new license,
so for simplicity I used the earlier commit.
People may put -fsanitize in CC instead of CFLAGS so check both.
Landlock sandbox isn't compatible with sanitizers so it's nice
to catch the incompatible options at configure time.
Don't attempt to do the same in CMakeLists.txt; the check for
CMAKE_C_FLAGS / CFLAGS shall be enough there. The extra flags from
the CC environment variable go into the undocumented internal variable
CMAKE_C_COMPILER_ARG1 (all flags from CC go into that same variable).
Peeking the internal variable merely for improved diagnostics isn't
worth it.
Fixes: 88588b1246
The check can be skipped by passing SKIP_WERROR_CHECK=yes to configure.
It won't be documented anywhere else than in the error message.
Ways to test:
./configure CC=gcc CFLAGS=-Wunused-macros
./configure CC=clang CFLAGS=-Weverything
./configure CC=clang CFLAGS=-Weverything SKIP_WERROR_CHECK=yes
Also make xz not process more input files after a broken pipe has
been detected. This matches the behavior on POSIX. If all files
are being written to standard output, trying with the next file is
pointless when it's known that standard output won't accept more data.
xzdec already stopped after the first error. It does so with all
errors, so it differs from xz:
$ xz -dc not_found_1 not_found_2
xz: not_found_1: No such file or directory
xz: not_found_2: No such file or directory
$ xzdec not_found_1 not_found_2
xzdec: not_found_1: No such file or directory
Reported-by: Vincent Torri
This only affects builds with UCRT. With legacy MSVCRT, the replacement
functions are always enabled.
Omitting the MinGW-w64 replacements saves over 20 KiB per executable.
The downside is that --enable-small or XZ_SMALL=ON disables thousand
separator support in xz messages. If someone is OK with the slower
speed of slightly smaller builds, lack of thousand separators won't
matter.
Don't override __USE_MINGW_ANSI_STDIO if it is already defined (via
CPPFLAGS or such method).
Previously it was enabled only on x86-64 and ARM64 when also support
for unaligned access was detected or manually enabled at built time.
In the default build configuration, the 8-byte method is now enabled
also on 64-bit RISC-V and 64-bit PowerPC (both endiannesses). It was
reported that on big endian POWER9, encoding time may reduce 12-13 %.
This change only affects builds with GCC and Clang because the code
uses __builtin_ctzll or __builtin_clzll.
Thanks to Marcus Comstedt for testing on POWER9.
When the 8-byte method was enabled for ARM64, a check for endianness
wasn't added. This broke the LZMA/LZMA2 encoder. Test suite caught it.
Fixes: cd64dd70d5
Co-authored-by: Marcus Comstedt <marcus@mc.pp.se>
I rewrapped a few overlong lines. Those edits aren't in the
Translation Project. Automatic wrapping in the master branch
means that these strings need to be updated soon anyway.
Testing with musl 1.2.5 and Linux 6.12, O_SEARCH doesn't result
in a file descriptor that works with fsync() although it should work.
See the added comment.
The same issue affected gzip --synchronous:
https://bugs.gnu.org/75405
Thanks to Paul Eggert.
Opening a directory with O_SEARCH results in a file descriptor that can
be used with functions like openat(). Such a file descriptor cannot be
used with fsync(). Use O_RDONLY instead.
In musl, O_SEARCH becomes Linux-specific O_PATH. A file descriptor
from O_PATH doesn't allow fsync().
Seems that it's not possible to fsync() a directory that has write
and search permissions but not read permission.
Fixes: 2a9e91d796
A typical use case is like this:
printf("%s: %s\n", tuklib_mask_nonprint(filename), strerror(errno));
tuklib_mask_nonprint() may call mbrtowc() and malloc() which may modify
errno. If errno isn't preserved, the error message might be wrong if
a compiler decides to call tuklib_mask_nonprint() before strerror().
Fixes: 40e5733055
xz's default behavior is to delete the input file after successful
compression or decompression (unless writing to standard output).
If the system crashes soon after the deletion, it is possible that
the newly written file has not yet hit the disk while the previous
delete operation might have. In that case neither the original file
nor the written file is available.
Call fsync() on the file. On POSIX systems, sync also the directory
where the file was created.
Add a new option --no-sync which disables fsync() usage. It can avoid
a (possibly significant) performance penalty when processing many
small files. It's fine to use --no-sync when one knows that the files
are easy to recreate or restore after a system crash.
Using fsync() after every flush initiated by --flush-timeout was
considered. It wasn't implemented at least for now.
- --flush-timeout is typically used when writing to stdout. If stdout
is a file, xz cannot (portably) sync the directory of the file.
One would need to create the output file first, sync the directory,
and then run xz with fsync() enabled.
- If xz --flush-timeout output goes to a file, it's possible to use
a separate script to sync the file, for example, once per minute
while telling xz to flush more frequently.
- Not supporting syncing with --flush-timeout was simpler.
Portability notes:
- On systems that lack O_SEARCH (like Linux), "xz dir/file" will now
fail if "dir" cannot be opened for reading. If "dir" still has
write and search permissions (like d-wx------ in "ls -l"),
previously xz would have been able to compress "dir/file" still.
Now it only works if using --no-sync (or --keep or --stdout).
- <libgen.h> and dirname() should be available on all POSIX systems,
and aren't needed on non-POSIX systems.
- fsync() is available on all POSIX systems. The directory syncing
could be changed to fdatasync() although at least on ext4 it
doesn't seem to make a performance difference in xz's usage.
fdatasync() would need a build system check to support (old)
special cases, for example, MINIX 3.3.0 doesn't have fdatasync()
and Solaris 10 needs -lrt.
- On native Windows, _commit() is used to replace fsync(). Directory
syncing isn't done and shouldn't be needed. (In Cygwin, fsync() on
directories is a no-op.)
- DJGPP has fsync() for files. ;-)
Using fsync() was considered somewhere around 2009 and again in 2016 but
those times the idea was rejected. For comparison, GNU gzip 1.7 (2016)
added the option --synchronous which enables fsync().
Co-authored-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Fixes: https://bugs.debian.org/814089
Link: https://www.mail-archive.com/xz-devel@tukaani.org/msg00282.html
Closes: https://github.com/tukaani-project/xz/pull/151
lzma_str_to_filters() may call parse_lzma12_preset() in two ways. The
call from str_to_filters() detects the string type from the first
character(s) and as a side-effect it validates the first digit of
the preset string. So this change makes no difference there.
However, the call from parse_options() doesn't pre-validate the string.
parse_lzma12_preset() will return an invalid value which is passed to
lzma_lzma_preset() which safely rejects it. The bug still affects the
the error message:
$ xz --filters=lzma2:preset=X
xz: Error in --filters=FILTERS option:
xz: lzma2:preset=X
xz: ^
xz: Unsupported preset
After the fix:
$ xz --filters=lzma2:preset=X
xz: Error in --filters=FILTERS option:
xz: lzma2:preset=X
xz: ^
xz: Unsupported preset
The ^ now correctly points to the X and not past it because the X itself
is the problematic character.
Fixes: cedeeca2ea
Forgetting the argument (or not using = to separate the option from
the argument) resulted in lzma_str_to_filters() being called with NULL
as input string argument. The function handles it fine but xz passes
the NULL to printf() too:
$ xz --filters
xz: Error in --filters=FILTERS option:
xz: (null)
xz: ^
xz: Unexpected NULL pointer argument(s) to lzma_str_to_filters()
Now it's correct:
$ xz --filters
xz: option '--filters' requires an argument
The --filters-help option doesn't take any arguments.
Fixes: 9ded880a02
Fixes: d6af7f3470
Fixes: a165d7df19
It's a POSIX feature that isn't in standard C. It's not available on
Windows. Even MinGW-w64 with __USE_MINGW_ANSI_STDIO doesn't support
it even though it supports POSIX %'d for thousand separators.
Gettext's <libintl.h> provides overrides for printf and other functions
which do support the %2$s formats. Translations use them. But xz should
work on Windows without <libintl.h> too.
Fixes: 3e9177fd20
A slightly silly thing is that xz may now query the ABI version up to
three times. We could call my_landlock_ruleset_attr_forbid_all() only
once and cache the result but it didn't seem worth doing.
Now that we have the FALLTHROUGH macro, use the strictest mode with
GCC so that comment-based fallthrough markings are no longer accepted.
In GCC, -Wextra includes -Wimplicit-fallthrough=3 and
-Wimplicit-fallthrough is the same as -Wimplicit-fallthrough=3.
Thus, the strict mode requires specifying -Wimplicit-fallthrough=5.
Clang has -Wimplicit-fallthrough which is *not* enabled by -Wextra.
Clang doesn't have a variant that takes an argument. Thus we need
to check for -Wimplicit-fallthrough. Do it before checking for
-Wimplicit-fallthrough=5 so that the latter overrides the former
when using GCC.
Also remove the recently-added workaround from tuklib_gettext.h.
Requiring a new enough gettext-runtime is cleaner. I guess it's
mostly MSYS2 where xz is built with translation support, so once
MSYS2 has Gettext >= 0.23.1, this requirement shouldn't be a problem
in practice.
The DESCRIPTION section always explained it, and the OPTIONS section
only described the differences to the default behavior. However, new
users in a hurry may skip reading DESCRIPTION. The default behavior
is a bit dangerous, thus it's good to repeat in --compress and
--decompress docs that source file is removed after successful operation.
Fixes: https://github.com/tukaani-project/xz/issues/150
Because this increases the Mach-O compatibility_version, this commit
shouldn't cause any ABI compatibility trouble for existing CMake users
on macOS. This is assuming that they won't later downgrade to an older
liblzma version that was built with CMake before this commit.
Meson allows customising the Mach-O versioning too. So the three
build systems can be configured to be compatible.
liblzma and xz can't be compiled as a unity/jumbo build because of
redeclarations and type name reuse. The CMake documentation recommends
setting UNITY_BUILD to false in this case.
This is especially important if we're compiled as a subproject and the
consumer wants to use CMAKE_UNITY_BUILD=ON for the rest of their code
base.
Closes: https://github.com/tukaani-project/xz/pull/158
See the comment. In this package, locale is set at program startup and
not changed later, so the point (2) in the comment isn't a problem.
Fixes: 46ee006162
xzdec isn't translated and doesn't need libintl on Windows even
when NLS is enabled, thus libintl_setlocale() cannot interfere
with the locale settings. Thus, standard setlocale() works perfectly.
In the commit 78868b6e, the explanation in the commit message is wrong.
Fixes: 78868b6ed6
Only leave the FindFileFirstA() notes from 20dfca81, reverting
the incorrect setlocale() notes. On Windows, Gettext's <libintl.h>
overrides setlocale() with libintl_setlocale() wrapper. I hadn't
noticed this, and thus my conclusions were wrong.
Fixes: 20dfca8171
Call tuklib_mask_nonprint() on filenames and also on a few other
strings from the command line too.
The filename printed by "xz --robot --list" (in list.c) is also masked.
It's good to get rid of tabs and newlines which would desync the output
but masking other chars wouldn't be strictly necessary. It might matter
with sensible filenames if LC_CTYPE is "C" (when iswprint() might reject
non-ASCII chars) and a script wants to read a filename from xz's output.
Hopefully it's an unusual enough corner case to not be a real problem.
Malicious filenames or other untrusted strings may affect the state of
the terminal when such strings are printed as part of (error) messages.
Add functions that mask such characters.
It's not enough to handle only single-byte control characters.
In multibyte locales, some control characters are multibyte too, for
example, terminals interpret C1 control characters (U+0080 to U+009F)
that are two bytes as UTF-8.
Instead of checking for control characters with iswcntrl(), this
uses iswprint() to detect printable characters. This is much stricter.
On Windows it's actually too strict as it rejects some characters that
definitely are printable.
Gnulib's quotearg would do a lot more but I hope this simpler method
is good enough here.
Thanks to Ryan Colyer for the discussion about the problems of
the earlier single-byte-only method.
Thanks to Christian Weisgerber for reporting a bug in an earlier
version of this code.
Thanks to Jeroen Roovers for a typo fix.
Closes: https://github.com/tukaani-project/xz/pull/118
Most of the auto-wrapped strings are translated already. A few
strings have changed since this was created though. This file
isn't in the Translation Project *yet* because these strings
are still very new.
Closes: https://github.com/tukaani-project/xz/pull/145
Automatic word wrapping makes translators' work easier and reduces
errors like misaligned columns or overlong lines. Right-to-left
languages and languages that don't use spaces between words will
still need extra effort. (xz hasn't been translated to any RTL
language so far.)
If wcwidth() isn't available (Windows), previously it was assumed
that one byte == one column in the terminal. Now it is assumed that
one multibyte character == one column. This works better with UTF-8.
Languages that only use single-width characters without any combining
characters should work correctly with this.
In xz, none of po/*.po contain combining characters and only ko.po,
zh_CN.po, and zh_TW.po contain fullwidth characters. Thus, "only"
those three translations in xz are broken on Windows with the
UTF-8 code page. Broken means that column headings in xz -lvv and
(only in the master branch) strings in --long-help are misaligned,
so it's not a huge problem. I don't know if those three languages
displayed perfectly before the UTF-8 change because I hadn't tested
translations with native Windows builds before.
Fixes: 46ee006162
xzdec isn't translated and didn't have locale-specific behavior
in the past. On Windows with UTF-8 in the application manifest,
setting the locale makes a difference though:
- Without any setlocale() call, non-ASCII filenames don't display
properly in Command Prompt unless one first uses "chcp 65001"
to set the console code page to UTF-8.
- setlocale(LC_ALL, "") is enough to make non-ASCII filenames
print correctly in Command Prompt without using "chcp 65001",
assuming that the non-UTF-8 code page (like 850) supports
those non-ASCII characters.
- setlocale(LC_ALL, ".UTF8") is even better because then mbrtowc() and
such functions use an UTF-8 locale instead of a legacy code page.
The tuklib_gettext_setlocale() macro takes care of this (without
enabling any translations).
Fixes: 46ee006162
XZ Utils 5.6.3 set the active code page to UTF-8 to fix CVE-2024-47611.
This wasn't paired with UCRT-specific setlocale(LC_ALL, ".UTF8"), thus
non-ASCII characters from translations became mojibake.
Fixes: 46ee006162
With CMake 3.31, there were a few warnings from
CMP0177 "install() DESTINATION paths are normalized".
These occurred because the install(FILES) command in
my_install_man_lang() is called with a DESTINATION path
that contains two consecutive slashes, for example,
"share/man//man1". Such a path is for the English man pages.
With translated man pages, the language code goes between
the slashes. The warning was probably triggered because the
extra slash gets removed by the normalization.
The release files are signed but verifying the signatures cannot
catch certain types of attacks:
1. A malicious maintainer could make more than one variant of
a package. One could be for general distribution. Another
with malicious content could be targeted to specific users,
for example, distributing the malicious version on a mirror
controlled by the attacker.
2. If the signing key of an honest maintainer was compromised
without being detected, a similar situation as described
above could occur.
SHA256SUMS could be put on the project website but having it in
the Git repository makes it obvious that old lines aren't modified
when the file is updated.
Hashes of uncompressed files are included too. This way tarballs
can be recompressed and the hashes can still be verified.
One of the reasons to have this file in the xz repository was to
show vulnerability reporting info in the Security section on GitHub.
On 2024-11-25, I added SECURITY.md to the tukaani-project organization
on GitHub:
https://github.com/tukaani-project/.github/blob/main/SECURITY.md
GitHub shows that file in all projects in the organization unless
overridden by a project-specific SECURITY.md. Thus, removing
the file from the xz repo makes GitHub show the organization-wide
text instead.
Maintaining a single copy for the whole GitHub organization makes
things simpler. It's also nicer to have fewer GitHub-specific files
in the xz repo. Information how to report bugs (including security
issues) is available in README and on the home page too.
The OpenSSF Scorecard tool didn't find .github/SECURITY.md from the
xz repository. There was a suggestion to move the file to the top-level
directory where Scorecard should find it. However, Scorecard does find
the organization-wide SECURITY.md. Thus, the file isn't needed in the
xz repository to score points in the Scorecard game:
https://scorecard.dev/viewer/?uri=github.com/tukaani-project/xz
Closes: https://github.com/tukaani-project/xz/issues/148
Closes: https://github.com/tukaani-project/xz/pull/149
IMPORTANT: This includes a security fix to command line tool
argument handling.
Some toolchains embed an application manifest by default to declare
UAC-compliance. Some also declare compatibility with Vista/8/8.1/10/11
to let the app access features newer than those of Vista.
We want all the above but also two more things:
- Declare that the app is long path aware to support paths longer
than 259 characters (this may also require a registry change).
- Force the code page to UTF-8. This allows the command line tools
to access files whose names contain characters that don't exist
in the current legacy code page (except unpaired surrogates).
The UTF-8 code page also fixes security issues in command line
argument handling which can be exploited with malicious filenames.
See the new file w32_application.manifest.comments.txt.
Thanks to Orange Tsai and splitline from DEVCORE Research Team
for discovering this issue.
Thanks to Vijay Sarvepalli for reporting the issue to me.
Thanks to Kelvin Lee for testing with MSVC and helping with
the required build system fixes.
Now the information in the "Details" tab in the file properties
dialog matches the naming convention of Cygwin and MSYS2. This
is only a cosmetic change.
Autotools-based build has always done this so this is for consistency.
However, the CMake build won't create the DEF file when building
for Cygwin or MSYS2 because in that context it should be useless.
(If Cygwin or MSYS2 is used to host building of normal Windows
binaries then the DEF file is still created.)
The MB output can overflow with huge numbers. Most likely these are
invalid .lzma files anyway, but let's avoid garbage output.
lzmadec was adapted from LZMA Utils. The original code with this bug
was written in 2005, over 19 years ago.
Co-authored-by: Lasse Collin <lasse.collin@tukaani.org>
Closes: https://github.com/tukaani-project/xz/pull/144
"xzdec -M123" exited with exit status 1 without printing
any messages. The "M:" entry should have been removed when
the memory usage limiter support was removed from xzdec.
Fixes: 792331bdee
Closes: https://github.com/tukaani-project/xz/pull/143
[ Lasse: Commit message edits ]
Differences to the zh_CN.po file from the Translation Project:
- Two uses of \v were fixed.
- Missing "OPTS" translation in --riscv[=OPTS] was copied from
previous lines.
- "make update-po" was run to remove line numbers from comments.
Differences to the ca.po file from the Translation Project:
- An overlong line translating --filters-help was wrapped.
- "make update-po" was used to remove line numbers from the comments
to match the changes in fccebe2b4f
and 093490b582. xz.pot in the TP
is older than these commits.
Support for instruction "movzw" without suffix in "GNU as" was
added in commit [1] and stabilized in binutils 2.27, released
in August 2016. Earlier systems don't accept this instruction
without a suffix, making range_decoder.h's inline assembly
unable to build on old systems such as Ubuntu 16.04, creating
error messages like:
lzma_decoder.c: Assembler messages:
lzma_decoder.c:371: Error: no such instruction: `movzw 2(%r11),%esi'
lzma_decoder.c:373: Error: no such instruction: `movzw 4(%r11),%edi'
lzma_decoder.c:388: Error: no such instruction: `movzw 6(%r11),%edx'
lzma_decoder.c:398: Error: no such instruction: `movzw (%r11,%r14,4),%esi'
Change "movzw" to "movzwl" for compatibility.
[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=c07315e0c610e0e3317b4c02266f81793df253d2
Suggested-by: Lasse Collin <lasse.collin@tukaani.org>
Tested-by: Yifeng Li <tomli@tomli.me>
Signed-off-by: Yifeng Li <tomli@tomli.me>
Fixes: 3182a330c1
Fixes: https://github.com/tukaani-project/xz/issues/121
Closes: https://github.com/tukaani-project/xz/pull/136
This reverts commit dc03f6290f.
OpenBSD 7.6 will support elf_aux_info(3), and the detection code used
on FreeBSD will work on OpenBSD 7.6 too. Keep things simpler and drop
the OpenBSD-specific sysctl() method.
Thanks to Christian Weisgerber.
It won't be implemented. find + xargs is more flexible, for example,
it allows compressing small files in parallel. An example for that
has been included in the xz man page since 2010.
The liblzma target was recently changed to link against Threads::Threads
with the PRIVATE keyword. I had forgotten that xz itself depends on
pthreads too due to pthread_sigmask(). Thus, the build broke when
building shared liblzma and pthread_sigmask() wasn't in libc.
Thanks to Peter Seiderer for the bug report.
Fixes: ac05f1b0d7
Fixes: https://github.com/tukaani-project/xz/issues/129#issuecomment-2204522994
While po/*.gmo files won't be used from the release tarball,
the generated translated man pages will be used still. Those
are text files and po4a has slightly more dependencies than
gettext tools so installing po4a might be a bit more challenging
in some situations.
When a release tarball is created using Autotools, the tarball includes
po/*.gmo files which are binary files generated from po/*.po. Other
tarball creation methods don't and won't create the .gmo files.
It feels clearer if CMake will never install pre-generated binary files
from the source package. If people are able to install CMake, they
likely are able to install gettext tools as well (assuming they want
translations).
If a user set XZ_NLS=ON but find_package(Intl) failed or CMake version
wasn't at least 3.20, the configuration would fail in a cryptic way.
If XZ_NLS is enabled, require that CMake is new enough and that either
gettext tools or pre-generated .gmo files are available. Otherwise fail
the configuration. Previously missing gettext tools and .gmo files would
only result in a warning.
Missing man page translations are still only a warning.
Thanks to Peter Seiderer for the bug report.
Fixes: https://github.com/tukaani-project/xz/issues/129
Closes: https://github.com/tukaani-project/xz/pull/130
The CMake variables were renamed and accidentally also
the compile definition was renamed. As a result, translation
support wasn't actually enabled in the executables.
Fixes: 29f77c7b70
The crc.w.{b/h/w/d}.w instructions in LoongArch can calculate the CRC32
result for 1/2/4/8 bytes in a single operation. Using these is much
faster compared to the generic method.
Optimized CRC32 is enabled unconditionally on 64-bit LoongArch because
the LoongArch specification says that CRC32 instructions shall be
implemented for 64-bit processors. Optimized CRC32 isn't enabled for
32-bit LoongArch processors because not enough information is available
about them.
Co-authored-by: Lasse Collin <lasse.collin@tukaani.org>
Closes: https://github.com/tukaani-project/xz/pull/86
This addresses the issue I mentioned in
6c095a98fb and speeds up the Valgrind
job a bit, because non-xz tools aren't run unnecessarily with
Valgrind by the script tests.
It shouldn't make any difference because LIBS should be empty
at that point in configure. But prepending is the correct way
because in general the libraries being added might require other
libraries that come later on the command line.
It's more robust in case the compiler allows pre-C99 implicit function
declarations. If an x86 intrinsic is missing and gets treated as
implicit function, the linking step will very probably fail. This
isn't the only way to workaround implicit function declarations but
it might be the simplest and cleanest.
The problem hasn't been observed in the wild.
There are a couple more AC_COMPILE_IFELSE uses in configure.ac.
Of these, Landlock check calls prctl() and in theory could have
the same problem. In practice it doesn't as the check program
looks for several other things too. However, it was changed to
AC_LINK_IFELSE still to look more correct.
Similarly, m4/tuklib_cpucores.m4 and m4/tuklib_physmem.m4 were
updated although they haven't given any trouble either. They
have worked all these years because those check programs rely
on specific headers and types: if headers or types are missing,
compilation will fail. Using the linker makes these checks more
similar to the ones in cmake/tuklib_*.cmake which always link.
AC_COMPILE_IFELSE needed -Werror because Clang <= 14 would merely
warn about the unsupported attribute and implicit function declaration.
Changing to AC_LINK_IFELSE handles the implicit declaration because
the symbol __crc32d is unlikely to exist in libc.
Note that the other part of the check is that #include <arm_acle.h>
must work. If the header is missing, most compilers give an error
and the linking step won't be attempted.
Avoiding -Werror makes the check more robust in case CFLAGS contains
warning flags that break -Werror anyway (but this isn't the only check
in configure.ac that has this problem). Using AC_LINK_IFELSE also makes
the check more similar to how it is done in CMakeLists.txt.
While the CMake support has gotten a lot less testing than
the Autotools-based build, the supported features should now
be equal. The output may differ slightly, for example,
liblzma.pc may have
Libs.private: -pthread -lpthread
with Autotools on GNU/Linux. CMake doesn't put any options
in Libs.private because on modern glibc the pthread functions
are in libc. The options options aren't required to link static
liblzma into an application.
Autotools-based build doesn't generate or install
lib/cmake/liblzma-*.cmake files. This means that on most
platforms one cannot rely on
find_package(liblzma 5.2.5 REQUIRED CONFIG)
or such finding those files.
It was weird to add CMAKE_THREAD_LIBS_INIT in CMAKE_REQUIRED_LIBRARIES
only if CLOCK_MONOTONIC is available. Alternative would be to remove
the thread libs from CMAKE_REQUIRED_LIBRARIES after the check for
pthread_condattr_setclock() but keeping the libs should be fine too.
Then it's ready in case more pthread functions were wanted some day.
In CMake, check_c_source_compiles() always links too. With
link-time optimization, unused functions may get omitted if
main() doesn't depend on them. Consider the following which
tries to check if somefunction() is available when <someheader.h>
has been included:
#include <someheader.h>
int foo(void) { return somefunction(); }
int main(void) { return 0; }
LTO may omit foo() completely because the program as a whole doesn't
need it and then the program will link even if the symbol somefunction
isn't available in libc or other library being linked in, and then
the test may pass when it shouldn't.
What happens if <someheader.h> doesn't declare somefunction()?
Shouldn't the test fail in the compilation phase already? It should
but many compilers don't follow the C99 and later standards that
prohibit implicit function declarations. Instead such compilers
assume that somefunction() exists, compilation succeeds (with a
warning), and then linker with LTO omits the call to somefunction().
Change the tests so that they are part of main(). If compiler accepts
implicitly declared functions, LTO cannot omit them because it has to
assume that they might have side effects and thus linking will fail.
On the other hand, if the functions/intrinsics being used are supported,
they might get optimized away but in that case it's fine because they
really are supported.
It is fine to use __attribute__((target(...))) for main(). At least
it works with GCC 4.9 to 14.1 on x86-64.
Reported-by: Sam James <sam@gentoo.org>
CC from environment is used to initialize CMAKE_C_COMPILER so
setting CMAKE_C_COMPILER explicitly isn't needed.
The syntax in ci_build.bash was broken in case one wished to put
spaces in CC.
Prefix ARM64_RUNTIME_DETECTION with CRC_ and reorder it to be with
the other ARM64-specific lines. That macro isn't used outside this
file.
ARM64 CLMUL implementation doesn't exist yet and thus CRC64_ARM64_CLMUL
isn't used anywhere yet.
It's not ideal that the single-letter CRC utility macros are here
as they pollute the namespace of the LZ encoder files. Those could
be moved their own crc_macros.h like they were in 5.2.x but in practice
this is fine enough already.
LZ encoder needs lzma_crc32_table[0] but otherwise those tables
are private to the CRC code. In contrast, the other things in
check.h are needed in several places.
Now runtime detection of CLMUL support can pick between the CLMUL and
the generic assembly implementations. Whatever overhead this has for
builds that omit CLMUL completely isn't important because builds for
any non-ancient system is likely to include the CLMUL code too.
Handle the CRC tables in crcXX_fast.c files because now these files
are built even when assembly code is used.
If 32-bit x86 assembly is enabled then it will always be built even
if compiler flags were such that CLMUL would be allowed unconditionally.
That is, runtime detection will be used anyway. This keeps the build
rules simpler.
In LZ encoder, build and use lzma_lz_hash_table[256] if CLMUL CRC
is used without runtime detection. Previously this wasn't needed
because crc32_table.c included the lzma_crc32_table[][] in the build
unless encoder support had been disabled. Including an 8 KiB table
was silly when only 1 KiB is actually used. So now liblzma is 7 KiB
smaller if CLMUL is enabled without runtime detection.
It feels clearer this way, and when support for external SHA-256
is added, this will keep the order of the library detection the
same as in configure.ac (check for pthreads before libmd) although
it shouldn't matter in practice.
This simplifies things a little. Building liblzma with VS2013 probably
still worked but building the command line tools was not supported.
Microsoft ended support for VS2013 on 2024-04.
Make the available options and their behavior match
--enable-symbol-versions in configure.ac.
Don't enable symbol versions on Linux if not using glibc. Previously
the generic variant was selected on Microblaze or if using NVHPC
without checking that libc is glibc.
Leave the cache variable to "auto" or "yes" if that was specified
instead of setting it to the autodetected value by default. A downside
is that one cannot easily see which variant the autodetection code
has selected. The same applies to XZ_SANDBOX and XZ_THREADS though.
Also clarify that "yes" will fail if no threading support is found.
If no threading is wanted, it has to be disabled manually.
configure.ac doesn't behave this way at the moment. Instead it
assumes pthreads to be present if not targeting Windows. If pthreads
actually are missing, the build fails later.
The list was copied from configure.ac and should be kept in sync.
(Pretend that the deleted comment in CMakeLists.txt didn't exist.)
There is no need to add equivalent of --enable-werror as CMake >= 3.24
supports -DCMAKE_COMPILE_WARNING_AS_ERROR=ON.
This is for consistency with 4c81c9611f
where \040 has to be used because \0x20F gets interpret at three hex
digits. Octals escapes are never longer than three digits.
Now "crc32" is in the list too for completeness but it doesn't
actually have any effect. The description of the cache variable
says that "crc32 is always built" so it should be clear enough.
Update the description too.
It affects creation of not only the legacy lzma, unlzma, lzcat symlinks
but also lzgrep and other legacy names for the scripts. The last
LZMA Utils release was made in 2008 but these names are still used
in some places to handle .lzma files.
Also update the description to mention that this affects installation
of translated man pages too.
Prefixing the cache variables with the project name helps if
the package is used as a subproject in another package.
It also makes the package-specific options group more nicely
in ccmake and cmake-gui.
This way pthread options aren't passed to the linker when linking
against shared liblzma but they are still passed when linking against
static liblzma. (Also, one never needs the include path of the
threading library to use liblzma since liblzma's API headers
don't #include <pthread.h>. But <pthread.h> tends to be in the
default include path so here this change makes no difference.)
One cannot mix target_link_libraries() calls that use the scope
(PRIVATE, PUBLIC, or INTERFACE) keyword and calls that don't use it.
The calls without the keyword are like PUBLIC except perhaps when
they aren't, or something like that... It seems best to always
specify a scope keyword as the meanings of those three keywords
at least are clear.
This shouldn't make much difference in practice as on Windows
no flags are needed anyway and unitialized variable (when threading
is disabled) expands to empty. But it's clearer this way.
Now liblzma.pc can be relocatable only if using CMake >= 3.20
but that should be OK as now we shouldn't get broken liblzma.pc
if CMAKE_INSTALL_LIBDIR or CMAKE_INSTALL_INCLUDEDIR contain an
absolute path.
Thanks to Eli Schwartz.
This reverts commit 5d1c649ba9.
While CMAKE_INSTALL_<dir> tend to be relative paths, they don't need
to be. Thus the commit was broken. A fancier method is required.
Thanks to Eli Schwartz for the bug report and explanation.
On E2K the function compiles only due to compiler emulation but the
function is never used. It's cleaner to omit the function when it's
not needed even though it's a "static inline" function.
Thanks to Ilya Kurdyukov.
It's faster with both tiny and large buffers and doesn't require
disabling any sanitizers. With large buffers the extra speed is
from folding four 16-byte chunks in parallel.
The 32-bit x86 with MSVC reportedly still needs a workaround.
Now the simpler "__asm mov ebx, ebx" trick is enough but it
needs to be in lzma_crc64() instead of crc64_arch_optimized().
Thanks to Iouri Kharon for testing and the fix.
Thanks to Ilya Kurdyukov for testing the speed with aligned and
unaligned buffers on a few x86 processors and on E2K v6.
Thanks to Sam James for general feedback.
Fixes: https://github.com/tukaani-project/xz/issues/112
Fixes: https://github.com/tukaani-project/xz/issues/122
Now it refers to crc_clmul_consts_gen.c. vfold8 was renamed to mu_p
and the p no longer has the lowest bit set (it makes no difference
as the output bits it affects are ignored).
It's not enough to silence the address sanitizer. Also memory and
thread sanitizers would need to be silenced. They, at least currently,
aren't smart enough to see that the extra bytes are discarded from
the xmm registers by later instructions.
Valgrind is smarter, possibly because this kind of code isn't weird
to write in assembly. Agner Fog's optimizing_assembly.pdf even mentions
this idea of doing an aligned read and then discarding the extra
bytes. The sanitizers don't instrument assembly code but Valgrind
checks all code.
It's better to change the implementation to avoid the sanitization
attributes which also look scary in the code. (Somehow they can look
more scary than __asm__ which is implictly unsanitized.)
See also:
https://github.com/tukaani-project/xz/issues/112https://github.com/tukaani-project/xz/issues/122
There is no need to make a similar change in configure.ac.
With Autoconf 2.72, the deprecated macro AC_PROG_CC_C99
is an alias for AC_PROG_CC which prefers a C11 compiler.
GCC 4.2 doesn't have __builtin_bswap16() and friends so tuklib_integer.h
tries to use OS-specific byte swap methods instead. On OpenBSD those
macros are swap16/32/64 instead of bswap16/32/64 like on other *BSDs
and Darwin.
An alternative to "#ifdef __OpenBSD__" could be "#ifdef swap16" as it
is a macro. But since OpenBSD seems to be a special case under this
special case of "*BSDs and Darwin", checking for __OpenBSD__ seems
the more conservative choice now.
Thanks to Christian Weisgerber and Brad Smith who both submitted
the same patch a few hours apart.
Co-authored-by: Christian Weisgerber <naddy@mips.inka.de>
Co-authored-by: Brad Smith <brad@comstyle.com>
Closes: https://github.com/tukaani-project/xz/pull/126
The C code is from Christian Weisgerber, I merely reordered the OSes.
Then I added the build system checks without testing them.
Also thanks to Brad Smith who submitted a similar patch on GitHub
a few hours after Christian had sent his via email.
Co-authored-by: Christian Weisgerber <naddy@mips.inka.de>
Closes: https://github.com/tukaani-project/xz/pull/125
This way the version string gets into xzgrep and other scripts
in full and also into liblzma.pc.
For the project() command, a suffixless string is required though.
liblzma_VERSION has never existed in the repository. xz_VERSION from
the project() command was used for liblzma SOVERSION so use xz_VERSION
here too.
The wrong variable did no harm in practice as PROJECT_VERSION
was used as the fallback. It has the same value as xz_VERSION.
Fixes: 7e3493d40e
This is a mess because liblzma DLL outside Cygwin and MSYS2
is liblzma.dll instead of lzma.dll to avoid a conflict with
lzma.dll from LZMA SDK.
On Cygwin the name was "liblzma-5.dll" while "cyglzma-5.dll"
would have been correct (and match what Libtool produces).
MSYS2 likely was broken too as it uses the "msys-" prefix.
This change has no effect with MinGW-w64 because with that
the "lib" prefix was correct already.
With MSVC builds this is a small breaking change that requires developers
to adjust the library name when linking against liblzma. The liblzma.dll
name is kept as is but the import library and static library are now
lzma.lib instead of liblzma.lib. This is helpful when using pkgconf
because "pkgconf --msvc-syntax --libs liblzma" outputs "lzma.lib"
(it's converted from "-llzma" in liblzma.pc). It would be easy to
keep the liblzma.lib naming but the pkgconf compatibility seems worth
it in the long run. The lzma.lib name is compatible with MinGW-w64
too as -llzma will find also lzma.lib.
vcpkg had been patching CMakeLists.txt this way since 2022 but I
learned this only recently. The reasoning for the patch makes sense,
and while this is a small breaking change with MSVC, it seems like
a decent compromise as it keeps the DLL name the same.
2022 patch in vcpkg: 0707a17ecf/ports/liblzma/win_output_name.patch
See the discussion: https://github.com/microsoft/vcpkg/pull/39024
Thanks to Vincent Torri for confirming the naming issue on Cygwin.
The ancient /bin/tr on Solaris doesn't support '\n'.
With /usr/xpg4/bin/tr it works but it might not be in PATH.
Another problem was that sed was given input that didn't have a newline
at the end. Text files must end with a newline to be portable.
Fix both problems:
- Handle multiline input within sed itself to avoid one tr invocation.
The default sed even on Solaris does understand \n.
- Use octals in tr -d. \012 works for ASCII "line feed", it's even
used as an example in the Solaris man page. But we must strip
also ASCII "carriage return" \015 and EBCDIC "next line" \025.
The EBCDIC case got handled with \n previously. Stripping \012
and \015 on EBCDIC system won't matter as those control chars
won't be present in the string in the first place.
An awk-based solution could be an alternative but it might need
special casing on Solaris to used nawk instead of awk. The changes
in this commit are smaller and should have a smaller risk for
regressions. It's also possible that version.sh will be dropped
entirely at some point.
There's no real value in doing it via commit for official GH actions. We
can keep using pinned commits for unofficial actions. It's hassle for no
gain.
Maybe going forward we can limit this further by only being paranoid
for the jobs with any access to tokens.
Solaris' GCC can't understand that our use is fine, unlike modern compilers:
```
list.c: In function 'print_totals_basic':
list.c:1191:4: error: format not a string literal, argument types not checked [-Werror=format-nonliteral]
uint64_to_str(totals.files, 0));
^~~~~~~~~~~~~
cc1: all warnings being treated as errors
```
It's presumably because of older gettext missing format attributes.
This is with `gcc (GCC) 7.3.0`.
In the past this wasn't done before releases; the Git repository
just contained the files from the Translation Project. But this
way it is clearer when comparing release tarballs against the
Git repository. In future releases this might no longer be necessary
within a stable branch as the .po files won't change so easily anymore
when creating a tarball.
When po/xz.pot doesn't exist, running "make" or "make dist" will
create it. Then the .po files will be updated but only if they
actually would change more than the POT-Creation-Date line.
Then the .gmo files would be generated from the .po files.
This is the case before and after this commit.
However, "make dist" and thus "make mydist" did a forced update
to the files, updating them even if the only change was the
POT-Creation-Date line. This had pros and cons: It made it clear
that the .po file really is in sync with the recent strings in
the package. On the other hand, it added noise in form of changed
files in the source tree and distribution tarballs. It can be
ignored with something like "diff -I'^"POT-Creation-Date: '" but
it's still a minor annoyance *if* there's not enough value in
having the most recent timestamp.
Setting DIST_DEPENDS_ON_UPDATE_PO = no means that such forced
update won't happen in "make dist" anymore. However, the "mydist"
target will use xz.pot-update target which is the same target that
is run when xz.pot doesn't exist at all yet. Thus "mydist" will
ensure that the translations are up to date, without noise from
changes that would affect only the POT-Creation-Date line.
Note that po4a always uses msgmerge with --update, so POT-Creation-Date
in the man page translations is never the only change in .po files.
In that sense this commit makes the message translations behave more
similarly to the man page translations.
Distribution tarballs will still have non-reproducible POT-Creation-Date
in po/xz.pot and po4a/xz-man.pot but those are just two files. Even they
could be made reproducible from a Git timestamp if desired.
The .po files from the Translation Project come with unwrapped
strings so this matches it.
This may reduce the noise in diffs too. When the beginning of
a paragraph had changed, the rest of the lines got rewrapped
in msgsid. Now it's just one very long line that changes when
a paragraph has been edited.
The --add-location=file option was removed as redundant. The line
numbers don't exist in the .pot file due to --porefs file and thus
they cannot get copied to the .po files either.
Names from git ls-files should be safe but if one runs it on
a tree without the .git dir and there are extra files, it's
safer to have the end of arguments marked with '--'.
This way the PO file diffs are less noisy but the locations of the
strings are still present at file level, just without line numbers.
The option is available since gettext 0.19 (2014).
configure.ac requires 0.19.6.
Since the source strings have changed, these would get marked as
fuzzy and the original string would be used instead. The original
and translated strings are identical in this case so it wouldn't
matter. But patching the translations helps still because then
po4a will show the correct translation percentage.
One has to pass -DENABLE_X86_ASM=ON to cmake to enable the
CRC assembly code. Autodetection isn't done. Looking at
CMAKE_SYSTEM_PROCESSOR might not work as it comes from uname
unless cross-compilation is done using a CMake toolchain file.
On top of this, if the code is run on modern processors that support
the CLMUL instruction, then the C code should be faster (but then
one should also be using a x86-64 build if possible).
In mydist the point is to check using the file list from the Git
repository. In dist-hook it is to check that the TARBALL_IGNORE
patterns work when the .git dir or the "git" command aren't available.
Refuse to create a distribution tarball if license issues are found.
This helps in spotting files that lack SPDX license identifier
and which haven't been explicitly white listed either. The script
requires the .git directory to be present as only the files that
are in the Git repository are checked.
XZ Utils isn't FSFE REUSE compliant for now.
The po4a directory is in EXTRA_DIST and thus all files there
are included in the package. .gitignore doesn't belong in the
package so keep that file out of the po4a directory.
Now the test scripts detect both
#define HAVE_DECODER_ARM
#define HAVE_DECODER_ARM 1
as support for the ARM filter without confusing it with these:
#define HAVE_DECODER_ARM64
#define HAVE_DECODER_ARM64 1
Previously only the ones ending with " 1" were accepted for
the macros where this kind of confusion was possible.
This should help with Meson support because Meson's built-in
features produce config.h entries that are either
#define FOO 1
#define FOO 0
or:
#define FOO
#undef FOO
The former method has a benefit that one can use "#if FOO" and -Wundef
will catch if a #define is missing (for example, it helps catching
typos). But XZ Utils has to use the latter since it has been
convenient with Autoconf's default behavior.[*] While it's easy to
emulate the Autoconf style (#define FOO 1 vs. no #define at all)
in Meson, it results in clumsy code. Thus it's better to change
the few places in the tests where this difference matters.
[*] While most checks in Autoconf default to the second style above,
a few things use the first style (like AC_CHECK_DECLS). The mix
of both styles is the most confusing as one has to remember which
macro needs #ifdef and which #if. Currently HAVE_VISIBILITY is
only such config.h entry that is 1 or 0. It comes unmodified
from Gnulib's visibility.m4.
Add a new optional argument to specify the directory of the xz and
xzdec executables.
If ../config.h doesn't exist, assume that all encoders and decoders
are available.
Add a new optional second argument: directory of the xz and xzdec
executables. This is need with the CMake build where the binaries
end up in the top-level build directory.
If ../config.h doesn't exist, assume that all encoders and decoders
are available. This will make this script usable from CMake in the
most common build configuration.
NOTE: Since the existence of ../config.h is checked, the working
directory of the test script must be a subdir in the build tree!
Otherwise ../config.h would look outside the build tree.
Use the default check type instead of forcing CRC32 or CRC64.
Now the script doesn't need to check if CRC64 is available.
This is a bit hacky since the scripts grep config.h to know which
features were built but the CMake build doesn't create config.h.
So instead those test scripts will be run only when all relevant
features have been enabled.
The code makes aligned 16-byte reads which may read up to 15 bytes
before the beginning or past the end of the buffer if the buffer
is misaligned. The unneeded bytes are then ignored. It cannot cross
page boundaries and thus cannot cause access violations.
This inherently trips address sanitizer which was already disabled
with __attribute__((__no_sanitize_address__)). However, it also
trips memory sanitizer if the extra bytes are uninitialized because
memory sanitizer doesn't see that those bytes then get ignored by
byte shuffling in the xmm registers.
The plan is to change the code so that all sanitizers pass but it's
not finished yet (performance shouldn't get worse) so as a temporary
measure to keep OSS Fuzz happy, the CLMUL CRC is now disabled even
though I think think the code is fine to use (and easy enough to review
the memory accesses in it too).
This is closer to what it was before the --filtersX support was added,
just extended to support for scaling all filter chains. The method
before this commit was an extended version of the original too but
it was done in a more complex way for no clear reason. In case of
an error, the complex version printed fewer informative messages
(a good thing) but it's not a sigificant benefit.
In the limit is too low even for single-threaded mode, the required
amount of memory is now reported like in 5.4.x instead of like in
5.5.1alpha - 5.6.1 which showed the original non-scaled usage. It
had been a FIXME in the old code but it's not clear what message
makes the most sense.
Fixes: 5f0c5a0438
The convention is that
lzma_filter filters[LZMA_FILTERS_MAX + 1];
contains the filters of a single filter chain.
It was so here as well before the commit
d6af7f3470.
It changes "filters" to a ten-element array of filter chains.
It's clearer to call this array-of-arrays "chains".
This also renames "filter_idx" to "chain_idx" which is used
as an index as in chains[chain_idx].
opt_mode == MODE_COMPRESS isn't possible when HAVE_ENCODERS isn't
defined. Thus, when *encoding*, the message about *decoder* memory
usage is possible to show only when both encoder and decoder have
been built.
Since the message is shown only at V_DEBUG, skip the memusage
calculation if verbosity level isn't high enough.
Fixes: 5f0c5a0438
Since 5.3.5beta, "xz --lzma2=mf=bt4,nice=2" works even though bt4 needs
at least nice=4. It is rounded up internally by liblzma when needed.
Fixes: 5cd9f0df78
These are temporary files that are needed only when running po4a.
The top-level Makefile.am puts the whole po4a directory into
distribution tarball (it's simpler) so deleting these temporary
files is needed to prevent them from getting into tarballs.
The latest stable is 3.3.0 and it's from 2014.
Don't mention the older versions in INSTALL.
3.3.0 ships with Clang already.
Testing with 3.4.0beta6 shows that tuklib_physmem
works too so omit comments about that from INSTALL.
Visibility warnigns weren't a problem either.
Thus it's enough to mention the need for --disable-threads
as configure doesn't autodetect the lack of pthreads.
This was added to xz in 02e3505991
but I forgot to do the same in xzdec.
The Landlock sandbox in xzdec could be stricter as now it's
active only for the last file being decompressed. In xz,
read-only sandbox is used for multi-file case. On the other hand,
xz doesn't go to the strictest mode when processing the last file
when more than one file was specified; xzdec does.
Clang 17 with -fsanitize=address,undefined:
src/liblzma/common/filter_common.c:366:8: runtime error:
call to function encoder_find through pointer to incorrect
function type 'const lzma_filter_coder *(*)(unsigned long)'
src/liblzma/common/filter_encoder.c:187: note:
encoder_find defined here
Use a wrapper function to get the correct type neatly.
This reduces the number of casts needed too.
This issue could be a problem with control flow integrity (CFI)
methods that check the function type on indirect function calls.
Fixes: 3b34851de1
It's undefined behavior. The result wasn't ever used as it occurred
in the last iteration of a loop.
Clang 17 with -fsanitize=address,undefined:
$ src/xz/xz --block-list=123
src/xz/args.c:164:12: runtime error: applying non-zero offset 1
to null pointer
Fixes: 88ccf47205
Co-authored-by: Sam James <sam@gentoo.org>
This is disabled by default to match the default in Autotools.
Use -DUSE_DOXYGEN=ON to enable Doxygen usage.
This uses the update-doxygen script, thus this is under if(UNIX)
although Doxygen itself can run on Windows too.
The stripping method worked well with Doxygen 1.8 and 1.9 but
it doesn't work with Doxygen 1.10 anymore. Since we won't ship
pre-generated liblzma API docs anymore, the extra bloat and
extra license info of the JavaScript files won't affect the
upstream source package anymore.
This moves the tests section as is from CMakeLists.txt into
tests/tests.cmake. CMakeLists.txt now includes tests/tests.cmake
if the latter file exists.
Now it's possible to delete the whole "tests" directory and
building with CMake will still work normally, just without
the tests. This way the tests are readily available for those
who want them, and those who won't run the tests anyway have
a straightforward way to ensure that nothing from the "tests"
directory can affect the build process.
These are very old but the exact test file isn't easy to reproduce
as it was compiled from a short C program (bcj_test.c) long ago.
These tests weren't very good anyway, just a little better than nothing.
liblzma guarantees that the product of the allocation size arguments
will fit in size_t.
Putting the pre-increment in the if-statement was clearly wrong
although in practice it didn't matter here as the function is
called only a couple of times.
If the arguments to lzma_index_decoder() or lzma_index_buffer_decode()
were such that LZMA_PROG_ERROR was returned, the lzma_index **i
argument wasn't touched even though the API docs say that *i = NULL
is done if an error occurs. This obviously won't be done even now
if i == NULL but otherwise it is best to do it due to the wording
in the API docs.
In practice this matters very little: The problem can occur only
if the functions are called with invalid arguments, that is,
the calling application must already have a bug.
The __builtin_bswapXX from GCC and Clang are preferred when
they are available. This can allow compilers to emit the x86 MOVBE
instruction instead of doing a load + byteswap as two instructions
(which would happen if the byteswapping is done in inline asm).
bswap16, bswap32, and bswap64 exist in system headers on *BSDs
and Darwin. #defining bswap16 on NetBSD results in a warning about
macro redefinition. It's safest to avoid this namespace conflict
completely.
No OS supported by tuklib_integer.h uses byteswapXX names and
a web search doesn't immediately find any obvious danger of
namespace conflicts. So let's try these still-pretty-short names
for the macros.
Thanks to Sam James for pointing out the compiler warning on
NetBSD 10.0.
The API docs clearly say that if error_pos isn't NULL then *error
is always set on any error. However, it wasn't touched if str == NULL
or filters == NULL or unsupported flags were specified.
Fixes: cedeeca2ea
It is logical why it cannot know for sure that the value has
to be at most 4 if it is less than 16.
The x86 filter is based on a very old LZMA SDK version. Newer
ones have quite a different implementation for the same filter.
Thanks to Sam James.
On macOS, we get:
```
signals.c: In function 'signals_init':
signals.c:76:17: error: conversion to 'sigset_t' {aka 'unsigned int'} from 'int' may change the sign of the result [-Werror=sign-conversion]
76 | sigaddset(&hooked_signals, sigs[i]);
| ^~~~~~~~~
signals.c:81:17: error: conversion to 'sigset_t' {aka 'unsigned int'} from 'int' may change the sign of the result [-Werror=sign-conversion]
81 | sigaddset(&hooked_signals, message_progress_sigs[i]);
| ^~~~~~~~~
signals.c:86:9: error: conversion to 'sigset_t' {aka 'unsigned int'} from 'int' may change the sign of the result [-Werror=sign-conversion]
86 | sigaddset(&hooked_signals, SIGTSTP);
| ^~~~~~~~~
```
We use `int` for `hooked_signals` but we can't just cast to whatever
`sigset_t` is because `sigset_t` is an opaque type. It's an unsigned int
on macOS. On macOS, `sigaddset` is implemented as a macro.
Just suppress -Wsign-conversion for `signals_init` for macOS given
there's no real nice way of fixing this.
A few lines were reordered, a few ARRAY_SIZE were changed to sizeof,
and a few uint32_t were changed to size_t. No real functional changes
were intended.
We need this for when we're passing sanitizer flags or -gdwarf-4 for Clang
with Valgrind. Just always start with -O2 if CFLAGS isn't set in the
environment and append what was passed on the command line.
Using `--trace-children=yes` has a trade-off here, as it makes
`test_scripts.sh` pretty slow when calling various non-xz utilities.
But I also feel like it's not useless to have Valgrind used there and it's
not easy to exclude Valgrind just for that one test...
I did consider using AX_VALGRIND_CHECK [0][1] but I couldn't get it working
immediately with some conditionally-built tests and I wondered if it was
worth spending time on at least while we're debating xz's future build
system situation.
[0] https://www.gnu.org/software/autoconf-archive/ax_valgrind_check.html
[1] https://tecnocode.co.uk/2014/12/23/automatically-valgrinding-code-with-ax_valgrind_check/
This is *NOT* done for security reasons even though the backdoor
relied on the ifunc code. Instead, the reason is that in this
project ifunc provides little benefits but it's quite a bit of
extra code to support it. The only case where ifunc *might* matter
for performance is if the CRC functions are used directly by an
application. In normal compression use it's completely irrelevant.
While the backdoor was inactive (and thus harmless) without inserting
a small trigger code into the build system when the source package was
created, it's good to remove this anyway:
- The executable payloads were embedded as binary blobs in
the test files. This was a blatant violation of the
Debian Free Software Guidelines.
- On machines that see lots bots poking at the SSH port, the backdoor
noticeably increased CPU load, resulting in degraded user experience
and thus overwhelmingly negative user feedback.
- The maintainer who added the backdoor has disappeared.
- Backdoors are bad for security.
This reverts the following without making any other changes:
6e636819 Tests: Update two test files.
a3a29bbd Tests: Test --single-stream can decompress bad-3-corrupt_lzma2.xz.
0b4ccc91 Tests: Update RISC-V test files.
8c9b8b20 liblzma: Fix typos in crc32_fast.c and crc64_fast.c.
82ecc538 liblzma: Fix false Valgrind error report with GCC.
cf44e4b7 Tests: Add a few test files.
3060e107 Tests: Use smaller dictionary size in RISC-V test files.
e2870db5 Tests: Add two RISC-V Filter test files.
The RISC-V test files also have real content that tests the filter
but the real content would fit into much smaller files. A generator
program would need to be available as well.
Thanks to Andres Freund for finding and reporting it and making
it public quickly so others could act without a delay.
See: https://www.openwall.com/lists/oss-security/2024/03/29/4
This does the previous commit with CMake.
AC_EGREP_CPP uses AC_REQUIRE so the outermost if-commands must
be changed to AS_IF to ensure that things wont break some day.
See 5a5bd7f871.
It doesn't support the __symver__ attribute or __asm__(".symver ...").
The generic symbol versioning can still be used since it only needs
linker support.
NVHPC compiler has several issues that make it impossible to
build liblzma:
- the compiler cannot handle unions that contain pointers that
are not the first members;
- the compiler cannot handle the assembler code in range_decoder.h
(LZMA_RANGE_DECODER_CONFIG has to be set to zero);
- the compiler fails to produce valid code for delta_decode if the
vectorization is enabled, which results in failed tests.
This introduces NVHPC-specific workarounds that address the issues.
There are cases when the users want to decide themselves whether
they want to have the generic (even on GNU/Linux) or the linux
(even if we do not recommend that) symbol versioning variant.
The former might be needed to circumvent compiler issues (i.e.
the compiler does not support all features that are required
for the linux versioning), the latter might help in overriding
the assumptions made in the configure script.
The original files were generated with random local to my machine.
To better reproduce these files in the future, a constant seed was used
to recreate these files.
With GCC and a certain combination of flags, Valgrind will falsely
trigger an invalid write. This appears to be due to the omission of
instructions to properly save, set up, and restore the frame pointer.
The IFUNC resolver is a leaf function since it only calls a function
that is inlined. So sometimes GCC omits the frame pointer instructions
in the resolver unless this optimization is explictly disabled.
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=2267598.
Using __attribute__((__no_profile_instrument_function__)) on the ifunc
resolver works around a bug in GCC -fprofile-generate:
it adds profiling code even to ifunc resolvers which can make
the ifunc resolver crash at program startup. This attribute
was not introduced until GCC 7 and Clang 13, so ifunc won't
be used with prior versions of these compilers.
This bug was brought to our attention by:
https://bugs.gentoo.org/925415
And was reported to upstream GCC by:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11411
Now that multi threaded encoding is the default, users do not need to
see a warning message everytime the number of threads is reduced. On
some machines, this could happen very often. It is not unreasonable for
users to need to set double verbose mode to see this kind of
information.
To see these warning messages -vv or --verbose --verbose must be passed
to set xz into the highest possible verbosity mode.
These warnings had caused automated testing frameworks to fail when they
expected no output to stderr.
Thanks to Sebastian Andrzej Siewior for reporting this and for the
initial version of the patch.
The previous Linux Landlock feature test assumed that having the
linux/landlock.h header file was enough. The new feature tests also
requires that prctl() and the required Landlock system calls are
supported.
XGETTEXT_OPTIONS = --add-location=file --no-wrap --keyword=_ --keyword=N_ '--keyword=W_:1,"This is word wrapped at spaces. The Unicode character U+00A0 works as a non-breaking space. Tab (\t) is interpret as a zero-width space (the tab itself is not displayed); U+200B is NOT supported. Manual word wrapping with \n is supported but requires care."'
# This is the copyright holder that gets inserted into the header of the
# $(DOMAIN).pot file. Set this to the copyright holder of the surrounding
@ -63,7 +63,7 @@ USE_MSGCTXT = no
# Useful options are in particular:
# --previous to keep previous msgids of translated messages,
# --quiet to reduce the verbosity.
MSGMERGE_OPTIONS = --no-wrap
MSGMERGE_OPTIONS = --add-location=file --no-wrap
# These options get passed to msginit.
# If you want to disable line wrapping when writing PO files, add
@ -84,4 +84,8 @@ PO_DEPENDS_ON_POT = yes
# regenerate PO files on "make dist". Possible values are "yes" and
# "no". Set this to no if the POT file and PO files are maintained
# externally.
DIST_DEPENDS_ON_UPDATE_PO = yes
#
# NOTE: The the custom "mydist" target in ../Makefile.am updates xz.pot.
# An updated xz.pot will cause the .po files to be updated too but
# only when updating would change more than the POT-Creation-Date line.