Pull perf fixes from Ingo Molnar:
"I'd like to apologize for this very late pull request: I was dithering
through the week whether to send the fixes, and then yesterday Jiri's
crash fix for a regression introduced in this cycle clearly marked
perf/urgent as 'must merge now'.
Most of the commits are tooling fixes, plus there's three kernel fixes
via four commits:
- race fix in the Intel PEBS code
- fix an AUX bug and roll back a previous attempt
- fix AMD family 17h generic HW cache-event perf counters
The largest diffstat contribution comes from the AMD fix - a new event
table is introduced, which is a fairly low risk change but has a large
linecount"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix race in intel_pmu_disable_event()
perf/x86/intel/pt: Remove software double buffering PMU capability
perf/ring_buffer: Fix AUX software double buffering
perf tools: Remove needless asm/unistd.h include fixing build in some places
tools arch uapi: Copy missing unistd.h headers for arc, hexagon and riscv
tools build: Add -ldl to the disassembler-four-args feature test
perf cs-etm: Always allocate memory for cs_etm_queue::prev_packet
perf cs-etm: Don't check cs_etm_queue::prev_packet validity
perf report: Report OOM in status line in the GTK UI
perf bench numa: Add define for RUSAGE_THREAD if not present
tools lib traceevent: Change tag string for error
perf annotate: Fix build on 32 bit for BPF annotation
tools uapi x86: Sync vmx.h with the kernel
perf bpf: Return value with unlocking in perf_env__find_btf()
MAINTAINERS: Include vendor specific files under arch/*/events/*
perf/x86/amd: Update generic hardware cache events for Family 17h
Pull scheduler fix from Ingo Molnar:
"Fix a kobject memory leak in the cpufreq code"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cpufreq: Fix kobject memleak
Pull x86 fix from Ingo Molnar:
"Disable function tracing during early SME setup to fix a boot crash on
SME-enabled kernels running distro kernels (some of which have
function tracing enabled)"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/mem_encrypt: Disable all instrumentation for early SME setup
Pull vfs fixes from Al Viro:
- a couple of ->i_link use-after-free fixes
- regression fix for wrong errno on absent device name in mount(2)
(this cycle stuff)
- ancient UFS braino in large GID handling on Solaris UFS images (bogus
cut'n'paste from large UID handling; wrong field checked to decide
whether we should look at old (16bit) or new (32bit) field)
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour
Abort file_remove_privs() for non-reg. files
[fix] get rid of checking for absent device name in vfs_get_tree()
apparmorfs: fix use-after-free on symlink traversal
securityfs: fix use-after-free on symlink traversal
New race in x86_pmu_stop() was introduced by replacing the
atomic __test_and_clear_bit() of cpuc->active_mask by separate
test_bit() and __clear_bit() calls in the following commit:
3966c3feca ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
The race causes panic for PEBS events with enabled callchains:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
...
RIP: 0010:perf_prepare_sample+0x8c/0x530
Call Trace:
<NMI>
perf_event_output_forward+0x2a/0x80
__perf_event_overflow+0x51/0xe0
handle_pmi_common+0x19e/0x240
intel_pmu_handle_irq+0xad/0x170
perf_event_nmi_handler+0x2e/0x50
nmi_handle+0x69/0x110
default_do_nmi+0x3e/0x100
do_nmi+0x11a/0x180
end_repeat_nmi+0x16/0x1a
RIP: 0010:native_write_msr+0x6/0x20
...
</NMI>
intel_pmu_disable_event+0x98/0xf0
x86_pmu_stop+0x6e/0xb0
x86_pmu_del+0x46/0x140
event_sched_out.isra.97+0x7e/0x160
...
The event is configured to make samples from PEBS drain code,
but when it's disabled, we'll go through NMI path instead,
where data->callchain will not get allocated and we'll crash:
x86_pmu_stop
test_bit(hwc->idx, cpuc->active_mask)
intel_pmu_disable_event(event)
{
...
intel_pmu_pebs_disable(event);
...
EVENT OVERFLOW -> <NMI>
intel_pmu_handle_irq
handle_pmi_common
TEST PASSES -> test_bit(bit, cpuc->active_mask))
perf_event_overflow
perf_prepare_sample
{
...
if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
data->callchain = perf_callchain(event, regs);
CRASH -> size += data->callchain->nr;
}
</NMI>
...
x86_pmu_disable_event(event)
}
__clear_bit(hwc->idx, cpuc->active_mask);
Fixing this by disabling the event itself before setting
off the PEBS bit.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Arcari <darcari@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Lendacky Thomas <Thomas.Lendacky@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 3966c3feca ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
Link: http://lkml.kernel.org/r/20190504151556.31031-1-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull powerpc fix from Michael Ellerman:
"One regression fix.
Changes we merged to STRICT_KERNEL_RWX on 32-bit were causing crashes
under load on some machines depending on memory layout.
Thanks to Christophe Leroy"
* tag 'powerpc-5.1-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/32s: Fix BATs setting with CONFIG_STRICT_KERNEL_RWX
Pull KVM fixes from Paolo Bonzini:
- PPC and ARM bugfixes from submaintainers
- Fix old Windows versions on AMD (recent regression)
- Fix old Linux versions on processors without EPT
- Fixes for LAPIC timer optimizations
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
KVM: nVMX: Fix size checks in vmx_set_nested_state
KVM: selftests: make hyperv_cpuid test pass on AMD
KVM: lapic: Check for in-kernel LAPIC before deferencing apic pointer
KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size
x86/kvm/mmu: reset MMU context when 32-bit guest switches PAE
KVM: x86: Whitelist port 0x7e for pre-incrementing %rip
Documentation: kvm: fix dirty log ioctl arch lists
KVM: VMX: Move RSB stuffing to before the first RET after VM-Exit
KVM: arm/arm64: Don't emulate virtual timers on userspace ioctls
kvm: arm: Skip stage2 huge mappings for unaligned ipa backed by THP
KVM: arm/arm64: Ensure vcpu target is unset on reset failure
KVM: lapic: Convert guest TSC to host time domain if necessary
KVM: lapic: Allow user to disable adaptive tuning of timer advancement
KVM: lapic: Track lapic timer advance per vCPU
KVM: lapic: Disable timer advancement if adaptive tuning goes haywire
x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012
KVM: x86: Consider LAPIC TSC-Deadline timer expired if deadline too short
KVM: PPC: Book3S: Protect memslots while validating user address
KVM: PPC: Book3S HV: Perserve PSSCR FAKE_SUSPEND bit on guest exit
KVM: arm/arm64: vgic-v3: Retire pending interrupts on disabling LPIs
...
Pull i2c fixes from Wolfram Sang:
"I2C driver bugfixes and a MAINTAINERS update for you"
* 'i2c/for-current-fixed' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: Prevent runtime suspend of adapter when Host Notify is required
i2c: synquacer: fix enumeration of slave devices
MAINTAINERS: friendly takeover of i2c-gpio driver
i2c: designware: ratelimit 'transfer when suspended' errors
i2c: imx: correct the method of getting private data in notifier_call
Pull drm fix from Dave Airlie:
"Just a single qxl revert"
* tag 'drm-fixes-2019-05-03' of git://anongit.freedesktop.org/drm/drm:
Revert "drm/qxl: drop prime import/export callbacks"
Pull clk fixes from Stephen Boyd:
"Two fixes for the NKMP clks on Allwinner SoCs, a locking fix for
clkdev where we forgot to hold a lock while iterating a list that can
change, and finally a build fix that adds some stubs for clk APIs that
are used by devfreq drivers on platforms without the clk APIs"
* tag 'clk-fixes-for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: Add missing stubs for a few functions
clkdev: Hold clocks_mutex while iterating clocks list
clk: sunxi-ng: nkmp: Explain why zero width check is needed
clk: sunxi-ng: nkmp: Avoid GENMASK(-1, 0)
Pull sound fixes from Takashi Iwai:
"A few stable fixes at this round.
The USB Line6 audio fixes are a bit large, but they are rather trivial
and pretty much device-specific, so should be safe to apply at this
late stage. Ditto for other HD-audio quirks"
* tag 'sound-5.1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Apply the fixup for ASUS Q325UAR
ALSA: line6: use dynamic buffers
ALSA: hda/realtek - Fixed Dell AIO speaker noise
ALSA: hda/realtek - Add new Dell platform for headset mode
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
tools UAPI:
Arnaldo Carvalho de Melo:
- Sync x86's vmx.h with the kernel.
- Copy missing unistd.h headers for arc, hexagon and riscv, fixing
a reported build regression on the ARC 32-bit architecture.
perf bench numa:
Arnaldo Carvalho de Melo:
- Add define for RUSAGE_THREAD if not present, fixing the build on the
ARC architecture when only zlib and libnuma are present.
perf BPF:
Arnaldo Carvalho de Melo:
- The disassembler-four-args feature test needs -ldl on distros such as
Mageia 7.
Bo YU:
- Fix unlocking on success in perf_env__find_btf(), detected with
the coverity tool.
libtraceevent:
Leo Yan:
- Change misleading hard coded 'trace-cmd' string in error messages.
ARM hardware tracing:
Leo Yan:
- Always allocate memory for cs_etm_queue::prev_packet, fixing a segfault
when processing CoreSight perf data.
perf annotate:
Thadeu Lima de Souza Cascardo:
- Fix build on 32 bit for BPF.
perf report:
Thomas Richter:
- Report OOM in status line in the GTK UI.
core libs:
- Remove needless asm/unistd.h that, used with sys/syscall.h ended
up redefining the syscalls defines in environments such as the
ARC arch when using uClibc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since those were introduced in:
c8ce48f065 ("asm-generic: Make time32 syscall numbers optional")
But when the asm-generic/unistd.h was sync'ed with tools/ in:
1a787fc5ba ("tools headers uapi: Sync copy of asm-generic/unistd.h with the kernel sources")
I forgot to copy the files for the architectures that define
__ARCH_WANT_TIME32_SYSCALLS, so the perf build was breaking there, as
reported by Vineet Gupta for the ARC architecture.
After updating my ARC container to use the glibc based toolchain + cross
building libnuma, zlib and elfutils, I finally managed to reproduce the
problem and verify that this now is fixed and will not regress as will
be tested before each pull req sent upstream.
Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Olsa <jolsa@kernel.org>
CC: linux-snps-arc@lists.infradead.org
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/r/20190426193531.GC28586@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas Backlund reported that the perf build was failing on the Mageia 7
distro, that is because it uses:
cat /tmp/build/perf/feature/test-disassembler-four-args.make.output
/usr/bin/ld: /usr/lib64/libbfd.a(plugin.o): in function `try_load_plugin':
/home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:243:
undefined reference to `dlopen'
/usr/bin/ld:
/home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:271:
undefined reference to `dlsym'
/usr/bin/ld:
/home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:256:
undefined reference to `dlclose'
/usr/bin/ld:
/home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:246:
undefined reference to `dlerror'
as we allow dynamic linking and loading
Mageia 7 uses these linker flags:
$ rpm --eval %ldflags
-Wl,--as-needed -Wl,--no-undefined -Wl,-z,relro -Wl,-O1 -Wl,--build-id -Wl,--enable-new-dtags
So add -ldl to this feature LDFLAGS.
Reported-by: Thomas Backlund <tmb@mageia.org>
Tested-by: Thomas Backlund <tmb@mageia.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lkml.kernel.org/r/20190501173158.GC21436@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Robert Walker reported a segmentation fault is observed when process
CoreSight trace data; this issue can be easily reproduced by the command
'perf report --itrace=i1000i' for decoding tracing data.
If neither the 'b' flag (synthesize branches events) nor 'l' flag
(synthesize last branch entries) are specified to option '--itrace',
cs_etm_queue::prev_packet will not been initialised. After merging the
code to support exception packets and sample flags, there introduced a
number of uses of cs_etm_queue::prev_packet without checking whether it
is valid, for these cases any accessing to uninitialised prev_packet
will cause crash.
As cs_etm_queue::prev_packet is used more widely now and it's already
hard to follow which functions have been called in a context where the
validity of cs_etm_queue::prev_packet has been checked, this patch
always allocates memory for cs_etm_queue::prev_packet.
Reported-by: Robert Walker <robert.walker@arm.com>
Suggested-by: Robert Walker <robert.walker@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Robert Walker <robert.walker@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Fixes: 7100b12cf4 ("perf cs-etm: Generate branch sample for exception packet")
Fixes: 24fff5eb2b ("perf cs-etm: Avoid stale branch samples when flush packet")
Link: http://lkml.kernel.org/r/20190428083228.20246-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
An -ENOMEM error is not reported in the GTK GUI. Instead this error
message pops up on the screen:
[root@m35lp76 perf]# ./perf report -i perf.data.error68-1
Processing events... [974K/3M]
Error:failed to process sample
0xf4198 [0x8]: failed to process type: 68
However when I use the same perf.data file with --stdio it works:
[root@m35lp76 perf]# ./perf report -i perf.data.error68-1 --stdio \
| head -12
# Total Lost Samples: 0
#
# Samples: 76K of event 'cycles'
# Event count (approx.): 99056160000
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .........
#
8.81% find [kernel.kallsyms] [k] ftrace_likely_update
8.74% swapper [kernel.kallsyms] [k] ftrace_likely_update
8.34% sshd [kernel.kallsyms] [k] ftrace_likely_update
2.19% kworker/u512:1- [kernel.kallsyms] [k] ftrace_likely_update
The sample precentage is a bit low.....
The GUI always fails in the FINISHED_ROUND event (68) and does not
indicate the reason why.
When happened is the following. Perf report calls a lot of functions and
down deep when a FINISHED_ROUND event is processed, these functions are
called:
perf_session__process_event()
+ perf_session__process_user_event()
+ process_finished_round()
+ ordered_events__flush()
+ __ordered_events__flush()
+ do_flush()
+ ordered_events__deliver_event()
+ perf_session__deliver_event()
+ machine__deliver_event()
+ perf_evlist__deliver_event()
+ process_sample_event()
+ hist_entry_iter_add() --> only called in GUI case!!!
+ hist_iter__report__callback()
+ symbol__inc_addr_sample()
Now this functions runs out of memory and
returns -ENOMEM. This is reported all the way up
until function
perf_session__process_event() returns to its caller, where -ENOMEM is
changed to -EINVAL and processing stops:
if ((skip = perf_session__process_event(session, event, head)) < 0) {
pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
head, event->header.size, event->header.type);
err = -EINVAL;
goto out_err;
}
This occurred in the FINISHED_ROUND event when it has to process some
10000 entries and ran out of memory.
This patch indicates the root cause and displays it in the status line
of ther perf report GUI.
Output before (on GUI status line):
0xf4198 [0x8]: failed to process type: 68
Output after:
0xf4198 [0x8]: failed to process type: 68 [not enough memory]
Committer notes:
the 'skip' variable needs to be initialized to -EINVAL, so that when the
size is less than sizeof(struct perf_event_attr) we avoid this valid
compiler warning:
util/session.c: In function ‘perf_session__process_events’:
util/session.c:1936:7: error: ‘skip’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
err = skip;
~~~~^~~~~~
util/session.c:1874:6: note: ‘skip’ was declared here
s64 skip;
^~~~
cc1: all warnings being treated as errors
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20190423105303.61683-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
While cross building perf to the ARC architecture on a fedora 30 host,
we were failing with:
CC /tmp/build/perf/bench/numa.o
bench/numa.c: In function ‘worker_thread’:
bench/numa.c:1261:12: error: ‘RUSAGE_THREAD’ undeclared (first use in this function); did you mean ‘SIGEV_THREAD’?
getrusage(RUSAGE_THREAD, &rusage);
^~~~~~~~~~~~~
SIGEV_THREAD
bench/numa.c:1261:12: note: each undeclared identifier is reported only once for each function it appears in
[perfbuilder@60d5802468f6 perf]$ /arc_gnu_2019.03-rc1_prebuilt_uclibc_le_archs_linux_install/bin/arc-linux-gcc --version | head -1
arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
[perfbuilder@60d5802468f6 perf]$
Trying to reproduce a report by Vineet, I noticed that, with just
cross-built zlib and numactl libraries, I ended up with the above
failure.
So, since RUSAGE_THREAD is available as a define, check for that and
numactl libraries, I ended up with the above failure.
So, since RUSAGE_THREAD is available as a define in the system headers,
check if it is defined in the 'perf bench numa' sources and define it if
not.
Now it builds and I have to figure out if the problem reported by Vineet
only takes place if we have libelf or some other library available.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linux-snps-arc@lists.infradead.org
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Link: https://lkml.kernel.org/n/tip-2wb4r1gir9xrevbpq7qp0amk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The traceevent lib is used by the perf tool, and when executing
perf test -v 6
it outputs error log on the ARM64 platform:
running test 33 '*:*'trace-cmd: No such file or directory
[...]
trace-cmd: Invalid argument
The trace event parsing code originally came from trace-cmd so it keeps
the tag string "trace-cmd" for errors, this easily introduces the
impression that the perf tool launches trace-cmd command for trace event
parsing, but in fact the related parsing is accomplished by the
traceevent lib.
This patch changes the tag string to "libtraceevent" so that we can
avoid confusion and let users to more easily connect the error with
traceevent lib.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20190424013802.27569-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes from:
2b27924bb1 ("KVM: nVMX: always use early vmcs check when EPT is disabled")
That causes this object in the tools/perf build process to be rebuilt:
CC /tmp/build/perf/arch/x86/util/kvm-stat.o
But it isn't using VMX_ABORT_ prefixed constants, so no change in
behaviour.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h'
diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/n/tip-bjbo3zc0r8i8oa0udpvftya6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull networking fixes from David Miller:
1) Out of bounds access in xfrm IPSEC policy unlink, from Yue Haibing.
2) Missing length check for esp4 UDP encap, from Sabrina Dubroca.
3) Fix byte order of RX STBC access in mac80211, from Johannes Berg.
4) Inifnite loop in bpftool map create, from Alban Crequy.
5) Register mark fix in ebpf verifier after pkt/null checks, from Paul
Chaignon.
6) Properly use rcu_dereference_sk_user_data in L2TP code, from Eric
Dumazet.
7) Buffer overrun in marvell phy driver, from Andrew Lunn.
8) Several crash and statistics handling fixes to bnxt_en driver, from
Michael Chan and Vasundhara Volam.
9) Several fixes to the TLS layer from Jakub Kicinski (copying negative
amounts of data in reencrypt, reencrypt frag copying, blind nskb->sk
NULL deref, etc).
10) Several UDP GRO fixes, from Paolo Abeni and Eric Dumazet.
11) PID/UID checks on ipv6 flow labels are inverted, from Willem de
Bruijn.
12) Use after free in l2tp, from Eric Dumazet.
13) IPV6 route destroy races, also from Eric Dumazet.
14) SCTP state machine can erroneously run recursively, fix from Xin
Long.
15) Adjust AF_PACKET msg_name length checks, add padding bytes if
necessary. From Willem de Bruijn.
16) Preserve skb_iif, so that forwarded packets have consistent values
even if fragmentation is involved. From Shmulik Ladkani.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
udp: fix GRO packet of death
ipv6: A few fixes on dereferencing rt->from
rds: ib: force endiannes annotation
selftests: fib_rule_tests: print the result and return 1 if any tests failed
ipv4: ip_do_fragment: Preserve skb_iif during fragmentation
net/tls: avoid NULL pointer deref on nskb->sk in fallback
selftests: fib_rule_tests: Fix icmp proto with ipv6
packet: validate msg_namelen in send directly
packet: in recvmsg msg_name return at least sizeof sockaddr_ll
sctp: avoid running the sctp state machine recursively
stmmac: pci: Fix typo in IOT2000 comment
Documentation: fix netdev-FAQ.rst markup warning
ipv6: fix races in ip6_dst_destroy()
l2ip: fix possible use-after-free
appletalk: Set error code if register_snap_client failed
net: dsa: bcm_sf2: fix buffer overflow doing set_rxnfc
rxrpc: Fix net namespace cleanup
ipv6/flowlabel: wait rcu grace period before put_pid()
vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach
tcp: add sanity tests in tcp_add_backlog()
...
Pull io_uring fixes from Jens Axboe:
"This is mostly io_uring fixes/tweaks. Most of these were actually done
in time for the last -rc, but I wanted to ensure that everything
tested out great before including them. The code delta looks larger
than it really is, as it's mostly just comment additions/changes.
Outside of the comment additions/changes, this is mostly removal of
unnecessary barriers. In all, this pull request contains:
- Tweak to how we handle errors at submission time. We now post a
completion event if the error occurs on behalf of an sqe, instead
of returning it through the system call. If the error happens
outside of a specific sqe, we return the error through the system
call. This makes it nicer to use and makes the "normal" use case
behave the same as the offload cases. (me)
- Fix for a missing req reference drop from async context (me)
- If an sqe is submitted with RWF_NOWAIT, don't punt it to async
context. Return -EAGAIN directly, instead of using it as a hint to
do async punt. (Stefan)
- Fix notes on barriers (Stefan)
- Remove unnecessary barriers (Stefan)
- Fix potential double free of memory in setup error (Mark)
- Further improve sq poll CPU validation (Mark)
- Fix page allocation warning and leak on buffer registration error
(Mark)
- Fix iov_iter_type() for new no-ref flag (Ming)
- Fix a case where dio doesn't honor bio no-page-ref (Ming)"
* tag 'for-linus-20190502' of git://git.kernel.dk/linux-block:
io_uring: avoid page allocation warnings
iov_iter: fix iov_iter_type
block: fix handling for BIO_NO_PAGE_REF
io_uring: drop req submit reference always in async punt
io_uring: free allocated io_memory once
io_uring: fix SQPOLL cpu validation
io_uring: have submission side sqe errors post a cqe
io_uring: remove unnecessary barrier after unsetting IORING_SQ_NEED_WAKEUP
io_uring: remove unnecessary barrier after incrementing dropped counter
io_uring: remove unnecessary barrier before reading SQ tail
io_uring: remove unnecessary barrier after updating SQ head
io_uring: remove unnecessary barrier before reading cq head
io_uring: remove unnecessary barrier before wq_has_sleeper
io_uring: fix notes on barriers
io_uring: fix handling SQEs requesting NOWAIT
Multiple users have reported their Synaptics touchpad has stopped
working between v4.20.1 and v4.20.2 when using SMBus interface.
The culprit for this appeared to be commit c5eb119007 ("PCI / PM: Allow
runtime PM without callback functions") that fixed the runtime PM for
i2c-i801 SMBus adapter. Those Synaptics touchpad are using i2c-i801
for SMBus communication and testing showed they are able to get back
working by preventing the runtime suspend of adapter.
Normally when i2c-i801 SMBus adapter transmits with the client it resumes
before operation and autosuspends after.
However, if client requires SMBus Host Notify protocol, what those
Synaptics touchpads do, then the host adapter must not go to runtime
suspend since then it cannot process incoming SMBus Host Notify commands
the client may send.
Fix this by keeping I2C/SMBus adapter active in case client requires
Host Notify.
Reported-by: Keijo Vaara <ferdasyn@rocketmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203297
Fixes: c5eb119007 ("PCI / PM: Allow runtime PM without callback functions")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Keijo Vaara <ferdasyn@rocketmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
The I2C host driver for SynQuacer fails to populate the of_node and
ACPI companion fields of the struct i2c_adapter it instantiates,
resulting in enumeration of the subordinate I2C bus to fail.
Fixes: 0d676a6c43 ("i2c: add support for Socionext SynQuacer I2C controller")
Cc: <stable@vger.kernel.org> # v4.19+
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
I haven't heard from Haavard in years despite putting him to the CC list for
i2c-gpio related mails. Since I was doing the work on this driver for a while
now, let me take official maintainership, so it will be more clear to users.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Haavard Skinnemoen <hskinnemoen@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Add a new amd_hw_cache_event_ids_f17h assignment structure set
for AMD families 17h and above, since a lot has changed. Specifically:
L1 Data Cache
The data cache access counter remains the same on Family 17h.
For DC misses, PMCx041's definition changes with Family 17h,
so instead we use the L2 cache accesses from L1 data cache
misses counter (PMCx060,umask=0xc8).
For DC hardware prefetch events, Family 17h breaks compatibility
for PMCx067 "Data Prefetcher", so instead, we use PMCx05a "Hardware
Prefetch DC Fills."
L1 Instruction Cache
PMCs 0x80 and 0x81 (32-byte IC fetches and misses) are backward
compatible on Family 17h.
For prefetches, we remove the erroneous PMCx04B assignment which
counts how many software data cache prefetch load instructions were
dispatched.
LL - Last Level Cache
Removing PMCs 7D, 7E, and 7F assignments, as they do not exist
on Family 17h, where the last level cache is L3. L3 counters
can be accessed using the existing AMD Uncore driver.
Data TLB
On Intel machines, data TLB accesses ("dTLB-loads") are assigned
to counters that count load/store instructions retired. This
is inconsistent with instruction TLB accesses, where Intel
implementations report iTLB misses that hit in the STLB.
Ideally, dTLB-loads would count higher level dTLB misses that hit
in lower level TLBs, and dTLB-load-misses would report those
that also missed in those lower-level TLBs, therefore causing
a page table walk. That would be consistent with instruction
TLB operation, remove the redundancy between dTLB-loads and
L1-dcache-loads, and prevent perf from producing artificially
low percentage ratios, i.e. the "0.01%" below:
42,550,869 L1-dcache-loads
41,591,860 dTLB-loads
4,802 dTLB-load-misses # 0.01% of all dTLB cache hits
7,283,682 L1-dcache-stores
7,912,392 dTLB-stores
310 dTLB-store-misses
On AMD Families prior to 17h, the "Data Cache Accesses" counter is
used, which is slightly better than load/store instructions retired,
but still counts in terms of individual load/store operations
instead of TLB operations.
So, for AMD Families 17h and higher, this patch assigns "dTLB-loads"
to a counter for L1 dTLB misses that hit in the L2 dTLB, and
"dTLB-load-misses" to a counter for L1 DTLB misses that caused
L2 DTLB misses and therefore also caused page table walks. This
results in a much more accurate view of data TLB performance:
60,961,781 L1-dcache-loads
4,601 dTLB-loads
963 dTLB-load-misses # 20.93% of all dTLB cache hits
Note that for all AMD families, data loads and stores are combined
in a single accesses counter, so no 'L1-dcache-stores' are reported
separately, and stores are counted with loads in 'L1-dcache-loads'.
Also note that the "% of all dTLB cache hits" string is misleading
because (a) "dTLB cache": although TLBs can be considered caches for
page tables, in this context, it can be misinterpreted as data cache
hits because the figures are similar (at least on Intel), and (b) not
all those loads (technically accesses) technically "hit" at that
hardware level. "% of all dTLB accesses" would be more clear/accurate.
Instruction TLB
On Intel machines, 'iTLB-loads' measure iTLB misses that hit in the
STLB, and 'iTLB-load-misses' measure iTLB misses that also missed in
the STLB and completed a page table walk.
For AMD Family 17h and above, for 'iTLB-loads' we replace the
erroneous instruction cache fetches counter with PMCx084
"L1 ITLB Miss, L2 ITLB Hit".
For 'iTLB-load-misses' we still use PMCx085 "L1 ITLB Miss,
L2 ITLB Miss", but set a 0xff umask because without it the event
does not get counted.
Branch Predictor (BPU)
PMCs 0xc2 and 0xc3 continue to be valid across all AMD Families.
Node Level Events
Family 17h does not have a PMCx0e9 counter, and corresponding counters
have not been made available publicly, so for now, we mark them as
unsupported for Families 17h and above.
Reference:
"Open-Source Register Reference For AMD Family 17h Processors Models 00h-2Fh"
Released 7/17/2018, Publication #56255, Revision 3.03:
https://www.amd.com/system/files/TechDocs/56255_OSRR.pdf
[ mingo: tidied up the line breaks. ]
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Fixes: e40ed1542d ("perf/x86: Add perf support for AMD family-17h processors")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are two problems with dev_err() here. One: It is not ratelimited.
Two: We don't see which driver tried to transfer something with a
suspended adapter. Switch to dev_WARN_ONCE to fix both issues. Drawback
is that we don't see if multiple drivers are trying to transfer while
suspended. They need to be discovered one after the other now. This is
better than a high CPU load because a really broken driver might try to
resend endlessly.
Link: https://bugs.archlinux.org/task/62391
Fixes: 2751541555 ("i2c: designware: Do not allow i2c_dw_xfer() calls while suspended")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reported-by: skidnik <skidnik@gmail.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: skidnik <skidnik@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Pull PCI fixes from Bjorn Helgaas:
"I apologize for sending these so late in the cycle. We went back and
forth about how to deal with the unexpected logging of intentional
link state changes and finally decided to just config them off by
default.
PCI fixes:
- Stop ignoring "pci=disable_acs_redir" parameter (Logan Gunthorpe)
- Use shared MSI/MSI-X vector for Link Bandwidth Management (Alex
Williamson)
- Add Kconfig option for Link Bandwidth notification messages (Keith
Busch)"
* tag 'pci-v5.1-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI/LINK: Add Kconfig option (default off)
PCI/portdrv: Use shared MSI/MSI-X vector for Bandwidth Management
PCI: Fix issue with "pci=disable_acs_redir" parameter being ignored
Pull MTD fix from Richard Weinberger:
"A single regression fix for the marvell nand driver"
* tag 'mtd/fixes-for-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: marvell: Clean the controller state before each operation
e8303bb7a7 ("PCI/LINK: Report degraded links via link bandwidth
notification") added dmesg logging whenever a link changes speed or width
to a state that is considered degraded. Unfortunately, it cannot
differentiate signal integrity-related link changes from those
intentionally initiated by an endpoint driver, including drivers that may
live in userspace or VMs when making use of vfio-pci. Some GPU drivers
actively manage the link state to save power, which generates a stream of
messages like this:
vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
Since we can't distinguish the intentional changes from the signal
integrity issues, leave the reporting turned off by default. Add a Kconfig
option to turn it on if desired.
Fixes: e8303bb7a7 ("PCI/LINK: Report degraded links via link bandwidth notification")
Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
To choose whether to pick the GID from the old (16bit) or new (32bit)
field, we should check if the old gid field is set to 0xffff. Mainline
checks the old *UID* field instead - cut'n'paste from the corresponding
code in ufs_get_inode_uid().
Fixes: 252e211e90
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
syzbot was able to crash host by sending UDP packets with a 0 payload.
TCP does not have this issue since we do not aggregate packets without
payload.
Since dev_gro_receive() sets gso_size based on skb_gro_len(skb)
it seems not worth trying to cope with padded packets.
BUG: KASAN: slab-out-of-bounds in skb_gro_receive+0xf5f/0x10e0 net/core/skbuff.c:3826
Read of size 16 at addr ffff88808893fff0 by task syz-executor612/7889
CPU: 0 PID: 7889 Comm: syz-executor612 Not tainted 5.1.0-rc7+ #96
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
__asan_report_load16_noabort+0x14/0x20 mm/kasan/generic_report.c:133
skb_gro_receive+0xf5f/0x10e0 net/core/skbuff.c:3826
udp_gro_receive_segment net/ipv4/udp_offload.c:382 [inline]
call_gro_receive include/linux/netdevice.h:2349 [inline]
udp_gro_receive+0xb61/0xfd0 net/ipv4/udp_offload.c:414
udp4_gro_receive+0x763/0xeb0 net/ipv4/udp_offload.c:478
inet_gro_receive+0xe72/0x1110 net/ipv4/af_inet.c:1510
dev_gro_receive+0x1cd0/0x23c0 net/core/dev.c:5581
napi_gro_frags+0x36b/0xd10 net/core/dev.c:5843
tun_get_user+0x2f24/0x3fb0 drivers/net/tun.c:1981
tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2027
call_write_iter include/linux/fs.h:1866 [inline]
do_iter_readv_writev+0x5e1/0x8e0 fs/read_write.c:681
do_iter_write fs/read_write.c:957 [inline]
do_iter_write+0x184/0x610 fs/read_write.c:938
vfs_writev+0x1b3/0x2f0 fs/read_write.c:1002
do_writev+0x15e/0x370 fs/read_write.c:1037
__do_sys_writev fs/read_write.c:1110 [inline]
__se_sys_writev fs/read_write.c:1107 [inline]
__x64_sys_writev+0x75/0xb0 fs/read_write.c:1107
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x441cc0
Code: 05 48 3d 01 f0 ff ff 0f 83 9d 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 3d 51 93 29 00 00 75 14 b8 14 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 74 09 fc ff c3 48 83 ec 08 e8 ba 2b 00 00
RSP: 002b:00007ffe8c716118 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 00007ffe8c716150 RCX: 0000000000441cc0
RDX: 0000000000000001 RSI: 00007ffe8c716170 RDI: 00000000000000f0
RBP: 0000000000000000 R08: 000000000000ffff R09: 0000000000a64668
R10: 0000000020000040 R11: 0000000000000246 R12: 000000000000c2d9
R13: 0000000000402b50 R14: 0000000000000000 R15: 0000000000000000
Allocated by task 5143:
save_stack+0x45/0xd0 mm/kasan/common.c:75
set_track mm/kasan/common.c:87 [inline]
__kasan_kmalloc mm/kasan/common.c:497 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:505
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3393 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3555
mm_alloc+0x1d/0xd0 kernel/fork.c:1030
bprm_mm_init fs/exec.c:363 [inline]
__do_execve_file.isra.0+0xaa3/0x23f0 fs/exec.c:1791
do_execveat_common fs/exec.c:1865 [inline]
do_execve fs/exec.c:1882 [inline]
__do_sys_execve fs/exec.c:1958 [inline]
__se_sys_execve fs/exec.c:1953 [inline]
__x64_sys_execve+0x8f/0xc0 fs/exec.c:1953
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 5351:
save_stack+0x45/0xd0 mm/kasan/common.c:75
set_track mm/kasan/common.c:87 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:459
kasan_slab_free+0xe/0x10 mm/kasan/common.c:467
__cache_free mm/slab.c:3499 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3765
__mmdrop+0x238/0x320 kernel/fork.c:677
mmdrop include/linux/sched/mm.h:49 [inline]
finish_task_switch+0x47b/0x780 kernel/sched/core.c:2746
context_switch kernel/sched/core.c:2880 [inline]
__schedule+0x81b/0x1cc0 kernel/sched/core.c:3518
preempt_schedule_irq+0xb5/0x140 kernel/sched/core.c:3745
retint_kernel+0x1b/0x2d
arch_local_irq_restore arch/x86/include/asm/paravirt.h:767 [inline]
kmem_cache_free+0xab/0x260 mm/slab.c:3766
anon_vma_chain_free mm/rmap.c:134 [inline]
unlink_anon_vmas+0x2ba/0x870 mm/rmap.c:401
free_pgtables+0x1af/0x2f0 mm/memory.c:394
exit_mmap+0x2d1/0x530 mm/mmap.c:3144
__mmput kernel/fork.c:1046 [inline]
mmput+0x15f/0x4c0 kernel/fork.c:1067
exec_mmap fs/exec.c:1046 [inline]
flush_old_exec+0x8d9/0x1c20 fs/exec.c:1279
load_elf_binary+0x9bc/0x53f0 fs/binfmt_elf.c:864
search_binary_handler fs/exec.c:1656 [inline]
search_binary_handler+0x17f/0x570 fs/exec.c:1634
exec_binprm fs/exec.c:1698 [inline]
__do_execve_file.isra.0+0x1394/0x23f0 fs/exec.c:1818
do_execveat_common fs/exec.c:1865 [inline]
do_execve fs/exec.c:1882 [inline]
__do_sys_execve fs/exec.c:1958 [inline]
__se_sys_execve fs/exec.c:1953 [inline]
__x64_sys_execve+0x8f/0xc0 fs/exec.c:1953
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The buggy address belongs to the object at ffff88808893f7c0
which belongs to the cache mm_struct of size 1496
The buggy address is located 600 bytes to the right of
1496-byte region [ffff88808893f7c0, ffff88808893fd98)
The buggy address belongs to the page:
page:ffffea0002224f80 count:1 mapcount:0 mapping:ffff88821bc40ac0 index:0xffff88808893f7c0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea00025b4f08 ffffea00027b9d08 ffff88821bc40ac0
raw: ffff88808893f7c0 ffff88808893e440 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88808893fe80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88808893ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88808893ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff888088940000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888088940080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fixes: e20cf8d3f1 ("udp: implement GRO for plain UDP sockets.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull power supply fixes from Sebastian Reichel:
"Two more fixes for the 5.1 cycle.
One division by zero fix in a specific driver and one core workaround
for bad userspace behaviour from systemd regarding uevents. IMHO this
can be considered to be a userspace bug, but the debug messages are
useless anyways
- cpcap-battery: fix a division by zero
- core: fix systemd issue due to log messages produced by uevent"
* tag 'for-v5.1-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: supply: sysfs: prevent endless uevent loop with CONFIG_POWER_SUPPLY_DEBUG
power: supply: cpcap-battery: Fix division by zero
It is a followup after the fix in
commit 9c69a13205 ("route: Avoid crash from dereferencing NULL rt->from")
rt6_do_redirect():
1. NULL checking is needed on rt->from because a parallel
fib6_info delete could happen that sets rt->from to NULL.
(e.g. rt6_remove_exception() and fib6_drop_pcpu_from()).
2. fib6_info_hold() is not enough. Same reason as (1).
Meaning, holding dst->__refcnt cannot ensure
rt->from is not NULL or rt->from->fib6_ref is not 0.
Instead of using fib6_info_hold_safe() which ip6_rt_cache_alloc()
is already doing, this patch chooses to extend the rcu section
to keep "from" dereference-able after checking for NULL.
inet6_rtm_getroute():
1. NULL checking is also needed on rt->from for a similar reason.
Note that inet6_rtm_getroute() is using RTNL_FLAG_DOIT_UNLOCKED.
Fixes: a68886a691 ("net/ipv6: Make from in rt6_info rcu protected")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Wei Wang <weiwan@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While the endiannes is being handled correctly as indicated by the comment
above the offending line - sparse was unhappy with the missing annotation
as be64_to_cpu() expects a __be64 argument. To mitigate this annotation
all involved variables are changed to a consistent __le64 and the
conversion to uint64_t delayed to the call to rds_cong_map_updated().
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ARC fixes from Vineet Gupta:
"A few minor fixes for ARC.
- regression in memset if line size !64
- avoid panic if PAE and IOC"
* tag 'arc-5.1-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: memset: fix build with L1_CACHE_SHIFT != 6
ARC: [hsdk] Make it easier to add PAE40 region to DTB
ARC: PAE40: don't panic and instead turn off hw ioc
The Interrupt Message Number in the PCIe Capabilities register (PCIe r4.0,
sec 7.5.3.2) indicates which MSI/MSI-X vector is shared by interrupts
related to the PCIe Capability, including Link Bandwidth Management and
Link Autonomous Bandwidth Interrupts (Link Control, 7.5.3.7), Command
Completed and Hot-Plug Interrupts (Slot Control, 7.5.3.10), and the PME
Interrupt (Root Control, 7.5.3.12).
pcie_message_numbers() checked whether we want to enable PME or Hot-Plug
interrupts but neglected to check for Link Bandwidth Management, so if we
only wanted the Bandwidth Management interrupts, it decided we didn't need
any vectors at all. Then pcie_port_enable_irq_vec() tried to reallocate
zero vectors, which failed, resulting in fallback to INTx.
On some systems, e.g., an X79-based workstation, that INTx seems broken or
not handled correctly, so we got spurious IRQ16 interrupts for Bandwidth
Management events.
Change pcie_message_numbers() so that if we want Link Bandwidth Management
interrupts, we use the shared MSI/MSI-X vector from the PCIe Capabilities
register.
Fixes: e8303bb7a7 ("PCI/LINK: Report degraded links via link bandwidth notification")
Link: https://lore.kernel.org/lkml/155597243666.19387.1205950870601742062.stgit@gimli.home
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Pull ACPI fix from Rafael Wysocki:
"Revert a recent ACPICA change that caused initialization to fail on
systems with Thunderbolt docking stations connected at the init time"
* tag 'acpi-5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPICA: Clear status of GPEs before enabling them"
The 'extent_type' variable does seem to be reliably initialized, but
it's _very_ non-obvious, since there's a "goto next" case that jumps
over the normal initialization. That will then always trigger the
"start >= extent_end" test, which will end up never falling through to
the use of that variable.
But the code is certainly not obvious, and the compiler warning looks
reasonable. Make 'extent_type' an int, and initialize it to an invalid
negative value, which seems to be the common pattern in other places.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The pvlock_page and hvclock_page variables are (as the name implies)
addresses to pages, created by the linker script.
But we declared them as just "extern u8" variables, which _works_, but
now that gcc does some more bounds checking, it causes warnings like
warning: array subscript 1 is outside array bounds of ‘u8[1]’
when we then access more than one byte from those variables.
Fix this by simply making the declaration of the variables match
reality, which makes the compiler happy too.
Signed-off-by: Linus Torvalds <torvalds@-linux-foundation.org>
I'm not sure what made gcc warn about this code now. The 'ret' variable
does end up initialized in all cases, but it's definitely not obvious,
so the compiler is quite reasonable to warn about this.
So just add initialization to make it all much more obvious both to
compilers and to humans.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We already did this for clang, but now gcc has that warning too. Yes,
yes, the address may be unaligned. And that's kind of the point.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Previously, during fragmentation after forwarding, skb->skb_iif isn't
preserved, i.e. 'ip_copy_metadata' does not copy skb_iif from given
'from' skb.
As a result, ip_do_fragment's creates fragments with zero skb_iif,
leading to inconsistent behavior.
Assume for example an eBPF program attached at tc egress (post
forwarding) that examines __sk_buff->ingress_ifindex:
- the correct iif is observed if forwarding path does not involve
fragmentation/refragmentation
- a bogus iif is observed if forwarding path involves
fragmentation/refragmentatiom
Fix, by preserving skb_iif during 'ip_copy_metadata'.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In io_sqe_buffer_register() we allocate a number of arrays based on the
iov_len from the user-provided iov. While we limit iov_len to SZ_1G,
we can still attempt to allocate arrays exceeding MAX_ORDER.
On a 64-bit system with 4KiB pages, for an iov where iov_base = 0x10 and
iov_len = SZ_1G, we'll calculate that nr_pages = 262145. When we try to
allocate a corresponding array of (16-byte) bio_vecs, requiring 4194320
bytes, which is greater than 4MiB. This results in SLUB warning that
we're trying to allocate greater than MAX_ORDER, and failing the
allocation.
Avoid this by using kvmalloc() for allocations dependent on the
user-provided iov_len. At the same time, fix a leak of imu->bvec when
registration fails.
Full splat from before this patch:
WARNING: CPU: 1 PID: 2314 at mm/page_alloc.c:4595 __alloc_pages_nodemask+0x7ac/0x2938 mm/page_alloc.c:4595
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 2314 Comm: syz-executor326 Not tainted 5.1.0-rc7-dirty #4
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x2f0 include/linux/compiler.h:193
show_stack+0x20/0x30 arch/arm64/kernel/traps.c:158
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x110/0x190 lib/dump_stack.c:113
panic+0x384/0x68c kernel/panic.c:214
__warn+0x2bc/0x2c0 kernel/panic.c:571
report_bug+0x228/0x2d8 lib/bug.c:186
bug_handler+0xa0/0x1a0 arch/arm64/kernel/traps.c:956
call_break_hook arch/arm64/kernel/debug-monitors.c:301 [inline]
brk_handler+0x1d4/0x388 arch/arm64/kernel/debug-monitors.c:316
do_debug_exception+0x1a0/0x468 arch/arm64/mm/fault.c:831
el1_dbg+0x18/0x8c
__alloc_pages_nodemask+0x7ac/0x2938 mm/page_alloc.c:4595
alloc_pages_current+0x164/0x278 mm/mempolicy.c:2132
alloc_pages include/linux/gfp.h:509 [inline]
kmalloc_order+0x20/0x50 mm/slab_common.c:1231
kmalloc_order_trace+0x30/0x2b0 mm/slab_common.c:1243
kmalloc_large include/linux/slab.h:480 [inline]
__kmalloc+0x3dc/0x4f0 mm/slub.c:3791
kmalloc_array include/linux/slab.h:670 [inline]
io_sqe_buffer_register fs/io_uring.c:2472 [inline]
__io_uring_register fs/io_uring.c:2962 [inline]
__do_sys_io_uring_register fs/io_uring.c:3008 [inline]
__se_sys_io_uring_register fs/io_uring.c:2990 [inline]
__arm64_sys_io_uring_register+0x9e0/0x1bc8 fs/io_uring.c:2990
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:47 [inline]
el0_svc_common.constprop.0+0x148/0x2e0 arch/arm64/kernel/syscall.c:83
el0_svc_handler+0xdc/0x100 arch/arm64/kernel/syscall.c:129
el0_svc+0x8/0xc arch/arm64/kernel/entry.S:948
SMP: stopping secondary CPUs
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled
CPU features: 0x002,23000438
Memory Limit: none
Rebooting in 1 seconds..
Fixes: edafccee56 ("io_uring: add support for pre-mapped user IO buffers")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
update_chksum() accesses nskb->sk before it has been set
by complete_skb(), move the init up.
Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent commit returns an error if icmp is used as the ip-proto for
IPv6 fib rules. Update fib_rule_tests to send ipv6-icmp instead of icmp.
Fixes: 5e1a99eae8 ("ipv4: Add ICMPv6 support when parse route ipproto")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packet sockets in datagram mode take a destination address. Verify its
length before passing to dev_hard_header.
Prior to 2.6.14-rc3, the send code ignored sll_halen. This is
established behavior. Directly compare msg_namelen to dev->addr_len.
Change v1->v2: initialize addr in all paths
Fixes: 6b8d95f179 ("packet: validate address length if non-zero")
Suggested-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packet send checks that msg_name is at least sizeof sockaddr_ll.
Packet recv must return at least this length, so that its output
can be passed unmodified to packet send.
This ceased to be true since adding support for lladdr longer than
sll_addr. Since, the return value uses true address length.
Always return at least sizeof sockaddr_ll, even if address length
is shorter. Zero the padding bytes.
Change v1->v2: do not overwrite zeroed padding again. use copy_len.
Fixes: 0fb375fb9b ("[AF_PACKET]: Allow for > 8 byte hardware addresses.")
Suggested-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 875f1d0769 ("iov_iter: add ITER_BVEC_FLAG_NO_REF flag")
introduces one extra flag of ITER_BVEC_FLAG_NO_REF, and this flag
is stored into iter->type.
However, iov_iter_type() doesn't consider the new added flag, fix
it by masking this flag in iov_iter_type().
Fixes: 875f1d0769 ("iov_iter: add ITER_BVEC_FLAG_NO_REF flag")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 399254aaf4 ("block: add BIO_NO_PAGE_REF flag") introduces
BIO_NO_PAGE_REF, and once this flag is set for one bio, all pages
in the bio won't be get/put during IO.
However, if one bio is submitted via __blkdev_direct_IO_simple(),
even though BIO_NO_PAGE_REF is set, pages still may be put.
Fixes this issue by avoiding to put pages if BIO_NO_PAGE_REF is
set.
Fixes: 399254aaf4 ("block: add BIO_NO_PAGE_REF flag")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we don't end up actually calling submit in io_sq_wq_submit_work(),
we still need to drop the submit reference to the request. If we
don't, then we can leak the request. This can happen if we race
with ring shutdown while flushing the workqueue for requests that
require use of the mm_struct.
Fixes: e65ef56db4 ("io_uring: use regular request ref counts")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In io_sq_offload_start(), we call cpu_possible() on an unbounded cpu
value from userspace. On v5.1-rc7 on arm64 with
CONFIG_DEBUG_PER_CPU_MAPS, this results in a splat:
WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 cpu_max_bits_warn include/linux/cpumask.h:121 [inline]
There was an attempt to fix this in commit:
917257daa0 ("io_uring: only test SQPOLL cpu after we've verified it")
... by adding a check after the cpu value had been limited to NR_CPU_IDS
using array_index_nospec(). However, this left an unbound check at the
start of the function, for which the warning still fires.
Let's fix this correctly by checking that the cpu value is bound by
nr_cpu_ids before passing it to cpu_possible(). Note that only
nr_cpu_ids of a cpumask are guaranteed to exist at runtime, and
nr_cpu_ids can be significantly smaller than NR_CPUs. For example, an
arm64 defconfig has NR_CPUS=256, while my test VM has 4 vCPUs.
Following the intent from the commit message for 917257daa0, the
check is moved under the SQ_AFF branch, which is the only branch where
the cpu values is consumed. The check is performed before bounding the
value with array_index_nospec() so that we don't silently accept bogus
cpu values from userspace, where array_index_nospec() would force these
values to 0.
I suspect we can remove the array_index_nospec() call entirely, but I've
conservatively left that in place, updated to use nr_cpu_ids to match
the prior check.
Tested on arm64 with the Syzkaller reproducer:
https://syzkaller.appspot.com/bug?extid=cd714a07c6de2bc34293https://syzkaller.appspot.com/x/repro.syz?x=15d8b397200000
Full splat from before this patch:
WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 cpu_max_bits_warn include/linux/cpumask.h:121 [inline]
WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 cpumask_check include/linux/cpumask.h:128 [inline]
WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 cpumask_test_cpu include/linux/cpumask.h:344 [inline]
WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 io_sq_offload_start fs/io_uring.c:2244 [inline]
WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 io_uring_create fs/io_uring.c:2864 [inline]
WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 io_uring_setup+0x1108/0x15a0 fs/io_uring.c:2916
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 27601 Comm: syz-executor.0 Not tainted 5.1.0-rc7 #3
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x2f0 include/linux/compiler.h:193
show_stack+0x20/0x30 arch/arm64/kernel/traps.c:158
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x110/0x190 lib/dump_stack.c:113
panic+0x384/0x68c kernel/panic.c:214
__warn+0x2bc/0x2c0 kernel/panic.c:571
report_bug+0x228/0x2d8 lib/bug.c:186
bug_handler+0xa0/0x1a0 arch/arm64/kernel/traps.c:956
call_break_hook arch/arm64/kernel/debug-monitors.c:301 [inline]
brk_handler+0x1d4/0x388 arch/arm64/kernel/debug-monitors.c:316
do_debug_exception+0x1a0/0x468 arch/arm64/mm/fault.c:831
el1_dbg+0x18/0x8c
cpu_max_bits_warn include/linux/cpumask.h:121 [inline]
cpumask_check include/linux/cpumask.h:128 [inline]
cpumask_test_cpu include/linux/cpumask.h:344 [inline]
io_sq_offload_start fs/io_uring.c:2244 [inline]
io_uring_create fs/io_uring.c:2864 [inline]
io_uring_setup+0x1108/0x15a0 fs/io_uring.c:2916
__do_sys_io_uring_setup fs/io_uring.c:2929 [inline]
__se_sys_io_uring_setup fs/io_uring.c:2926 [inline]
__arm64_sys_io_uring_setup+0x50/0x70 fs/io_uring.c:2926
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:47 [inline]
el0_svc_common.constprop.0+0x148/0x2e0 arch/arm64/kernel/syscall.c:83
el0_svc_handler+0xdc/0x100 arch/arm64/kernel/syscall.c:129
el0_svc+0x8/0xc arch/arm64/kernel/entry.S:948
SMP: stopping secondary CPUs
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled
CPU features: 0x002,23000438
Memory Limit: none
Rebooting in 1 seconds..
Fixes: 917257daa0 ("io_uring: only test SQPOLL cpu after we've verified it")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-block@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Simplied the logic
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ying triggered a call trace when doing an asconf testing:
BUG: scheduling while atomic: swapper/12/0/0x10000100
Call Trace:
<IRQ> [<ffffffffa4375904>] dump_stack+0x19/0x1b
[<ffffffffa436fcaf>] __schedule_bug+0x64/0x72
[<ffffffffa437b93a>] __schedule+0x9ba/0xa00
[<ffffffffa3cd5326>] __cond_resched+0x26/0x30
[<ffffffffa437bc4a>] _cond_resched+0x3a/0x50
[<ffffffffa3e22be8>] kmem_cache_alloc_node+0x38/0x200
[<ffffffffa423512d>] __alloc_skb+0x5d/0x2d0
[<ffffffffc0995320>] sctp_packet_transmit+0x610/0xa20 [sctp]
[<ffffffffc098510e>] sctp_outq_flush+0x2ce/0xc00 [sctp]
[<ffffffffc098646c>] sctp_outq_uncork+0x1c/0x20 [sctp]
[<ffffffffc0977338>] sctp_cmd_interpreter.isra.22+0xc8/0x1460 [sctp]
[<ffffffffc0976ad1>] sctp_do_sm+0xe1/0x350 [sctp]
[<ffffffffc099443d>] sctp_primitive_ASCONF+0x3d/0x50 [sctp]
[<ffffffffc0977384>] sctp_cmd_interpreter.isra.22+0x114/0x1460 [sctp]
[<ffffffffc0976ad1>] sctp_do_sm+0xe1/0x350 [sctp]
[<ffffffffc097b3a4>] sctp_assoc_bh_rcv+0xf4/0x1b0 [sctp]
[<ffffffffc09840f1>] sctp_inq_push+0x51/0x70 [sctp]
[<ffffffffc099732b>] sctp_rcv+0xa8b/0xbd0 [sctp]
As it shows, the first sctp_do_sm() running under atomic context (NET_RX
softirq) invoked sctp_primitive_ASCONF() that uses GFP_KERNEL flag later,
and this flag is supposed to be used in non-atomic context only. Besides,
sctp_do_sm() was called recursively, which is not expected.
Vlad tried to fix this recursive call in Commit c078669340 ("sctp: Fix
oops when sending queued ASCONF chunks") by introducing a new command
SCTP_CMD_SEND_NEXT_ASCONF. But it didn't work as this command is still
used in the first sctp_do_sm() call, and sctp_primitive_ASCONF() will
be called in this command again.
To avoid calling sctp_do_sm() recursively, we send the next queued ASCONF
not by sctp_primitive_ASCONF(), but by sctp_sf_do_prm_asconf() in the 1st
sctp_do_sm() directly.
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix ReST underline warning:
./Documentation/networking/netdev-FAQ.rst:135: WARNING: Title underline too short.
Q: I made changes to only a few patches in a patch series should I resend only those changed?
--------------------------------------------------------------------------------------------
Fixes: ffa9125373 ("Documentation: networking: Update netdev-FAQ regarding patches")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we only post a cqe if we get an error OUTSIDE of submission.
For submission, we return the error directly through io_uring_enter().
This is a bit awkward for applications, and it makes more sense to
always post a cqe with an error, if the error happens on behalf of an
sqe.
This changes submission behavior a bit. io_uring_enter() returns -ERROR
for an error, and > 0 for number of sqes submitted. Before this change,
if you wanted to submit 8 entries and had an error on the 5th entry,
io_uring_enter() would return 4 (for number of entries successfully
submitted) and rewind the sqring. The application would then have to
peek at the sqring and figure out what was wrong with the head sqe, and
then skip it itself. With this change, we'll return 5 since we did
consume 5 sqes, and the last sqe (with the error) will result in a cqe
being posted with the error.
This makes the logic easier to handle in the application, and it cleans
up the submission part.
Suggested-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull fsnotify fix from Jan Kara:
"A fix of user trigerable NULL pointer dereference syzbot has recently
spotted.
The problem was introduced in this merge window so no CC stable is
needed"
* tag 'fsnotify_for_v5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: Fix NULL ptr deref in fanotify_get_fsid()
KVM/ARM fixes for 5.1, take #2:
- Don't try to emulate timers on userspace access
- Fix unaligned huge mappings, again
- Properly reset a vcpu that fails to reset(!)
- Properly retire pending LPIs on reset
- Fix computation of emulated CNTP_TVAL
Enlightened VMCS is only supported on Intel CPUs but the test shouldn't
fail completely.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If a memory slot's size is not a multiple of 64 pages (256K), then
the KVM_CLEAR_DIRTY_LOG API is unusable: clearing the final 64 pages
either requires the requested page range to go beyond memslot->npages,
or requires log->num_pages to be unaligned, and kvm_clear_dirty_log_protect
requires log->num_pages to be both in range and aligned.
To allow this case, allow log->num_pages not to be a multiple of 64 if
it ends exactly on the last page of the slot.
Reported-by: Peter Xu <peterx@redhat.com>
Fixes: 98938aa8ed ("KVM: validate userspace input in kvm_clear_dirty_log_protect()", 2019-01-02)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 47c42e6b41 ("KVM: x86: fix handling of role.cr4_pae and rename it
to 'gpte_size'") introduced a regression: 32-bit PAE guests stopped
working. The issue appears to be: when guest switches (enables) PAE we need
to re-initialize MMU context (set context->root_level, do
reset_rsvds_bits_mask(), ...) but init_kvm_tdp_mmu() doesn't do that
because we threw away is_pae(vcpu) flag from mmu role. Restore it to
kvm_mmu_extended_role (as we now don't need it in base role) to fix
the issue.
Fixes: 47c42e6b41 ("KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM's recent bug fix to update %rip after emulating I/O broke userspace
that relied on the previous behavior of incrementing %rip prior to
exiting to userspace. When running a Windows XP guest on AMD hardware,
Qemu may patch "OUT 0x7E" instructions in reaction to the OUT itself.
Because KVM's old behavior was to increment %rip before exiting to
userspace to handle the I/O, Qemu manually adjusted %rip to account for
the OUT instruction.
Arguably this is a userspace bug as KVM requires userspace to re-enter
the kernel to complete instruction emulation before taking any other
actions. That being said, this is a bit of a grey area and breaking
userspace that has worked for many years is bad.
Pre-increment %rip on OUT to port 0x7e before exiting to userspace to
hack around the issue.
Fixes: 45def77ebf ("KVM: x86: update %rip after emulating IO")
Reported-by: Simon Becherer <simon@becherer.de>
Reported-and-tested-by: Iakov Karpov <srid@rkmail.ru>
Reported-by: Gabriele Balducci <balducci@units.it>
Reported-by: Antti Antinoja <reader@fennosys.fi>
Cc: stable@vger.kernel.org
Cc: Takashi Iwai <tiwai@suse.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Kalle Valo says:
====================
wireless-drivers fixes for 5.1
Third set of fixes for 5.1.
iwlwifi
* fix an oops when creating debugfs entries
* fix bug when trying to capture debugging info while in rfkill
* prevent potential uninitialized memory dumps into debugging logs
* fix some initialization parameters for AX210 devices
* fix an oops with non-MSIX devices
* fix an oops when we receive a packet with bogus lengths
* fix a bug that prevented 5350 devices from working
* fix a small merge damage from the previous series
mwifiex
* fig regression with resume on SDIO
ath10k
* fix locking problem with crashdump
* fix warnings during suspend and resume
Also note that this pull conflicts with net-next. And I want to emphasie
that it's really net-next, so when you pull this to net tree it should
go without conflicts. Stephen reported the conflict here:
https://lkml.kernel.org/r/20190429115338.5decb50b@canb.auug.org.au
In iwlwifi oddly commit 154d4899e4 adds the IS_ERR_OR_NULL() in
wireless-drivers but commit c9af7528c3 removes the whole check in
wireless-drivers-next. The fix is easy, just drop the whole check for
mvmvif->dbgfs_dir in iwlwifi/mvm/debugfs-vif.c, it's unneeded anyway.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull USB fixes from Greg KH:
"Here are some small USB fixes for a bunch of warnings/errors that the
syzbot has been finding with it's new-found ability to stress-test the
USB layer.
All of these are tiny, but fix real issues, and are marked for stable
as well. All of these have had lots of testing in linux-next as well"
* tag 'usb-5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: w1 ds2490: Fix bug caused by improper use of altsetting array
USB: yurex: Fix protection fault after device removal
usb: usbip: fix isoc packet num validation in get_pipe
USB: core: Fix bug caused by duplicate interface PM usage counter
USB: dummy-hcd: Fix failure to give back unlinked URBs
USB: core: Fix unterminated string returned by usb_string()
There is no operation to order with afterwards, and removing the flag is
not critical in any way.
There will always be a "race condition" where the application will
trigger IORING_ENTER_SQ_WAKEUP when it isn't actually needed.
Signed-off-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
smp_store_release in io_commit_sqring already orders the store to
dropped before the update to SQ head.
Signed-off-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The memory operations before reading cq head are unrelated and we
don't care about their order.
Document that the control dependency in combination with READ_ONCE and
WRITE_ONCE forms a barrier we need.
Signed-off-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
wq_has_sleeper has a full barrier internally. The smp_rmb barrier in
io_uring_poll synchronizes with it.
Signed-off-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The application reading the CQ ring needs a barrier to pair with the
smp_store_release in io_commit_cqring, not the barrier after it.
Also a write barrier *after* writing something (but not *before*
writing anything interesting) doesn't order anything, so an smp_wmb()
after writing SQ tail is not needed.
Additionally consider reading SQ head and writing CQ tail in the notes.
Also add some clarifications how the various other fields in the ring
buffers are used.
Signed-off-by: Stefan Bühler <source@stbuehler.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Not all request types set REQ_F_FORCE_NONBLOCK when they needed async
punting; reverse logic instead and set REQ_F_NOWAIT if request mustn't
be punted.
Signed-off-by: Stefan Bühler <source@stbuehler.de>
Merged with my previous patch for this.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull selinux fix from Paul Moore:
"One small patch for the stable folks to fix a problem when building
against the latest glibc.
I'll be honest and say that I'm not really thrilled with the idea of
sending this up right now, but Greg is a little annoyed so here I
figured I would at least send this"
* tag 'selinux-pr-20190429' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: use kernel linux/socket.h for genheaders and mdp
If register_snap_client fails in atalk_init,
error code should be set, otherwise it will
triggers NULL pointer dereference while unloading
module.
Fixes: 9804501fa1 ("appletalk: Fix potential NULL pointer dereference in unregister_snap_client")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "fs->location" is a u32 that comes from the user in ethtool_set_rxnfc().
We can't pass unclamped values to test_bit() or it results in an out of
bounds access beyond the end of the bitmap.
Fixes: 7318166cac ("net: dsa: bcm_sf2: Add support for ethtool::rxnfc")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In rxrpc_destroy_all_calls(), there are two phases: (1) make sure the
->calls list is empty, emitting error messages if not, and (2) wait for the
RCU cleanup to happen on outstanding calls (ie. ->nr_calls becomes 0).
To avoid taking the call_lock, the function prechecks ->calls and if empty,
it returns to avoid taking the lock - this is wrong, however: it still
needs to go and do the second phase and wait for ->nr_calls to become 0.
Without this, the rxrpc_net struct may get deallocated before we get to the
RCU cleanup for the last calls. This can lead to:
Slab corruption (Not tainted): kmalloc-16k start=ffff88802b178000, len=16384
050: 6b 6b 6b 6b 6b 6b 6b 6b 61 6b 6b 6b 6b 6b 6b 6b kkkkkkkkakkkkkkk
Note the "61" at offset 0x58. This corresponds to the ->nr_calls member of
struct rxrpc_net (which is >9k in size, and thus allocated out of the 16k
slab).
Fix this by flipping the condition on the if-statement, putting the locked
section inside the if-body and dropping the return from there. The
function will then always go on to wait for the RCU cleanup on outstanding
calls.
Fixes: 2baec2c3f8 ("rxrpc: Support network namespacing")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2019-04-30
1) Fix an out-of-bound array accesses in __xfrm_policy_unlink.
From YueHaibing.
2) Reset the secpath on failure in the ESP GRO handlers
to avoid dereferencing an invalid pointer on error.
From Myungho Jung.
3) Add and revert a patch that tried to add rcu annotations
to netns_xfrm. From Su Yanjun.
4) Wait for rcu callbacks before freeing xfrm6_tunnel_spi_kmem.
From Su Yanjun.
5) Fix forgotten vti4 ipip tunnel deregistration.
From Jeremy Sowden:
6) Remove some duplicated log messages in vti4.
From Jeremy Sowden.
7) Don't use IPSEC_PROTO_ANY when flushing states because
this will flush only IPsec portocol speciffic states.
IPPROTO_ROUTING states may remain in the lists when
doing net exit. Fix this by replacing IPSEC_PROTO_ANY
with zero. From Cong Wang.
8) Add length check for UDP encapsulation to fix "Oversized IP packet"
warnings on receive side. From Sabrina Dubroca.
9) Fix xfrm interface lookup when the interface is associated to
a vrf layer 3 master device. From Martin Willi.
10) Reload header pointers after pskb_may_pull() in _decode_session4(),
otherwise we may read from uninitialized memory.
11) Update the documentation about xfrm[46]_gc_thresh, it
is not used anymore after the flowcache removal.
From Nicolas Dichtel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When there is no route to an IPv6 dest addr, skb_dst(skb) points
to loopback dev in the case of that the IP6CB(skb)->iif is
enslaved to a vrf. This causes Ip6InNoRoutes to be incremented on the
loopback dev. This also causes the lookup to fail on icmpv6_send() and
the dest unreachable to not sent and Ip6OutNoRoutes gets incremented on
the loopback dev.
To reproduce:
* Gateway configuration:
ip link add dev vrf_258 type vrf table 258
ip link set dev enp0s9 master vrf_258
ip addr add 66:1/64 dev enp0s9
ip -6 route add unreachable default metric 8192 table 258
sysctl -w net.ipv6.conf.all.forwarding=1
sysctl -w net.ipv6.conf.enp0s9.forwarding=1
* Sender configuration:
ip addr add 66::2/64 dev enp0s9
ip -6 route add default via 66::1
and ping 67::1 for example from the sender.
Fix this by counting on the original netdev and reset the skb dst to
force a fresh lookup.
v2: Fix typo of destination address in the repro steps.
v3: Simplify the loopback check (per David Ahern) and use reverse
Christmas tree format (per David Miller).
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard and Bruno both reported that my commit added a bug,
and Bruno was able to determine the problem came when a segment
wih a FIN packet was coalesced to a prior one in tcp backlog queue.
It turns out the header prediction in tcp_rcv_established()
looks back to TCP headers in the packet, not in the metadata
(aka TCP_SKB_CB(skb)->tcp_flags)
The fast path in tcp_rcv_established() is not supposed to
handle a FIN flag (it does not call tcp_fin())
Therefore we need to make sure to propagate the FIN flag,
so that the coalesced packet does not go through the fast path,
the same than a GRO packet carrying a FIN flag.
While we are at it, make sure we do not coalesce packets with
RST or SYN, or if they do not have ACK set.
Many thanks to Richard and Bruno for pinpointing the bad commit,
and to Richard for providing a first version of the fix.
Fixes: 4f693b55c3 ("tcp: implement coalescing on backlog queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Reported-by: Bruno Prémont <bonbons@sysophe.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
A request for a flowlabel fails in process or user exclusive mode must
fail if the caller pid or uid does not match. Invert the test.
Previously, the test was unsafe wrt PID recycling, but indeed tested
for inequality: fl1->owner != fl->owner
Fixes: 4f82f45730 ("net ip6 flowlabel: Make owner a union of struct pid* and kuid_t")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Schmidt says:
====================
ieee802154 for net 2019-04-25
An update from ieee802154 for your *net* tree.
Another fix from Kangjie Lu to ensure better checking regmap updates in the
mcr20a driver. Nothing else I have pending for the final release.
If there are any problems let me know.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull seccomp fixes from Kees Cook:
"Syzbot found a use-after-free bug in seccomp due to flags that should
not be allowed to be used together.
Tycho fixed this, I updated the self-tests, and the syzkaller PoC has
been running for several days without triggering KASan (before this
fix, it would reproduce). These patches have also been in -next for
almost a week, just to be sure.
- Add logic for making some seccomp flags exclusive (Tycho)
- Update selftests for exclusivity testing (Kees)"
* tag 'seccomp-v5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: Make NEW_LISTENER and TSYNC flags exclusive
selftests/seccomp: Prepare for exclusive seccomp flags
This doesn't really do anything, but at least we now parse teh
ZERO_PAGE() address argument so that we'll catch the most obvious errors
in usage next time they'll happen.
See commit 6a5c5d26c4 ("rdma: fix build errors on s390 and MIPS due to
bad ZERO_PAGE use") what happens when we don't have any use of the macro
argument at all.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When compiling genheaders and mdp from a newer host kernel, the
following error happens:
In file included from scripts/selinux/genheaders/genheaders.c:18:
./security/selinux/include/classmap.h:238:2: error: #error New
address family defined, please update secclass_map. #error New
address family defined, please update secclass_map. ^~~~~
make[3]: *** [scripts/Makefile.host:107:
scripts/selinux/genheaders/genheaders] Error 1 make[2]: ***
[scripts/Makefile.build:599: scripts/selinux/genheaders] Error 2
make[1]: *** [scripts/Makefile.build:599: scripts/selinux] Error 2
make[1]: *** Waiting for unfinished jobs....
Instead of relying on the host definition, include linux/socket.h in
classmap.h to have PF_MAX.
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara <paulo@paulo.ac>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: manually merge in mdp.c, subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Johannes Berg says:
====================
* fix use-after-free in mac80211 TXQs
* fix RX STBC byte order
* fix debugfs rename crashing due to ERR_PTR()
* fix missing regulatory notification
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ath10k_mac_vif_chan() always returns an error for the given vif
during system-wide resume which reliably triggers two WARN_ON()s
in ath10k_bss_info_changed() and they are not particularly
useful in that code path, so drop them.
Tested: QCA6174 hw3.2 PCI with WLAN.RM.2.0-00180-QCARMSWPZ-1
Tested: QCA6174 hw3.2 SDIO with WLAN.RMH.4.4.1-00007-QCARMSWP-1
Fixes: cd93b83ad9 ("ath10k: support for multicast rate control")
Fixes: f279294e9e ("ath10k: add support for configuring management packet rate")
Cc: stable@vger.kernel.org
Reviewed-by: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Tested-by: Claire Chang <tientzu@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Commit 25733c4e67 ("ath10k: pci: use mutex for diagnostic window CE
polling") introduced a regression where we try to sleep (grab a mutex)
in an atomic context:
[ 233.602619] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:254
[ 233.602626] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
[ 233.602636] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.1.0-rc2 #4
[ 233.602642] Hardware name: Google Scarlet (DT)
[ 233.602647] Call trace:
[ 233.602663] dump_backtrace+0x0/0x11c
[ 233.602672] show_stack+0x20/0x28
[ 233.602681] dump_stack+0x98/0xbc
[ 233.602690] ___might_sleep+0x154/0x16c
[ 233.602696] __might_sleep+0x78/0x88
[ 233.602704] mutex_lock+0x2c/0x5c
[ 233.602717] ath10k_pci_diag_read_mem+0x68/0x21c [ath10k_pci]
[ 233.602725] ath10k_pci_diag_read32+0x48/0x74 [ath10k_pci]
[ 233.602733] ath10k_pci_dump_registers+0x5c/0x16c [ath10k_pci]
[ 233.602741] ath10k_pci_fw_crashed_dump+0xb8/0x548 [ath10k_pci]
[ 233.602749] ath10k_pci_napi_poll+0x60/0x128 [ath10k_pci]
[ 233.602757] net_rx_action+0x140/0x388
[ 233.602766] __do_softirq+0x1b0/0x35c
[...]
ath10k_pci_fw_crashed_dump() is called from NAPI contexts, and firmware
memory dumps are retrieved using the diag memory interface.
A simple reproduction case is to run this on QCA6174A /
WLAN.RM.4.4.1-00132-QCARMSWP-1, which happens to be a way to b0rk the
firmware:
dd if=/sys/kernel/debug/ieee80211/phy0/ath10k/mem_value bs=4K count=1
of=/dev/null
(NB: simulated firmware crashes, via debugfs, don't trigger firmware
dumps.)
The fix is to move the crash-dump into a workqueue context, and avoid
relying on 'data_lock' for most mutual exclusion. We only keep using it
here for protecting 'fw_crash_counter', while the rest of the coredump
buffers are protected by a new 'dump_mutex'.
I've tested the above with simulated firmware crashes (debugfs 'reset'
file), real firmware crashes (the 'dd' command above), and a variety of
reboot and suspend/resume configurations on QCA6174A.
Reported here:
http://lkml.kernel.org/linux-wireless/20190325202706.GA68720@google.com
Fixes: 25733c4e67 ("ath10k: pci: use mutex for diagnostic window CE polling")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
KVM_GET_DIRTY_LOG is implemented by all architectures, not just x86,
and KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is additionally implemented by
arm, arm64, and mips.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
file_remove_privs() might be called for non-regular files, e.g.
blkdev inode. There is no reason to do its job on things
like blkdev inodes, pipes, or cdevs. Hence, abort if
file does not refer to a regular inode.
AV: more to the point, for devices there might be any number of
inodes refering to given device. Which one to strip the permissions
from, even if that made any sense in the first place? All of them
will be observed with contents modified, after all.
Found by LockDoc (Alexander Lochmann, Horst Schirmeier and Olaf
Spinczyk)
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Alexander Lochmann <alexander.lochmann@tu-dortmund.de>
Signed-off-by: Horst Schirmeier <horst.schirmeier@tu-dortmund.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
It has no business being there, it's checked by relevant ->get_tree()
as it is *and* it returns the wrong error for no reason whatsoever.
Fixes: f3a09c9201 "introduce fs_context methods"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fanotify_get_fsid() is reading mark->connector->fsid under srcu. It can
happen that it sees mark not fully initialized or mark that is already
detached from the object list. In these cases mark->connector
can be NULL leading to NULL ptr dereference. Fix the problem by
being careful when reading mark->connector and check it for being NULL.
Also use WRITE_ONCE when writing the mark just to prevent compiler from
doing something stupid.
Reported-by: syzbot+15927486a4f1bfcbaf91@syzkaller.appspotmail.com
Fixes: 77115225ac ("fanotify: cache fsid in fsnotify_mark_connector")
Signed-off-by: Jan Kara <jack@suse.cz>
The line6 driver uses a lot of USB buffers off of the stack, which is
not allowed on many systems, causing the driver to crash on some of
them. Fix this up by dynamically allocating the buffers with kmalloc()
which allows for proper DMA-able memory.
Reported-by: Christo Gouws <gouws.christo@gmail.com>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Christo Gouws <gouws.christo@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Fourth batch of patches intended for v5.1
* Fix an oops when we receive a packet with bogus lengths;
* Fix a bug that prevented 5350 devices from working;
* Fix a small merge damage from the previous series;
When I rebased Greg's patch, I accidentally left the old if block that
was already there. Remove it.
Fixes: 154d4899e4 ("iwlwifi: mvm: properly check debugfs dentry before using it")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
We introduced a bug that prevented this old device from
working. The driver would simply not be able to complete
the INIT flow while spewing this warning:
CSR addresses aren't configured
WARNING: CPU: 0 PID: 819 at drivers/net/wireless/intel/iwlwifi/pcie/drv.c:917
iwl_pci_probe+0x160/0x1e0 [iwlwifi]
Cc: stable@vger.kernel.org # v4.18+
Fixes: a8cbb46f83 ("iwlwifi: allow different csr flags for different device families")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Fixes: c8f1b51e50 ("iwlwifi: allow different csr flags for different device families")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
We don't check for the validity of the lengths in the packet received
from the firmware. If the MPDU length received in the rx descriptor
is too short to contain the header length and the crypt length
together, we may end up trying to copy a negative number of bytes
(headlen - hdrlen < 0) which will underflow and cause us to try to
copy a huge amount of data. This causes oopses such as this one:
BUG: unable to handle kernel paging request at ffff896be2970000
PGD 5e201067 P4D 5e201067 PUD 5e205067 PMD 16110d063 PTE 8000000162970161
Oops: 0003 [#1] PREEMPT SMP NOPTI
CPU: 2 PID: 1824 Comm: irq/134-iwlwifi Not tainted 4.19.33-04308-geea41cf4930f #1
Hardware name: [...]
RIP: 0010:memcpy_erms+0x6/0x10
Code: 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3
0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
RSP: 0018:ffffa4630196fc60 EFLAGS: 00010287
RAX: ffff896be2924618 RBX: ffff896bc8ecc600 RCX: 00000000fffb4610
RDX: 00000000fffffff8 RSI: ffff896a835e2a38 RDI: ffff896be2970000
RBP: ffffa4630196fd30 R08: ffff896bc8ecc600 R09: ffff896a83597000
R10: ffff896bd6998400 R11: 000000000200407f R12: ffff896a83597050
R13: 00000000fffffff8 R14: 0000000000000010 R15: ffff896a83597038
FS: 0000000000000000(0000) GS:ffff896be8280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff896be2970000 CR3: 000000005dc12002 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
iwl_mvm_rx_mpdu_mq+0xb51/0x121b [iwlmvm]
iwl_pcie_rx_handle+0x58c/0xa89 [iwlwifi]
iwl_pcie_irq_rx_msix_handler+0xd9/0x12a [iwlwifi]
irq_thread_fn+0x24/0x49
irq_thread+0xb0/0x122
kthread+0x138/0x140
ret_from_fork+0x1f/0x40
Fix that by checking the lengths for correctness and trigger a warning
to show that we have received wrong data.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Currently, the UDP GRO code path does bad things on some edge
conditions - Aggregation can happen even on packet with different
lengths.
Fix the above by rewriting the 'complete' condition for GRO
packets. While at it, note explicitly that we allow merging the
first packet per burst below gso_size.
Reported-by: Sean Tong <seantong114@gmail.com>
Fixes: e20cf8d3f1 ("udp: implement GRO for plain UDP sockets.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
net/tls: fix data copies in tls_device_reencrypt()
This series fixes the tls_device_reencrypt() which is broken
if record starts in the frags of the message skb.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fragments may contain data from other records so we have to account
for that when we calculate the destination and max length of copy we
can perform. Note that 'offset' is the offset within the message,
so it can't be passed as offset within the frag..
Here skb_store_bits() would have realised the call is wrong and
simply not copy data.
Fixes: 4799ac81e5 ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no guarantee the record starts before the skb frags.
If we don't check for this condition copy amount will get
negative, leading to reads and writes to random memory locations.
Familiar hilarity ensues.
Fixes: 4799ac81e5 ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: Misc. bug fixes.
6 miscellaneous bug fixes covering several issues in error code paths,
a setup issue for statistics DMA, and an improvement for setting up
multicast address filters.
Please queue these for stable as well.
Patch #5 (bnxt_en: Fix statistics context reservation logic) is for the
most recent 5.0 stable only. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In bnxt_rx_pkt(), if the driver encounters BD errors, it will recycle
the buffers and jump to the end where the uninitailized variable "len"
is referenced. Fix it by adding a new jump label that will skip
the length update. This is the most correct fix since the length
may not be valid when we get this type of error.
Fixes: 6a8788f256 ("bnxt_en: add support for software dynamic interrupt moderation")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In an earlier commit that fixes the number of stats contexts to
reserve for the RDMA driver, we added a function parameter to pass in
the number of stats contexts to all the relevant functions. The passed
in parameter should have been used to set the enables field of the
firmware message.
Fixes: 780baad44f ("bnxt_en: Reserve 1 stat_ctx for RDMA driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If driver determines that extended TX port statistics are not supported
or allocation of the data structure fails, make sure to pass 0 TX stats
size to firmware to disable it. The firmware returned TX stats size should
also be set to 0 for consistency. This will prevent
bnxt_get_ethtool_stats() from accessing the NULL TX stats pointer in
case there is mismatch between firmware and driver.
Fixes: 36e53349b6 ("bnxt_en: Add additional extended port statistics.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we encounter errors during open and proceed to clean up,
bnxt_hwrm_ring_free() may crash if the rings we try to free have never
been allocated. bnxt_cp_ring_for_rx() or bnxt_cp_ring_for_tx()
may reference pointers that have not been allocated.
Fix it by checking for valid fw_ring_id first before calling
bnxt_cp_ring_for_rx() or bnxt_cp_ring_for_tx().
Fixes: 2c61d2117e ("bnxt_en: Add helper functions to get firmware CP ring ID.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the bnxt_init_one() error path, short FW command request memory
is not freed. This patch fixes it.
Fixes: e605db801b ("bnxt_en: Support for Short Firmware Message")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver builds a list of multicast addresses and sends it to the
firmware when the driver's ndo_set_rx_mode() is called. In rare
cases, the firmware can fail this call if internal resources to
add multicast addresses are exhausted. In that case, we should
try the call again by setting the ALL_MCAST flag which is more
guaranteed to succeed.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The not-so-recent change to move VMX's VM-Exit handing to a dedicated
"function" unintentionally exposed KVM to a speculative attack from the
guest by executing a RET prior to stuffing the RSB. Make RSB stuffing
happen immediately after VM-Exit, before any unpaired returns.
Alternatively, the VM-Exit path could postpone full RSB stuffing until
its current location by stuffing the RSB only as needed, or by avoiding
returns in the VM-Exit path entirely, but both alternatives are beyond
ugly since vmx_vmexit() has multiple indirect callers (by way of
vmx_vmenter()). And putting the RSB stuffing immediately after VM-Exit
makes it much less likely to be re-broken in the future.
Note, the cost of PUSH/POP could be avoided in the normal flow by
pairing the PUSH RAX with the POP RAX in __vmx_vcpu_run() and adding an
a POP to nested_vmx_check_vmentry_hw(), but such a weird/subtle
dependency is likely to cause problems in the long run, and PUSH/POP
will take all of a few cycles, which is peanuts compared to the number
of cycles required to fill the RSB.
Fixes: 453eafbe65 ("KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines")
Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
marvell_get_sset_count() returns how many statistics counters there
are. If the PHY supports fibre, there are 3, otherwise two.
marvell_get_strings() does not make this distinction, and always
returns 3 strings. This then often results in writing past the end
of the buffer for the strings.
Fixes: 2170fef78a ("Marvell phy: add field to get errors from fiber link.")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When allocating the next family->id it makes more sense to use
idr_alloc_cyclic to avoid re-using a previously used family->id as much
as possible.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Canonical way to fetch sk_user_data from an encap_rcv() handler called
from UDP stack in rcu protected section is to use rcu_dereference_sk_user_data(),
otherwise compiler might read it multiple times.
Fixes: d00fa9adc5 ("il2tp: fix races with tunnel socket close")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf 2019-04-25
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) the bpf verifier fix to properly mark registers in all stack frames, from Paul.
2) preempt_enable_no_resched->preempt_enable fix, from Peter.
3) other misc fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Unless the very next line is schedule(), or implies it, one must not use
preempt_enable_no_resched(). It can cause a preemption to go missing and
thereby cause arbitrary delays, breaking the PREEMPT=y invariant.
Cc: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The first test case, for pointer null checks, is equivalent to the
following pseudo-code. It checks that the verifier does not complain on
line 6 and recognizes that ptr isn't null.
1: ptr = bpf_map_lookup_elem(map, &key);
2: ret = subprog(ptr) {
3: return ptr != NULL;
4: }
5: if (ret)
6: value = *ptr;
The second test case, for packet bound checks, is equivalent to the
following pseudo-code. It checks that the verifier does not complain on
line 7 and recognizes that the packet is at least 1 byte long.
1: pkt_end = ctx.pkt_end;
2: ptr = ctx.pkt + 8;
3: ret = subprog(ptr, pkt_end) {
4: return ptr <= pkt_end;
5: }
6: if (ret)
7: value = *(u8 *)ctx.pkt;
Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In case of a null check on a pointer inside a subprog, we should mark all
registers with this pointer as either safe or unknown, in both the current
and previous frames. Currently, only spilled registers and registers in
the current frame are marked. Packet bound checks in subprogs have the
same issue. This patch fixes it to mark registers in previous frames as
well.
A good reproducer for null checks looks as follow:
1: ptr = bpf_map_lookup_elem(map, &key);
2: ret = subprog(ptr) {
3: return ptr != NULL;
4: }
5: if (ret)
6: value = *ptr;
With the above, the verifier will complain on line 6 because it sees ptr
as map_value_or_null despite the null check in subprog 1.
Note that this patch fixes another resulting bug when using
bpf_sk_release():
1: sk = bpf_sk_lookup_tcp(...);
2: subprog(sk) {
3: if (sk)
4: bpf_sk_release(sk);
5: }
6: if (!sk)
7: return 0;
8: return 1;
In the above, mark_ptr_or_null_regs will warn on line 6 because it will
try to free the reference state, even though it was already freed on
line 3.
Fixes: f4d7e40a5b ("bpf: introduce function calls (verification)")
Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
"bpftool map create" has an infinite loop on "while (argc)". The error
case is missing.
Symptoms: when forgetting to type the keyword 'type' in front of 'hash':
$ sudo bpftool map create /sys/fs/bpf/dir/foobar hash key 8 value 8 entries 128
(infinite loop, taking all the CPU)
^C
After the patch:
$ sudo bpftool map create /sys/fs/bpf/dir/foobar hash key 8 value 8 entries 128
Error: unknown arg hash
Fixes: 0b592b5a01 ("tools: bpftool: add map create command")
Signed-off-by: Alban Crequy <alban@kinvolk.io>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fix sparse warning:
arch/mips/net/ebpf_jit.c:196:5: warning:
symbol 'ebpf_to_mips_reg' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
As the comment notes, the return codes for TSYNC and NEW_LISTENER
conflict, because they both return positive values, one in the case of
success and one in the case of error. So, let's disallow both of these
flags together.
While this is technically a userspace break, all the users I know
of are still waiting on me to land this feature in libseccomp, so I
think it'll be safe. Also, at present my use case doesn't require
TSYNC at all, so this isn't a big deal to disallow. If someone
wanted to support this, a path forward would be to add a new flag like
TSYNC_AND_LISTENER_YES_I_UNDERSTAND_THAT_TSYNC_WILL_JUST_RETURN_EAGAIN,
but the use cases are so different I don't see it really happening.
Finally, it's worth noting that this does actually fix a UAF issue: at the
end of seccomp_set_mode_filter(), we have:
if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) {
if (ret < 0) {
listener_f->private_data = NULL;
fput(listener_f);
put_unused_fd(listener);
} else {
fd_install(listener, listener_f);
ret = listener;
}
}
out_free:
seccomp_filter_free(prepared);
But if ret > 0 because TSYNC raced, we'll install the listener fd and then
free the filter out from underneath it, causing a UAF when the task closes
it or dies. This patch also switches the condition to be simply if (ret),
so that if someone does add the flag mentioned above, they won't have to
remember to fix this too.
Reported-by: syzbot+b562969adb2e04af3442@syzkaller.appspotmail.com
Fixes: 6a21cc50f0 ("seccomp: add a return code to trap to userspace")
CC: stable@vger.kernel.org # v5.0+
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Fix a similar endless event loop as was done in commit
8dcf32175b ("i2c: prevent endless uevent loop with
CONFIG_I2C_DEBUG_CORE"):
The culprit is the dev_dbg printk in the i2c uevent handler. If
this is activated (for instance by CONFIG_I2C_DEBUG_CORE) it results
in an endless loop with systemd-journald.
This happens if user-space scans the system log and reads the uevent
file to get information about a newly created device, which seems
fair use to me. Unfortunately reading the "uevent" file uses the
same function that runs for creating the uevent for a new device,
generating the next syslog entry
Both CONFIG_I2C_DEBUG_CORE and CONFIG_POWER_SUPPLY_DEBUG were reported
in https://bugs.freedesktop.org/show_bug.cgi?id=76886 but only former
seems to have been fixed. Drop debug prints as it was done in I2C
subsystem to resolve the issue.
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Cc: Chris Healy <cphealy@gmail.com>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Since the migration of the driver to stop using the legacy
->select_chip() hook, there is nothing deselecting the target anymore,
thus the selection is not forced at the next access. Ensure the ND_RUN
bit and the interrupts are always in a clean state.
Cc: Daniel Mack <daniel@zonque.org>
Cc: stable@vger.kernel.org
Fixes: b25251414f ("mtd: rawnand: marvell: Stop implementing ->select_chip()")
Suggested-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Tested-by: Daniel Mack <daniel@zonque.org>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Compilation fails if any of undeclared clk_set_*() functions are in use
and CONFIG_HAVE_CLK=n.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
When a VCPU never runs before a guest exists, but we set timer registers
up via ioctls, the associated hrtimer might never get cancelled.
Since we moved vcpu_load/put into the arch-specific implementations and
only have load/put for KVM_RUN, we won't ever have a scheduled hrtimer
for emulating a timer when modifying the timer state via an ioctl from
user space. All we need to do is make sure that we pick up the right
state when we load the timer state next time userspace calls KVM_RUN
again.
We also do not need to worry about this interacting with the bg_timer,
because if we were in WFI from the guest, and somehow ended up in a
kvm_arm_timer_set_reg, it means that:
1. the VCPU thread has received a signal,
2. we have called vcpu_load when being scheduled in again,
3. we have called vcpu_put when we returned to userspace for it to issue
another ioctl
And therefore will not have a bg_timer programmed and the event is
treated as a spurious wakeup from WFI if userspace decides to run the
vcpu again even if there are not virtual interrupts.
This fixes stray virtual timer interrupts triggered by an expiring
hrtimer, which happens after a failed live migration, for instance.
Fixes: bee038a674 ("KVM: arm/arm64: Rework the timer code to use a timer_map")
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Reported-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The commit fc3a2fcaa1 ("mwifiex: use atomic bitops to represent
adapter status variables") had a fairly straightforward bug in it. It
contained this bit of diff:
- if (!adapter->is_suspended) {
+ if (test_bit(MWIFIEX_IS_SUSPENDED, &adapter->work_flags)) {
As you can see the patch missed the "!" when converting to the atomic
bitops. This meant that the resume hasn't done anything at all since
that commit landed and suspend/resume for mwifiex SDIO cards has been
totally broken.
After fixing this mwifiex suspend/resume appears to work again, at
least with the simple testing I've done.
Fixes: fc3a2fcaa1 ("mwifiex: use atomic bitops to represent adapter status variables")
Cc: <stable@vger.kernel.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
A failed KVM_ARM_VCPU_INIT should not set the vcpu target,
as the vcpu target is used by kvm_vcpu_initialized() to
determine if other vcpu ioctls may proceed. We need to set
the target before calling kvm_reset_vcpu(), but if that call
fails, we should then unset it and clear the feature bitmap
while we're at it.
Signed-off-by: Andrew Jones <drjones@redhat.com>
[maz: Simplified patch, completed commit message]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The syzkaller USB fuzzer spotted a slab-out-of-bounds bug in the
ds2490 driver. This bug is caused by improper use of the altsetting
array in the usb_interface structure (the array's entries are not
always stored in numerical order), combined with a naive assumption
that all interfaces probed by the driver will have the expected number
of altsettings.
The bug can be fixed by replacing references to the possibly
non-existent intf->altsetting[alt] entry with the guaranteed-to-exist
intf->cur_altsetting entry.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+d65f673b847a1a96cdba@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The syzkaller USB fuzzer found a general-protection-fault bug in the
yurex driver. The fault occurs when a device has been unplugged; the
driver's interrupt-URB handler logs an error message referring to the
device by name, after the device has been unregistered and its name
deallocated.
This problem is caused by the fact that the interrupt URB isn't
cancelled until the driver's private data structure is released, which
can happen long after the device is gone. The cure is to make sure
that the interrupt URB is killed before yurex_disconnect() returns;
this is exactly the sort of thing that usb_poison_urb() was meant for.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+2eb9121678bdb36e6d57@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change the validation of number_of_packets in get_pipe to compare the
number of packets to a fixed maximum number of packets allowed, set to
be 1024. This number was chosen due to it being used by other drivers as
well, for example drivers/usb/host/uhci-q.c
Background/reason:
The get_pipe function in stub_rx.c validates the number of packets in
isochronous mode and aborts with an error if that number is too large,
in order to prevent malicious input from possibly triggering large
memory allocations. This was previously done by checking whether
pdu->u.cmd_submit.number_of_packets is bigger than the number of packets
that would be needed for pdu->u.cmd_submit.transfer_buffer_length bytes
if all except possibly the last packet had maximum length, given by
usb_endpoint_maxp(epd) * usb_endpoint_maxp_mult(epd). This leads to an
error if URBs with packets shorter than the maximum possible length are
submitted, which is allowed according to
Documentation/driver-api/usb/URB.rst and occurs for example with the
snd-usb-audio driver.
Fixes: c6688ef9f2 ("usbip: fix stub_rx: harden CMD_SUBMIT path to handle malicious input")
Signed-off-by: Malte Leip <malte@leip.net>
Cc: stable <stable@vger.kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
regmap_update_bits could fail and deserves a check.
The patch adds the checks and if it fails, returns its error
code upstream.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Third batch of iwlwifi fixes intended for v5.1
* Fix an oops when creating debugfs entries;
* Fix bug when trying to capture debugging info while in rfkill;
* Prevent potential uninitialized memory dumps into debugging logs;
* Fix some initialization parameters for AX210 devices;
* Fix an oops with non-MSIX devices;
Add two Dell platform for headset mode.
[ Note: this is a further correction / addition of the previous
pin-based quirks for Dell machines; another entry for ALC236 with
the d-mic pin 0x12 and an entry for ALC295 -- tiwai ]
Fixes: b26e36b7ef ("ALSA: hda/realtek - add two more pin configuration sets to quirk table")
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
We need to dereference the directory to get its parent to
be able to rename it, so it's clearly not safe to try to
do this with ERR_PTR() pointers. Skip in this case.
It seems that this is most likely what was causing the
report by syzbot, but I'm not entirely sure as it didn't
come with a reproducer this time.
Cc: stable@vger.kernel.org
Reported-by: syzbot+4ece1a28b8f4730547c9@syzkaller.appspotmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit c82c06ce43d3("cfg80211: Notify all User Hints To self managed wiphys")
notified all new user hints to self managed wiphy's after device registration.
But it didn't do this for anything other than cell base hints done before
registration.
This needs to be done during wiphy registration of a self managed device also,
so that the previous user settings are retained.
Fixes: c82c06ce43 ("cfg80211: Notify all User Hints To self managed wiphys")
Signed-off-by: Sriram R <srirrama@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The txq of vif is added to active_txqs list for ATF TXQ scheduling
in the function ieee80211_queue_skb(), but it was not properly removed
before freeing the txq object. It was causing use after free of the txq
objects from the active_txqs list, result was kernel panic
due to invalid memory access.
Fix kernel invalid memory access by properly removing txq object
from active_txqs list before free the object.
Signed-off-by: Bhagavathi Perumal S <bperumal@codeaurora.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull Allwinner clk fixes from Maxime Ripard:
- Some fixes for odd cases of the NKMP clocks
* tag 'clk-fixes-for-5.1' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
clk: sunxi-ng: nkmp: Explain why zero width check is needed
clk: sunxi-ng: nkmp: Avoid GENMASK(-1, 0)
The syzkaller fuzzer reported a bug in the USB hub driver which turned
out to be caused by a negative runtime-PM usage counter. This allowed
a hub to be runtime suspended at a time when the driver did not expect
it. The symptom is a WARNING issued because the hub's status URB is
submitted while it is already active:
URB 0000000031fb463e submitted while active
WARNING: CPU: 0 PID: 2917 at drivers/usb/core/urb.c:363
The negative runtime-PM usage count was caused by an unfortunate
design decision made when runtime PM was first implemented for USB.
At that time, USB class drivers were allowed to unbind from their
interfaces without balancing the usage counter (i.e., leaving it with
a positive count). The core code would take care of setting the
counter back to 0 before allowing another driver to bind to the
interface.
Later on when runtime PM was implemented for the entire kernel, the
opposite decision was made: Drivers were required to balance their
runtime-PM get and put calls. In order to maintain backward
compatibility, however, the USB subsystem adapted to the new
implementation by keeping an independent usage counter for each
interface and using it to automatically adjust the normal usage
counter back to 0 whenever a driver was unbound.
This approach involves duplicating information, but what is worse, it
doesn't work properly in cases where a USB class driver delays
decrementing the usage counter until after the driver's disconnect()
routine has returned and the counter has been adjusted back to 0.
Doing so would cause the usage counter to become negative. There's
even a warning about this in the USB power management documentation!
As it happens, this is exactly what the hub driver does. The
kick_hub_wq() routine increments the runtime-PM usage counter, and the
corresponding decrement is carried out by hub_event() in the context
of the hub_wq work-queue thread. This work routine may sometimes run
after the driver has been unbound from its interface, and when it does
it causes the usage counter to go negative.
It is not possible for hub_disconnect() to wait for a pending
hub_event() call to finish, because hub_disconnect() is called with
the device lock held and hub_event() acquires that lock. The only
feasible fix is to reverse the original design decision: remove the
duplicate interface-specific usage counter and require USB drivers to
balance their runtime PM gets and puts. As far as I know, all
existing drivers currently do this.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+7634edaea4d0b341c625@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The syzkaller USB fuzzer identified a failure mode in which dummy-hcd
would never give back an unlinked URB. This causes usb_kill_urb() to
hang, leading to WARNINGs and unkillable threads.
In dummy-hcd, all URBs are given back by the dummy_timer() routine as
it scans through the list of pending URBS. Failure to give back URBs
can be caused by failure to start or early exit from the scanning
loop. The code currently has two such pathways: One is triggered when
an unsupported bus transfer speed is encountered, and the other by
exhausting the simulated bandwidth for USB transfers during a frame.
This patch removes those two paths, thereby allowing all unlinked URBs
to be given back in a timely manner. It adds a check for the bus
speed when the gadget first starts running, so that dummy_timer() will
never thereafter encounter an unsupported speed. And it prevents the
loop from exiting as soon as the total bandwidth has been used up (the
scanning loop continues, giving back unlinked URBs as they are found,
but not transferring any more data).
Thanks to Andrey Konovalov for manually running the syzkaller fuzzer
to help track down the source of the bug.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+d919b0f29d7b5a4994b9@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We recently introduced a change to support devm clk lookups. That change
introduced a code-path that used clk_find() without holding the
'clocks_mutex'. Unfortunately, clk_find() iterates over the 'clocks'
list and so we need to prevent the list from being modified at the same
time. Do this by holding the mutex and checking to make sure it's held
while iterating the list.
Note, we don't really care if the lookup is freed after we find it with
clk_find() because we're just doing a pointer comparison, but if we did
care we would need to keep holding the mutex while we dereference the
clk_lookup pointer.
Fixes: 3eee6c7d11 ("clkdev: add managed clkdev lookup registration")
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Jerome Brunet <jbrunet@baylibre.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Acked-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Tested-by: Jeffrey Hugo <jhugo@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
To minimize the latency of timer interrupts as observed by the guest,
KVM adjusts the values it programs into the host timers to account for
the host's overhead of programming and handling the timer event. In
the event that the adjustments are too aggressive, i.e. the timer fires
earlier than the guest expects, KVM busy waits immediately prior to
entering the guest.
Currently, KVM manually converts the delay from nanoseconds to clock
cycles. But, the conversion is done in the guest's time domain, while
the delay occurs in the host's time domain. This is perfectly ok when
the guest and host are using the same TSC ratio, but if the guest is
using a different ratio then the delay may not be accurate and could
wait too little or too long.
When the guest is not using the host's ratio, convert the delay from
guest clock cycles to host nanoseconds and use ndelay() instead of
__delay() to provide more accurate timing. Because converting to
nanoseconds is relatively expensive, e.g. requires division and more
multiplication ops, continue using __delay() directly when guest and
host TSCs are running at the same ratio.
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Fixes: 3b8a5df6c4 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The introduction of adaptive tuning of lapic timer advancement did not
allow for the scenario where userspace would want to disable adaptive
tuning but still employ timer advancement, e.g. for testing purposes or
to handle a use case where adaptive tuning is unable to settle on a
suitable time. This is epecially pertinent now that KVM places a hard
threshold on the maximum advancment time.
Rework the timer semantics to accept signed values, with a value of '-1'
being interpreted as "use adaptive tuning with KVM's internal default",
and any other value being used as an explicit advancement time, e.g. a
time of '0' effectively disables advancement.
Note, this does not completely restore the original behavior of
lapic_timer_advance_ns. Prior to tracking the advancement per vCPU,
which is necessary to support autotuning, userspace could adjust
lapic_timer_advance_ns for *running* vCPU. With per-vCPU tracking, the
module params are snapshotted at vCPU creation, i.e. applying a new
advancement effectively requires restarting a VM.
Dynamically updating a running vCPU is possible, e.g. a helper could be
added to retrieve the desired delay, choosing between the global module
param and the per-VCPU value depending on whether or not auto-tuning is
(globally) enabled, but introduces a great deal of complexity. The
wrapper itself is not complex, but understanding and documenting the
effects of dynamically toggling auto-tuning and/or adjusting the timer
advancement is nigh impossible since the behavior would be dependent on
KVM's implementation as well as compiler optimizations. In other words,
providing stable behavior would require extremely careful consideration
now and in the future.
Given that the expected use of a manually-tuned timer advancement is to
"tune once, run many", use the vastly simpler approach of recognizing
changes to the module params only when creating a new vCPU.
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 3b8a5df6c4 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Automatically adjusting the globally-shared timer advancement could
corrupt the timer, e.g. if multiple vCPUs are concurrently adjusting
the advancement value. That could be partially fixed by using a local
variable for the arithmetic, but it would still be susceptible to a
race when setting timer_advance_adjust_done.
And because virtual_tsc_khz and tsc_scaling_ratio are per-vCPU, the
correct calibration for a given vCPU may not apply to all vCPUs.
Furthermore, lapic_timer_advance_ns is marked __read_mostly, which is
effectively violated when finding a stable advancement takes an extended
amount of timer.
Opportunistically change the definition of lapic_timer_advance_ns to
a u32 so that it matches the style of struct kvm_timer. Explicitly
pass the param to kvm_create_lapic() so that it doesn't have to be
exposed to lapic.c, thus reducing the probability of unintentionally
using the global value instead of the per-vCPU value.
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 3b8a5df6c4 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To minimize the latency of timer interrupts as observed by the guest,
KVM adjusts the values it programs into the host timers to account for
the host's overhead of programming and handling the timer event. Now
that the timer advancement is automatically tuned during runtime, it's
effectively unbounded by default, e.g. if KVM is running as L1 the
advancement can measure in hundreds of milliseconds.
Disable timer advancement if adaptive tuning yields an advancement of
more than 5000ns, as large advancements can break reasonable assumptions
of the guest, e.g. that a timer configured to fire after 1ms won't
arrive on the next instruction. Although KVM busy waits to mitigate the
case of a timer event arriving too early, complications can arise when
shifting the interrupt too far, e.g. kvm-unit-test's vmx.interrupt test
will fail when its "host" exits on interrupts as KVM may inject the INTR
before the guest executes STI+HLT. Arguably the unit test is "broken"
in the sense that delaying a timer interrupt by 1ms doesn't technically
guarantee the interrupt will arrive after STI+HLT, but it's a reasonable
assumption that KVM should support.
Furthermore, an unbounded advancement also effectively unbounds the time
spent busy waiting, e.g. if the guest programs a timer with a very large
delay.
5000ns is a somewhat arbitrary threshold. When running on bare metal,
which is the intended use case, timer advancement is expected to be in
the general vicinity of 1000ns. 5000ns is high enough that false
positives are unlikely, while not being so high as to negatively affect
the host's performance/stability.
Note, a future patch will enable userspace to disable KVM's adaptive
tuning, which will allow priveleged userspace will to specifying an
advancement value in excess of this arbitrary threshold in order to
satisfy an abnormal use case.
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Fixes: 3b8a5df6c4 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It was reported that with some special Multi Processor Group configuration,
e.g:
bcdedit.exe /set groupsize 1
bcdedit.exe /set maxgroup on
bcdedit.exe /set groupaware on
for a 16-vCPU guest WS2012 shows BSOD on boot when PV TLB flush mechanism
is in use.
Tracing kvm_hv_flush_tlb immediately reveals the issue:
kvm_hv_flush_tlb: processor_mask 0x0 address_space 0x0 flags 0x2
The only flag set in this request is HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES,
however, processor_mask is 0x0 and no HV_FLUSH_ALL_PROCESSORS is specified.
We don't flush anything and apparently it's not what Windows expects.
TLFS doesn't say anything about such requests and newer Windows versions
seem to be unaffected. This all feels like a WS2012 bug, which is, however,
easy to workaround in KVM: let's flush everything when we see an empty
flush request, over-flushing doesn't hurt.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If guest sets MSR_IA32_TSCDEADLINE to value such that in host
time-domain it's shorter than lapic_timer_advance_ns, we can
reach a case that we call hrtimer_start() with expiration time set at
the past.
Because lapic_timer.timer is init with HRTIMER_MODE_ABS_PINNED, it
is not allowed to run in softirq and therefore will never expire.
To avoid such a scenario, verify that deadline expiration time is set on
host time-domain further than (now + lapic_timer_advance_ns).
A future patch can also consider adding a min_timer_deadline_ns module parameter,
similar to min_timer_period_us to avoid races that amount of ns it takes
to run logic could still call hrtimer_start() with expiration timer set
at the past.
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Specifically, max_tfd_queue_size should be 0x10000 like in
22560 family and not 0x100 like in 22000 family.
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
debugfs can now report an error code if something went wrong instead of
just NULL. So if the return value is to be used as a "real" dentry, it
needs to be checked if it is an error before dereferencing it.
This is now happening because of ff9fb72bc0 ("debugfs: return error
values, not NULL"). If multiple iwlwifi devices are in the system, this
can cause problems when the driver attempts to create the main debugfs
directory again. Later on in the code we fail horribly by trying to
dereference a pointer that is an error value.
Reported-by: Laura Abbott <labbott@redhat.com>
Reported-by: Gabriel Ramirez <gabriello.ramirez@gmail.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Intel Linux Wireless <linuxwifi@intel.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: stable <stable@vger.kernel.org> # 5.0
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
In ini debug TLVs bit 24 is set. The driver relies on it in the memory
allocation for the debug configuration. This implementation is
problematic in case of a new debug TLV that is not supported yet is added
and uses bit 24. In such a scenario the driver allocate space without
using it which causes errors in the apply point enabling flow.
Solve it by explicitly checking if a given TLV is part of the list of
the supported ini debug TLVs.
Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Fixes: f14cda6f3b ("iwlwifi: trans: parse and store debug ini TLVs")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
If we fail to initialize because rfkill is enabled, then trying
to do debug collection currently just fails. Prevent that in the
high-level code, although we should probably also fix the lower
level code to do things more carefully.
It's not 100% clear that it fixes this commit, as the original
dump code at the time might've been more careful. In any case,
we don't really need to dump anything in this expected scenario.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fixes: 7125648074 ("iwlwifi: add fw dump upon RT ucode start failure")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
The driver uses msix causes-register to handle both msix and non msix
interrupts when performing sync nmi. On devices that do not support
msix this register is unmapped and accessing it causes a kernel panic.
Solve this by differentiating the two cases and accessing the proper
causes-register in each case.
Reported-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Some drivers (such as the vub300 MMC driver) expect usb_string() to
return a properly NUL-terminated string, even when an error occurs.
(In fact, vub300's probe routine doesn't bother to check the return
code from usb_string().) When the driver goes on to use an
unterminated string, it leads to kernel errors such as
stack-out-of-bounds, as found by the syzkaller USB fuzzer.
An out-of-range string index argument is not at all unlikely, given
that some devices don't provide string descriptors and therefore list
0 as the value for their string indexes. This patch makes
usb_string() return a properly terminated empty string along with the
-EINVAL error code when an out-of-range index is encountered.
And since a USB string index is a single-byte value, indexes >= 256
are just as invalid as values of 0 or below.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: syzbot+b75b85111c10b8d680f1@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In most cases, kmalloc() will not be available early in boot when
pci_setup() is called. Thus, the kstrdup() call that was added to fix the
__initdata bug with the disable_acs_redir parameter usually returns NULL,
so the parameter is discarded and has no effect.
To fix this, store the string that's in initdata until an initcall function
can allocate the memory appropriately. This way we don't need any
additional static memory.
Fixes: d2fd6e8191 ("PCI: Fix __initdata issue with "pci=disable_acs_redir" parameter")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
symlink body shouldn't be freed without an RCU delay. Switch apparmorfs
to ->destroy_inode() and use of call_rcu(); free both the inode and symlink
body in the callback.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
symlink body shouldn't be freed without an RCU delay. Switch securityfs
to ->destroy_inode() and use of call_rcu(); free both the inode and symlink
body in the callback.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If called fast enough so samples do not increment, we can get
division by zero in kernel:
__div0
cpcap_battery_cc_raw_div
cpcap_battery_get_property
power_supply_get_property.part.1
power_supply_get_property
power_supply_show_property
power_supply_uevent
Fixes: 874b2adbed ("power: supply: cpcap-battery: Add a battery driver")
Signed-off-by: Tony Lindgren <tony@atomide.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
In case of 'L1_CACHE_SHIFT != 6' we define dummy assembly macroses
PREALLOC_INSTR and PREFETCHW_INSTR without arguments. However
we pass arguments to them in code which cause build errors.
Fix that.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Cc: <stable@vger.kernel.org> [5.0]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
There is a hardware bug in some POWER9 processors where a treclaim in
fake suspend mode can cause an inconsistency in the XER[SO] bit across
the threads of a core, the workaround being to force the core into SMT4
when doing the treclaim.
The FAKE_SUSPEND bit (bit 10) in the PSSCR is used to control whether a
thread is in fake suspend or real suspend. The important difference here
being that thread reconfiguration is blocked in real suspend but not
fake suspend mode.
When we exit a guest which was in fake suspend mode, we force the core
into SMT4 while we do the treclaim in kvmppc_save_tm_hv().
However on the new exit path introduced with the function
kvmhv_run_single_vcpu() we restore the host PSSCR before calling
kvmppc_save_tm_hv() which means that if we were in fake suspend mode we
put the thread into real suspend mode when we clear the
PSSCR[FAKE_SUSPEND] bit. This means that we block thread reconfiguration
and the thread which is trying to get the core into SMT4 before it can
do the treclaim spins forever since it itself is blocking thread
reconfiguration. The result is that that core is essentially lost.
This results in a trace such as:
[ 93.512904] CPU: 7 PID: 13352 Comm: qemu-system-ppc Not tainted 5.0.0 #4
[ 93.512905] NIP: c000000000098a04 LR: c0000000000cc59c CTR: 0000000000000000
[ 93.512908] REGS: c000003fffd2bd70 TRAP: 0100 Not tainted (5.0.0)
[ 93.512908] MSR: 9000000302883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE,TM[SE]> CR: 22222444 XER: 00000000
[ 93.512914] CFAR: c000000000098a5c IRQMASK: 3
[ 93.512915] PACATMSCRATCH: 0000000000000001
[ 93.512916] GPR00: 0000000000000001 c000003f6cc1b830 c000000001033100 0000000000000004
[ 93.512928] GPR04: 0000000000000004 0000000000000002 0000000000000004 0000000000000007
[ 93.512930] GPR08: 0000000000000000 0000000000000004 0000000000000000 0000000000000004
[ 93.512932] GPR12: c000203fff7fc000 c000003fffff9500 0000000000000000 0000000000000000
[ 93.512935] GPR16: 2000000000300375 000000000000059f 0000000000000000 0000000000000000
[ 93.512951] GPR20: 0000000000000000 0000000000080053 004000000256f41f c000003f6aa88ef0
[ 93.512953] GPR24: c000003f6aa89100 0000000000000010 0000000000000000 0000000000000000
[ 93.512956] GPR28: c000003f9e9a0800 0000000000000000 0000000000000001 c000203fff7fc000
[ 93.512959] NIP [c000000000098a04] pnv_power9_force_smt4_catch+0x1b4/0x2c0
[ 93.512960] LR [c0000000000cc59c] kvmppc_save_tm_hv+0x40/0x88
[ 93.512960] Call Trace:
[ 93.512961] [c000003f6cc1b830] [0000000000080053] 0x80053 (unreliable)
[ 93.512965] [c000003f6cc1b8a0] [c00800001e9cb030] kvmhv_p9_guest_entry+0x508/0x6b0 [kvm_hv]
[ 93.512967] [c000003f6cc1b940] [c00800001e9cba44] kvmhv_run_single_vcpu+0x2dc/0xb90 [kvm_hv]
[ 93.512968] [c000003f6cc1ba10] [c00800001e9cc948] kvmppc_vcpu_run_hv+0x650/0xb90 [kvm_hv]
[ 93.512969] [c000003f6cc1bae0] [c00800001e8f620c] kvmppc_vcpu_run+0x34/0x48 [kvm]
[ 93.512971] [c000003f6cc1bb00] [c00800001e8f2d4c] kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
[ 93.512972] [c000003f6cc1bb90] [c00800001e8e3918] kvm_vcpu_ioctl+0x460/0x7d0 [kvm]
[ 93.512974] [c000003f6cc1bd00] [c0000000003ae2c0] do_vfs_ioctl+0xe0/0x8e0
[ 93.512975] [c000003f6cc1bdb0] [c0000000003aeb24] ksys_ioctl+0x64/0xe0
[ 93.512978] [c000003f6cc1be00] [c0000000003aebc8] sys_ioctl+0x28/0x80
[ 93.512981] [c000003f6cc1be20] [c00000000000b3a4] system_call+0x5c/0x70
[ 93.512983] Instruction dump:
[ 93.512986] 419dffbc e98c0000 2e8b0000 38000001 60000000 60000000 60000000 40950068
[ 93.512993] 392bffff 39400000 79290020 39290001 <7d2903a6> 60000000 60000000 7d235214
To fix this we preserve the PSSCR[FAKE_SUSPEND] bit until we call
kvmppc_save_tm_hv() which will mean the core can get into SMT4 and
perform the treclaim. Note kvmppc_save_tm_hv() clears the
PSSCR[FAKE_SUSPEND] bit again so there is no need to explicitly do that.
Fixes: 95a6432ce9 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Add an explanation why zero width check is needed when generating factor
mask using GENMASK() macro.
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Sometimes one of the nkmp factors is unused. This means that one of the
factors shift and width values are set to 0. Current nkmp clock code
generates a mask for each factor with GENMASK(width + shift - 1, shift).
For unused factor this translates to GENMASK(-1, 0). This code is
further expanded by C preprocessor to final version:
(((~0UL) - (1UL << (0)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (-1))))
or a bit simplified:
(~0UL & (~0UL >> BITS_PER_LONG))
It turns out that result of the second part (~0UL >> BITS_PER_LONG) is
actually undefined by C standard, which clearly specifies:
"If the value of the right operand is negative or is greater than or
equal to the width of the promoted left operand, the behavior is
undefined."
Additionally, compiling kernel with aarch64-linux-gnu-gcc 8.3.0 gave
different results whether literals or variables with same values as
literals were used. GENMASK with literals -1 and 0 gives zero and with
variables gives 0xFFFFFFFFFFFFFFF (~0UL). Because nkmp driver uses
GENMASK with variables as parameter, expression calculates mask as ~0UL
instead of 0. This has further consequences that LSB in register is
always set to 1 (1 is neutral value for a factor and shift is 0).
For example, H6 pll-de clock is set to 600 MHz by sun4i-drm driver, but
due to this bug ends up being 300 MHz. Additionally, 300 MHz seems to be
too low because following warning can be found in dmesg:
[ 1.752763] WARNING: CPU: 2 PID: 41 at drivers/clk/sunxi-ng/ccu_common.c:41 ccu_helper_wait_for_lock.part.0+0x6c/0x90
[ 1.763378] Modules linked in:
[ 1.766441] CPU: 2 PID: 41 Comm: kworker/2:1 Not tainted 5.1.0-rc2-next-20190401 #138
[ 1.774269] Hardware name: Pine H64 (DT)
[ 1.778200] Workqueue: events deferred_probe_work_func
[ 1.783341] pstate: 40000005 (nZcv daif -PAN -UAO)
[ 1.788135] pc : ccu_helper_wait_for_lock.part.0+0x6c/0x90
[ 1.793623] lr : ccu_helper_wait_for_lock.part.0+0x48/0x90
[ 1.799107] sp : ffff000010f93840
[ 1.802422] x29: ffff000010f93840 x28: 0000000000000000
[ 1.807735] x27: ffff800073ce9d80 x26: ffff000010afd1b8
[ 1.813049] x25: ffffffffffffffff x24: 00000000ffffffff
[ 1.818362] x23: 0000000000000001 x22: ffff000010abd5c8
[ 1.823675] x21: 0000000010000000 x20: 00000000685f367e
[ 1.828987] x19: 0000000000001801 x18: 0000000000000001
[ 1.834300] x17: 0000000000000001 x16: 0000000000000000
[ 1.839613] x15: 0000000000000000 x14: ffff000010789858
[ 1.844926] x13: 0000000000000000 x12: 0000000000000001
[ 1.850239] x11: 0000000000000000 x10: 0000000000000970
[ 1.855551] x9 : ffff000010f936c0 x8 : ffff800074cec0d0
[ 1.860864] x7 : 0000800067117000 x6 : 0000000115c30b41
[ 1.866177] x5 : 00ffffffffffffff x4 : 002c959300bfe500
[ 1.871490] x3 : 0000000000000018 x2 : 0000000029aaaaab
[ 1.876802] x1 : 00000000000002e6 x0 : 00000000686072bc
[ 1.882114] Call trace:
[ 1.884565] ccu_helper_wait_for_lock.part.0+0x6c/0x90
[ 1.889705] ccu_helper_wait_for_lock+0x10/0x20
[ 1.894236] ccu_nkmp_set_rate+0x244/0x2a8
[ 1.898334] clk_change_rate+0x144/0x290
[ 1.902258] clk_core_set_rate_nolock+0x180/0x1b8
[ 1.906963] clk_set_rate+0x34/0xa0
[ 1.910455] sun8i_mixer_bind+0x484/0x558
[ 1.914466] component_bind_all+0x10c/0x230
[ 1.918651] sun4i_drv_bind+0xc4/0x1a0
[ 1.922401] try_to_bring_up_master+0x164/0x1c0
[ 1.926932] __component_add+0xa0/0x168
[ 1.930769] component_add+0x10/0x18
[ 1.934346] sun8i_dw_hdmi_probe+0x18/0x20
[ 1.938443] platform_drv_probe+0x50/0xa0
[ 1.942455] really_probe+0xcc/0x280
[ 1.946032] driver_probe_device+0x54/0xe8
[ 1.950130] __device_attach_driver+0x80/0xb8
[ 1.954488] bus_for_each_drv+0x78/0xc8
[ 1.958326] __device_attach+0xd4/0x130
[ 1.962163] device_initial_probe+0x10/0x18
[ 1.966348] bus_probe_device+0x90/0x98
[ 1.970185] deferred_probe_work_func+0x6c/0xa0
[ 1.974720] process_one_work+0x1e0/0x320
[ 1.978732] worker_thread+0x228/0x428
[ 1.982484] kthread+0x120/0x128
[ 1.985714] ret_from_fork+0x10/0x18
[ 1.989290] ---[ end trace 9babd42e1ca4b84f ]---
This commit solves the issue by first checking value of the factor
width. If it is equal to 0 (unused factor), mask is set to 0, otherwise
GENMASK() macro is used as before.
Fixes: d897ef56fa ("clk: sunxi-ng: Mask nkmp factors when setting register")
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
When disabling LPIs (for example on reset) at the redistributor
level, it is expected that LPIs that was pending in the CPU
interface are eventually retired.
Currently, this is not what is happening, and these LPIs will
stay in the ap_list, eventually being acknowledged by the vcpu
(which didn't quite expect this behaviour).
The fix is thus to retire these LPIs from the list of pending
interrupts as we disable LPIs.
Reported-by: Heyi Guo <guoheyi@huawei.com>
Tested-by: Heyi Guo <guoheyi@huawei.com>
Fixes: 0e4e82f154 ("KVM: arm64: vgic-its: Enable ITS emulation as a virtual MSI controller")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
1. Bump top level address-cells/size-cells nodes to 2 (to ensure all
down stream addresses are 64-bits, unless explicitly specified
otherwise (in "soc" bus with all peripherals)
2. "memory" also specified with address/size 2
3. Add a commented reference for PAE40 region beyond 4GB physical
address space
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
HSDK currently panics when built for HIGHMEM/ARC_HAS_PAE40 because ioc
is enabled with default which doesn't work for the 2 non contiguous
memory nodes. So get PAE working by disabling ioc instead.
Tested with !PAE40 by forcing @ioc_enable=0 and running the glibc
testsuite over ssh
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
We currently don't reload pointers pointing into skb header
after doing pskb_may_pull() in _decode_session4(). So in case
pskb_may_pull() changed the pointers, we read from random
memory. Fix this by putting all the needed infos on the
stack, so that we don't need to access the header pointers
after doing pskb_may_pull().
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Recently the generic timer test of kvm-unit-tests failed to complete
(stalled) when a physical timer is being used. This issue is caused
by incorrect update of CNTP_CVAL when CNTP_TVAL is being accessed,
introduced by 'Commit 84135d3d18 ("KVM: arm/arm64: consolidate arch
timer trap handlers")'. According to Arm ARM, the read/write behavior
of accesses to the TVAL registers is expected to be:
* READ: TimerValue = (CompareValue – (Counter - Offset)
* WRITE: CompareValue = ((Counter - Offset) + Sign(TimerValue)
This patch fixes the TVAL read/write code path according to the
specification.
Fixes: 84135d3d18 ("KVM: arm/arm64: consolidate arch timer trap handlers")
Signed-off-by: Wei Huang <wei@redhat.com>
[maz: commit message tidy-up]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
If an xfrmi is associated to a vrf layer 3 master device,
xfrm_policy_check() fails after traffic decapsulation. The input
interface is replaced by the layer 3 master device, and hence
xfrmi_decode_session() can't match the xfrmi anymore to satisfy
policy checking.
Extend ingress xfrmi lookup to honor the original layer 3 slave
device, allowing xfrm interfaces to operate within a vrf domain.
Fixes: f203b76d78 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
esp_output_udp_encap can produce a length that doesn't fit in the 16
bits of a UDP header's length field. In that case, we'll send a
fragmented packet whose length is larger than IP_MAX_MTU (resulting in
"Oversized IP packet" warnings on receive) and with a bogus UDP
length.
To prevent this, add a length check to esp_output_udp_encap and return
-EMSGSIZE on failure.
This seems to be older than git history.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In commit 6a53b75932 ("xfrm: check id proto in validate_tmpl()")
I introduced a check for xfrm protocol, but according to Herbert
IPSEC_PROTO_ANY should only be used as a wildcard for lookup, so
it should be removed from validate_tmpl().
And, IPSEC_PROTO_ANY is expected to only match 3 IPSec-specific
protocols, this is why xfrm_state_flush() could still miss
IPPROTO_ROUTING, which leads that those entries are left in
net->xfrm.state_all before exit net. Fix this by replacing
IPSEC_PROTO_ANY with zero.
This patch also extracts the check from validate_tmpl() to
xfrm_id_proto_valid() and uses it in parse_ipsecrequest().
With this, no other protocols should be added into xfrm.
Fixes: 6a53b75932 ("xfrm: check id proto in validate_tmpl()")
Reported-by: syzbot+0bf0519d6e0de15914fe@syzkaller.appspotmail.com
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Removed info log-message if ipip tunnel registration fails during
module-initialization: it adds nothing to the error message that is
written on all failures.
Fixes: dd9ee34440 ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
If tunnel registration failed during module initialization, the module
would fail to deregister the IPPROTO_COMP protocol and would attempt to
deregister the tunnel.
The tunnel was not deregistered during module-exit.
Fixes: dd9ee34440 ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This reverts commit f10e0010fa.
This commit was just wrong. It caused a lot of
syzbot warnings, so just revert it.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When unloading xfrm6_tunnel module, xfrm6_tunnel_fini directly
frees the xfrm6_tunnel_spi_kmem. Maybe someone has gotten the
xfrm6_tunnel_spi, so need to wait it.
Fixes: 91cc3bb0b04ff("xfrm6_tunnel: RCU conversion")
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
For rcu protected pointers, we'd better add '__rcu' for them.
Once added '__rcu' tag for rcu protected pointer, the sparse tool reports
warnings.
net/xfrm/xfrm_user.c:1198:39: sparse: expected struct sock *sk
net/xfrm/xfrm_user.c:1198:39: sparse: got struct sock [noderef] <asn:4> *nlsk
[...]
So introduce a new wrapper function of nlmsg_unicast to handle type
conversions.
This patch also fixes a direct access of a rcu protected socket.
Fixes: be33690d8fcf("[XFRM]: Fix aevent related crash")
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In esp4_gro_receive() and esp6_gro_receive(), secpath can be allocated
without adding xfrm state to xvec. Then, sp->xvec[sp->len - 1] would
fail and result in dereferencing invalid pointer in esp4_gso_segment()
and esp6_gso_segment(). Reset secpath if xfrm function returns error.
Fixes: 7785bba299 ("esp: Add a software GRO codepath")
Reported-by: syzbot+b69368fd933c6c592f4c@syzkaller.appspotmail.com
Signed-off-by: Myungho Jung <mhjungk@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
UBSAN report this:
UBSAN: Undefined behaviour in net/xfrm/xfrm_policy.c:1289:24
index 6 is out of range for type 'unsigned int [6]'
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.162-514.55.6.9.x86_64+ #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
0000000000000000 1466cf39b41b23c9 ffff8801f6b07a58 ffffffff81cb35f4
0000000041b58ab3 ffffffff83230f9c ffffffff81cb34e0 ffff8801f6b07a80
ffff8801f6b07a20 1466cf39b41b23c9 ffffffff851706e0 ffff8801f6b07ae8
Call Trace:
<IRQ> [<ffffffff81cb35f4>] __dump_stack lib/dump_stack.c:15 [inline]
<IRQ> [<ffffffff81cb35f4>] dump_stack+0x114/0x1a0 lib/dump_stack.c:51
[<ffffffff81d94225>] ubsan_epilogue+0x12/0x8f lib/ubsan.c:164
[<ffffffff81d954db>] __ubsan_handle_out_of_bounds+0x16e/0x1b2 lib/ubsan.c:382
[<ffffffff82a25acd>] __xfrm_policy_unlink+0x3dd/0x5b0 net/xfrm/xfrm_policy.c:1289
[<ffffffff82a2e572>] xfrm_policy_delete+0x52/0xb0 net/xfrm/xfrm_policy.c:1309
[<ffffffff82a3319b>] xfrm_policy_timer+0x30b/0x590 net/xfrm/xfrm_policy.c:243
[<ffffffff813d3927>] call_timer_fn+0x237/0x990 kernel/time/timer.c:1144
[<ffffffff813d8e7e>] __run_timers kernel/time/timer.c:1218 [inline]
[<ffffffff813d8e7e>] run_timer_softirq+0x6ce/0xb80 kernel/time/timer.c:1401
[<ffffffff8120d6f9>] __do_softirq+0x299/0xe10 kernel/softirq.c:273
[<ffffffff8120e676>] invoke_softirq kernel/softirq.c:350 [inline]
[<ffffffff8120e676>] irq_exit+0x216/0x2c0 kernel/softirq.c:391
[<ffffffff82c5edab>] exiting_irq arch/x86/include/asm/apic.h:652 [inline]
[<ffffffff82c5edab>] smp_apic_timer_interrupt+0x8b/0xc0 arch/x86/kernel/apic/apic.c:926
[<ffffffff82c5c985>] apic_timer_interrupt+0xa5/0xb0 arch/x86/entry/entry_64.S:735
<EOI> [<ffffffff81188096>] ? native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:52
[<ffffffff810834d7>] arch_safe_halt arch/x86/include/asm/paravirt.h:111 [inline]
[<ffffffff810834d7>] default_idle+0x27/0x430 arch/x86/kernel/process.c:446
[<ffffffff81085f05>] arch_cpu_idle+0x15/0x20 arch/x86/kernel/process.c:437
[<ffffffff8132abc3>] default_idle_call+0x53/0x90 kernel/sched/idle.c:92
[<ffffffff8132b32d>] cpuidle_idle_call kernel/sched/idle.c:156 [inline]
[<ffffffff8132b32d>] cpu_idle_loop kernel/sched/idle.c:251 [inline]
[<ffffffff8132b32d>] cpu_startup_entry+0x60d/0x9a0 kernel/sched/idle.c:299
[<ffffffff8113e119>] start_secondary+0x3c9/0x560 arch/x86/kernel/smpboot.c:245
The issue is triggered as this:
xfrm_add_policy
-->verify_newpolicy_info //check the index provided by user with XFRM_POLICY_MAX
//In my case, the index is 0x6E6BB6, so it pass the check.
-->xfrm_policy_construct //copy the user's policy and set xfrm_policy_timer
-->xfrm_policy_insert
--> __xfrm_policy_link //use the orgin dir, in my case is 2
--> xfrm_gen_index //generate policy index, there is 0x6E6BB6
then xfrm_policy_timer be fired
xfrm_policy_timer
--> xfrm_policy_id2dir //get dir from (policy index & 7), in my case is 6
--> xfrm_policy_delete
--> __xfrm_policy_unlink //access policy_count[dir], trigger out of range access
Add xfrm_policy_id2dir check in verify_newpolicy_info, make sure the computed dir is
valid, to fix the issue.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: e682adf021 ("xfrm: Try to honor policy index if it's supplied by user")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-03-01 12:17:41 +01:00
167 changed files with 1561 additions and 671 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.