Pull SCSI fixes from James Bottomley:
"Eight fixes.
The most important one is the mpt3sas fix which makes the driver work
again on big endian systems. The rest are mostly minor error path or
checker issues and the vmw_scsi one fixes a performance problem"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: vmw_pvscsi: Return DID_RESET for status SAM_STAT_COMMAND_TERMINATED
scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled
scsi: mpt3sas: Swap I/O memory read value back to cpu endianness
scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO
scsi: fcoe: drop frames in ELS LOGO error path
scsi: fcoe: fix use-after-free in fcoe_ctlr_els_send
scsi: qedi: Fix a potential buffer overflow
scsi: qla2xxx: Fix memory leak for allocating abort IOCB
This is purely a preparatory patch for upcoming changes during the 4.19
merge window.
We have a function called "boot_cpu_state_init()" that isn't really
about the bootup cpu state: that is done much earlier by the similarly
named "boot_cpu_init()" (note lack of "state" in name).
This function initializes some hotplug CPU state, and needs to run after
the percpu data has been properly initialized. It even has a comment to
that effect.
Except it _doesn't_ actually run after the percpu data has been properly
initialized. On x86 it happens to do that, but on at least arm and
arm64, the percpu base pointers are initialized by the arch-specific
'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().
This had some unexpected results, and in particular we have a patch
pending for the merge window that did the obvious cleanup of using
'this_cpu_write()' in the cpu hotplug init code:
- per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
+ this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
which is obviously the right thing to do. Except because of the
ordering issue, it actually failed miserably and unexpectedly on arm64.
So this just fixes the ordering, and changes the name of the function to
be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
hotplug state, because the core CPU state was supposed to have already
been done earlier.
Marked for stable, since the (not yet merged) patch that will show this
problem is marked for stable.
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs fixes from Al Viro:
"A bunch of race fixes, mostly around lazy pathwalk.
All of it is -stable fodder, a large part going back to 2013"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
make sure that __dentry_kill() always invalidates d_seq, unhashed or not
fix __legitimize_mnt()/mntput() race
fix mntput/mntput race
root dentries need RCU-delayed freeing
Pull networking fixes from David Miller:
"Last bit of straggler fixes...
1) Fix btf library licensing to LGPL, from Martin KaFai lau.
2) Fix error handling in bpf sockmap code, from Daniel Borkmann.
3) XDP cpumap teardown handling wrt. execution contexts, from Jesper
Dangaard Brouer.
4) Fix loss of runtime PM on failed vlan add/del, from Ivan
Khoronzhuk.
5) xen-netfront caches skb_shinfo(skb) across a __pskb_pull_tail()
call, which potentially changes the skb's data buffer, and thus
skb_shinfo(). Fix from Juergen Gross"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
xen/netfront: don't cache skb_shinfo()
net: ethernet: ti: cpsw: fix runtime_pm while add/kill vlan
net: ethernet: ti: cpsw: clear all entries when delete vid
xdp: fix bug in devmap teardown code path
samples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easier
xdp: fix bug in cpumap teardown code path
bpf, sockmap: fix cork timeout for select due to epipe
bpf, sockmap: fix leak in bpf_tcp_sendmsg wait for mem path
bpf, sockmap: fix bpf_tcp_sendmsg sock error handling
bpf: btf: Change tools/lib/bpf/btf to LGPL
Grygorii Strashko says:
====================
net: ethernet: ti: cpsw: fix runtime pm while add/del reserved vid
Here 2 not critical fixes for:
- vlan ale table leak while error if deleting vlan (simplifies next fix)
- runtime pm while try to set reserved vlan
====================
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's exclusive with normal behaviour but if try to set vlan to one of
the reserved values is made, the cpsw runtime pm is broken.
Fixes: a6c5d14f51 ("drivers: net: cpsw: ndev: fix accessing to suspended device")
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In cases if some of the entries were not found in forwarding table
while killing vlan, the rest not needed entries still left in the
table. No need to stop, as entry was deleted anyway. So fix this by
returning error only after all was cleaned. To implement this, return
-ENOENT in cpsw_ale_del_mcast() as it's supposed to be.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If zram supports writeback feature, it's no longer a
BD_CAP_SYNCHRONOUS_IO device beause zram does asynchronous IO operations
for incompressible pages.
Do not pretend to be synchronous IO device. It makes the system very
sluggish due to waiting for IO completion from upper layers.
Furthermore, it causes a user-after-free problem because swap thinks the
opearion is done when the IO functions returns so it can free the page
(e.g., lock_page_or_retry and goto out_release in do_swap_page) but in
fact, IO is asynchronous so the driver could access a just freed page
afterward.
This patch fixes the problem.
BUG: Bad page state in process qemu-system-x86 pfn:3dfab21
page:ffffdfb137eac840 count:0 mapcount:0 mapping:0000000000000000 index:0x1
flags: 0x17fffc000000008(uptodate)
raw: 017fffc000000008 dead000000000100 dead000000000200 0000000000000000
raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
bad because of flags: 0x8(uptodate)
CPU: 4 PID: 1039 Comm: qemu-system-x86 Tainted: G B 4.18.0-rc5+ #1
Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
Call Trace:
dump_stack+0x5c/0x7b
bad_page+0xba/0x120
get_page_from_freelist+0x1016/0x1250
__alloc_pages_nodemask+0xfa/0x250
alloc_pages_vma+0x7c/0x1c0
do_swap_page+0x347/0x920
__handle_mm_fault+0x7b4/0x1110
handle_mm_fault+0xfc/0x1f0
__get_user_pages+0x12f/0x690
get_user_pages_unlocked+0x148/0x1f0
__gfn_to_pfn_memslot+0xff/0x3c0 [kvm]
try_async_pf+0x87/0x230 [kvm]
tdp_page_fault+0x132/0x290 [kvm]
kvm_mmu_page_fault+0x74/0x570 [kvm]
kvm_arch_vcpu_ioctl_run+0x9b3/0x1990 [kvm]
kvm_vcpu_ioctl+0x388/0x5d0 [kvm]
do_vfs_ioctl+0xa2/0x630
ksys_ioctl+0x70/0x80
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x55/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Link: https://lore.kernel.org/lkml/0516ae2d-b0fd-92c5-aa92-112ba7bd32fc@contabo.de/
Link: http://lkml.kernel.org/r/20180802051112.86174-1-minchan@kernel.org
[minchan@kernel.org: fix changelog, add comment]
Link: https://lore.kernel.org/lkml/0516ae2d-b0fd-92c5-aa92-112ba7bd32fc@contabo.de/
Link: http://lkml.kernel.org/r/20180802051112.86174-1-minchan@kernel.org
Link: http://lkml.kernel.org/r/20180805233722.217347-1-minchan@kernel.org
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Tino Lehnig <tino.lehnig@contabo.de>
Tested-by: Tino Lehnig <tino.lehnig@contabo.de>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org> [4.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With gcc-8 fsanitize=null become very noisy. GCC started to complain
about things like &a->b, where 'a' is NULL pointer. There is no NULL
dereference, we just calculate address to struct member. It's
technically undefined behavior so UBSAN is correct to report it. But as
long as there is no real NULL-dereference, I think, we should be fine.
-fno-delete-null-pointer-checks compiler flag should protect us from any
consequences. So let's just no use -fsanitize=null as it's not useful
for us. If there is a real NULL-deref we will see crash. Even if
userspace mapped something at NULL (root can do this), with things like
SMAP should catch the issue.
Link: http://lkml.kernel.org/r/20180802153209.813-1-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull i2c fix from Wolfram Sang:
"A single driver bugfix for I2C.
The bug was found by systematically stress testing the driver, so I am
confident to merge it that late in the cycle although it is probably
unusually large"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: xlp9xx: Fix case where SSIF read transaction completes early
Daniel Borkmann says:
====================
pull-request: bpf 2018-08-10
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix cpumap and devmap on teardown as they're under RCU context
and won't have same assumption as running under NAPI protection,
from Jesper.
2) Fix various sockmap bugs in bpf_tcp_sendmsg() code, e.g. we had
a bug where socket error was not propagated correctly, from Daniel.
3) Fix incompatible libbpf header license for BTF code and match it
before it gets officially released with the rest of libbpf which
is LGPL-2.1, from Martin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
RCU pathwalk relies upon the assumption that anything that changes
->d_inode of a dentry will invalidate its ->d_seq. That's almost
true - the one exception is that the final dput() of already unhashed
dentry does *not* touch ->d_seq at all. Unhashing does, though,
so for anything we'd found by RCU dcache lookup we are fine.
Unfortunately, we can *start* with an unhashed dentry or jump into
it.
We could try and be careful in the (few) places where that could
happen. Or we could just make the final dput() invalidate the damn
thing, unhashed or not. The latter is much simpler and easier to
backport, so let's do it that way.
Reported-by: "Dae R. Jeong" <threeearcat@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
__legitimize_mnt() has two problems - one is that in case of success
the check of mount_lock is not ordered wrt preceding increment of
refcount, making it possible to have successful __legitimize_mnt()
on one CPU just before the otherwise final mntpu() on another,
with __legitimize_mnt() not seeing mntput() taking the lock and
mntput() not seeing the increment done by __legitimize_mnt().
Solved by a pair of barriers.
Another is that failure of __legitimize_mnt() on the second
read_seqretry() leaves us with reference that'll need to be
dropped by caller; however, if that races with final mntput()
we can end up with caller dropping rcu_read_lock() and doing
mntput() to release that reference - with the first mntput()
having freed the damn thing just as rcu_read_lock() had been
dropped. Solution: in "do mntput() yourself" failure case
grab mount_lock, check if MNT_DOOMED has been set by racing
final mntput() that has missed our increment and if it has -
undo the increment and treat that as "failure, caller doesn't
need to drop anything" case.
It's not easy to hit - the final mntput() has to come right
after the first read_seqretry() in __legitimize_mnt() *and*
manage to miss the increment done by __legitimize_mnt() before
the second read_seqretry() in there. The things that are almost
impossible to hit on bare hardware are not impossible on SMP
KVM, though...
Reported-by: Oleg Nesterov <oleg@redhat.com>
Fixes: 48a066e72d ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
mntput_no_expire() does the calculation of total refcount under mount_lock;
unfortunately, the decrement (as well as all increments) are done outside
of it, leading to false positives in the "are we dropping the last reference"
test. Consider the following situation:
* mnt is a lazy-umounted mount, kept alive by two opened files. One
of those files gets closed. Total refcount of mnt is 2. On CPU 42
mntput(mnt) (called from __fput()) drops one reference, decrementing component
* After it has looked at component #0, the process on CPU 0 does
mntget(), incrementing component #0, gets preempted and gets to run again -
on CPU 69. There it does mntput(), which drops the reference (component #69)
and proceeds to spin on mount_lock.
* On CPU 42 our first mntput() finishes counting. It observes the
decrement of component #69, but not the increment of component #0. As the
result, the total it gets is not 1 as it should've been - it's 0. At which
point we decide that vfsmount needs to be killed and proceed to free it and
shut the filesystem down. However, there's still another opened file
on that filesystem, with reference to (now freed) vfsmount, etc. and we are
screwed.
It's not a wide race, but it can be reproduced with artificial slowdown of
the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.
Fix consists of moving the refcount decrement under mount_lock; the tricky
part is that we want (and can) keep the fast case (i.e. mount that still
has non-NULL ->mnt_ns) entirely out of mount_lock. All places that zero
mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
before that mntput(). IOW, if mntput() observes (under rcu_read_lock())
a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
be dropped.
Reported-by: Jann Horn <jannh@google.com>
Tested-by: Jann Horn <jannh@google.com>
Fixes: 48a066e72d ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jesper Dangaard Brouer says:
====================
Removing entries from cpumap and devmap, goes through a number of
syncronization steps to make sure no new xdp_frames can be enqueued.
But there is a small chance, that xdp_frames remains which have not
been flushed/processed yet. Flushing these during teardown, happens
from RCU context and not as usual under RX NAPI context.
The optimization introduced in commt 389ab7f01a ("xdp: introduce
xdp_return_frame_rx_napi"), missed that the flush operation can also
be called from RCU context. Thus, we cannot always use the
xdp_return_frame_rx_napi call, which take advantage of the protection
provided by XDP RX running under NAPI protection.
The samples/bpf xdp_redirect_cpu have a --stress-mode, that is
adjusted to easier reproduce (verified by Red Hat QA).
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Like cpumap teardown, the devmap teardown code also flush remaining
xdp_frames, via bq_xmit_all() in case map entry is removed. The code
can call xdp_return_frame_rx_napi, from the the wrong context, in-case
ndo_xdp_xmit() fails.
Fixes: 389ab7f01a ("xdp: introduce xdp_return_frame_rx_napi")
Fixes: 735fc4054b ("xdp: change ndo_xdp_xmit API to support bulking")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The teardown race in cpumap is really hard to reproduce. These changes
makes it easier to reproduce, for QA.
The --stress-mode now have a case of a very small queue size of 8, that helps
to trigger teardown flush to encounter a full queue, which results in calling
xdp_return_frame API, in a non-NAPI protect context.
Also increase MAX_CPUS, as my QA department have larger machines than me.
Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When removing a cpumap entry, a number of syncronization steps happen.
Eventually the teardown code __cpu_map_entry_free is invoked from/via
call_rcu.
The teardown code __cpu_map_entry_free() flushes remaining xdp_frames,
by invoking bq_flush_to_queue, which calls xdp_return_frame_rx_napi().
The issues is that the teardown code is not running in the RX NAPI
code path. Thus, it is not allowed to invoke the NAPI variant of
xdp_return_frame.
This bug was found and triggered by using the --stress-mode option to
the samples/bpf program xdp_redirect_cpu. It is hard to trigger,
because the ptr_ring have to be full and cpumap bulk queue max
contains 8 packets, and a remote CPU is racing to empty the ptr_ring
queue.
Fixes: 389ab7f01a ("xdp: introduce xdp_return_frame_rx_napi")
Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull crypto fix from Herbert Xu:
"This fixes a performance regression in arm64 NEON crypto as well as a
crash in x86 aegis/morus on unsupported CPUs"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/aegis,morus - Fix and simplify CPUID checks
crypto: arm64 - revert NEON yield for fast AEAD implementations
Pull networking fixes from David Miller:
1) The real fix for the ipv6 route metric leak Sabrina was seeing, from
Cong Wang.
2) Fix syzbot triggers AF_PACKET v3 ring buffer insufficient room
conditions, from Willem de Bruijn.
3) vsock can reinitialize active work struct, fix from Cong Wang.
4) RXRPC keepalive generator can wedge a cpu, fix from David Howells.
5) Fix locking in AF_SMC ioctl, from Ursula Braun.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
dsa: slave: eee: Allow ports to use phylink
net/smc: move sock lock in smc_ioctl()
net/smc: allow sysctl rmem and wmem defaults for servers
net/smc: no shutdown in state SMC_LISTEN
net: aquantia: Fix IFF_ALLMULTI flag functionality
rxrpc: Fix the keepalive generator [ver #2]
net/mlx5e: Cleanup of dcbnl related fields
net/mlx5e: Properly check if hairpin is possible between two functions
vhost: reset metadata cache when initializing new IOTLB
llc: use refcount_inc_not_zero() for llc_sap_find()
dccp: fix undefined behavior with 'cwnd' shift in ccid2_cwnd_restart()
tipc: fix an interrupt unsafe locking scenario
vsock: split dwork to avoid reinitializations
net: thunderx: check for failed allocation lmac->dmacs
cxgb4: mk_act_open_req() buggers ->{local, peer}_ip on big-endian hosts
packet: refine ring v3 block size test to hold one frame
ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmit
ipv6: fix double refcount of fib6_metrics
During ipmi stress tests we see occasional failure of transactions
at the boot time. This happens in the case of a I2C_M_RECV_LEN
transactions, when the read transfer completes (with the initial
read length of 34) before the driver gets a chance to handle interrupts.
The current driver code expects at least 2 interrupts for I2C_M_RECV_LEN
transactions. The length is updated during the first interrupt, and the
buffer contents are only copied during subsequent interrupts. In case of
just one interrupt, we will complete the transaction without copying
out the bytes from RX fifo.
Update the code to drain the RX fifo after the length update,
so that the transaction completes correctly in all cases.
Signed-off-by: George Cherian <george.cherian@cavium.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
For a port to be able to use EEE, both the MAC and the PHY must
support EEE. A phy can be provided by both a phydev or phylink. Verify
at least one of these exist, not just phydev.
Fixes: aab9c4067d ("net: dsa: Plug in PHYLINK support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun says:
====================
net/smc: fixes 2018-08-08
here are small fixes for SMC: The first patch makes sure, shutdown code
is not executed for sockets in state SMC_LISTEN. The second patch resets
send and receive buffer values for accepted sockets, since TCP buffer size
optimizations for the internal CLC socket should not be forwarded to the
outer SMC socket. The third patch solves a race between connect and ioctl
reported by syzbot.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Without setsockopt SO_SNDBUF and SO_RCVBUF settings, the sysctl
defaults net.ipv4.tcp_wmem and net.ipv4.tcp_rmem should be the base
for the sizes of the SMC sndbuf and rcvbuf. Any TCP buffer size
optimizations for servers should be ignored.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Invoking shutdown for a socket in state SMC_LISTEN does not make
sense. Nevertheless programs like syzbot fuzzing the kernel may
try to do this. For SMC this means a socket refcounting problem.
This patch makes sure a shutdown call for an SMC socket in state
SMC_LISTEN simply returns with -ENOTCONN.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was noticed that NIC always pass all multicast traffic to the host
regardless of IFF_ALLMULTI flag on the interface.
The rule in MC Filter Table in NIC, that is configured to accept any
multicast packets, is turning on if IFF_MULTICAST flag is set on the
interface. It leads to passing all multicast traffic to the host.
This fix changes the condition to turn on that rule by checking
IFF_ALLMULTI flag as it should.
Fixes: b21f502f84 ("net:ethernet:aquantia: Fix for multicast filter handling.")
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AF_RXRPC has a keepalive message generator that generates a message for a
peer ~20s after the last transmission to that peer to keep firewall ports
open. The implementation is incorrect in the following ways:
(1) It mixes up ktime_t and time64_t types.
(2) It uses ktime_get_real(), the output of which may jump forward or
backward due to adjustments to the time of day.
(3) If the current time jumps forward too much or jumps backwards, the
generator function will crank the base of the time ring round one slot
at a time (ie. a 1s period) until it catches up, spewing out VERSION
packets as it goes.
Fix the problem by:
(1) Only using time64_t. There's no need for sub-second resolution.
(2) Use ktime_get_seconds() rather than ktime_get_real() so that time
isn't perceived to go backwards.
(3) Simplifying rxrpc_peer_keepalive_worker() by splitting it into two
parts:
(a) The "worker" function that manages the buckets and the timer.
(b) The "dispatch" function that takes the pending peers and
potentially transmits a keepalive packet before putting them back
in the ring into the slot appropriate to the revised last-Tx time.
(4) Taking everything that's pending out of the ring and splicing it into
a temporary collector list for processing.
In the case that there's been a significant jump forward, the ring
gets entirely emptied and then the time base can be warped forward
before the peers are processed.
The warping can't happen if the ring isn't empty because the slot a
peer is in is keepalive-time dependent, relative to the base time.
(5) Limit the number of iterations of the bucket array when scanning it.
(6) Set the timer to skip any empty slots as there's no point waking up if
there's nothing to do yet.
This can be triggered by an incoming call from a server after a reboot with
AF_RXRPC and AFS built into the kernel causing a peer record to be set up
before userspace is started. The system clock is then adjusted by
userspace, thereby potentially causing the keepalive generator to have a
meltdown - which leads to a message like:
watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:23]
...
Workqueue: krxrpcd rxrpc_peer_keepalive_worker
EIP: lock_acquire+0x69/0x80
...
Call Trace:
? rxrpc_peer_keepalive_worker+0x5e/0x350
? _raw_spin_lock_bh+0x29/0x60
? rxrpc_peer_keepalive_worker+0x5e/0x350
? rxrpc_peer_keepalive_worker+0x5e/0x350
? __lock_acquire+0x3d3/0x870
? process_one_work+0x110/0x340
? process_one_work+0x166/0x340
? process_one_work+0x110/0x340
? worker_thread+0x39/0x3c0
? kthread+0xdb/0x110
? cancel_delayed_work+0x90/0x90
? kthread_stop+0x70/0x70
? ret_from_fork+0x19/0x24
Fixes: ace45bec6d ("rxrpc: Fix firewall route keepalive")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
Mellanox, mlx5e fixes 2018-08-07
I know it is late into 4.18 release, and this is why I am submitting
only two mlx5e ethernet fixes.
The first one from Or, is needed for -stable and it fixes hairpin
for "same device" check.
The second fix is a non risk fix from Huy which cleans up and improves
error return value reporting for dcbnl_ieee_setapp.
For -stable v4.16
- net/mlx5e: Properly check if hairpin is possible between two functions
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unused netdev_registered_init/remove in en.h
Return ENOSUPPORT if the check MLX5_DSCP_SUPPORTED fails.
Remove extra white space
Fixes: 2a5e7a1344 ("net/mlx5e: Add dcbnl dscp to priority support")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Cc: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current check relies on function BDF addresses and can get
us wrong e.g when two VFs are assigned into a VM and the PCI
v-address is set by the hypervisor.
Fixes: 5c65c564c9 ('net/mlx5e: Support offloading TC NIC hairpin flows')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Alaa Hleihel <alaa@mellanox.com>
Tested-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For years I thought all parisc machines executed loads and stores in
order. However, Jeff Law recently indicated on gcc-patches that this is
not correct. There are various degrees of out-of-order execution all the
way back to the PA7xxx processor series (hit-under-miss). The PA8xxx
series has full out-of-order execution for both integer operations, and
loads and stores.
This is described in the following article:
http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml
For this reason, we need to define mb() and to insert a memory barrier
before the store unlocking spinlocks. This ensures that all memory
accesses are complete prior to unlocking. The ldcw instruction performs
the same function on entry.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
Enable the -mlong-calls compiler option by default, because otherwise in most
cases linking the vmlinux binary fails due to truncations of R_PARISC_PCREL22F
relocations. This fixes building the 64-bit defconfig.
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
Daniel Borkmann says:
====================
Two sockmap fixes in bpf_tcp_sendmsg(), and one fix for the
sockmap kernel selftest. Thanks!
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
I ran into the same issue as a009f1f396 ("selftests/bpf:
test_sockmap, timing improvements") where I had a broken
pipe error on the socket due to remote end timing out on
select and then shutting down it's sockets while the other
side was still sending. We may need to do a bigger rework
in general on the test_sockmap.c, but for now increase it
to a more suitable timeout.
Fixes: a18fda1a62 ("bpf: reduce runtime of test_sockmap tests")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In bpf_tcp_sendmsg() the sk_alloc_sg() may fail. In the case of
ENOMEM, it may also mean that we've partially filled the scatterlist
entries with pages. Later jumping to sk_stream_wait_memory()
we could further fail with an error for several reasons, however
we miss to call free_start_sg() if the local sk_msg_buff was used.
Fixes: 4f738adba3 ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
While working on bpf_tcp_sendmsg() code, I noticed that when a
sk->sk_err is set we error out with err = sk->sk_err. However
this is problematic since sk->sk_err is a positive error value
and therefore we will neither go into sk_stream_error() nor will
we report an error back to user space. I had this case with EPIPE
and user space was thinking sendmsg() succeeded since EPIPE is
a positive value, thinking we submitted 32 bytes. Fix it by
negating the sk->sk_err value.
Fixes: 4f738adba3 ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llc_sap_put() decreases the refcnt before deleting sap
from the global list. Therefore, there is a chance
llc_sap_find() could find a sap with zero refcnt
in this global list.
Close this race condition by checking if refcnt is zero
or not in llc_sap_find(), if it is zero then it is being
removed so we can just treat it as gone.
Reported-by: <syzbot+278893f3f7803871f7ce@syzkaller.appspotmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The shift of 'cwnd' with '(now - hc->tx_lsndtime) / hc->tx_rto' value
can lead to undefined behavior [1].
In order to fix this use a gradual shift of the window with a 'while'
loop, similar to what tcp_cwnd_restart() is doing.
When comparing delta and RTO there is a minor difference between TCP
and DCCP, the last one also invokes dccp_cwnd_restart() and reduces
'cwnd' if delta equals RTO. That case is preserved in this change.
[1]:
[40850.963623] UBSAN: Undefined behaviour in net/dccp/ccids/ccid2.c:237:7
[40851.043858] shift exponent 67 is too large for 32-bit type 'unsigned int'
[40851.127163] CPU: 3 PID: 15940 Comm: netstress Tainted: G W E 4.18.0-rc7.x86_64 #1
...
[40851.377176] Call Trace:
[40851.408503] dump_stack+0xf1/0x17b
[40851.451331] ? show_regs_print_info+0x5/0x5
[40851.503555] ubsan_epilogue+0x9/0x7c
[40851.548363] __ubsan_handle_shift_out_of_bounds+0x25b/0x2b4
[40851.617109] ? __ubsan_handle_load_invalid_value+0x18f/0x18f
[40851.686796] ? xfrm4_output_finish+0x80/0x80
[40851.739827] ? lock_downgrade+0x6d0/0x6d0
[40851.789744] ? xfrm4_prepare_output+0x160/0x160
[40851.845912] ? ip_queue_xmit+0x810/0x1db0
[40851.895845] ? ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
[40851.963530] ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
[40852.029063] dccp_xmit_packet+0x1d3/0x720 [dccp]
[40852.086254] dccp_write_xmit+0x116/0x1d0 [dccp]
[40852.142412] dccp_sendmsg+0x428/0xb20 [dccp]
[40852.195454] ? inet_dccp_listen+0x200/0x200 [dccp]
[40852.254833] ? sched_clock+0x5/0x10
[40852.298508] ? sched_clock+0x5/0x10
[40852.342194] ? inet_create+0xdf0/0xdf0
[40852.388988] sock_sendmsg+0xd9/0x160
...
Fixes: 113ced1f52 ("dccp ccid-2: Perform congestion-window validation")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 9faa89d4ed ("tipc: make function tipc_net_finalize() thread
safe") tries to make it thread safe to set node address, so it uses
node_list_lock lock to serialize the whole process of setting node
address in tipc_net_finalize(). But it causes the following interrupt
unsafe locking scenario:
CPU0 CPU1
---- ----
rht_deferred_worker()
rhashtable_rehash_table()
lock(&(&ht->lock)->rlock)
tipc_nl_compat_doit()
tipc_net_finalize()
local_irq_disable();
lock(&(&tn->node_list_lock)->rlock);
tipc_sk_reinit()
rhashtable_walk_enter()
lock(&(&ht->lock)->rlock);
<Interrupt>
tipc_disc_rcv()
tipc_node_check_dest()
tipc_node_create()
lock(&(&tn->node_list_lock)->rlock);
*** DEADLOCK ***
When rhashtable_rehash_table() holds ht->lock on CPU0, it doesn't
disable BH. So if an interrupt happens after the lock, it can create
an inverse lock ordering between ht->lock and tn->node_list_lock. As
a consequence, deadlock might happen.
The reason causing the inverse lock ordering scenario above is because
the initial purpose of node_list_lock is not designed to do the
serialization of node address setting.
As cmpxchg() can guarantee CAS (compare-and-swap) process is atomic,
we use it to replace node_list_lock to ensure setting node address can
be atomically finished. It turns out the potential deadlock can be
avoided as well.
Fixes: 9faa89d4ed ("tipc: make function tipc_net_finalize() thread safe")
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <maloy@donjonn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reported that we reinitialize an active delayed
work in vsock_stream_connect():
ODEBUG: init active (active state 0) object type: timer_list hint:
delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1414
WARNING: CPU: 1 PID: 11518 at lib/debugobjects.c:329
debug_print_object+0x16a/0x210 lib/debugobjects.c:326
The pattern is apparently wrong, we should only initialize
the dealyed work once and could repeatly schedule it. So we
have to move out the initializations to allocation side.
And to avoid confusion, we can split the shared dwork
into two, instead of re-using the same one.
Fixes: d021c34405 ("VSOCK: Introduce VM Sockets")
Reported-by: <syzbot+8a9b1bd330476a4f3db6@syzkaller.appspotmail.com>
Cc: Andy king <acking@vmware.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The allocation of lmac->dmacs is not being checked for allocation
failure. Add the check.
Fixes: 3a34ecfd9d ("net: thunderx: add MAC address filter tracking for LMAC")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike fs.val.lport and fs.val.fport, cxgb4_process_flow_match()
sets fs.val.{l,f}ip to net-endian values without conversion - they come
straight from flow_dissector_key_ipv4_addrs ->dst and ->src resp. So
the assignment in mk_act_open_req() ought to be a straight copy.
As far as I know, T4 PCIe cards do exist, so it's not as if that
thing could only be found on little-endian systems...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It turns out I had misunderstood how the x86_match_cpu() function works.
It evaluates a logical OR of the matching conditions, not logical AND.
This caused the CPU feature checks for AEGIS to pass even if only SSE2
(but not AES-NI) was supported (or vice versa), leading to potential
crashes if something tried to use the registered algs.
This patch switches the checks to a simpler method that is used e.g. in
the Camellia x86 code.
The patch also removes the MODULE_DEVICE_TABLE declarations which
actually seem to cause the modules to be auto-loaded at boot, which is
not desired. The crypto API on-demand module loading is sufficient.
Fixes: 1d373d4e8e ("crypto: x86 - Add optimized AEGIS implementations")
Fixes: 6ecc9d9ff9 ("crypto: x86 - Add optimized MORUS implementations")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
As it turns out, checking the TIF_NEED_RESCHED flag after each
iteration results in a significant performance regression (~10%)
when running fast algorithms (i.e., ones that use special instructions
and operate in the < 4 cycles per byte range) on in-order cores with
comparatively slow memory accesses such as the Cortex-A53.
Given the speed of these ciphers, and the fact that the page based
nature of the AEAD scatterwalk API guarantees that the core NEON
transform is never invoked with more than a single page's worth of
input, we can estimate the worst case duration of any resulting
scheduling blackout: on a 1 GHz Cortex-A53 running with 64k pages,
processing a page's worth of input at 4 cycles per byte results in
a delay of ~250 us, which is a reasonable upper bound.
So let's remove the yield checks from the fused AES-CCM and AES-GCM
routines entirely.
This reverts commit 7b67ae4d5c and
partially reverts commit 7c50136a8a.
Fixes: 7c50136a8a ("crypto: arm64/aes-ghash - yield NEON after every ...")
Fixes: 7b67ae4d5c ("crypto: arm64/aes-ccm - yield NEON after every ...")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pull GPIO fix from Linus Walleij:
"This is a single fix affecting X86 ACPI, and as such pretty important.
It is going to stable as well and have all the high-notch x86 platform
developers agreeing on it"
* tag 'gpio-v4.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpiolib-acpi: make sure we trigger edge events at least once on boot
TPACKET_V3 stores variable length frames in fixed length blocks.
Blocks must be able to store a block header, optional private space
and at least one minimum sized frame.
Frames, even for a zero snaplen packet, store metadata headers and
optional reserved space.
In the block size bounds check, ensure that the frame of the
chosen configuration fits. This includes sockaddr_ll and optional
tp_reserve.
Syzbot was able to construct a ring with insuffient room for the
sockaddr_ll in the header of a zero-length frame, triggering an
out-of-bounds write in dev_parse_header.
Convert the comparison to less than, as zero is a valid snap len.
This matches the test for minimum tp_frame_size immediately below.
Fixes: f6fb8f100b ("af-packet: TPACKET_V3 flexible buffer implementation.")
Fixes: eb73190f4f ("net/packet: refine check for priv area size")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since mountpoint crossing can happen without leaving lazy mode,
root dentries do need the same protection against having their
memory freed without RCU delay as everything else in the tree.
It's partially hidden by RCU delay between detaching from the
mount tree and dropping the vfsmount reference, but the starting
point of pathwalk can be on an already detached mount, in which
case umount-caused RCU delay has already passed by the time the
lazy pathwalk grabs rcu_read_lock(). If the starting point
happens to be at the root of that vfsmount *and* that vfsmount
covers the entire filesystem, we get trouble.
Fixes: 48a066e72d ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch changes the tools/lib/bpf/btf.[ch] to LGPL which
is inline with libbpf also.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
According to RFC791, 68 bytes is the minimum size of IPv4 datagram every
device must be able to forward without further fragmentation while 576
bytes is the minimum size of IPv4 datagram every device has to be able
to receive, so in ip6_tnl_xmit(), 68(IPV4_MIN_MTU) should be the right
value for the ipv4 min mtu check in ip6_tnl_xmit.
While at it, change to use max() instead of if statement.
Fixes: c9fefa0819 ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All the callers of ip6_rt_copy_init()/rt6_set_from() hold refcnt
of the "from" fib6_info, so there is no need to hold fib6_metrics
refcnt again, because fib6_metrics refcnt is only released when
fib6_info is gone, that is, they have the same life time, so the
whole fib6_metrics refcnt can be removed actually.
This fixes a kmemleak warning reported by Sabrina.
Fixes: 93531c6743 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull x86 fix from Thomas Gleixner:
"A single fix, which addresses boot failures on machines which do not
report EBDA correctly, which can place the trampoline into reserved
memory regions. Validating against E820 prevents that"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/compressed/64: Validate trampoline placement against E820
Pull timer fixes from Thomas Gleixner:
"Two oneliners addressing NOHZ failures:
- Use a bitmask to check for the pending timer softirq and not the
bit number. The existing code using the bit number checked for
the wrong bit, which caused timers to either expire late or stop
completely.
- Make the nohz evaluation on interrupt exit more robust. The
existing code did not re-arm the hardware when interrupting a
running softirq in task context (ksoftirqd or tail of
local_bh_enable()), which caused timers to either expire late
or stop completely"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
nohz: Fix missing tick reprogram when interrupting an inline softirq
nohz: Fix local_timer_softirq_pending()
Pull perf fixes from Thomas Gleixner:
"A set of fixes for perf:
Kernel side:
- Fix the hardcoded index of extra PCI devices on Broadwell which
caused a resource conflict and triggered warnings on CPU hotplug.
Tooling:
- Update the tools copy of several files, including perf_event.h,
powerpc's asm/unistd.h (new io_pgetevents syscall), bpf.h and x86's
memcpy_64.s (used in 'perf bench mem'), silencing the respective
warnings during the perf tools build.
- Fix the build on the alpine:edge distro"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Fix hardcoded index of Broadwell extra PCI devices
perf tools: Fix the build on the alpine:edge distro
tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'
tools headers uapi: Refresh linux/bpf.h copy
tools headers powerpc: Update asm/unistd.h copy to pick new
tools headers uapi: Update tools's copy of linux/perf_event.h
Pull irq fix from Thomas Gleixner:
"A single bugfix for the irq core to prevent silent data corruption and
malfunction of threaded interrupts under certain conditions"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Make force irq threading setup more robust
Pull networking fixes from David Miller:
1) Handle frames in error situations properly in AF_XDP, from Jakub
Kicinski.
2) tcp_mmap test case only tests ipv6 due to a thinko, fix from
Maninder Singh.
3) Session refcnt fix in l2tp_ppp, from Guillaume Nault.
4) Fix regression in netlink bind handling of multicast gruops, from
Dmitry Safonov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
netlink: Don't shift on 64 for ngroups
net/smc: no cursor update send in state SMC_INIT
l2tp: fix missing refcount drop in pppol2tp_tunnel_ioctl()
mlxsw: core_acl_flex_actions: Remove redundant mirror resource destruction
mlxsw: core_acl_flex_actions: Remove redundant counter destruction
mlxsw: core_acl_flex_actions: Remove redundant resource destruction
mlxsw: core_acl_flex_actions: Return error for conflicting actions
selftests/bpf: update test_lwt_seg6local.sh according to iproute2
drivers: net: lmc: fix case value for target abort error
selftest/net: fix protocol family to work for IPv4.
net: xsk: don't return frames via the allocator on error
tools/bpftool: fix a percpu_array map dump problem
Pull usercopy whitelisting fix from Kees Cook:
"Bart Massey discovered that the usercopy whitelist for JFS was
incomplete: the inline inode data may intentionally "overflow" into
the neighboring "extended area", so the size of the whitelist needed
to be raised to include the neighboring field"
* tag 'usercopy-fix-v4.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
jfs: Fix usercopy whitelist for inline inode data
Pull xfs bugfix from Darrick Wong:
"One more patch for 4.18 to fix a coding error in the iomap_bmap()
function introduced in -rc1: fix incorrect shifting"
* tag 'xfs-4.18-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fs: fix iomap_bmap position calculation
It turns out that commit 721c7fc701 ("block: fail op_is_write()
requests to read-only partitions"), while obviously correct, causes
problems for some older lvm2 installations.
The reason is that the lvm snapshotting will continue to write to the
snapshow COW volume, even after the volume has been marked read-only.
End result: snapshot failure.
This has actually been fixed in newer version of the lvm2 tool, but the
old tools still exist, and the breakage was reported both in the kernel
bugzilla and in the Debian bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=200439https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900442
The lvm2 fix is here
https://sourceware.org/git/?p=lvm2.git;a=commit;h=a6fdb9d9d70f51c49ad11a87ab4243344e6701a3
but until everybody has updated to recent versions, we'll have to weaken
the "never write to read-only partitions" check. It now allows the
write to happen, but causes a warning, something like this:
generic_make_request: Trying to write to read-only block-device dm-3 (partno X)
Modules linked in: nf_tables xt_cgroup xt_owner kvm_intel iwlmvm kvm irqbypass iwlwifi
CPU: 1 PID: 77 Comm: kworker/1:1 Not tainted 4.17.9-gentoo #3
Hardware name: LENOVO 20B6A019RT/20B6A019RT, BIOS GJET91WW (2.41 ) 09/21/2016
Workqueue: ksnaphd do_metadata
RIP: 0010:generic_make_request_checks+0x4ac/0x600
...
Call Trace:
generic_make_request+0x64/0x400
submit_bio+0x6c/0x140
dispatch_io+0x287/0x430
sync_io+0xc3/0x120
dm_io+0x1f8/0x220
do_metadata+0x1d/0x30
process_one_work+0x1b9/0x3e0
worker_thread+0x2b/0x3c0
kthread+0x113/0x130
ret_from_fork+0x35/0x40
Note that this is a "revert" in behavior only. I'm leaving alone the
actual code cleanups in commit 721c7fc701, but letting the previously
uncaught request go through with a warning instead of stopping it.
Fixes: 721c7fc701 ("block: fail op_is_write() requests to read-only partitions")
Reported-and-tested-by: WGH <wgh@torlan.ru>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's legal to have 64 groups for netlink_sock.
As user-supplied nladdr->nl_groups is __u32, it's possible to subscribe
only to first 32 groups.
The check for correctness of .bind() userspace supplied parameter
is done by applying mask made from ngroups shift. Which broke Android
as they have 64 groups and the shift for mask resulted in an overflow.
Fixes: 61f4b23769 ("netlink: Don't shift with UB on nlk->ngroups")
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Reported-and-Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2018-08-05
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix bpftool percpu_array dump by using correct roundup to next
multiple of 8 for the value size, from Yonghong.
2) Fix in AF_XDP's __xsk_rcv_zc() to not returning frames back to
allocator since driver will recycle frame anyway in case of an
error, from Jakub.
3) Fix up BPF test_lwt_seg6local test cases to final iproute2
syntax, from Mathieu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If a writer blocked condition is received without data, the current
consumer cursor is immediately sent. Servers could already receive this
condition in state SMC_INIT without finished tx-setup. This patch
avoids sending a consumer cursor update in this case.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bart Massey reported what turned out to be a usercopy whitelist false
positive in JFS when symlink contents exceeded 128 bytes. The inline
inode data (i_inline) is actually designed to overflow into the "extended
area" following it (i_inline_ea) when needed. So the whitelist needed to
be expanded to include both i_inline and i_inline_ea (the whole size
of which is calculated internally using IDATASIZE, 256, instead of
sizeof(i_inline), 128).
$ cd /mnt/jfs
$ touch $(perl -e 'print "B" x 250')
$ ln -s B* b
$ ls -l >/dev/null
[ 249.436410] Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object 'jfs_ip' (offset 616, size 250)!
Reported-by: Bart Massey <bart.massey@gmail.com>
Fixes: 8d2704d382 ("jfs: Define usercopy region in jfs_ip slab cache")
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: jfs-discussion@lists.sourceforge.net
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Pull KVM fixes from Paolo Bonzini:
"Two vmx bugfixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86: vmx: fix vpid leak
KVM: vmx: use local variable for current_vmptr when emulating VMPTRST
If 'session' is not NULL and is not a PPP pseudo-wire, then we fail to
drop the reference taken by l2tp_session_get().
Fixes: ecd012e45a ("l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
mlxsw: Fix ACL actions error condition handling
Nir says:
Two issues were lately noticed within mlxsw ACL actions error condition
handling. The first patch deals with conflicting actions such as:
# tc filter add dev swp49 parent ffff: \
protocol ip pref 10 flower skip_sw dst_ip 192.168.101.1 \
action goto chain 100 \
action mirred egress redirect dev swp4
The second action will never execute, however SW model allows this
configuration, while the mlxsw driver cannot allow for it as it
implements actions in sets of up to three actions per set with a single
termination marking. Conflicting actions create a contradiction over
this single marking and thus cannot be configured. The fix replaces a
misplaced warning with an error code to be returned.
Patches 2-4 fix a condition of duplicate destruction of resources. Some
actions require allocation of specific resource prior to setting the
action itself. On error condition this resource was destroyed twice,
leading to a crash when using mirror action, and to a redundant
destruction in other cases, since for error condition rule destruction
also takes care of resource destruction. In order to fix this state a
symmetry in behavior is added and resource destruction also takes care
of removing the resource from rule's resource list.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In previous patch mlxsw_afa_resource_del() was added to avoid a duplicate
resource detruction scenario.
For mirror actions, such duplicate destruction leads to a crash as in:
# tc qdisc add dev swp49 ingress
# tc filter add dev swp49 parent ffff: \
protocol ip chain 100 pref 10 \
flower skip_sw dst_ip 192.168.101.1 action drop
# tc filter add dev swp49 parent ffff: \
protocol ip pref 10 \
flower skip_sw dst_ip 192.168.101.1 action goto chain 100 \
action mirred egress mirror dev swp4
Therefore add a call to mlxsw_afa_resource_del() in
mlxsw_afa_mirror_destroy() in order to clear that resource
from rule's resources.
Fixes: d0d13c1858 ("mlxsw: spectrum_acl: Add support for mirror action")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each tc flower rule uses a hidden count action. As counter resource may
not be available due to limited HW resources, update _counter_create()
and _counter_destroy() pair to follow previously introduced symmetric
error condition handling, add a call to mlxsw_afa_resource_del() as part
of the counter resource destruction.
Fixes: c18c1e186b ("mlxsw: core: Make counter index allocated inside the action append")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some ACL actions require the allocation of a separate resource
prior to applying the action itself. When facing an error condition
during the setup phase of the action, resource should be destroyed.
For such actions the destruction was done twice which is dangerous
and lead to a potential crash.
The destruction took place first upon error on action setup phase
and then as the rule was destroyed.
The following sequence generated a crash:
# tc qdisc add dev swp49 ingress
# tc filter add dev swp49 parent ffff: \
protocol ip chain 100 pref 10 \
flower skip_sw dst_ip 192.168.101.1 action drop
# tc filter add dev swp49 parent ffff: \
protocol ip pref 10 \
flower skip_sw dst_ip 192.168.101.1 action goto chain 100 \
action mirred egress mirror dev swp4
Therefore add mlxsw_afa_resource_del() as a complement of
mlxsw_afa_resource_add() to add symmetry to resource_list membership
handling. Call this from mlxsw_afa_fwd_entry_ref_destroy() to make the
_fwd_entry_ref_create() and _fwd_entry_ref_destroy() pair of calls a
NOP.
Fixes: 140ce42121 ("mlxsw: core: Convert fwd_entry_ref list to be generic per-block resource list")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Spectrum switch ACL action set is built in groups of three actions
which may point to additional actions. A group holds a single record
which can be set as goto record for pointing at a following group
or can be set to mark the termination of the lookup. This is perfectly
adequate for handling a series of actions to be executed on a packet.
While the SW model allows configuration of conflicting actions
where it is clear that some actions will never execute, the mlxsw
driver must block such configurations as it creates a conflict
over the single terminate/goto record value.
For a conflicting actions configuration such as:
# tc filter add dev swp49 parent ffff: \
protocol ip pref 10 \
flower skip_sw dst_ip 192.168.101.1 \
action goto chain 100 \
action mirred egress mirror dev swp4
Where it is clear that the last action will never execute, the
mlxsw driver was issuing a warning instead of returning an error.
Therefore replace that warning with an error for this specific
case.
Fixes: 4cda7d8d70 ("mlxsw: core: Introduce flexible actions support")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commands that are reset are returned with status
SAM_STAT_COMMAND_TERMINATED. PVSCSI currently returns DID_OK |
SAM_STAT_COMMAND_TERMINATED which fails the command. Instead, set hostbyte
to DID_RESET to allow upper layers to retry.
Tested by copying a large file between two pvscsi disks on same adapter
while performing a bus reset at 1-second intervals. Before fix, commands
sometimes fail with DID_OK. After fix, commands observed to fail with
DID_RESET.
Signed-off-by: Jim Gill <jgill@vmware.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Surround scsi_execute() calls with scsi_autopm_get_device() and
scsi_autopm_put_device(). Note: removing sr_mutex protection from the
scsi_cd_get() and scsi_cd_put() calls is safe because the purpose of
sr_mutex is to serialize cdrom_*() calls.
This patch avoids that complaints similar to the following appear in the
kernel log if runtime power management is enabled:
INFO: task systemd-udevd:650 blocked for more than 120 seconds.
Not tainted 4.18.0-rc7-dbg+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
systemd-udevd D28176 650 513 0x00000104
Call Trace:
__schedule+0x444/0xfe0
schedule+0x4e/0xe0
schedule_preempt_disabled+0x18/0x30
__mutex_lock+0x41c/0xc70
mutex_lock_nested+0x1b/0x20
__blkdev_get+0x106/0x970
blkdev_get+0x22c/0x5a0
blkdev_open+0xe9/0x100
do_dentry_open.isra.19+0x33e/0x570
vfs_open+0x7c/0xd0
path_openat+0x6e3/0x1120
do_filp_open+0x11c/0x1c0
do_sys_open+0x208/0x2d0
__x64_sys_openat+0x59/0x70
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Maurizio Lombardi <mlombard@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: <stable@vger.kernel.org>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Swap the I/O memory read value back to cpu endianness before storing it in
a data structures which are defined in the MPI headers where u8 components
are not defined in the endianness order.
In this area from day one mpt3sas driver is using le32_to_cpu() &
cpu_to_le32() APIs. But in commit cf6bf9710c
(mpt3sas: Bug fix for big endian systems) we have removed these APIs
before reading I/O memory which we should haven't done it. So
in this patch I am correcting it by adding these APIs back
before accessing I/O memory.
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull rdma fix from Jason Gunthorpe:
"One bug for missing user input validation: refuse invalid port numbers
in the modify_qp system call"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/uverbs: Expand primary and alt AV port checks
Pull block fix from Jens Axboe:
"Just a single fix, from Ming, fixing a regression in this cycle where
the busy tag iteration was changed to only calling the callback
function for requests that are started. We really want all non-free
requests.
This fixes a boot regression on certain VM setups"
* tag 'for-linus-20180803' of git://git.kernel.dk/linux-block:
blk-mq: fix blk_mq_tagset_busy_iter
Pull powerpc fixes from Michael Ellerman:
"One fix for a regression in a recent TLB flush optimisation, which
caused us to incorrectly not send TLB invalidations to coprocessors.
Thanks to Frederic Barrat, Nicholas Piggin, Vaibhav Jain"
* tag 'powerpc-4.18-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/radix: Fix missing global invalidations when removing copro
Pull drm fixes from Dave Airlie:
"Nothing too major at this late stage:
- adv7511: reset fix
- vc4: scaling fix
- two atomic core fixes
- one legacy core error handling fix
I had a bunch of driver fixes from hdlcd but I think I'll leave them
for -next at this point"
* tag 'drm-fixes-2018-08-03' of git://anongit.freedesktop.org/drm/drm:
drm/vc4: Reset ->{x, y}_scaling[1] when dealing with uniplanar formats
drm/atomic: Initialize variables in drm_atomic_helper_async_check() to make gcc happy
drm/atomic: Check old_plane_state->crtc in drm_atomic_helper_async_check()
drm: re-enable error handling
drm/bridge: adv7511: Reset registers on hotplug
Pull crypto fix from Herbert Xu:
"Fix a memory corruption in the padlock-aes driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: padlock-aes - Fix Nano workaround data corruption
The full nohz tick is reprogrammed in irq_exit() only if the exit is not in
a nesting interrupt. This stands as an optimization: whether a hardirq or a
softirq is interrupted, the tick is going to be reprogrammed when necessary
at the end of the inner interrupt, with even potential new updates on the
timer queue.
When soft interrupts are interrupted, it's assumed that they are executing
on the tail of an interrupt return. In that case tick_nohz_irq_exit() is
called after softirq processing to take care of the tick reprogramming.
But the assumption is wrong: softirqs can be processed inline as well, ie:
outside of an interrupt, like in a call to local_bh_enable() or from
ksoftirqd.
Inline softirqs don't reprogram the tick once they are done, as opposed to
interrupt tail softirq processing. So if a tick interrupts an inline
softirq processing, the next timer will neither be reprogrammed from the
interrupting tick's irq_exit() nor after the interrupted softirq
processing. This situation may leave the tick unprogrammed while timers are
armed.
To fix this, simply keep reprogramming the tick even if a softirq has been
interrupted. That can be optimized further, but for now correctness is more
important.
Note that new timers enqueued in nohz_full mode after a softirq gets
interrupted will still be handled just fine through self-IPIs triggered by
the timer code.
Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: stable@vger.kernel.org # 4.14+
Link: https://lkml.kernel.org/r/1533303094-15855-1-git-send-email-frederic@kernel.org
The support of force threading interrupts which are set up with both a
primary and a threaded handler wreckaged the setup of regular requested
threaded interrupts (primary handler == NULL).
The reason is that it does not check whether the primary handler is set to
the default handler which wakes the handler thread. Instead it replaces the
thread handler with the primary handler as it would do with force threaded
interrupts which have been requested via request_irq(). So both the primary
and the thread handler become the same which then triggers the warnon that
the thread handler tries to wakeup a not configured secondary thread.
Fortunately this only happens when the driver omits the IRQF_ONESHOT flag
when requesting the threaded interrupt, which is normaly caught by the
sanity checks when force irq threading is disabled.
Fix it by skipping the force threading setup when a regular threaded
interrupt is requested. As a consequence the interrupt request which lacks
the IRQ_ONESHOT flag is rejected correctly instead of silently wreckaging
it.
Fixes: 2a1d3ab898 ("genirq: Handle force threading of irqs with primary and thread handler")
Reported-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
Cc: stable@vger.kernel.org
The shell file for test_lwt_seg6local contains an early iproute2 syntax
for installing a seg6local End.BPF route. iproute2 support for this
feature has recently been upstreamed, but with an additional keyword
required. This patch updates test_lwt_seg6local.sh to the definitive
iproute2 syntax
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull media fixes from Mauro Carvalho Chehab:
- a deadlock regression at vsp1 driver
- some Remote Controller fixes related to the new BPF filter logic
added on it for Kernel 4.18.
* tag 'media/v4.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: v4l: vsp1: Fix deadlock in VSPDL DRM pipelines
media: rc: read out of bounds if bpf reports high protocol number
media: bpf: ensure bpf program is freed on detach
media: rc: be less noisy when driver misbehaves
Merge misc fixes from Andrew Morton:
"3 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
userfaultfd: remove uffd flags from vma->vm_flags if UFFD_EVENT_FORK fails
ipc/shm.c add ->pagesize function to shm_vm_ops
memcg: remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
Commit 05ea88608d ("mm, hugetlbfs: introduce ->pagesize() to
vm_operations_struct") adds a new ->pagesize() function to
hugetlb_vm_ops, intended to cover all hugetlbfs backed files.
With System V shared memory model, if "huge page" is specified, the
"shared memory" is backed by hugetlbfs files, but the mappings initiated
via shmget/shmat have their original vm_ops overwritten with shm_vm_ops,
so we need to add a ->pagesize function to shm_vm_ops. Otherwise,
vma_kernel_pagesize() returns PAGE_SIZE given a hugetlbfs backed vma,
result in below BUG:
fs/hugetlbfs/inode.c
443 if (unlikely(page_mapped(page))) {
444 BUG_ON(truncate_op);
resulting in
hugetlbfs: oracle (4592): Using mlock ulimits for SHM_HUGETLB is deprecated
------------[ cut here ]------------
kernel BUG at fs/hugetlbfs/inode.c:444!
Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ...
CPU: 35 PID: 5583 Comm: oracle_5583_sbt Not tainted 4.14.35-1829.el7uek.x86_64 #2
RIP: 0010:remove_inode_hugepages+0x3db/0x3e2
....
Call Trace:
hugetlbfs_evict_inode+0x1e/0x3e
evict+0xdb/0x1af
iput+0x1a2/0x1f7
dentry_unlink_inode+0xc6/0xf0
__dentry_kill+0xd8/0x18d
dput+0x1b5/0x1ed
__fput+0x18b/0x216
____fput+0xe/0x10
task_work_run+0x90/0xa7
exit_to_usermode_loop+0xdd/0x116
do_syscall_64+0x187/0x1ae
entry_SYSCALL_64_after_hwframe+0x150/0x0
[jane.chu@oracle.com: relocate comment]
Link: http://lkml.kernel.org/r/20180731044831.26036-1-jane.chu@oracle.com
Link: http://lkml.kernel.org/r/20180727211727.5020-1-jane.chu@oracle.com
Fixes: 05ea88608d ("mm, hugetlbfs: introduce ->pagesize() to vm_operations_struct")
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Current value for a target abort error is 0x010, however, this value
should in fact be 0x002. As it stands, the range of error is 0..7 so
it is currently never being detected. This bug has been in the driver
since the early 2.6.12 days (or before).
Detected by CoverityScan, CID#744290 ("Logically dead code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The position calculation in iomap_bmap() shifts bno the wrong way,
so we don't progress properly and end up re-mapping block zero
over and over, yielding an unchanging physical block range as the
logical block advances:
# filefrag -Be file
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 0: 21.. 21: 1: merged
1: 1.. 1: 21.. 21: 1: 22: merged
Discontinuity: Block 1 is at 21 (was 22)
2: 2.. 2: 21.. 21: 1: 22: merged
Discontinuity: Block 2 is at 21 (was 22)
3: 3.. 3: 21.. 21: 1: 22: merged
This breaks the FIBMAP interface for anyone using it (XFS), which
in turn breaks LILO, zipl, etc.
Bug-actually-spotted-by: Darrick J. Wong <darrick.wong@oracle.com>
Fixes: 89eb1906a9 ("iomap: add an iomap-based bmap implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
When receiving a LOGO request we forget to clear the FC_RP_STARTED flag
before starting the rport delete routine.
As the started flag was not cleared, we're not deleting the rport but
waiting for a restart and thus are keeping the reference count of the rdata
object at 1.
This leads to the following kmemleak report:
unreferenced object 0xffff88006542aa00 (size 512):
comm "kworker/0:2", pid 24, jiffies 4294899222 (age 226.880s)
hex dump (first 32 bytes):
68 96 fe 65 00 88 ff ff 00 00 00 00 00 00 00 00 h..e............
01 00 00 00 08 00 00 00 02 c5 45 24 ac b8 00 10 ..........E$....
backtrace:
[<(____ptrval____)>] fcoe_ctlr_vn_add.isra.5+0x7f/0x770 [libfcoe]
[<(____ptrval____)>] fcoe_ctlr_vn_recv+0x12af/0x27f0 [libfcoe]
[<(____ptrval____)>] fcoe_ctlr_recv_work+0xd01/0x32f0 [libfcoe]
[<(____ptrval____)>] process_one_work+0x7ff/0x1420
[<(____ptrval____)>] worker_thread+0x87/0xef0
[<(____ptrval____)>] kthread+0x2db/0x390
[<(____ptrval____)>] ret_from_fork+0x35/0x40
[<(____ptrval____)>] 0xffffffffffffffff
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reported-by: ard <ard@kwaak.net>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
KASAN reports a use-after-free in fcoe_ctlr_els_send() when we're sending a
LOGO and have FIP debugging enabled. This is because we're first freeing
the skb and then printing the frame's DID. But the DID is a member of the
FC frame header which in turn is the skb's payload.
Exchange the debug print and kfree_skb() calls so we're not touching the
freed data.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
use actual protocol family passed by user rather than hardcoded
AF_INTE6 to cerate sockets.
current code is not working for IPv4.
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull arm64 regression fix from Will Deacon:
"Ard found a nasty arm64 regression in 4.18 where the AES ghash/gcm
code doesn't notify the kernel about its use of the vector registers,
therefore potentially corrupting live user state.
The fix is straightforward and Herbert agreed for it to go via arm64"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
crypto/arm64: aes-ce-gcm - add missing kernel_neon_begin/end pair
Pull networking fixes from David Miller:
"Fixes keep trickling in:
1) Various IP fragmentation memory limit hardening changes from Eric
Dumazet.
2) Revert ipv6 metrics leak change, it causes more problems than it
fixes for now.
3) Fix WoL regression in stmmac driver, from Jose Abreu.
4) Netlink socket spectre v1 gadget fix, from Jeremy Cline"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
Revert "net/ipv6: fix metrics leak"
rxrpc: Fix user call ID check in rxrpc_service_prealloc_one
net: dsa: Do not suspend/resume closed slave_dev
netlink: Fix spectre v1 gadget in netlink_create()
Documentation: dpaa2: Use correct heading adornment
net: stmmac: Fix WoL for PCI-based setups
bonding: avoid lockdep confusion in bond_get_stats()
enic: do not call enic_change_mtu in enic_probe
ipv4: frags: handle possible skb truesize change
inet: frag: enforce memory limits earlier
net/mlx5e: IPoIB, Set the netdevice sw mtu in ipoib enhanced flow
net/mlx5e: Fix null pointer access when setting MTU of vport representor
net/mlx5e: Set port trust mode to PCP as default
net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager
net: dsa: mv88e6xxx: Fix SERDES support on 88E6141/6341
brcmfmac: fix regression in parsing NVRAM for multiple devices
iwlwifi: add more card IDs for 9000 series
Previously in squashfs_readpage() when copying data into the page
cache, it used the length of the datablock read from the filesystem
(after decompression). However, if the filesystem has been corrupted
this data block may be short, which will leave pages unfilled.
The fix for this is to compute the expected number of bytes to copy
from the inode size, and use this to detect if the block is short.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Tested-by: Willy Tarreau <w@1wt.eu>
Cc: Анатолий Тросиненко <anatoly.trosinenko@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The squashfs fragment reading code doesn't actually verify that the
fragment is inside the fragment table. The end result _is_ verified to
be inside the image when actually reading the fragment data, but before
that is done, we may end up taking a page fault because the fragment
table itself might not even exist.
Another report from Anatoly and his endless squashfs image fuzzing.
Reported-by: Анатолий Тросиненко <anatoly.trosinenko@gmail.com>
Acked-by:: Phillip Lougher <phillip.lougher@gmail.com>,
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit df18b50448.
This change causes other problems and use-after-free situations as
found by syzbot.
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch to fix the case where a lock request was interrupted ended up
changing default handling of errors such as NFS4ERR_DENIED and caused the
client to immediately resend the lock request. Let's do a partial revert
of that request so that the default is now to exit, but change the way
we handle resends to take into account the fact that the user may have
interrupted the request.
Reported-by: Kenneth Johansson <ken@kenjo.org>
Fixes: a3cf9bca2a ("NFSv4: Don't add a new lock on an interrupted wait..")
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Pull ARM fix from Russell King:
"Just a single fix this time around for recent binutils causing build
problems when generating Thumb-2 code"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8781/1: Fix Thumb-2 syscall return for binutils 2.29+
Commit 2c4541e24c ("mm: use vma_init() to initialize VMAs on stack and
data segments") tried to initialize various left-over ad-hoc vma's
"properly", but actually made things worse for the temporary vma's used
for TLB flushing.
vma_init() doesn't actually initialize all of the vma, just a few
fields, so doing something like
- struct vm_area_struct vma = { .vm_mm = tlb->mm, };
+ struct vm_area_struct vma;
+
+ vma_init(&vma, tlb->mm);
was actually very bad: instead of having a nicely initialized vma with
every field but "vm_mm" zeroed, you'd have an entirely uninitialized vma
with only a couple of fields initialized. And they weren't even fields
that the code in question mostly cared about.
The flush_tlb_range() function takes a "struct vma" rather than a
"struct mm_struct", because a few architectures actually care about what
kind of range it is - being able to only do an ITLB flush if it's a
range that doesn't have data accesses enabled, for example. And all the
normal users already have the vma for doing the range invalidation.
But a few people want to call flush_tlb_range() with a range they just
made up, so they also end up using a made-up vma. x86 just has a
special "flush_tlb_mm_range()" function for this, but other
architectures (arm and ia64) do the "use fake vma" thing instead, and
thus got caught up in the vma_init() changes.
At the same time, the TLB flushing code really doesn't care about most
other fields in the vma, so vma_init() is just unnecessary and
pointless.
This fixes things by having an explicit "this is just an initializer for
the TLB flush" initializer macro, which is used by the arm/arm64/ia64
people who mis-use this interface with just a dummy vma.
Fixes: 2c4541e24c ("mm: use vma_init() to initialize VMAs on stack and data segments")
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Delete the old VM_BUG_ON_VMA() from zap_pmd_range(), which asserted
that mmap_sem must be held when splitting an "anonymous" vma there.
Whether that's still strictly true nowadays is not entirely clear,
but the danger of sometimes crashing on the BUG is now fairly clear.
Even with the new stricter rules for anonymous vma marking, the
condition it checks for can possible trigger. Commit 44960f2a7b
("staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem
pages") is good, and originally I thought it was safe from that
VM_BUG_ON_VMA(), because the /dev/ashmem fd exposed to the user is
disconnected from the vm_file in the vma, and madvise(,,MADV_REMOVE)
insists on VM_SHARED.
But after I read John's earlier mail, drawing attention to the
vfs_fallocate() in there: I may be wrong, and I don't know if Android
has THP in the config anyway, but it looks to me like an
unmap_mapping_range() from ashmem's vfs_fallocate() could hit precisely
the VM_BUG_ON_VMA(), once it's vma_is_anonymous().
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There just check the user call ID isn't already in use, hence should
compare user_call_ID with xcall->user_call_ID, which is current
node's user_call_ID.
Fixes: 540b1c48c3 ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
Suggested-by: David Howells <dhowells@redhat.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull power management fixes from Rafael Wysocki:
"These fix the scope of a recent intel_pstate driver optimization used
incorrectly on some systems due to processor identification ambiguity
and fix a few issues in the turbostat utility, including three recent
regressions.
Specifics:
- Use ACPI FADT preferred PM Profile to distinguish Skylake desktop
processors from some server ones with the same model number in
order to limit the scope of the recent IO-wait boost optimization
to servers, as intended (Srinivas Pandruvada).
- Fix several issues in the turbostat utility:
* Fix the -S option on 1-CPU systems (Len Brown).
* Fix computations using incorrect processor core counts (Artem
Bityutskiy).
* Fix the x2apic debug message (Len Brown).
* Fix logical node enumeration to allow for non-sequential
physical nodes (Prarit Bhargava).
* Fix reported family on modern AMD processors (Calvin Walton).
* Clarify the RAPL column information in the man page (Len Brown)"
* tag 'pm-urgent-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Limit the scope of HWP dynamic boost platforms
tools/power turbostat: version 18.07.27
tools/power turbostat: Read extended processor family from CPUID
tools/power turbostat: Fix logical node enumeration to allow for non-sequential physical nodes
tools/power turbostat: fix x2apic debug message output file
tools/power turbostat: fix bogus summary values
tools/power turbostat: fix -S on UP systems
tools/power turbostat: Update turbostat(8) RAPL throttling column description
Anatoly continues to find issues with fuzzed squashfs images.
This time, corrupt, missing, or undersized data for the page filling
wasn't checked for, because the squashfs_{copy,read}_cache() functions
did the squashfs_copy_data() call without checking the resulting data
size.
Which could result in the page cache pages being incompletely filled in,
and no error indication to the user space reading garbage data.
So make a helper function for the "fill in pages" case, because the
exact same incomplete sequence existed in two places.
[ I should have made a squashfs branch for these things, but I didn't
intend to start doing them in the first place.
My historical connection through cramfs is why I got into looking at
these issues at all, and every time I (continue to) think it's a
one-off.
Because _this_ time is always the last time. Right? - Linus ]
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Amit Pundir and Youling in parallel reported crashes with recent
mainline kernels running Android:
F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
F DEBUG : Build fingerprint: 'Android/db410c32_only/db410c32_only:Q/OC-MR1/102:userdebug/test-key
F DEBUG : Revision: '0'
F DEBUG : ABI: 'arm'
F DEBUG : pid: 2261, tid: 2261, name: zygote >>> zygote <<<
F DEBUG : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008
... <snip> ...
F DEBUG : backtrace:
F DEBUG : #00 pc 00001c04 /system/lib/libc.so (memset+48)
F DEBUG : #01 pc 0010c513 /system/lib/libart.so (create_mspace_with_base+82)
F DEBUG : #02 pc 0015c601 /system/lib/libart.so (art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int, unsigned int)+40)
F DEBUG : #03 pc 0015c3ed /system/lib/libart.so (art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*, std::__1::basic_string<char, std::__ 1::char_traits<char>, std::__1::allocator<char>> const&, unsigned int, unsigned int, unsigned int, unsigned int, bool)+36)
...
This was bisected back to commit bfd40eaff5 ("mm: fix
vma_is_anonymous() false-positives").
create_mspace_with_base() in the trace above, utilizes ashmem, and with
ashmem, for shared mappings we use shmem_zero_setup(), which sets the
vma->vm_ops to &shmem_vm_ops. But for private ashmem mappings nothing
sets the vma->vm_ops.
Looking at the problematic patch, it seems to add a requirement that one
call vma_set_anonymous() on a vma, otherwise the dummy_vm_ops will be
used. Using the dummy_vm_ops seem to triggger SIGBUS when traversing
unmapped pages.
Thus, this patch adds a call to vma_set_anonymous() for ashmem private
mappings and seems to avoid the reported problem.
Fixes: bfd40eaff5 ("mm: fix vma_is_anonymous() false-positives")
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Colin Cross <ccross@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Reported-by: Amit Pundir <amit.pundir@linaro.org>
Reported-by: Youling 257 <youling257@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit bfd40eaff5 ("mm: fix vma_is_anonymous() false-positives") made
newly allocated vma's have a dummy vm_ops field so that they wouldn't be
mistaken for anonymous mappings, and if you wanted an anonymous vma you
had to explicitly say so by calling "vma_set_anonymous()" on it.
However, it missed the two special vmas that ia64 processes have: the
register backing store and the NaT page. So they wouldn't actually act
like anonymous ranges, and page faults on them caused a SIGBUS rather
than the creation of a new anon page in them.
That obviously will make any ia64 binary very unhappy indeed, and the
boot fails early.
Fixes: bfd40eaff5 ("mm: fix vma_is_anonymous() false-positives")
Reported-by: Tony Luck <tony.luck@intel.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If a DSA slave network device was previously disabled, there is no need
to suspend or resume it.
Fixes: 2446254915 ("net: dsa: allow switch drivers to implement suspend/resume hooks")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
'protocol' is a user-controlled value, so sanitize it after the bounds
check to avoid using it for speculative out-of-bounds access to arrays
indexed by it.
This addresses the following accesses detected with the help of smatch:
* net/netlink/af_netlink.c:654 __netlink_create() warn: potential
spectre issue 'nlk_cb_mutex_keys' [w]
* net/netlink/af_netlink.c:654 __netlink_create() warn: potential
spectre issue 'nlk_cb_mutex_key_strings' [w]
* net/netlink/af_netlink.c:685 netlink_create() warn: potential spectre
issue 'nl_table' [w] (local cap)
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add overline heading adornment to document title in order to comply
with kernel doc requirements.
Fixes: 60b9131 staging: fsl-mc: Convert documentation to rst format
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WoL won't work in PCI-based setups because we are not saving the PCI EP
state before entering suspend state and not allowing D3 wake.
Fix this by using a wrapper around stmmac_{suspend/resume} which
correctly sets the PCI EP state.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the optimizations for TLB invalidation from commit 0cef77c779
("powerpc/64s/radix: flush remote CPUs out of single-threaded
mm_cpumask"), the scope of a TLBI (global vs. local) can now be
influenced by the value of the 'copros' counter of the memory context.
When calling mm_context_remove_copro(), the 'copros' counter is
decremented first before flushing. It may have the unintended side
effect of sending local TLBIs when we explicitly need global
invalidations in this case. Thus breaking any nMMU user in a bad and
unpredictable way.
Fix it by flushing first, before updating the 'copros' counter, so
that invalidations will be global.
Fixes: 0cef77c779 ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask")
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On some systems using edge triggered ACPI Event Interrupts, the initial
state at boot is not setup by the firmware, instead relying on the edge
irq event handler running at least once to setup the initial state.
2 known examples of this are:
1) The Surface 3 has its _LID state controlled by an ACPI operation region
triggered by a GPIO event:
OperationRegion (GPOR, GeneralPurposeIo, Zero, One)
Field (GPOR, ByteAcc, NoLock, Preserve)
{
Connection (
GpioIo (Shared, PullNone, 0x0000, 0x0000, IoRestrictionNone,
"\\_SB.GPO0", 0x00, ResourceConsumer, ,
)
{ // Pin list
0x004C
}
),
HELD, 1
}
Method (_E4C, 0, Serialized) // _Exx: Edge-Triggered GPE
{
If ((HELD == One))
{
^^LID.LIDB = One
}
Else
{
^^LID.LIDB = Zero
Notify (LID, 0x80) // Status Change
}
Notify (^^PCI0.SPI1.NTRG, One) // Device Check
}
Currently, the state of LIDB is wrong until the user actually closes or
open the cover. We need to trigger the GPIO event once to update the
internal ACPI state.
Coincidentally, this also enables the Surface 2 integrated HID sensor hub
which also requires an ACPI gpio operation region to start initialization.
2) Various Bay Trail based tablets come with an external USB mux and
TI T1210B USB phy to enable USB gadget mode. The mux is controlled by a
GPIO which is controlled by an edge triggered ACPI Event Interrupt which
monitors the micro-USB ID pin.
When the tablet is connected to a PC (or no cable is plugged in), the ID
pin is high and the tablet should be in gadget mode. But the GPIO
controlling the mux is initialized by the firmware so that the USB data
lines are muxed to the host controller.
This means that if the user wants to use gadget mode, the user needs to
first plug in a host-cable to force the ID pin low and then unplug it
and connect the tablet to a PC, to get the ACPI event handler to run and
switch the mux to device mode,
This commit fixes both by running the event-handler once on boot.
Note that the running of the event-handler is done from a late_initcall,
this is done because the handler AML code may rely on OperationRegions
registered by other builtin drivers. This avoids errors like these:
[ 0.133026] ACPI Error: No handler for Region [XSCG] ((____ptrval____)) [GenericSerialBus] (20180531/evregion-132)
[ 0.133036] ACPI Error: Region GenericSerialBus (ID=9) has no handler (20180531/exfldio-265)
[ 0.133046] ACPI Error: Method parse/execution failed \_SB.GPO2._E12, AE_NOT_EXIST (20180531/psparse-516)
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
[hdegoede: Document BYT USB mux reliance on initial trigger]
[hdegoede: Run event handler from a late_initcall, rather then immediately]
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Merge turbostat utility fixes for final 4.18:
- Fix the -S option on 1-CPU systems.
- Fix computations using incorrect processor core counts.
- Fix the x2apic debug message.
- Fix logical node enumeration to allow for non-sequential physical nodes.
- Fix reported family on modern AMD processors.
- Clarify the RAPL column information in the man page.
* pm-tools:
tools/power turbostat: version 18.07.27
tools/power turbostat: Read extended processor family from CPUID
tools/power turbostat: Fix logical node enumeration to allow for non-sequential physical nodes
tools/power turbostat: fix x2apic debug message output file
tools/power turbostat: fix bogus summary values
tools/power turbostat: fix -S on UP systems
tools/power turbostat: Update turbostat(8) RAPL throttling column description
In commit ab123fe071 ("enic: handle mtu change for vf properly")
ASSERT_RTNL() is added to _enic_change_mtu() to prevent it from being
called without rtnl held. enic_probe() calls enic_change_mtu()
without rtnl held. At this point netdev is not registered yet.
Remove call to enic_change_mtu and assign the mtu to netdev->mtu.
Fixes: ab123fe071 ("enic: handle mtu change for vf properly")
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_frag_queue() might call pskb_pull() on one skb that
is already in the fragment queue.
We need to take care of possible truesize change, or we
might have an imbalance of the netns frags memory usage.
IPv6 is immune to this bug, because RFC5722, Section 4,
amended by Errata ID 3089 states :
When reassembling an IPv6 datagram, if
one or more its constituent fragments is determined to be an
overlapping fragment, the entire datagram (and any constituent
fragments) MUST be silently discarded.
Fixes: 158f323b98 ("net: adjust skb->truesize in pskb_expand_head()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently check current frags memory usage only when
a new frag queue is created. This allows attackers to first
consume the memory budget (default : 4 MB) creating thousands
of frag queues, then sending tiny skbs to exceed high_thresh
limit by 2 to 3 order of magnitude.
Note that before commit 648700f76b ("inet: frags: use rhashtables
for reassembly units"), work queue could be starved under DOS,
getting no cpu cycles.
After commit 648700f76b, only the per frag queue timer can eventually
remove an incomplete frag queue and its skbs.
Fixes: b13d3cbfb8 ("inet: frag: move eviction of queues to work queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jann Horn <jannh@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Peter Oskolkov <posk@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2018-07-31
The following series includes four mlx5 fixes.
Please pull and let me know if there's any problem.
For -stable v4.14
net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager
For -stable v4.16
net/mlx5e: Set port trust mode to PCP as default
For -stable v4.17
net/mlx5e: IPoIB, Set the netdevice sw mtu in ipoib enhanced flow
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull audit fix from Paul Moore:
"A single small audit fix to guard against memory allocation failures
when logging information about a kernel module load.
It's small, easy to understand, and self-contained; while nothing is
zero risk, this should be pretty low"
* tag 'audit-pr-20180731' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: fix potential null dereference 'context->module.name'
After introduction of the cited commit, mlx5e_build_nic_params
receives the netdevice mtu in order to set the sw_mtu of mlx5e_params.
For enhanced IPoIB, the netdevice mtu is not set in this stage,
therefore, the initial sw_mtu equals zero. As a result, the hw_mtu
of the receive queue will be calculated incorrectly causing traffic
issues.
To fix this issue, query for port mtu before building the nic params.
Fixes: 472a1e44b3 ("net/mlx5e: Save MTU in channels params")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
MTU helper function is used by both conventional mlx5e
instances (PF/VF) and the eswitch representors. The representor
shouldn't change the nic vport context MTU, the VF is responsible for
that. Therefore set_mtu_cb has a null value when changing the
representor MTU.
Fixes: 250a42b6a7 ("net/mlx5e: Support configurable MTU for vport representors")
Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The hairpin offload code has dependency on the trust mode being PCP.
Hence we should set PCP as the default for handling cases where we are
disallowed to read the trust mode from the FW, or failed to initialize it.
Fixes: 106be53b6b ('net/mlx5e: Set per priority hairpin pairs')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Execute mlx5_eswitch_init() only if we have MLX5_ESWITCH_MANAGER
capabilities.
Do the same for mlx5_eswitch_cleanup().
Fixes: a9f7705ffd ("net/mlx5: Unify vport manager capability check")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Version 1 of the patch adding SERDES support to the 88E6141/6341
correctly added the ops to the 88E6141/6341. However, by the time
version 3 was committed, the ops had moved to the 88E6085/6175. Put
them back where they belong.
Fixes: 5bafeb6e7e ("net: dsa: mv88e6xxx: 88E6141/6341 SERDES support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalle Valo says:
====================
wireless-drivers fixes for 4.18
Last set of fixes before 4.18 is released
iwlwifi
* add new IDs for cards already available on the market
brcmfmac
* fix a regression introduced in v4.17
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull SCSI fixes from James Bottomley:
"Nine fixes, five in the qla2xxx driver, the most serious of which is
the uninitialized list head crash which can be observed in most
systems under a sufficiently loaded low memory environment.
The two sg fixes are minor but obvious and two target ones which seem
reasonable but not high impact"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Return error when TMF returns
scsi: qla2xxx: Fix ISP recovery on unload
scsi: qla2xxx: Fix driver unload by shutting down chip
scsi: qla2xxx: Fix NPIV deletion by calling wait_for_sess_deletion
scsi: qla2xxx: Fix unintialized List head crash
scsi: sg: update comment for blk_get_request()
scsi: sg: fix minor memory leak in error path
scsi: libiscsi: fix possible NULL pointer dereference in case of TMF
scsi: target: iscsi: cxgbit: fix max iso npdu calculation
Pull virtio fixes from Michael Tsirkin:
"Some bugfixes that seem important and safe enough to merge at the last
minute"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_balloon: fix another race between migration and ballooning
tools/virtio: add kmalloc_array stub
tools/virtio: add dma barrier stubs
Pull ACPI fixes from Rafael Wysocki:
"These fix a recent ACPICA regression affecting control method
execution at the table level and an earlier hibernation regression in
the ACPI driver for Intel SoCs (LPSS) that was missed by a previous
fix in this cycle.
Specifics:
- Fix a recent ACPICA regression introduced by a previous fix that
caused control method execution at the table level to be mishandled
by mistake (Erik Schmauss).
- Fix a hibernation regression from the 4.15 cycle in the ACPI driver
for Intel SoCs (LPSS) that caused the platform firmware to be
confused during resume from hibernation by the driver's PM quirks
which was fixed for system-wide suspend/resume (ACPI S3) earlier in
this cycle, but that previous fix missed the hibernation (ACPI S4)
case (Rafael Wysocki)"
* tag 'acpi-urgent-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: AML Parser: ignore control method status in module-level code
ACPI / LPSS: Avoid PM quirks on suspend and resume from hibernation
When a PCI device is detected, pdev->is_added is set to 1 and proc and
sysfs entries are created.
When the device is removed, pdev->is_added is checked for one and then
device is detached with clearing of proc and sys entries and at end,
pdev->is_added is set to 0.
is_added and is_busmaster are bit fields in pci_dev structure sharing same
memory location.
A strange issue was observed with multiple removal and rescan of a PCIe
NVMe device using sysfs commands where is_added flag was observed as zero
instead of one while removing device and proc,sys entries are not cleared.
This causes issue in later device addition with warning message
"proc_dir_entry" already registered.
Debugging revealed a race condition between the PCI core setting the
is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
setting the is_busmaster bit in pci_set_master(). As these fields are not
handled atomically, that clears the is_added bit.
Move the is_added bit to a separate private flag variable and use atomic
functions to set and retrieve the device addition state. This avoids the
race because is_added no longer shares a memory location with is_busmaster.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283
Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Calling pmull_gcm_encrypt_block() requires kernel_neon_begin() and
kernel_neon_end() to be used since the routine touches the NEON
register file. Add the missing calls.
Also, since NEON register contents are not preserved outside of
a kernel mode NEON region, pass the key schedule array again.
Fixes: 7c50136a8a ("crypto: arm64/aes-ghash - yield NEON after every ...")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Dynamic boosting of HWP performance on IO wake showed significant
improvement to IO workloads. This series was intended for Skylake Xeon
platforms only and feature was enabled by default based on CPU model
number.
But some Xeon platforms reused the Skylake desktop CPU model number. This
caused some undesirable side effects to some graphics workloads. Since
they are heavily IO bound, the increase in CPU performance decreased the
power available for GPU to do its computing and hence decrease in graphics
benchmark performance.
For example on a Skylake desktop, GpuTest benchmark showed average FPS
reduction from 529 to 506.
This change makes sure that HWP boost feature is only enabled for Skylake
server platforms by using ACPI FADT preferred PM Profile. If some desktop
users wants to get benefit of boost, they can still enable boost from
intel_pstate sysfs attribute "hwp_dynamic_boost".
Fixes: 41ab43c9c8 (cpufreq: intel_pstate: enable boost for Skylake Xeon)
Link: https://bugs.freedesktop.org/show_bug.cgi?id=107410
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge a fix for hibernation regression in the ACPI driver for Intel
SoCs (LPSS).
* acpi-soc:
ACPI / LPSS: Avoid PM quirks on suspend and resume from hibernation
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Update the tools copy of several files, including perf_event.h,
powerpc's asm/unistd.h (new io_pgetevents syscall), bpf.h and
x86's memcpy_64.s (used in 'perf bench mem'), silencing the
respective warnings during the perf tools build.
- Fix the build on the alpine:edge distro.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Masayoshi Mizuma reported that a warning message is shown while a CPU is
hot-removed on Broadwell servers:
WARNING: CPU: 126 PID: 6 at arch/x86/events/intel/uncore.c:988
uncore_pci_remove+0x10b/0x150
Call Trace:
pci_device_remove+0x42/0xd0
device_release_driver_internal+0x148/0x220
pci_stop_bus_device+0x76/0xa0
pci_stop_root_bus+0x44/0x60
acpi_pci_root_remove+0x1f/0x80
acpi_bus_trim+0x57/0x90
acpi_bus_trim+0x2e/0x90
acpi_device_hotplug+0x2bc/0x4b0
acpi_hotplug_work_fn+0x1a/0x30
process_one_work+0x174/0x3a0
worker_thread+0x4c/0x3d0
kthread+0xf8/0x130
This bug was introduced by:
commit 15a3e845b0 ("perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs")
The index of "QPI Port 2 filter" was hardcode to 2, but this conflicts with the
index of "PCU.3" which is "HSWEP_PCI_PCU_3", which equals to 2 as well.
To fix the conflict, the hardcoded index needs to be cleaned up:
- introduce a new enumerator "BDX_PCI_QPI_PORT2_FILTER" for "QPI Port 2
filter" on Broadwell,
- increase UNCORE_EXTRA_PCI_DEV_MAX by one,
- clean up the hardcoded index.
Debugged-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: msys.mizuma@gmail.com
Cc: stable@vger.kernel.org
Fixes: 15a3e845b0 ("perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs")
Link: http://lkml.kernel.org/r/1532953688-15008-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull networking fixes from David Miller:
"Several smallish fixes, I don't think any of this requires another -rc
but I'll leave that up to you:
1) Don't leak uninitialzed bytes to userspace in xfrm_user, from Eric
Dumazet.
2) Route leak in xfrm_lookup_route(), from Tommi Rantala.
3) Premature poll() returns in AF_XDP, from Björn Töpel.
4) devlink leak in netdevsim, from Jakub Kicinski.
5) Don't BUG_ON in fib_compute_spec_dst, the condition can
legitimately happen. From Lorenzo Bianconi.
6) Fix some spectre v1 gadgets in generic socket code, from Jeremy
Cline.
7) Don't allow user to bind to out of range multicast groups, from
Dmitry Safonov with a follow-up by Dmitry Safonov.
8) Fix metrics leak in fib6_drop_pcpu_from(), from Sabrina Dubroca"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
netlink: Don't shift with UB on nlk->ngroups
net/ipv6: fix metrics leak
xen-netfront: wait xenbus state change when load module manually
can: ems_usb: Fix memory leak on ems_usb_disconnect()
openvswitch: meter: Fix setting meter id for new entries
netlink: Do not subscribe to non-existent groups
NET: stmmac: align DMA stuff to largest cache line length
tcp_bbr: fix bw probing to raise in-flight data for very small BDPs
net: socket: Fix potential spectre v1 gadget in sock_is_registered
net: socket: fix potential spectre v1 gadget in socketcall
net: mdio-mux: bcm-iproc: fix wrong getter and setter pair
ipv4: remove BUG_ON() from fib_compute_spec_dst
enic: handle mtu change for vf properly
net: lan78xx: fix rx handling before first packet is send
nfp: flower: fix port metadata conversion bug
bpf: use GFP_ATOMIC instead of GFP_KERNEL in bpf_parse_prog()
bpf: fix bpf_skb_load_bytes_relative pkt length check
perf build: Build error in libbpf missing initialization
net: ena: Fix use of uninitialized DMA address bits field
bpf: btf: Use exact btf value_size match in map_check_btf()
...
Pull sparc fixes from David Miller:
"Some small __init annotation and build fixes from Stephen Rostedt and
Thomas Petazzoni"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: use asm-generic version of msi.h
sparc: move MSI related definitions to where they are used
sparc/time: Add missing __init to init_tick_ops()
Anatoly reports another squashfs fuzzing issue, where the decompression
parameters themselves are in a compressed block.
This causes squashfs_read_data() to be called in order to read the
decompression options before the decompression stream having been set
up, making squashfs go sideways.
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Acked-by: Phillip Lougher <phillip.lougher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xdp_return_buff() is used when frame has been successfully
handled (transmitted) or if an error occurred during delayed
processing and there is no way to report it back to
xdp_do_redirect().
In case of __xsk_rcv_zc() error is propagated all the way
back to the driver, so there is no need to call
xdp_return_buff(). Driver will recycle the frame anyway
after seeing that error happened.
Fixes: 173d3adb6f ("xsk: add zero-copy support for Rx")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
I hit the following problem when I tried to use bpftool
to dump a percpu array.
$ sudo ./bpftool map show
61: percpu_array name stub flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
...
$ sudo ./bpftool map dump id 61
bpftool: malloc.c:2406: sysmalloc: Assertion
`(old_top == initial_top (av) && old_size == 0) || \
((unsigned long) (old_size) >= MINSIZE && \
prev_inuse (old_top) && \
((unsigned long) old_end & (pagesize - 1)) == 0)'
failed.
Aborted
Further debugging revealed that this is due to
miscommunication between bpftool and kernel.
For example, for the above percpu_array with value size of 4B.
The map info returned to user space has value size of 4B.
In bpftool, the values array for lookup is allocated like:
info->value_size * get_possible_cpus() = 4 * get_possible_cpus()
In kernel (kernel/bpf/syscall.c), the values array size is
rounded up to multiple of 8.
round_up(map->value_size, 8) * num_possible_cpus()
= 8 * num_possible_cpus()
So when kernel copies the values to user buffer, the kernel will
overwrite beyond user buffer boundary.
This patch fixed the issue by allocating and stepping through
percpu map value array properly in bpftool.
Fixes: 71bb428fe2 ("tools: bpf: add bpftool")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The variable 'context->module.name' may be null pointer when
kmalloc return null, so it's better to check it before using
to avoid null dereference.
Another one more thing this patch does is using kstrdup instead
of (kmalloc + strcpy), and signal a lost record via audit_log_lost.
Cc: stable@vger.kernel.org # 4.11
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This is necessary to be able to include <linux/msi.h> when
CONFIG_GENERIC_MSI_IRQ_DOMAIN is enabled. Without this, a build with
CONFIG_GENERIC_MSI_IRQ_DOMAIN fails with:
In file included from drivers//ata/ahci.c:45:0:
>> include/linux/msi.h:226:10: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
msi_alloc_info_t *arg);
^~~~~~~~~~~~~~~~
sg_alloc_fn
include/linux/msi.h:230:9: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
msi_alloc_info_t *arg);
^~~~~~~~~~~~~~~~
sg_alloc_fn
include/linux/msi.h:239:12: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
msi_alloc_info_t *arg);
^~~~~~~~~~~~~~~~
sg_alloc_fn
include/linux/msi.h:240:22: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
void (*msi_finish)(msi_alloc_info_t *arg, int retval);
^~~~~~~~~~~~~~~~
sg_alloc_fn
include/linux/msi.h:241:20: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
void (*set_desc)(msi_alloc_info_t *arg,
^~~~~~~~~~~~~~~~
sg_alloc_fn
include/linux/msi.h:316:18: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
int nvec, msi_alloc_info_t *args);
^~~~~~~~~~~~~~~~
sg_alloc_fn
include/linux/msi.h:318:29: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
int virq, int nvec, msi_alloc_info_t *args);
^~~~~~~~~~~~~~~~
sg_alloc_fn
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The definitions in arch/sparc/include/asm/msi.h are only used in
arch/sparc/mm/srmmu.c, so it makes sense to have them in the C file
directly.
In addition, having a custom arch/sparc/include/asm/msi.h prevents
from using the asm-generic version of this header, which is necessary
to be able to include <linux/msi.h> when CONFIG_GENERIC_MSI_IRQ_DOMAIN
is enabled.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Code that was added to force gcc not to inline any function that isn't
explicitly declared as inline uncovered that init_tick_ops() isn't
marked as "__init". It is only called by __init functions and more
importantly it too calls an __init function which would require it to be
__init as well.
Link: http://lkml.kernel.org/r/201806060444.hdHcKOBy%fengguang.wu@intel.com
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
On i386 nlk->ngroups might be 32 or 0. Which leads to UB, resulting in
hang during boot.
Check for 0 ngroups and use (unsigned long long) as a type to shift.
Fixes: 7acf9d4237 ("netlink: Do not subscribe to non-existent groups").
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Kleine-Budde says:
====================
pull-request: can 2018-07-30
this is a pull request of one patch for net/master.
The patch by Anton Vasilyev and the Linux Driver Verification project
fixes a memory leak in the ems_usb driver's disconnect function.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull scheduler fixes from Ingo Molnar:
"Misc fixes:
- a deadline scheduler related bug fix which triggered a kernel
warning
- an RT_RUNTIME_SHARE fix
- a stop_machine preemption fix
- a potential NULL dereference fix in sched_domain_debug_one()"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/rt: Restore rt_runtime after disabling RT_RUNTIME_SHARE
sched/deadline: Update rq_clock of later_rq when pushing a task
stop_machine: Disable preemption after queueing stopper threads
sched/topology: Check variable group before dereferencing it
Fix type warnings in arch/arc/mm/cache.c.
../arch/arc/mm/cache.c: In function 'flush_anon_page':
../arch/arc/mm/cache.c:1062:55: warning: passing argument 2 of '__flush_dcache_page' makes integer from pointer without a cast [-Wint-conversion]
__flush_dcache_page((phys_addr_t)page_address(page), page_address(page));
^~~~~~~~~~~~~~~~~~
../arch/arc/mm/cache.c:1013:59: note: expected 'long unsigned int' but argument is of type 'void *'
void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr)
~~~~~~~~~~~~~~^~~~~
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Elad Kanfi <eladkan@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Ofer Levi <oferle@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Fix build errors in arch/arc/'s delay.h:
- add "extern unsigned long loops_per_jiffy;"
- add <asm-generic/types.h> for "u64"
In file included from ../drivers/infiniband/hw/cxgb3/cxio_hal.c:32:
../arch/arc/include/asm/delay.h: In function '__udelay':
../arch/arc/include/asm/delay.h:61:12: error: 'u64' undeclared (first use in this function)
loops = ((u64) usecs * 4295 * HZ * loops_per_jiffy) >> 32;
^~~
In file included from ../drivers/infiniband/hw/cxgb3/cxio_hal.c:32:
../arch/arc/include/asm/delay.h: In function '__udelay':
../arch/arc/include/asm/delay.h:63:37: error: 'loops_per_jiffy' undeclared (first use in this function)
loops = ((u64) usecs * 4295 * HZ * loops_per_jiffy) >> 32;
^~~~~~~~~~~~~~~
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Elad Kanfi <eladkan@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Ofer Levi <oferle@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Fix printk format warning in arch/arc/plat-eznps/mtm.c:
In file included from ../include/linux/printk.h:7,
from ../include/linux/kernel.h:14,
from ../include/linux/list.h:9,
from ../include/linux/smp.h:12,
from ../arch/arc/plat-eznps/mtm.c:17:
../arch/arc/plat-eznps/mtm.c: In function 'set_mtm_hs_ctr':
../include/linux/kern_levels.h:5:18: warning: format '%d' expects argument of type 'int', but argument 2 has type 'long int' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^~~~~~
../include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
#define KERN_ERR KERN_SOH "3" /* error conditions */
^~~~~~~~
../include/linux/printk.h:308:9: note: in expansion of macro 'KERN_ERR'
printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~
../arch/arc/plat-eznps/mtm.c:166:3: note: in expansion of macro 'pr_err'
pr_err("** Invalid @nps_mtm_hs_ctr [%d] needs to be [%d:%d] (incl)\n",
^~~~~~
../arch/arc/plat-eznps/mtm.c:166:40: note: format string is defined here
pr_err("** Invalid @nps_mtm_hs_ctr [%d] needs to be [%d:%d] (incl)\n",
~^
%ld
The hs_ctr variable can just be int instead of long, so also change
kstrtol() to kstrtoint() and leave the format string as %d.
Also add 2 header files since they are used in mtm.c and we prefer
not to depend on accidental/indirect #includes.
Cc: linux-snps-arc@lists.infradead.org
Cc: Ofer Levi <oferle@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Add <linux/types.h> to fix build errors.
Both ctop.h and <soc/nps/common.h> use u32 types and cause many
errors.
Examples:
../include/soc/nps/common.h:71:4: error: unknown type name 'u32'
u32 __reserved:20, cluster:4, core:4, thread:4;
../include/soc/nps/common.h:76:3: error: unknown type name 'u32'
u32 value;
../include/soc/nps/common.h:124:4: error: unknown type name 'u32'
u32 base:8, cl_x:4, cl_y:4,
../include/soc/nps/common.h:127:3: error: unknown type name 'u32'
u32 value;
../arch/arc/plat-eznps/include/plat/ctop.h:83:4: error: unknown type name 'u32'
u32 gen:1, gdis:1, clk_gate_dis:1, asb:1,
../arch/arc/plat-eznps/include/plat/ctop.h:86:3: error: unknown type name 'u32'
u32 value;
../arch/arc/plat-eznps/include/plat/ctop.h:93:4: error: unknown type name 'u32'
u32 csa:22, dmsid:6, __reserved:3, cs:1;
../arch/arc/plat-eznps/include/plat/ctop.h:95:3: error: unknown type name 'u32'
u32 value;
Cc: linux-snps-arc@lists.infradead.org
Cc: Ofer Levi <oferle@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Pull locking fixes from Ingo Molnar:
"A paravirt UP-patching fix, and an I2C MUX driver lockdep warning fix"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/pvqspinlock/x86: Use LOCK_PREFIX in __pv_queued_spin_unlock() assembly code
i2c/mux, locking/core: Annotate the nested rt_mutex usage
locking/rtmutex: Allow specifying a subclass for nested locking
Pull EFI fix from Ingo Molnar:
"An UEFI variables fix for SEV guests"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Access EFI MMIO data as unencrypted when SEV is active
Check that SMP_CACHE_BYTES (and hence ARCH_DMA_MINALIGN) is larger
or equal to any cache line length by comparing it with values
previously read from ARC cache BCR registers.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Since commit d4ead6b34b ("net/ipv6: move metrics from dst to
rt6_info"), ipv6 metrics are shared and refcounted. rt6_set_from()
assigns the rt->from pointer and increases the refcount on from's
metrics. This reference is never released.
Introduce the fib6_metrics_release() helper and use it to release the
metrics.
Fixes: d4ead6b34b ("net/ipv6: move metrics from dst to rt6_info")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When loading module manually, after call xenbus_switch_state to initializes
the state of the netfront device, the driver state did not change so fast
that may lead no dev created in latest kernel. This patch adds wait to make
sure xenbus knows the driver is not in closed/unknown state.
Current state:
[vm]# ethtool eth0
Settings for eth0:
Link detected: yes
[vm]# modprobe -r xen_netfront
[vm]# modprobe xen_netfront
[vm]# ethtool eth0
Settings for eth0:
Cannot get device settings: No such device
Cannot get wake-on-lan settings: No such device
Cannot get message level: No such device
Cannot get link status: No such device
No data available
With the patch installed.
[vm]# ethtool eth0
Settings for eth0:
Link detected: yes
[vm]# modprobe -r xen_netfront
[vm]# modprobe xen_netfront
[vm]# ethtool eth0
Settings for eth0:
Link detected: yes
Signed-off-by: Xiao Liang <xiliang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The UAPI file byteorder/little_endian.h uses the __always_inline define
without including the header where it is defined, linux/stddef.h, this
ends up working in all the other distros because that file gets included
seemingly by luck from one of the files included from little_endian.h.
But not on Alpine:edge, that fails for all files where perf_event.h is
included but linux/stddef.h isn't include before that.
Adding the missing linux/stddef.h file where it breaks on Alpine:edge to
fix that, in all other distros, that is just a very small header anyway.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-9r1pifftxvuxms8l7ir73p5l@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To cope with the changes in:
12c89130a5 ("x86/asm/memcpy_mcsafe: Add write-protection-fault handling")
60622d6822 ("x86/asm/memcpy_mcsafe: Return bytes remaining")
bd131544aa ("x86/asm/memcpy_mcsafe: Add labels for __memcpy_mcsafe() write fault handling")
da7bc9c57e ("x86/asm/memcpy_mcsafe: Remove loop unrolling")
This needed introducing a file with a copy of the mcsafe_handle_tail()
function, that is used in the new memcpy_64.S file, as well as a dummy
mcsafe_test.h header.
Testing it:
$ nm ~/bin/perf | grep mcsafe
0000000000484130 T mcsafe_handle_tail
0000000000484300 T __memcpy_mcsafe
$
$ perf bench mem memcpy
# Running 'mem/memcpy' benchmark:
# function 'default' (Default memcpy() provided by glibc)
# Copying 1MB bytes ...
44.389205 GB/sec
# function 'x86-64-unrolled' (unrolled memcpy() in arch/x86/lib/memcpy_64.S)
# Copying 1MB bytes ...
22.710756 GB/sec
# function 'x86-64-movsq' (movsq-based memcpy() in arch/x86/lib/memcpy_64.S)
# Copying 1MB bytes ...
42.459239 GB/sec
# function 'x86-64-movsb' (movsb-based memcpy() in arch/x86/lib/memcpy_64.S)
# Copying 1MB bytes ...
42.459239 GB/sec
$
This silences this perf tools build warning:
Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S'
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-igdpciheradk3gb3qqal52d0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kernel panic when with high memory pressure, calltrace looks like,
PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java"
#0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb
#1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942
#2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30
#3 [ffff881ec7ed7778] oops_end at ffffffff816902c8
#4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46
#5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc
#6 [ffff881ec7ed7838] __node_set at ffffffff81680300
#7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f
#8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5
#9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8
[exception RIP: _raw_spin_lock_irqsave+47]
RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046
RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8
RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008
RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098
R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
It happens in the pagefault and results in double pagefault
during compacting pages when memory allocation fails.
Analysed the vmcore, the page leads to second pagefault is corrupted
with _mapcount=-256, but private=0.
It's caused by the race between migration and ballooning, and lock
missing in virtballoon_migratepage() of virtio_balloon driver.
This patch fix the bug.
Fixes: e22504296d ("virtio_balloon: introduce migration primitives to balloon pages")
Cc: stable@vger.kernel.org
Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Huang Chong <huang.chong@zte.com.cn>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The VSP uses a lock to protect the BRU and BRS assignment when
configuring pipelines. The lock is taken in vsp1_du_atomic_begin() and
released in vsp1_du_atomic_flush(), as well as taken and released in
vsp1_du_setup_lif(). This guards against multiple pipelines trying to
assign the same BRU and BRS at the same time.
The DRM framework calls the .atomic_begin() operations in a loop over
all CRTCs included in an atomic commit. On a VSPDL (the only VSP type
where this matters), a single VSP instance handles two CRTCs, with a
single lock. This results in a deadlock when the .atomic_begin()
operation is called on the second CRTC.
The DRM framework serializes atomic commits that affect the same CRTCs,
but doesn't know about two CRTCs sharing the same VSPDL. Two commits
affecting the VSPDL LIF0 and LIF1 respectively can thus race each other,
hence the need for a lock.
This could be fixed on the DRM side by forcing serialization of commits
affecting CRTCs backed by the same VSPDL, but that would negatively
affect performances, as the locking is only needed when the BRU and BRS
need to be reassigned, which is an uncommon case.
The lock protects the whole .atomic_begin() to .atomic_flush() sequence.
The only operation that can occur in-between is vsp1_du_atomic_update(),
which doesn't touch the BRU and BRS, and thus doesn't need to be
protected by the lock. We can thus only take the lock around the
pipeline setup calls in vsp1_du_atomic_flush(), which fixes the
deadlock.
Fixes: f81f9adc4e ("media: v4l: vsp1: Assign BRU and BRS to pipelines dynamically")
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
The repeat period is read from a static array. If a keydown event is
reported from bpf with a high protocol number, we read out of bounds. This
is unlikely to end up with a reasonable repeat period at the best of times,
in which case no timely key up event is generated.
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
When building the kernel as Thumb-2 with binutils 2.29 or newer, if the
assembler has seen the .type directive (via ENDPROC()) for a symbol, it
automatically handles the setting of the lowest bit when the symbol is
used with ADR. The badr macro on the other hand handles this lowest bit
manually. This leads to a jump to a wrong address in the wrong state
in the syscall return path:
Internal error: Oops - undefined instruction: 0 [#2] SMP THUMB2
Modules linked in:
CPU: 0 PID: 652 Comm: modprobe Tainted: G D 4.18.0-rc3+ #8
PC is at ret_fast_syscall+0x4/0x62
LR is at sys_brk+0x109/0x128
pc : [<80101004>] lr : [<801c8a35>] psr: 60000013
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 50c5387d Table: 9e82006a DAC: 00000051
Process modprobe (pid: 652, stack limit = 0x(ptrval))
80101000 <ret_fast_syscall>:
80101000: b672 cpsid i
80101002: f8d9 2008 ldr.w r2, [r9, #8]
80101006: f1b2 4ffe cmp.w r2, #2130706432 ; 0x7f000000
80101184 <local_restart>:
80101184: f8d9 a000 ldr.w sl, [r9]
80101188: e92d 0030 stmdb sp!, {r4, r5}
8010118c: f01a 0ff0 tst.w sl, #240 ; 0xf0
80101190: d117 bne.n 801011c2 <__sys_trace>
80101192: 46ba mov sl, r7
80101194: f5ba 7fc8 cmp.w sl, #400 ; 0x190
80101198: bf28 it cs
8010119a: f04f 0a00 movcs.w sl, #0
8010119e: f3af 8014 nop.w {20}
801011a2: f2af 1ea2 subw lr, pc, #418 ; 0x1a2
To fix this, add a new symbol name which doesn't have ENDPROC used on it
and use that with badr. We can't remove the badr usage since that would
would cause breakage with older binutils.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
ems_usb_probe() allocates memory for dev->tx_msg_buffer, but there
is no its deallocation in ems_usb_disconnect().
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The meter code would create an entry for each new meter. However, it
would not set the meter id in the new entry, so every meter would appear
to have a meter id of zero. This commit properly sets the meter id when
adding the entry.
Fixes: 96fbc13d7e ("openvswitch: Add meter infrastructure")
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ext4 fixes from Ted Ts'o:
"Some miscellaneous ext4 fixes for 4.18; one fix is for a regression
introduced in 4.18-rc4.
Sorry for the late-breaking pull. I was originally going to wait for
the next merge window, but Eric Whitney found a regression introduced
in 4.18-rc4, so I decided to push out the regression plus the other
fixes now. (The other commits have been baking in linux-next since
early July)"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix check to prevent initializing reserved inodes
ext4: check for allocation block validity with block group locked
ext4: fix inline data updates with checksums enabled
ext4: clear mmp sequence number when remounting read-only
ext4: fix false negatives *and* false positives in ext4_check_descriptors()
Anatoly Trosinenko reports that a corrupted squashfs image can cause a
kernel oops. It turns out that squashfs can end up being confused about
negative fragment lengths.
The regular squashfs_read_data() does check for negative lengths, but
squashfs_read_metadata() did not, and the fragment size code just
blindly trusted the on-disk value. Fix both the fragment parsing and
the metadata reading code.
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 8844618d8a: "ext4: only look at the bg_flags field if it is
valid" will complain if block group zero does not have the
EXT4_BG_INODE_ZEROED flag set. Unfortunately, this is not correct,
since a freshly created file system has this flag cleared. It gets
almost immediately after the file system is mounted read-write --- but
the following somewhat unlikely sequence will end up triggering a
false positive report of a corrupted file system:
mkfs.ext4 /dev/vdc
mount -o ro /dev/vdc /vdc
mount -o remount,rw /dev/vdc
Instead, when initializing the inode table for block group zero, test
to make sure that itable_unused count is not too large, since that is
the case that will result in some or all of the reserved inodes
getting cleared.
This fixes the failures reported by Eric Whiteney when running
generic/230 and generic/231 in the the nojournal test case.
Fixes: 8844618d8a ("ext4: only look at the bg_flags field if it is valid")
Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
As for today STMMAC_ALIGN macro (which is used to align DMA stuff)
relies on L1 line length (L1_CACHE_BYTES).
This isn't correct in case of system with several cache levels
which might have L1 cache line length smaller than L2 line. This
can lead to sharing one cache line between DMA buffer and other
data, so we can lose this data while invalidate DMA buffer before
DMA transaction.
Fix that by using SMP_CACHE_BYTES instead of L1_CACHE_BYTES for
aligning.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull turbostat utility fixes for 4.18 from Len Brown:
"Three of them are for regressions since Linux-4.17"
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: version 18.07.27
tools/power turbostat: Read extended processor family from CPUID
tools/power turbostat: Fix logical node enumeration to allow for non-sequential physical nodes
tools/power turbostat: fix x2apic debug message output file
tools/power turbostat: fix bogus summary values
tools/power turbostat: fix -S on UP systems
tools/power turbostat: Update turbostat(8) RAPL throttling column description
Previous change in the AML parser code blindly set all non-successful
dispatcher statuses to AE_OK. That approach is incorrect, though,
because successful control method invocations from module-level
return AE_CTRL_TRANSFER. Overwriting AE_OK to this status causes the
AML parser to think that there was no return value from the control
method invocation.
Fixes: 92c0f4af386 (ACPICA: AML Parser: ignore dispatcher error status during table load)
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Erik Schmauss <erik.schmauss@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
For some very small BDPs (with just a few packets) there was a
quantization effect where the target number of packets in flight
during the super-unity-gain (1.25x) phase of gain cycling was
implicitly truncated to a number of packets no larger than the normal
unity-gain (1.0x) phase of gain cycling. This meant that in multi-flow
scenarios some flows could get stuck with a lower bandwidth, because
they did not push enough packets inflight to discover that there was
more bandwidth available. This was really only an issue in multi-flow
LAN scenarios, where RTTs and BDPs are low enough for this to be an
issue.
This fix ensures that gain cycling can raise inflight for small BDPs
by ensuring that in PROBE_BW mode target inflight values with a
super-unity gain are always greater than inflight values with a gain
<= 1. Importantly, this applies whether the inflight value is
calculated for use as a cwnd value, or as a target inflight value for
the end of the super-unity phase in bbr_is_next_cycle_phase() (both
need to be bigger to ensure we can probe with more packets in flight
reliably).
This is a candidate fix for stable releases.
Fixes: 0f8782ea14 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeremy Cline says:
====================
net: socket: Fix potential spectre v1 gadgets
This fixes a pair of potential spectre v1 gadgets.
Note that because the speculation window is large, the policy is to stop
the speculative out-of-bounds load and not worry if the attack can be
completed with a dependent load or store[0].
[0] https://marc.info/?l=linux-kernel&m=152449131114778
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
'call' is a user-controlled value, so sanitize the array index after the
bounds check to avoid speculating past the bounds of the 'nargs' array.
Found with the help of Smatch:
net/socket.c:2508 __do_sys_socketcall() warn: potential spectre issue
'nargs' [r] (local cap)
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2018-07-28
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) API fixes for libbpf's BTF mapping of map key/value types in order
to make them compatible with iproute2's BPF_ANNOTATE_KV_PAIR()
markings, from Martin.
2) Fix AF_XDP to not report POLLIN prematurely by using the non-cached
consumer pointer of the RX queue, from Björn.
3) Fix __xdp_return() to check for NULL pointer after the rhashtable
lookup that retrieves the allocator object, from Taehee.
4) Fix x86-32 JIT to adjust ebp register in prologue and epilogue
by 4 bytes which got removed from overall stack usage, from Wang.
5) Fix bpf_skb_load_bytes_relative() length check to use actual
packet length, from Daniel.
6) Fix uninitialized return code in libbpf bpf_perf_event_read_simple()
handler, from Thomas.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull random fixes from Ted Ts'o:
"In reaction to the fixes to address CVE-2018-1108, some Linux
distributions that have certain systemd versions in some cases
combined with patches to libcrypt for FIPS/FEDRAMP compliance, have
led to boot-time stalls for some hardware.
The reaction by some distros and Linux sysadmins has been to install
packages that try to do complicated things with the CPU and hope that
leads to randomness.
To mitigate this, if RDRAND is available, mix it into entropy provided
by userspace. It won't hurt, and it will probably help"
* tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: mix rdrand with entropy sent in from userspace
mdio_mux_iproc_probe() uses platform_set_drvdata() to store md pointer
in device, whereas mdio_mux_iproc_remove() restores md pointer by
dev_get_platdata(&pdev->dev). This leads to wrong resources release.
The patch replaces getter to platform_get_drvdata.
Fixes: 98bc865a1e ("net: mdio-mux: Add MDIO mux driver for iProc SoCs")
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove BUG_ON() from fib_compute_spec_dst routine and check
in_dev pointer during flowi4 data structure initialization.
fib_compute_spec_dst routine can be run concurrently with device removal
where ip_ptr net_device pointer is set to NULL. This can happen
if userspace enables pkt info on UDP rx socket and the device
is removed while traffic is flowing
Fixes: 35ebf65e85 ("ipv4: Create and use fib_compute_spec_dst() helper")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When driver gets notification for mtu change, driver does not handle it for
all RQs. It handles only RQ[0].
Fix is to use enic_change_mtu() interface to change mtu for vf.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull GPIO fixes from Linus Walleij:
"Just a smallish OF fix and a driver fix:
- OF flag fix for special regulator flags
- fix up the Uniphier IRQ callback"
* tag 'gpio-v4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: uniphier: set legitimate irq trigger type in .to_irq hook
gpio: of: Handle fixed regulator flags properly
As long the bh tasklet isn't scheduled once, no packet from the rx path
will be handled. Since the tx path also schedule the same tasklet
this situation only persits until the first packet transmission.
So fix this issue by scheduling the tasklet after link reset.
Link: https://github.com/raspberrypi/linux/issues/2617
Fixes: 55d7de9de6 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet")
Suggested-by: Floris Bos <bos@je-eigen-domein.nl>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Function nfp_flower_repr_get_type_and_port expects an enum nfp_repr_type
return value but, if the repr type is unknown, returns a value of type
enum nfp_flower_cmsg_port_type. This means that if FW encodes the port
ID in a way the driver does not understand instead of dropping the frame
driver may attribute it to a physical port (uplink) provided the port
number is less than physical port count.
Fix this and ensure a net_device of NULL is returned if the repr can not
be determined.
Fixes: 1025351a88 ("nfp: add flower app")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull MIPS fix from Paul Burton:
"Here's one more MIPS fix, reverting an errata workaround that was
merged for v4.18-rc2 but has since been found to cause system hangs on
some BCM4718A1-based systems by the OpenWRT project"
* tag 'mips_fixes_4.18_5' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
Revert "MIPS: BCM47XX: Enable 74K Core ExternalSync for PCIe erratum"
The len > skb_headlen(skb) cannot be used as a maximum upper bound
for the packet length since it does not have any relation to the full
linear packet length when filtering is used from upper layers (e.g.
in case of reuseport BPF programs) as by then skb->data, skb->len
already got mangled through __skb_pull() and others.
Fixes: 4e1ec56cdc ("bpf: add skb_load_bytes_relative helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
In linux-next tree compiling the perf tool with additional make flags
EXTRA_CFLAGS="-Wp,-D_FORTIFY_SOURCE=2 -O2" causes a compiler error.
It is the warning 'variable may be used uninitialized' which is treated
as error: I compile it using a FEDORA 28 installation, my gcc compiler
version: gcc (GCC) 8.0.1 20180324 (Red Hat 8.0.1-0.20). The file that
causes the error is tools/lib/bpf/libbpf.c.
[root@p23lp27] # make V=1 EXTRA_CFLAGS="-Wp,-D_FORTIFY_SOURCE=2 -O2"
[...]
Makefile.config:849: No openjdk development package found, please
install JDK package, e.g. openjdk-8-jdk, java-1.8.0-openjdk-devel
Warning: Kernel ABI header at 'tools/include/uapi/linux/if_link.h'
differs from latest version at 'include/uapi/linux/if_link.h'
CC libbpf.o
libbpf.c: In function ‘bpf_perf_event_read_simple’:
libbpf.c:2342:6: error: ‘ret’ may be used uninitialized in this
function [-Werror=maybe-uninitialized]
int ret;
^
cc1: all warnings being treated as errors
mv: cannot stat './.libbpf.o.tmp': No such file or directory
/home6/tmricht/linux-next/tools/build/Makefile.build:96: recipe for target 'libbpf.o' failed
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull i2c fixes from Wolfram Sang:
"Some driver bugfixes"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx: use open drain for recovery GPIO
i2c: rcar: handle RXDMA HW behaviour on Gen3
i2c: imx: Fix reinit_completion() use
i2c: davinci: Avoid zero value of CLKH
As for today we don't setup SMP_CACHE_BYTES and cache_line_size for
ARC, so they are set to L1_CACHE_BYTES by default. L1 line length
(L1_CACHE_BYTES) might be easily smaller than L2 line (which is
usually the case BTW). This breaks code.
For example this breaks ethernet infrastructure on HSDK/AXS103 boards
with IOC disabled, involving manual cache flushes
Functions which alloc and manage sk_buff packet data area rely on
SMP_CACHE_BYTES define. In the result we can share last L2 cache
line in sk_buff linear packet data area between DMA buffer and
some useful data in other structure. So we can lose this data when
we invalidate DMA buffer.
sk_buff linear packet data area
|
|
| skb->end skb->tail
V | |
V V
----------------------------------------------.
packet data | <tail padding> | <useful data in other struct>
----------------------------------------------.
---------------------.--------------------------------------------------.
SLC line | SLC (L2 cache) line (128B) |
---------------------.--------------------------------------------------.
^ ^
| |
These cache lines will be invalidated when we invalidate skb
linear packet data area before DMA transaction starting.
This leads to issues painful to debug as it reproduces only if
(sk_buff->end - sk_buff->tail) < SLC_LINE_SIZE and
if we have some useful data right after sk_buff->end.
Fix that by hardcode SMP_CACHE_BYTES to max line length we may have.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
ARC backend for dma_sync_single_for_(device|cpu) was broken as it was
not honoring the @dir argument and simply forcing it based on the call:
- arc_dma_sync_single_for_device(dir) assumed DMA_TO_DEVICE (cache wback)
- arc_dma_sync_single_for_cpu(dir) assumed DMA_FROM_DEVICE (cache inv)
This is not true given the DMA API programming model and has been
discussed here [1] in some detail.
Interestingly while the deficiency has been there forever, it only started
showing up after 4.17 dma common ops rework, commit a8eb92d02d
("arc: fix arc_dma_{map,unmap}_page") which wired up these calls under the
more commonly used dma_map_page API triggering the issue.
[1]: https://lkml.org/lkml/2018/5/18/979
Fixes: commit a8eb92d02d ("arc: fix arc_dma_{map,unmap}_page")
Cc: stable@kernel.org # v4.17+
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: reworked changelog]
Pull block fixes from Jens Axboe:
"Bigger than usual at this time, mostly due to the O_DIRECT corruption
issue and the fact that I was on vacation last week. This contains:
- NVMe pull request with two fixes for the FC code, and two target
fixes (Christoph)
- a DIF bio reset iteration fix (Greg Edwards)
- two nbd reply and requeue fixes (Josef)
- SCSI timeout fixup (Keith)
- a small series that fixes an issue with bio_iov_iter_get_pages(),
which ended up causing corruption for larger sized O_DIRECT writes
that ended up racing with buffered writes (Martin Wilck)"
* tag 'for-linus-20180727' of git://git.kernel.dk/linux-block:
block: reset bi_iter.bi_done after splitting bio
block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs
blkdev: __blkdev_direct_IO_simple: fix leak in error case
block: bio_iov_iter_get_pages: fix size of last iovec
nvmet: only check for filebacking on -ENOTBLK
nvmet: fixup crash on NULL device path
scsi: set timed out out mq requests to complete
blk-mq: export setting request completion state
nvme: if_ready checks to fail io to deleting controller
nvmet-fc: fix target sgl list on large transfers
nbd: handle unexpected replies better
nbd: don't requeue the same request twice.
Merge misc fixes from Andrew Morton:
"11 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kvm, mm: account shadow page tables to kmemcg
zswap: re-check zswap_is_full() after do zswap_shrink()
include/linux/eventfd.h: include linux/errno.h
mm: fix vma_is_anonymous() false-positives
mm: use vma_init() to initialize VMAs on stack and data segments
mm: introduce vma_init()
mm: fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL
ipc/sem.c: prevent queue.status tearing in semop
mm: disallow mappings that conflict for devm_memremap_pages()
kasan: only select SLUB_DEBUG with SYSFS=y
delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
Pull PCI fix from Bjorn Helgaas:
"Fix a use-after-free error in fatal error recovery (Thomas Tai)"
* tag 'pci-v4.18-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI/AER: Work around use-after-free in pcie_do_fatal_recovery()
Pull arm64 fixes from Will Deacon:
"Inevitably, after saying that I hoped we would be done on the fixes
front, a couple of issues have cropped up over the last week. Next
time I'll stay schtum.
We've fixed an over-eager BUILD_BUG_ON() which Arnd ran into with
arndconfig, as well as ensuring that KPTI really is disabled on
Thunder-X1, where the cure is worse than the disease (this regressed
when we reworked the heterogeneous CPU feature checking).
Summary:
- Fix disabling of kpti on Thunder-X machines
- Fix premature BUILD_BUG_ON() found with randconfig"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: fix vmemmap BUILD_BUG_ON() triggering on !vmemmap setups
arm64: Check for errata before evaluating cpu features
This reverts commit 2a027b47db ("MIPS: BCM47XX: Enable 74K Core
ExternalSync for PCIe erratum").
Enabling ExternalSync caused a regression for BCM4718A1 (used e.g. in
Netgear E3000 and ASUS RT-N16): it simply hangs during PCIe
initialization. It's likely that BCM4717A1 is also affected.
I didn't notice that earlier as the only BCM47XX devices with PCIe I
own are:
1) BCM4706 with 2 x 14e4:4331
2) BCM4706 with 14e4:4360 and 14e4:4331
it appears that BCM4706 is unaffected.
While BCM5300X-ES300-RDS.pdf seems to document that erratum and its
workarounds (according to quotes provided by Tokunori) it seems not even
Broadcom follows them.
According to the provided info Broadcom should define CONF7_ES in their
SDK's mipsinc.h and implement workaround in the si_mips_init(). Checking
both didn't reveal such code. It *could* mean Broadcom also had some
problems with the given workaround.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Reported-by: Michael Marley <michael@michaelmarley.com>
Patchwork: https://patchwork.linux-mips.org/patch/20032/
URL: https://bugs.openwrt.org/index.php?do=details&task_id=1688
Cc: Tokunori Ikegami <ikegami@allied-telesis.co.jp>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Chris Packham <chris.packham@alliedtelesis.co.nz>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Pull drm fixes from Dave Airlie:
"Not much happening this week which is good: two imx display fixes and
one i915 quirk addition"
* tag 'drm-fixes-2018-07-27' of git://anongit.freedesktop.org/drm/drm:
drm/i915/glk: Add Quirk for GLK NUC HDMI port issues.
gpu: ipu-csi: Check for field type alternate
drm/imx: imx-ldb: check if channel is enabled before printing warning
drm/imx: imx-ldb: disable LDB on driver bind
This fixes the reported family on modern AMD processors (e.g. Ryzen,
which is family 0x17). Previously these processors all showed up as
family 0xf.
See the document
https://support.amd.com/TechDocs/56255_OSRR.pdf
section CPUID_Fn00000001_EAX for how to calculate the family
from the BaseFamily and ExtFamily values.
This matches the code in arch/x86/lib/cpu.c
Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>
Pull input fixes from Dmitry Torokhov:
- a couple of new device IDs added to Elan i2c touchpad controller
driver
- another entry in i8042 reset quirk list
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - add Lenovo LaVie Z to the i8042 reset list
Input: elan_i2c - add another ACPI ID for Lenovo Ideapad 330-15AST
MAINTAINERS: Add file patterns for serio device tree bindings
Input: elan_i2c - add ACPI ID for lenovo ideapad 330
Pull tracing fixes from Steven Rostedt:
"Various fixes to the tracing infrastructure:
- Fix double free when the reg() call fails in
event_trigger_callback()
- Fix anomoly of snapshot causing tracing_on flag to change
- Add selftest to test snapshot and tracing_on affecting each other
- Fix setting of tracepoint flag on error that prevents probes from
being deleted.
- Fix another possible double free that is similar to
event_trigger_callback()
- Quiet a gcc warning of a false positive unused variable
- Fix crash of partial exposed task->comm to trace events"
* tag 'trace-v4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
kthread, tracing: Don't expose half-written comm when creating kthreads
tracing: Quiet gcc warning about maybe unused link variable
tracing: Fix possible double free in event_enable_trigger_func()
tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure
selftests/ftrace: Add snapshot and tracing_on test case
ring_buffer: tracing: Inherit the tracing setting to next ring buffer
tracing: Fix double free of event_trigger_data
Pull xfs fixes from Darrick Wong:
- Fix some uninitialized variable errors
- Fix an incorrect check in metadata verifiers
* tag 'xfs-4.18-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: properly handle free inodes in extent hint validators
xfs: Initialize variables in xfs_alloc_get_rec before using them
Steffen Klassert says:
====================
pull request (net): ipsec 2018-07-27
1) Fix PMTU handling of vti6. We update the PMTU on
the xfrm dst_entry which is not cached anymore
after the flowchache removal. So update the
PMTU of the original dst_entry instead.
From Eyal Birger.
2) Fix a leak of kernel memory to userspace.
From Eric Dumazet.
3) Fix a possible dst_entry memleak in xfrm_lookup_route.
From Tommi Rantala.
4) Fix a skb leak in case we can't call nlmsg_multicast
from xfrm_nlmsg_multicast. From Florian Westphal.
5) Fix a leak of a temporary buffer in the error path of
esp6_input. From Zhen Lei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
After the bio has been updated to represent the remaining sectors, reset
bi_done so bio_rewind_iter() does not rewind further than it should.
This resolves a bio_integrity_process() failure on reads where the
original request was split.
Fixes: 63573e359d ("bio-integrity: Restore original iterator on verify stage")
Signed-off-by: Greg Edwards <gedwards@ddn.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit a09c591306 (ACPI / LPSS: Avoid PM quirks on suspend and
resume from S3) modified the ACPI driver for Intel SoCs (LPSS) to
avoid applying PM quirks on suspend and resume from S3 to address
system-wide suspend and resume problems on some systems, but it is
reported that the same issue also affects hibernation, so extend
the approach used by that commit to cover hibernation as well.
Fixes: a09c591306 (ACPI / LPSS: Avoid PM quirks on suspend and resume from S3)
Link: https://bugs.launchpad.net/bugs/1774950
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: 4.15+ <stable@vger.kernel.org> # 4.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
UBSAN triggers the following undefined behaviour warnings:
[...]
[ 13.236124] UBSAN: Undefined behaviour in drivers/net/ethernet/amazon/ena/ena_eth_com.c:468:22
[ 13.240043] shift exponent 64 is too large for 64-bit type 'long long unsigned int'
[...]
[ 13.744769] UBSAN: Undefined behaviour in drivers/net/ethernet/amazon/ena/ena_eth_com.c:373:4
[ 13.748694] shift exponent 64 is too large for 64-bit type 'long long unsigned int'
[...]
When splitting the address to high and low, GENMASK_ULL is used to generate
a bitmask with dma_addr_bits field from io_sq (in ena_com_prepare_tx and
ena_com_add_single_rx_desc).
The problem is that dma_addr_bits is not initialized with a proper value
(besides being cleared in ena_com_create_io_queue).
Assign dma_addr_bits the correct value that is stored in ena_dev when
initializing the SQ.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Gal Pressman <pressmangal@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
/sys/../zswap/stored_pages keeps rising in a zswap test with
"zswap.max_pool_percent=0" parameter. But it should not compress or
store pages any more since there is no space in the compressed pool.
Reproduce steps:
1. Boot kernel with "zswap.enabled=1"
2. Set the max_pool_percent to 0
# echo 0 > /sys/module/zswap/parameters/max_pool_percent
3. Do memory stress test to see if some pages have been compressed
# stress --vm 1 --vm-bytes $mem_available"M" --timeout 60s
4. Watching the 'stored_pages' number increasing or not
The root cause is:
When zswap_max_pool_percent is set to 0 via kernel parameter,
zswap_is_full() will always return true due to zswap_shrink(). But if
the shinking is able to reclain a page successfully the code then
proceeds to compressing/storing another page, so the value of
stored_pages will keep changing.
To solve the issue, this patch adds a zswap_is_full() check again after
zswap_shrink() to make sure it's now under the max_pool_percent, and to
not compress/store if we reached the limit.
Link: http://lkml.kernel.org/r/20180530103936.17812-1-liwang@redhat.com
Signed-off-by: Li Wang <liwang@redhat.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Huang Ying <huang.ying.caritas@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The new gasket staging driver ran into a randconfig build failure when
CONFIG_EVENTFD is disabled:
In file included from drivers/staging/gasket/gasket_interrupt.h:11,
from drivers/staging/gasket/gasket_interrupt.c:4:
include/linux/eventfd.h: In function 'eventfd_ctx_fdget':
include/linux/eventfd.h:51:9: error: implicit declaration of function 'ERR_PTR' [-Werror=implicit-function-declaration]
I can't see anything wrong with including eventfd.h before err.h, so the
easiest fix is to make it possible to do this by including the file
where it is needed.
Link: http://lkml.kernel.org/r/20180724110737.3985088-1-arnd@arndb.de
Fixes: 9a69f5087c ("drivers/staging: Gasket driver framework + Apex driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When pmem namespaces created are smaller than section size, this can
cause an issue during removal and gpf was observed:
general protection fault: 0000 1 SMP PTI
CPU: 36 PID: 3941 Comm: ndctl Tainted: G W 4.14.28-1.el7uek.x86_64 #2
task: ffff88acda150000 task.stack: ffffc900233a4000
RIP: 0010:__put_page+0x56/0x79
Call Trace:
devm_memremap_pages_release+0x155/0x23a
release_nodes+0x21e/0x260
devres_release_all+0x3c/0x48
device_release_driver_internal+0x15c/0x207
device_release_driver+0x12/0x14
unbind_store+0xba/0xd8
drv_attr_store+0x27/0x31
sysfs_kf_write+0x3f/0x46
kernfs_fop_write+0x10f/0x18b
__vfs_write+0x3a/0x16d
vfs_write+0xb2/0x1a1
SyS_write+0x55/0xb9
do_syscall_64+0x79/0x1ae
entry_SYSCALL_64_after_hwframe+0x3d/0x0
Add code to check whether we have a mapping already in the same section
and prevent additional mappings from being created if that is the case.
Link: http://lkml.kernel.org/r/152909478401.50143.312364396244072931.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Robert Elliott <elliott@hpe.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Building with KASAN and SLUB but without sysfs now results in a
build-time error:
WARNING: unmet direct dependencies detected for SLUB_DEBUG
Depends on [n]: SLUB [=y] && SYSFS [=n]
Selected by [y]:
- KASAN [=y] && HAVE_ARCH_KASAN [=y] && (SLUB [=y] || SLAB [=n] && !DEBUG_SLAB [=n]) && SLUB [=y]
mm/slub.c:4565:12: error: 'list_locations' defined but not used [-Werror=unused-function]
static int list_locations(struct kmem_cache *s, char *buf,
^~~~~~~~~~~~~~
mm/slub.c:4406:13: error: 'validate_slab_cache' defined but not used [-Werror=unused-function]
static long validate_slab_cache(struct kmem_cache *s)
This disallows that broken configuration in Kconfig.
Link: http://lkml.kernel.org/r/20180709154019.1693026-1-arnd@arndb.de
Fixes: dd275caf4a ("kasan: depend on CONFIG_SLUB_DEBUG")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While forking, if delayacct init fails due to memory shortage, it
continues expecting all delayacct users to check task->delays pointer
against NULL before dereferencing it, which all of them used to do.
Commit c96f5471ce ("delayacct: Account blkio completion on the correct
task"), while updating delayacct_blkio_end() to take the target task
instead of always using %current, made the function test NULL on
%current->delays and then continue to operated on @p->delays. If
%current succeeded init while @p didn't, it leads to the following
crash.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: __delayacct_blkio_end+0xc/0x40
PGD 8000001fd07e1067 P4D 8000001fd07e1067 PUD 1fcffbb067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 4 PID: 25774 Comm: QIOThread0 Not tainted 4.16.0-9_fbk1_rc2_1180_g6b593215b4d7 #9
RIP: 0010:__delayacct_blkio_end+0xc/0x40
Call Trace:
try_to_wake_up+0x2c0/0x600
autoremove_wake_function+0xe/0x30
__wake_up_common+0x74/0x120
wake_up_page_bit+0x9c/0xe0
mpage_end_io+0x27/0x70
blk_update_request+0x78/0x2c0
scsi_end_request+0x2c/0x1e0
scsi_io_completion+0x20b/0x5f0
blk_mq_complete_request+0xa2/0x100
ata_scsi_qc_complete+0x79/0x400
ata_qc_complete_multiple+0x86/0xd0
ahci_handle_port_interrupt+0xc9/0x5c0
ahci_handle_port_intr+0x54/0xb0
ahci_single_level_irq_intr+0x3b/0x60
__handle_irq_event_percpu+0x43/0x190
handle_irq_event_percpu+0x20/0x50
handle_irq_event+0x2a/0x50
handle_edge_irq+0x80/0x1c0
handle_irq+0xaf/0x120
do_IRQ+0x41/0xc0
common_interrupt+0xf/0xf
Fix it by updating delayacct_blkio_end() check @p->delays instead.
Link: http://lkml.kernel.org/r/20180724175542.GP1934745@devbig577.frc2.facebook.com
Fixes: c96f5471ce ("delayacct: Account blkio completion on the correct task")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dave Jones <dsj@fb.com>
Debugged-by: Dave Jones <dsj@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Josh Snyder <joshs@netflix.com>
Cc: <stable@vger.kernel.org> [4.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
drm/imx: imx-drm ldb and ipu-v3 csi fixes
- Disable the LVDS Display Bridge (LDB) on driver bind. This is
necessary to guarantee correct LVDS signals in case the bootloader
left the LVDS output active.
- Remove false positive warning about disabled second LVDS channel in
dual-channel mode. In this mode, the second LVDS channel can not be
used separately. If the second channel is correctly described as
disabled in the device tree, the driver warned about this anyway.
- Fix the CSI confiuration to not only enable interlaced capture mode
for V4L2_FIELD_SEQ_BT and V4L2_FIELD_SEQ_TB, but also for the
V4L2_FIELD_ALTERNATE interlacing mode. Before, it incorrectly tried
to capture progressive frames in that case.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1532100423.3438.8.camel@pengutronix.de
The current map_check_btf() in BPF_MAP_TYPE_ARRAY rejects
'> map->value_size' to ensure map_seq_show_elem() will not
access things beyond an array element.
Yonghong suggested that using '!=' is a more correct
check. The 8 bytes round_up on value_size is stored
in array->elem_size. Hence, using '!=' on map->value_size
is a proper check.
This patch also adds new tests to check the btf array
key type and value type. Two of these new tests verify
the btf's value_size (the change in this patch).
It also fixes two existing tests that wrongly encoded
a btf's type size (pprint_test) and the value_type_id (in one
of the raw_tests[]). However, that do not affect these two
BTF verification tests before or after this test changes.
These two tests mainly failed at array creation time after
this patch.
Fixes: a26ca7c982 ("bpf: btf: Add pretty print support to the basic arraymap")
Suggested-by: Yonghong Song <yhs@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
rhashtable_lookup() can return NULL. so that NULL pointer
check routine should be added.
Fixes: 02b55e5657 ("xdp: add MEM_TYPE_ZERO_COPY")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Fix dev_change_tx_queue_len so it rolls back original value
upon a failure in dev_qdisc_change_tx_queue_len.
This is already done for notifirers' failures, share the code.
In case of failure in dev_qdisc_change_tx_queue_len, some tx queues
would still be of the new length, while they should be reverted.
Currently, the revert is not done, and is marked with a TODO label
in dev_qdisc_change_tx_queue_len, and should find some nice solution
to do it.
Yet it is still better to not apply the newly requested value.
Fixes: 48bfd55e7e ("net_sched: plug in qdisc ops change_tx_queue_len")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Reported-by: Ran Rozenstein <ranro@mellanox.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
turbostat fails on some multi-package topologies because the logical node
enumeration assumes that the nodes are sequentially numbered,
which causes the logical numa nodes to not be enumerated, or enumerated incorrectly.
Use a more robust enumeration algorithm which allows for non-seqential physical nodes.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This patch fixes a regression introduced in
commit 8cb48b32a5 ("tools/power turbostat: track thread ID in cpu_topology")
Turbostat uses incorrect cores number ('topo.num_cores') - its value is count
of logical CPUs, instead of count of physical cores. So it is twice as large as
it should be on a typical Intel system. For example, on a 6 core Xeon system
'topo.num_cores' is 12, and on a 52 core Xeon system 'topo.num_cores' is 104.
And interestingly, on a 68-core Knights Landing Intel system 'topo.num_cores'
is 272, because this system has 4 logical CPUs per core.
As a result, some of the turbostat calculations are incorrect. For example,
on idle 52-core Xeon system when all cores are ~99% in Core C6 (CPU%c6), the
summary (very first) line shows ~48% Core C6, while it should be ~99%.
This patch fixes the problem by fixing 'topo.num_cores' calculation.
Was:
1. Init 'thread_id' for all CPUs to -1
2. Run 'get_thread_siblings()' which sets it to 0 or 1
3. Increment 'topo.num_cores' when thread_id != -1 (bug!)
Now:
1. Init 'thread_id' for all CPUs to -1
2. Run 'get_thread_siblings()' which sets it to 0 or 1
3. Increment 'topo.num_cores' when thread_id is not 0
I did not have a chance to test this on an AMD machine, and only tested on a
couple of Intel Xeons (6 and 52 cores).
Reported-by: Vladislav Govtva <vladislav.govtva@intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
bio_iov_iter_get_pages() currently only adds pages for the next non-zero
segment from the iov_iter to the bio. That's suboptimal for callers,
which typically try to pin as many pages as fit into the bio. This patch
converts the current bio_iov_iter_get_pages() into a static helper, and
introduces a new helper that allocates as many pages as
1) fit into the bio,
2) are present in the iov_iter,
3) and can be pinned by MM.
Error is returned only if zero pages could be pinned. Because of 3), a
zero return value doesn't necessarily mean all pages have been pinned.
Callers that have to pin every page in the iov_iter must still call this
function in a loop (this is currently the case).
This change matters most for __blkdev_direct_IO_simple(), which calls
bio_iov_iter_get_pages() only once. If it obtains less pages than
requested, it returns a "short write" or "short read", and
__generic_file_write_iter() falls back to buffered writes, which may
lead to data corruption.
Fixes: 72ecad22d9 ("block: support a full bio worth of IO for simplified bdev direct-io")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If the last page of the bio is not "full", the length of the last
vector slot needs to be corrected. This slot has the index
(bio->bi_vcnt - 1), but only in bio->bi_io_vec. In the "bv" helper
array, which is shifted by the value of bio->bi_vcnt at function
invocation, the correct index is (nr_pages - 1).
v2: improved readability following suggestions from Ming Lei.
v3: followed a formatting suggestion from Christoph Hellwig.
Fixes: 2cefe4dbaa ("block: add bio_iov_iter_get_pages()")
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull NVMe fixes from Christoph:
"Two small fixes each for the FC code and the target."
* 'nvme-4.18' of git://git.infradead.org/nvme:
nvmet: only check for filebacking on -ENOTBLK
nvmet: fixup crash on NULL device path
nvme: if_ready checks to fail io to deleting controller
nvmet-fc: fix target sgl list on large transfers
When an fatal error is received by a non-bridge device, the device is
removed, and pci_stop_and_remove_bus_device() deallocates the device
structure. The freed device structure is used by subsequent code to send
uevents and print messages.
Hold a reference on the device until we're finished using it. This is not
an ideal fix because pcie_do_fatal_recovery() should not use the device at
all after removing it, but that's too big a project for right now.
Fixes: 7e9084b367 ("PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices")
Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
[bhelgaas: changelog, reduce get/put coverage]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
If we enable or disable xgbe flow-control by ethtool ,
it does't work.Because the parameter is not properly
assigned,so we need to adjust the assignment order
of the parameters.
Fixes: c1ce2f7736 ("amd-xgbe: Fix flow control setting logic")
Signed-off-by: tangpengpeng <tangpengpeng@higon.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull USB fixes from Greg KH:
"Here are a number of USB fixes and new device ids for 4.18-rc7.
The largest number are a bunch of gadget driver fixes that got delayed
in being submitted earlier due to vacation schedules, but nothing
really huge is present in them. There are some new device ids and some
PHY driver fixes that were connected to some USB ones. Full details
are in the shortlog.
All have been in linux-next for a while with no reported issues"
* tag 'usb-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (28 commits)
usb: core: handle hub C_PORT_OVER_CURRENT condition
usb: xhci: Fix memory leak in xhci_endpoint_reset()
usb: typec: tcpm: Fix sink PDO starting index for PPS APDO selection
usb: gadget: f_fs: Only return delayed status when len is 0
usb: gadget: f_uac2: fix endianness of 'struct cntrl_*_lay3'
usb: dwc2: Fix inefficient copy of unaligned buffers
usb: dwc2: Fix DMA alignment to start at allocated boundary
usb: dwc3: rockchip: Fix PHY documentation links.
tools: usb: ffs-test: Fix build on big endian systems
usb: gadget: aspeed: Workaround memory ordering issue
usb: dwc3: gadget: remove redundant variable maxpacket
usb: dwc2: avoid NULL dereferences
usb/phy: fix PPC64 build errors in phy-fsl-usb.c
usb: dwc2: host: do not delay retries for CONTROL IN transfers
usb: gadget: u_audio: protect stream runtime fields with stream spinlock
usb: gadget: u_audio: remove cached period bytes value
usb: gadget: u_audio: remove caching of stream buffer parameters
usb: gadget: u_audio: update hw_ptr in iso_complete after data copied
usb: gadget: u_audio: fix pcm/card naming in g_audio_setup()
usb: gadget: f_uac2: fix error handling in afunc_bind (again)
...
Pull staging driver fixes from Greg KH:
"Here are three small staging driver fixes for 4.18-rc7.
One is a revert of an earlier patch that turned out to be incorrect,
one is a fix for the speakup drivers, and the last a fix for the
ks7010 driver to resolve a regression.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: speakup: fix wraparound in uaccess length check
staging: ks7010: call 'hostif_mib_set_request_int' instead of 'hostif_mib_set_request_bool'
Revert "staging:r8188eu: Use lib80211 to support TKIP"
Pull driver core fix from Greg KH:
"This is a single driver core fix for 4.18-rc7. It partially reverts a
previous commit to resolve some reported issues.
It has been in linux-next for a while now with no reported issues"
* tag 'driver-core-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: Partially revert "driver core: correct device's shutdown order"
Pull ACPI fix from Rafael Wysocki:
"Fix a recent ACPICA regression causing the AML parser to get confused
and fail in some situations involving incorrect AML in an ACPI table
(Erik Schmauss)"
* tag 'acpi-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: AML Parser: ignore dispatcher error status during table load
Pull power management fix from Rafael Wysocki:
"Fix up the recently introduced cpufreq driver for Qualcomm Kryo
processors by adding a terminating NULL entry to its table of device
IDs (YueHaibing)"
* tag 'pm-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: qcom-kryo: add NULL entry to the end of_device_id array
There is a window for racing when printing directly to task->comm,
allowing other threads to see a non-terminated string. The vsnprintf
function fills the buffer, counts the truncated chars, then finally
writes the \0 at the end.
creator other
vsnprintf:
fill (not terminated)
count the rest trace_sched_waking(p):
... memcpy(comm, p->comm, TASK_COMM_LEN)
write \0
The consequences depend on how 'other' uses the string. In our case,
it was copied into the tracing system's saved cmdlines, a buffer of
adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be):
crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk'
0xffffffd5b3818640: "irq/497-pwr_evenkworker/u16:12"
...and a strcpy out of there would cause stack corruption:
[224761.522292] Kernel panic - not syncing: stack-protector:
Kernel stack is corrupted in: ffffff9bf9783c78
crash-arm64> kbt | grep 'comm\|trace_print_context'
#6 0xffffff9bf9783c78 in trace_print_context+0x18c(+396)
comm (char [16]) = "irq/497-pwr_even"
crash-arm64> rd 0xffffffd4d0e17d14 8
ffffffd4d0e17d14: 2f71726900000000 5f7277702d373934 ....irq/497-pwr_
ffffffd4d0e17d24: 726f776b6e657665 3a3631752f72656b evenkworker/u16:
ffffffd4d0e17d34: f9780248ff003231 cede60e0ffffff9b 12..H.x......`..
ffffffd4d0e17d44: cede60c8ffffffd4 00000fffffffffd4 .....`..........
The workaround in e09e28671 (use strlcpy in __trace_find_cmdline) was
likely needed because of this same bug.
Solved by vsnprintf:ing to a local buffer, then using set_task_comm().
This way, there won't be a window where comm is not terminated.
Link: http://lkml.kernel.org/r/20180726071539.188015-1-snild@sony.com
Cc: stable@vger.kernel.org
Fixes: bc0c38d139 ("ftrace: latency tracer infrastructure")
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Snild Dolkow <snild@sony.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Currently we are leaking bpf programs when they are detached from the
lirc device; the refcount never reaches zero.
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Break statements were missing for Geneve case in
ndo_udp_tunnel_{add/del}, thereby raw mac matchall
entries were not getting added.
Fixes: c746fc0e8b2d("cxgb4: add geneve offload support for T6")
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 57ea2a34ad ("tracing/kprobes: Fix trace_probe flags on
enable_trace_kprobe() failure") added an if statement that depends on another
if statement that gcc doesn't see will initialize the "link" variable and
gives the warning:
"warning: 'link' may be used uninitialized in this function"
It is really a false positive, but to quiet the warning, and also to make
sure that it never actually is used uninitialized, initialize the "link"
variable to NULL and add an if (!WARN_ON_ONCE(!link)) where the compiler
thinks it could be used uninitialized.
Cc: stable@vger.kernel.org
Fixes: 57ea2a34ad ("tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
There was a case that triggered a double free in event_trigger_callback()
due to the called reg() function freeing the trigger_data and then it
getting freed again by the error return by the caller. The solution there
was to up the trigger_data ref count.
Code inspection found that event_enable_trigger_func() has the same issue,
but is not as easy to trigger (requires harder to trigger failures). It
needs to be solved slightly different as it needs more to clean up when the
reg() function fails.
Link: http://lkml.kernel.org/r/20180725124008.7008e586@gandalf.local.home
Cc: stable@vger.kernel.org
Fixes: 7862ad1846 ("tracing: Add 'enable_event' and 'disable_event' event trigger commands")
Reivewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Polling for the ingress queues relies on reading the producer/consumer
pointers of the Rx queue.
Prior this commit, a cached consumer pointer could be used, instead of
the actual consumer pointer and therefore report POLLIN prematurely.
This patch makes sure that the non-cached consumer pointer is used
instead.
Reported-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Qi Zhang <qi.z.zhang@intel.com>
Fixes: c497176cb2 ("xsk: add Rx receive functions and poll support")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Commit 24dea04767 ("bpf, x32: remove ld_abs/ld_ind")
removed the 4 /* Extra space for skb_copy_bits buffer */
from _STACK_SIZE, but it didn't fix the concerned code
in emit_prologue and emit_epilogue, and this error will
bring very strange kernel runtime errors. This patch
fixes it.
Fixes: 24dea04767 ("bpf, x32: remove ld_abs/ld_ind")
Reported-by: Meelis Roos <mroos@linux.ee>
Bisected-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Fixes the following sparse warnings:
net/ipv4/igmp.c:1391:6: warning:
symbol '__ip_mc_inc_group' was not declared. Should it be static?
Fixes: 6e2059b53f ("ipv4/igmp: init group mode as INCLUDE when join source group")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On GLK NUC platforms the HDMI retiming buffer needs additional disabled
time to correctly sync to a faster incoming signal.
When measured on a scope the highspeed lines of the HDMI clock turn off
for ~400uS during a normal resolution change. The HDMI retimer on the
GLK NUC appears to require at least a full frame of quiet time before a
new faster clock can be correctly sync'd. Wait 100ms due to msleep
inaccuracies while waiting for a completed frame. Add a quirk to the
driver for GLK boards that use ITE66317 HDMI retimers.
V2: Add more devices to the quirk list
V3: Delay increased to 100ms, check to confirm crtc type is HDMI.
V4: crtc type check extended to include _DDI and whitespace fixes
v5: Fix white spaces, remove the macro for delay. Revert the crtc type
check introduced in v4.
Cc: Imre Deak <imre.deak@intel.com>
Cc: <stable@vger.kernel.org> # v4.14+
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105887
Signed-off-by: Clint Taylor <clinton.a.taylor@intel.com>
Tested-by: Daniel Scheller <d.scheller.oss@gmail.com>
Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180710200205.1478-1-radhakrishna.sripada@intel.com
(cherry picked from commit 90c3e21987)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Otherwise interfaces get exposed under /sys/devices/virtual, which
doesn't give udev the context it needs for PCI-based predictable
interface names.
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When received packets are dropped in virtio_net driver, received packets
counter is incremented but bytes counter is not.
As a result, for instance if we drop all packets by XDP, only received
is counted and bytes stays 0, which looks inconsistent.
IMHO received packets/bytes should be counted if packets are produced by
the hypervisor, like what common NICs on physical machines are doing.
So fix the bytes counter.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
drm_atomic_helper_async_check() declares the plane, old_plane_state and
new_plane_state variables to iterate over all planes of the atomic
state and make sure only one plane is enabled.
Unfortunately gcc is not smart enough to figure out that the check on
n_planes is enough to guarantee that plane, new_plane_state and
old_plane_state are initialized.
Explicitly initialize those variables to NULL to make gcc happy.
Fixes: fef9df8b59 ("drm/atomic: initial support for asynchronous plane update")
Cc: <stable@vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20180724133300.32023-1-boris.brezillon@bootlin.com
Pull clk fixes from Stephen Boyd:
"One more round of updates for problems seen this -rc series. Drivers
fixes are:
- Amlogic Meson audio divider fix and CPU clk critical marking
- Qualcomm multimedia GDSC marked as 'always on' to keep display
working
- Aspeed fixes for critical clks, resets causing clks to stay
disabled, and an incorrect HPLL frequency calculation
- Marvell Armada 3700 cpu clks would undervolt when switching from
low frequencies to high frequencies because the voltage didn't
stabilize in time so now we switch to an intermediate frequency
Plus we have a core framework thinko that messed up the debugfs flag
printing logic to make it not very useful"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: aspeed: Support HPLL strapping on ast2400
clk: mvebu: armada-37xx-periph: Fix switching CPU rate from 300Mhz to 1.2GHz
clk: aspeed: Mark bclk (PCIe) and dclk (VGA) as critical
clk/mmcc-msm8996: Make mmagic_bimc_gdsc ALWAYS_ON
clk: aspeed: Treat a gate in reset as disabled
clk: Really show symbolic clock flags in debugfs
clk: qcom: gcc-msm8996: Disable halt check on UFS tx clock
clk: meson: audio-divider is one based
clk: meson-gxbb: set fclk_div2 as CLK_IS_CRITICAL
Pull fscache/cachefiles fixes from David Howells:
- Allow cancelled operations to be queued so they can be cleaned up.
- Fix a refcounting bug in the monitoring of reads on backend files
whereby a race can occur between monitor objects being listed for
work, the work processing being queued and the work processor running
and destroying the monitor objects.
- Fix a ref overput in object attachment, whereby a tentatively
considered object is put in error handling without first being 'got'.
- Fix a missing clear of the CACHEFILES_OBJECT_ACTIVE flag whereby an
assertion occurs when we retry because it seems the object is now
active.
- Wait rather BUG'ing on an object collision in the depths of
cachefiles as the active object should be being cleaned up - also
depends on the one above.
* tag 'fscache-fixes-20180725' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
cachefiles: Wait rather than BUG'ing on "Unexpected object collision"
cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag
fscache: Fix reference overput in fscache_attach_object() error handling
cachefiles: Fix refcounting bug in backing-file read monitoring
fscache: Allow cancelled operations to be enqueued
Maintain the tracing on/off setting of the ring_buffer when switching
to the trace buffer snapshot.
Taking a snapshot is done by swapping the backup ring buffer
(max_tr_buffer). But since the tracing on/off setting is defined
by the ring buffer, when swapping it, the tracing on/off setting
can also be changed. This causes a strange result like below:
/sys/kernel/debug/tracing # cat tracing_on
1
/sys/kernel/debug/tracing # echo 0 > tracing_on
/sys/kernel/debug/tracing # cat tracing_on
0
/sys/kernel/debug/tracing # echo 1 > snapshot
/sys/kernel/debug/tracing # cat tracing_on
1
/sys/kernel/debug/tracing # echo 1 > snapshot
/sys/kernel/debug/tracing # cat tracing_on
0
We don't touch tracing_on, but snapshot changes tracing_on
setting each time. This is an anomaly, because user doesn't know
that each "ring_buffer" stores its own tracing-enable state and
the snapshot is done by swapping ring buffers.
Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
Cc: stable@vger.kernel.org
Fixes: debdd57f51 ("tracing: Make a snapshot feature available from userspace")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
[ Updated commit log and comment in the code ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Running the following:
# cd /sys/kernel/debug/tracing
# echo 500000 > buffer_size_kb
[ Or some other number that takes up most of memory ]
# echo snapshot > events/sched/sched_switch/trigger
Triggers the following bug:
------------[ cut here ]------------
kernel BUG at mm/slub.c:296!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
CPU: 6 PID: 6878 Comm: bash Not tainted 4.18.0-rc6-test+ #1066
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
RIP: 0010:kfree+0x16c/0x180
Code: 05 41 0f b6 72 51 5b 5d 41 5c 4c 89 d7 e9 ac b3 f8 ff 48 89 d9 48 89 da 41 b8 01 00 00 00 5b 5d 41 5c 4c 89 d6 e9 f4 f3 ff ff <0f> 0b 0f 0b 48 8b 3d d9 d8 f9 00 e9 c1 fe ff ff 0f 1f 40 00 0f 1f
RSP: 0018:ffffb654436d3d88 EFLAGS: 00010246
RAX: ffff91a9d50f3d80 RBX: ffff91a9d50f3d80 RCX: ffff91a9d50f3d80
RDX: 00000000000006a4 RSI: ffff91a9de5a60e0 RDI: ffff91a9d9803500
RBP: ffffffff8d267c80 R08: 00000000000260e0 R09: ffffffff8c1a56be
R10: fffff0d404543cc0 R11: 0000000000000389 R12: ffffffff8c1a56be
R13: ffff91a9d9930e18 R14: ffff91a98c0c2890 R15: ffffffff8d267d00
FS: 00007f363ea64700(0000) GS:ffff91a9de580000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055c1cacc8e10 CR3: 00000000d9b46003 CR4: 00000000001606e0
Call Trace:
event_trigger_callback+0xee/0x1d0
event_trigger_write+0xfc/0x1a0
__vfs_write+0x33/0x190
? handle_mm_fault+0x115/0x230
? _cond_resched+0x16/0x40
vfs_write+0xb0/0x190
ksys_write+0x52/0xc0
do_syscall_64+0x5a/0x160
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f363e16ab50
Code: 73 01 c3 48 8b 0d 38 83 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 79 db 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e e3 01 00 48 89 04 24
RSP: 002b:00007fff9a4c6378 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f363e16ab50
RDX: 0000000000000009 RSI: 000055c1cacc8e10 RDI: 0000000000000001
RBP: 000055c1cacc8e10 R08: 00007f363e435740 R09: 00007f363ea64700
R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000009
R13: 0000000000000001 R14: 00007f363e4345e0 R15: 00007f363e4303c0
Modules linked in: ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device i915 snd_pcm snd_timer i2c_i801 snd soundcore i2c_algo_bit drm_kms_helper
86_pkg_temp_thermal video kvm_intel kvm irqbypass wmi e1000e
---[ end trace d301afa879ddfa25 ]---
The cause is because the register_snapshot_trigger() call failed to
allocate the snapshot buffer, and then called unregister_trigger()
which freed the data that was passed to it. Then on return to the
function that called register_snapshot_trigger(), as it sees it
failed to register, it frees the trigger_data again and causes
a double free.
By calling event_trigger_init() on the trigger_data (which only ups
the reference counter for it), and then event_trigger_free() afterward,
the trigger_data would not get freed by the registering trigger function
as it would only up and lower the ref count for it. If the register
trigger function fails, then the event_trigger_free() called after it
will free the trigger data normally.
Link: http://lkml.kernel.org/r/20180724191331.738eb819@gandalf.local.home
Cc: stable@vger.kerne.org
Fixes: 93e31ffbf4 ("tracing: Add 'snapshot' event trigger command")
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
If we meet a conflicting object that is marked FSCACHE_OBJECT_IS_LIVE in
the active object tree, we have been emitting a BUG after logging
information about it and the new object.
Instead, we should wait for the CACHEFILES_OBJECT_ACTIVE flag to be cleared
on the old object (or return an error). The ACTIVE flag should be cleared
after it has been removed from the active object tree. A timeout of 60s is
used in the wait, so we shouldn't be able to get stuck there.
Fixes: 9ae326a690 ("CacheFiles: A cache that backs onto a mounted filesystem")
Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
In cachefiles_mark_object_active(), the new object is marked active and
then we try to add it to the active object tree. If a conflicting object
is already present, we want to wait for that to go away. After the wait,
we go round again and try to re-mark the object as being active - but it's
already marked active from the first time we went through and a BUG is
issued.
Fix this by clearing the CACHEFILES_OBJECT_ACTIVE flag before we try again.
Analysis from Kiran Kumar Modukuri:
[Impact]
Oops during heavy NFS + FSCache + Cachefiles
CacheFiles: Error: Overlong wait for old active object to go away.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
CacheFiles: Error: Object already active kernel BUG at
fs/cachefiles/namei.c:163!
[Cause]
In a heavily loaded system with big files being read and truncated, an
fscache object for a cookie is being dropped and a new object being
looked. The new object being looked for has to wait for the old object
to go away before the new object is moved to active state.
[Fix]
Clear the flag 'CACHEFILES_OBJECT_ACTIVE' for the new object when
retrying the object lookup.
[Testcase]
Have run ~100 hours of NFS stress tests and have not seen this bug recur.
[Regression Potential]
- Limited to fscache/cachefiles.
Fixes: 9ae326a690 ("CacheFiles: A cache that backs onto a mounted filesystem")
Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
When a cookie is allocated that causes fscache_object structs to be
allocated, those objects are initialised with the cookie pointer, but
aren't blessed with a ref on that cookie unless the attachment is
successfully completed in fscache_attach_object().
If attachment fails because the parent object was dying or there was a
collision, fscache_attach_object() returns without incrementing the cookie
counter - but upon failure of this function, the object is released which
then puts the cookie, whether or not a ref was taken on the cookie.
Fix this by taking a ref on the cookie when it is assigned in
fscache_object_init(), even when we're creating a root object.
Analysis from Kiran Kumar:
This bug has been seen in 4.4.0-124-generic #148-Ubuntu kernel
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277
fscache cookie ref count updated incorrectly during fscache object
allocation resulting in following Oops.
kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321!
kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
[Cause]
Two threads are trying to do operate on a cookie and two objects.
(1) One thread tries to unmount the filesystem and in process goes over a
huge list of objects marking them dead and deleting the objects.
cookie->usage is also decremented in following path:
nfs_fscache_release_super_cookie
-> __fscache_relinquish_cookie
->__fscache_cookie_put
->BUG_ON(atomic_read(&cookie->usage) <= 0);
(2) A second thread tries to lookup an object for reading data in following
path:
fscache_alloc_object
1) cachefiles_alloc_object
-> fscache_object_init
-> assign cookie, but usage not bumped.
2) fscache_attach_object -> fails in cant_attach_object because the
cookie's backing object or cookie's->parent object are going away
3) fscache_put_object
-> cachefiles_put_object
->fscache_object_destroy
->fscache_cookie_put
->BUG_ON(atomic_read(&cookie->usage) <= 0);
[NOTE from dhowells] It's unclear as to the circumstances in which (2) can
take place, given that thread (1) is in nfs_kill_super(), however a
conflicting NFS mount with slightly different parameters that creates a
different superblock would do it. A backtrace from Kiran seems to show
that this is a possibility:
kernel BUG at/build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
...
RIP: __fscache_cookie_put+0x3a/0x40 [fscache]
Call Trace:
__fscache_relinquish_cookie+0x87/0x120 [fscache]
nfs_fscache_release_super_cookie+0x2d/0xb0 [nfs]
nfs_kill_super+0x29/0x40 [nfs]
deactivate_locked_super+0x48/0x80
deactivate_super+0x5c/0x60
cleanup_mnt+0x3f/0x90
__cleanup_mnt+0x12/0x20
task_work_run+0x86/0xb0
exit_to_usermode_loop+0xc2/0xd0
syscall_return_slowpath+0x4e/0x60
int_ret_from_sys_call+0x25/0x9f
[Fix] Bump up the cookie usage in fscache_object_init, when it is first
being assigned a cookie atomically such that the cookie is added and bumped
up if its refcount is not zero. Remove the assignment in
fscache_attach_object().
[Testcase]
I have run ~100 hours of NFS stress tests and not seen this bug recur.
[Regression Potential]
- Limited to fscache/cachefiles.
Fixes: ccc4fc3d11 ("FS-Cache: Implement the cookie management part of the netfs API")
Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cachefiles_read_waiter() has the right to access a 'monitor' object by
virtue of being called under the waitqueue lock for one of the pages in its
purview. However, it has no ref on that monitor object or on the
associated operation.
What it is allowed to do is to move the monitor object to the operation's
to_do list, but once it drops the work_lock, it's actually no longer
permitted to access that object. However, it is trying to enqueue the
retrieval operation for processing - but it can only do this via a pointer
in the monitor object, something it shouldn't be doing.
If it doesn't enqueue the operation, the operation may not get processed.
If the order is flipped so that the enqueue is first, then it's possible
for the work processor to look at the to_do list before the monitor is
enqueued upon it.
Fix this by getting a ref on the operation so that we can trust that it
will still be there once we've added the monitor to the to_do list and
dropped the work_lock. The op can then be enqueued after the lock is
dropped.
The bug can manifest in one of a couple of ways. The first manifestation
looks like:
FS-Cache:
FS-Cache: Assertion failed
FS-Cache: 6 == 5 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:494!
RIP: 0010:fscache_put_operation+0x1e3/0x1f0
...
fscache_op_work_func+0x26/0x50
process_one_work+0x131/0x290
worker_thread+0x45/0x360
kthread+0xf8/0x130
? create_worker+0x190/0x190
? kthread_cancel_work_sync+0x10/0x10
ret_from_fork+0x1f/0x30
This is due to the operation being in the DEAD state (6) rather than
INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through
fscache_put_operation().
The bug can also manifest like the following:
kernel BUG at fs/fscache/operation.c:69!
...
[exception RIP: fscache_enqueue_operation+246]
...
#7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6
#8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48
#9 [ffff883fff083c48] __wake_up_common at ffffffff810af028
I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not
entirely clear which assertion failed.
Fixes: 9ae326a690 ("CacheFiles: A cache that backs onto a mounted filesystem")
Reported-by: Lei Xue <carmark.dlut@gmail.com>
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Reported-by: Anthony DeRobertis <aderobertis@metrics.net>
Reported-by: NeilBrown <neilb@suse.com>
Reported-by: Daniel Axtens <dja@axtens.net>
Reported-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Alter the state-check assertion in fscache_enqueue_operation() to allow
cancelled operations to be given processing time so they can be cleaned up.
Also fix a debugging statement that was requiring such operations to have
an object assigned.
Fixes: 9ae326a690 ("CacheFiles: A cache that backs onto a mounted filesystem")
Reported-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Arnd reports the following arm64 randconfig build error with the PSI
patches that add another page flag:
/git/arm-soc/arch/arm64/mm/init.c: In function 'mem_init':
/git/arm-soc/include/linux/compiler.h:357:38: error: call to
'__compiletime_assert_618' declared with attribute error: BUILD_BUG_ON
failed: sizeof(struct page) > (1 << STRUCT_PAGE_MAX_SHIFT)
The additional page flag causes other information stored in
page->flags to get bumped into their own struct page member:
#if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT+LAST_CPUPID_SHIFT <=
BITS_PER_LONG - NR_PAGEFLAGS
#define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT
#else
#define LAST_CPUPID_WIDTH 0
#endif
#if defined(CONFIG_NUMA_BALANCING) && LAST_CPUPID_WIDTH == 0
#define LAST_CPUPID_NOT_IN_PAGE_FLAGS
#endif
which in turn causes the struct page size to exceed the size set in
STRUCT_PAGE_MAX_SHIFT. This value is an an estimate used to size the
VMEMMAP page array according to address space and struct page size.
However, the check is performed - and triggers here - on a !VMEMMAP
config, which consumes an additional 22 page bits for the sparse
section id. When VMEMMAP is enabled, those bits are returned, cpupid
doesn't need its own member, and the page passes the VMEMMAP check.
Restrict that check to the situation it was meant to check: that we
are sizing the VMEMMAP page array correctly.
Says Arnd:
Further experiments show that the build error already existed before,
but was only triggered with larger values of CONFIG_NR_CPU and/or
CONFIG_NODES_SHIFT that might be used in actual configurations but
not in randconfig builds.
With longer CPU and node masks, I could recreate the problem with
kernels as old as linux-4.7 when arm64 NUMA support got added.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org
Fixes: 1a2db30034 ("arm64, numa: Add NUMA support for arm64 platforms.")
Fixes: 3e1907d5bf ("arm64: mm: move vmemmap region right below the linear region")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since commit d3aec8a28b ("arm64: capabilities: Restrict KPTI
detection to boot-time CPUs") we rely on errata flags being already
populated during feature enumeration. The order of errata and
features was flipped as part of commit ed478b3f9e ("arm64:
capabilities: Group handling of features and errata workarounds").
Return to the orginal order of errata and feature evaluation to
ensure errata flags are present during feature evaluation.
Fixes: ed478b3f9e ("arm64: capabilities: Group handling of
features and errata workarounds")
CC: Suzuki K Poulose <suzuki.poulose@arm.com>
CC: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Dirk Mueller <dmueller@suse.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We only need to check for a file-backed namespace if
nvmet_bdev_ns_enable() returns -ENOTBLK. For any other error
it's pointless as the open() error will remain the same.
Fixes: d5eff33e ("nvmet: add simple file backed ns support")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When writing an empty string into the device_path attribute the kernel
will crash with
nvmet: failed to open block device (null): (-22)
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
This patch sanitizes the error handling for invalid device path settings.
Fixes: a07b4970 ("nvmet: add a generic NVMe target")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Dirk Gouders reported that two consecutive "make" invocations on an
already compiled tree will show alternating behaviors:
$ make
CALL scripts/checksyscalls.sh
DESCEND objtool
CHK include/generated/compile.h
DATAREL arch/x86/boot/compressed/vmlinux
Kernel: arch/x86/boot/bzImage is ready (#48)
Building modules, stage 2.
MODPOST 165 modules
$ make
CALL scripts/checksyscalls.sh
DESCEND objtool
CHK include/generated/compile.h
LD arch/x86/boot/compressed/vmlinux
ZOFFSET arch/x86/boot/zoffset.h
AS arch/x86/boot/header.o
LD arch/x86/boot/setup.elf
OBJCOPY arch/x86/boot/setup.bin
OBJCOPY arch/x86/boot/vmlinux.bin
BUILD arch/x86/boot/bzImage
Setup is 15644 bytes (padded to 15872 bytes).
System is 6663 kB
CRC 3eb90f40
Kernel: arch/x86/boot/bzImage is ready (#48)
Building modules, stage 2.
MODPOST 165 modules
He bisected it back to:
commit 98f7852537 ("x86/boot: Refuse to build with data relocations")
The root cause was the use of the "if_changed" kbuild function multiple
times for the same target. It was designed to only be used once per
target, otherwise it will effectively always trigger, flipping back and
forth between the two commands getting recorded by "if_changed". Instead,
this patch merges the two commands into a single function to get stable
build artifacts (i.e. .vmlinux.cmd), and a single build behavior.
Bisected-and-Reported-by: Dirk Gouders <dirk@gouders.net>
Fix-Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180724230827.GA37823@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In function perf_event_parse_addr_filter(), the path::dentry of each struct
perf_addr_filter is left unassigned (as it should be) when the pattern
being parsed is related to kernel space. But in function
perf_addr_filter_match() the same dentries are given to d_inode() where
the value is not expected to be NULL, resulting in the following splat:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
pc : perf_event_mmap+0x2fc/0x5a0
lr : perf_event_mmap+0x2c8/0x5a0
Process uname (pid: 2860, stack limit = 0x000000001cbcca37)
Call trace:
perf_event_mmap+0x2fc/0x5a0
mmap_region+0x124/0x570
do_mmap+0x344/0x4f8
vm_mmap_pgoff+0xe4/0x110
vm_mmap+0x2c/0x40
elf_map+0x60/0x108
load_elf_binary+0x450/0x12c4
search_binary_handler+0x90/0x290
__do_execve_file.isra.13+0x6e4/0x858
sys_execve+0x3c/0x50
el0_svc_naked+0x30/0x34
This patch is fixing the problem by introducing a new check in function
perf_addr_filter_match() to see if the filter's dentry is NULL.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: miklos@szeredi.hu
Cc: namhyung@kernel.org
Cc: songliubraving@fb.com
Fixes: 9511bce9fe ("perf/core: Fix bad use of igrab()")
Link: http://lkml.kernel.org/r/1531782831-1186-1-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vince reported the perf_fuzzer giving various unwinder warnings and
Josh reported:
> Deja vu. Most of these are related to perf PEBS, similar to the
> following issue:
>
> b8000586c9 ("perf/x86/intel: Cure bogus unwind from PEBS entries")
>
> This is basically the ORC version of that. setup_pebs_sample_data() is
> assembling a franken-pt_regs which ORC isn't happy about. RIP is
> inconsistent with some of the other registers (like RSP and RBP).
And where the previous unwinder only needed BP,SP ORC also requires
IP. But we cannot spoof IP because then the sample will get displaced,
entirely negating the point of PEBS.
So cure the whole thing differently by doing the unwind early; this
does however require a means to communicate we did the unwind early.
We (ab)use an unused sample_type bit for this, which we set on events
that fill out the data->callchain before the normal
perf_prepare_sample().
Debugged-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
NO_RT_RUNTIME_SHARE feature is used to prevent a CPU borrow enough
runtime with a spin-rt-task.
However, if RT_RUNTIME_SHARE feature is enabled and rt_rq has borrowd
enough rt_runtime at the beginning, rt_runtime can't be restored to
its initial bandwidth rt_runtime after we disable RT_RUNTIME_SHARE.
E.g. on my PC with 4 cores, procedure to reproduce:
1) Make sure RT_RUNTIME_SHARE is enabled
cat /sys/kernel/debug/sched_features
GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY
CACHE_HOT_BUDDY WAKEUP_PREEMPTION NO_HRTICK NO_DOUBLE_TICK
LB_BIAS NONTASK_CAPACITY TTWU_QUEUE NO_SIS_AVG_CPU SIS_PROP
NO_WARN_DOUBLE_CLOCK RT_PUSH_IPI RT_RUNTIME_SHARE NO_LB_MIN
ATTACH_AGE_LOAD WA_IDLE WA_WEIGHT WA_BIAS
2) Start a spin-rt-task
./loop_rr &
3) set affinity to the last cpu
taskset -p 8 $pid_of_loop_rr
4) Observe that last cpu have borrowed enough runtime.
cat /proc/sched_debug | grep rt_runtime
.rt_runtime : 950.000000
.rt_runtime : 900.000000
.rt_runtime : 950.000000
.rt_runtime : 1000.000000
5) Disable RT_RUNTIME_SHARE
echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features
6) Observe that rt_runtime can not been restored
cat /proc/sched_debug | grep rt_runtime
.rt_runtime : 950.000000
.rt_runtime : 900.000000
.rt_runtime : 950.000000
.rt_runtime : 1000.000000
This patch help to restore rt_runtime after we disable
RT_RUNTIME_SHARE.
Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn>
Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: zhong.weidong@zte.com.cn
Link: http://lkml.kernel.org/r/1531874815-39357-1-git-send-email-liu.hailong6@zte.com.cn
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This commit:
9fb8d5dc4b ("stop_machine, Disable preemption when waking two stopper threads")
does not fully address the race condition that can occur
as follows:
On one CPU, call it CPU 3, thread 1 invokes
cpu_stop_queue_two_works(2, 3,...), and the execution is such
that thread 1 queues the works for migration/2 and migration/3,
and is preempted after releasing the locks for migration/2 and
migration/3, but before waking the threads.
Then, On CPU 2, a kworker, call it thread 2, is running,
and it invokes cpu_stop_queue_two_works(1, 2,...), such that
thread 2 queues the works for migration/1 and migration/2.
Meanwhile, on CPU 3, thread 1 resumes execution, and wakes
migration/2 and migration/3. This means that when CPU 2
releases the locks for migration/1 and migration/2, but before
it wakes those threads, it can be preempted by migration/2.
If thread 2 is preempted by migration/2, then migration/2 will
execute the first work item successfully, since migration/3
was woken up by CPU 3, but when it goes to execute the second
work item, it disables preemption, calls multi_cpu_stop(),
and thus, CPU 2 will wait forever for migration/1, which should
have been woken up by thread 2. However migration/1 cannot be
woken up by thread 2, since it is a kworker, so it is affine to
CPU 2, but CPU 2 is running migration/2 with preemption
disabled, so thread 2 will never run.
Disable preemption after queueing works for stopper threads
to ensure that the operation of queueing the works and waking
the stopper threads is atomic.
Co-Developed-by: Prasad Sodagudi <psodagud@codeaurora.org>
Co-Developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bigeasy@linutronix.de
Cc: gregkh@linuxfoundation.org
Cc: matt@codeblueprint.co.uk
Fixes: 9fb8d5dc4b ("stop_machine, Disable preemption when waking two stopper threads")
Link: http://lkml.kernel.org/r/1531856129-9871-1-git-send-email-isaacm@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If an i2c topology has instances of nested muxes, then a lockdep splat
is produced when when i2c_parent_lock_bus() is called. Here is an
example:
============================================
WARNING: possible recursive locking detected
--------------------------------------------
insmod/68159 is trying to acquire lock:
(i2c_register_adapter#2){+.+.}, at: i2c_parent_lock_bus+0x32/0x50 [i2c_mux]
but task is already holding lock:
(i2c_register_adapter#2){+.+.}, at: i2c_parent_lock_bus+0x32/0x50 [i2c_mux]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(i2c_register_adapter#2);
lock(i2c_register_adapter#2);
*** DEADLOCK ***
May be due to missing lock nesting notation
1 lock held by insmod/68159:
#0: (i2c_register_adapter#2){+.+.}, at: i2c_parent_lock_bus+0x32/0x50 [i2c_mux]
stack backtrace:
CPU: 13 PID: 68159 Comm: insmod Tainted: G O
Call Trace:
dump_stack+0x67/0x98
__lock_acquire+0x162e/0x1780
lock_acquire+0xba/0x200
rt_mutex_lock+0x44/0x60
i2c_parent_lock_bus+0x32/0x50 [i2c_mux]
i2c_parent_lock_bus+0x3e/0x50 [i2c_mux]
i2c_smbus_xfer+0xf0/0x700
i2c_smbus_read_byte+0x42/0x70
my2c_init+0xa2/0x1000 [my2c]
do_one_initcall+0x51/0x192
do_init_module+0x62/0x216
load_module+0x20f9/0x2b50
SYSC_init_module+0x19a/0x1c0
SyS_init_module+0xe/0x10
do_syscall_64+0x6c/0x1a0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
Reported-by: John Sperbeck <jsperbeck@google.com>
Tested-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Deepa Dinamani <deepadinamani@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Chang <dpf@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wolfram Sang <wsa@the-dreams.de>
Link: http://lkml.kernel.org/r/20180720083914.1950-3-peda@axentia.se
Signed-off-by: Ingo Molnar <mingo@kernel.org>
NVRAM is designed to work with Broadcom's SDK Linux kernel which fakes
PCI domain 0 for all internal MMIO devices. Since official Linux kernel
uses platform devices for that purpose there is a mismatch in numbering
PCI domains.
There used to be a fix for that problem but it was accidentally dropped
during the last firmware loading rework. That resulted in brcmfmac not
being able to extract device specific NVRAM content and all kind of
calibration problems.
Reported-by: Aditya Xavier <adityaxavier@gmail.com>
Fixes: 2baa3aaee2 ("brcmfmac: introduce brcmf_fw_alloc_request() function")
Cc: stable@vger.kernel.org # v4.17+
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Martin KaFai Lau says:
====================
The series allows the BPF loader to figure out the btf_key_id
and btf_value_id from a map's name by using BPF_ANNOTATE_KV_PAIR()
similarly as in iproute2 commit f823f36012fb ("bpf: implement
btf handling and map annotation").
It also removes the old 'typedef' way which requires two separate
typedefs (one for the key and one for the value).
By doing this, iproute2 and libbpf have one consistent way to
figure out the btf_key_type_id and btf_value_type_id for a map.
The first two patches are some prep/cleanup works. The last patch
introduces BPF_ANNOTATE_KV_PAIR.
v3:
- Replace some more *int*_t and u* usages with the
equivalent __[su]* in btf.c
v2:
- Fix the incorrect '&&' check on container_type
in bpf_map_find_btf_info().
- Expose the existing static btf_type_by_id() instead of
creating a new one.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch introduces BPF_ANNOTATE_KV_PAIR to signal the
bpf loader about the btf key_type and value_type of a bpf map.
Please refer to the changes in test_btf_haskv.c for its usage.
Both iproute2 and libbpf loader will then have the same
convention to find out the map's btf_key_type_id and
btf_value_type_id from a map's name.
Fixes: 8a138aed4a ("bpf: btf: Add BTF support to libbpf")
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch replaces [u]int32_t and [u]int64_t usage with
__[su]32 and __[su]64. The same change goes for [u]int16_t
and [u]int8_t.
Fixes: 8a138aed4a ("bpf: btf: Add BTF support to libbpf")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch sync the uapi btf.h to tools/
Fixes: 36fc3c8c28 bpf: btf: Clean up BTF_INT_BITS() in uapi btf.h
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull MIPS fixes from Paul Burton:
"A couple more MIPS fixes for 4.18:
- Fix an off-by-one in reporting PCI resource sizes to userland which
regressed in v3.12.
- Fix writes to DDR controller registers used to flush write buffers,
which regressed with some refactoring in v4.2"
* tag 'mips_fixes_4.18_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: ath79: fix register address in ath79_ddr_wb_flush()
MIPS: Fix off-by-one in pci_resource_to_user()
Pull networking fixes from David Miller:
1) Handle stations tied to AP_VLANs properly during mac80211 hw
reconfig. From Manikanta Pubbisetty.
2) Fix jump stack depth validation in nf_tables, from Taehee Yoo.
3) Fix quota handling in aRFS flow expiration of mlx5 driver, from Eran
Ben Elisha.
4) Exit path handling fix in powerpc64 BPF JIT, from Daniel Borkmann.
5) Use ptr_ring_consume_bh() in page pool code, from Tariq Toukan.
6) Fix cached netdev name leak in nf_tables, from Florian Westphal.
7) Fix memory leaks on chain rename, also from Florian Westphal.
8) Several fixes to DCTCP congestion control ACK handling, from Yuchunk
Cheng.
9) Missing rcu_read_unlock() in CAIF protocol code, from Yue Haibing.
10) Fix link local address handling with VRF, from David Ahern.
11) Don't clobber 'err' on a successful call to __skb_linearize() in
skb_segment(). From Eric Dumazet.
12) Fix vxlan fdb notification races, from Roopa Prabhu.
13) Hash UDP fragments consistently, from Paolo Abeni.
14) If TCP receives lots of out of order tiny packets, we do really
silly stuff. Make the out-of-order queue ending more robust to this
kind of behavior, from Eric Dumazet.
15) Don't leak netlink dump state in nf_tables, from Florian Westphal.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
net: axienet: Fix double deregister of mdio
qmi_wwan: fix interface number for DW5821e production firmware
ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull
bnx2x: Fix invalid memory access in rss hash config path.
net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
r8169: restore previous behavior to accept BIOS WoL settings
cfg80211: never ignore user regulatory hint
sock: fix sg page frag coalescing in sk_alloc_sg
netfilter: nf_tables: move dumper state allocation into ->start
tcp: add tcp_ooo_try_coalesce() helper
tcp: call tcp_drop() from tcp_data_queue_ofo()
tcp: detect malicious patterns in tcp_collapse_ofo_queue()
tcp: avoid collapses in tcp_prune_queue() if possible
tcp: free batches of packets in tcp_prune_ofo_queue()
ip: hash fragments consistently
ipv6: use fib6_info_hold_safe() when necessary
can: xilinx_can: fix power management handling
can: xilinx_can: fix incorrect clear of non-processed interrupts
can: xilinx_can: fix RX overflow interrupt not being enabled
can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
...
Syzbot reported a read beyond the end of the skb head when returning
IPV6_ORIGDSTADDR:
BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242
CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:113
kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219
kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261
copy_to_user include/linux/uaccess.h:184 [inline]
put_cmsg+0x5ef/0x860 net/core/scm.c:242
ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719
ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733
rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521
[..]
This logic and its ipv4 counterpart read the destination port from
the packet at skb_transport_offset(skb) + 4.
With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a
packet that stores headers exactly up to skb_transport_offset(skb) in
the head and the remainder in a frag.
Call pskb_may_pull before accessing the pointer to ensure that it lies
in skb head.
Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com
Reported-by: syzbot+9adb4b567003cac781f0@syzkaller.appspotmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rx hash/filter table configuration uses rss_conf_obj to configure filters
in the hardware. This object is initialized only when the interface is
brought up.
This patch adds driver changes to configure rss params only when the device
is in opened state. In port disabled case, the config will be cached in the
driver structure which will be applied in the successive load path.
Please consider applying it to 'net' branch.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Function mlx4_RST2INIT_QP_wrapper saved the qp number passed in the qp
context, rather than the one passed in the input modifier.
However, the qp number in the qp context is not defined as a
required parameter by the FW. Therefore, drivers may choose to not
specify the qp number in the qp context for the reset-to-init transition.
Thus, we must save the qp number passed in the command input modifier --
which is always present. (This saved qp number is used as the input
modifier for command 2RST_QP when a slave's qp's are destroyed).
Fixes: c82e9aa0a8 ("mlx4_core: resource tracking for HCA resources used by guests")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit cited below checked that the port numbers provided in the
primary and alt AVs are legal.
That is sufficient to prevent a kernel panic. However, it is not
sufficient for correct operation.
In Linux, AVs (both primary and alt) must be completely self-described.
We do not accept an AV from userspace without an embedded port number.
(This has been the case since kernel 3.14 commit dbf727de74
("IB/core: Use GID table in AH creation and dmac resolution")).
For the primary AV, this embedded port number must match the port number
specified with IB_QP_PORT.
We also expect the port number embedded in the alt AV to match the
alt_port_num value passed by the userspace driver in the modify_qp command
base structure.
Add these checks to modify_qp.
Cc: <stable@vger.kernel.org> # 4.16
Fixes: 5d4c05c3ee ("RDMA/uverbs: Sanitize user entered port numbers prior to access it")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Commit 7edf6d314c tried to resolve an inconsistency (BIOS WoL
settings are accepted, but device isn't wakeup-enabled) resulting
from a previous broken-BIOS workaround by making disabled WoL the
default.
This however had some side effects, most likely due to a broken BIOS
some systems don't properly resume from suspend when the MagicPacket
WoL bit isn't set in the chip, see
https://bugzilla.kernel.org/show_bug.cgi?id=200195
Therefore restore the WoL behavior from 4.16.
Reported-by: Albert Astals Cid <aacid@kde.org>
Fixes: 7edf6d314c ("r8169: disable WOL per default")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The scsi block layer requires requests claimed by the error handling be
completed by the error handler. A previous commit allowed completions
to proceed for blk-mq, breaking that assumption.
This patch prevents completions that may race with the timeout handler
by marking the state to complete, restoring the previous behavior.
Fixes: 12f5b931 ("blk-mq: Remove generation seqeunce")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This is preparing for drivers that want to directly alter the state of
their requests. No functional change here.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When inodes are freed in xfs_ifree(), di_flags is cleared (so extent size
hints are removed) but the actual extent size fields are left intact.
This causes the extent hint validators to fail on freed inodes which once
had extent size hints.
This can be observed (for example) by running xfs/229 twice on a
non-crc xfs filesystem, or presumably on V5 with ikeep.
Fixes: 7d71a67 ("xfs: verify extent size hint is valid in inode verifier")
Fixes: 02a0fda ("xfs: verify COW extent size hint is valid in inode verifier")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Pull s390 fix from Martin Schwidefsky.
Guenter Roeck reports that the s390 allmodconfig build fails because of
a gcc plugin problem. The fix won't be in-tree until 4.19, so for now
disable the gcc plugins on s390.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: disable gcc plugins
Including asm/cacheflush.h first results in the following build error
when trying to build sparc32:allmodconfig, because 'struct page' has not
been declared, and the function declaration ends up creating a separate
(private) declaration of struct page (as a result of function arguments
being in the scope of the function declaration and definition, not in
global scope).
The C scoping rules do not just affect variable visibility, they also
affect type declaration visibility.
The end result is that when the actual call site is seen in
<linux/highmem.h>, the 'struct page' type in the caller is not the same
'struct page' that the function was declared with, resulting in:
In file included from arch/sparc/include/asm/page.h:10:0,
...
from drivers/staging/media/omap4iss/iss_video.c:15:
include/linux/highmem.h: In function 'clear_user_highpage':
include/linux/highmem.h:137:31: error:
passing argument 1 of 'sparc_flush_page_to_ram' from incompatible
pointer type
Include generic includes files first to fix the problem.
Fixes: fc96d58c10 ("[media] v4l: omap4iss: Add support for OMAP4 camera interface - Video devices")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[ Added explanation of C scope rules - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Make sure we don't go over the maximum jump stack boundary,
from Taehee Yoo.
2) Missing rcu_barrier() in hash and rbtree sets, also from Taehee.
3) Missing check to nul-node in rbtree timeout routine, from Taehee.
4) Use dev->name from flowtable to fix a memleak, from Florian.
5) Oneliner to free flowtable object on removal, from Florian.
6) Memleak in chain rename transaction, again from Florian.
7) Don't allow two chains to use the same name in the same
transaction, from Florian.
8) handle DCCP SYNC/SYNCACK as invalid, this triggers an
uninitialized timer in conntrack reported by syzbot, from Florian.
9) Fix leak in case netlink_dump_start() fails, from Florian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg says:
====================
Only a few fixes:
* always keep regulatory user hint
* add missing break statement in station flags parsing
* fix non-linear SKBs in port-control-over-nl80211
* reconfigure VLAN stations during HW restart
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
I2C is open drain, so request the GPIO accordingly, even if pinmux did
set it up correctly for in-kernel users in this case.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
On Gen3, we can only do RXDMA once per transfer reliably. For that, we
must reset the device, then we can have RXDMA once. This patch
implements this. When there is no reset controller or the reset fails,
RXDMA will be blocked completely. Otherwise, it will be disabled after
the first RXDMA transfer. Based on a commit from the BSP by Hiromitsu
Yamasaki, yet completely refactored to handle multiple read messages
within one transfer.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
The revised if_ready checks skipped over the case of returning error when
the controller is being deleted. Instead it was returning BUSY, which
caused the ios to retry, which caused the ns delete to hang waiting for
the ios to drain.
Stack trace of hang looks like:
kworker/u64:2 D 0 74 2 0x80000000
Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
Call Trace:
? __schedule+0x26d/0x820
schedule+0x32/0x80
blk_mq_freeze_queue_wait+0x36/0x80
? remove_wait_queue+0x60/0x60
blk_cleanup_queue+0x72/0x160
nvme_ns_remove+0x106/0x140 [nvme_core]
nvme_remove_namespaces+0x7e/0xa0 [nvme_core]
nvme_delete_ctrl_work+0x4d/0x80 [nvme_core]
process_one_work+0x160/0x350
worker_thread+0x1c3/0x3d0
kthread+0xf5/0x130
? process_one_work+0x350/0x350
? kthread_bind+0x10/0x10
ret_from_fork+0x1f/0x30
Extend nvmf_fail_nonready_command() to supply the controller pointer so
that the controller state can be looked at. Fail any io to a controller
that is deleting.
Fixes: 3bc32bb118 ("nvme-fabrics: refactor queue ready check")
Fixes: 35897b920c ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready")
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
The existing code to carve up the sg list expected an sg element-per-page
which can be very incorrect with iommu's remapping multiple memory pages
to fewer bus addresses. To hit this error required a large io payload
(greater than 256k) and a system that maps on a per-page basis. It's
possible that large ios could get by fine if the system condensed the
sgl list into the first 64 elements.
This patch corrects the sg list handling by specifically walking the
sg list element by element and attempting to divide the transfer up
on a per-sg element boundary. While doing so, it still tries to keep
sequences under 256k, but will exceed that rule if a single sg element
is larger than 256k.
Fixes: 48fa362b6c ("nvmet-fc: simplify sg list handling")
Cc: <stable@vger.kernel.org> # 4.14
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
error_entry and error_exit communicate the user vs. kernel status of
the frame using %ebx. This is unnecessary -- the information is in
regs->cs. Just use regs->cs.
This makes error_entry simpler and makes error_exit more robust.
It also fixes a nasty bug. Before all the Spectre nonsense, the
xen_failsafe_callback entry point returned like this:
ALLOC_PT_GPREGS_ON_STACK
SAVE_C_REGS
SAVE_EXTRA_REGS
ENCODE_FRAME_POINTER
jmp error_exit
And it did not go through error_entry. This was bogus: RBX
contained garbage, and error_exit expected a flag in RBX.
Fortunately, it generally contained *nonzero* garbage, so the
correct code path was used. As part of the Spectre fixes, code was
added to clear RBX to mitigate certain speculation attacks. Now,
depending on kernel configuration, RBX got zeroed and, when running
some Wine workloads, the kernel crashes. This was introduced by:
commit 3ac6d8c787 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface")
With this patch applied, RBX is no longer needed as a flag, and the
problem goes away.
I suspect that malicious userspace could use this bug to crash the
kernel even without the offending patch applied, though.
[ Historical note: I wrote this patch as a cleanup before I was aware
of the bug it fixed. ]
[ Note to stable maintainers: this should probably get applied to all
kernels. If you're nervous about that, a more conservative fix to
add xorl %ebx,%ebx; incl %ebx before the jump to error_exit should
also fix the problem. ]
Reported-and-tested-by: M. Vefa Bicakci <m.v.b@runbox.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Fixes: 3ac6d8c787 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface")
Link: http://lkml.kernel.org/r/b5010a090d3586b2d6e06c7ad3ec5542d1241c45.1532282627.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently user regulatory hint is ignored if all wiphys
in the system are self managed. But the hint is not ignored
if there is no wiphy in the system. This affects the global
regulatory setting. Global regulatory setting needs to be
maintained so that it can be applied to a new wiphy entering
the system. Therefore, do not ignore user regulatory setting
even if all wiphys in the system are self managed.
Signed-off-by: Amar Singhal <asinghal@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The s390 build currently fails with the latent entropy plugin:
arch/s390/kernel/als.o: In function `verify_facilities':
als.c:(.init.text+0x24): undefined reference to `latent_entropy'
als.c:(.init.text+0xae): undefined reference to `latent_entropy'
make[3]: *** [arch/s390/boot/compressed/vmlinux] Error 1
make[2]: *** [arch/s390/boot/compressed/vmlinux] Error 2
make[1]: *** [bzImage] Error 2
This will be fixed with the early boot rework from Vasily, which
is planned for the 4.19 merge window.
For 4.18 the simplest solution is to disable the gcc plugins and
reenable them after the early boot rework is upstream.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Current sg coalescing logic in sk_alloc_sg() (latter is used by tls and
sockmap) is not quite correct in that we do fetch the previous sg entry,
however the subsequent check whether the refilled page frag from the
socket is still the same as from the last entry with prior offset and
length matching the start of the current buffer is comparing always the
first sg list entry instead of the prior one.
Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch ensures the member->offset of a struct
is in the correct order (i.e the later member's offset cannot
go backward).
The current "pahole -J" BTF encoder does not generate something
like this. However, checking this can ensure future encoder
will not violate this.
Fixes: 69b693f0ae ("bpf: btf: Introduce BPF Type Format (BTF)")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Shaochun Chen points out we leak dumper filter state allocations
stored in dump_control->data in case there is an error before netlink sets
cb_running (after which ->done will be called at some point).
In order to fix this, add .start functions and do the allocations
there.
->done is going to clean up, and in case error occurs before
->start invocation no cleanups need to be done anymore.
Reported-by: shaochun chen <cscnull@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If a GPIO chip is a part of a hierarchy IRQ domain, there is no
way to specify the trigger type when gpio(d)_to_irq() allocates an
interrupt on-the-fly.
Currently, uniphier_gpio_to_irq() sets IRQ_TYPE_NONE, but it causes
an error in the .alloc() hook of the parent domain.
(drivers/irq/irq-uniphier-aidet.c)
Even if we change irq-uniphier-aidet.c to accept the NONE type,
GIC complains about it since commit 83a86fbb5b ("irqchip/gic:
Loudly complain about the use of IRQ_TYPE_NONE").
Instead, use IRQ_TYPE_LEVEL_HIGH as a temporary value when an irq
is allocated. irq_set_irq_type() will override it when the irq is
really requested.
Fixes: dbe776c2ca ("gpio: uniphier: add UniPhier GPIO controller driver")
Reported-by: Katsuhiro Suzuki <suzuki.katsuhiro@socionext.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Katsuhiro Suzuki <suzuki.katsuhiro@socionext.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
This fixes up the handling of fixed regulator polarity
inversion flags: while I remembered to fix it for the
undocumented "reg-fixed-voltage" I forgot about the
official "regulator-fixed" binding, there are two ways
to do a fixed regulator.
The error was noticed and fixed.
Fixes: a603a2b8d8 ("gpio: of: Add special quirk to parse regulator flags")
Cc: Mark Brown <broonie@kernel.org>
Cc: Thierry Reding <thierry.reding@gmail.com>
Reported-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Eric Dumazet says:
====================
Juha-Matti Tilli reported that malicious peers could inject tiny
packets in out_of_order_queue, forcing very expensive calls
to tcp_collapse_ofo_queue() and tcp_prune_ofo_queue() for
every incoming packet.
With tcp_rmem[2] default of 6MB, the ooo queue could
contain ~7000 nodes.
This patch series makes sure we cut cpu cycles enough to
render the attack not critical.
We might in the future go further, like disconnecting
or black-holing proven malicious flows.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In case skb in out_or_order_queue is the result of
multiple skbs coalescing, we would like to get a proper gso_segs
counter tracking, so that future tcp_drop() can report an accurate
number.
I chose to not implement this tracking for skbs in receive queue,
since they are not dropped, unless socket is disconnected.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to be able to give better diagnostics and detect
malicious traffic, we need to have better sk->sk_drops tracking.
Fixes: 9f5afeae51 ("tcp: use an RB tree for ooo receive queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case an attacker feeds tiny packets completely out of order,
tcp_collapse_ofo_queue() might scan the whole rb-tree, performing
expensive copies, but not changing socket memory usage at all.
1) Do not attempt to collapse tiny skbs.
2) Add logic to exit early when too many tiny skbs are detected.
We prefer not doing aggressive collapsing (which copies packets)
for pathological flows, and revert to tcp_prune_ofo_queue() which
will be less expensive.
In the future, we might add the possibility of terminating flows
that are proven to be malicious.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Right after a TCP flow is created, receiving tiny out of order
packets allways hit the condition :
if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf)
tcp_clamp_window(sk);
tcp_clamp_window() increases sk_rcvbuf to match sk_rmem_alloc
(guarded by tcp_rmem[2])
Calling tcp_collapse_ofo_queue() in this case is not useful,
and offers a O(N^2) surface attack to malicious peers.
Better not attempt anything before full queue capacity is reached,
forcing attacker to spend lots of resource and allow us to more
easily detect the abuse.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Juha-Matti Tilli reported that malicious peers could inject tiny
packets in out_of_order_queue, forcing very expensive calls
to tcp_collapse_ofo_queue() and tcp_prune_ofo_queue() for
every incoming packet. out_of_order_queue rb-tree can contain
thousands of nodes, iterating over all of them is not nice.
Before linux-4.9, we would have pruned all packets in ofo_queue
in one go, every XXXX packets. XXXX depends on sk_rcvbuf and skbs
truesize, but is about 7000 packets with tcp_rmem[2] default of 6 MB.
Since we plan to increase tcp_rmem[2] in the future to cope with
modern BDP, can not revert to the old behavior, without great pain.
Strategy taken in this patch is to purge ~12.5 % of the queue capacity.
Fixes: 36a6503fed ("tcp: refine tcp_prune_ofo_queue() to not drop all packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Juha-Matti Tilli <juha-matti.tilli@iki.fi>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skb hash for locally generated ip[v6] fragments belonging
to the same datagram can vary in several circumstances:
* for connected UDP[v6] sockets, the first fragment get its hash
via set_owner_w()/skb_set_hash_from_sk()
* for unconnected IPv6 UDPv6 sockets, the first fragment can get
its hash via ip6_make_flowlabel()/skb_get_hash_flowi6(), if
auto_flowlabel is enabled
For the following frags the hash is usually computed via
skb_get_hash().
The above can cause OoO for unconnected IPv6 UDPv6 socket: in that
scenario the egress tx queue can be selected on a per packet basis
via the skb hash.
It may also fool flow-oriented schedulers to place fragments belonging
to the same datagram in different flows.
Fix the issue by copying the skb hash from the head frag into
the others at fragmentation time.
Before this commit:
perf probe -a "dev_queue_xmit skb skb->hash skb->l4_hash:b1@0/8 skb->sw_hash:b1@1/8"
netperf -H $IPV4 -t UDP_STREAM -l 5 -- -m 2000 -n &
perf record -e probe:dev_queue_xmit -e probe:skb_set_owner_w -a sleep 0.1
perf script
probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=3713014309 l4_hash=1 sw_hash=0
probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=0 l4_hash=0 sw_hash=0
After this commit:
probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0
probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0
Fixes: b73c3d0e4f ("net: Save TX flow hash in sock and set in skbuf on xmit")
Fixes: 67800f9b1f ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure to call reinit_completion() before dma is started to avoid race
condition where reinit_completion() is called after complete() and before
wait_for_completion_timeout().
Signed-off-by: Esben Haabendal <eha@deif.com>
Fixes: ce1a78840f ("i2c: imx: add DMA support for freescale i2c driver")
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
If CLKH is set to 0 I2C clock is not generated at all, so avoid this value
and stretch the clock in this case.
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Acked-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Marc Kleine-Budde says:
====================
pull-request: can 2018-07-23
this is a pull request of 12 patches for net/master.
The patch by Stephane Grosjean for the peak_canfd CAN driver fixes a problem
with older firmware. The next patch is by Roman Fietze and fixes the setup of
the CCCR register in the m_can driver. Nicholas Mc Guire's patch for the
mpc5xxx_can driver adds missing error checking. The two patches by Faiz Abbas
fix the runtime resume and clean up the probe function in the m_can driver. The
last 7 patches by Anssi Hannula fix several problem in the xilinx_can driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There are several issues with the suspend/resume handling code of the
driver:
- The device is attached and detached in the runtime_suspend() and
runtime_resume() callbacks if the interface is running. However,
during xcan_chip_start() the interface is considered running,
causing the resume handler to incorrectly call netif_start_queue()
at the beginning of xcan_chip_start(), and on xcan_chip_start() error
return the suspend handler detaches the device leaving the user
unable to bring-up the device anymore.
- The device is not brought properly up on system resume. A reset is
done and the code tries to determine the bus state after that.
However, after reset the device is always in Configuration mode
(down), so the state checking code does not make sense and
communication will also not work.
- The suspend callback tries to set the device to sleep mode (low-power
mode which monitors the bus and brings the device back to normal mode
on activity), but then immediately disables the clocks (possibly
before the device reaches the sleep mode), which does not make sense
to me. If a clean shutdown is wanted before disabling clocks, we can
just bring it down completely instead of only sleep mode.
Reorganize the PM code so that only the clock logic remains in the
runtime PM callbacks and the system PM callbacks contain the device
bring-up/down logic. This makes calling the runtime PM callbacks during
e.g. xcan_chip_start() safe.
The system PM callbacks now simply call common code to start/stop the
HW if the interface was running, replacing the broken code from before.
xcan_chip_stop() is updated to use the common reset code so that it will
wait for the reset to complete. Reset also disables all interrupts so do
not do that separately.
Also, the device_may_wakeup() checks are removed as the driver does not
have wakeup support.
Tested on Zynq-7000 integrated CAN.
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
xcan_interrupt() clears ERROR|RXOFLV|BSOFF|ARBLST interrupts if any of
them is asserted. This does not take into account that some of them
could have been asserted between interrupt status read and interrupt
clear, therefore clearing them without handling them.
Fix the code to only clear those interrupts that it knows are asserted
and therefore going to be processed in xcan_err_interrupt().
Fixes: b1201e44f5 ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
RX overflow interrupt (RXOFLW) is disabled even though xcan_interrupt()
processes it. This means that an RX overflow interrupt will only be
processed when another interrupt gets asserted (e.g. for RX/TX).
Fix that by enabling the RXOFLW interrupt.
Fixes: b1201e44f5 ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The xilinx_can driver assumes that the TXOK interrupt only clears after
it has been acknowledged as many times as there have been successfully
sent frames.
However, the documentation does not mention such behavior, instead
saying just that the interrupt is cleared when the clear bit is set.
Similarly, testing seems to also suggest that it is immediately cleared
regardless of the amount of frames having been sent. Performing some
heavy TX load and then going back to idle has the tx_head drifting
further away from tx_tail over time, steadily reducing the amount of
frames the driver keeps in the TX FIFO (but not to zero, as the TXOK
interrupt always frees up space for 1 frame from the driver's
perspective, so frames continue to be sent) and delaying the local echo
frames.
The TX FIFO tracking is also otherwise buggy as it does not account for
TX FIFO being cleared after software resets, causing
BUG!, TX FIFO full when queue awake!
messages to be output.
There does not seem to be any way to accurately track the state of the
TX FIFO for local echo support while using the full TX FIFO.
The Zynq version of the HW (but not the soft-AXI version) has watermark
programming support and with it an additional TX-FIFO-empty interrupt
bit.
Modify the driver to only put 1 frame into TX FIFO at a time on soft-AXI
and 2 frames at a time on Zynq. On Zynq the TXFEMP interrupt bit is used
to detect whether 1 or 2 frames have been sent at interrupt processing
time.
Tested with the integrated CAN on Zynq-7000 SoC. The 1-frame-FIFO mode
was also tested.
An alternative way to solve this would be to drop local echo support but
keep using the full TX FIFO.
v2: Add FIFO space check before TX queue wake with locking to
synchronize with queue stop. This avoids waking the queue when xmit()
had just filled it.
v3: Keep local echo support and reduce the amount of frames in FIFO
instead as suggested by Marc Kleine-Budde.
Fixes: b1201e44f5 ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The xilinx_can driver contains no mechanism for propagating recovery
from CAN_STATE_ERROR_WARNING and CAN_STATE_ERROR_PASSIVE.
Add such a mechanism by factoring the handling of
XCAN_STATE_ERROR_PASSIVE and XCAN_STATE_ERROR_WARNING out of
xcan_err_interrupt and checking for recovery after RX and TX if the
interface is in one of those states.
Tested with the integrated CAN on Zynq-7000 SoC.
Fixes: b1201e44f5 ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
If the device gets into a state where RXNEMP (RX FIFO not empty)
interrupt is asserted without RXOK (new frame received successfully)
interrupt being asserted, xcan_rx_poll() will continue to try to clear
RXNEMP without actually reading frames from RX FIFO. If the RX FIFO is
not empty, the interrupt will not be cleared and napi_schedule() will
just be called again.
This situation can occur when:
(a) xcan_rx() returns without reading RX FIFO due to an error condition.
The code tries to clear both RXOK and RXNEMP but RXNEMP will not clear
due to a frame still being in the FIFO. The frame will never be read
from the FIFO as RXOK is no longer set.
(b) A frame is received between xcan_rx_poll() reading interrupt status
and clearing RXOK. RXOK will be cleared, but RXNEMP will again remain
set as the new message is still in the FIFO.
I'm able to trigger case (b) by flooding the bus with frames under load.
There does not seem to be any benefit in using both RXNEMP and RXOK in
the way the driver does, and the polling example in the reference manual
(UG585 v1.10 18.3.7 Read Messages from RxFIFO) also says that either
RXOK or RXNEMP can be used for detecting incoming messages.
Fix the issue and simplify the RX processing by only using RXNEMP
without RXOK.
Tested with the integrated CAN on Zynq-7000 SoC.
Fixes: b1201e44f5 ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The xilinx_can driver performs a software reset when an RX overrun is
detected. This causes the device to enter Configuration mode where no
messages are received or transmitted.
The documentation does not mention any need to perform a reset on an RX
overrun, and testing by inducing an RX overflow also indicated that the
device continues to work just fine without a reset.
Remove the software reset.
Tested with the integrated CAN on Zynq-7000 SoC.
Fixes: b1201e44f5 ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
MCAN message ram should only be accessed once clocks are enabled.
Therefore, move the call to parse/init the message ram to after
clocks are enabled.
Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
pm_runtime_get_sync() returns a 1 if the state of the device is already
'active'. This is not a failure case and should return a success.
Therefore fix error handling for pm_runtime_get_sync() call such that
it returns success when the value is 1.
Also cleanup the TODO for using runtime PM for sleep mode as that is
implemented.
Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Cc: <stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
of_iomap() can return NULL so that return needs to be checked and NULL
treated as failure. While at it also take care of the missing
of_node_put() in the error path.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Fixes: commit afa17a500a ("net/can: add driver for mscan family & mpc52xx_mscan")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Inside m_can_chip_config(), when setting up the new value of the CCCR,
the CCCR_NISO bit is not cleared like the others, CCCR_TEST, CCCR_MON,
CCCR_BRSE and CCCR_FDOE, before checking the can.ctrlmode bits for
CAN_CTRLMODE_FD_NON_ISO.
This way once the controller was configured for CAN_CTRLMODE_FD_NON_ISO,
this mode could never be cleared again.
This fix is only relevant for controllers with version 3.1.x or 3.2.x.
Older versions do not support NISO.
Signed-off-by: Roman Fietze <roman.fietze@telemotive.de>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The DMA logic in firmwares < v3.3.0 embedded in the PCAN-PCIe FD cards
family is not capable of handling a mix of 32-bit and 64-bit logical
addresses. If the board is equipped with 2 or 4 CAN ports, then such a
situation might lead to a PCIe Bus Error "Malformed TLP" packet
as well as "irq xx: nobody cared" issue.
This patch adds a workaround that requests only 32-bit DMA addresses
when these might be allocated outside of the 4 GB area.
This issue has been fixed in firmware v3.3.0 and next.
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The dispatcher and the executer process the parse nodes During table
load. Error status from the evaluation confuses the AML parser. This
results in the parser failing to complete parsing of the current
scope op which becomes problematic. For the incorrect AML below, _ADR
never gets created.
definition_block(...)
{
Scope (\_SB)
{
Device (PCI0){...}
Name (OBJ1, 0x0)
OBJ1 = PCI0 + 5 // Results in an operand error.
} // \_SB not closed
// parser looks for \_SB._SB.PCI0, results in AE_NOT_FOUND error
// Entire scope block gets skipped.
Scope (\_SB.PCI0)
{
Name (_ADR, 0x0)
}
}
Fix the above error by properly completing the initial \_SB scope
after an error by clearing errors that occur during table load. In
the above case, this means that OBJ1 = PIC0 + 5 is skipped.
Fixes: 5088814a6e (ACPICA: AML parser: attempt to continue loading table after error)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200363
Tested-by: Bastien Nocera <hadess@hadess.net>
Signed-off-by: Erik Schmauss <erik.schmauss@intel.com>
Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Prevent drivers from building on PPC32 if they use isa_bus_to_virt(),
isa_virt_to_bus(), or isa_page_to_bus(), which are not available and
thus cause build errors.
../drivers/net/ethernet/3com/3c515.c: In function 'corkscrew_open':
../drivers/net/ethernet/3com/3c515.c:824:9: error: implicit declaration of function 'isa_virt_to_bus'; did you mean 'virt_to_bus'? [-Werror=implicit-function-declaration]
../drivers/net/ethernet/amd/lance.c: In function 'lance_rx':
../drivers/net/ethernet/amd/lance.c:1203:23: error: implicit declaration of function 'isa_bus_to_virt'; did you mean 'bus_to_virt'? [-Werror=implicit-function-declaration]
../drivers/net/ethernet/amd/ni65.c: In function 'ni65_init_lance':
../drivers/net/ethernet/amd/ni65.c:585:20: error: implicit declaration of function 'isa_virt_to_bus'; did you mean 'virt_to_bus'? [-Werror=implicit-function-declaration]
../drivers/net/ethernet/cirrus/cs89x0.c: In function 'net_open':
../drivers/net/ethernet/cirrus/cs89x0.c:897:20: error: implicit declaration of function 'isa_virt_to_bus'; did you mean 'virt_to_bus'? [-Werror=implicit-function-declaration]
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously only the neighbour state was checked to decide if an offloaded
entry should be removed. However, there can be situations when the entry
is dead but still marked as valid. This can lead to dead entries not
being removed from fw tables or even incorrect data being added.
Check the entry dead bit before deciding if it should be added to or
removed from fw neighbour tables.
Fixes: 8e6a9046b6 ("nfp: flower vxlan neighbour offload")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu says:
====================
vxlan: fix default fdb entry user-space notify ordering/race
Problem:
In vxlan_newlink, a default fdb entry is added before register_netdev.
The default fdb creation function notifies user-space of the
fdb entry on the vxlan device which user-space does not know about yet.
(RTM_NEWNEIGH goes before RTM_NEWLINK for the same ifindex).
This series fixes the user-space netlink notification ordering issue
with the following changes:
- decouple fdb notify from fdb create.
- Move fdb notify after register_netdev.
- modify rtnl_configure_link to allow configuring a link early.
- Call rtnl_configure_link in vxlan newlink handler to notify
userspace about the newlink before fdb notify and
hence avoiding the user-space race.
====================
Fixes: afbd8bae9c ("vxlan: add implicit fdb entry for default destination")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Problem:
In vxlan_newlink, a default fdb entry is added before register_netdev.
The default fdb creation function also notifies user-space of the
fdb entry on the vxlan device which user-space does not know about yet.
(RTM_NEWNEIGH goes before RTM_NEWLINK for the same ifindex).
This patch fixes the user-space netlink notification ordering issue
with the following changes:
- decouple fdb notify from fdb create.
- Move fdb notify after register_netdev.
- Call rtnl_configure_link in vxlan newlink handler to notify
userspace about the newlink before fdb notify and
hence avoiding the user-space race.
Fixes: afbd8bae9c ("vxlan: add implicit fdb entry for default destination")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new option do_notify to vxlan_fdb_destroy to make
sending netlink notify optional. Used by a later patch.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add new vxlan_fdb_alloc helper
- rename existing vxlan_fdb_create into vxlan_fdb_update:
because it really creates or updates an existing
fdb entry
- move new fdb creation into a separate vxlan_fdb_create
Main motivation for this change is to introduce the ability
to decouple vxlan fdb creation and notify, used in a later patch.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnl_configure_link sets dev->rtnl_link_state to
RTNL_LINK_INITIALIZED and unconditionally calls
__dev_notify_flags to notify user-space of dev flags.
current call sequence for rtnl_configure_link
rtnetlink_newlink
rtnl_link_ops->newlink
rtnl_configure_link (unconditionally notifies userspace of
default and new dev flags)
If a newlink handler wants to call rtnl_configure_link
early, we will end up with duplicate notifications to
user-space.
This patch fixes rtnl_configure_link to check rtnl_link_state
and call __dev_notify_flags with gchanges = 0 if already
RTNL_LINK_INITIALIZED.
Later in the series, this patch will help the following sequence
where a driver implementing newlink can call rtnl_configure_link
to initialize the link early.
makes the following call sequence work:
rtnetlink_newlink
rtnl_link_ops->newlink (vxlan) -> rtnl_configure_link (initializes
link and notifies
user-space of default
dev flags)
rtnl_configure_link (updates dev flags if requested by user ifm
and notifies user-space of new dev flags)
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Got crash report with following backtrace:
BUG: unable to handle kernel paging request at ffff8801869daffe
RIP: 0010:[<ffffffff816429c4>] [<ffffffff816429c4>] ip6_finish_output2+0x394/0x4c0
RSP: 0018:ffff880186c83a98 EFLAGS: 00010283
RAX: ffff8801869db00e ...
[<ffffffff81644cdc>] ip6_finish_output+0x8c/0xf0
[<ffffffff81644d97>] ip6_output+0x57/0x100
[<ffffffff81643dc9>] ip6_forward+0x4b9/0x840
[<ffffffff81645566>] ip6_rcv_finish+0x66/0xc0
[<ffffffff81645db9>] ipv6_rcv+0x319/0x530
[<ffffffff815892ac>] netif_receive_skb+0x1c/0x70
[<ffffffffc0060bec>] atl1c_clean+0x1ec/0x310 [atl1c]
...
The bad access is in neigh_hh_output(), at skb->data - 16 (HH_DATA_MOD).
atl1c driver provided skb with no headroom, so 14 bytes (ethernet
header) got pulled, but then 16 are copied.
Reserve NET_SKB_PAD bytes headroom, like netdev_alloc_skb().
Compile tested only; I lack hardware.
Fixes: 7b70176421 ("atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two scenarios that we will restore deleted records. The first is
when device down and up(or unmap/remap). In this scenario the new filter
mode is same with previous one. Because we get it from in_dev->mc_list and
we do not touch it during device down and up.
The other scenario is when a new socket join a group which was just delete
and not finish sending status reports. In this scenario, we should use the
current filter mode instead of restore old one. Here are 4 cases in total.
old_socket new_socket before_fix after_fix
IN(A) IN(A) ALLOW(A) ALLOW(A)
IN(A) EX( ) TO_IN( ) TO_EX( )
EX( ) IN(A) TO_EX( ) ALLOW(A)
EX( ) EX( ) TO_EX( ) TO_EX( )
Fixes: 24803f38a5 (igmp: do not remove igmp souce list info when set link down)
Fixes: 1666d49e1d (mld: do not remove mld souce list info when set link down)
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
free_irq() waits until all handlers for this IRQ have completed. As the
relevant handler (mv88e6xxx_g1_irq_thread_fn()) takes the chip's reg_lock
it might never return if the thread calling free_irq() holds this lock.
For the same reason kthread_cancel_delayed_work_sync() in the polling case
must not hold this lock.
Also first free the irq (or stop the worker respectively) such that
mv88e6xxx_g1_irq_thread_work() isn't called any more before the irq
mappings are dropped in mv88e6xxx_g1_irq_free_common() to prevent the
worker thread to call handle_nested_irq(0) which results in a NULL-pointer
exception.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Example setup:
host: ip -6 addr add dev eth1 2001:db8:104::4
where eth1 is enslaved to a VRF
switch: ip -6 ro add 2001:db8:104::4/128 dev br1
where br1 only has an LLA
ping6 2001:db8:104::4
ssh 2001:db8:104::4
(NOTE: UDP works fine if the PKTINFO has the address set to the global
address and ifindex is set to the index of eth1 with a destination an
LLA).
For ICMP, icmp6_iif needs to be updated to check if skb->dev is an
L3 master. If it is then return the ifindex from rt6i_idev similar
to what is done for loopback.
For TCP, restore the original tcp_v6_iif definition which is needed in
most places and add a new tcp_v6_iif_l3_slave that considers the
l3_slave variability. This latter check is only needed for socket
lookups.
Fixes: 9ff7438460 ("net: vrf: Handle ipv6 multicast and link-local addresses")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix following warning:
net/ipv4/bpfilter/sockopt.c:28:5: error: symbol 'bpfilter_ip_set_sockopt' redeclared with different type
net/ipv4/bpfilter/sockopt.c:34:5: error: symbol 'bpfilter_ip_get_sockopt' redeclared with different type
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The situation described in the comment can occur also with
PHY_IGNORE_INTERRUPT, therefore change the condition to include it.
Fixes: f555f34fdc ("net: phy: fix auto-negotiation stall due to unavailable interrupt")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru says:
====================
qed: Fix series II.
The patch series fixes few issues in the qed driver.
Please consider applying it to 'net' branch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
FW hsi contains 256 approximation buckets which are split in ramrod into
eight u32 values, but driver is using eight 'unsigned long' variables.
This patch fixes the mcast logic by making the API utilize u32.
Fixes: 83aeb933 ("qed*: Trivial modifications")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a possible race where driver can read link status in mid-transition
and see that virtual-link is up yet speed is 0. Since in this
mid-transition we're guaranteed to see a mailbox from MFW soon, we can
afford to treat this as link down.
Fixes: cc875c2e ("qed: Add link support")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Apparently, MFW publishes EEE capabilities even for Fiber-boards that don't
support them, and later since qed internally sets adv_caps it would cause
link-flap avoidance (LFA) to fail when driver would initiate the link.
This in turn delays the link, causing traffic to fail.
Driver has been modified to not to ask MFW for any EEE config if EEE isn't
to be enabled.
Fixes: 645874e5 ("qed: Add support for Energy efficient ethernet.")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a missing rcu_read_unlock in the error path
Fixes: c95567c803 ("caif: added check for potential null return")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For some time now, if you load the bonding driver and configure bond
parameters via sysfs using minimal config options, such as specifying
nothing but the mode, relying on defaults for everything else, modes
that cannot use arp monitoring (802.3ad, balance-tlb, balance-alb) all
wind up with both arp_interval=0 (as it should be) and miimon=0, which
means the miimon monitor thread never actually runs. This is particularly
problematic for 802.3ad.
For example, from an LNST recipe I've set up:
$ modprobe bonding max_bonds=0"
$ echo "+t_bond0" > /sys/class/net/bonding_masters"
$ ip link set t_bond0 down"
$ echo "802.3ad" > /sys/class/net/t_bond0/bonding/mode"
$ ip link set ens1f1 down"
$ echo "+ens1f1" > /sys/class/net/t_bond0/bonding/slaves"
$ ip link set ens1f0 down"
$ echo "+ens1f0" > /sys/class/net/t_bond0/bonding/slaves"
$ ethtool -i t_bond0"
$ ip link set ens1f1 up"
$ ip link set ens1f0 up"
$ ip link set t_bond0 up"
$ ip addr add 192.168.9.1/24 dev t_bond0"
$ ip addr add 2002::1/64 dev t_bond0"
This bond comes up okay, but things look slightly suspect in
/proc/net/bonding/t_bond0 output:
$ grep -i mii /proc/net/bonding/t_bond0
MII Status: up
MII Polling Interval (ms): 0
MII Status: up
MII Status: up
Now, pull a cable on one of the ports in the bond, then reconnect it, and
you'll see:
Slave Interface: ens1f0
MII Status: down
Speed: 1000 Mbps
Duplex: full
I believe this became a major issue as of commit 4d2c0cda07, which for
802.3ad bonds, sets slave->link = BOND_LINK_DOWN, with a comment about
relying on link monitoring via miimon to set it correctly, but since the
miimon work queue never runs, the link just stays marked down.
If we simply tweak bond_option_mode_set() slightly, we can check for the
non-arp modes having no miimon value set, and insert BOND_DEFAULT_MIIMON,
which gets things back in full working order. This problem exists as far
back as 4.14, and might be worth fixing in all stable trees since, though
the work-around is to simply specify an miimon value yourself.
Reported-by: Bob Ball <ball@umich.edu>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2018-07-18
The following series provides fixes to mlx5 core and net device driver.
Please pull and let me know if there's any problem.
For -stable v4.7
net/mlx5e: Don't allow aRFS for encapsulated packets
net/mlx5e: Fix quota counting in aRFS expire flow
For -stable v4.15
net/mlx5e: Only allow offloading decap egress (egdev) flows
net/mlx5e: Refine ets validation function
net/mlx5: Adjust clock overflow work period
For -stable v4.17
net/mlx5: E-Switch, UBSAN fix undefined behavior in mlx5_eswitch_mode
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2018-07-20
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix in BPF Makefile to detect llvm-objcopy in a more robust way which is
needed for pahole's BTF converter and minor UAPI tweaks in BTF_INT_BITS()
to shrink the mask before eventual UAPI freeze, from Martin.
2) Fix a segfault in bpftool when prog pin id has no further arguments such
as id value or file specified, from Taeung.
3) Fix powerpc JIT handling of XADD which has jumps to exit path that would
potentially bypass verifier expectations e.g. with subprog calls. Also add
a test case to make sure XADD is not mangling src/dst register, from Daniel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on USB2.0 Spec Section 11.12.5,
"If a hub has per-port power switching and per-port current limiting,
an over-current on one port may still cause the power on another port
to fall below specific minimums. In this case, the affected port is
placed in the Power-Off state and C_PORT_OVER_CURRENT is set for the
port, but PORT_OVER_CURRENT is not set."
so let's check C_PORT_OVER_CURRENT too for over current condition.
Fixes: 08d1dec6f4 ("usb:hub set hub->change_bits when over-current happens")
Cc: <stable@vger.kernel.org>
Tested-by: Alessandro Antenucci <antenucci@korg.it>
Signed-off-by: Bin Liu <b-liu@ti.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current code does not check sk->sk_shutdown & RCV_SHUTDOWN.
tls_sw_recvmsg may return a positive value in the case where bytes have
already been copied when the socket is shutdown. sk->sk_err has been
cleared, causing the tls_wait_data to hang forever on a subsequent
invocation. Checking sk->sk_shutdown & RCV_SHUTDOWN, as in tcp_recvmsg,
fixes this problem.
Fixes: c46234ebb4 ("tls: RX path for ktls")
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng says:
====================
fix DCTCP ECE Ack series
This patch set address that the existing DCTCP implementation does not
fully implement the ACK policy specified in the RFC. This improves
the responsiveness of CE status change particularly on flows with
small inflight.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Per DCTCP RFC8257 (Section 3.2) the ACK reflecting the CE status change
has to be sent immediately so the sender can respond quickly:
""" When receiving packets, the CE codepoint MUST be processed as follows:
1. If the CE codepoint is set and DCTCP.CE is false, set DCTCP.CE to
true and send an immediate ACK.
2. If the CE codepoint is not set and DCTCP.CE is true, set DCTCP.CE
to false and send an immediate ACK.
"""
Previously DCTCP implementation may continue to delay the ACK. This
patch fixes that to implement the RFC by forcing an immediate ACK.
Tested with this packetdrill script provided by Larry Brakmo
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0
0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
0.110 < [ect0] . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
0.200 < [ect0] . 1:1001(1000) ack 1 win 257
0.200 > [ect01] . 1:1(0) ack 1001
0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 1:2(1) ack 1001
0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
+0.005 < [ce] . 2001:3001(1000) ack 2 win 257
+0.000 > [ect01] . 2:2(0) ack 2001
// Previously the ACK below would be delayed by 40ms
+0.000 > [ect01] E. 2:2(0) ack 3001
+0.500 < F. 9501:9501(0) ack 4 win 257
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when a DCTCP receiver delays an ACK and receive a
data packet with a different CE mark from the previous one's, it
sends two immediate ACKs acking previous and latest sequences
respectly (for ECN accounting).
Previously sending the first ACK may mark off the delayed ACK timer
(tcp_event_ack_sent). This may subsequently prevent sending the
second ACK to acknowledge the latest sequence (tcp_ack_snd_check).
The culprit is that tcp_send_ack() assumes it always acknowleges
the latest sequence, which is not true for the first special ACK.
The fix is to not make the assumption in tcp_send_ack and check the
actual ack sequence before cancelling the delayed ACK. Further it's
safer to pass the ack sequence number as a local variable into
tcp_send_ack routine, instead of intercepting tp->rcv_nxt to avoid
future bugs like this.
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VPID for the nested vcpu is allocated at vmx_create_vcpu whenever nested
vmx is turned on with the module parameter.
However, it's only freed if the L1 guest has executed VMXON which is not
a given.
As a result, on a system with nested==on every creation+deletion of an
L1 vcpu without running an L2 guest results in leaking one vpid. Since
the total number of vpids is limited to 64k, they can eventually get
exhausted, preventing L2 from starting.
Delay allocation of the L2 vpid until VMXON emulation, thus matching its
freeing.
Fixes: 5c614b3583
Cc: stable@vger.kernel.org
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Do not expose the address of vmx->nested.current_vmptr to
kvm_write_guest_virt_system() as the resulting __copy_to_user()
call will trigger a WARN when CONFIG_HARDENED_USERCOPY is
enabled.
Opportunistically clean up variable names in handle_vmptrst()
to improve readability, e.g. vmcs_gva is misleading as the
memory operand of VMPTRST is plain memory, not a VMCS.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Tested-by: Peter Shier <pshier@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is a bug in the sink PDO search code when trying to select
a PPS APDO. The current code actually sets the starting index for
searching to whatever value 'i' is, rather than choosing index 1
to avoid the first PDO (always 5V fixed). As a result, for sources
which support PPS but whose PPS APDO index does not match with the
supporting sink PPS APDO index for the platform, no valid PPS APDO
will be found so this feature will not be permitted.
Sadly in testing, both Source and Sink capabilities matched up and
this was missed. Code is now updated to correctly set the start
index to 1, and testing with additional PPS capable sources show
this to work as expected.
Signed-off-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Fixes: 2eadc33f40 ("typec: tcpm: Add core support for sink side PPS")
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 1b9ba000 ("Allow function drivers to pause control
transfers") states that USB_GADGET_DELAYED_STATUS is only
supported if data phase is 0 bytes.
It seems that when the length is not 0 bytes, there is no
need to explicitly delay the data stage since the transfer
is not completed until the user responds. However, when the
length is 0, there is no data stage and the transfer is
finished once setup() returns, hence there is a need to
explicitly delay completion.
This manifests as the following bugs:
Prior to 946ef68ad4 ('Let setup() return
USB_GADGET_DELAYED_STATUS'), when setup is 0 bytes, ffs
would require user to queue a 0 byte request in order to
clear setup state. However, that 0 byte request was actually
not needed and would hang and cause errors in other setup
requests.
After the above commit, 0 byte setups work since the gadget
now accepts empty queues to ep0 to clear the delay, but all
other setups hang.
Fixes: 946ef68ad4 ("Let setup() return USB_GADGET_DELAYED_STATUS")
Signed-off-by: Jerry Zhang <zhangjerry@google.com>
Cc: stable <stable@vger.kernel.org>
Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When first DCCP packet is SYNC or SYNCACK, we insert a new conntrack
that has an un-initialized timeout value, i.e. such entry could be
reaped at any time.
Mark them as INVALID and only ignore SYNC/SYNCACK when connection had
an old state.
Reported-by: syzbot+6f18401420df260e37ed@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Its possible to rename two chains to the same name in one
transaction:
nft add chain t c1
nft add chain t c2
nft 'rename chain t c1 c3;rename chain t c2 c3'
This creates two chains named 'c3'.
Appears to be harmless, both chains can still be deleted both
by name or handle, but, nevertheless, its a bug.
Walk transaction log and also compare vs. the pending renames.
Both chains can still be deleted, but nevertheless it is a bug as
we don't allow to create chains with identical names, so we should
prevent this from happening-by-rename too.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The new name is stored in the transaction metadata, on commit,
the pointers to the old and new names are swapped.
Therefore in abort and commit case we have to free the
pointer in the chain_trans container.
In commit case, the pointer can be used by another cpu that
is currently dumping the renamed chain, thus kfree needs to
happen after waiting for rcu readers to complete.
Fixes: b7263e071a ("netfilter: nf_tables: Allow chain name of up to 255 chars")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
no need to store the name in separate area.
Furthermore, it uses kmalloc but not kfree and most accesses seem to treat
it as char[IFNAMSIZ] not char *.
Remove this and use dev->name instead.
In case event zeroed dev, just omit the name in the dump.
Fixes: d92191aa84 ("netfilter: nf_tables: cache device name in flowtable object")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
I noticed the "--version" option of the llvm-objcopy command has recently
disappeared from the master llvm branch. It is currently used as a BTF
support test in tools/testing/selftests/bpf/Makefile.
This patch replaces it with "--help" which should be
less error prone in the future.
Fixes: c0fa1b6c3e ("bpf: btf: Add BTF tests")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch shrinks the BTF_INT_BITS() mask. The current
btf_int_check_meta() ensures the nr_bits of an integer
cannot exceed 64. Hence, it is mostly an uapi cleanup.
The actual btf usage (i.e. seq_show()) is also modified
to use u8 instead of u16. The verification (e.g. btf_int_check_meta())
path stays as is to deal with invalid BTF situation.
Fixes: 69b693f0ae ("bpf: btf: Introduce BPF Type Format (BTF)")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Arguments of 'pin' subcommand should be checked
at the very beginning of do_pin_any().
Otherwise segfault errors can occur when using
'map pin' or 'prog pin' commands, so fix it.
# bpftool prog pin id
Segmentation fault
Fixes: 71bb428fe2 ("tools: bpf: add bpftool")
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reported-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The calculation of "wqe_size" is not correct when the tx queue is busy in
hinic_xmit_frame().
When there are no free WQEs, the tx flow will unmap the skb buffer, then
ring the doobell for the pending packets. But the "wqe_size" which used
to calculate the doorbell address is not correct. The wqe size should be
cleared to 0, otherwise, it will cause a doorbell error.
This patch fixes the problem.
Reported-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This was detected by the self-test thanks to Ard's chunking patch.
I finally got around to testing this out on my ancient Via box. It
turns out that the workaround got the assembly wrong and we end up
doing count + initial cycles of the loop instead of just count.
This obviously causes corruption, either by overwriting the source
that is yet to be processed, or writing over the end of the buffer.
On CPUs that don't require the workaround only ECB is affected.
On Nano CPUs both ECB and CBC are affected.
This patch fixes it by doing the subtraction prior to the assembly.
Fixes: a76c1c23d0 ("crypto: padlock-aes - work around Nano CPU...")
Cc: <stable@vger.kernel.org>
Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
During unload process, the chip can encounter problem where a FW dump would
be captured. For this case, the full reset sequence will be skip to bring
the chip back to full operational state.
Fixes: e315cd28b9 ("[SCSI] qla2xxx: Code changes for qla data structure refactoring")
Cc: <stable@vger.kernel.org>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use chip shutdown at the start of unload to stop all DMA + traffic and
bring down the laser. This prevents any link activities from triggering the
driver to be re-engaged.
Fixes: 4b60c82736 ("scsi: qla2xxx: Add fw_started flags to qpair")
Cc: <stable@vger.kernel.org> #4.16
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In case of IOCB Queue full or system where memory is low and driver
receives large number of RSCN storm, the stale sp pointer can stay on
gpnid_list resulting in page_fault.
This patch fixes this issue by initializing the sp->elem list head and
removing sp->elem before memory is freed.
Following stack trace is seen
9 [ffff987b37d1bc60] page_fault at ffffffffad516768 [exception RIP: qla24xx_async_gpnid+496]
10 [ffff987b37d1bd10] qla24xx_async_gpnid at ffffffffc039866d [qla2xxx]
11 [ffff987b37d1bd80] qla2x00_do_work at ffffffffc036169c [qla2xxx]
12 [ffff987b37d1be38] qla2x00_do_dpc_all_vps at ffffffffc03adfed [qla2xxx]
13 [ffff987b37d1be78] qla2x00_do_dpc at ffffffffc036458a [qla2xxx]
14 [ffff987b37d1bec8] kthread at ffffffffacebae31
Fixes: 2d73ac6102 ("scsi: qla2xxx: Serialize GPNID for multiple RSCN")
Cc: <stable@vger.kernel.org> # v4.17+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Daniel Borkmann says:
====================
This set adds a ppc64 JIT fix for xadd as well as a missing test
case for verifying whether xadd messes with src/dst reg. Thanks!
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
We currently do not have such a test case in test_verifier selftests
but it's important to test under bpf_jit_enable=1 to make sure JIT
implementations do not mistakenly mess with src/dst reg for xadd/{w,dw}.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
None of the JITs is allowed to implement exit paths from the BPF
insn mappings other than BPF_JMP | BPF_EXIT. In the BPF core code
we have a couple of rewrites in eBPF (e.g. LD_ABS / LD_IND) and
in eBPF to cBPF translation to retain old existing behavior where
exceptions may occur; they are also tightly controlled by the
verifier where it disallows some of the features such as BPF to
BPF calls when legacy LD_ABS / LD_IND ops are present in the BPF
program. During recent review of all BPF_XADD JIT implementations
I noticed that the ppc64 one is buggy in that it contains two
jumps to exit paths. This is problematic as this can bypass verifier
expectations e.g. pointed out in commit f6b1b3bf0d ("bpf: fix
subprog verifier bypass by div/mod by 0 exception"). The first
exit path is obsoleted by the fix in ca36960211 ("bpf: allow xadd
only on aligned memory") anyway, and for the second one we need to
do a fetch, add and store loop if the reservation from lwarx/ldarx
was lost in the meantime.
Fixes: 156d0e290e ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Tested-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
We get egress rules through the egdev mechanism when the ingress device
is not supporting offload, with the expected use-case of tunnel decap
ingress rule set on shared tunnel device.
Make sure to offload egress/egdev rules only if decap action (tunnel key
unset) exists there and err otherwise.
Fixes: 717503b9cf ("net: sched: convert cls_flower->egress_dev users to tc_setup_cb_egdev infra")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Fix bad alignment of SQ buffer in fragmented QP allocation.
It should start directly after RQ buffer ends.
Take special care of the end case where the RQ buffer does not occupy
a whole page. RQ size is a power of two, so would be the case only for
small RQ sizes (RQ size < PAGE_SIZE).
Fix wrong assignments for sqb->size (mistakenly assigned RQ size),
and for npages value of RQ and SQ.
Fixes: 3a2f703312 ("net/mlx5: Use order-0 allocations for all WQ types")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The flow counters binding support commit introduced a code change where
none NULL 'rule_dest' is always passed to mlx5_add_flow_rules, this breaks
'DON'T_TRAP' rules insertion.
The fix uses the equivalent 'dest_num' value instead of dest pointer
at the failed check.
fixes: 3b3233fbf0 ('IB/mlx5: Add flow counters binding support')
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Driver is yet to support aRFS for encapsulated packets, return early
error in such case.
Fixes: 18c908e477 ("net/mlx5e: Add accelerated RFS support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Quota should follow the amount of rules which do expire, and not the
number of rules that were examined, fixed that.
Fixes: 18c908e477 ("net/mlx5e: Add accelerated RFS support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When driver converts HW timestamp to wall clock time it subtracts
the last saved cycle counter from the HW timestamp and converts the
difference to nanoseconds.
The conversion is done by multiplying the cycles difference with the
clock multiplier value as a first step and therefore the cycles
difference should be small enough so that the multiplication product
doesn't exceed 64bit.
The overflow handling routine is in charge of updating the last saved
cycle counter in driver and it is called periodically using kernel
delayed workqueue.
The delay period for this work is calculated using the max HW cycle
counter value (a 41 bit mask) as a base which doesn't take the 64bit
limit into account so the delay period may be incorrect and too
long to prevent a large difference between the HW counter and the last
saved counter in SW.
This change adjusts the work period for the HW clock overflow work by
taking the minimum between the previous value and the quotient of max
u64 value and the clock multiplier value.
Fixes: ef9814deaf ("net/mlx5e: Add HW timestamping (TS) support")
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Removed an error message received when configuring ETS total
bandwidth to be zero.
Our hardware doesn't support such configuration, so we shall
reject it in the driver. Nevertheless, we removed the error message
in order to eliminate error messages caused by old userspace tools
who try to pass such configuration.
Fixes: ff0891915c ("net/mlx5e: Fix ETS BW check")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The Lenovo LaVie Z laptop requires i8042 to be reset in order to
consistently detect its Elantech touchpad. The nomux and kbdreset
quirks are not sufficient.
It's possible the other LaVie Z models from NEC require this as well.
Cc: stable@vger.kernel.org
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Fedora has integrated the jitter entropy daemon to work around slow
boot problems, especially on VM's that don't support virtio-rng:
https://bugzilla.redhat.com/show_bug.cgi?id=1572944
It's understandable why they did this, but the Jitter entropy daemon
works fundamentally on the principle: "the CPU microarchitecture is
**so** complicated and we can't figure it out, so it *must* be
random". Yes, it uses statistical tests to "prove" it is secure, but
AES_ENCRYPT(NSA_KEY, COUNTER++) will also pass statistical tests with
flying colors.
So if RDRAND is available, mix it into entropy submitted from
userspace. It can't hurt, and if you believe the NSA has backdoored
RDRAND, then they probably have enough details about the Intel
microarchitecture that they can reverse engineer how the Jitter
entropy daemon affects the microarchitecture, and attack its output
stream. And if RDRAND is in fact an honest DRNG, it will immeasurably
improve on what the Jitter entropy daemon might produce.
This also provides some protection against someone who is able to read
or set the entropy seed file.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Felipe writes:
usb: fixes for v4.18-rc5
With a total of 20 non-merge commits, we have accumulated quite a few
fixes. These include lot's of fixes the our audio gadget interface, a
build error fix for PPC64 builds for the frescale PHY driver,
sleep-while-atomic fixes on the r8a66597 UDC driver, 3-stage SETUP fix
for the aspeed-vhub UDC and some other misc fixes.
The list [1] of commits doing endianness fixes in USB subsystem is long
due to below quote from USB spec Revision 2.0 from April 27, 2000:
------------
8.1 Byte/Bit Ordering
Multiple byte fields in standard descriptors, requests, and responses
are interpreted as and moved over the bus in little-endian order, i.e.
LSB to MSB.
------------
This commit belongs to the same family.
[1] Example of endianness fixes in USB subsystem:
commit 14e1d56cbe ("usb: gadget: f_uac2: endianness fixes.")
commit 42370b8211 ("usb: gadget: f_uac1: endianness fixes.")
commit 63afd5cc78 ("USB: chaoskey: fix Alea quirk on big-endian hosts")
commit 74098c4ac7 ("usb: gadget: acm: fix endianness in notifications")
commit cdd7928df0 ("ACM gadget: fix endianness in notifications")
commit 323ece54e0 ("cdc-wdm: fix endianness bug in debug statements")
commit e102609f10 ("usb: gadget: uvc: Fix endianness mismatches")
list goes on
Fixes: 132fcb4608 ("usb: gadget: Add Audio Class 2.0 Driver")
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Reviewed-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Make sure only to copy any actual data rather than the whole buffer,
when releasing the temporary buffer used for unaligned non-isochronous
transfers.
Taken directly from commit 0efd937e27 ("USB: ehci-tegra: fix inefficient
copy of unaligned buffers")
Tested with Lantiq xRX200 (MIPS) and RPi Model B Rev 2 (ARM)
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Antti Seppälä <a.seppala@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
The commit 3bc04e28a0 ("usb: dwc2: host: Get aligned DMA in a more
supported way") introduced a common way to align DMA allocations.
The code in the commit aligns the struct dma_aligned_buffer but the
actual DMA address pointed by data[0] gets aligned to an offset from
the allocated boundary by the kmalloc_ptr and the old_xfer_buffer
pointers.
This is against the recommendation in Documentation/DMA-API.txt which
states:
Therefore, it is recommended that driver writers who don't take
special care to determine the cache line size at run time only map
virtual regions that begin and end on page boundaries (which are
guaranteed also to be cache line boundaries).
The effect of this is that architectures with non-coherent DMA caches
may run into memory corruption or kernel crashes with Unhandled
kernel unaligned accesses exceptions.
Fix the alignment by positioning the DMA area in front of the allocation
and use memory at the end of the area for storing the orginal
transfer_buffer pointer. This may have the added benefit of increased
performance as the DMA area is now fully aligned on all architectures.
Tested with Lantiq xRX200 (MIPS) and RPi Model B Rev 2 (ARM).
Fixes: 3bc04e28a0 ("usb: dwc2: host: Get aligned DMA in a more supported way")
Cc: <stable@vger.kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Antti Seppälä <a.seppala@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Commit 34962fb807 ("docs: Fix more broken references") replaced the
broken reference to rockchip,dwc3-usb-phy.txt binding for the Qualcomm
DWC3 binding (qcom-dwc3-usb-phy.txt). That's wrong, so replace that
reference for the correct ones.
Fixes: 34962fb807 ("docs: Fix more broken references")
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
The tools/usb/ffs-test.c file defines cpu_to_le16/32 by using the C
library htole16/32 function calls. However, cpu_to_le16/32 are used when
initializing structures, i.e in a context where a function call is not
allowed.
It works fine on little endian systems because htole16/32 are defined by
the C library as no-ops. But on big-endian systems, they are actually
doing something, which might involve calling a function, causing build
failures, such as:
ffs-test.c:48:25: error: initializer element is not constant
#define cpu_to_le32(x) htole32(x)
^~~~~~~
ffs-test.c:128:12: note: in expansion of macro ‘cpu_to_le32’
.magic = cpu_to_le32(FUNCTIONFS_DESCRIPTORS_MAGIC_V2),
^~~~~~~~~~~
To solve this, we code cpu_to_le16/32 in a way that allows them to be
used when initializing structures. This fix was imported from
meta-openembedded/android-tools/fix-big-endian-build.patch written by
Thomas Petazzoni <thomas.petazzoni@free-electrons.com>.
CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
The Aspeed SoC has a memory ordering issue that (thankfully)
only affects the USB gadget device. A read back is necessary
after writing to memory and before letting the device DMA
from it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Variable maxpacket is being assigned but is never used hence it is
redundant and can be removed.
Cleans up clang warning:
warning: variable 'maxpacket' set but not used [-Wunused-but-set-variable]
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
For unidirectional endpoints, the endpoint pointer will be NULL for the
unused direction. Check that the endpoint is active before
dereferencing this pointer.
Fixes: 1b4977c793 ("usb: dwc2: Update dwc2_handle_incomplete_isoc_in() function")
Fixes: 689efb2619 ("usb: dwc2: Update dwc2_handle_incomplete_isoc_out() function")
Fixes: d84845522d ("usb: dwc2: Update GINTSTS_GOUTNAKEFF interrupt handling")
Signed-off-by: John Keeping <john@metanate.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Fix build errors when built for PPC64:
These variables are only used on PPC32 so they don't need to be
initialized for PPC64.
../drivers/usb/phy/phy-fsl-usb.c: In function 'usb_otg_start':
../drivers/usb/phy/phy-fsl-usb.c:865:3: error: '_fsl_readl' undeclared (first use in this function); did you mean 'fsl_readl'?
_fsl_readl = _fsl_readl_be;
../drivers/usb/phy/phy-fsl-usb.c:865:16: error: '_fsl_readl_be' undeclared (first use in this function); did you mean 'fsl_readl'?
_fsl_readl = _fsl_readl_be;
../drivers/usb/phy/phy-fsl-usb.c:866:3: error: '_fsl_writel' undeclared (first use in this function); did you mean 'fsl_writel'?
_fsl_writel = _fsl_writel_be;
../drivers/usb/phy/phy-fsl-usb.c:866:17: error: '_fsl_writel_be' undeclared (first use in this function); did you mean 'fsl_writel'?
_fsl_writel = _fsl_writel_be;
../drivers/usb/phy/phy-fsl-usb.c:868:16: error: '_fsl_readl_le' undeclared (first use in this function); did you mean 'fsl_readl'?
_fsl_readl = _fsl_readl_le;
../drivers/usb/phy/phy-fsl-usb.c:869:17: error: '_fsl_writel_le' undeclared (first use in this function); did you mean 'fsl_writel'?
_fsl_writel = _fsl_writel_le;
and the sysfs "show" function return type should be ssize_t, not int:
../drivers/usb/phy/phy-fsl-usb.c:1042:49: error: initialization of 'ssize_t (*)(struct device *, struct device_attribute *, char *)' {aka 'long int (*)(struct device *, struct device_attribute *, char *)'} from incompatible pointer type 'int (*)(struct device *, struct device_attribute *, char *)' [-Werror=incompatible-pointer-types]
static DEVICE_ATTR(fsl_usb2_otg_state, S_IRUGO, show_fsl_usb2_otg_state, NULL);
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: linux-usb@vger.kernel.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
When handling split transactions we will try to delay retry after
getting a NAK from the device. This works well for BULK transfers that
can be polled for essentially forever. Unfortunately, on slower systems
at boot time, when the kernel is busy enumerating all the devices (USB
or not), we issue a bunch of control requests (reading device
descriptors, etc). If we get a NAK for the IN part of the control
request and delay retry for too long (because the system is busy), we
may confuse the device when we finally get to reissue SSPLIT/CSPLIT IN
and the device will respond with STALL. As a result we end up with
failure to get device descriptor and will fail to enumerate the device:
[ 3.428801] usb 2-1.2.1: new full-speed USB device number 9 using dwc2
[ 3.508576] usb 2-1.2.1: device descriptor read/8, error -32
[ 3.699150] usb 2-1.2.1: device descriptor read/8, error -32
[ 3.891653] usb 2-1.2.1: new full-speed USB device number 10 using dwc2
[ 3.968859] usb 2-1.2.1: device descriptor read/8, error -32
...
Let's not delay retries of split CONTROL IN transfers, as this allows us
to reliably enumerate devices at boot time.
Fixes: 38d2b5fb75 ("usb: dwc2: host: Don't retry NAKed transactions right away")
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Substream period size potentially can be changed in runtime, however
this is not accounted in the data copying routine, the change replaces
the cached value with an actual value from substream runtime.
As a side effect the change also removes a potential division by zero
in u_audio_iso_complete() function, if there is a race with
uac_pcm_hw_free(), which sets prm->period_size to 0.
Fixes: 132fcb4608 ("usb: gadget: Add Audio Class 2.0 Driver")
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
There is no necessity to copy PCM stream ring buffer area and size
properties to UAC private data structure, these values can be got
from substream itself.
The change gives more control on substream and avoid stale caching.
Fixes: 132fcb4608 ("usb: gadget: Add Audio Class 2.0 Driver")
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
In u_audio_iso_complete, the runtime hw_ptr is updated before the
data is actually copied over to/from the buffer/dma area. When
ALSA uses this hw_ptr, the data may not actually be available to
be used. This causes trash/stale audio to play/record. This
patch updates the hw_ptr after the data has been copied to avoid
this.
Fixes: 132fcb4608 ("usb: gadget: Add Audio Class 2.0 Driver")
Signed-off-by: Joshua Frkuska <joshua_frkuska@mentor.com>
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Fix below smatch (v0.5.0-4443-g69e9094e11c1) warnings:
drivers/usb/gadget/function/u_audio.c:607 g_audio_setup() warn: strcpy() 'pcm_name' of unknown size might be too large for 'pcm->name'
drivers/usb/gadget/function/u_audio.c:614 g_audio_setup() warn: strcpy() 'card_name' of unknown size might be too large for 'card->driver'
drivers/usb/gadget/function/u_audio.c:615 g_audio_setup() warn: strcpy() 'card_name' of unknown size might be too large for 'card->shortname'
Below commits performed a similar 's/strcpy/strlcpy/' rework:
* v2.6.31 commit 8372d4980f ("ALSA: ctxfi - Fix PCM device naming")
* v4.14 commit 003d3e70db ("ALSA: ad1848: fix format string overflow warning")
* v4.14 commit 6d8b04de87 ("ALSA: cs423x: fix format string overflow warning")
Fixes: eb9fecb9e6 ("usb: gadget: f_uac2: split out audio core")
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
The driver may sleep in an interrupt handler.
The function call path (from bottom to top) in Linux-4.16.7 is:
[FUNC] r8a66597_queue(GFP_KERNEL)
drivers/usb/gadget/udc/r8a66597-udc.c, 1193:
r8a66597_queue in get_status
drivers/usb/gadget/udc/r8a66597-udc.c, 1301:
get_status in setup_packet
drivers/usb/gadget/udc/r8a66597-udc.c, 1381:
setup_packet in irq_control_stage
drivers/usb/gadget/udc/r8a66597-udc.c, 1508:
irq_control_stage in r8a66597_irq (interrupt handler)
To fix this bug, GFP_KERNEL is replaced with GFP_ATOMIC.
This bug is found by my static analysis tool (DSAC-2) and checked by
my code review.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
The driver may sleep with holding a spinlock.
The function call paths (from bottom to top) in Linux-4.16.7 are:
[FUNC] msleep
drivers/usb/gadget/udc/r8a66597-udc.c, 839:
msleep in init_controller
drivers/usb/gadget/udc/r8a66597-udc.c, 96:
init_controller in r8a66597_usb_disconnect
drivers/usb/gadget/udc/r8a66597-udc.c, 93:
spin_lock in r8a66597_usb_disconnect
[FUNC] msleep
drivers/usb/gadget/udc/r8a66597-udc.c, 835:
msleep in init_controller
drivers/usb/gadget/udc/r8a66597-udc.c, 96:
init_controller in r8a66597_usb_disconnect
drivers/usb/gadget/udc/r8a66597-udc.c, 93:
spin_lock in r8a66597_usb_disconnect
To fix these bugs, msleep() is replaced with mdelay().
This bug is found by my static analysis tool (DSAC-2) and checked by
my code review.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
The current code is broken as it re-defines "req" inside the
if block, then goto out of it. Thus the request that ends
up being sent is not the one that was populated by the
code in question.
This fixes RNDIS driver autodetect by Windows 10 for me.
The bug was introduced by Chris rework to remove the local
queuing inside the if { } block of the redefined request.
Fixes: 636ba13aec ("usb: gadget: composite: remove duplicated code in OS desc handling")
Cc: <stable@vger.kernel.org> # v4.17
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
A couple of bugs in the driver are preventing SETUP packets
with an OUT data phase from working properly.
Interestingly those are incredibly rare (RNDIS typically
uses them and thus is broken without this fix).
The main problem was an incorrect register offset being
applied for arming RX on EP0. The other problem relates
to stalling such a packet before the data phase, in which
case we don't get an ACK cycle, and get the next SETUP
packet directly, so we shouldn't reject it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
If the server or network is misbehaving and we get an unexpected reply
we can sometimes miss the request not being started and wait on a
request and never get a response, or even double complete the same
request. Fix this by replacing the send_complete completion with just a
per command lock. Add a per command cookie as well so that we can know
if we're getting a double completion for a previous event. Also check
to make sure we dont have REQUEUED set as that means we raced with the
timeout handler and need to just let the retry occur.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We can race with the snd timeout and the per-request timeout and end up
requeuing the same request twice. We can't use the send_complete
completion to tell if everything is ok because we hold the tx_lock
during send, so the timeout stuff will block waiting to mark the socket
dead, and we could be marked complete and still requeue. Instead add a
flag to the socket so we know whether we've been requeued yet.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The MIPS implementation of pci_resource_to_user() introduced in v3.12 by
commit 4c2924b725 ("MIPS: PCI: Use pci_resource_to_user to map pci
memory space properly") incorrectly sets *end to the address of the
byte after the resource, rather than the last byte of the resource.
This results in userland seeing resources as a byte larger than they
actually are, for example a 32 byte BAR will be reported by a tool such
as lspci as being 33 bytes in size:
Region 2: I/O ports at 1000 [disabled] [size=33]
Correct this by subtracting one from the calculated end address,
reporting the correct address to userland.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Reported-by: Rui Wang <rui.wang@windriver.com>
Fixes: 4c2924b725 ("MIPS: PCI: Use pci_resource_to_user to map pci memory space properly")
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v3.12+
Patchwork: https://patchwork.linux-mips.org/patch/19829/
When the CSI is receiving from a bt.656 bus, include a check for
field type 'alternate' when determining whether to set CSI clock
mode to CCIR656_INTERLACED or CCIR656_PROGRESSIVE.
Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
If the second LVDS channel has been disabled in the DT when using dual-channel
mode we should not print a warning.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
The LVDS signal integrity is only guaranteed when the correct enable
sequence (first IPU DI, then LDB) is used. If the LDB display output was
active before the imx-drm driver is loaded (like when a bootsplash was
active) the DI will be disabled by the full IPU reset we do when loading
the driver. The LDB control registers are not part of the IPU range and
thus will remain unchanged.
This leads to the LDB still being active when the DI is getting enabled,
effectively reversing the required enable sequence. Fix this by also
disabling the LDB on driver bind.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
drm_legacy_ctxbitmap_next() returns idr_alloc() which can return
-ENOMEM, -EINVAL or -ENOSPC none of which are -1 . but the call sites
of drm_legacy_ctxbitmap_next() seem to be assuming that the error case
would be -1 (original return of drm_ctxbitmap_next() prior to 2.6.23
was actually -1). Thus reenable error handling by checking for < 0.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Fixes: 62968144e6 ("drm: convert drm context code to use Linux idr")
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/1531571532-22733-1-git-send-email-hofrat@osadl.org
Kishon writes:
phy: for 4.18-rc
*) Fix to get xhci working after disable<->enable cycle
*) Fix wrong enum used for status lines (also fixes a compilation
warning).
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
If softsynthx_read() is called with `count < 3`, `count - 3` wraps, causing
the loop to copy as much data as available to the provided buffer. If
softsynthx_read() is invoked through sys_splice(), this causes an
unbounded kernel write; but even when userspace just reads from it
normally, a small size could cause userspace crashes.
Fixes: 425e586cf9 ("speakup: add unicode variant of /dev/softsynth")
Cc: stable@vger.kernel.org
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
'hostif_mib_set_request_bool' function receives a bool as value and
send the received value with MIB_VALUE_TYPE_BOOL type. There is
one case where the value passed is not a boolean one but
'MCAST_FILTER_PROMISC' which is '2'. Call hostif_mib_set_request_int
instead for related multicast enumeration. This changes original
code behaviour but seems to be the right way to do this.
Fixes: 8ce76bff0e ("staging: ks7010: add new helpers to achieve mib set request and simplify code")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit b83b8b1881 ("staging:r8188eu: Use lib80211 to support TKIP")
is causing 2 problems for me:
1) One boot the wifi on a laptop with a r8188eu wifi device would not
connect and dmesg contained an oops about scheduling while atomic
pointing to the tkip code. This went away after reverting the commit.
2) I reverted the revert to try and get the oops from 1. again to be able
to add it to this commit message. But now the system did connect to the
wifi only to print a whole bunch of oopses, followed by a hardfreeze a
few seconds later. Subsequent reboots also all lead to scenario 2. Until
I reverted the commit again.
Revert the commit fixes both issues making the laptop usable again.
Fixes: b83b8b1881 ("staging:r8188eu: Use lib80211 to support TKIP")
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Ivan Safonov <insafonov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The calling convention of blk_get_request() has changed in lk 4.18; update
the comment in sg.c to match.
Fixes: ff005a0662 ("block: sanitize blk_get_request calling conventions")
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In iscsi_check_tmf_restrictions() task->hdr is dereferenced to print the
opcode, it is possible that task->hdr is NULL.
There are two cases based on opcode argument:
1. ISCSI_OP_SCSI_CMD - In this case alloc_pdu() is called
after iscsi_check_tmf_restrictions()
iscsi_prep_scsi_cmd_pdu() -> iscsi_check_tmf_restrictions() -> alloc_pdu().
Transport drivers allocate memory for iSCSI hdr in alloc_pdu() and assign
it to task->hdr. In case of TMF task->hdr will be NULL resulting in NULL
pointer dereference.
2. ISCSI_OP_SCSI_DATA_OUT - In this case transport driver can free the
memory for iSCSI hdr after transmitting the pdu so task->hdr can be NULL or
invalid.
This patch fixes this issue by removing task->hdr->opcode from the printk
statement.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
- rounddown CXGBIT_MAX_ISO_PAYLOAD by csk->emss before calculating
max_iso_npdu to get max TCP payload in multiple of mss.
- call cxgbit_set_digest() before cxgbit_set_iso_npdu() to set
csk->submode, it is used in calculating number of iso pdus.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
With commit 044e6e3d74: "ext4: don't update checksum of new
initialized bitmaps" the buffer valid bit will get set without
actually setting up the checksum for the allocation bitmap, since the
checksum will get calculated once we actually allocate an inode or
block.
If we are doing this, then we need to (re-)check the verified bit
after we take the block group lock. Otherwise, we could race with
another process reading and verifying the bitmap, which would then
complain about the checksum being invalid.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1780137
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
The HPLL can be configured through a register (SCU24), however some
platforms chose to configure it through the strapping settings and do
not use the register. This was not noticed as the logic for bit 18 in
SCU24 was confused: set means programmed, but the driver read it as set
means strapped.
This gives us the correct HPLL value on Palmetto systems, from which
most of the peripheral clocks are generated.
Fixes: 5eda5d79e4 ("clk: Add clock driver for ASPEED BMC SoCs")
Cc: stable@vger.kernel.org # v4.15
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Commit 52cdbdd498 (driver core: correct device's shutdown order)
introduced a regression by breaking device shutdown on some systems.
Namely, the devices_kset_move_last() call in really_probe() added by
that commit is a mistake as it may cause parents to follow children
in the devices_kset list which then causes shutdown to fail. For
example, if a device has children before really_probe() is called
for it (which is not uncommon), that call will cause it to be
reordered after the children in the devices_kset list and the
ordering of that list will not reflect the correct device shutdown
order any more.
Also it causes the devices_kset list to be constantly reordered
until all drivers have been probed which is totally pointless
overhead in the majority of cases and it only covered an issue
with system shutdown, while system-wide suspend/resume potentially
had the same issue on the affected platforms (which was not covered).
Moreover, the shutdown issue originally addressed by the change in
really_probe() made by commit 52cdbdd498 is not present in 4.18-rc
any more, since dra7 started to use the sdhci-omap driver which
doesn't disable any regulators during shutdown, so the really_probe()
part of commit 52cdbdd498 can be safely reverted. [The original
issue was related to the omap_hsmmc driver used by dra7 previously.]
For the above reasons, revert the really_probe() modifications made
by commit 52cdbdd498.
The other code changes made by commit 52cdbdd498 are useful and
they need not be reverted.
Fixes: 52cdbdd498 (driver core: correct device's shutdown order)
Link: https://lore.kernel.org/lkml/CAFgQCTt7VfqM=UyCnvNFxrSw8Z6cUtAi3HUwR4_xPAc03SgHjQ@mail.gmail.com/
Reported-by: Pingfan Liu <kernelfans@gmail.com>
Tested-by: Pingfan Liu <kernelfans@gmail.com>
Reviewed-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The inline data code was updating the raw inode directly; this is
problematic since if metadata checksums are enabled,
ext4_mark_inode_dirty() must be called to update the inode's checksum.
In addition, the jbd2 layer requires that get_write_access() be called
before the metadata buffer is modified. Fix both of these problems.
https://bugzilla.kernel.org/show_bug.cgi?id=200443
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Switching the CPU from the L2 or L3 frequencies (300 and 200 Mhz
respectively) to L0 frequency (1.2 Ghz) requires a significant amount
of time to let VDD stabilize to the appropriate voltage. This amount of
time is large enough that it cannot be covered by the hardware
countdown register. Due to this, the CPU might start operating at L0
before the voltage is stabilized, leading to CPU stalls.
To work around this problem, we prevent switching directly from the
L2/L3 frequencies to the L0 frequency, and instead switch to the L1
frequency in-between. The sequence therefore becomes:
1. First switch from L2/L3(200/300MHz) to L1(600MHZ)
2. Sleep 20ms for stabling VDD voltage
3. Then switch from L1(600MHZ) to L0(1200Mhz).
It is based on the work done by Ken Ma <make@marvell.com>
Cc: stable@vger.kernel.org
Fixes: 2089dc33ea ("clk: mvebu: armada-37xx-periph: add DVFS support for cpu clocks")
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
I was looking at usually suppressed gcc warnings,
[-Wimplicit-fallthrough=] in this case:
The code definitely looks like a break is missing here.
However I am not able to test the NL80211_IFTYPE_MESH_POINT,
nor do I actually know what might be :)
So please use this patch with caution and only if you are
able to do some testing.
Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
[johannes: looks obvious enough to apply as is, interesting
though that it never seems to have been a problem]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Previously, when an MMP-protected file system is remounted read-only,
the kmmpd thread would exit the next time it woke up (a few seconds
later), without resetting the MMP sequence number back to
EXT4_MMP_SEQ_CLEAN.
Fix this by explicitly killing the MMP thread when the file system is
remounted read-only.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Andreas Dilger <adilger@dilger.ca>
Ext4_check_descriptors() was getting called before s_gdb_count was
initialized. So for file systems w/o the meta_bg feature, allocation
bitmaps could overlap the block group descriptors and ext4 wouldn't
notice.
For file systems with the meta_bg feature enabled, there was a
fencepost error which would cause the ext4_check_descriptors() to
incorrectly believe that the block allocation bitmap overlaps with the
block group descriptor blocks, and it would reject the mount.
Fix both of these problems.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
This is used by the host to talk to the BMC's PCIe slave device. The BMC
is not involved, but the clock needs to be enabled so the host can use
the device.
Fixes: 15ed8ce5f8 ("clk: aspeed: Register gated clocks")
Cc: stable@vger.kernel.org # 4.15
Acked-by: Andrew Jeffery <andrew@aj.id.au>
Tested-by: Lei YU <mine260309@gmail.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Patch (7705bb7176 clk: qcom: mmcc-msm8996: leave all mmagic gdscs
and clocks always enabled") makes all mmgaic gdscs ALWAYS_ON.
The mmagic_bimc_gdsc is also needed to be turned on to get display
working on 8x96.
Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
Cc: Rajendra Nayak <rnayak@codeaurora.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Fixes: 7705bb7176 ("clk: qcom: mmcc-msm8996: leave all mmagic gdscs and clocks always enabled")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Pull Amlogic clk driver fixes from Jerome Brunet:
These are two simple fixes, yet the first one is quite important as it
solves boots hangs we've been having when FDIV2 gets disabled. This did
not show up before because this particular clock is heavily used and
only gets disabled for a very short period of time before modules (such
as ethernet or emmc) probe.
- fix boot issue with gxbb and gxl platforms
- fix racalculation error in the clk_audio_divider
* tag 'meson-clk-fixes-4.18-1' of https://github.com/BayLibre/clk-meson:
clk: meson: audio-divider is one based
clk: meson-gxbb: set fclk_div2 as CLK_IS_CRITICAL
On some systems, we come out of the bootloader with some
gates set with the clock "enabled" but the reset also
asserted.
Since 8a53fc511c "clk: aspeed: Prevent reset if clock is enabled"
we check that enabled bit in aspeed_clk_enabled(), and do
nothing if already set.
This breaks when the above scenario occurs, as the clock
is enabled, but the reset still needs to be lifted.
This patch fixes it by also checking the reset bit (if any)
and treating a gate in "reset" as being disabled.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fixes: 8a53fc511c "clk: aspeed: Prevent reset if clock is enabled"
Cc: Eddie James <eajames@linux.vnet.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
The last-minute fold-in of the ENTRY() macro did change behavior:
instead of printing the symbolic name (e.g. "CLK_IS_BASIC"), it prints
the expansion of it (e.g. "(1UL << (5))").
Use "#" instead of __stringify() to fix this.
Fixes: a6059ab981 ("clk: Show symbolic clock flags in debugfs")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
The current implementation of cfg80211_rx_control_port assumed that the
caller could provide a contiguous region of memory for the control port
frame to be sent up to userspace. Unfortunately, many drivers produce
non-linear skbs, especially for data frames. This resulted in userspace
getting notified of control port frames with correct metadata (from
address, port, etc) yet garbage / nonsense contents, resulting in bad
handshakes, disconnections, etc.
mac80211 linearizes skbs containing management frames. But it didn't
seem worthwhile to do this for control port frames. Thus the signature
of cfg80211_rx_control_port was changed to take the skb directly.
nl80211 then takes care of obtaining control port frame data directly
from the (linear | non-linear) skb.
The caller is still responsible for freeing the skb,
cfg80211_rx_control_port does not take ownership of it.
Fixes: 6a671a50f8 ("nl80211: Add CMD_CONTROL_PORT_FRAME API")
Signed-off-by: Denis Kenzior <denkenz@gmail.com>
[fix some kernel-doc formatting, add fixes tag]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit 03e6275ae3 ("usb: chipidea: Fix ULPI on imx51") causes a kernel
hang on imx51 systems that use the ULPI interface and do not select the
CONFIG_USB_CHIPIDEA_ULPI option.
In order to avoid such potential misuse, let's always build the
chipidea ULPI code into the final ci_hdrc object.
Tested on a imx51-babbage board.
Fixes: 03e6275ae3 ("usb: chipidea: Fix ULPI on imx51")
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Submitters of device tree binding documentation may forget to CC
the subsystem maintainer if this is missing.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
As part of hw reconfig, only stations linked to AP interfaces are added
back to the driver ignoring those which are tied to AP_VLAN interfaces.
It is true that there could be stations tied to the AP_VLAN interface while
serving 4addr clients or when using AP_VLAN for VLAN operations; we should
be adding these stations back to the driver as part of hw reconfig, failing
to do so can cause functional issues.
In the case of ath10k driver, the following errors were observed.
ath10k_pci : failed to install key for non-existent peer XX:XX:XX:XX:XX:XX
Workqueue: events_freezable ieee80211_restart_work [mac80211]
(unwind_backtrace) from (show_stack+0x10/0x14)
(show_stack) (dump_stack+0x80/0xa0)
(dump_stack) (warn_slowpath_common+0x68/0x8c)
(warn_slowpath_common) (warn_slowpath_null+0x18/0x20)
(warn_slowpath_null) (ieee80211_enable_keys+0x88/0x154 [mac80211])
(ieee80211_enable_keys) (ieee80211_reconfig+0xc90/0x19c8 [mac80211])
(ieee80211_reconfig]) (ieee80211_restart_work+0x8c/0xa0 [mac80211])
(ieee80211_restart_work) (process_one_work+0x284/0x488)
(process_one_work) (worker_thread+0x228/0x360)
(worker_thread) (kthread+0xd8/0xec)
(kthread) (ret_from_fork+0x14/0x24)
Also while bringing down the AP VAP, WARN_ONs and errors related to peer
removal were observed.
ath10k_pci : failed to clear all peer wep keys for vdev 0: -2
ath10k_pci : failed to disassociate station: 8c:fd:f0:0a:8c:f5 vdev 0: -2
(unwind_backtrace) (show_stack+0x10/0x14)
(show_stack) (dump_stack+0x80/0xa0)
(dump_stack) (warn_slowpath_common+0x68/0x8c)
(warn_slowpath_common) (warn_slowpath_null+0x18/0x20)
(warn_slowpath_null) (sta_set_sinfo+0xb98/0xc9c [mac80211])
(sta_set_sinfo [mac80211]) (__sta_info_flush+0xf0/0x134 [mac80211])
(__sta_info_flush [mac80211]) (ieee80211_stop_ap+0xe8/0x390 [mac80211])
(ieee80211_stop_ap [mac80211]) (__cfg80211_stop_ap+0xe0/0x3dc [cfg80211])
(__cfg80211_stop_ap [cfg80211]) (cfg80211_stop_ap+0x30/0x44 [cfg80211])
(cfg80211_stop_ap [cfg80211]) (genl_rcv_msg+0x274/0x30c)
(genl_rcv_msg) (netlink_rcv_skb+0x58/0xac)
(netlink_rcv_skb) (genl_rcv+0x20/0x34)
(genl_rcv) (netlink_unicast+0x11c/0x204)
(netlink_unicast) (netlink_sendmsg+0x30c/0x370)
(netlink_sendmsg) (sock_sendmsg+0x70/0x84)
(sock_sendmsg) (___sys_sendmsg.part.3+0x188/0x228)
(___sys_sendmsg.part.3) (__sys_sendmsg+0x4c/0x70)
(__sys_sendmsg) (ret_fast_syscall+0x0/0x44)
These issues got fixed by adding the stations which are
tied to AP_VLANs back to the driver.
Signed-off-by: Manikanta Pubbisetty <mpubbise@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Kbuilt test robot reported:
drivers/phy/motorola/phy-mapphone-mdm6600.c:188:16: warning: is used
uninitialized in this function [-Wuninitialized]
val |= values[i] << i;
~~~~~~^~~
Looking at the phy_mdm6600_status() values does get initialized by
gpiod_get_array_value_cansleep(), but we are using wrong enum
in that function. Let's fix the use, both end up being three though
so urgent rush on this one AFAIK.
Fixes: 5d1ebbda03 ("phy: mapphone-mdm6600: Add USB PHY driver for
MDM6600 on Droid 4")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Unset is required to enable USB 3.0 PHY when XHCI reenabled in response
to setting PHY3_IDDQ_OVERRIDE in uninit().
Fixes: cd6f769fde ("phy: phy-brcm-usb-init: Power down USB 3.0 PHY when XHCI disabled")
Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Make sure we initialize *bno and *len, before jumping to out_bad_rec
label, and risk calling xfs_warn() with uninitialized variables.
Coverity: 100898
Coverity: 1437081
Coverity: 1437129
Coverity: 1437191
Coverity: 1437201
Coverity: 1437212
Coverity: 1437341
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
This ought to be an omission in e619492323 ("esp: Fix memleaks on error
paths."). The memleak on error path in esp6_input is similar to esp_input
of esp4.
Fixes: e619492323 ("esp: Fix memleaks on error paths.")
Fixes: 3f29770723 ("ipsec: check return value of skb_to_sgvec always")
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Since commit 48231f289e ("media: rc: drivers should produce alternate
pulse and space timing events"), on meson-ir we are regularly producing
errors. Reduce to warning level and only warn once to avoid flooding
the log.
A proper fix for meson-ir is going to be too large for v4.18.
Signed-off-by: Sean Young <sean@mess.org>
Cc: stable@vger.kernel.org # 4.17+
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
nlmsg_multicast() always frees the skb, so in case we cannot call
it we must do that ourselves.
Fixes: 21ee543edc ("xfrm: fix race between netns cleanup and state expire notification")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Fix missing dst_release() when local broadcast or multicast traffic is
xfrm policy blocked.
For IPv4 this results to dst leak: ip_route_output_flow() allocates
dst_entry via __ip_route_output_key() and passes it to
xfrm_lookup_route(). xfrm_lookup returns ERR_PTR(-EPERM) that is
propagated. The dst that was allocated is never released.
IPv4 local broadcast testcase:
ping -b 192.168.1.255 &
sleep 1
ip xfrm policy add src 0.0.0.0/0 dst 192.168.1.255/32 dir out action block
IPv4 multicast testcase:
ping 224.0.0.1 &
sleep 1
ip xfrm policy add src 0.0.0.0/0 dst 224.0.0.1/32 dir out action block
For IPv6 the missing dst_release() causes trouble e.g. when used in netns:
ip netns add TEST
ip netns exec TEST ip link set lo up
ip link add dummy0 type dummy
ip link set dev dummy0 netns TEST
ip netns exec TEST ip addr add fd00::1111 dev dummy0
ip netns exec TEST ip link set dummy0 up
ip netns exec TEST ping -6 -c 5 ff02::1%dummy0 &
sleep 1
ip netns exec TEST ip xfrm policy add src ::/0 dst ff02::1 dir out action block
wait
ip netns del TEST
After netns deletion we see:
[ 258.239097] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 268.279061] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 278.367018] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 288.375259] unregister_netdevice: waiting for lo to become free. Usage count = 2
Fixes: ac37e2515c ("xfrm: release dst_orig in case of error in xfrm_lookup()")
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The audio divider is one based. This offset was mistakenly dropped from
recalc_rate() when migrating to clk_regmap.
Fixes: 88a4e12836 ("clk: meson: migrate the audio divider clock to clk_regmap")
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
On Amlogic Meson GXBB & GXL platforms, the SCPI Cortex-M4 Co-Processor
seems to be dependent on the FCLK_DIV2 to be operationnal.
The issue occurred since v4.17-rc1 by freezing the kernel boot when
the 'schedutil' cpufreq governor was selected as default :
[ 12.071837] scpi_protocol scpi: SCP Protocol 0.0 Firmware 0.0.0 version
domain-0 init dvfs: 4
[ 12.087757] hctosys: unable to open rtc device (rtc0)
[ 12.087907] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[ 12.102241] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
But when disabling the MMC driver, the boot finished but cpufreq failed to
change the CPU frequency :
[ 12.153045] cpufreq: __target_index: Failed to change cpu frequency: -5
A bisect between v4.16 and v4.16-rc1 gave
05f814402d ("clk: meson: add fdiv clock gates") to be the first bad commit.
This commit added support for the missing clock gates before the fixed PLL
fixed dividers (FCLK_DIVx) and the clock framework basically disabled
all the unused fixed dividers, thus disabled a critical clock path for
the SCPI Co-Processor.
This patch simply sets the FCLK_DIV2 gate as critical to ensure
nobody can disable it.
Fixes: 05f814402d ("clk: meson: add fdiv clock gates")
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
[few corrections in the commit description]
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
When setting the skb->dst before doing the MTU check, the route PMTU
caching and reporting is done on the new dst which is about to be
released.
Instead, PMTU handling should be done using the original dst.
This is aligned with IPv4 VTI.
Fixes: ccd740cbc6 ("vti6: Add pmtu handling to vti6_xmit.")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-06-07 09:31:42 +02:00
441 changed files with 4194 additions and 2225 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.