Age | Commit message (Collapse) | Author |
|
The vti6_rcv function performs some tests on the retrieved tunnel
including checking the IP protocol, the XFRM input policy, the
source and destination address.
In all but one places the skb is released in the error case. When
the input policy check fails the network packet is leaked.
Using the same goto-label discard in this case to fix this problem.
Fixes: ed1efb2aefbb ("ipv6: Add support for IPsec virtual tunnel interfaces")
Signed-off-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Xiumei found a panic in esp offload:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
RIP: 0010:esp_output_done+0x101/0x160 [esp4]
Call Trace:
? esp_output+0x180/0x180 [esp4]
cryptd_aead_crypt+0x4c/0x90
cryptd_queue_worker+0x6e/0xa0
process_one_work+0x1a7/0x3b0
worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
kthread+0x112/0x130
? kthread_flush_work_fn+0x10/0x10
ret_from_fork+0x35/0x40
It was caused by that skb secpath is used in esp_output_done() after it's
been released elsewhere.
The tx path for esp offload is:
__dev_queue_xmit()->
validate_xmit_skb_list()->
validate_xmit_xfrm()->
esp_xmit()->
esp_output_tail()->
aead_request_set_callback(esp_output_done) <--[1]
crypto_aead_encrypt() <--[2]
In [1], .callback is set, and in [2] it will trigger the worker schedule,
later on a kernel thread will call .callback(esp_output_done), as the call
trace shows.
But in validate_xmit_xfrm():
skb_list_walk_safe(skb, skb2, nskb) {
...
err = x->type_offload->xmit(x, skb2, esp_features); [esp_xmit]
...
}
When the err is -EINPROGRESS, which means this skb2 will be enqueued and
later gets encrypted and sent out by .callback later in a kernel thread,
skb2 should be removed fromt skb chain. Otherwise, it will get processed
again outside validate_xmit_xfrm(), which could release skb secpath, and
cause the panic above.
This patch is to remove the skb from the chain when it's enqueued in
cryptd_wq. While at it, remove the unnecessary 'if (!skb)' check.
Fixes: 3dca3f38cfb8 ("xfrm: Separate ESP handling from segmentation for GRO packets.")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
hlist_for_each_entry_rcu() has built-in RCU and lock checking.
Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.
Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Without doing verify_sec_ctx_len() check in xfrm_add_acquire(), it may be
out-of-bounds to access uctx->ctx_str with uctx->ctx_len, as noticed by
syz:
BUG: KASAN: slab-out-of-bounds in selinux_xfrm_alloc_user+0x237/0x430
Read of size 768 at addr ffff8880123be9b4 by task syz-executor.1/11650
Call Trace:
dump_stack+0xe8/0x16e
print_address_description.cold.3+0x9/0x23b
kasan_report.cold.4+0x64/0x95
memcpy+0x1f/0x50
selinux_xfrm_alloc_user+0x237/0x430
security_xfrm_policy_alloc+0x5c/0xb0
xfrm_policy_construct+0x2b1/0x650
xfrm_add_acquire+0x21d/0xa10
xfrm_user_rcv_msg+0x431/0x6f0
netlink_rcv_skb+0x15a/0x410
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x50e/0x6a0
netlink_sendmsg+0x8ae/0xd40
sock_sendmsg+0x133/0x170
___sys_sendmsg+0x834/0x9a0
__sys_sendmsg+0x100/0x1e0
do_syscall_64+0xe5/0x660
entry_SYSCALL_64_after_hwframe+0x6a/0xdf
So fix it by adding the missing verify_sec_ctx_len check there.
Fixes: 980ebd25794f ("[IPSEC]: Sync series - acquire insert")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
It's not sufficient to do 'uctx->len != (sizeof(struct xfrm_user_sec_ctx) +
uctx->ctx_len)' check only, as uctx->len may be greater than nla_len(rt),
in which case it will cause slab-out-of-bounds when accessing uctx->ctx_str
later.
This patch is to fix it by return -EINVAL when uctx->len > nla_len(rt).
Fixes: df71837d5024 ("[LSM-IPSec]: Security association restriction.")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
I forgot the 4in6/6in4 cases in my previous patch. Let's fix them.
Fixes: 95224166a903 ("vti[6]: fix packet tx through bpf_redirect()")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
This patch to handle the asynchronous unregister
device event so the device IPsec offload resources
could be cleanly released.
Fixes: e4db5b61c572 ("xfrm: policy: remove pcpu policy cache")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Taehee Yoo says:
=====================
netdevsim: fix several bugs in netdevsim module
This patchset fixes several bugs in netdevsim module.
1. The first patch fixes using uninitialized resources
This patch fixes two similar problems, which is to use uninitialized
resources.
a) In the current code, {new/del}_device_store() use resource,
they are initialized by __init().
But, these functions could be called before __init() is finished.
So, accessing uninitialized data could occur and it eventually makes panic.
b) In the current code, {new/del}_port_store() uses resource,
they are initialized by new_device_store().
But thes functions could be called before new_device_store() is finished.
2. The second patch fixes another race condition.
The main problem is a race condition in {new/del}_port() and devlink reload
function.
These functions would allocate and remove resources. So these functions
should not be executed concurrently.
3. The third patch fixes a panic in nsim_dev_take_snapshot_write().
nsim_dev_take_snapshot_write() uses nsim_dev and nsim_dev->dummy_region.
But these data could be removed by both reload routine and
del_device_store(). And these functions could be executed concurrently.
4. The fourth patch fixes stack-out-of-bound in nsim_dev_debugfs_init().
nsim_dev_debugfs_init() provides only 16bytes for name pointer.
But, there are some case the name length is over 16bytes.
So, stack-out-of-bound occurs.
5. The fifth patch uses IS_ERR instead of IS_ERR_OR_NULL.
debugfs_create_{dir/file} doesn't return NULL.
So, IS_ERR() is more correct.
6. The sixth patch avoids kmalloc warning.
When too large memory allocation is requested by user-space, kmalloc
internally prints warning message.
That warning message is not necessary.
In order to avoid that, it adds __GFP_NOWARN.
7. The last patch removes an unused sdev.c file
Change log:
v2 -> v3:
- Use smp_load_acquire() and smp_store_release() for flag variables.
- Change variable names.
- Fix deadlock in second patch.
- Update lock variable comment.
- Add new patch for fixing panic in snapshot_write().
- Include Reviewed-by tags.
- Update some log messages and comment.
v1 -> v2:
- Splits a fixing race condition patch into two patches.
- Fix incorrect Fixes tags.
- Update comments
- Fix use-after-free
- Add a new patch, which removes an unused sdev.c file.
- Remove a patch, which tries to avoid debugfs warning.
=====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
sdev.c code is merged into dev.c and is not used anymore.
it would be removed.
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
vfnum buffer size and binary_len buffer size is received by user-space.
So, this buffer size could be too large. If so, kmalloc will internally
print a warning message.
This warning message is actually not necessary for the netdevsim module.
So, this patch adds __GFP_NOWARN.
Test commands:
modprobe netdevsim
echo 1 > /sys/bus/netdevsim/new_device
echo 1000000000 > /sys/devices/netdevsim1/sriov_numvfs
Splat looks like:
[ 357.847266][ T1000] WARNING: CPU: 0 PID: 1000 at mm/page_alloc.c:4738 __alloc_pages_nodemask+0x2f3/0x740
[ 357.850273][ T1000] Modules linked in: netdevsim veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrx
[ 357.852989][ T1000] CPU: 0 PID: 1000 Comm: bash Tainted: G B 5.5.0-rc5+ #270
[ 357.854334][ T1000] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 357.855703][ T1000] RIP: 0010:__alloc_pages_nodemask+0x2f3/0x740
[ 357.856669][ T1000] Code: 64 fe ff ff 65 48 8b 04 25 c0 0f 02 00 48 05 f0 12 00 00 41 be 01 00 00 00 49 89 47 0
[ 357.860272][ T1000] RSP: 0018:ffff8880b7f47bd8 EFLAGS: 00010246
[ 357.861009][ T1000] RAX: ffffed1016fe8f80 RBX: 1ffff11016fe8fae RCX: 0000000000000000
[ 357.861843][ T1000] RDX: 0000000000000000 RSI: 0000000000000017 RDI: 0000000000000000
[ 357.862661][ T1000] RBP: 0000000000040dc0 R08: 1ffff11016fe8f67 R09: dffffc0000000000
[ 357.863509][ T1000] R10: ffff8880b7f47d68 R11: fffffbfff2798180 R12: 1ffff11016fe8f80
[ 357.864355][ T1000] R13: 0000000000000017 R14: 0000000000000017 R15: ffff8880c2038d68
[ 357.865178][ T1000] FS: 00007fd9a5b8c740(0000) GS:ffff8880d9c00000(0000) knlGS:0000000000000000
[ 357.866248][ T1000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 357.867531][ T1000] CR2: 000055ce01ba8100 CR3: 00000000b7dbe005 CR4: 00000000000606f0
[ 357.868972][ T1000] Call Trace:
[ 357.869423][ T1000] ? lock_contended+0xcd0/0xcd0
[ 357.870001][ T1000] ? __alloc_pages_slowpath+0x21d0/0x21d0
[ 357.870673][ T1000] ? _kstrtoull+0x76/0x160
[ 357.871148][ T1000] ? alloc_pages_current+0xc1/0x1a0
[ 357.871704][ T1000] kmalloc_order+0x22/0x80
[ 357.872184][ T1000] kmalloc_order_trace+0x1d/0x140
[ 357.872733][ T1000] __kmalloc+0x302/0x3a0
[ 357.873204][ T1000] nsim_bus_dev_numvfs_store+0x1ab/0x260 [netdevsim]
[ 357.873919][ T1000] ? kernfs_get_active+0x12c/0x180
[ 357.874459][ T1000] ? new_device_store+0x450/0x450 [netdevsim]
[ 357.875111][ T1000] ? kernfs_get_parent+0x70/0x70
[ 357.875632][ T1000] ? sysfs_file_ops+0x160/0x160
[ 357.876152][ T1000] kernfs_fop_write+0x276/0x410
[ 357.876680][ T1000] ? __sb_start_write+0x1ba/0x2e0
[ 357.877225][ T1000] vfs_write+0x197/0x4a0
[ 357.877671][ T1000] ksys_write+0x141/0x1d0
[ ... ]
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Fixes: 79579220566c ("netdevsim: add SR-IOV functionality")
Fixes: 82c93a87bf8b ("netdevsim: implement couple of testing devlink health reporters")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Debugfs APIs return valid pointer or error pointer. it doesn't return NULL.
So, using IS_ERR is enough, not using IS_ERR_OR_NULL.
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When netdevsim dev is being created, a debugfs directory is created.
The variable "dev_ddir_name" is 16bytes device name pointer and device
name is "netdevsim<dev id>".
The maximum dev id length is 10.
So, 16bytes for device name isn't enough.
Test commands:
modprobe netdevsim
echo "1000000000 0" > /sys/bus/netdevsim/new_device
Splat looks like:
[ 249.622710][ T900] BUG: KASAN: stack-out-of-bounds in number+0x824/0x880
[ 249.623658][ T900] Write of size 1 at addr ffff88804c527988 by task bash/900
[ 249.624521][ T900]
[ 249.624830][ T900] CPU: 1 PID: 900 Comm: bash Not tainted 5.5.0+ #322
[ 249.625691][ T900] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 249.626712][ T900] Call Trace:
[ 249.627103][ T900] dump_stack+0x96/0xdb
[ 249.627639][ T900] ? number+0x824/0x880
[ 249.628173][ T900] print_address_description.constprop.5+0x1be/0x360
[ 249.629022][ T900] ? number+0x824/0x880
[ 249.629569][ T900] ? number+0x824/0x880
[ 249.630105][ T900] __kasan_report+0x12a/0x170
[ 249.630717][ T900] ? number+0x824/0x880
[ 249.631201][ T900] kasan_report+0xe/0x20
[ 249.631723][ T900] number+0x824/0x880
[ 249.632235][ T900] ? put_dec+0xa0/0xa0
[ 249.632716][ T900] ? rcu_read_lock_sched_held+0x90/0xc0
[ 249.633392][ T900] vsnprintf+0x63c/0x10b0
[ 249.633983][ T900] ? pointer+0x5b0/0x5b0
[ 249.634543][ T900] ? mark_lock+0x11d/0xc40
[ 249.635200][ T900] sprintf+0x9b/0xd0
[ 249.635750][ T900] ? scnprintf+0xe0/0xe0
[ 249.636370][ T900] nsim_dev_probe+0x63c/0xbf0 [netdevsim]
[ ... ]
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Fixes: ab1d0cc004d7 ("netdevsim: change debugfs tree topology")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
nsim_dev_take_snapshot_write() uses nsim_dev and nsim_dev->dummy_region.
So, during this function, these data shouldn't be removed.
But there is no protecting stuff in this function.
There are two similar cases.
1. reload case
reload could be called during nsim_dev_take_snapshot_write().
When reload is being executed, nsim_dev_reload_down() is called and it
calls nsim_dev_reload_destroy(). nsim_dev_reload_destroy() calls
devlink_region_destroy() to destroy nsim_dev->dummy_region.
So, during nsim_dev_take_snapshot_write(), nsim_dev->dummy_region()
would be removed.
At this point, snapshot_write() would access freed pointer.
In order to fix this case, take_snapshot file will be removed before
devlink_region_destroy().
The take_snapshot file will be re-created by ->reload_up().
2. del_device_store case
del_device_store() also could call nsim_dev_reload_destroy()
during nsim_dev_take_snapshot_write(). If so, panic would occur.
This problem is actually the same problem with the first case.
So, this problem will be fixed by the first case's solution.
Test commands:
modprobe netdevsim
while :
do
echo 1 > /sys/bus/netdevsim/new_device &
echo 1 > /sys/bus/netdevsim/del_device &
devlink dev reload netdevsim/netdevsim1 &
echo 1 > /sys/kernel/debug/netdevsim/netdevsim1/take_snapshot &
done
Splat looks like:
[ 45.564513][ T975] general protection fault, probably for non-canonical address 0xdffffc000000003a: 0000 [#1] SMP DEI
[ 45.566131][ T975] KASAN: null-ptr-deref in range [0x00000000000001d0-0x00000000000001d7]
[ 45.566135][ T975] CPU: 1 PID: 975 Comm: bash Not tainted 5.5.0+ #322
[ 45.569020][ T975] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 45.569026][ T975] RIP: 0010:__mutex_lock+0x10a/0x14b0
[ 45.570518][ T975] Code: 08 84 d2 0f 85 7f 12 00 00 44 8b 0d 10 23 65 02 45 85 c9 75 29 49 8d 7f 68 48 b8 00 00 00 0f
[ 45.570522][ T975] RSP: 0018:ffff888046ccfbf0 EFLAGS: 00010206
[ 45.572305][ T975] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 45.572308][ T975] RDX: 000000000000003a RSI: ffffffffac926440 RDI: 00000000000001d0
[ 45.576843][ T975] RBP: ffff888046ccfd70 R08: ffffffffab610645 R09: 0000000000000000
[ 45.576847][ T975] R10: ffff888046ccfd90 R11: ffffed100d6360ad R12: 0000000000000000
[ 45.578471][ T975] R13: dffffc0000000000 R14: ffffffffae1976c0 R15: 0000000000000168
[ 45.578475][ T975] FS: 00007f614d6e7740(0000) GS:ffff88806c400000(0000) knlGS:0000000000000000
[ 45.581492][ T975] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.582942][ T975] CR2: 00005618677d1cf0 CR3: 000000005fb9c002 CR4: 00000000000606e0
[ 45.584543][ T975] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 45.586633][ T975] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 45.589889][ T975] Call Trace:
[ 45.591445][ T975] ? devlink_region_snapshot_create+0x55/0x4a0
[ 45.601250][ T975] ? mutex_lock_io_nested+0x1380/0x1380
[ 45.602817][ T975] ? mutex_lock_io_nested+0x1380/0x1380
[ 45.603875][ T975] ? mark_held_locks+0xa5/0xe0
[ 45.604769][ T975] ? _raw_spin_unlock_irqrestore+0x2d/0x50
[ 45.606147][ T975] ? __mutex_unlock_slowpath+0xd0/0x670
[ 45.607723][ T975] ? crng_backtrack_protect+0x80/0x80
[ 45.613530][ T975] ? wait_for_completion+0x390/0x390
[ 45.615152][ T975] ? devlink_region_snapshot_create+0x55/0x4a0
[ 45.616834][ T975] devlink_region_snapshot_create+0x55/0x4a0
[ ... ]
Fixes: 4418f862d675 ("netdevsim: implement support for devlink region and snapshots")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
devlink reload destroys resources and allocates resources again.
So, when devices and ports resources are being used, devlink reload
function should not be executed. In order to avoid this race, a new
lock is added and new_port() and del_port() call devlink_reload_disable()
and devlink_reload_enable().
Thread0 Thread1
{new/del}_port() {new/del}_port()
devlink_reload_disable()
devlink_reload_disable()
devlink_reload_enable()
//here
devlink_reload_enable()
Before Thread1's devlink_reload_enable(), the devlink is already allowed
to execute reload because Thread0 allows it. devlink reload disable/enable
variable type is bool. So the above case would exist.
So, disable/enable should be executed atomically.
In order to do that, a new lock is used.
Test commands:
modprobe netdevsim
echo 1 > /sys/bus/netdevsim/new_device
while :
do
echo 1 > /sys/devices/netdevsim1/new_port &
echo 1 > /sys/devices/netdevsim1/del_port &
devlink dev reload netdevsim/netdevsim1 &
done
Splat looks like:
[ 23.342145][ T932] DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock))
[ 23.342159][ T932] WARNING: CPU: 0 PID: 932 at kernel/locking/mutex-debug.c:103 mutex_destroy+0xc7/0xf0
[ 23.344182][ T932] Modules linked in: netdevsim openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_dx
[ 23.346485][ T932] CPU: 0 PID: 932 Comm: devlink Not tainted 5.5.0+ #322
[ 23.347696][ T932] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 23.348893][ T932] RIP: 0010:mutex_destroy+0xc7/0xf0
[ 23.349505][ T932] Code: e0 07 83 c0 03 38 d0 7c 04 84 d2 75 2e 8b 05 00 ac b0 02 85 c0 75 8b 48 c7 c6 00 5e 07 96 40
[ 23.351887][ T932] RSP: 0018:ffff88806208f810 EFLAGS: 00010286
[ 23.353963][ T932] RAX: dffffc0000000008 RBX: ffff888067f6f2c0 RCX: ffffffff942c4bd4
[ 23.355222][ T932] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff96dac5b4
[ 23.356169][ T932] RBP: ffff888067f6f000 R08: fffffbfff2d235a5 R09: fffffbfff2d235a5
[ 23.357160][ T932] R10: 0000000000000001 R11: fffffbfff2d235a4 R12: ffff888067f6f208
[ 23.358288][ T932] R13: ffff88806208fa70 R14: ffff888067f6f000 R15: ffff888069ce3800
[ 23.359307][ T932] FS: 00007fe2a3876740(0000) GS:ffff88806c000000(0000) knlGS:0000000000000000
[ 23.360473][ T932] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 23.361319][ T932] CR2: 00005561357aa000 CR3: 000000005227a006 CR4: 00000000000606f0
[ 23.362323][ T932] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 23.363417][ T932] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 23.364414][ T932] Call Trace:
[ 23.364828][ T932] nsim_dev_reload_destroy+0x77/0xb0 [netdevsim]
[ 23.365655][ T932] nsim_dev_reload_down+0x84/0xb0 [netdevsim]
[ 23.366433][ T932] devlink_reload+0xb1/0x350
[ 23.367010][ T932] genl_rcv_msg+0x580/0xe90
[ ...]
[ 23.531729][ T1305] kernel BUG at lib/list_debug.c:53!
[ 23.532523][ T1305] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 23.533467][ T1305] CPU: 2 PID: 1305 Comm: bash Tainted: G W 5.5.0+ #322
[ 23.534962][ T1305] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 23.536503][ T1305] RIP: 0010:__list_del_entry_valid+0xe6/0x150
[ 23.538346][ T1305] Code: 89 ea 48 c7 c7 00 73 1e 96 e8 df f7 4c ff 0f 0b 48 c7 c7 60 73 1e 96 e8 d1 f7 4c ff 0f 0b 44
[ 23.541068][ T1305] RSP: 0018:ffff888047c27b58 EFLAGS: 00010282
[ 23.542001][ T1305] RAX: 0000000000000054 RBX: ffff888067f6f318 RCX: 0000000000000000
[ 23.543051][ T1305] RDX: 0000000000000054 RSI: 0000000000000008 RDI: ffffed1008f84f61
[ 23.544072][ T1305] RBP: ffff88804aa0fca0 R08: ffffed100d940539 R09: ffffed100d940539
[ 23.545085][ T1305] R10: 0000000000000001 R11: ffffed100d940538 R12: ffff888047c27cb0
[ 23.546422][ T1305] R13: ffff88806208b840 R14: ffffffff981976c0 R15: ffff888067f6f2c0
[ 23.547406][ T1305] FS: 00007f76c0431740(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000
[ 23.548527][ T1305] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 23.549389][ T1305] CR2: 00007f5048f1a2f8 CR3: 000000004b310006 CR4: 00000000000606e0
[ 23.550636][ T1305] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 23.551578][ T1305] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 23.552597][ T1305] Call Trace:
[ 23.553004][ T1305] mutex_remove_waiter+0x101/0x520
[ 23.553646][ T1305] __mutex_lock+0xac7/0x14b0
[ 23.554218][ T1305] ? nsim_dev_port_del+0x4e/0x140 [netdevsim]
[ 23.554908][ T1305] ? mutex_lock_io_nested+0x1380/0x1380
[ 23.555570][ T1305] ? _parse_integer+0xf0/0xf0
[ 23.556043][ T1305] ? kstrtouint+0x86/0x110
[ 23.556504][ T1305] ? nsim_dev_port_del+0x4e/0x140 [netdevsim]
[ 23.557133][ T1305] nsim_dev_port_del+0x4e/0x140 [netdevsim]
[ 23.558024][ T1305] del_port_store+0xcc/0xf0 [netdevsim]
[ ... ]
Fixes: 75ba029f3c07 ("netdevsim: implement proper devlink reload")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When module is being initialized, __init() calls bus_register() and
driver_register().
These functions internally create various resources and sysfs files.
The sysfs files are used for basic operations(add/del device).
/sys/bus/netdevsim/new_device
/sys/bus/netdevsim/del_device
These sysfs files use netdevsim resources, they are mostly allocated
and initialized in ->probe() function, which is nsim_dev_probe().
But, sysfs files could be executed before ->probe() is finished.
So, accessing uninitialized data would occur.
Another problem is very similar.
/sys/bus/netdevsim/new_device internally creates sysfs files.
/sys/devices/netdevsim<id>/new_port
/sys/devices/netdevsim<id>/del_port
These sysfs files also use netdevsim resources, they are mostly allocated
and initialized in creating device routine, which is nsim_bus_dev_new().
But they also could be executed before nsim_bus_dev_new() is finished.
So, accessing uninitialized data would occur.
To fix these problems, this patch adds flags, which means whether the
operation is finished or not.
The flag variable 'nsim_bus_enable' means whether netdevsim bus was
initialized or not.
This is protected by nsim_bus_dev_list_lock.
The flag variable 'nsim_bus_dev->init' means whether nsim_bus_dev was
initialized or not.
This could be used in {new/del}_port_store() with no lock.
Test commands:
#SHELL1
modprobe netdevsim
while :
do
echo "1 1" > /sys/bus/netdevsim/new_device
echo "1 1" > /sys/bus/netdevsim/del_device
done
#SHELL2
while :
do
echo 1 > /sys/devices/netdevsim1/new_port
echo 1 > /sys/devices/netdevsim1/del_port
done
Splat looks like:
[ 47.508954][ T1008] general protection fault, probably for non-canonical address 0xdffffc0000000021: 0000 I
[ 47.510793][ T1008] KASAN: null-ptr-deref in range [0x0000000000000108-0x000000000000010f]
[ 47.511963][ T1008] CPU: 2 PID: 1008 Comm: bash Not tainted 5.5.0+ #322
[ 47.512823][ T1008] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 47.514041][ T1008] RIP: 0010:__mutex_lock+0x10a/0x14b0
[ 47.514699][ T1008] Code: 08 84 d2 0f 85 7f 12 00 00 44 8b 0d 10 23 65 02 45 85 c9 75 29 49 8d 7f 68 48 b8 00 00 00 0f
[ 47.517163][ T1008] RSP: 0018:ffff888059b4fbb0 EFLAGS: 00010206
[ 47.517802][ T1008] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 47.518941][ T1008] RDX: 0000000000000021 RSI: ffffffff85926440 RDI: 0000000000000108
[ 47.519732][ T1008] RBP: ffff888059b4fd30 R08: ffffffffc073fad0 R09: 0000000000000000
[ 47.520729][ T1008] R10: ffff888059b4fd50 R11: ffff88804bb38040 R12: 0000000000000000
[ 47.521702][ T1008] R13: dffffc0000000000 R14: ffffffff871976c0 R15: 00000000000000a0
[ 47.522760][ T1008] FS: 00007fd4be05a740(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000
[ 47.523877][ T1008] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 47.524627][ T1008] CR2: 0000561c82b69cf0 CR3: 0000000065dd6004 CR4: 00000000000606e0
[ 47.527662][ T1008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 47.528604][ T1008] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 47.529531][ T1008] Call Trace:
[ 47.529874][ T1008] ? nsim_dev_port_add+0x50/0x150 [netdevsim]
[ 47.530470][ T1008] ? mutex_lock_io_nested+0x1380/0x1380
[ 47.531018][ T1008] ? _kstrtoull+0x76/0x160
[ 47.531449][ T1008] ? _parse_integer+0xf0/0xf0
[ 47.531874][ T1008] ? kernfs_fop_write+0x1cf/0x410
[ 47.532330][ T1008] ? sysfs_file_ops+0x160/0x160
[ 47.532773][ T1008] ? kstrtouint+0x86/0x110
[ 47.533168][ T1008] ? nsim_dev_port_add+0x50/0x150 [netdevsim]
[ 47.533721][ T1008] nsim_dev_port_add+0x50/0x150 [netdevsim]
[ 47.534336][ T1008] ? sysfs_file_ops+0x160/0x160
[ 47.534858][ T1008] new_port_store+0x99/0xb0 [netdevsim]
[ 47.535439][ T1008] ? del_port_store+0xb0/0xb0 [netdevsim]
[ 47.536035][ T1008] ? sysfs_file_ops+0x112/0x160
[ 47.536544][ T1008] ? sysfs_kf_write+0x3b/0x180
[ 47.537029][ T1008] kernfs_fop_write+0x276/0x410
[ 47.537548][ T1008] ? __sb_start_write+0x215/0x2e0
[ 47.538110][ T1008] vfs_write+0x197/0x4a0
[ ... ]
Fixes: f9d9db47d3ba ("netdevsim: add bus attributes to add new and delete devices")
Fixes: 794b2c05ca1c ("netdevsim: extend device attrs to support port addition and deletion")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Michael Chan says:
=====================
bnxt_en: Bug fixes
3 patches that fix some issues in the firmware reset logic, starting
with a small patch to refactor the code that re-enables SRIOV. The
last patch fixes a TC queue mapping issue.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The driver currently only calls netdev_set_tc_queue when the number of
TCs is greater than 1. Instead, the comparison should be greater than
or equal to 1. Even with 1 TC, we need to set the queue mapping.
This bug can cause warnings when the number of TCs is changed back to 1.
Fixes: 7809592d3e2e ("bnxt_en: Enable MSIX early in bnxt_init_one().")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current logic that calls pci_disable_device() in __bnxt_close_nic()
during firmware reset is flawed. If firmware is still alive, we're
disabling the device too early, causing some firmware commands to
not reach the firmware.
Fix it by moving the logic to bnxt_reset_close(). If firmware is
in fatal condition, we call pci_disable_device() before we free
any of the rings to prevent DMA corruption of the freed rings. If
firmware is still alive, we call pci_disable_device() after the
last firmware message has been sent.
Fixes: 3bc7d4a352ef ("bnxt_en: Add BNXT_STATE_IN_FW_RESET state.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
bnxt_ulp_start() needs to be called before SRIOV is re-enabled after
firmware reset. Re-enabling SRIOV may consume all the resources and
may cause the RDMA driver to fail to get MSIX and other resources.
Fix it by calling bnxt_ulp_start() first before calling
bnxt_reenable_sriov().
We re-arrange the logic so that we call bnxt_ulp_start() and
bnxt_reenable_sriov() in proper sequence in bnxt_fw_reset_task() and
bnxt_open(). The former is the normal coordinated firmware reset sequence
and the latter is firmware reset while the function is down. This new
logic is now more straight forward and will now fix both scenarios.
Fixes: f3a6d206c25a ("bnxt_en: Call bnxt_ulp_stop()/bnxt_ulp_start() during error recovery.")
Reported-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Put the current logic in bnxt_open() to re-enable SRIOV after detecting
firmware reset into a new function bnxt_reenable_sriov(). This call
needs to be invoked in the firmware reset path also in the next patch.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When running v5.5 with a rootfs on NFS, memory abort may happen in
the system resume stage:
Unable to handle kernel paging request at virtual address dead00000000012a
[dead00000000012a] address between user and kernel address ranges
pc : run_timer_softirq+0x334/0x3d8
lr : run_timer_softirq+0x244/0x3d8
x1 : ffff800011cafe80 x0 : dead000000000122
Call trace:
run_timer_softirq+0x334/0x3d8
efi_header_end+0x114/0x234
irq_exit+0xd0/0xd8
__handle_domain_irq+0x60/0xb0
gic_handle_irq+0x58/0xa8
el1_irq+0xb8/0x180
arch_cpu_idle+0x10/0x18
do_idle+0x1d8/0x2b0
cpu_startup_entry+0x24/0x40
secondary_start_kernel+0x1b4/0x208
Code: f9000693 a9400660 f9000020 b4000040 (f9000401)
---[ end trace bb83ceeb4c482071 ]---
Kernel panic - not syncing: Fatal exception in interrupt
SMP: stopping secondary CPUs
SMP: failed to stop secondary CPUs 2-3
Kernel Offset: disabled
CPU features: 0x00002,2300aa30
Memory Limit: none
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
It's found that stmmac_xmit() and stmmac_resume() sometimes might
run concurrently, possibly resulting in a race condition between
mod_timer() and setup_timer(), being called by stmmac_xmit() and
stmmac_resume() respectively.
Since the resume() runs setup_timer() every time, it'd be safer to
have del_timer_sync() in the suspend() as the counterpart.
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
RxRPC fixes
Here are a number of fixes for AF_RXRPC:
(1) Fix a potential use after free in rxrpc_put_local() where it was
accessing the object just put to get tracing information.
(2) Fix insufficient notifications being generated by the function that
queues data packets on a call. This occasionally causes recvmsg() to
stall indefinitely.
(3) Fix a number of packet-transmitting work functions to hold an active
count on the local endpoint so that the UDP socket doesn't get
destroyed whilst they're calling kernel_sendmsg() on it.
(4) Fix a NULL pointer deref that stemmed from a call's connection pointer
being cleared when the call was disconnected.
Changes:
v2: Removed a couple of BUG() statements that got added.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When a call is disconnected, the connection pointer from the call is
cleared to make sure it isn't used again and to prevent further attempted
transmission for the call. Unfortunately, there might be a daemon trying
to use it at the same time to transmit a packet.
Fix this by keeping call->conn set, but setting a flag on the call to
indicate disconnection instead.
Remove also the bits in the transmission functions where the conn pointer is
checked and a ref taken under spinlock as this is now redundant.
Fixes: 8d94aa381dab ("rxrpc: Calls shouldn't hold socket refs")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
SeongJae Park says:
====================
Fix reconnection latency caused by FIN/ACK handling race
The first patch fixes the problem by adjusting the first resend delay of
the SYN in the case. The second one adds a user space test to reproduce
this problem.
From v2
(https://lore.kernel.org/linux-kselftest/20200201071859.4231-1-sj38.park@gmail.com/)
- Use TCP_TIMEOUT_MIN as reduced delay (Neal Cardwall)
- Add Reviewed-by and Signed-off-by from Eric Dumazet
From v1
(https://lore.kernel.org/linux-kselftest/20200131122421.23286-1-sjpark@amazon.com/)
- Drop the trivial comment fix patch (Eric Dumazet)
- Limit the delay adjustment to only the first SYN resend (Eric Dumazet)
- selftest: Avoid use of hard-coded port number (Eric Dumazet)
- Explain RST/ACK and FIN/ACK has no big difference (Neal Cardwell)
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This commit adds a test for FIN_ACK process races related reconnection
latency spike issues. The issue has described and solved by the
previous commit ("tcp: Reduce SYN resend delay if a suspicous ACK is
received").
The test program is configured with a server and a client process. The
server creates and binds a socket to a port that dynamically allocated,
listen on it, and start a infinite loop. Inside the loop, it accepts
connection, reads 4 bytes from the socket, and closes the connection.
The client is constructed as an infinite loop. Inside the loop, it
creates a socket with LINGER and NODELAY option, connect to the server,
send 4 bytes data, try read some data from server. After the read()
returns, it measure the latency from the beginning of this loop to this
point and if the latency is larger than 1 second (spike), print a
message.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When closing a connection, the two acks that required to change closing
socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
reverse order. This is possible in RSS disabled environments such as a
connection inside a host.
For example, expected state transitions and required packets for the
disconnection will be similar to below flow.
00 (Process A) (Process B)
01 ESTABLISHED ESTABLISHED
02 close()
03 FIN_WAIT_1
04 ---FIN-->
05 CLOSE_WAIT
06 <--ACK---
07 FIN_WAIT_2
08 <--FIN/ACK---
09 TIME_WAIT
10 ---ACK-->
11 LAST_ACK
12 CLOSED CLOSED
In some cases such as LINGER option applied socket, the FIN and FIN/ACK
will be substituted to RST and RST/ACK, but there is no difference in
the main logic.
The acks in lines 6 and 8 are the acks. If the line 8 packet is
processed before the line 6 packet, it will be just ignored as it is not
a expected packet, and the later process of the line 6 packet will
change the status of Process A to FIN_WAIT_2, but as it has already
handled line 8 packet, it will not go to TIME_WAIT and thus will not
send the line 10 packet to Process B. Thus, Process B will left in
CLOSE_WAIT status, as below.
00 (Process A) (Process B)
01 ESTABLISHED ESTABLISHED
02 close()
03 FIN_WAIT_1
04 ---FIN-->
05 CLOSE_WAIT
06 (<--ACK---)
07 (<--FIN/ACK---)
08 (fired in right order)
09 <--FIN/ACK---
10 <--ACK---
11 (processed in reverse order)
12 FIN_WAIT_2
Later, if the Process B sends SYN to Process A for reconnection using
the same port, Process A will responds with an ACK for the last flow,
which has no increased sequence number. Thus, Process A will send RST,
wait for TIMEOUT_INIT (one second in default), and then try
reconnection. If reconnections are frequent, the one second latency
spikes can be a big problem. Below is a tcpdump results of the problem:
14.436259 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
14.436266 IP 127.0.0.1.4242 > 127.0.0.1.45150: Flags [.], ack 5, win 512
14.436271 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [R], seq 2541101298
/* ONE SECOND DELAY */
15.464613 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
This commit mitigates the problem by reducing the delay for the next SYN
if the suspicous ACK is received while in SYN_SENT state.
Following commit will add a selftest, which can be also helpful for
understanding of this issue.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 6d97985072dc ("isdn: move capi drivers to staging") cleaned up the
isdn drivers and split the MAINTAINERS section for ISDN, but missed to add
the terminal slash for the two directories mISDN and hardware. Hence, all
files in those directories were not part of the new ISDN/mISDN SUBSYSTEM,
but were considered to be part of "THE REST".
Rectify the situation, and while at it, also complete the section with two
further build files that belong to that subsystem.
This was identified with a small script that finds all files belonging to
"THE REST" according to the current MAINTAINERS file, and I investigated
upon its output.
Fixes: 6d97985072dc ("isdn: move capi drivers to staging")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix suspicious RCU usage in ipset, from Jozsef Kadlecsik.
2) Use kvcalloc, from Joe Perches.
3) Flush flowtable hardware workqueue after garbage collection run,
from Paul Blakey.
4) Missing flowtable hardware workqueue flush from nf_flow_table_free(),
also from Paul.
5) Restore NF_FLOW_HW_DEAD in flow_offload_work_del(), from Paul.
6) Flowtable documentation fixes, from Matteo Croce.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
NLA_BINARY can be confusing, since .len value represents
the max size of the blob.
cls_rsvp really wants user space to provide long enough data
for TCA_RSVP_DST and TCA_RSVP_SRC attributes.
BUG: KMSAN: uninit-value in rsvp_get net/sched/cls_rsvp.h:258 [inline]
BUG: KMSAN: uninit-value in gen_handle net/sched/cls_rsvp.h:402 [inline]
BUG: KMSAN: uninit-value in rsvp_change+0x1ae9/0x4220 net/sched/cls_rsvp.h:572
CPU: 1 PID: 13228 Comm: syz-executor.1 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c9/0x220 lib/dump_stack.c:118
kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
__msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
rsvp_get net/sched/cls_rsvp.h:258 [inline]
gen_handle net/sched/cls_rsvp.h:402 [inline]
rsvp_change+0x1ae9/0x4220 net/sched/cls_rsvp.h:572
tc_new_tfilter+0x31fe/0x5010 net/sched/cls_api.c:2104
rtnetlink_rcv_msg+0xcb7/0x1570 net/core/rtnetlink.c:5415
netlink_rcv_skb+0x451/0x650 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0xf9e/0x1100 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x1248/0x14d0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg net/socket.c:659 [inline]
____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330
___sys_sendmsg net/socket.c:2384 [inline]
__sys_sendmsg+0x451/0x5f0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45b349
Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f269d43dc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f269d43e6d4 RCX: 000000000045b349
RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003
RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000009c2 R14: 00000000004cb338 R15: 000000000075bfd4
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline]
kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127
kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82
slab_alloc_node mm/slub.c:2774 [inline]
__kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4382
__kmalloc_reserve net/core/skbuff.c:141 [inline]
__alloc_skb+0x2fd/0xac0 net/core/skbuff.c:209
alloc_skb include/linux/skbuff.h:1049 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline]
netlink_sendmsg+0x7d3/0x14d0 net/netlink/af_netlink.c:1892
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg net/socket.c:659 [inline]
____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330
___sys_sendmsg net/socket.c:2384 [inline]
__sys_sendmsg+0x451/0x5f0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 6fa8c0144b77 ("[NET_SCHED]: Use nla_policy for attribute validation in classifiers")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current maintainer Arvid Brodin <arvid.brodin@alten.se> hasn't
contributed to the kernel since 2015-02-27. His company mail address is
also bouncing and the company confirmed (2020-01-31) that no Arvid Brodin
is working for them:
> Vi har dessvärre ingen Arvid Brodin som arbetar på ALTEN.
A MIA person cannot be the maintainer. It is better to mark is as orphaned
until some other person can jump in and take over the responsibility for
HSR.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If the qed_fw_overlay_mem_alloc() then we should return -ENOMEM instead
of success.
Fixes: 30d5f85895fa ("qed: FW 8.42.2.0 Add fw overlay feature")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The otx2_mbox_get_rsp() function never returns NULL, it returns error
pointers on error.
Fixes: 34bfe0ebedb7 ("octeontx2-pf: MTU, MAC and RX mode config support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tp->segs_in and tp->segs_out need to be cleared in tcp_disconnect().
tcp_disconnect() is rarely used, but it is worth fixing it.
Fixes: 2efd055c53c0 ("tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tp->data_segs_in and tp->data_segs_out need to be cleared
in tcp_disconnect().
tcp_disconnect() is rarely used, but it is worth fixing it.
Fixes: a44d6eacdaf5 ("tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tp->delivered needs to be cleared in tcp_disconnect().
tcp_disconnect() is rarely used, but it is worth fixing it.
Fixes: ddf1af6fa00e ("tcp: new delivery accounting")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
total_retrans needs to be cleared in tcp_disconnect().
tcp_disconnect() is rarely used, but it is worth fixing it.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the flowtable documentation there is a missing semicolon, the command
as is would give this error:
nftables.conf:5:27-33: Error: syntax error, unexpected devices, expecting newline or semicolon
hook ingress priority 0 devices = { br0, pppoe-data };
^^^^^^^
nftables.conf:4:12-13: Error: invalid hook (null)
flowtable ft {
^^
Fixes: 19b351f16fd9 ("netfilter: add flowtable documentation")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
During the refactor this was accidently removed.
Fixes: ae29045018c8 ("netfilter: flowtable: add nf_flow_offload_tuple() helper")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
If entries exist when freeing a hardware offload enabled table,
we queue work for hardware while running the gc iteration.
Execute it (flush) after queueing.
Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
On netdev down event, nf_flow_table_cleanup() is called for the relevant
device and it cleans all the tables that are on that device.
If one of those tables has hardware offload flag,
nf_flow_table_iterate_cleanup flushes hardware and then runs the gc.
But the gc can queue more hardware work, which will take time to execute.
Instead first add the work, then flush it, to execute it now.
Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Convert the uses of kvmalloc_array with __GFP_ZERO to
the equivalent kvcalloc.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When building arm32 allmodconfig:
ERROR: "__aeabi_uldivmod"
[drivers/net/ethernet/mellanox/mlxsw/mlxsw_spectrum.ko] undefined!
rate_bytes_ps has type u64, we need to use a 64-bit division helper to
avoid a build error.
Fixes: a44f58c41bfb ("mlxsw: spectrum_qdisc: Support offloading of TBF Qdisc")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Be sure to include all the packet type bits in the mask.
Fixes: fbfb8031533c ("ionic: Add hardware init and device commands")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The probe() might enable a VDDIO regulator, which needs to be disabled
again before calling regulator_put(). Add a remove() function.
Fixes: 2f664823a470 ("net: phy: at803x: add device tree binding")
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If phydev->mii_ts is set by the PHY driver, it will always be
overwritten in of_mdiobus_register_phy(). Fix it. Also make sure, that
the unregister() doesn't do anything if the mii_timestamper was provided by
the PHY driver.
Fixes: 1dca22b18421 ("net: mdio: of: Register discovered MII time stampers.")
Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
of_find_mii_timestamper() returns NULL if no timestamper is found.
Therefore, guard the unregister_mii_timestamper() calls.
Fixes: 1dca22b18421 ("net: mdio: of: Register discovered MII time stampers.")
Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The introduction of a split between the reference count on rxrpc_local
objects and the usage count didn't quite go far enough. A number of kernel
work items need to make use of the socket to perform transmission. These
also need to get an active count on the local object to prevent the socket
from being closed.
Fix this by getting the active count in those places.
Also split out the raw active count get/put functions as these places tend
to hold refs on the rxrpc_local object already, so getting and putting an
extra object ref is just a waste of time.
The problem can lead to symptoms like:
BUG: kernel NULL pointer dereference, address: 0000000000000018
..
CPU: 2 PID: 818 Comm: kworker/u9:0 Not tainted 5.5.0-fscache+ #51
...
RIP: 0010:selinux_socket_sendmsg+0x5/0x13
...
Call Trace:
security_socket_sendmsg+0x2c/0x3e
sock_sendmsg+0x1a/0x46
rxrpc_send_keepalive+0x131/0x1ae
rxrpc_peer_keepalive_worker+0x219/0x34b
process_one_work+0x18e/0x271
worker_thread+0x1a3/0x247
kthread+0xe6/0xeb
ret_from_fork+0x1f/0x30
Fixes: 730c5fd42c1e ("rxrpc: Fix local endpoint refcounting")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
In rxrpc_input_data(), rxrpc_notify_socket() is called if the base sequence
number of the packet is immediately following the hard-ack point at the end
of the function. However, this isn't sufficient, since the recvmsg side
may have been advancing the window and then overrun the position in which
we're adding - at which point rx_hard_ack >= seq0 and no notification is
generated.
Fix this by always generating a notification at the end of the input
function.
Without this, a long call may stall, possibly indefinitely.
Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Fix rxrpc_put_local() to not access local->debug_id after calling
atomic_dec_return() as, unless that returned n==0, we no longer have the
right to access the object.
Fixes: 06d9532fa6b3 ("rxrpc: Fix read-after-free in rxrpc_queue_local()")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Pull drm updates from Davbe Airlie:
"This is the main pull request for graphics for 5.6. Usual selection of
changes all over.
I've got one outstanding vmwgfx pull that touches mm so kept it
separate until after all of this lands. I'll try and get it to you
soon after this, but it might be early next week (nothing wrong with
code, just my schedule is messy)
This also hits a lot of fbdev drivers with some cleanups.
Other notables:
- vulkan timeline semaphore support added to syncobjs
- nouveau turing secureboot/graphics support
- Displayport MST display stream compression support
Detailed summary:
uapi:
- dma-buf heaps added (and fixed)
- command line add support for panel oreientation
- command line allow overriding penguin count
drm:
- mipi dsi definition updates
- lockdep annotations for dma_resv
- remove dma-buf kmap/kunmap support
- constify fb_ops in all fbdev drivers
- MST fix for daisy chained hotplug-
- CTA-861-G modes with VIC >= 193 added
- fix drm_panel_of_backlight export
- LVDS decoder support
- more device based logging support
- scanline alighment for dumb buffers
- MST DSC helpers
scheduler:
- documentation fixes
- job distribution improvements
panel:
- Logic PD type 28 panel support
- Jimax8729d MIPI-DSI
- igenic JZ4770
- generic DSI devicetree bindings
- sony acx424AKP panel
- Leadtek LTK500HD1829
- xinpeng XPP055C272
- AUO B116XAK01
- GiantPlus GPM940B0
- BOE NV140FHM-N49
- Satoz SAT050AT40H12R2
- Sharp LS020B1DD01D panels.
ttm:
- use blocking WW lock
i915:
- hw/uapi state separation
- Lock annotation improvements
- selftest improvements
- ICL/TGL DSI VDSC support
- VBT parsing improvments
- Display refactoring
- DSI updates + fixes
- HDCP 2.2 for CFL
- CML PCI ID fixes
- GLK+ fbc fix
- PSR fixes
- GEN/GT refactor improvments
- DP MST fixes
- switch context id alloc to xarray
- workaround updates
- LMEM debugfs support
- tiled monitor fixes
- ICL+ clock gating programming removed
- DP MST disable sequence fixed
- LMEM discontiguous object maps
- prefaulting for discontiguous objects
- use LMEM for dumb buffers if possible
- add LMEM mmap support
amdgpu:
- enable sync object timelines for vulkan
- MST atomic routines
- enable MST DSC support
- add DMCUB display microengine support
- DC OEM i2c support
- Renoir DC fixes
- Initial HDCP 2.x support
- BACO support for Arcturus
- Use BACO for runtime PM power save
- gfxoff on navi10
- gfx10 golden updates and fixes
- DCN support on POWER
- GFXOFF for raven1 refresh
- MM engine idle handlers cleanup
- 10bpc EDP panel fixes
- renoir watermark fixes
- SR-IOV fixes
- Arcturus VCN fixes
- GDDR6 training fixes
- freesync fixes
- Pollock support
amdkfd:
- unify more codepath with amdgpu
- use KIQ to setup HIQ rather than MMIO
radeon:
- fix vma fault handler race
- PPC DMA fix
- register check fixes for r100/r200
nouveau:
- mmap_sem vs dma_resv fix
- rewrite the ACR secure boot code for Turing
- TU10x graphics engine support (TU11x pending)
- Page kind mapping for turing
- 10-bit LUT support
- GP10B Tegra fixes
- HD audio regression fix
hisilicon/hibmc:
- use generic fbdev code and helpers
rockchip:
- dsi/px30 support
virtio:
- fb damage support
- static some functions
vc4:
- use dma_resv lock wrappers
msm:
- use dma_resv lock wrappers
- sc7180 display + DSI support
- a618 support
- UBWC support improvements
vmwgfx:
- updates + new logging uapi
exynos:
- enable/disable callback cleanups
etnaviv:
- use dma_resv lock wrappers
atmel-hlcdc:
- clock fixes
mediatek:
- cmdq support
- non-smooth cursor fixes
- ctm property support
sun4i:
- suspend support
- A64 mipi dsi support
rcar-du:
- Color management module support
- LVDS encoder dual-link support
- R8A77980 support
analogic:
- add support for an6345
ast:
- atomic modeset support
- primary plane garbage fix
arcgpu:
- fixes for fourcc handling
tegra:
- minor fixes and improvments
mcde:
- vblank support
meson:
- OSD1 plane AFBC commit
gma500:
- add pageflip support
- reomve global drm_dev
komeda:
- tweak debugfs output
- d32 support
- runtime PM suppotr
udl:
- use generic shmem helpers
- cleanup and fixes"
* tag 'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drm: (1998 commits)
drm/nouveau/fb/gp102-: allow module to load even when scrubber binary is missing
drm/nouveau/acr: return error when registering LSF if ACR not supported
drm/nouveau/disp/gv100-: not all channel types support reporting error codes
drm/nouveau/disp/nv50-: prevent oops when no channel method map provided
drm/nouveau: support synchronous pushbuf submission
drm/nouveau: signal pending fences when channel has been killed
drm/nouveau: reject attempts to submit to dead channels
drm/nouveau: zero vma pointer even if we only unreference it rather than free
drm/nouveau: Add HD-audio component notifier support
drm/nouveau: fix build error without CONFIG_IOMMU_API
drm/nouveau/kms/nv04: remove set but not used variable 'width'
drm/nouveau/kms/nv50: remove set but not unused variable 'nv_connector'
drm/nouveau/mmu: fix comptag memory leak
drm/nouveau/gr/gp10b: Use gp100_grctx and gp100_gr_zbc
drm/nouveau/pmu/gm20b,gp10b: Fix Falcon bootstrapping
drm/exynos: Rename Exynos to lowercase
drm/exynos: change callback names
drm/mst: Don't do atomic checks over disabled managers
drm/amdgpu: add the lost mutex_init back
drm/amd/display: skip opp blank or unblank if test pattern enabled
...
|