-
Notifications
You must be signed in to change notification settings - Fork 1
vcnl3020 hwmon iio driver #2
base: master
Are you sure you want to change the base?
Conversation
const char *buf, size_t size) | ||
{ | ||
s32 rc; | ||
unsigned long data = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused?
{ | ||
s32 rc; | ||
unsigned long data = 0; | ||
unsigned long val = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove initialize value
hwmon_dev = devm_hwmon_device_register_with_groups(&pdev->dev, | ||
VCNL_DRV_NAME, | ||
vcnl3020_data, | ||
vcnl3020_hwmon_groups); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to format code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check_patch.pl doesn't show anything bad here, it doesn't fit with 80 lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks awful. Maybe add a CR right after hwmon_dev =
static int32_t vcnl3020_init(struct vcnl3020_data *data) | ||
{ | ||
s32 rc; | ||
u32 proximity_rate, led_current, threshold, count_exceed, param_val; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't do declarations in that way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and in which way it should be done then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
u32 proximity_rate;
u32 led_current;
... and so on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://elixir.bootlin.com/linux/latest/source/kernel/async.c#L114
it is not prohibited in linux kernel guideline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is generally better to have one declaration per line as it simplifies review of future changes. Added/removed text in the middle of a line is harder to review than added/removed lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, I can make it per line, both variants are fine.
static int32_t vcnl3020_measure_proximity(struct vcnl3020_data *data, | ||
int32_t *val) | ||
{ | ||
s32 tries = 20; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
20?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I say to my kids, if someone does bad things, it's not a justification for doing the same yourself. Please try to understand where this 20 comes from (probably some datasheet) and replace it with a relevant macro. Even if you don't find where it comes from, replace it with a macro like VCLN3020_MEASURE_TRIES
or something along those lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sometimes it's justified to use local variable without definition, this case one of this, it just local variable which describes itself purpose of it.
goto fail; | ||
if (rc & VCNL_PS_RDY) | ||
break; | ||
msleep(20); /* measurement takes up to 100 ms */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sleep for 20, comment for 100. Is it right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a lame excuse again. :) Don't do it. Read what you copy-paste. And don't do magic numbers too.
rc = i2c_smbus_write_byte_data(data->client, VCNL_COMMAND, cmdreg); | ||
if (rc < 0) | ||
goto fail; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refracting is needed. You can return rc an any way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't quite understand what you meant here, but if you were saying that Ivan needed to put a return rc
right here, then you're wrong, it would be a violation of rule 7 of Linux Kernel Coding Style. Centralized exiting of functions is preffered by Linus (and me).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean replace
mutex_unlock(&data->vcnl3020_lock);
return 0;
fail:
mutex_unlock(&data->vcnl3020_lock);
return rc;
with
fail:
mutex_unlock(&data->vcnl3020_lock);
return rc;
or something like that
struct vcnl3020_data *data = iio_priv(indio_dev); | ||
|
||
switch (mask) { | ||
case IIO_CHAN_INFO_RAW: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switch with 1 case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this reads quite ok. I'm not sure that an if ()
would make it look better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, format your code before review. For now it looks like a student work ) |
|
||
return !!(isr & BIT(INT_TH_LOW)); | ||
} | ||
EXPORT_SYMBOL_GPL(vcnl3020_read_isr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made a mistake - EXPORT_SYMBOL_GPL(vcnl3020_intrusion);
if (rc < 0) | ||
goto fail; | ||
|
||
mutex_unlock(&data->vcnl3020_lock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mutex_unlock and return 0; should be removed here btw, it just will go through.
static int32_t vcnl3020_init(struct vcnl3020_data *data) | ||
{ | ||
s32 rc; | ||
u32 proximity_rate, led_current, threshold, count_exceed, param_val; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
param_val is not needed here also, all params can be pointed directly in property_read.
hwmon_dev = devm_hwmon_device_register_with_groups(&pdev->dev, | ||
VCNL_DRV_NAME, | ||
vcnl3020_data, | ||
vcnl3020_hwmon_groups); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks awful. Maybe add a CR right after hwmon_dev =
@@ -123,4 +123,9 @@ config VL53L0X_I2C | |||
To compile this driver as a module, choose M here: the | |||
module will be called vl53l0x-i2c. | |||
|
|||
config VCNL3020 | |||
tristate "VCNL 3000" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3000? A proper description needed. E.g. "Vishay Semiconductor VCNL3020 proximity sensor" or smth. like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
y, it should be changed + help section
static int32_t vcnl3020_init(struct vcnl3020_data *data) | ||
{ | ||
s32 rc; | ||
u32 proximity_rate, led_current, threshold, count_exceed, param_val; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is generally better to have one declaration per line as it simplifies review of future changes. Added/removed text in the middle of a line is harder to review than added/removed lines.
|
||
rc = device_property_read_u32(dev, "proximity-rate", ¶m_val); | ||
if (rc) { | ||
dev_err(dev, "couldn't get proximity rate %x", rc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Couldn't
. Capitalize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, I'll change messages in next version.
|
||
rc = device_property_read_u32(dev, "count-exceed", ¶m_val); | ||
if (rc) { | ||
dev_err(dev, "couldn't get count exceed %x", rc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalize
} | ||
|
||
rc = i2c_smbus_read_byte_data(data->client, VCNL_COMMAND); | ||
dev_err(dev, "command data %x", rc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a dev_err()
? Looks more like dev_dbg()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was debug information, forgot to remove it.
mutex_lock(&data->vcnl3020_lock); | ||
isr = i2c_smbus_read_byte_data(data->client, VCNL_ISR); | ||
if (isr < 0) { | ||
dev_err(&data->client->dev, "Error reading interrupt status %x", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Error (%d) reading
...
goto fail; | ||
if (rc & VCNL_PS_RDY) | ||
break; | ||
msleep(20); /* measurement takes up to 100 ms */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a lame excuse again. :) Don't do it. Read what you copy-paste. And don't do magic numbers too.
rc = i2c_smbus_read_byte_data(data->client, VCNL_PS_RESULT_HI); | ||
if (rc < 0) | ||
goto fail; | ||
*val = (rc & 0xff) << 8; | ||
dev_dbg(&data->client->dev, "result high byte 0x%x", rc); | ||
|
||
rc = i2c_smbus_read_byte_data(data->client, VCNL_PS_RESULT_LO); | ||
if (rc < 0) | ||
goto fail; | ||
*val |= rc & 0xff; | ||
dev_dbg(&data->client->dev, "result low byte 0x%x", rc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume, the data in HI/LO registers can't change once it is VCNL_PS_RDY
? For if it can (as happens with timers and ADCs), then you should re-read the HI byte and verify that it didn't change, and if it did, then re-read the LO byte and repeat. Otherwise you risk getting weird values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll check in doc.
goto out; | ||
|
||
dev_info(&client->dev, "Proximity sensor, Rev: %02x\n", | ||
VCNL3020_PROD_ID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are your sure you're printing Rev
here, and not the product ID ?
arm32 uses software to simulate the instruction replaced by kprobe. some instructions may be simulated by constructing assembly functions. therefore, before executing instruction simulation, it is necessary to construct assembly function execution environment in C language through binding registers. after kasan is enabled, the register binding relationship will be destroyed, resulting in instruction simulation errors and causing kernel panic. the kprobe emulate instruction function is distributed in three files: actions-common.c actions-arm.c actions-thumb.c, so disable KASAN when compiling these files. for example, use kprobe insert on cap_capable+20 after kasan enabled, the cap_capable assembly code is as follows: <cap_capable>: e92d47f0 push {r4, r5, r6, r7, r8, r9, sl, lr} e1a05000 mov r5, r0 e280006c add r0, r0, torvalds#108 ; 0x6c e1a04001 mov r4, r1 e1a06002 mov r6, r2 e59fa090 ldr sl, [pc, torvalds#144] ; ebfc7bf8 bl c03aa4b4 <__asan_load4> e595706c ldr r7, [r5, torvalds#108] ; 0x6c e2859014 add r9, r5, torvalds#20 ...... The emulate_ldr assembly code after enabling kasan is as follows: c06f1384 <emulate_ldr>: e92d47f0 push {r4, r5, r6, r7, r8, r9, sl, lr} e282803c add r8, r2, torvalds#60 ; 0x3c e1a05000 mov r5, r0 e7e37855 ubfx r7, r5, torvalds#16, #4 e1a00008 mov r0, r8 e1a09001 mov r9, r1 e1a04002 mov r4, r2 ebf35462 bl c03c6530 <__asan_load4> e357000f cmp r7, torvalds#15 e7e36655 ubfx r6, r5, torvalds#12, #4 e205a00f and sl, r5, torvalds#15 0a000001 beq c06f13bc <emulate_ldr+0x38> e0840107 add r0, r4, r7, lsl #2 ebf3545c bl c03c6530 <__asan_load4> e084010a add r0, r4, sl, lsl #2 ebf3545a bl c03c6530 <__asan_load4> e2890010 add r0, r9, torvalds#16 ebf35458 bl c03c6530 <__asan_load4> e5990010 ldr r0, [r9, torvalds#16] e12fff30 blx r0 e356000f cm r6, torvalds#15 1a000014 bne c06f1430 <emulate_ldr+0xac> e1a06000 mov r6, r0 e2840040 add r0, r4, torvalds#64 ; 0x40 ...... when running in emulate_ldr to simulate the ldr instruction, panic occurred, and the log is as follows: Unable to handle kernel NULL pointer dereference at virtual address 00000090 pgd = ecb46400 [00000090] *pgd=2e0fa003, *pmd=00000000 Internal error: Oops: 206 [#1] SMP ARM PC is at cap_capable+0x14/0xb0 LR is at emulate_ldr+0x50/0xc0 psr: 600d0293 sp : ecd63af8 ip : 00000004 fp : c0a7c30c r10: 00000000 r9 : c30897f4 r8 : ecd63cd4 r7 : 0000000f r6 : 0000000a r5 : e59fa090 r4 : ecd63c98 r3 : c06ae294 r2 : 00000000 r1 : b7611300 r0 : bf4ec008 Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user Control: 32c5387d Table: 2d546400 DAC: 55555555 Process bash (pid: 1643, stack limit = 0xecd60190) (cap_capable) from (kprobe_handler+0x218/0x340) (kprobe_handler) from (kprobe_trap_handler+0x24/0x48) (kprobe_trap_handler) from (do_undefinstr+0x13c/0x364) (do_undefinstr) from (__und_svc_finish+0x0/0x30) (__und_svc_finish) from (cap_capable+0x18/0xb0) (cap_capable) from (cap_vm_enough_memory+0x38/0x48) (cap_vm_enough_memory) from (security_vm_enough_memory_mm+0x48/0x6c) (security_vm_enough_memory_mm) from (copy_process.constprop.5+0x16b4/0x25c8) (copy_process.constprop.5) from (_do_fork+0xe8/0x55c) (_do_fork) from (SyS_clone+0x1c/0x24) (SyS_clone) from (__sys_trace_return+0x0/0x10) Code: 0050a0e1 6c0080e2 0140a0e1 0260a0e1 (f801f0e7) Fixes: 35aa1df ("ARM kprobes: instruction single-stepping support") Fixes: 4210157 ("ARM: 9017/2: Enable KASan for ARM") Signed-off-by: huangshaobo <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]>
Change the cifs filesystem to take account of the changes to fscache's indexing rewrite and reenable caching in cifs. The following changes have been made: (1) The fscache_netfs struct is no more, and there's no need to register the filesystem as a whole. (2) The session cookie is now an fscache_volume cookie, allocated with fscache_acquire_volume(). That takes three parameters: a string representing the "volume" in the index, a string naming the cache to use (or NULL) and a u64 that conveys coherency metadata for the volume. For cifs, I've made it render the volume name string as: "cifs,<ipaddress>,<sharename>" where the sharename has '/' characters replaced with ';'. This probably needs rethinking a bit as the total name could exceed the maximum filename component length. Further, the coherency data is currently just set to 0. It needs something else doing with it - I wonder if it would suffice simply to sum the resource_id, vol_create_time and vol_serial_number or maybe hash them. (3) The fscache_cookie_def is no more and needed information is passed directly to fscache_acquire_cookie(). The cache no longer calls back into the filesystem, but rather metadata changes are indicated at other times. fscache_acquire_cookie() is passed the same keying and coherency information as before. (4) The functions to set/reset cookies are removed and fscache_use_cookie() and fscache_unuse_cookie() are used instead. fscache_use_cookie() is passed a flag to indicate if the cookie is opened for writing. fscache_unuse_cookie() is passed updates for the metadata if we changed it (ie. if the file was opened for writing). These are called when the file is opened or closed. (5) cifs_setattr_*() are made to call fscache_resize() to change the size of the cache object. (6) The functions to read and write data are stubbed out pending a conversion to use netfslib. Changes ======= ver torvalds#8: - Abstract cache invalidation into a helper function. - Fix some checkpatch warnings[3]. ver torvalds#7: - Removed the accidentally added-back call to get the super cookie in cifs_root_iget(). - Fixed the right call to cifs_fscache_get_super_cookie() to take account of the "-o fsc" mount flag. ver torvalds#6: - Moved the change of gfpflags_allow_blocking() to current_is_kswapd() for cifs here. - Fixed one of the error paths in cifs_atomic_open() to jump around the call to use the cookie. - Fixed an additional successful return in the middle of cifs_open() to use the cookie on the way out. - Only get a volume cookie (and thus inode cookies) when "-o fsc" is supplied to mount. ver #5: - Fixed a couple of bits of cookie handling[2]: - The cookie should be released in cifs_evict_inode(), not cifsFileInfo_put_final(). The cookie needs to persist beyond file closure so that writepages will be able to write to it. - fscache_use_cookie() needs to be called in cifs_atomic_open() as it is for cifs_open(). ver #4: - Fixed the use of sizeof with memset. - tcon->vol_create_time is __le64 so doesn't need cpu_to_le64(). ver #3: - Canonicalise the cifs coherency data to make the cache portable. - Set volume coherency data. ver #2: - Use gfpflags_allow_blocking() rather than using flag directly. - Upgraded to -rc4 to allow for upstream changes[1]. - fscache_acquire_volume() now returns errors. Signed-off-by: David Howells <[email protected]> Acked-by: Jeff Layton <[email protected]> cc: Steve French <[email protected]> cc: Shyam Prasad N <[email protected]> cc: [email protected] cc: [email protected] Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23b55d673d7527b093cd97b7c217c82e70cd1af0 [1] Link: https://lore.kernel.org/r/[email protected]/ [2] Link: https://lore.kernel.org/r/CAH2r5muTanw9pJqzAHd01d9A8keeChkzGsCEH6=0rHutVLAF-A@mail.gmail.com/ [3] Link: https://lore.kernel.org/r/163819671009.215744.11230627184193298714.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906982979.143852.10672081929614953210.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967187187.1823006.247415138444991444.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021579335.640689.2681324337038770579.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/[email protected]/ # v5 Link: https://lore.kernel.org/r/[email protected]/ # v6 Signed-off-by: Steve French <[email protected]>
…y if PMI is pending Running selftest with CONFIG_PPC_IRQ_SOFT_MASK_DEBUG enabled in kernel triggered below warning: [ 172.851380] ------------[ cut here ]------------ [ 172.851391] WARNING: CPU: 8 PID: 2901 at arch/powerpc/include/asm/hw_irq.h:246 power_pmu_disable+0x270/0x280 [ 172.851402] Modules linked in: dm_mod bonding nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables rfkill nfnetlink sunrpc xfs libcrc32c pseries_rng xts vmx_crypto uio_pdrv_genirq uio sch_fq_codel ip_tables ext4 mbcache jbd2 sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp fuse [ 172.851442] CPU: 8 PID: 2901 Comm: lost_exception_ Not tainted 5.16.0-rc5-03218-g798527287598 #2 [ 172.851451] NIP: c00000000013d600 LR: c00000000013d5a4 CTR: c00000000013b180 [ 172.851458] REGS: c000000017687860 TRAP: 0700 Not tainted (5.16.0-rc5-03218-g798527287598) [ 172.851465] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 48004884 XER: 20040000 [ 172.851482] CFAR: c00000000013d5b4 IRQMASK: 1 [ 172.851482] GPR00: c00000000013d5a4 c000000017687b00 c000000002a10600 0000000000000004 [ 172.851482] GPR04: 0000000082004000 c0000008ba08f0a8 0000000000000000 00000008b7ed0000 [ 172.851482] GPR08: 00000000446194f6 0000000000008000 c00000000013b118 c000000000d58e68 [ 172.851482] GPR12: c00000000013d390 c00000001ec54a80 0000000000000000 0000000000000000 [ 172.851482] GPR16: 0000000000000000 0000000000000000 c000000015d5c708 c0000000025396d0 [ 172.851482] GPR20: 0000000000000000 0000000000000000 c00000000a3bbf40 0000000000000003 [ 172.851482] GPR24: 0000000000000000 c0000008ba097400 c0000000161e0d00 c00000000a3bb600 [ 172.851482] GPR28: c000000015d5c700 0000000000000001 0000000082384090 c0000008ba0020d8 [ 172.851549] NIP [c00000000013d600] power_pmu_disable+0x270/0x280 [ 172.851557] LR [c00000000013d5a4] power_pmu_disable+0x214/0x280 [ 172.851565] Call Trace: [ 172.851568] [c000000017687b00] [c00000000013d5a4] power_pmu_disable+0x214/0x280 (unreliable) [ 172.851579] [c000000017687b40] [c0000000003403ac] perf_pmu_disable+0x4c/0x60 [ 172.851588] [c000000017687b60] [c0000000003445e4] __perf_event_task_sched_out+0x1d4/0x660 [ 172.851596] [c000000017687c50] [c000000000d1175c] __schedule+0xbcc/0x12a0 [ 172.851602] [c000000017687d60] [c000000000d11ea8] schedule+0x78/0x140 [ 172.851608] [c000000017687d90] [c0000000001a8080] sys_sched_yield+0x20/0x40 [ 172.851615] [c000000017687db0] [c0000000000334dc] system_call_exception+0x18c/0x380 [ 172.851622] [c000000017687e10] [c00000000000c74c] system_call_common+0xec/0x268 The warning indicates that MSR_EE being set(interrupt enabled) when there was an overflown PMC detected. This could happen in power_pmu_disable since it runs under interrupt soft disable condition ( local_irq_save ) and not with interrupts hard disabled. commit 2c9ac51 ("powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC") intended to clear PMI pending bit in Paca when disabling the PMU. It could happen that PMC gets overflown while code is in power_pmu_disable callback function. Hence add a check to see if PMI pending bit is set in Paca before clearing it via clear_pmi_pending. Fixes: 2c9ac51 ("powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC") Reported-by: Sachin Sant <[email protected]> Signed-off-by: Athira Rajeev <[email protected]> Tested-by: Sachin Sant <[email protected]> Reviewed-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
… devices Suppose we have an environment with a number of non-NPIV FCP devices (virtual HBAs / FCP devices / zfcp "adapter"s) sharing the same physical FCP channel (HBA port) and its I_T nexus. Plus a number of storage target ports zoned to such shared channel. Now one target port logs out of the fabric causing an RSCN. Zfcp reacts with an ADISC ELS and subsequent port recovery depending on the ADISC result. This happens on all such FCP devices (in different Linux images) concurrently as they all receive a copy of this RSCN. In the following we look at one of those FCP devices. Requests other than FSF_QTCB_FCP_CMND can be slow until they get a response. Depending on which requests are affected by slow responses, there are different recovery outcomes. Here we want to fix failed recoveries on port or adapter level by avoiding recovery requests that can be slow. We need the cached N_Port_ID for the remote port "link" test with ADISC. Just before sending the ADISC, we now intentionally forget the old cached N_Port_ID. The idea is that on receiving an RSCN for a port, we have to assume that any cached information about this port is stale. This forces a fresh new GID_PN [FC-GS] nameserver lookup on any subsequent recovery for the same port. Since we typically can still communicate with the nameserver efficiently, we now reach steady state quicker: Either the nameserver still does not know about the port so we stop recovery, or the nameserver already knows the port potentially with a new N_Port_ID and we can successfully and quickly perform open port recovery. For the one case, where ADISC returns successfully, we re-initialize port->d_id because that case does not involve any port recovery. This also solves a problem if the storage WWPN quickly logs into the fabric again but with a different N_Port_ID. Such as on virtual WWPN takeover during target NPIV failover. [https://www.redbooks.ibm.com/abstracts/redp5477.html] In that case the RSCN from the storage FDISC was ignored by zfcp and we could not successfully recover the failover. On some later failback on the storage, we could have been lucky if the virtual WWPN got the same old N_Port_ID from the SAN switch as we still had cached. Then the related RSCN triggered a successful port reopen recovery. However, there is no guarantee to get the same N_Port_ID on NPIV FDISC. Even though NPIV-enabled FCP devices are not affected by this problem, this code change optimizes recovery time for gone remote ports as a side effect. The timely drop of cached N_Port_IDs prevents unnecessary slow open port attempts. While the problem might have been in code before v2.6.32 commit 799b76d ("[SCSI] zfcp: Decouple gid_pn requests from erp") this fix depends on the gid_pn_work introduced with that commit, so we mark it as culprit to satisfy fix dependencies. Note: Point-to-point remote port is already handled separately and gets its N_Port_ID from the cached peer_d_id. So resetting port->d_id in general does not affect PtP. Link: https://lore.kernel.org/r/[email protected] Fixes: 799b76d ("[SCSI] zfcp: Decouple gid_pn requests from erp") Cc: <[email protected]> #2.6.32+ Suggested-by: Benjamin Block <[email protected]> Reviewed-by: Benjamin Block <[email protected]> Signed-off-by: Steffen Maier <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
Add a test that sends large udp packet (which is fragmented) via a stateless nft nat rule, i.e. 'ip saddr set 10.2.3.4' and check that the datagram is received by peer. On kernels without commit 4e1860a ("netfilter: nft_payload: do not update layer 4 checksum when mangling fragments")', this will fail with: cmp: EOF on /tmp/tmp.V1q0iXJyQF which is empty -rw------- 1 root root 4096 Jan 24 22:03 /tmp/tmp.Aaqnq4rBKS -rw------- 1 root root 0 Jan 24 22:03 /tmp/tmp.V1q0iXJyQF ERROR: in and output file mismatch when checking udp with stateless nat FAIL: nftables v1.0.0 (Fearless Fosdick #2) On patched kernels, this will show: PASS: IP statless for ns2-PFp89amx Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
Fix the following false positive warning: ============================= WARNING: suspicious RCU usage 5.16.0-rc4+ torvalds#57 Not tainted ----------------------------- arch/x86/kvm/../../../virt/kvm/eventfd.c:484 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by fc_vcpu 0/330: #0: ffff8884835fc0b0 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x88/0x6f0 [kvm] #1: ffffc90004c0bb68 (&kvm->srcu){....}-{0:0}, at: vcpu_enter_guest+0x600/0x1860 [kvm] #2: ffffc90004c0c1d0 (&kvm->irq_srcu){....}-{0:0}, at: kvm_notify_acked_irq+0x36/0x180 [kvm] stack backtrace: CPU: 26 PID: 330 Comm: fc_vcpu 0 Not tainted 5.16.0-rc4+ Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x44/0x57 kvm_notify_acked_gsi+0x6b/0x70 [kvm] kvm_notify_acked_irq+0x8d/0x180 [kvm] kvm_ioapic_update_eoi+0x92/0x240 [kvm] kvm_apic_set_eoi_accelerated+0x2a/0xe0 [kvm] handle_apic_eoi_induced+0x3d/0x60 [kvm_intel] vmx_handle_exit+0x19c/0x6a0 [kvm_intel] vcpu_enter_guest+0x66e/0x1860 [kvm] kvm_arch_vcpu_ioctl_run+0x438/0x7f0 [kvm] kvm_vcpu_ioctl+0x38a/0x6f0 [kvm] __x64_sys_ioctl+0x89/0xc0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Since kvm_unregister_irq_ack_notifier() does synchronize_srcu(&kvm->irq_srcu), kvm->irq_ack_notifier_list is protected by kvm->irq_srcu. In fact, kvm->irq_srcu SRCU read lock is held in kvm_notify_acked_irq(), making it a false positive warning. So use hlist_for_each_entry_srcu() instead of hlist_for_each_entry_rcu(). Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Hou Wenlong <[email protected]> Message-Id: <f98bac4f5052bad2c26df9ad50f7019e40434512.1643265976.git.houwenlong.hwl@antgroup.com> Signed-off-by: Paolo Bonzini <[email protected]>
Quota disable ioctl starts a transaction before waiting for the qgroup rescan worker completes. However, this wait can be infinite and results in deadlock because of circular dependency among the quota disable ioctl, the qgroup rescan worker and the other task with transaction such as block group relocation task. The deadlock happens with the steps following: 1) Task A calls ioctl to disable quota. It starts a transaction and waits for qgroup rescan worker completes. 2) Task B such as block group relocation task starts a transaction and joins to the transaction that task A started. Then task B commits to the transaction. In this commit, task B waits for a commit by task A. 3) Task C as the qgroup rescan worker starts its job and starts a transaction. In this transaction start, task C waits for completion of the transaction that task A started and task B committed. This deadlock was found with fstests test case btrfs/115 and a zoned null_blk device. The test case enables and disables quota, and the block group reclaim was triggered during the quota disable by chance. The deadlock was also observed by running quota enable and disable in parallel with 'btrfs balance' command on regular null_blk devices. An example report of the deadlock: [372.469894] INFO: task kworker/u16:6:103 blocked for more than 122 seconds. [372.479944] Not tainted 5.16.0-rc8 torvalds#7 [372.485067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [372.493898] task:kworker/u16:6 state:D stack: 0 pid: 103 ppid: 2 flags:0x00004000 [372.503285] Workqueue: btrfs-qgroup-rescan btrfs_work_helper [btrfs] [372.510782] Call Trace: [372.514092] <TASK> [372.521684] __schedule+0xb56/0x4850 [372.530104] ? io_schedule_timeout+0x190/0x190 [372.538842] ? lockdep_hardirqs_on+0x7e/0x100 [372.547092] ? _raw_spin_unlock_irqrestore+0x3e/0x60 [372.555591] schedule+0xe0/0x270 [372.561894] btrfs_commit_transaction+0x18bb/0x2610 [btrfs] [372.570506] ? btrfs_apply_pending_changes+0x50/0x50 [btrfs] [372.578875] ? free_unref_page+0x3f2/0x650 [372.585484] ? finish_wait+0x270/0x270 [372.591594] ? release_extent_buffer+0x224/0x420 [btrfs] [372.599264] btrfs_qgroup_rescan_worker+0xc13/0x10c0 [btrfs] [372.607157] ? lock_release+0x3a9/0x6d0 [372.613054] ? btrfs_qgroup_account_extent+0xda0/0xda0 [btrfs] [372.620960] ? do_raw_spin_lock+0x11e/0x250 [372.627137] ? rwlock_bug.part.0+0x90/0x90 [372.633215] ? lock_is_held_type+0xe4/0x140 [372.639404] btrfs_work_helper+0x1ae/0xa90 [btrfs] [372.646268] process_one_work+0x7e9/0x1320 [372.652321] ? lock_release+0x6d0/0x6d0 [372.658081] ? pwq_dec_nr_in_flight+0x230/0x230 [372.664513] ? rwlock_bug.part.0+0x90/0x90 [372.670529] worker_thread+0x59e/0xf90 [372.676172] ? process_one_work+0x1320/0x1320 [372.682440] kthread+0x3b9/0x490 [372.687550] ? _raw_spin_unlock_irq+0x24/0x50 [372.693811] ? set_kthread_struct+0x100/0x100 [372.700052] ret_from_fork+0x22/0x30 [372.705517] </TASK> [372.709747] INFO: task btrfs-transacti:2347 blocked for more than 123 seconds. [372.729827] Not tainted 5.16.0-rc8 torvalds#7 [372.745907] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [372.767106] task:btrfs-transacti state:D stack: 0 pid: 2347 ppid: 2 flags:0x00004000 [372.787776] Call Trace: [372.801652] <TASK> [372.812961] __schedule+0xb56/0x4850 [372.830011] ? io_schedule_timeout+0x190/0x190 [372.852547] ? lockdep_hardirqs_on+0x7e/0x100 [372.871761] ? _raw_spin_unlock_irqrestore+0x3e/0x60 [372.886792] schedule+0xe0/0x270 [372.901685] wait_current_trans+0x22c/0x310 [btrfs] [372.919743] ? btrfs_put_transaction+0x3d0/0x3d0 [btrfs] [372.938923] ? finish_wait+0x270/0x270 [372.959085] ? join_transaction+0xc75/0xe30 [btrfs] [372.977706] start_transaction+0x938/0x10a0 [btrfs] [372.997168] transaction_kthread+0x19d/0x3c0 [btrfs] [373.013021] ? btrfs_cleanup_transaction.isra.0+0xfc0/0xfc0 [btrfs] [373.031678] kthread+0x3b9/0x490 [373.047420] ? _raw_spin_unlock_irq+0x24/0x50 [373.064645] ? set_kthread_struct+0x100/0x100 [373.078571] ret_from_fork+0x22/0x30 [373.091197] </TASK> [373.105611] INFO: task btrfs:3145 blocked for more than 123 seconds. [373.114147] Not tainted 5.16.0-rc8 torvalds#7 [373.120401] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [373.130393] task:btrfs state:D stack: 0 pid: 3145 ppid: 3141 flags:0x00004000 [373.140998] Call Trace: [373.145501] <TASK> [373.149654] __schedule+0xb56/0x4850 [373.155306] ? io_schedule_timeout+0x190/0x190 [373.161965] ? lockdep_hardirqs_on+0x7e/0x100 [373.168469] ? _raw_spin_unlock_irqrestore+0x3e/0x60 [373.175468] schedule+0xe0/0x270 [373.180814] wait_for_commit+0x104/0x150 [btrfs] [373.187643] ? test_and_set_bit+0x20/0x20 [btrfs] [373.194772] ? kmem_cache_free+0x124/0x550 [373.201191] ? btrfs_put_transaction+0x69/0x3d0 [btrfs] [373.208738] ? finish_wait+0x270/0x270 [373.214704] ? __btrfs_end_transaction+0x347/0x7b0 [btrfs] [373.222342] btrfs_commit_transaction+0x44d/0x2610 [btrfs] [373.230233] ? join_transaction+0x255/0xe30 [btrfs] [373.237334] ? btrfs_record_root_in_trans+0x4d/0x170 [btrfs] [373.245251] ? btrfs_apply_pending_changes+0x50/0x50 [btrfs] [373.253296] relocate_block_group+0x105/0xc20 [btrfs] [373.260533] ? mutex_lock_io_nested+0x1270/0x1270 [373.267516] ? btrfs_wait_nocow_writers+0x85/0x180 [btrfs] [373.275155] ? merge_reloc_roots+0x710/0x710 [btrfs] [373.283602] ? btrfs_wait_ordered_extents+0xd30/0xd30 [btrfs] [373.291934] ? kmem_cache_free+0x124/0x550 [373.298180] btrfs_relocate_block_group+0x35c/0x930 [btrfs] [373.306047] btrfs_relocate_chunk+0x85/0x210 [btrfs] [373.313229] btrfs_balance+0x12f4/0x2d20 [btrfs] [373.320227] ? lock_release+0x3a9/0x6d0 [373.326206] ? btrfs_relocate_chunk+0x210/0x210 [btrfs] [373.333591] ? lock_is_held_type+0xe4/0x140 [373.340031] ? rcu_read_lock_sched_held+0x3f/0x70 [373.346910] btrfs_ioctl_balance+0x548/0x700 [btrfs] [373.354207] btrfs_ioctl+0x7f2/0x71b0 [btrfs] [373.360774] ? lockdep_hardirqs_on_prepare+0x410/0x410 [373.367957] ? lockdep_hardirqs_on_prepare+0x410/0x410 [373.375327] ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs] [373.383841] ? find_held_lock+0x2c/0x110 [373.389993] ? lock_release+0x3a9/0x6d0 [373.395828] ? mntput_no_expire+0xf7/0xad0 [373.402083] ? lock_is_held_type+0xe4/0x140 [373.408249] ? vfs_fileattr_set+0x9f0/0x9f0 [373.414486] ? selinux_file_ioctl+0x349/0x4e0 [373.420938] ? trace_raw_output_lock+0xb4/0xe0 [373.427442] ? selinux_inode_getsecctx+0x80/0x80 [373.434224] ? lockdep_hardirqs_on+0x7e/0x100 [373.440660] ? force_qs_rnp+0x2a0/0x6b0 [373.446534] ? lock_is_held_type+0x9b/0x140 [373.452763] ? __blkcg_punt_bio_submit+0x1b0/0x1b0 [373.459732] ? security_file_ioctl+0x50/0x90 [373.466089] __x64_sys_ioctl+0x127/0x190 [373.472022] do_syscall_64+0x3b/0x90 [373.477513] entry_SYSCALL_64_after_hwframe+0x44/0xae [373.484823] RIP: 0033:0x7f8f4af7e2bb [373.490493] RSP: 002b:00007ffcbf936178 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [373.500197] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8f4af7e2bb [373.509451] RDX: 00007ffcbf936220 RSI: 00000000c4009420 RDI: 0000000000000003 [373.518659] RBP: 00007ffcbf93774a R08: 0000000000000013 R09: 00007f8f4b02d4e0 [373.527872] R10: 00007f8f4ae87740 R11: 0000000000000246 R12: 0000000000000001 [373.537222] R13: 00007ffcbf936220 R14: 0000000000000000 R15: 0000000000000002 [373.546506] </TASK> [373.550878] INFO: task btrfs:3146 blocked for more than 123 seconds. [373.559383] Not tainted 5.16.0-rc8 torvalds#7 [373.565748] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [373.575748] task:btrfs state:D stack: 0 pid: 3146 ppid: 2168 flags:0x00000000 [373.586314] Call Trace: [373.590846] <TASK> [373.595121] __schedule+0xb56/0x4850 [373.600901] ? __lock_acquire+0x23db/0x5030 [373.607176] ? io_schedule_timeout+0x190/0x190 [373.613954] schedule+0xe0/0x270 [373.619157] schedule_timeout+0x168/0x220 [373.625170] ? usleep_range_state+0x150/0x150 [373.631653] ? mark_held_locks+0x9e/0xe0 [373.637767] ? do_raw_spin_lock+0x11e/0x250 [373.643993] ? lockdep_hardirqs_on_prepare+0x17b/0x410 [373.651267] ? _raw_spin_unlock_irq+0x24/0x50 [373.657677] ? lockdep_hardirqs_on+0x7e/0x100 [373.664103] wait_for_completion+0x163/0x250 [373.670437] ? bit_wait_timeout+0x160/0x160 [373.676585] btrfs_quota_disable+0x176/0x9a0 [btrfs] [373.683979] ? btrfs_quota_enable+0x12f0/0x12f0 [btrfs] [373.691340] ? down_write+0xd0/0x130 [373.696880] ? down_write_killable+0x150/0x150 [373.703352] btrfs_ioctl+0x3945/0x71b0 [btrfs] [373.710061] ? find_held_lock+0x2c/0x110 [373.716192] ? lock_release+0x3a9/0x6d0 [373.722047] ? __handle_mm_fault+0x23cd/0x3050 [373.728486] ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs] [373.737032] ? set_pte+0x6a/0x90 [373.742271] ? do_raw_spin_unlock+0x55/0x1f0 [373.748506] ? lock_is_held_type+0xe4/0x140 [373.754792] ? vfs_fileattr_set+0x9f0/0x9f0 [373.761083] ? selinux_file_ioctl+0x349/0x4e0 [373.767521] ? selinux_inode_getsecctx+0x80/0x80 [373.774247] ? __up_read+0x182/0x6e0 [373.780026] ? count_memcg_events.constprop.0+0x46/0x60 [373.787281] ? up_write+0x460/0x460 [373.792932] ? security_file_ioctl+0x50/0x90 [373.799232] __x64_sys_ioctl+0x127/0x190 [373.805237] do_syscall_64+0x3b/0x90 [373.810947] entry_SYSCALL_64_after_hwframe+0x44/0xae [373.818102] RIP: 0033:0x7f1383ea02bb [373.823847] RSP: 002b:00007fffeb4d71f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [373.833641] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1383ea02bb [373.842961] RDX: 00007fffeb4d7210 RSI: 00000000c0109428 RDI: 0000000000000003 [373.852179] RBP: 0000000000000003 R08: 0000000000000003 R09: 0000000000000078 [373.861408] R10: 00007f1383daec78 R11: 0000000000000202 R12: 00007fffeb4d874a [373.870647] R13: 0000000000493099 R14: 0000000000000001 R15: 0000000000000000 [373.879838] </TASK> [373.884018] Showing all locks held in the system: [373.894250] 3 locks held by kworker/4:1/58: [373.900356] 1 lock held by khungtaskd/63: [373.906333] #0: ffffffff8945ff60 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 [373.917307] 3 locks held by kworker/u16:6/103: [373.923938] #0: ffff888127b4f138 ((wq_completion)btrfs-qgroup-rescan){+.+.}-{0:0}, at: process_one_work+0x712/0x1320 [373.936555] #1: ffff88810b817dd8 ((work_completion)(&work->normal_work)){+.+.}-{0:0}, at: process_one_work+0x73f/0x1320 [373.951109] #2: ffff888102dd4650 (sb_internal#2){.+.+}-{0:0}, at: btrfs_qgroup_rescan_worker+0x1f6/0x10c0 [btrfs] [373.964027] 2 locks held by less/1803: [373.969982] #0: ffff88813ed56098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x80 [373.981295] #1: ffffc90000b3b2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x9e2/0x1060 [373.992969] 1 lock held by btrfs-transacti/2347: [373.999893] #0: ffff88813d4887a8 (&fs_info->transaction_kthread_mutex){+.+.}-{3:3}, at: transaction_kthread+0xe3/0x3c0 [btrfs] [374.015872] 3 locks held by btrfs/3145: [374.022298] #0: ffff888102dd4460 (sb_writers#18){.+.+}-{0:0}, at: btrfs_ioctl_balance+0xc3/0x700 [btrfs] [374.034456] #1: ffff88813d48a0a0 (&fs_info->reclaim_bgs_lock){+.+.}-{3:3}, at: btrfs_balance+0xfe5/0x2d20 [btrfs] [374.047646] #2: ffff88813d488838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x354/0x930 [btrfs] [374.063295] 4 locks held by btrfs/3146: [374.069647] #0: ffff888102dd4460 (sb_writers#18){.+.+}-{0:0}, at: btrfs_ioctl+0x38b1/0x71b0 [btrfs] [374.081601] #1: ffff88813d488bb8 (&fs_info->subvol_sem){+.+.}-{3:3}, at: btrfs_ioctl+0x38fd/0x71b0 [btrfs] [374.094283] #2: ffff888102dd4650 (sb_internal#2){.+.+}-{0:0}, at: btrfs_quota_disable+0xc8/0x9a0 [btrfs] [374.106885] #3: ffff88813d489800 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_disable+0xd5/0x9a0 [btrfs] [374.126780] ============================================= To avoid the deadlock, wait for the qgroup rescan worker to complete before starting the transaction for the quota disable ioctl. Clear BTRFS_FS_QUOTA_ENABLE flag before the wait and the transaction to request the worker to complete. On transaction start failure, set the BTRFS_FS_QUOTA_ENABLE flag again. These BTRFS_FS_QUOTA_ENABLE flag changes can be done safely since the function btrfs_quota_disable is not called concurrently because of fs_info->subvol_sem. Also check the BTRFS_FS_QUOTA_ENABLE flag in qgroup_rescan_init to avoid another qgroup rescan worker to start after the previous qgroup worker completed. CC: [email protected] # 5.4+ Suggested-by: Nikolay Borisov <[email protected]> Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Signed-off-by: David Sterba <[email protected]>
…egulator The interrupt pin of the external ethernet phy is used, instead of the enable-gpio pin of the tf-io regulator. The GPIOE_2 pin is located in the gpio_ao bank. This causes phy interrupt problems at system startup. [ 76.645190] irq 36: nobody cared (try booting with the "irqpoll" option) [ 76.649617] CPU: 0 PID: 1416 Comm: irq/36-0.0:00 Not tainted 5.16.0 #2 [ 76.649629] Hardware name: Hardkernel ODROID-HC4 (DT) [ 76.649635] Call trace: [ 76.649638] dump_backtrace+0x0/0x1c8 [ 76.649658] show_stack+0x14/0x60 [ 76.649667] dump_stack_lvl+0x64/0x7c [ 76.649676] dump_stack+0x14/0x2c [ 76.649683] __report_bad_irq+0x38/0xe8 [ 76.649695] note_interrupt+0x220/0x3a0 [ 76.649704] handle_irq_event_percpu+0x58/0x88 [ 76.649713] handle_irq_event+0x44/0xd8 [ 76.649721] handle_fasteoi_irq+0xa8/0x130 [ 76.649730] generic_handle_domain_irq+0x38/0x58 [ 76.649738] gic_handle_irq+0x9c/0xb8 [ 76.649747] call_on_irq_stack+0x28/0x38 [ 76.649755] do_interrupt_handler+0x7c/0x80 [ 76.649763] el1_interrupt+0x34/0x80 [ 76.649772] el1h_64_irq_handler+0x14/0x20 [ 76.649781] el1h_64_irq+0x74/0x78 [ 76.649788] irq_finalize_oneshot.part.56+0x68/0xf8 [ 76.649796] irq_thread_fn+0x5c/0x98 [ 76.649804] irq_thread+0x13c/0x260 [ 76.649812] kthread+0x144/0x178 [ 76.649822] ret_from_fork+0x10/0x20 [ 76.649830] handlers: [ 76.653170] [<0000000025a6cd31>] irq_default_primary_handler threaded [<0000000093580eb7>] phy_interrupt [ 76.661256] Disabling IRQ torvalds#36 Fixes: 1f80a5c ("arm64: dts: meson-sm1-odroid: add missing enable gpio and supply for tf_io regulator") Signed-off-by: Lutz Koschorreck <[email protected]> Reviewed-by: Neil Armstrong <[email protected]> Signed-off-by: Neil Armstrong <[email protected]> [narmstrong: removed spurious invalid & blank lines from commit message] Link: https://lore.kernel.org/r/20220127130537.GA187347@odroid-VirtualBox
…/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.17, take #2 - A couple of fixes when handling an exception while a SError has been delivered - Workaround for Cortex-A510's single-step[ erratum
When using the flushoncommit mount option, during almost every transaction commit we trigger a warning from __writeback_inodes_sb_nr(): $ cat fs/fs-writeback.c: (...) static void __writeback_inodes_sb_nr(struct super_block *sb, ... { (...) WARN_ON(!rwsem_is_locked(&sb->s_umount)); (...) } (...) The trace produced in dmesg looks like the following: [947.473890] WARNING: CPU: 5 PID: 930 at fs/fs-writeback.c:2610 __writeback_inodes_sb_nr+0x7e/0xb3 [947.481623] Modules linked in: nfsd nls_cp437 cifs asn1_decoder cifs_arc4 fscache cifs_md4 ipmi_ssif [947.489571] CPU: 5 PID: 930 Comm: btrfs-transacti Not tainted 95.16.3-srb-asrock-00001-g36437ad63879 torvalds#186 [947.497969] RIP: 0010:__writeback_inodes_sb_nr+0x7e/0xb3 [947.502097] Code: 24 10 4c 89 44 24 18 c6 (...) [947.519760] RSP: 0018:ffffc90000777e10 EFLAGS: 00010246 [947.523818] RAX: 0000000000000000 RBX: 0000000000963300 RCX: 0000000000000000 [947.529765] RDX: 0000000000000000 RSI: 000000000000fa51 RDI: ffffc90000777e50 [947.535740] RBP: ffff888101628a90 R08: ffff888100955800 R09: ffff888100956000 [947.541701] R10: 0000000000000002 R11: 0000000000000001 R12: ffff888100963488 [947.547645] R13: ffff888100963000 R14: ffff888112fb7200 R15: ffff888100963460 [947.553621] FS: 0000000000000000(0000) GS:ffff88841fd40000(0000) knlGS:0000000000000000 [947.560537] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [947.565122] CR2: 0000000008be50c4 CR3: 000000000220c000 CR4: 00000000001006e0 [947.571072] Call Trace: [947.572354] <TASK> [947.573266] btrfs_commit_transaction+0x1f1/0x998 [947.576785] ? start_transaction+0x3ab/0x44e [947.579867] ? schedule_timeout+0x8a/0xdd [947.582716] transaction_kthread+0xe9/0x156 [947.585721] ? btrfs_cleanup_transaction.isra.0+0x407/0x407 [947.590104] kthread+0x131/0x139 [947.592168] ? set_kthread_struct+0x32/0x32 [947.595174] ret_from_fork+0x22/0x30 [947.597561] </TASK> [947.598553] ---[ end trace 644721052755541c ]--- This is because we started using writeback_inodes_sb() to flush delalloc when committing a transaction (when using -o flushoncommit), in order to avoid deadlocks with filesystem freeze operations. This change was made by commit ce8ea7c ("btrfs: don't call btrfs_start_delalloc_roots in flushoncommit"). After that change we started producing that warning, and every now and then a user reports this since the warning happens too often, it spams dmesg/syslog, and a user is unsure if this reflects any problem that might compromise the filesystem's reliability. We can not just lock the sb->s_umount semaphore before calling writeback_inodes_sb(), because that would at least deadlock with filesystem freezing, since at fs/super.c:freeze_super() sync_filesystem() is called while we are holding that semaphore in write mode, and that can trigger a transaction commit, resulting in a deadlock. It would also trigger the same type of deadlock in the unmount path. Possibly, it could also introduce some other locking dependencies that lockdep would report. To fix this call try_to_writeback_inodes_sb() instead of writeback_inodes_sb(), because that will try to read lock sb->s_umount and then will only call writeback_inodes_sb() if it was able to lock it. This is fine because the cases where it can't read lock sb->s_umount are during a filesystem unmount or during a filesystem freeze - in those cases sb->s_umount is write locked and sync_filesystem() is called, which calls writeback_inodes_sb(). In other words, in all cases where we can't take a read lock on sb->s_umount, writeback is already being triggered elsewhere. An alternative would be to call btrfs_start_delalloc_roots() with a number of pages different from LONG_MAX, for example matching the number of delalloc bytes we currently have, in which case we would end up starting all delalloc with filemap_fdatawrite_wbc() and not with an async flush via filemap_flush() - that is only possible after the rather recent commit e076ab2 ("btrfs: shrink delalloc pages instead of full inodes"). However that creates a whole new can of worms due to new lock dependencies, which lockdep complains, like for example: [ 8948.247280] ====================================================== [ 8948.247823] WARNING: possible circular locking dependency detected [ 8948.248353] 5.17.0-rc1-btrfs-next-111 #1 Not tainted [ 8948.248786] ------------------------------------------------------ [ 8948.249320] kworker/u16:18/933570 is trying to acquire lock: [ 8948.249812] ffff9b3de1591690 (sb_internal#2){.+.+}-{0:0}, at: find_free_extent+0x141e/0x1590 [btrfs] [ 8948.250638] but task is already holding lock: [ 8948.251140] ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs] [ 8948.252018] which lock already depends on the new lock. [ 8948.252710] the existing dependency chain (in reverse order) is: [ 8948.253343] -> #2 (&root->delalloc_mutex){+.+.}-{3:3}: [ 8948.253950] __mutex_lock+0x90/0x900 [ 8948.254354] start_delalloc_inodes+0x78/0x400 [btrfs] [ 8948.254859] btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs] [ 8948.255408] btrfs_commit_transaction+0x32f/0xc00 [btrfs] [ 8948.255942] btrfs_mksubvol+0x380/0x570 [btrfs] [ 8948.256406] btrfs_mksnapshot+0x81/0xb0 [btrfs] [ 8948.256870] __btrfs_ioctl_snap_create+0x17f/0x190 [btrfs] [ 8948.257413] btrfs_ioctl_snap_create_v2+0xbb/0x140 [btrfs] [ 8948.257961] btrfs_ioctl+0x1196/0x3630 [btrfs] [ 8948.258418] __x64_sys_ioctl+0x83/0xb0 [ 8948.258793] do_syscall_64+0x3b/0xc0 [ 8948.259146] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 8948.259709] -> #1 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}: [ 8948.260330] __mutex_lock+0x90/0x900 [ 8948.260692] btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs] [ 8948.261234] btrfs_commit_transaction+0x32f/0xc00 [btrfs] [ 8948.261766] btrfs_set_free_space_cache_v1_active+0x38/0x60 [btrfs] [ 8948.262379] btrfs_start_pre_rw_mount+0x119/0x180 [btrfs] [ 8948.262909] open_ctree+0x1511/0x171e [btrfs] [ 8948.263359] btrfs_mount_root.cold+0x12/0xde [btrfs] [ 8948.263863] legacy_get_tree+0x30/0x50 [ 8948.264242] vfs_get_tree+0x28/0xc0 [ 8948.264594] vfs_kern_mount.part.0+0x71/0xb0 [ 8948.265017] btrfs_mount+0x11d/0x3a0 [btrfs] [ 8948.265462] legacy_get_tree+0x30/0x50 [ 8948.265851] vfs_get_tree+0x28/0xc0 [ 8948.266203] path_mount+0x2d4/0xbe0 [ 8948.266554] __x64_sys_mount+0x103/0x140 [ 8948.266940] do_syscall_64+0x3b/0xc0 [ 8948.267300] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 8948.267790] -> #0 (sb_internal#2){.+.+}-{0:0}: [ 8948.268322] __lock_acquire+0x12e8/0x2260 [ 8948.268733] lock_acquire+0xd7/0x310 [ 8948.269092] start_transaction+0x44c/0x6e0 [btrfs] [ 8948.269591] find_free_extent+0x141e/0x1590 [btrfs] [ 8948.270087] btrfs_reserve_extent+0x14b/0x280 [btrfs] [ 8948.270588] cow_file_range+0x17e/0x490 [btrfs] [ 8948.271051] btrfs_run_delalloc_range+0x345/0x7a0 [btrfs] [ 8948.271586] writepage_delalloc+0xb5/0x170 [btrfs] [ 8948.272071] __extent_writepage+0x156/0x3c0 [btrfs] [ 8948.272579] extent_write_cache_pages+0x263/0x460 [btrfs] [ 8948.273113] extent_writepages+0x76/0x130 [btrfs] [ 8948.273573] do_writepages+0xd2/0x1c0 [ 8948.273942] filemap_fdatawrite_wbc+0x68/0x90 [ 8948.274371] start_delalloc_inodes+0x17f/0x400 [btrfs] [ 8948.274876] btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs] [ 8948.275417] flush_space+0x1f2/0x630 [btrfs] [ 8948.275863] btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs] [ 8948.276438] process_one_work+0x252/0x5a0 [ 8948.276829] worker_thread+0x55/0x3b0 [ 8948.277189] kthread+0xf2/0x120 [ 8948.277506] ret_from_fork+0x22/0x30 [ 8948.277868] other info that might help us debug this: [ 8948.278548] Chain exists of: sb_internal#2 --> &fs_info->delalloc_root_mutex --> &root->delalloc_mutex [ 8948.279601] Possible unsafe locking scenario: [ 8948.280102] CPU0 CPU1 [ 8948.280508] ---- ---- [ 8948.280915] lock(&root->delalloc_mutex); [ 8948.281271] lock(&fs_info->delalloc_root_mutex); [ 8948.281915] lock(&root->delalloc_mutex); [ 8948.282487] lock(sb_internal#2); [ 8948.282800] *** DEADLOCK *** [ 8948.283333] 4 locks held by kworker/u16:18/933570: [ 8948.283750] #0: ffff9b3dc00a9d48 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0 [ 8948.284609] #1: ffffa90349dafe70 ((work_completion)(&fs_info->async_data_reclaim_work)){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0 [ 8948.285637] #2: ffff9b3e14db5040 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}, at: btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs] [ 8948.286674] #3: ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs] [ 8948.287596] stack backtrace: [ 8948.287975] CPU: 3 PID: 933570 Comm: kworker/u16:18 Not tainted 5.17.0-rc1-btrfs-next-111 #1 [ 8948.288677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 8948.289649] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs] [ 8948.290298] Call Trace: [ 8948.290517] <TASK> [ 8948.290700] dump_stack_lvl+0x59/0x73 [ 8948.291026] check_noncircular+0xf3/0x110 [ 8948.291375] ? start_transaction+0x228/0x6e0 [btrfs] [ 8948.291826] __lock_acquire+0x12e8/0x2260 [ 8948.292241] lock_acquire+0xd7/0x310 [ 8948.292714] ? find_free_extent+0x141e/0x1590 [btrfs] [ 8948.293241] ? lock_is_held_type+0xea/0x140 [ 8948.293601] start_transaction+0x44c/0x6e0 [btrfs] [ 8948.294055] ? find_free_extent+0x141e/0x1590 [btrfs] [ 8948.294518] find_free_extent+0x141e/0x1590 [btrfs] [ 8948.294957] ? _raw_spin_unlock+0x29/0x40 [ 8948.295312] ? btrfs_get_alloc_profile+0x124/0x290 [btrfs] [ 8948.295813] btrfs_reserve_extent+0x14b/0x280 [btrfs] [ 8948.296270] cow_file_range+0x17e/0x490 [btrfs] [ 8948.296691] btrfs_run_delalloc_range+0x345/0x7a0 [btrfs] [ 8948.297175] ? find_lock_delalloc_range+0x247/0x270 [btrfs] [ 8948.297678] writepage_delalloc+0xb5/0x170 [btrfs] [ 8948.298123] __extent_writepage+0x156/0x3c0 [btrfs] [ 8948.298570] extent_write_cache_pages+0x263/0x460 [btrfs] [ 8948.299061] extent_writepages+0x76/0x130 [btrfs] [ 8948.299495] do_writepages+0xd2/0x1c0 [ 8948.299817] ? sched_clock_cpu+0xd/0x110 [ 8948.300160] ? lock_release+0x155/0x4a0 [ 8948.300494] filemap_fdatawrite_wbc+0x68/0x90 [ 8948.300874] ? do_raw_spin_unlock+0x4b/0xa0 [ 8948.301243] start_delalloc_inodes+0x17f/0x400 [btrfs] [ 8948.301706] ? lock_release+0x155/0x4a0 [ 8948.302055] btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs] [ 8948.302564] flush_space+0x1f2/0x630 [btrfs] [ 8948.302970] btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs] [ 8948.303510] process_one_work+0x252/0x5a0 [ 8948.303860] ? process_one_work+0x5a0/0x5a0 [ 8948.304221] worker_thread+0x55/0x3b0 [ 8948.304543] ? process_one_work+0x5a0/0x5a0 [ 8948.304904] kthread+0xf2/0x120 [ 8948.305184] ? kthread_complete_and_exit+0x20/0x20 [ 8948.305598] ret_from_fork+0x22/0x30 [ 8948.305921] </TASK> It all comes from the fact that btrfs_start_delalloc_roots() takes the delalloc_root_mutex, in the transaction commit path we are holding a read lock on one of the superblock's freeze semaphores (via sb_start_intwrite()), the async reclaim task can also do a call to btrfs_start_delalloc_roots(), which ends up triggering writeback with calls to filemap_fdatawrite_wbc(), resulting in extent allocation which in turn can call btrfs_start_transaction(), which will result in taking the freeze semaphore via sb_start_intwrite(), forming a nasty dependency on all those locks which can be taken in different orders by different code paths. So just adopt the simple approach of calling try_to_writeback_inodes_sb() at btrfs_start_delalloc_flush(). Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Filipe Manana <[email protected]> [ add more link reports ] Signed-off-by: David Sterba <[email protected]>
This fixes the following trace caused by attempting to lock cmd_sync_work_lock while holding the rcu_read_lock: kworker/u3:2/212 is trying to lock: ffff888002600910 (&hdev->cmd_sync_work_lock){+.+.}-{3:3}, at: hci_cmd_sync_queue+0xad/0x140 other info that might help us debug this: context-{4:4} 4 locks held by kworker/u3:2/212: #0: ffff8880028c6530 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_one_work+0x4dc/0x9a0 #1: ffff888001aafde0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, at: process_one_work+0x4dc/0x9a0 #2: ffff888002600070 (&hdev->lock){+.+.}-{3:3}, at: hci_cc_le_set_cig_params+0x64/0x4f0 #3: ffffffffa5994b00 (rcu_read_lock){....}-{1:2}, at: hci_cc_le_set_cig_params+0x2f9/0x4f0 Fixes: 26afbd8 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Luiz Augusto von Dentz <[email protected]>
This attempts to fix the following trace: iso-tester/52 is trying to acquire lock: ffff8880024e0070 (&hdev->lock){+.+.}-{3:3}, at: iso_sock_listen+0x29e/0x440 but task is already holding lock: ffff888001978130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at: iso_sock_listen+0x8b/0x440 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}: lock_acquire+0x176/0x3d0 lock_sock_nested+0x32/0x80 iso_connect_cfm+0x1a3/0x630 hci_cc_le_setup_iso_path+0x195/0x340 hci_cmd_complete_evt+0x1ae/0x500 hci_event_packet+0x38e/0x7c0 hci_rx_work+0x34c/0x980 process_one_work+0x5a5/0x9a0 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x22/0x30 -> #1 (hci_cb_list_lock){+.+.}-{3:3}: lock_acquire+0x176/0x3d0 __mutex_lock+0x13b/0xf50 hci_le_remote_feat_complete_evt+0x17e/0x320 hci_event_packet+0x38e/0x7c0 hci_rx_work+0x34c/0x980 process_one_work+0x5a5/0x9a0 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x22/0x30 -> #0 (&hdev->lock){+.+.}-{3:3}: check_prev_add+0xfc/0x1190 __lock_acquire+0x1e27/0x2750 lock_acquire+0x176/0x3d0 __mutex_lock+0x13b/0xf50 iso_sock_listen+0x29e/0x440 __sys_listen+0xe6/0x160 __x64_sys_listen+0x25/0x30 do_syscall_64+0x42/0x90 entry_SYSCALL_64_after_hwframe+0x62/0xcc other info that might help us debug this: Chain exists of: &hdev->lock --> hci_cb_list_lock --> sk_lock-AF_BLUETOOTH-BTPROTO_ISO Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); lock(hci_cb_list_lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); lock(&hdev->lock); *** DEADLOCK *** 1 lock held by iso-tester/52: #0: ffff888001978130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at: iso_sock_listen+0x8b/0x440 Fixes: f764a6c ("Bluetooth: ISO: Add broadcast support") Signed-off-by: Luiz Augusto von Dentz <[email protected]>
The cited commit changed class of tc_ht internal mutex in order to avoid false lock dependency with fs_core node and flow_table hash table structures. However, hash table implementation internally also includes a workqueue task with its own lockdep map which causes similar bogus lockdep splat[0]. Fix it by also adding dedicated class for hash table workqueue work structure of tc_ht. [0]: [ 1139.672465] ====================================================== [ 1139.673552] WARNING: possible circular locking dependency detected [ 1139.674635] 6.1.0_for_upstream_debug_2022_12_12_17_02 #1 Not tainted [ 1139.675734] ------------------------------------------------------ [ 1139.676801] modprobe/5998 is trying to acquire lock: [ 1139.677726] ffff88811e7b93b8 (&node->lock){++++}-{3:3}, at: down_write_ref_node+0x7c/0xe0 [mlx5_core] [ 1139.679662] but task is already holding lock: [ 1139.680703] ffff88813c1f96a0 (&tc_ht_lock_key){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x38/0x6f0 [ 1139.682223] which lock already depends on the new lock. [ 1139.683640] the existing dependency chain (in reverse order) is: [ 1139.684887] -> #2 (&tc_ht_lock_key){+.+.}-{3:3}: [ 1139.685975] __mutex_lock+0x12c/0x14b0 [ 1139.686659] rht_deferred_worker+0x35/0x1540 [ 1139.687405] process_one_work+0x7c2/0x1310 [ 1139.688134] worker_thread+0x59d/0xec0 [ 1139.688820] kthread+0x28f/0x330 [ 1139.689444] ret_from_fork+0x1f/0x30 [ 1139.690106] -> #1 ((work_completion)(&ht->run_work)){+.+.}-{0:0}: [ 1139.691250] __flush_work+0xe8/0x900 [ 1139.691915] __cancel_work_timer+0x2ca/0x3f0 [ 1139.692655] rhashtable_free_and_destroy+0x22/0x6f0 [ 1139.693472] del_sw_flow_table+0x22/0xb0 [mlx5_core] [ 1139.694592] tree_put_node+0x24c/0x450 [mlx5_core] [ 1139.695686] tree_remove_node+0x6e/0x100 [mlx5_core] [ 1139.696803] mlx5_destroy_flow_table+0x187/0x690 [mlx5_core] [ 1139.698017] mlx5e_tc_nic_cleanup+0x2f8/0x400 [mlx5_core] [ 1139.699217] mlx5e_cleanup_nic_rx+0x2b/0x210 [mlx5_core] [ 1139.700397] mlx5e_detach_netdev+0x19d/0x2b0 [mlx5_core] [ 1139.701571] mlx5e_suspend+0xdb/0x140 [mlx5_core] [ 1139.702665] mlx5e_remove+0x89/0x190 [mlx5_core] [ 1139.703756] auxiliary_bus_remove+0x52/0x70 [ 1139.704492] device_release_driver_internal+0x3c1/0x600 [ 1139.705360] bus_remove_device+0x2a5/0x560 [ 1139.706080] device_del+0x492/0xb80 [ 1139.706724] mlx5_rescan_drivers_locked+0x194/0x6a0 [mlx5_core] [ 1139.707961] mlx5_unregister_device+0x7a/0xa0 [mlx5_core] [ 1139.709138] mlx5_uninit_one+0x5f/0x160 [mlx5_core] [ 1139.710252] remove_one+0xd1/0x160 [mlx5_core] [ 1139.711297] pci_device_remove+0x96/0x1c0 [ 1139.722721] device_release_driver_internal+0x3c1/0x600 [ 1139.723590] unbind_store+0x1b1/0x200 [ 1139.724259] kernfs_fop_write_iter+0x348/0x520 [ 1139.725019] vfs_write+0x7b2/0xbf0 [ 1139.725658] ksys_write+0xf3/0x1d0 [ 1139.726292] do_syscall_64+0x3d/0x90 [ 1139.726942] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 1139.727769] -> #0 (&node->lock){++++}-{3:3}: [ 1139.728698] __lock_acquire+0x2cf5/0x62f0 [ 1139.729415] lock_acquire+0x1c1/0x540 [ 1139.730076] down_write+0x8e/0x1f0 [ 1139.730709] down_write_ref_node+0x7c/0xe0 [mlx5_core] [ 1139.731841] mlx5_del_flow_rules+0x6f/0x610 [mlx5_core] [ 1139.732982] __mlx5_eswitch_del_rule+0xdd/0x560 [mlx5_core] [ 1139.734207] mlx5_eswitch_del_offloaded_rule+0x14/0x20 [mlx5_core] [ 1139.735491] mlx5e_tc_rule_unoffload+0x104/0x2b0 [mlx5_core] [ 1139.736716] mlx5e_tc_unoffload_fdb_rules+0x10c/0x1f0 [mlx5_core] [ 1139.738007] mlx5e_tc_del_fdb_flow+0xc3c/0xfa0 [mlx5_core] [ 1139.739213] mlx5e_tc_del_flow+0x146/0xa20 [mlx5_core] [ 1139.740377] _mlx5e_tc_del_flow+0x38/0x60 [mlx5_core] [ 1139.741534] rhashtable_free_and_destroy+0x3be/0x6f0 [ 1139.742351] mlx5e_tc_ht_cleanup+0x1b/0x30 [mlx5_core] [ 1139.743512] mlx5e_cleanup_rep_tx+0x4a/0xe0 [mlx5_core] [ 1139.744683] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] [ 1139.745860] mlx5e_netdev_change_profile+0xd9/0x1c0 [mlx5_core] [ 1139.747098] mlx5e_netdev_attach_nic_profile+0x1b/0x30 [mlx5_core] [ 1139.748372] mlx5e_vport_rep_unload+0x16a/0x1b0 [mlx5_core] [ 1139.749590] __esw_offloads_unload_rep+0xb1/0xd0 [mlx5_core] [ 1139.750813] mlx5_eswitch_unregister_vport_reps+0x409/0x5f0 [mlx5_core] [ 1139.752147] mlx5e_rep_remove+0x62/0x80 [mlx5_core] [ 1139.753293] auxiliary_bus_remove+0x52/0x70 [ 1139.754028] device_release_driver_internal+0x3c1/0x600 [ 1139.754885] driver_detach+0xc1/0x180 [ 1139.755553] bus_remove_driver+0xef/0x2e0 [ 1139.756260] auxiliary_driver_unregister+0x16/0x50 [ 1139.757059] mlx5e_rep_cleanup+0x19/0x30 [mlx5_core] [ 1139.758207] mlx5e_cleanup+0x12/0x30 [mlx5_core] [ 1139.759295] mlx5_cleanup+0xc/0x49 [mlx5_core] [ 1139.760384] __x64_sys_delete_module+0x2b5/0x450 [ 1139.761166] do_syscall_64+0x3d/0x90 [ 1139.761827] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 1139.762663] other info that might help us debug this: [ 1139.763925] Chain exists of: &node->lock --> (work_completion)(&ht->run_work) --> &tc_ht_lock_key [ 1139.765743] Possible unsafe locking scenario: [ 1139.766688] CPU0 CPU1 [ 1139.767399] ---- ---- [ 1139.768111] lock(&tc_ht_lock_key); [ 1139.768704] lock((work_completion)(&ht->run_work)); [ 1139.769869] lock(&tc_ht_lock_key); [ 1139.770770] lock(&node->lock); [ 1139.771326] *** DEADLOCK *** [ 1139.772345] 2 locks held by modprobe/5998: [ 1139.772994] #0: ffff88813c1ff0e8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x8d/0x600 [ 1139.774399] #1: ffff88813c1f96a0 (&tc_ht_lock_key){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x38/0x6f0 [ 1139.775822] stack backtrace: [ 1139.776579] CPU: 3 PID: 5998 Comm: modprobe Not tainted 6.1.0_for_upstream_debug_2022_12_12_17_02 #1 [ 1139.777935] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 1139.779529] Call Trace: [ 1139.779992] <TASK> [ 1139.780409] dump_stack_lvl+0x57/0x7d [ 1139.781015] check_noncircular+0x278/0x300 [ 1139.781687] ? print_circular_bug+0x460/0x460 [ 1139.782381] ? rcu_read_lock_sched_held+0x3f/0x70 [ 1139.783121] ? lock_release+0x487/0x7c0 [ 1139.783759] ? orc_find.part.0+0x1f1/0x330 [ 1139.784423] ? mark_lock.part.0+0xef/0x2fc0 [ 1139.785091] __lock_acquire+0x2cf5/0x62f0 [ 1139.785754] ? register_lock_class+0x18e0/0x18e0 [ 1139.786483] lock_acquire+0x1c1/0x540 [ 1139.787093] ? down_write_ref_node+0x7c/0xe0 [mlx5_core] [ 1139.788195] ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0 [ 1139.788978] ? register_lock_class+0x18e0/0x18e0 [ 1139.789715] down_write+0x8e/0x1f0 [ 1139.790292] ? down_write_ref_node+0x7c/0xe0 [mlx5_core] [ 1139.791380] ? down_write_killable+0x220/0x220 [ 1139.792080] ? find_held_lock+0x2d/0x110 [ 1139.792713] down_write_ref_node+0x7c/0xe0 [mlx5_core] [ 1139.793795] mlx5_del_flow_rules+0x6f/0x610 [mlx5_core] [ 1139.794879] __mlx5_eswitch_del_rule+0xdd/0x560 [mlx5_core] [ 1139.796032] ? __esw_offloads_unload_rep+0xd0/0xd0 [mlx5_core] [ 1139.797227] ? xa_load+0x11a/0x200 [ 1139.797800] ? __xa_clear_mark+0xf0/0xf0 [ 1139.798438] mlx5_eswitch_del_offloaded_rule+0x14/0x20 [mlx5_core] [ 1139.799660] mlx5e_tc_rule_unoffload+0x104/0x2b0 [mlx5_core] [ 1139.800821] mlx5e_tc_unoffload_fdb_rules+0x10c/0x1f0 [mlx5_core] [ 1139.802049] ? mlx5_eswitch_get_uplink_priv+0x25/0x80 [mlx5_core] [ 1139.803260] mlx5e_tc_del_fdb_flow+0xc3c/0xfa0 [mlx5_core] [ 1139.804398] ? __cancel_work_timer+0x1c2/0x3f0 [ 1139.805099] ? mlx5e_tc_unoffload_from_slow_path+0x460/0x460 [mlx5_core] [ 1139.806387] mlx5e_tc_del_flow+0x146/0xa20 [mlx5_core] [ 1139.807481] _mlx5e_tc_del_flow+0x38/0x60 [mlx5_core] [ 1139.808564] rhashtable_free_and_destroy+0x3be/0x6f0 [ 1139.809336] ? mlx5e_tc_del_flow+0xa20/0xa20 [mlx5_core] [ 1139.809336] ? mlx5e_tc_del_flow+0xa20/0xa20 [mlx5_core] [ 1139.810455] mlx5e_tc_ht_cleanup+0x1b/0x30 [mlx5_core] [ 1139.811552] mlx5e_cleanup_rep_tx+0x4a/0xe0 [mlx5_core] [ 1139.812655] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] [ 1139.813768] mlx5e_netdev_change_profile+0xd9/0x1c0 [mlx5_core] [ 1139.814952] mlx5e_netdev_attach_nic_profile+0x1b/0x30 [mlx5_core] [ 1139.816166] mlx5e_vport_rep_unload+0x16a/0x1b0 [mlx5_core] [ 1139.817336] __esw_offloads_unload_rep+0xb1/0xd0 [mlx5_core] [ 1139.818507] mlx5_eswitch_unregister_vport_reps+0x409/0x5f0 [mlx5_core] [ 1139.819788] ? mlx5_eswitch_uplink_get_proto_dev+0x30/0x30 [mlx5_core] [ 1139.821051] ? kernfs_find_ns+0x137/0x310 [ 1139.821705] mlx5e_rep_remove+0x62/0x80 [mlx5_core] [ 1139.822778] auxiliary_bus_remove+0x52/0x70 [ 1139.823449] device_release_driver_internal+0x3c1/0x600 [ 1139.824240] driver_detach+0xc1/0x180 [ 1139.824842] bus_remove_driver+0xef/0x2e0 [ 1139.825504] auxiliary_driver_unregister+0x16/0x50 [ 1139.826245] mlx5e_rep_cleanup+0x19/0x30 [mlx5_core] [ 1139.827322] mlx5e_cleanup+0x12/0x30 [mlx5_core] [ 1139.828345] mlx5_cleanup+0xc/0x49 [mlx5_core] [ 1139.829382] __x64_sys_delete_module+0x2b5/0x450 [ 1139.830119] ? module_flags+0x300/0x300 [ 1139.830750] ? task_work_func_match+0x50/0x50 [ 1139.831440] ? task_work_cancel+0x20/0x20 [ 1139.832088] ? lockdep_hardirqs_on_prepare+0x273/0x3f0 [ 1139.832873] ? syscall_enter_from_user_mode+0x1d/0x50 [ 1139.833661] ? trace_hardirqs_on+0x2d/0x100 [ 1139.834328] do_syscall_64+0x3d/0x90 [ 1139.834922] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 1139.835700] RIP: 0033:0x7f153e71288b [ 1139.836302] Code: 73 01 c3 48 8b 0d 9d 75 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6d 75 0e 00 f7 d8 64 89 01 48 [ 1139.838866] RSP: 002b:00007ffe0a3ed938 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 1139.840020] RAX: ffffffffffffffda RBX: 0000564c2cbf8220 RCX: 00007f153e71288b [ 1139.841043] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000564c2cbf8288 [ 1139.842072] RBP: 0000564c2cbf8220 R08: 0000000000000000 R09: 0000000000000000 [ 1139.843094] R10: 00007f153e7a3ac0 R11: 0000000000000206 R12: 0000564c2cbf8288 [ 1139.844118] R13: 0000000000000000 R14: 0000564c2cbf7ae8 R15: 00007ffe0a3efcb8 Fixes: 9ba3333 ("net/mlx5e: Avoid false lock depenency warning on tc_ht") Signed-off-by: Vlad Buslov <[email protected]> Reviewed-by: Eli Cohen <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
The commit 4af1b64 ("octeontx2-pf: Fix lmtst ID used in aura free") uses the get/put_cpu() to protect the usage of percpu pointer in ->aura_freeptr() callback, but it also unnecessarily disable the preemption for the blockable memory allocation. The commit 87b93b6 ("octeontx2-pf: Avoid use of GFP_KERNEL in atomic context") tried to fix these sleep inside atomic warnings. But it only fix the one for the non-rt kernel. For the rt kernel, we still get the similar warnings like below. BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0 preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 3 locks held by swapper/0/1: #0: ffff800009fc5fe8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x30 #1: ffff000100c276c0 (&mbox->lock){+.+.}-{3:3}, at: otx2_init_hw_resources+0x8c/0x3a4 #2: ffffffbfef6537e0 (&cpu_rcache->lock){+.+.}-{2:2}, at: alloc_iova_fast+0x1ac/0x2ac Preemption disabled at: [<ffff800008b1908c>] otx2_rq_aura_pool_init+0x14c/0x284 CPU: 20 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc3-rt1-yocto-preempt-rt #1 Hardware name: Marvell OcteonTX CN96XX board (DT) Call trace: dump_backtrace.part.0+0xe8/0xf4 show_stack+0x20/0x30 dump_stack_lvl+0x9c/0xd8 dump_stack+0x18/0x34 __might_resched+0x188/0x224 rt_spin_lock+0x64/0x110 alloc_iova_fast+0x1ac/0x2ac iommu_dma_alloc_iova+0xd4/0x110 __iommu_dma_map+0x80/0x144 iommu_dma_map_page+0xe8/0x260 dma_map_page_attrs+0xb4/0xc0 __otx2_alloc_rbuf+0x90/0x150 otx2_rq_aura_pool_init+0x1c8/0x284 otx2_init_hw_resources+0xe4/0x3a4 otx2_open+0xf0/0x610 __dev_open+0x104/0x224 __dev_change_flags+0x1e4/0x274 dev_change_flags+0x2c/0x7c ic_open_devs+0x124/0x2f8 ip_auto_config+0x180/0x42c do_one_initcall+0x90/0x4dc do_basic_setup+0x10c/0x14c kernel_init_freeable+0x10c/0x13c kernel_init+0x2c/0x140 ret_from_fork+0x10/0x20 Of course, we can shuffle the get/put_cpu() to only wrap the invocation of ->aura_freeptr() as what commit 87b93b6 does. But there are only two ->aura_freeptr() callbacks, otx2_aura_freeptr() and cn10k_aura_freeptr(). There is no usage of perpcu variable in the otx2_aura_freeptr() at all, so the get/put_cpu() seems redundant to it. We can move the get/put_cpu() into the corresponding callback which really has the percpu variable usage and avoid the sprinkling of get/put_cpu() in several places. Fixes: 4af1b64 ("octeontx2-pf: Fix lmtst ID used in aura free") Signed-off-by: Kevin Hao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
…kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.2, take #2 - Pass the correct address to mte_clear_page_tags() on initialising a tagged page - Plug a race against a GICv4.1 doorbell interrupt while saving the vgic-v3 pending state.
Jakub Sitnicki says: ==================== This patch set addresses the syzbot report in [1]. Patch #1 has been suggested by Eric [2]. I extended it to cover the rest of sock_map proto callbacks. Otherwise we would still overflow the stack. Patch #2 contains the actual fix and bug analysis. Patches #3 & #4 add coverage to selftests to trigger the bug. [1] https://lore.kernel.org/all/[email protected]/ [2] https://lore.kernel.org/all/CANn89iK2UN1FmdUcH12fv_xiZkv2G+Nskvmq7fG6aA_6VKRf6g@mail.gmail.com/ --- v1 -> v2: v1: https://lore.kernel.org/r/[email protected] [v1 didn't hit bpf@ ML by mistake] * pull in Eric's patch to protect against recursion loop bugs (Eric) * add a macro helper to check if pointer is inside a memory range (Eric) ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
…eues(). Christoph Paasch reported that commit b5fc292 ("inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().") started triggering WARN_ON_ONCE(sk->sk_forward_alloc) in sk_stream_kill_queues(). [0 - 2] Also, we can reproduce it by a program in [3]. In the commit, we delay freeing ipv6_pinfo.pktoptions from sk->destroy() to sk->sk_destruct(), so sk->sk_forward_alloc is no longer zero in inet_csk_destroy_sock(). The same check has been in inet_sock_destruct() from at least v2.6, we can just remove the WARN_ON_ONCE(). However, among the users of sk_stream_kill_queues(), only CAIF is not calling inet_sock_destruct(). Thus, we add the same WARN_ON_ONCE() to caif_sock_destructor(). [0]: https://lore.kernel.org/netdev/[email protected]/ [1]: multipath-tcp/mptcp_net-next#341 [2]: WARNING: CPU: 0 PID: 3232 at net/core/stream.c:212 sk_stream_kill_queues+0x2f9/0x3e0 Modules linked in: CPU: 0 PID: 3232 Comm: syz-executor.0 Not tainted 6.2.0-rc5ab24eb4698afbe147b424149c529e2a43ec24eb5 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:sk_stream_kill_queues+0x2f9/0x3e0 Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e ec 00 00 00 8b ab 08 01 00 00 e9 60 ff ff ff e8 d0 5f b6 fe 0f 0b eb 97 e8 c7 5f b6 fe <0f> 0b eb a0 e8 be 5f b6 fe 0f 0b e9 6a fe ff ff e8 02 07 e3 fe e9 RSP: 0018:ffff88810570fc68 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff888101f38f40 RSI: ffffffff8285e529 RDI: 0000000000000005 RBP: 0000000000000ce0 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000ce0 R11: 0000000000000001 R12: ffff8881009e9488 R13: ffffffff84af2cc0 R14: 0000000000000000 R15: ffff8881009e9458 FS: 00007f7fdfbd5800(0000) GS:ffff88811b600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b32923000 CR3: 00000001062fc006 CR4: 0000000000170ef0 Call Trace: <TASK> inet_csk_destroy_sock+0x1a1/0x320 __tcp_close+0xab6/0xe90 tcp_close+0x30/0xc0 inet_release+0xe9/0x1f0 inet6_release+0x4c/0x70 __sock_release+0xd2/0x280 sock_close+0x15/0x20 __fput+0x252/0xa20 task_work_run+0x169/0x250 exit_to_user_mode_prepare+0x113/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f7fdf7ae28d Code: c1 20 00 00 75 10 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 37 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01 RSP: 002b:00000000007dfbb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f7fdf7ae28d RDX: 0000000000000000 RSI: ffffffffffffffff RDI: 0000000000000003 RBP: 0000000000000000 R08: 000000007f338e0f R09: 0000000000000e0f R10: 000000007f338e13 R11: 0000000000000293 R12: 00007f7fdefff000 R13: 00007f7fdefffcd8 R14: 00007f7fdefffce0 R15: 00007f7fdefffcd8 </TASK> [3]: https://lore.kernel.org/netdev/[email protected]/ Fixes: b5fc292 ("inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().") Reported-by: syzbot <[email protected]> Reported-by: Christoph Paasch <[email protected]> Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
No description provided.