Skip to content

Commit

Permalink
Merge remote-tracking branch 'origin/upstreams/develop' into develop
Browse files Browse the repository at this point in the history
  • Loading branch information
jwk404 committed Jul 25, 2024
2 parents 22f6364 + 47cd938 commit 11e22c3
Show file tree
Hide file tree
Showing 654 changed files with 16,601 additions and 7,456 deletions.
1 change: 1 addition & 0 deletions Documentation/ABI/testing/sysfs-devices-system-cpu
Original file line number Diff line number Diff line change
Expand Up @@ -517,6 +517,7 @@ What: /sys/devices/system/cpu/vulnerabilities
/sys/devices/system/cpu/vulnerabilities/mds
/sys/devices/system/cpu/vulnerabilities/meltdown
/sys/devices/system/cpu/vulnerabilities/mmio_stale_data
/sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling
/sys/devices/system/cpu/vulnerabilities/retbleed
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
/sys/devices/system/cpu/vulnerabilities/spectre_v1
Expand Down
74 changes: 74 additions & 0 deletions Documentation/admin-guide/filesystem-monitoring.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
.. SPDX-License-Identifier: GPL-2.0
====================================
File system Monitoring with fanotify
====================================

File system Error Reporting
===========================

Fanotify supports the FAN_FS_ERROR event type for file system-wide error
reporting. It is meant to be used by file system health monitoring
daemons, which listen for these events and take actions (notify
sysadmin, start recovery) when a file system problem is detected.

By design, a FAN_FS_ERROR notification exposes sufficient information
for a monitoring tool to know a problem in the file system has happened.
It doesn't necessarily provide a user space application with semantics
to verify an IO operation was successfully executed. That is out of
scope for this feature. Instead, it is only meant as a framework for
early file system problem detection and reporting recovery tools.

When a file system operation fails, it is common for dozens of kernel
errors to cascade after the initial failure, hiding the original failure
log, which is usually the most useful debug data to troubleshoot the
problem. For this reason, FAN_FS_ERROR tries to report only the first
error that occurred for a file system since the last notification, and
it simply counts additional errors. This ensures that the most
important pieces of information are never lost.

FAN_FS_ERROR requires the fanotify group to be setup with the
FAN_REPORT_FID flag.

At the time of this writing, the only file system that emits FAN_FS_ERROR
notifications is Ext4.

A FAN_FS_ERROR Notification has the following format::

[ Notification Metadata (Mandatory) ]
[ Generic Error Record (Mandatory) ]
[ FID record (Mandatory) ]

The order of records is not guaranteed, and new records might be added
in the future. Therefore, applications must not rely on the order and
must be prepared to skip over unknown records. Please refer to
``samples/fanotify/fs-monitor.c`` for an example parser.

Generic error record
--------------------

The generic error record provides enough information for a file system
agnostic tool to learn about a problem in the file system, without
providing any additional details about the problem. This record is
identified by ``struct fanotify_event_info_header.info_type`` being set
to FAN_EVENT_INFO_TYPE_ERROR.

struct fanotify_event_info_error {
struct fanotify_event_info_header hdr;
__s32 error;
__u32 error_count;
};

The `error` field identifies the type of error using errno values.
`error_count` tracks the number of errors that occurred and were
suppressed to preserve the original error information, since the last
notification.

FID record
----------

The FID record can be used to uniquely identify the inode that triggered
the error through the combination of fsid and file handle. A file system
specific application can use that information to attempt a recovery
procedure. Errors that are not related to an inode are reported with an
empty file handle of type FILEID_INVALID.
1 change: 1 addition & 0 deletions Documentation/admin-guide/hw-vuln/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -21,3 +21,4 @@ are configurable at compile, boot or run time.
cross-thread-rsb.rst
gather_data_sampling.rst
srso
reg-file-data-sampling
104 changes: 104 additions & 0 deletions Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
==================================
Register File Data Sampling (RFDS)
==================================

Register File Data Sampling (RFDS) is a microarchitectural vulnerability that
only affects Intel Atom parts(also branded as E-cores). RFDS may allow
a malicious actor to infer data values previously used in floating point
registers, vector registers, or integer registers. RFDS does not provide the
ability to choose which data is inferred. CVE-2023-28746 is assigned to RFDS.

Affected Processors
===================
Below is the list of affected Intel processors [#f1]_:

=================== ============
Common name Family_Model
=================== ============
ATOM_GOLDMONT 06_5CH
ATOM_GOLDMONT_D 06_5FH
ATOM_GOLDMONT_PLUS 06_7AH
ATOM_TREMONT_D 06_86H
ATOM_TREMONT 06_96H
ALDERLAKE 06_97H
ALDERLAKE_L 06_9AH
ATOM_TREMONT_L 06_9CH
RAPTORLAKE 06_B7H
RAPTORLAKE_P 06_BAH
ALDERLAKE_N 06_BEH
RAPTORLAKE_S 06_BFH
=================== ============

As an exception to this table, Intel Xeon E family parts ALDERLAKE(06_97H) and
RAPTORLAKE(06_B7H) codenamed Catlow are not affected. They are reported as
vulnerable in Linux because they share the same family/model with an affected
part. Unlike their affected counterparts, they do not enumerate RFDS_CLEAR or
CPUID.HYBRID. This information could be used to distinguish between the
affected and unaffected parts, but it is deemed not worth adding complexity as
the reporting is fixed automatically when these parts enumerate RFDS_NO.

Mitigation
==========
Intel released a microcode update that enables software to clear sensitive
information using the VERW instruction. Like MDS, RFDS deploys the same
mitigation strategy to force the CPU to clear the affected buffers before an
attacker can extract the secrets. This is achieved by using the otherwise
unused and obsolete VERW instruction in combination with a microcode update.
The microcode clears the affected CPU buffers when the VERW instruction is
executed.

Mitigation points
-----------------
VERW is executed by the kernel before returning to user space, and by KVM
before VMentry. None of the affected cores support SMT, so VERW is not required
at C-state transitions.

New bits in IA32_ARCH_CAPABILITIES
----------------------------------
Newer processors and microcode update on existing affected processors added new
bits to IA32_ARCH_CAPABILITIES MSR. These bits can be used to enumerate
vulnerability and mitigation capability:

- Bit 27 - RFDS_NO - When set, processor is not affected by RFDS.
- Bit 28 - RFDS_CLEAR - When set, processor is affected by RFDS, and has the
microcode that clears the affected buffers on VERW execution.

Mitigation control on the kernel command line
---------------------------------------------
The kernel command line allows to control RFDS mitigation at boot time with the
parameter "reg_file_data_sampling=". The valid arguments are:

========== =================================================================
on If the CPU is vulnerable, enable mitigation; CPU buffer clearing
on exit to userspace and before entering a VM.
off Disables mitigation.
========== =================================================================

Mitigation default is selected by CONFIG_MITIGATION_RFDS.

Mitigation status information
-----------------------------
The Linux kernel provides a sysfs interface to enumerate the current
vulnerability status of the system: whether the system is vulnerable, and
which mitigations are active. The relevant sysfs file is:

/sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling

The possible values in this file are:

.. list-table::

* - 'Not affected'
- The processor is not vulnerable
* - 'Vulnerable'
- The processor is vulnerable, but no mitigation enabled
* - 'Vulnerable: No microcode'
- The processor is vulnerable but microcode is not updated.
* - 'Mitigation: Clear Register File'
- The processor is vulnerable and the CPU buffer clearing mitigation is
enabled.

References
----------
.. [#f1] Affected Processors
https://www.intel.com/content/www/us/en/developer/topic-technology/software-security-guidance/processors-affected-consolidated-product-cpu-model.html
37 changes: 19 additions & 18 deletions Documentation/admin-guide/hw-vuln/spectre.rst
Original file line number Diff line number Diff line change
Expand Up @@ -439,12 +439,12 @@ The possible values in this file are:
- System is protected by retpoline
* - BHI: BHI_DIS_S
- System is protected by BHI_DIS_S
* - BHI: SW loop; KVM SW loop
* - BHI: SW loop, KVM SW loop
- System is protected by software clearing sequence
* - BHI: Syscall hardening
- Syscalls are hardened against BHI
* - BHI: Syscall hardening; KVM: SW loop
- System is protected from userspace attacks by syscall hardening; KVM is protected by software clearing sequence
* - BHI: Vulnerable
- System is vulnerable to BHI
* - BHI: Vulnerable, KVM: SW loop
- System is vulnerable; KVM is protected by software clearing sequence

Full mitigation might require a microcode update from the CPU
vendor. When the necessary microcode is not available, the kernel will
Expand Down Expand Up @@ -506,8 +506,12 @@ Spectre variant 2
between modes. Systems which support BHI_DIS_S will set it to protect against
BHI attacks.

Legacy IBRS systems clear the IBRS bit on exit to userspace and
therefore explicitly enable STIBP for that
On Intel's enhanced IBRS systems, this includes cross-thread branch target
injections on SMT systems (STIBP). In other words, Intel eIBRS enables
STIBP, too.

AMD Automatic IBRS does not protect userspace, and Legacy IBRS systems clear
the IBRS bit on exit to userspace, therefore both explicitly enable STIBP.

The retpoline mitigation is turned on by default on vulnerable
CPUs. It can be forced on or off by the administrator
Expand Down Expand Up @@ -641,9 +645,10 @@ kernel command line.
retpoline,generic Retpolines
retpoline,lfence LFENCE; indirect branch
retpoline,amd alias for retpoline,lfence
eibrs enhanced IBRS
eibrs,retpoline enhanced IBRS + Retpolines
eibrs,lfence enhanced IBRS + LFENCE
eibrs Enhanced/Auto IBRS
eibrs,retpoline Enhanced/Auto IBRS + Retpolines
eibrs,lfence Enhanced/Auto IBRS + LFENCE
ibrs use IBRS to protect kernel

Not specifying this option is equivalent to
spectre_v2=auto.
Expand Down Expand Up @@ -706,18 +711,14 @@ For user space mitigation:
spectre_bhi=

[X86] Control mitigation of Branch History Injection
(BHI) vulnerability. Syscalls are hardened against BHI
regardless of this setting. This setting affects the deployment
(BHI) vulnerability. This setting affects the deployment
of the HW BHI control and the SW BHB clearing sequence.

on
unconditionally enable.
(default) Enable the HW or SW mitigation as
needed.
off
unconditionally disable.
auto
enable if hardware mitigation
control(BHI_DIS_S) is available, otherwise
enable alternate mitigation in KVM.
Disable the mitigation.

For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt

Expand Down
1 change: 1 addition & 0 deletions Documentation/admin-guide/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,7 @@ configure specific aspects of kernel behavior to your liking.
edid
efi-stub
ext4
filesystem-monitoring
nfs/index
gpio/index
highuid
Expand Down
39 changes: 29 additions & 10 deletions Documentation/admin-guide/kernel-parameters.txt
Original file line number Diff line number Diff line change
Expand Up @@ -1053,6 +1053,26 @@
The filter can be disabled or changed to another
driver later using sysfs.

reg_file_data_sampling=
[X86] Controls mitigation for Register File Data
Sampling (RFDS) vulnerability. RFDS is a CPU
vulnerability which may allow userspace to infer
kernel data values previously stored in floating point
registers, vector registers, or integer registers.
RFDS only affects Intel Atom processors.

on: Turns ON the mitigation.
off: Turns OFF the mitigation.

This parameter overrides the compile time default set
by CONFIG_MITIGATION_RFDS. Mitigation cannot be
disabled when other VERW based mitigations (like MDS)
are enabled. In order to disable RFDS mitigation all
VERW based mitigations need to be disabled.

For details see:
Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst

driver_async_probe= [KNL]
List of driver names to be probed asynchronously.
Format: <driver_name1>,<driver_name2>...
Expand Down Expand Up @@ -3087,8 +3107,10 @@
nospectre_bhb [ARM64]
nospectre_v1 [X86,PPC]
nospectre_v2 [X86,PPC,S390,ARM64]
reg_file_data_sampling=off [X86]
retbleed=off [X86]
spec_store_bypass_disable=off [X86,PPC]
spectre_bhi=off [X86]
spectre_v2_user=off [X86]
srbds=off [X86,INTEL]
ssbd=force-off [ARM64]
Expand Down Expand Up @@ -5417,16 +5439,13 @@
See Documentation/admin-guide/laptops/sonypi.rst

spectre_bhi= [X86] Control mitigation of Branch History Injection
(BHI) vulnerability. Syscalls are hardened against BHI
reglardless of this setting. This setting affects the
(BHI) vulnerability. This setting affects the
deployment of the HW BHI control and the SW BHB
clearing sequence.

on - unconditionally enable.
off - unconditionally disable.
auto - (default) enable hardware mitigation
(BHI_DIS_S) if available, otherwise enable
alternate mitigation in KVM.
on - (default) Enable the HW or SW mitigation
as needed.
off - Disable the mitigation.

spectre_v2= [X86] Control mitigation of Spectre variant 2
(indirect branch speculation) vulnerability.
Expand Down Expand Up @@ -5458,9 +5477,9 @@
retpoline,generic - Retpolines
retpoline,lfence - LFENCE; indirect branch
retpoline,amd - alias for retpoline,lfence
eibrs - enhanced IBRS
eibrs,retpoline - enhanced IBRS + Retpolines
eibrs,lfence - enhanced IBRS + LFENCE
eibrs - Enhanced/Auto IBRS
eibrs,retpoline - Enhanced/Auto IBRS + Retpolines
eibrs,lfence - Enhanced/Auto IBRS + LFENCE
ibrs - use IBRS to protect kernel

Not specifying this option is equivalent to
Expand Down
14 changes: 14 additions & 0 deletions Documentation/core-api/dma-api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -204,6 +204,20 @@ Returns the maximum size of a mapping for the device. The size parameter
of the mapping functions like dma_map_single(), dma_map_page() and
others should not be larger than the returned value.

::

size_t
dma_opt_mapping_size(struct device *dev);

Returns the maximum optimal size of a mapping for the device.

Mapping larger buffers may take much longer in certain scenarios. In
addition, for high-rate short-lived streaming mappings, the upfront time
spent on the mapping may account for an appreciable part of the total
request lifetime. As such, if splitting larger requests incurs no
significant performance penalty, then device drivers are advised to
limit total DMA streaming mappings length to the returned value.

::

bool
Expand Down
10 changes: 7 additions & 3 deletions Documentation/filesystems/locking.rst
Original file line number Diff line number Diff line change
Expand Up @@ -442,17 +442,21 @@ prototypes::
void (*lm_break)(struct file_lock *); /* break_lease callback */
int (*lm_change)(struct file_lock **, int);
bool (*lm_breaker_owns_lease)(struct file_lock *);
bool (*lm_lock_expirable)(struct file_lock *);
void (*lm_expire_lock)(void);

locking rules:

====================== ============= ================= =========
ops inode->i_lock blocked_lock_lock may block
ops flc_lock blocked_lock_lock may block
====================== ============= ================= =========
lm_notify: yes yes no
lm_notify: no yes no
lm_grant: no no no
lm_break: yes no no
lm_change yes no no
lm_breaker_owns_lease: no no no
lm_breaker_owns_lease: yes no no
lm_lock_expirable yes no no
lm_expire_lock no no yes
====================== ============= ================= =========

buffer_head
Expand Down
Loading

0 comments on commit 11e22c3

Please sign in to comment.