From 86bdf3ebcfe1ded055282536fecce13001874740 Mon Sep 17 00:00:00 2001 From: Gavin Shan Date: Thu, 10 Nov 2022 18:49:10 +0800 Subject: KVM: Support dirty ring in conjunction with bitmap ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is enabled. It's conflicting with that ring-based dirty page tracking always requires a running VCPU context. Introduce a new flavor of dirty ring that requires the use of both VCPU dirty rings and a dirty bitmap. The expectation is that for non-VCPU sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to the dirty bitmap. Userspace should scan the dirty bitmap before migrating the VM to the target. Use an additional capability to advertise this behavior. The newly added capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL. Suggested-by: Marc Zyngier Suggested-by: Peter Xu Co-developed-by: Oliver Upton Signed-off-by: Oliver Upton Signed-off-by: Gavin Shan Acked-by: Peter Xu Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20221110104914.31280-4-gshan@redhat.com --- Documentation/virt/kvm/api.rst | 34 ++++++++++++++++++++----- Documentation/virt/kvm/devices/arm-vgic-its.rst | 5 +++- 2 files changed, 31 insertions(+), 8 deletions(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index eee9f857a986..1f1b09aa6db4 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -8003,13 +8003,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl). To achieve that, one needs to kick the vcpu out of KVM_RUN using a signal. The resulting vmexit ensures that all dirty GFNs are flushed to the dirty rings. -NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding -ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls -KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG. After enabling -KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual -machine will switch to ring-buffer dirty page tracking and further -KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail. - NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that should be exposed by weakly ordered architecture, in order to indicate the additional memory ordering requirements imposed on userspace when @@ -8018,6 +8011,33 @@ Architecture with TSO-like ordering (such as x86) are allowed to expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL to userspace. +After enabling the dirty rings, the userspace needs to detect the +capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the +ring structures can be backed by per-slot bitmaps. With this capability +advertised, it means the architecture can dirty guest pages without +vcpu/ring context, so that some of the dirty information will still be +maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP +can't be enabled if the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL +hasn't been enabled, or any memslot has been existing. + +Note that the bitmap here is only a backup of the ring structure. The +use of the ring and bitmap combination is only beneficial if there is +only a very small amount of memory that is dirtied out of vcpu/ring +context. Otherwise, the stand-alone per-slot bitmap mechanism needs to +be considered. + +To collect dirty bits in the backup bitmap, userspace can use the same +KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG isn't needed as long as all +the generation of the dirty bits is done in a single pass. Collecting +the dirty bitmap should be the very last thing that the VMM does before +considering the state as complete. VMM needs to ensure that the dirty +state is final and avoid missing dirty pages from another ioctl ordered +after the bitmap collection. + +NOTE: One example of using the backup bitmap is saving arm64 vgic/its +tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on +KVM device "kvm-arm-vgic-its" when dirty ring is enabled. + 8.30 KVM_CAP_XEN_HVM -------------------- diff --git a/Documentation/virt/kvm/devices/arm-vgic-its.rst b/Documentation/virt/kvm/devices/arm-vgic-its.rst index d257eddbae29..e053124f77c4 100644 --- a/Documentation/virt/kvm/devices/arm-vgic-its.rst +++ b/Documentation/virt/kvm/devices/arm-vgic-its.rst @@ -52,7 +52,10 @@ KVM_DEV_ARM_VGIC_GRP_CTRL KVM_DEV_ARM_ITS_SAVE_TABLES save the ITS table data into guest RAM, at the location provisioned - by the guest in corresponding registers/table entries. + by the guest in corresponding registers/table entries. Should userspace + require a form of dirty tracking to identify which pages are modified + by the saving process, it should use a bitmap even if using another + mechanism to track the memory dirtied by the vCPUs. The layout of the tables in guest memory defines an ABI. The entries are laid out in little endian format as described in the last paragraph. -- cgit v1.2.3 From 9cb1096f8590bc590326087bea65db932b53c3b5 Mon Sep 17 00:00:00 2001 From: Gavin Shan Date: Thu, 10 Nov 2022 18:49:11 +0800 Subject: KVM: arm64: Enable ring-based dirty memory tracking Enable ring-based dirty memory tracking on ARM64: - Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL. - Enable CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP. - Set KVM_DIRTY_LOG_PAGE_OFFSET for the ring buffer's physical page offset. - Add ARM64 specific kvm_arch_allow_write_without_running_vcpu() to keep the site of saving vgic/its tables out of the no-running-vcpu radar. Signed-off-by: Gavin Shan Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20221110104914.31280-5-gshan@redhat.com --- Documentation/virt/kvm/api.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 1f1b09aa6db4..773e4b202f47 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -7921,7 +7921,7 @@ regardless of what has actually been exposed through the CPUID leaf. 8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL ---------------------------------------------------------- -:Architectures: x86 +:Architectures: x86, arm64 :Parameters: args[0] - size of the dirty log ring KVM is capable of tracking dirty memory using ring buffers that are -- cgit v1.2.3 From 83f8a81dece8bc4237d8d94af357fb5df0083e63 Mon Sep 17 00:00:00 2001 From: Usama Arif Date: Thu, 3 Nov 2022 13:12:10 +0000 Subject: KVM: arm64: Fix pvtime documentation This includes table format and using reST labels for cross-referencing to vcpu.rst. Suggested-by: Bagas Sanjaya Signed-off-by: Usama Arif Reviewed-by: Steven Price Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20221103131210.3603385-1-usama.arif@bytedance.com --- Documentation/virt/kvm/arm/pvtime.rst | 14 ++++++++------ Documentation/virt/kvm/devices/vcpu.rst | 2 ++ 2 files changed, 10 insertions(+), 6 deletions(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/arm/pvtime.rst b/Documentation/virt/kvm/arm/pvtime.rst index 392521af7c90..e88b34e586be 100644 --- a/Documentation/virt/kvm/arm/pvtime.rst +++ b/Documentation/virt/kvm/arm/pvtime.rst @@ -23,21 +23,23 @@ the PV_TIME_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES mechanism before calling it. PV_TIME_FEATURES - ============= ======== ========== + + ============= ======== ================================================= Function ID: (uint32) 0xC5000020 PV_call_id: (uint32) The function to query for support. Currently only PV_TIME_ST is supported. Return value: (int64) NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant PV-time feature is supported by the hypervisor. - ============= ======== ========== + ============= ======== ================================================= PV_TIME_ST - ============= ======== ========== + + ============= ======== ============================================== Function ID: (uint32) 0xC5000021 Return value: (int64) IPA of the stolen time data structure for this VCPU. On failure: NOT_SUPPORTED (-1) - ============= ======== ========== + ============= ======== ============================================== The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory with inner and outer write back caching attributes, in the inner shareable @@ -76,5 +78,5 @@ It is advisable that one or more 64k pages are set aside for the purpose of these structures and not used for other purposes, this enables the guest to map the region using 64k pages and avoids conflicting attributes with other memory. -For the user space interface see Documentation/virt/kvm/devices/vcpu.rst -section "3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL". +For the user space interface see +:ref:`Documentation/virt/kvm/devices/vcpu.rst `. \ No newline at end of file diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst index 716aa3edae14..31f14ec4a65b 100644 --- a/Documentation/virt/kvm/devices/vcpu.rst +++ b/Documentation/virt/kvm/devices/vcpu.rst @@ -171,6 +171,8 @@ configured values on other VCPUs. Userspace should configure the interrupt numbers on at least one VCPU after creating all VCPUs and before running any VCPUs. +.. _kvm_arm_vcpu_pvtime_ctrl: + 3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL ================================== -- cgit v1.2.3 From d9459922a15ce7a20a85b38a976494ac7f445732 Mon Sep 17 00:00:00 2001 From: Claudio Imbrenda Date: Fri, 11 Nov 2022 18:06:28 +0100 Subject: KVM: s390: pv: api documentation for asynchronous destroy Add documentation for the new commands added to the KVM_S390_PV_COMMAND ioctl. Signed-off-by: Claudio Imbrenda Reviewed-by: Nico Boehr Reviewed-by: Steffen Eiden Reviewed-by: Janosch Frank Link: https://lore.kernel.org/r/20221111170632.77622-3-imbrenda@linux.ibm.com Message-Id: <20221111170632.77622-3-imbrenda@linux.ibm.com> Signed-off-by: Janosch Frank --- Documentation/virt/kvm/api.rst | 41 +++++++++++++++++++++++++++++++++++++---- 1 file changed, 37 insertions(+), 4 deletions(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index eee9f857a986..9175d41e8081 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -5163,10 +5163,13 @@ KVM_PV_ENABLE ===== ============================= KVM_PV_DISABLE - Deregister the VM from the Ultravisor and reclaim the memory that - had been donated to the Ultravisor, making it usable by the kernel - again. All registered VCPUs are converted back to non-protected - ones. + Deregister the VM from the Ultravisor and reclaim the memory that had + been donated to the Ultravisor, making it usable by the kernel again. + All registered VCPUs are converted back to non-protected ones. If a + previous protected VM had been prepared for asynchonous teardown with + KVM_PV_ASYNC_CLEANUP_PREPARE and not subsequently torn down with + KVM_PV_ASYNC_CLEANUP_PERFORM, it will be torn down in this call + together with the current protected VM. KVM_PV_VM_SET_SEC_PARMS Pass the image header from VM memory to the Ultravisor in @@ -5289,6 +5292,36 @@ KVM_PV_DUMP authentication tag all of which are needed to decrypt the dump at a later time. +KVM_PV_ASYNC_CLEANUP_PREPARE + :Capability: KVM_CAP_S390_PROTECTED_ASYNC_DISABLE + + Prepare the current protected VM for asynchronous teardown. Most + resources used by the current protected VM will be set aside for a + subsequent asynchronous teardown. The current protected VM will then + resume execution immediately as non-protected. There can be at most + one protected VM prepared for asynchronous teardown at any time. If + a protected VM had already been prepared for teardown without + subsequently calling KVM_PV_ASYNC_CLEANUP_PERFORM, this call will + fail. In that case, the userspace process should issue a normal + KVM_PV_DISABLE. The resources set aside with this call will need to + be cleaned up with a subsequent call to KVM_PV_ASYNC_CLEANUP_PERFORM + or KVM_PV_DISABLE, otherwise they will be cleaned up when KVM + terminates. KVM_PV_ASYNC_CLEANUP_PREPARE can be called again as soon + as cleanup starts, i.e. before KVM_PV_ASYNC_CLEANUP_PERFORM finishes. + +KVM_PV_ASYNC_CLEANUP_PERFORM + :Capability: KVM_CAP_S390_PROTECTED_ASYNC_DISABLE + + Tear down the protected VM previously prepared for teardown with + KVM_PV_ASYNC_CLEANUP_PREPARE. The resources that had been set aside + will be freed during the execution of this command. This PV command + should ideally be issued by userspace from a separate thread. If a + fatal signal is received (or the process terminates naturally), the + command will terminate immediately without completing, and the normal + KVM shutdown procedure will take care of cleaning up all remaining + protected VMs, including the ones whose teardown was interrupted by + process termination. + 4.126 KVM_XEN_HVM_SET_ATTR -------------------------- -- cgit v1.2.3 From a4baf8d2639f24d4d31983ff67c01878e7a5393f Mon Sep 17 00:00:00 2001 From: Peter Collingbourne Date: Thu, 3 Nov 2022 18:10:41 -0700 Subject: Documentation: document the ABI changes for KVM_CAP_ARM_MTE Document both the restriction on VM_MTE_ALLOWED mappings and the relaxation for shared mappings. Signed-off-by: Peter Collingbourne Acked-by: Catalin Marinas Reviewed-by: Cornelia Huck Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20221104011041.290951-9-pcc@google.com --- Documentation/virt/kvm/api.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index eee9f857a986..b55f80dadcfe 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -7385,8 +7385,9 @@ hibernation of the host; however the VMM needs to manually save/restore the tags as appropriate if the VM is migrated. When this capability is enabled all memory in memslots must be mapped as -not-shareable (no MAP_SHARED), attempts to create a memslot with a -MAP_SHARED mmap will result in an -EINVAL return. +``MAP_ANONYMOUS`` or with a RAM-based file mapping (``tmpfs``, ``memfd``), +attempts to create a memslot with an invalid mmap will result in an +-EINVAL return. When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to perform a bulk copy of tags to/from the guest. -- cgit v1.2.3 From d8ba8ba4c801b794f47852a6f1821ea48f83b5d1 Mon Sep 17 00:00:00 2001 From: David Woodhouse Date: Sun, 27 Nov 2022 12:22:10 +0000 Subject: KVM: x86/xen: Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured Closer inspection of the Xen code shows that we aren't supposed to be using the XEN_RUNSTATE_UPDATE flag unconditionally. It should be explicitly enabled by guests through the HYPERVISOR_vm_assist hypercall. If we randomly set the top bit of ->state_entry_time for a guest that hasn't asked for it and doesn't expect it, that could make the runtimes fail to add up and confuse the guest. Without the flag it's perfectly safe for a vCPU to read its own vcpu_runstate_info; just not for one vCPU to read *another's*. I briefly pondered adding a word for the whole set of VMASST_TYPE_* flags but the only one we care about for HVM guests is this, so it seemed a bit pointless. Signed-off-by: David Woodhouse Message-Id: <20221127122210.248427-3-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/api.rst | 34 ++++++++++++++++++++++++++++------ 1 file changed, 28 insertions(+), 6 deletions(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 9175d41e8081..5617bc4f899f 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -5339,6 +5339,7 @@ KVM_PV_ASYNC_CLEANUP_PERFORM union { __u8 long_mode; __u8 vector; + __u8 runstate_update_flag; struct { __u64 gfn; } shared_info; @@ -5416,6 +5417,14 @@ KVM_XEN_ATTR_TYPE_XEN_VERSION event channel delivery, so responding within the kernel without exiting to userspace is beneficial. +KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG + This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates + support for KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG. It enables the + XEN_RUNSTATE_UPDATE flag which allows guest vCPUs to safely read + other vCPUs' vcpu_runstate_info. Xen guests enable this feature via + the VM_ASST_TYPE_runstate_update_flag of the HYPERVISOR_vm_assist + hypercall. + 4.127 KVM_XEN_HVM_GET_ATTR -------------------------- @@ -8059,12 +8068,13 @@ to userspace. This capability indicates the features that Xen supports for hosting Xen PVHVM guests. Valid flags are:: - #define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR (1 << 0) - #define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL (1 << 1) - #define KVM_XEN_HVM_CONFIG_SHARED_INFO (1 << 2) - #define KVM_XEN_HVM_CONFIG_RUNSTATE (1 << 3) - #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL (1 << 4) - #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND (1 << 5) + #define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR (1 << 0) + #define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL (1 << 1) + #define KVM_XEN_HVM_CONFIG_SHARED_INFO (1 << 2) + #define KVM_XEN_HVM_CONFIG_RUNSTATE (1 << 3) + #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL (1 << 4) + #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND (1 << 5) + #define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG (1 << 6) The KVM_XEN_HVM_CONFIG_HYPERCALL_MSR flag indicates that the KVM_XEN_HVM_CONFIG ioctl is available, for the guest to set its hypercall page. @@ -8096,6 +8106,18 @@ KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID/TIMER/UPCALL_VECTOR vCPU attributes. related to event channel delivery, timers, and the XENVER_version interception. +The KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG flag indicates that KVM supports +the KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG attribute in the KVM_XEN_SET_ATTR +and KVM_XEN_GET_ATTR ioctls. This controls whether KVM will set the +XEN_RUNSTATE_UPDATE flag in guest memory mapped vcpu_runstate_info during +updates of the runstate information. Note that versions of KVM which support +the RUNSTATE feature above, but not thie RUNSTATE_UPDATE_FLAG feature, will +always set the XEN_RUNSTATE_UPDATE flag when updating the guest structure, +which is perhaps counterintuitive. When this flag is advertised, KVM will +behave more correctly, not using the XEN_RUNSTATE_UPDATE flag until/unless +specifically enabled (by the guest making the hypercall, causing the VMM +to enable the KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG attribute). + 8.31 KVM_CAP_PPC_MULTITCE ------------------------- -- cgit v1.2.3 From 5c8c0b3273822cf982c250a9a19e003e4b315edb Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Wed, 31 Aug 2022 00:17:04 +0000 Subject: KVM: x86: Delete documentation for READ|WRITE in KVM_X86_SET_MSR_FILTER Delete the paragraph that describes the behavior when both KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE are set for a range. There is nothing special about KVM's handling of this combination, whereas explicitly documenting the combination suggests that there is some magic behavior the user needs to be aware of. Signed-off-by: Sean Christopherson Link: https://lore.kernel.org/r/20220831001706.4075399-2-seanjc@google.com --- Documentation/virt/kvm/api.rst | 7 ------- 1 file changed, 7 deletions(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 5617bc4f899f..373cf425e85c 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -4115,13 +4115,6 @@ flags values for ``struct kvm_msr_filter_range``: a write for a particular MSR should be handled regardless of the default filter action. -``KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE`` - - Filter both read and write accesses to MSRs using the given bitmap. A 0 - in the bitmap indicates that both reads and writes should immediately fail, - while a 1 indicates that reads and writes for a particular MSR are not - filtered by this range. - flags values for ``struct kvm_msr_filter``: ``KVM_MSR_FILTER_DEFAULT_ALLOW`` -- cgit v1.2.3 From b93d2ec34ef368bb854289db99d8d6ca7f523e25 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Wed, 31 Aug 2022 00:17:05 +0000 Subject: KVM: x86: Reword MSR filtering docs to more precisely define behavior Reword the MSR filtering documentatiion to more precisely define the behavior of filtering using common virtualization terminology. - Explicitly document KVM's behavior when an MSR is denied - s/handled/allowed as there is no guarantee KVM will "handle" the MSR access - Drop the "fall back" terminology, which incorrectly suggests that there is existing KVM behavior to fall back to - Fix an off-by-one error in the range (the end is exclusive) - Call out the interaction between MSR filtering and KVM_CAP_X86_USER_SPACE_MSR's KVM_MSR_EXIT_REASON_FILTER - Delete the redundant paragraph on what '0' and '1' in the bitmap means, it's covered by the sections on KVM_MSR_FILTER_{READ,WRITE} - Delete the clause on x2APIC MSR behavior depending on APIC base, this is covered by stating that KVM follows architectural behavior when emulating/virtualizing MSR accesses Reported-by: Aaron Lewis Signed-off-by: Sean Christopherson Link: https://lore.kernel.org/r/20220831001706.4075399-3-seanjc@google.com --- Documentation/virt/kvm/api.rst | 70 +++++++++++++++++++++--------------------- 1 file changed, 35 insertions(+), 35 deletions(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 373cf425e85c..a4d07b866dea 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -4104,15 +4104,15 @@ flags values for ``struct kvm_msr_filter_range``: ``KVM_MSR_FILTER_READ`` Filter read accesses to MSRs using the given bitmap. A 0 in the bitmap - indicates that a read should immediately fail, while a 1 indicates that - a read for a particular MSR should be handled regardless of the default + indicates that read accesses should be denied, while a 1 indicates that + a read for a particular MSR should be allowed regardless of the default filter action. ``KVM_MSR_FILTER_WRITE`` Filter write accesses to MSRs using the given bitmap. A 0 in the bitmap - indicates that a write should immediately fail, while a 1 indicates that - a write for a particular MSR should be handled regardless of the default + indicates that write accesses should be denied, while a 1 indicates that + a write for a particular MSR should be allowed regardless of the default filter action. flags values for ``struct kvm_msr_filter``: @@ -4120,57 +4120,55 @@ flags values for ``struct kvm_msr_filter``: ``KVM_MSR_FILTER_DEFAULT_ALLOW`` If no filter range matches an MSR index that is getting accessed, KVM will - fall back to allowing access to the MSR. + allow accesses to all MSRs by default. ``KVM_MSR_FILTER_DEFAULT_DENY`` If no filter range matches an MSR index that is getting accessed, KVM will - fall back to rejecting access to the MSR. In this mode, all MSRs that should - be processed by KVM need to explicitly be marked as allowed in the bitmaps. + deny accesses to all MSRs by default. -This ioctl allows user space to define up to 16 bitmaps of MSR ranges to -specify whether a certain MSR access should be explicitly filtered for or not. +This ioctl allows userspace to define up to 16 bitmaps of MSR ranges to deny +guest MSR accesses that would normally be allowed by KVM. If an MSR is not +covered by a specific range, the "default" filtering behavior applies. Each +bitmap range covers MSRs from [base .. base+nmsrs). -If this ioctl has never been invoked, MSR accesses are not guarded and the -default KVM in-kernel emulation behavior is fully preserved. +If an MSR access is denied by userspace, the resulting KVM behavior depends on +whether or not KVM_CAP_X86_USER_SPACE_MSR's KVM_MSR_EXIT_REASON_FILTER is +enabled. If KVM_MSR_EXIT_REASON_FILTER is enabled, KVM will exit to userspace +on denied accesses, i.e. userspace effectively intercepts the MSR access. If +KVM_MSR_EXIT_REASON_FILTER is not enabled, KVM will inject a #GP into the guest +on denied accesses. + +If an MSR access is allowed by userspace, KVM will emulate and/or virtualize +the access in accordance with the vCPU model. Note, KVM may still ultimately +inject a #GP if an access is allowed by userspace, e.g. if KVM doesn't support +the MSR, or to follow architectural behavior for the MSR. + +By default, KVM operates in KVM_MSR_FILTER_DEFAULT_ALLOW mode with no MSR range +filters. Calling this ioctl with an empty set of ranges (all nmsrs == 0) disables MSR filtering. In that mode, ``KVM_MSR_FILTER_DEFAULT_DENY`` is invalid and causes an error. -As soon as the filtering is in place, every MSR access is processed through -the filtering except for accesses to the x2APIC MSRs (from 0x800 to 0x8ff); -x2APIC MSRs are always allowed, independent of the ``default_allow`` setting, -and their behavior depends on the ``X2APIC_ENABLE`` bit of the APIC base -register. - .. warning:: - MSR accesses coming from nested vmentry/vmexit are not filtered. + MSR accesses as part of nested VM-Enter/VM-Exit are not filtered. This includes both writes to individual VMCS fields and reads/writes through the MSR lists pointed to by the VMCS. -If a bit is within one of the defined ranges, read and write accesses are -guarded by the bitmap's value for the MSR index if the kind of access -is included in the ``struct kvm_msr_filter_range`` flags. If no range -cover this particular access, the behavior is determined by the flags -field in the kvm_msr_filter struct: ``KVM_MSR_FILTER_DEFAULT_ALLOW`` -and ``KVM_MSR_FILTER_DEFAULT_DENY``. - -Each bitmap range specifies a range of MSRs to potentially allow access on. -The range goes from MSR index [base .. base+nmsrs]. The flags field -indicates whether reads, writes or both reads and writes are filtered -by setting a 1 bit in the bitmap for the corresponding MSR index. - -If an MSR access is not permitted through the filtering, it generates a -#GP inside the guest. When combined with KVM_CAP_X86_USER_SPACE_MSR, that -allows user space to deflect and potentially handle various MSR accesses -into user space. + x2APIC MSR accesses cannot be filtered (KVM silently ignores filters that + cover any x2APIC MSRs). Note, invoking this ioctl while a vCPU is running is inherently racy. However, KVM does guarantee that vCPUs will see either the previous filter or the new filter, e.g. MSRs with identical settings in both the old and new filter will have deterministic behavior. +Similarly, if userspace wishes to intercept on denied accesses, +KVM_MSR_EXIT_REASON_FILTER must be enabled before activating any filters, and +left enabled until after all filters are deactivated. Failure to do so may +result in KVM injecting a #GP instead of exiting to userspace. + 4.98 KVM_CREATE_SPAPR_TCE_64 ---------------------------- @@ -6500,6 +6498,8 @@ wants to write. Once finished processing the event, user space must continue vCPU execution. If the MSR write was unsuccessful, user space also sets the "error" field to "1". +See KVM_X86_SET_MSR_FILTER for details on the interaction with MSR filtering. + :: @@ -7937,7 +7937,7 @@ KVM_EXIT_X86_WRMSR exit notifications. This capability indicates that KVM supports that accesses to user defined MSRs may be rejected. With this capability exposed, KVM exports new VM ioctl KVM_X86_SET_MSR_FILTER which user space can call to specify bitmaps of MSR -ranges that KVM should reject access to. +ranges that KVM should deny access to. In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to trap and emulate MSRs that are outside of the scope of KVM as well as -- cgit v1.2.3 From 1f158147181b83c5ae02273d0b3b9eddaebcc854 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Wed, 31 Aug 2022 00:17:06 +0000 Subject: KVM: x86: Clean up KVM_CAP_X86_USER_SPACE_MSR documentation Clean up the KVM_CAP_X86_USER_SPACE_MSR documentation to eliminate misleading and/or inconsistent verbiage, and to actually document what accesses are intercepted by which flags. - s/will/may since not all #GPs are guaranteed to be intercepted - s/deflect/intercept to align with common KVM terminology - s/user space/userspace to align with the majority of KVM docs - Avoid using "trap" terminology, as KVM exits to userspace _before_ stepping, i.e. doesn't exhibit trap-like behavior - Actually document the flags Signed-off-by: Sean Christopherson Link: https://lore.kernel.org/r/20220831001706.4075399-4-seanjc@google.com --- Documentation/virt/kvm/api.rst | 40 ++++++++++++++++++++++++---------------- 1 file changed, 24 insertions(+), 16 deletions(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index a4d07b866dea..c6857f6b25ab 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6473,29 +6473,29 @@ if it decides to decode and emulate the instruction. Used on x86 systems. When the VM capability KVM_CAP_X86_USER_SPACE_MSR is enabled, MSR accesses to registers that would invoke a #GP by KVM kernel code -will instead trigger a KVM_EXIT_X86_RDMSR exit for reads and KVM_EXIT_X86_WRMSR +may instead trigger a KVM_EXIT_X86_RDMSR exit for reads and KVM_EXIT_X86_WRMSR exit for writes. -The "reason" field specifies why the MSR trap occurred. User space will only -receive MSR exit traps when a particular reason was requested during through +The "reason" field specifies why the MSR interception occurred. Userspace will +only receive MSR exits when a particular reason was requested during through ENABLE_CAP. Currently valid exit reasons are: KVM_MSR_EXIT_REASON_UNKNOWN - access to MSR that is unknown to KVM KVM_MSR_EXIT_REASON_INVAL - access to invalid MSRs or reserved bits KVM_MSR_EXIT_REASON_FILTER - access blocked by KVM_X86_SET_MSR_FILTER -For KVM_EXIT_X86_RDMSR, the "index" field tells user space which MSR the guest -wants to read. To respond to this request with a successful read, user space +For KVM_EXIT_X86_RDMSR, the "index" field tells userspace which MSR the guest +wants to read. To respond to this request with a successful read, userspace writes the respective data into the "data" field and must continue guest execution to ensure the read data is transferred into guest register state. -If the RDMSR request was unsuccessful, user space indicates that with a "1" in +If the RDMSR request was unsuccessful, userspace indicates that with a "1" in the "error" field. This will inject a #GP into the guest when the VCPU is executed again. -For KVM_EXIT_X86_WRMSR, the "index" field tells user space which MSR the guest -wants to write. Once finished processing the event, user space must continue -vCPU execution. If the MSR write was unsuccessful, user space also sets the +For KVM_EXIT_X86_WRMSR, the "index" field tells userspace which MSR the guest +wants to write. Once finished processing the event, userspace must continue +vCPU execution. If the MSR write was unsuccessful, userspace also sets the "error" field to "1". See KVM_X86_SET_MSR_FILTER for details on the interaction with MSR filtering. @@ -7265,19 +7265,27 @@ the module parameter for the target VM. :Parameters: args[0] contains the mask of KVM_MSR_EXIT_REASON_* events to report :Returns: 0 on success; -1 on error -This capability enables trapping of #GP invoking RDMSR and WRMSR instructions -into user space. +This capability allows userspace to intercept RDMSR and WRMSR instructions if +access to an MSR is denied. By default, KVM injects #GP on denied accesses. When a guest requests to read or write an MSR, KVM may not implement all MSRs that are relevant to a respective system. It also does not differentiate by CPU type. -To allow more fine grained control over MSR handling, user space may enable +To allow more fine grained control over MSR handling, userspace may enable this capability. With it enabled, MSR accesses that match the mask specified in -args[0] and trigger a #GP event inside the guest by KVM will instead trigger -KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications which user space -can then handle to implement model specific MSR handling and/or user notifications -to inform a user that an MSR was not handled. +args[0] and would trigger a #GP inside the guest will instead trigger +KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications. Userspace +can then implement model specific MSR handling and/or user notifications +to inform a user that an MSR was not emulated/virtualized by KVM. + +The valid mask flags are: + + KVM_MSR_EXIT_REASON_UNKNOWN - intercept accesses to unknown (to KVM) MSRs + KVM_MSR_EXIT_REASON_INVAL - intercept accesses that are architecturally + invalid according to the vCPU model and/or mode + KVM_MSR_EXIT_REASON_FILTER - intercept accesses that are denied by userspace + via KVM_X86_SET_MSR_FILTER 7.22 KVM_CAP_X86_BUS_LOCK_EXIT ------------------------------- -- cgit v1.2.3 From 61e15f871241ee86f217320909005cd022dd844f Mon Sep 17 00:00:00 2001 From: Javier Martinez Canillas Date: Fri, 2 Dec 2022 11:50:08 +0100 Subject: KVM: Delete all references to removed KVM_SET_MEMORY_REGION ioctl The documentation says that the ioctl has been deprecated, but it has been actually removed and the remaining references are just left overs. Suggested-by: Sean Christopherson Signed-off-by: Javier Martinez Canillas Message-Id: <20221202105011.185147-2-javierm@redhat.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/api.rst | 16 ---------------- 1 file changed, 16 deletions(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 5617bc4f899f..850e187c0a38 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -272,18 +272,6 @@ the VCPU file descriptor can be mmap-ed, including: KVM_CAP_DIRTY_LOG_RING, see section 8.3. -4.6 KVM_SET_MEMORY_REGION -------------------------- - -:Capability: basic -:Architectures: all -:Type: vm ioctl -:Parameters: struct kvm_memory_region (in) -:Returns: 0 on success, -1 on error - -This ioctl is obsolete and has been removed. - - 4.7 KVM_CREATE_VCPU ------------------- @@ -1377,10 +1365,6 @@ the memory region are automatically reflected into the guest. For example, an mmap() that affects the region will be made visible immediately. Another example is madvise(MADV_DROP). -It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl. -The KVM_SET_MEMORY_REGION does not allow fine grained control over memory -allocation and is deprecated. - 4.36 KVM_SET_TSS_ADDR --------------------- -- cgit v1.2.3 From 66a9221d73e71199a120ca12ea5bfaac2aa670a3 Mon Sep 17 00:00:00 2001 From: Javier Martinez Canillas Date: Fri, 2 Dec 2022 11:50:09 +0100 Subject: KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl The documentation says that the ioctl has been deprecated, but it has been actually removed and the remaining references are just left overs. Suggested-by: Sean Christopherson Signed-off-by: Javier Martinez Canillas Message-Id: <20221202105011.185147-3-javierm@redhat.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/api.rst | 11 ----------- 1 file changed, 11 deletions(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 850e187c0a38..07e8a42d839a 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -356,17 +356,6 @@ see the description of the capability. Note that the Xen shared info page, if configured, shall always be assumed to be dirty. KVM will not explicitly mark it such. -4.9 KVM_SET_MEMORY_ALIAS ------------------------- - -:Capability: basic -:Architectures: x86 -:Type: vm ioctl -:Parameters: struct kvm_memory_alias (in) -:Returns: 0 (success), -1 (error) - -This ioctl is obsolete and has been removed. - 4.10 KVM_RUN ------------ -- cgit v1.2.3 From 30ee198ce42d60101620d33f8bc70c3234798365 Mon Sep 17 00:00:00 2001 From: Javier Martinez Canillas Date: Fri, 2 Dec 2022 11:50:10 +0100 Subject: KVM: Reference to kvm_userspace_memory_region in doc and comments There are still references to the removed kvm_memory_region data structure but the doc and comments should mention struct kvm_userspace_memory_region instead, since that is what's used by the ioctl that replaced the old one and this data structure support the same set of flags. Signed-off-by: Javier Martinez Canillas Message-Id: <20221202105011.185147-4-javierm@redhat.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/api.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 07e8a42d839a..92e14823619a 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -1309,7 +1309,7 @@ yet and must be cleared on entry. __u64 userspace_addr; /* start of the userspace allocated memory */ }; - /* for kvm_memory_region::flags */ + /* for kvm_userspace_memory_region::flags */ #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) #define KVM_MEM_READONLY (1UL << 1) -- cgit v1.2.3 From 10c5e80b2c4d67fa9a931ac57beab782cc3db2ef Mon Sep 17 00:00:00 2001 From: Javier Martinez Canillas Date: Fri, 2 Dec 2022 11:50:11 +0100 Subject: KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR The ioctls are missing an architecture property that is present in others. Suggested-by: Sergio Lopez Pascual Signed-off-by: Javier Martinez Canillas Message-Id: <20221202105011.185147-5-javierm@redhat.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/api.rst | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 92e14823619a..a63d86be45d9 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -3266,6 +3266,7 @@ valid entries found. ---------------------- :Capability: KVM_CAP_DEVICE_CTRL +:Architectures: all :Type: vm ioctl :Parameters: struct kvm_create_device (in/out) :Returns: 0 on success, -1 on error @@ -3306,6 +3307,7 @@ number. :Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device, KVM_CAP_VCPU_ATTRIBUTES for vcpu device KVM_CAP_SYS_ATTRIBUTES for system (/dev/kvm) device (no set) +:Architectures: x86, arm64, s390 :Type: device ioctl, vm ioctl, vcpu ioctl :Parameters: struct kvm_device_attr :Returns: 0 on success, -1 on error -- cgit v1.2.3 From 549a715b98a13c6d05452be3ad37e980087bb081 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Wed, 7 Dec 2022 00:09:59 +0000 Subject: KVM: x86: Add proper ReST tables for userspace MSR exits/flags Add ReST formatting to the set of userspace MSR exits/flags so that the resulting HTML docs generate a table instead of malformed gunk. This also fixes a warning that was introduced by a recent cleanup of the relevant documentation (yay copy+paste). >> Documentation/virt/kvm/api.rst:7287: WARNING: Block quote ends without a blank line; unexpected unindent. Fixes: 1ae099540e8c ("KVM: x86: Allow deflecting unknown MSR accesses to user space") Fixes: 1f158147181b ("KVM: x86: Clean up KVM_CAP_X86_USER_SPACE_MSR documentation") Reported-by: kernel test robot Signed-off-by: Sean Christopherson Message-Id: <20221207000959.2035098-1-seanjc@google.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/api.rst | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) (limited to 'Documentation/virt') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index c618fae44ad7..778c6460d1de 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6455,9 +6455,11 @@ The "reason" field specifies why the MSR interception occurred. Userspace will only receive MSR exits when a particular reason was requested during through ENABLE_CAP. Currently valid exit reasons are: - KVM_MSR_EXIT_REASON_UNKNOWN - access to MSR that is unknown to KVM - KVM_MSR_EXIT_REASON_INVAL - access to invalid MSRs or reserved bits - KVM_MSR_EXIT_REASON_FILTER - access blocked by KVM_X86_SET_MSR_FILTER +============================ ======================================== + KVM_MSR_EXIT_REASON_UNKNOWN access to MSR that is unknown to KVM + KVM_MSR_EXIT_REASON_INVAL access to invalid MSRs or reserved bits + KVM_MSR_EXIT_REASON_FILTER access blocked by KVM_X86_SET_MSR_FILTER +============================ ======================================== For KVM_EXIT_X86_RDMSR, the "index" field tells userspace which MSR the guest wants to read. To respond to this request with a successful read, userspace @@ -7256,11 +7258,13 @@ to inform a user that an MSR was not emulated/virtualized by KVM. The valid mask flags are: - KVM_MSR_EXIT_REASON_UNKNOWN - intercept accesses to unknown (to KVM) MSRs - KVM_MSR_EXIT_REASON_INVAL - intercept accesses that are architecturally - invalid according to the vCPU model and/or mode - KVM_MSR_EXIT_REASON_FILTER - intercept accesses that are denied by userspace - via KVM_X86_SET_MSR_FILTER +============================ =============================================== + KVM_MSR_EXIT_REASON_UNKNOWN intercept accesses to unknown (to KVM) MSRs + KVM_MSR_EXIT_REASON_INVAL intercept accesses that are architecturally + invalid according to the vCPU model and/or mode + KVM_MSR_EXIT_REASON_FILTER intercept accesses that are denied by userspace + via KVM_X86_SET_MSR_FILTER +============================ =============================================== 7.22 KVM_CAP_X86_BUS_LOCK_EXIT ------------------------------- -- cgit v1.2.3