KVM: x86: Expose TSC offset controls to userspace

To date, VMM-directed TSC synchronization and migration have been a bit
messy. KVM has some baked-in heuristics around TSC writes to infer if
the VMM is attempting to synchronize. This is problematic, as it depends
on host userspace writing to the guest's TSC within 1 second of the last
write.

A much cleaner approach to configuring the guest's views of the TSC is to
simply migrate the TSC offset for every vCPU. Offsets are idempotent,
and thus not subject to change depending on when the VMM actually
reads/writes values from/to KVM. The VMM can then read the TSC once with
KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
the guest is paused.

Cc: David Matlack <dmatlack@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210916181538.968978-8-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Oliver Upton
2021-09-16 18:15:38 +00:00
committed by Paolo Bonzini
parent 58d4277be9
commit 828ca89628
4 changed files with 178 additions and 0 deletions

@@ -161,3 +161,60 @@ Specifies the base address of the stolen time structure for this VCPU. The
base address must be 64 byte aligned and exist within a valid guest memory
region. See Documentation/virt/kvm/arm/pvtime.rst for more information
including the layout of the stolen time structure.

4. GROUP: KVM_VCPU_TSC_CTRL
===========================

:Architectures: x86

4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET

:Parameters: 64-bit unsigned TSC offset

Returns:

     ======= ======================================
     -EFAULT Error reading/writing the provided
             parameter address.
     -ENXIO  Attribute not supported
     ======= ======================================

Specifies the guest's TSC offset relative to the host's TSC. The guest's
TSC is then derived by the following equation:

  guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET

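Because the attribute is an unsigned 64-bit value, a guest TSC that lags
the host TSC is expressed through wraparound rather than a negative
number. The following is a minimal sketch of the equation in C; the
function name guest_tsc_from_host is hypothetical, and the sketch assumes
no TSC scaling is in effect.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a KVM interface: applies the equation
 * guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET.
 * Unsigned addition wraps modulo 2^64, so an offset such as
 * (uint64_t)-400 places the guest TSC 400 cycles behind the host.
 */
static uint64_t guest_tsc_from_host(uint64_t host_tsc, uint64_t tsc_offset)
{
    return host_tsc + tsc_offset;
}
```
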
This attribute is useful for the precise migration of a guest's TSC. The
following describes a possible algorithm to use for the migration of a
guest's TSC:

From the source VMM process:

1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_0),
   kvmclock nanoseconds (k_0), and realtime nanoseconds (r_0).

2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the
   guest TSC offset (off_n).

3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the
   guest's TSC (freq).

From the destination VMM process:

4. Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock nanoseconds
   (k_0) and realtime nanoseconds (r_0) in their respective fields.
   Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
   structure. KVM will advance the VM's kvmclock to account for elapsed
   time since recording the clock values.

5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_1) and
   kvmclock nanoseconds (k_1).

6. Adjust the guest TSC offsets for every vCPU to account for (1) time
   elapsed since recording state and (2) difference in TSCs between the
   source and destination machine:

   new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1

   Here freq must be expressed in TSC cycles per nanosecond.
   KVM_GET_TSC_KHZ reports kHz, so divide the recorded value by 10^6
   when converting the elapsed kvmclock nanoseconds to cycles.

7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the
   respective value derived in the previous step.
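
The arithmetic in step 6 can be sketched in C as follows. This is only an
illustration under stated assumptions: the function name
tsc_offset_after_migration is hypothetical, the ioctl calls that obtain
the inputs are omitted, and k_1 is assumed to be no earlier than k_0.
Since KVM_GET_TSC_KHZ reports kilohertz and kvmclock counts nanoseconds,
the sketch scales the elapsed time by tsc_khz / 10^6 using integer
arithmetic only.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a KVM interface: computes
 *   new_off_n = t_0 + off_n + elapsed_cycles - t_1
 * where elapsed_cycles is (k_1 - k_0) nanoseconds converted to TSC
 * cycles at tsc_khz kilohertz. Assumes k_1 >= k_0.
 */
static uint64_t tsc_offset_after_migration(uint64_t t_0, uint64_t off_n,
                                           uint64_t k_0, uint64_t k_1,
                                           uint64_t tsc_khz, uint64_t t_1)
{
    uint64_t elapsed_ns = k_1 - k_0;

    /*
     * cycles = ns * kHz / 10^6. Split into whole milliseconds plus a
     * sub-millisecond remainder so the intermediate product of the
     * remainder stays far below 2^64 for realistic frequencies.
     */
    uint64_t whole_ms = elapsed_ns / 1000000;
    uint64_t rem_ns = elapsed_ns % 1000000;
    uint64_t elapsed_cycles = whole_ms * tsc_khz
                            + rem_ns * tsc_khz / 1000000;

    /* Unsigned arithmetic wraps modulo 2^64, matching the attribute. */
    return t_0 + off_n + elapsed_cycles - t_1;
}
```

For example, with t_0 = 1000, off_n = 500, an elapsed kvmclock time of
2,000,000 ns, a 3,000,000 kHz guest TSC, and t_1 = 200, the helper yields
6,001,300. Adding t_1 back gives a guest TSC of 6,001,500, which is the
source guest TSC (1500) advanced by the 6,000,000 elapsed cycles.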