summaryrefslogtreecommitdiff
path: root/drivers/thermal/intel/intel_hfi.c
AgeCommit message (Collapse)Author
2024-07-09thermal: intel: hfi: Give HFI instances package scopeZhang Rui
The Intel Software Developer's Manual defines the scope of HFI (registers and memory buffer) as a package. Use package scope(*) in the software representation of an HFI instance. Using die scope in HFI instances has the effect of creating multiple conflicting instances for the same package: each instance allocates its own memory buffer and configures the same package-level registers. Specifically, only one of the allocated memory buffers can be set in the MSR_IA32_HW_FEEDBACK_PTR register. CPUs get incorrect HFI data from the table. The problem does not affect current HFI-capable platforms because they all have single-die processors. (*) We used die scope for HFI instances because there had been processors with packages enumerated as dies. None of those systems supported HFI, though. If such a system emerged, it would need to be quirked. Co-developed-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Link: https://patch.msgid.link/20240703055445.125362-1-rui.zhang@intel.com [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-08thermal: intel: hfi: Increase the number of CPU capabilities per netlink eventRicardo Neri
The number of updated CPU capabilities per netlink event is hard-coded to 16. On systems with more than 16 CPUs (a common case), it takes more than one thermal netlink event to relay all the new capabilities after an HFI interrupt. This adds unnecessary overhead to both the kernel and user space entities. Increase the number of CPU capabilities updated per event to 64. Any system with 64 CPUs or less can now update all the capabilities in a single thermal netlink event. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-08thermal: intel: hfi: Rename HFI_MAX_THERM_NOTIFY_COUNTRicardo Neri
When processing a hardware update, HFI generates as many thermal netlink events as needed to relay all the updated CPU capabilities to user space. The constant HFI_MAX_THERM_NOTIFY_COUNT is the number of CPU capabilities updated per each of those events. Give this constant a more descriptive name. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-08thermal: intel: hfi: Shorten the thermal netlink event delay to 100msRicardo Neri
The delay between an HFI interrupt and its corresponding thermal netlink event has so far been hard-coded to CONFIG_HZ jiffies (1 second). This delay is too long for hardware that generates updates every tens of milliseconds. The HFI driver uses a delayed workqueue to send thermal netlink events. No subsequent events will be sent if there is pending work. As a result, much of the information of consecutive hardware updates will be lost if the workqueue delay is too long. User space entities may act on obsolete data. If the delay is too short, multiple events may overwhelm listeners. Set the delay to 100ms to strike a balance between too many and too few events. Use milliseconds instead of jiffies to improve readability. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-08thermal: intel: hfi: Rename HFI_UPDATE_INTERVALRicardo Neri
The name of the constant HFI_UPDATE_INTERVAL is misleading. It is not a periodic interval at which HFI updates are processed. It is the delay in the processing of an HFI update after the arrival of an HFI interrupt. Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-03-27thermal: intel: hfi: Enable HFI only when requiredStanislaw Gruszka
Enable and disable hardware feedback interface (HFI) when user space handler is present. For example, enable HFI, when intel-speed-select or Intel Low Power daemon is running and subscribing to thermal netlink events. When user space handlers exit or remove subscription for thermal netlink events, disable HFI. Summary of changes: - Register a thermal genetlink notifier - In the notifier, process THERMAL_NOTIFY_BIND and THERMAL_NOTIFY_UNBIND reason codes to count number of thermal event group netlink multicast clients. If thermal netlink group has any listener enable HFI on all packages. If there are no listener disable HFI on all packages. - When CPU is online, instead of blindly enabling HFI, check if the thermal netlink group has any listener. This will make sure that HFI is not enabled by default during boot time. - Actual processing to enable/disable matches what is done in suspend/resume callbacks. Create two functions hfi_enable_instance() and hfi_disable_instance(), which can be called from the netlink notifier callback and suspend/resume callbacks. Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-15x86/cpu/topology: Rename topology_max_die_per_package()Thomas Gleixner
The plural of die is dies. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20240213210253.065874205@linutronix.de
2024-01-12thermal: intel: hfi: Add syscore callbacks for system-wide PMRicardo Neri
The kernel allocates a memory buffer and provides its location to the hardware, which uses it to update the HFI table. This allocation occurs during boot and remains constant throughout runtime. When resuming from hibernation, the restore kernel allocates a second memory buffer and reprograms the HFI hardware with the new location as part of a normal boot. The location of the second memory buffer may differ from the one allocated by the image kernel. When the restore kernel transfers control to the image kernel, its HFI buffer becomes invalid, potentially leading to memory corruption if the hardware writes to it (the hardware continues to use the buffer from the restore kernel). It is also possible that the hardware "forgets" the address of the memory buffer when resuming from "deep" suspend. Memory corruption may also occur in such a scenario. To prevent the described memory corruption, disable HFI when preparing to suspend or hibernate. Enable it when resuming. Add syscore callbacks to handle the package of the boot CPU (packages of non-boot CPUs are handled via CPU offline). Syscore ops always run on the boot CPU. Additionally, HFI only needs to be disabled during "deep" suspend and hibernation. Syscore ops only run in these cases. Cc: 6.1+ <stable@vger.kernel.org> # 6.1+ Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> [ rjw: Comment adjustment, subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-03thermal: intel: hfi: Disable an HFI instance when all its CPUs go offlineRicardo Neri
In preparation to support hibernation, add functionality to disable an HFI instance during CPU offline. The last CPU of an instance that goes offline will disable such instance. The Intel Software Development Manual states that the operating system must wait for the hardware to set MSR_IA32_PACKAGE_THERM_STATUS[26] after disabling an HFI instance to ensure that it will no longer write on the HFI memory. Some processors, however, do not ever set such bit. Wait a minimum of 2ms to give time hardware to complete any pending memory writes. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-03thermal: intel: hfi: Enable an HFI instance from its first online CPURicardo Neri
Previously, HFI instances were never disabled once enabled. A CPU in an instance only had to check during boot whether another CPU had previously initialized the instance and its corresponding data structure. A subsequent changeset will add functionality to disable instances to support hibernation. Such change will also make possible to disable an HFI instance during runtime via CPU hotplug. Enable an HFI instance from the first of its CPUs that comes online. This covers the boot, CPU hotplug, and resume-from-suspend cases. It also covers systems with one or more HFI instances (i.e., packages). Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-03thermal: intel: hfi: Refactor enabling code into helper functionsRicardo Neri
In preparation for the addition of a suspend notifier, wrap the logic to enable HFI and program its memory buffer into helper functions. Both the CPU hotplug callback and the suspend notifier will use them. This refactoring does not introduce functional changes. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15thermal: Remove core header inclusion from driversDaniel Lezcano
As the name states "thermal_core.h" is the header file for the core components of the thermal framework. Too many drivers are including it. Hopefully the recent cleanups helped to self encapsulate the code a bit more and prevented the drivers to need this header. Remove this inclusion in every place where it is possible. Some other drivers did a confusion with the core header and the one exported in linux/thermal.h. They include the former instead of the latter. The changes also fix this. The tegra/soctherm driver still remains as it uses an internal function which need to be replaced. The Intel HFI driver uses the netlink internal framework core and should be changed to prevent to deal with the internals. No functional changes intended. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> # armada_thermal.c Reviewed-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> # uniphier_thermal.c Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> # rcar_gen3_thermal.c Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> # amlogic_thermal.c Acked-by: Florian Fainelli <f.fainelli@gmail.com> # bcm2835_thermal.c Acked-by: Thierry Reding <treding@nvidia.com> # tegra30-tsensor.c Link: https://lore.kernel.org/r/20230206153432.1017282-1-daniel.lezcano@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-02thermal: intel: hfi: Remove a pointless die_id checkRicardo Neri
die_id is an u16 quantity. On single-die systems the default value of die_id is 0. No need to check for negative values. Plus, removing this check makes Coverity happy. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-23thermal: intel: hfi: ACK HFI for the same timestampSrinivas Pandruvada
Some processors issue more than one HFI interrupt with the same timestamp. Each interrupt must be acknowledged to let the hardware issue new HFI interrupts. But this can't be done without some additional flow modification in the existing interrupt handling. For background, the HFI interrupt is a package level thermal interrupt delivered via a LVT. This LVT is common for both the CPU and package level interrupts. Hence, all CPUs receive the HFI interrupts. But only one CPU should process interrupt and others simply exit by issuing EOI to LAPIC. The current HFI interrupt processing flow: 1. Receive Thermal interrupt 2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS 3. Try and get spinlock, one CPU will enter spinlock and others will simply return from here to issue EOI. (Let's assume CPU 4 is processing interrupt) 4. Check the stored time-stamp from the HFI memory time-stamp 5. if same 6. ignore interrupt, unlock and return 7. Copy the HFI message to local buffer 8. unlock spinlock 9. ACK HFI interrupt 10. Queue the message for processing in a work-queue It is tempting to simply acknowledge all the interrupts even if they have the same timestamp. This may cause some interrupts to not be processed. Let's say CPU5 is slightly late and reaches step 4 while CPU4 is between steps 8 and 9. Currently we simply ignore interrupts with the same timestamp. No issue here for CPU5. When CPU4 acknowledges the interrupt, the next HFI interrupt can be delivered. If we acknowledge interrupts with the same timestamp (at step 6), there is a race condition. Under the same scenario, CPU 5 will acknowledge the HFI interrupt. This lets hardware generate another HFI interrupt, before CPU 4 start executing step 9. Once CPU 4 complete step 9, it will acknowledge the newly arrived HFI interrupt, without actually processing it. Acknowledge the interrupt when holding the spinlock. This avoids contention of the interrupt acknowledgment. Updated flow: 1. Receive HFI Thermal interrupt 2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS 3. Try and get spin-lock Let's assume CPU 4 is processing interrupt 4.1 Read MSR_IA32_PACKAGE_THERM_STATUS and check HFI status bit 4.2 If hfi status is 0 4.3 unlock spinlock 4.4 return 4.5 Check the stored time-stamp from the HFI memory time-stamp 5. if same 6.1 ACK HFI Interrupt, 6.2 unlock spinlock 6.3 return 7. Copy the HFI message to local buffer 8. ACK HFI interrupt 9. unlock spinlock 10. Queue the message for processing in a work-queue To avoid taking the lock unnecessarily, intel_hfi_process_event() checks the status of the HFI interrupt before taking the lock. If CPU5 is late, when it starts processing the interrupt there are two scenarios: a) CPU4 acknowledged the HFI interrupt before CPU5 read MSR_IA32_THERM_STATUS. CPU5 exits. b) CPU5 reads MSR_IA32_THERM_STATUS before CPU4 has acknowledged the interrupt. CPU5 will take the lock if CPU4 has released it. It then re-reads MSR_IA32_THERM_STATUS. If there is not a new interrupt, the HFI status bit is clear and CPU5 exits. If a new HFI interrupt was generated it will find that the status bit is set and it will continue to process the interrupt. In this case even if timestamp is not changed, the ACK can be issued as this is a new interrupt. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Tested-by: Arshad, Adeel<adeel.arshad@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-23thermal: intel: Protect clearing of thermal status bitsSrinivas Pandruvada
The clearing of the package thermal status is done by Read-Modify-Write operation. This may result in clearing of some new status bits which are being or about to be processed. For example, while clearing of HFI status, after read of thermal status register, a new thermal status bit is set by the hardware. But during write back, the newly generated status bit will be set to 0 or cleared. So, it is not safe to do read-modify-write. Since thermal status Read-Write bits can be set to only 0 not 1, it is safe to set all other bits to 1 which are not getting cleared. Create a common interface for clearing package thermal status bits. Use this interface to replace existing code to clear thermal package status bits. It is safe to call from different CPUs without protection as there is no read-modify-write. Also wrmsrl results in just single instruction. For example while CPU 0 and CPU 3 are clearing bit 1 and 3 respectively. If CPU 3 wins the race, it will write 0x4000aa2, then CPU 1 will write 0x4000aa8. The bits which are not part of clear are set to 1. The default mask for bits, which can be written here is 0x4000aaa. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-28thermal: intel: hfi: Improve the type of hfi_features::nr_table_pagesRicardo Neri
A Coverity static code scan raised a potential overflow_before_widen warning when hfi_features::nr_table_pages is used as an argument to memcpy in intel_hfi_process_event(). Even though the overflow can never happen (the maximum number of pages of the HFI table is 0x10 and 0x10 << PAGE_SHIFT = 0x10000), using size_t as the data type of hfi_features::nr_table_pages makes Coverity happy and matches the data type of the argument 'size' of memcpy(). Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-18thermal: intel: hfi: remove NULL check after container_of() callHaowen Bai
container_of() will never return NULL, so remove useless code. Signed-off-by: Haowen Bai <baihaowen@meizu.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-03thermal: intel: hfi: Notify user space for HFI eventsSrinivas Pandruvada
When the hardware issues an HFI event, relay a notification to user space. This allows user space to respond by reading performance and efficiency of each CPU and take appropriate action. For example, when the performance and efficiency of a CPU is 0, user space can either offline the CPU or inject idle. Also, if user space notices a downward trend in performance, it may proactively adjust power limits to avoid future situations in which performance drops to 0. To avoid excessive notifications, the rate is limited by one HZ per event. To limit the netlink message size, send parameters for up to 16 CPUs in a single message. If there are more than 16 CPUs, issue as many messages as needed to notify the status of all CPUs. In the HFI specification, both performance and efficiency capabilities are defined in the [0, 255] range. The existing implementations of HFI hardware do not scale the maximum values to 255. Since userspace cares about capability values that are either 0 or show a downward/upward trend, this fact does not matter much. Relative changes in capabilities are enough. To comply with the thermal netlink ABI, scale both performance and efficiency capabilities to the [0, 1023] interval. Reviewed-by: Len Brown <len.brown@intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-03thermal: intel: hfi: Enable notification interruptRicardo Neri
When hardware wants to inform the operating system about updates in the HFI table, it issues a package-level thermal event interrupt. For this, hardware has new interrupt and status bits in the IA32_PACKAGE_THERM_ INTERRUPT and IA32_PACKAGE_THERM_STATUS registers. The existing thermal throttle driver already handles thermal event interrupts: it initializes the thermal vector of the local APIC as well as per-CPU and package-level interrupt reporting. It also provides routines to service such interrupts. Extend its functionality to also handle HFI interrupts. The frequency of the thermal HFI interrupt is specific to each processor model. On some processors, a single interrupt happens as soon as the HFI is enabled and hardware will never update HFI capabilities afterwards. On other processors, thermal and power constraints may cause thermal HFI interrupts every tens of milliseconds. To not overwhelm consumers of the HFI data, use delayed work to throttle the rate at which HFI updates are processed. Use a dedicated workqueue to not overload system_wq if hardware issues many HFI updates. Reviewed-by: Len Brown <len.brown@intel.com> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-03thermal: intel: hfi: Handle CPU hotplug eventsRicardo Neri
All CPUs in a package are represented in an HFI table. There exists an HFI table per package. Thus, CPUs in a package need to coordinate to initialize and access the table. Do such coordination during CPU hotplug. Use the first CPU to come online in a package to initialize the HFI instance and the data structure representing it. Other CPUs in the same package need only to register or unregister themselves in that data structure. The HFI depends on both the package-level thermal management and the local APIC thermal local vector. Thus, to ensure that a CPU coming online has an associated HFI instance when the hardware issues an HFI event, enable the HFI only after having enabled the local APIC thermal vector. The thermal throttle driver takes care of the needed package-level initialization. Reviewed-by: Len Brown <len.brown@intel.com> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-03thermal: intel: hfi: Minimally initialize the Hardware Feedback InterfaceRicardo Neri
The Intel Hardware Feedback Interface provides guidance to the operating system about the performance and energy efficiency capabilities of each CPU in the system. Capabilities are numbers between 0 and 255 where a higher number represents a higher capability. For each CPU, energy efficiency and performance are reported as separate capabilities. Hardware computes these capabilities based on the operating conditions of the system such as power and thermal limits. These capabilities are shared with the operating system in a table resident in memory. Each package in the system has its own HFI instance. Every logical CPU in the package is represented in the table. More than one logical CPUs may be represented in a single table entry. When the hardware updates the table, it generates a package-level thermal interrupt. The size and format of the HFI table depend on the supported features and can only be determined at runtime. To minimally initialize the HFI, parse its features and allocate one instance per package of a data structure with the necessary parameters to read and navigate a local copy (i.e., owned by the driver) of individual HFI tables. A subsequent changeset will provide per-CPU initialization and interrupt handling. Reviewed-by: Len Brown <len.brown@intel.com> Co-developed by: Aubrey Li <aubrey.li@linux.intel.com> Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>