It has been a while since the last such case, but here is another incident in which processors that otherwise appear fully functional turn out to be unstable in one particular kind of application. On 13th and 14th generation Intel Core processors, problems have been reported in games using Unreal Engine 5 and one particular data compression library. It is possible that this code exposed an architectural weakness that was not caught by validation and testing of the CPUs in production.
To begin with, it must be said that it is not entirely certain the problem really lies in the Intel processors, because aggressive motherboard settings could also be to blame. The problem also seems to affect only a small fraction of the chips produced, so there does not appear to be any great reason to panic yet.
Crashes in Oodle and shader compilation
The issues are reported in Unreal Engine based games and occur when data in the Oodle format from RAD Game Tools (part of Epic since 2021) is decompressed during shader compilation. According to the developers, decompression on the top Raptor Lake models creates such a load that the processor becomes unstable and the game crashes with a message that decompression failed, or unpredictable behavior occurs (which can also end in a crash or freeze of the OS).
Besides RAD Game Tools itself, as the author of this software, the developers of the games Darktide and Vermintide 2 also report these problems. However, more titles using Unreal Engine and Oodle may be affected by crashes during shader compilation (for example, Fortnite, Remnant 2, Hogwarts Legacy).
Why does the crash occur during shader compilation and the decompression that feeds it? This task probably loads all CPU cores to the maximum, so the CPU reaches its peak power draw and can heat up very quickly. At the same time, a sudden increase in current draw can probably destabilize the voltage, so there is a risk of a momentary voltage drop below a critical level, at which the CPU loses stability or executes some instructions incorrectly. In other words, shader compilation acts as a kind of stress test similar to Prime95, albeit one that is a real-world application.
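The idea of a workload doubling as a stress test can be illustrated with a minimal sketch. It is not the code used by Oodle or Unreal Engine: Oodle is proprietary, so Python's standard zlib module stands in for it here, and the function names (make_payload, round_trip_ok, stress) are made up for this illustration. The sketch repeatedly decompresses the same known-good data on several threads and compares a checksum of the result, which is roughly how silent computation errors show up as "decompression failed".

```python
import hashlib
import os
import zlib
from concurrent.futures import ThreadPoolExecutor


def make_payload(size=1 << 20, seed=0):
    """Build deterministic, compressible test data (hypothetical helper)."""
    block = hashlib.sha256(str(seed).encode()).digest()  # 32 bytes
    return block * (size // len(block))


def round_trip_ok(compressed, expected_digest):
    """Decompress once and verify the result against a known checksum."""
    try:
        data = zlib.decompress(compressed)
    except zlib.error:
        # A corrupted stream or a miscomputed step surfaces as a decode error.
        return False
    return hashlib.sha256(data).digest() == expected_digest


def stress(iterations=64, workers=None):
    """Run many decompressions in parallel and count failed round trips."""
    payload = make_payload()
    compressed = zlib.compress(payload)
    digest = hashlib.sha256(payload).digest()
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(lambda _: round_trip_ok(compressed, digest), range(iterations))
        )
    return results.count(False)


if __name__ == "__main__":
    # On stable hardware every round trip should succeed.
    print("failed round trips:", stress())
```

A toy like this generates far less load than real shader compilation or a dedicated tool such as Prime95, so a clean run proves little; it only shows the principle of detecting instability by checking the output of a deterministic computation.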
According to RAD Game Tools, this does not appear to be a bug in the software itself, and is likely a manifestation of hardware instability that can occur on high-power, high-frequency Raptor Lake processors when processing Oodle code. Intel pushed the frequencies very high for the 13th generation Core and even more so for the 14th. The K and KF series Core i9 processors (i9-13900K/KF, i9-14900K/KF) seem to be affected most; the i7-13700K/KF and i7-14700K/KF are less likely to have problems.
Core i7-13700K (Author: Ľubomír Samák, used with permission of the author)
As a solution for owners of these processors (if they experience instability, which only a fraction of processors do), it is recommended to disable all forms of overclocking that could cause instability. This includes the various default motherboard settings that often unlock or raise the CPU's power limits or remove boost restrictions (for example, limits based on the number of loaded cores, the temperature criterion for Thermal Velocity Boost, or the various Multicore Enhancement functions that apply single-threaded boost clocks to all cores).
On the Intel platform, such forms of overclocking are often enabled on boards by default, so users may be overclocking without knowing it. If these game crashes occur, it is recommended to set the power limits in the board's BIOS to the officially recommended value (253 W).
Game authors recommend underclocking by 100-200 MHz in case of problems
However, according to RAD Game Tools, it seems that not all occurrences of the problem are due to overclocking. If disabling the various forms of overclocking is not enough, one of the suggested solutions is to reduce the maximum processor frequencies by 100 to 200 MHz (this is recommended by the authors of the games Vermintide 2 / Darktide and Nightingale).
It still needs to be verified whether some processors really do make errors until their frequencies are reduced, which would mean a genuine hardware fault, or in other words that Intel set their clocks higher than the silicon can handle. It is also possible that all these cases will ultimately turn out to have been caused by overclocking of the CPU or RAM.
A reportedly successful workaround for many people is to use Intel XTU and lower the Performance Core multiplier from x55 to x54 or x53.
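The multiplier steps line up with the 100 to 200 MHz reduction mentioned above. The short sketch below shows the arithmetic, assuming the usual 100 MHz base clock (BCLK); that value is not stated in the reports and is an assumption here, as are the helper names.

```python
# Assumption: a 100 MHz base clock (BCLK), typical for this platform.
BCLK_MHZ = 100


def core_clock_mhz(multiplier, bclk_mhz=BCLK_MHZ):
    """Core frequency is the multiplier times the base clock."""
    return multiplier * bclk_mhz


def reduction_mhz(stock, lowered, bclk_mhz=BCLK_MHZ):
    """How many MHz are given up by lowering the multiplier."""
    return core_clock_mhz(stock, bclk_mhz) - core_clock_mhz(lowered, bclk_mhz)


if __name__ == "__main__":
    for lowered in (54, 53):
        print(
            f"x55 -> x{lowered}: {core_clock_mhz(lowered)} MHz, "
            f"{reduction_mhz(55, lowered)} MHz lower"
        )
```

Under that assumption, x54 and x53 correspond to 5.4 GHz and 5.3 GHz, that is, 100 and 200 MHz below the x55 setting.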
In addition to reducing the clocks, it is recommended to try setting the SVID Behavior option in the BIOS to “Intel’s Fail Safe”, directly increasing the core voltage slightly until the instability stops, or using a higher CPU Vcore Loadline Calibration setting. The goal of all these settings is to improve the stability of the processor at maximum load and frequencies. RAD Game Tools states in its document on the issue that the problems causing the instability should also be detectable by stress testing with AVX2 instructions in the Intel XTU application.
Intel does not seem to have commented on these issues yet. We will see whether it does, and whether it perhaps announces right away that the root cause has been found and a fix is planned.
The error should hopefully be fixable
Even if Raptor Lake is confirmed to be unstable with Oodle decompression, it probably won’t mean that the CPUs have to be thrown away. The bugs should be fixable with a microcode update (most likely delivered via a new BIOS for the board). The solution could be to insert some “breather” cycles between AVX2 instructions, if these are identified as the cause of the crashes, or the power management could be modified to raise the processor voltage in these critical situations. Either way, there should be ways to stabilize these processors by updating the board’s BIOS, so there should be no need for a recall.
Perhaps the situation will turn out similar to the early Ryzen errors associated with FMA3 instructions, where the processor was also overloaded and the problem was fixed with a microcode update.
In the comments on this report, there was speculation about whether these errors could show that the processors, because of the high frequencies Intel pushed them to, degrade over time and lose stability even at default settings. This would probably be the most dramatic and problematic scenario; hopefully it is not degradation.
Sources: RAD Game Tools, Fatshark, Sebastian Castellanos, Reddit
2024-02-22 05:06:43