A lot of workloads are one of two types:
- Poorly scalable across cores: almost everything runs on 1 thread
- Highly scalable across cores: almost everything spreads across many threads
The number of workloads in between, which for example scale to 4 or 8 cores but not to more, is simply very small.
For poorly scalable workloads, you want some really fast cores. How many depends mainly on how many poorly scalable workloads you want to run simultaneously. For a game that may be 3 to 6, usually not more.
Then you have the highly scalable workloads. You can of course make these faster by cramming as many large, fast cores as possible onto a chip. But that’s not the most effective way, because large cores spend disproportionately more area to squeeze out the last bits of performance. And that area is scarce on a chip: every mm² costs extra.
Looking at the SPEC2017 benchmarks for Alder Lake, we see that a P-core scores 8.14 for integer operations and 14.16 for floating point. The E-core scores 5.25 for integer (65% of the P-core’s performance) and 7.66 for floating point (54%). So roughly between half and two-thirds of the performance, depending of course on the workload.
However, the space that an E-core takes up is a lot smaller. I can’t find the exact sizes right now, so let’s say 4 E-cores fit on the same surface as 1 P-core. Those four E-cores together have (with perfect scaling) 2.6x the integer or 2.2x the floating point performance of 1 P-core, on the same surface!
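To make that arithmetic easy to check, here is a small Python sketch. The scores are the SPEC2017 numbers quoted above. The 4-to-1 area ratio is the same guess as in the text, not a measured die size, and the function name is just made up for this example. It also assumes perfect scaling across the E-cores.

```python
# SPEC2017 scores quoted above for Alder Lake
P_CORE = {"int": 8.14, "fp": 14.16}
E_CORE = {"int": 5.25, "fp": 7.66}

# Assumption from the text: 4 E-cores fit in the area of 1 P-core.
# This is a guess, not a measured figure.
E_CORES_PER_P_AREA = 4


def relative_throughput(e_cores_per_p_area=E_CORES_PER_P_AREA):
    """Throughput of the E-cores that fit in one P-core's area,
    relative to that single P-core, assuming perfect scaling."""
    return {
        kind: e_cores_per_p_area * E_CORE[kind] / P_CORE[kind]
        for kind in P_CORE
    }


if __name__ == "__main__":
    for kind, ratio in relative_throughput().items():
        print(f"{kind}: {ratio:.1f}x")
```

If the real ratio turns out to be closer to 3 E-cores per P-core, call `relative_throughput(3)`: the E-cores would still come out ahead per unit of area for workloads that scale well.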
So in the future I see a lot of chips with 4 to 8 performance cores for single-thread performance, and dozens of efficiency cores for multi-thread performance. And Intel isn’t the only one that seems to be aware of this: AMD is also said to be working on Zen4 Dense cores.