To me, the defining difference between “cache” and “non-cache” is persistence and capacity.
Everything up to and including RAM loses its data when the power is turned off, and to put it bluntly, this is why hard drives, floppy drives, SSDs, and other storage have been an important part of computer design from early on. Without them, you would have to start over with your OS and everything around it every time you switch the machine on.
RAM itself is really just another cache: a lot faster than your SSD, but still a lot slower than the L3, L2, and L1 caches. Those cache levels often have a completely different “design”, and filesystems are not an issue there anyway; but let’s be honest, there are plenty of applications where a filesystem is not a core function. L1 is a funny one, because it is often split into an “instructions” part and a “data” part. Instructions can be seen as “x, +, -, >”, etc., and data can be seen as the numbers “1” and “2”, which the CPU loads into a ‘register’ before it works on them. In fact, to execute an addition, a program at that level will often look like: “load 1 into a register, load 2 into a register, apply the + instruction to those registers, write the result to the answer register”, which then holds the value “3”.
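To make that register-level recipe a bit more concrete, here is a minimal sketch of a toy register machine. Everything in it is made up for illustration: the instruction names `load` and `add`, the register names `r1`, `r2`, and `answer`, and the `run` helper do not correspond to any real CPU or instruction set.

```python
# Toy register machine: a hypothetical illustration, not a real ISA.
# A "program" is a list of tuples: (instruction, operands...).

def run(program):
    """Execute the toy program and return the final register contents."""
    regs = {}  # register name -> value
    for op, *args in program:
        if op == "load":
            # Put a constant (the "data") into a register.
            dst, value = args
            regs[dst] = value
        elif op == "add":
            # Apply the + instruction to two registers, store in a third.
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs


program = [
    ("load", "r1", 1),              # register with 1
    ("load", "r2", 2),              # register with 2
    ("add", "answer", "r1", "r2"),  # answer = r1 + r2
]

registers = run(program)
print(registers["answer"])  # prints 3
```

The list of tuples plays the role of the “instructions”, and the constants 1 and 2 are the “data”; a real CPU keeps both in separate parts of L1 when that cache is split.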
Because doing all that by hand is hard, we have OSes, cache managers, compilers, and languages higher-level than assembly. But the reality is that even on the most modern CPUs, there’s only about 80 kilobytes of L1 cache per core, because that cache takes up a lot of space on the CPU die.
If we could build an L1 cache of gigabytes that also never loses its contents on power loss, we could come up with quite revolutionary applications, and the fundamental design of many OSes and applications would change drastically. I don’t think that will happen any time soon. In fact, “persistent RAM” (3D XPoint in DIMM slots) was handled very poorly, because it is such a deviation from how we work now. But given that the ASIC and GPU wave in which this HBM memory plays a role is already a revolution, I can secretly hope that someone will come up with the idea, while they are at it, of also putting an absurd load of 3D XPoint onto a GPU or ASIC with an absurdly wide bus.