Saturday, December 19, 2020

Apple M1 foreshadows Rise of RISC-V

Likewise, with these interrupts we can send complex machine learning tasks to the M1 Neural Engine to, say, identify a face on the webcam. The rest of the computer stays responsive all the while, because the Neural Engine chews through the image data in parallel with everything else the CPU is doing.
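
To make that concrete, here is a minimal sketch in C of the interrupt-driven hand-off as seen from the main CPU. Every name and register address below is hypothetical (Apple’s actual Neural Engine interface is not public); what matters is the pattern: kick off the job, keep doing other work, and let an interrupt announce the result.

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical memory-mapped coprocessor registers (illustration only). */
    #define NE_INPUT_ADDR (*(volatile uint32_t *)0x60000000) /* camera frame address */
    #define NE_COMMAND    (*(volatile uint32_t *)0x60000004) /* write 1 to start     */
    #define NE_RESULT     (*(volatile uint32_t *)0x60000008) /* face bounding box    */

    static volatile bool face_job_done = false;

    /* The coprocessor raises an interrupt when it finishes, so the CPU
       never has to sit and poll while inference is running. */
    void neural_engine_isr(void)
    {
        face_job_done = true;
    }

    static void handle_ui_events(void) { /* keep the rest of the system going */ }
    static void draw_face_box(uint32_t box) { (void)box; /* render the result */ }

    void main_loop(uint32_t frame_addr)
    {
        NE_INPUT_ADDR = frame_addr; /* point the coprocessor at the image data */
        NE_COMMAND    = 1;          /* start inference and return immediately  */

        for (;;) {
            handle_ui_events();     /* the CPU stays responsive meanwhile...   */
            if (face_job_done) {    /* ...until the interrupt flags completion */
                draw_face_box(NE_RESULT);
                break;
            }
        }
    }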

RISC-V based board from SiFive capable of running Linux

The Rise of RISC-V

Back in 2010, the Parallel Computing Laboratory at UC Berkeley saw the trend toward heavier use of coprocessors. They saw how the end of Moore’s Law meant that you could no longer easily squeeze more performance out of general-purpose CPU cores. You needed specialized hardware: coprocessors.

Let us reflect for a moment on why that is. We know that clock frequency cannot easily be increased; we are stuck at roughly 3–5 GHz. Go higher, and power consumption and heat generation go through the roof.
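
The rough rule of thumb from CMOS design explains why: dynamic power scales as P ≈ C × V² × f (switched capacitance times voltage squared times clock frequency), and pushing the frequency higher generally requires raising the voltage too. Power therefore grows much faster than clock speed, while adding more transistors that run at the same speed is comparatively cheap.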

However, we are still able to add a lot more transistors; we simply cannot make those transistors work faster. So we need to do more work in parallel. One way is to add lots of general-purpose cores. Another is to add more decoders and do Out-of-Order Execution (OoOE), as I have discussed before: Why Is Apple’s M1 Chip So Fast?

You can keep playing that game until you end up with 128 general-purpose cores, like the Ampere Altra Max ARM processor. But is that really the best use of our silicon? Lots of cores work well for servers and cloud services, but for desktop computing they have limited utility.

So instead of spending all that silicon on more CPU cores, perhaps we should add more coprocessors?

Think about it this way: you have a transistor budget. In the early days, maybe you had a budget of 20,000 transistors and figured you could make a CPU with 15,000 of them. That is close to the reality of the early 80s. Now, this CPU could do maybe 100 different tasks. Say a specialized coprocessor for one of these tasks cost you 1,000 transistors. A coprocessor for every task would take 100,000 transistors. That would blow your budget.

Thus early designs had to focus on general-purpose computing. But today, we can stuff chips with so many transistors that we hardly know what to do with them.

Thus designing coprocessors has become a big thing, and a lot of research goes into making all sorts of new ones. However, these tend to contain pretty dumb accelerators that need to be babysat. Unlike a CPU, they cannot read instructions that tell them all the steps to perform; they generally don’t know how to access memory or organize anything on their own.

The common solution is to put a simple CPU on the coprocessor as a sort of controller: the whole coprocessor is a specialized accelerator circuit plus a simple CPU that configures the accelerator to do its job. The accelerator itself is usually highly specialized. For instance, something like a Neural Engine or a Tensor Processing Unit deals with very large registers that can hold matrices (rows and columns of numbers).
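
As a sketch of that division of labor, here is roughly what the controller CPU’s job looks like in C. The register names and layout are invented for illustration; real designs differ, but the shape is the same: a simple CPU sequencing a dumb-but-fast accelerator.

    #include <stdint.h>

    /* Hypothetical matrix-multiply accelerator, as seen by its controller CPU. */
    #define MATMUL_BASE  0x50000000u
    #define REG_A_ADDR   (*(volatile uint32_t *)(MATMUL_BASE + 0x00))
    #define REG_B_ADDR   (*(volatile uint32_t *)(MATMUL_BASE + 0x04))
    #define REG_OUT_ADDR (*(volatile uint32_t *)(MATMUL_BASE + 0x08))
    #define REG_DIMS     (*(volatile uint32_t *)(MATMUL_BASE + 0x0C))
    #define REG_START    (*(volatile uint32_t *)(MATMUL_BASE + 0x10))
    #define REG_STATUS   (*(volatile uint32_t *)(MATMUL_BASE + 0x14))

    /* The controller's whole job: fetch work, configure the matrix unit,
       fire it, and wait. The accelerator does the heavy math itself. */
    void run_matmul(uint32_t a, uint32_t b, uint32_t out, uint32_t n)
    {
        REG_A_ADDR   = a;   /* where matrix A lives in memory */
        REG_B_ADDR   = b;   /* where matrix B lives           */
        REG_OUT_ADDR = out; /* where to write the result      */
        REG_DIMS     = n;   /* operate on n x n matrices      */
        REG_START    = 1;   /* go                             */

        while ((REG_STATUS & 1u) == 0)
            ; /* spin until the accelerator signals completion */
    }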

This is exactly what RISC-V was designed for. Its bare-minimum base instruction set of about 40–50 instructions (essentially integer arithmetic and logic, loads and stores, branches and jumps, and a few system instructions) lets it do all the typical CPU stuff. That may sound like a lot, but keep in mind that an x86 CPU has over 1,500 instructions.

Instead of having a large fixed instruction set, RISC-V is designed around the idea of extensions. Every coprocessor will be different. Each will thus contain a RISC-V processor to manage things, implementing the core instruction set plus an extension instruction set tailor-made for what that coprocessor needs to do.
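
To show what a tailor-made extension looks like from software, here is a hedged sketch using the GNU RISC-V toolchain. The .insn directive is real GNU assembler syntax for emitting an instruction the assembler has no mnemonic for, and opcode 0x0B is one of the opcode spaces RISC-V reserves for custom extensions; the multiply-accumulate semantics behind it are invented for this example.

    #include <stdint.h>

    /* Invoke a hypothetical custom instruction living in RISC-V's
       reserved CUSTOM-0 opcode space (0x0B). Compile with a RISC-V
       GCC; the encoding mechanism is real, the instruction is not. */
    static inline uint32_t coproc_mac(uint32_t a, uint32_t b)
    {
        uint32_t result;
        __asm__ volatile(".insn r 0x0B, 0x0, 0x0, %0, %1, %2"
                         : "=r"(result)
                         : "r"(a), "r"(b));
        return result;
    }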

Okay, now maybe you start to see the contours of what I am getting at. Apple’s M1 is really going to push the industry as a whole towards this coprocessor-dominated future. And to make these coprocessors, RISC-V will be an important part of the puzzle.

But why? Can’t everybody making a coprocessor just invent their own instruction-set? After all that is what I think Apple has done. Or possibly they use ARM. I have no idea. If somebody knows, please drop me a line.

What is the Benefit of Sticking with RISC-V for Coprocessors?

Making chips has become a complicated and costly affair. Building up the tools to verify your chip, running test programs, doing diagnostics, and a host of other things all require a lot of effort.

This is part of the value of going with ARM today. They have a large ecosystem of tools to help verify your design and test it.

Going for a custom proprietary instruction set is thus not a good idea. With RISC-V, however, there is a standard that multiple companies can make tools for. Suddenly there is an ecosystem, and multiple companies can share the burden.

But why not just use ARM, which is already there? The trouble is that ARM was made as a general-purpose CPU with a large, fixed instruction set. Only after pressure from customers and competition from RISC-V did ARM relent, opening its instruction set for extensions in 2019.

Still, the problem is that ARM wasn’t made for this from the outset. The whole ARM toolchain assumes you have the entire large ARM instruction set implemented. That is fine for the main CPU of a Mac or an iPhone.

But for a coprocessor you don’t want or need this large instruction set. You want an ecosystem of tools built around the idea of a minimal fixed base instruction set with extensions.

Why is that such a benefit? Nvidia’s use of RISC-V offers some insight. On their big GPUs they need some kind of general-purpose CPU to act as a controller. However, the amount of silicon they can set aside for it, and the amount of heat it is allowed to produce, is minimal. Keep in mind that lots of things are competing for that space.

Because RISC-V has such a small and simple instruction set, it beat all the competition, including ARM. Nvidia found they could make smaller chips by going with RISC-V than with anything else, and they reduced power usage to a minimum.

With the extension mechanism, you can limit yourself to adding only the instructions crucial for the job at hand. A controller on a GPU likely needs different extensions than, say, a controller on an encryption coprocessor: the former might get by with the base integer set plus multiply instructions, while the latter might want bit-manipulation instructions instead.

ARM Will be the New x86

Thus, ironically, we may see a future where Macs and PCs are powered by ARM processors, but where all the custom hardware around them, all their coprocessors, is dominated by RISC-V. As coprocessors get more popular, more silicon in your System-on-a-Chip (SoC) may be running RISC-V than ARM.

Read more: RISC-V: Did Apple Make the Wrong Choice?

When I wrote the story above, I had not actually fully grasped what RISC-V was all about. I thought the future would be about ARM or RISC-V. Instead it will likely be ARM and RISC-V.

General-purpose ARM processors will be at the center, with an army of RISC-V-powered coprocessors accelerating every possible task: graphics, encryption, video encoding, machine learning, signal processing, and processing network packets.

Prof. David Patterson and his team at UC Berkeley saw this future coming, and that is why RISC-V is so well tailored to this new world.

There is so much uptake and buzz around RISC-V in all sorts of specialized hardware and microcontrollers that I think a lot of the areas ARM dominates today will go RISC-V.

Raspberry Pi 4 single-board computer, currently using an ARM processor.

Imagine something like the Raspberry Pi. Today it runs ARM, but future RISC-V boards could come in a host of variants tailored for different needs: one for machine learning, another oriented toward image processing, a third for encryption. You could basically pick a little microcontroller with its own little flavor. You might be able to run Linux on any of them and do all the same tasks, just with a different performance profile.

A RISC-V microcontroller with special machine learning instructions will train neural networks faster than a RISC-V microcontroller with instructions for video encoding.

Nvidia has already ventured down that path with their Jetson Nano, shown below. It is a Raspberry Pi-sized board with specialized hardware for machine learning, so you can do object detection, speech recognition, and other machine learning tasks.

NVIDIA Jetson Nano Developer Kit.

Share Your Thoughts

Let me know what you think. There is a lot going on here that is hard to predict. For example, we now see claims of RISC-V CPUs that really beat ARM on both power and performance. That makes you wonder whether there is indeed a chance that RISC-V becomes the central CPU of our computers.

I must admit it has not been obvious to me why RISC-V would outperform ARM. By their own admission, RISC-V is a fairly conservative design; it contains few instructions that have not already appeared in some older design.

However, there seems to be a major gain from paring everything down to a minimum: it makes exceptionally small and simple implementations of RISC-V CPUs possible, which in turn makes it possible to reduce power usage and increase clock frequency.

Hence the last word on RISC-V versus ARM has not yet been said.



