Tuesday, June 28, 2022

The Path Is Set for PCI-Express 7.0 in 2025

The ink is barely dry on the PCI-Express 6.0 specification, which was released in January 2022 after years of development, and we hardly have PCI-Express 5.0 peripherals in the market, yet the PCI-SIG organization that controls the PCI-Express standard for peripheral interconnects already has us all coveting the bandwidth that will come later in the decade with PCI-Express 7.0 interconnects.

With I/O becoming ever more central to system architecture, the Peripheral Component Interconnect-Special Interest Group (PCI-SIG) body that drives the peripheral bus in systems is always looking out a lot further into the future to find the materials and new signaling and encoding methods that will keep the bandwidth improvements on the PCI-Express bus growing at a reasonably steady cadence. During the hegemony of the Intel Xeon processor in the datacenter in the 2010s, Intel did not have a lot of competition in processors and therefore there was not enough pressure to keep PCI-Express moving at something averaging around a three year cadence. And so the move from PCI-Express 3.0 to PCI-Express 4.0 took seven years.

To be fair, there were some pretty serious materials science and signaling barriers at the same time, which also hit datacenter switch and router ASICs and caused all kinds of issues with the normal bandwidth increases we have seen historically in inter-node interconnects.

The good news is that the PCI-Express interconnects for inside systems and now across a few racks and the Ethernet and InfiniBand interconnects that span racks and whole datacenters are both picking up the innovation pace. It is beginning to look like there will be a three year cadence that, hopefully, all vendors and customers can line up against. (It might be more like 30 months than 36 months. We shall see.) It was beginning to look like it might be the fast paced two year cadence we saw in the move from PCI-Express 4.0 to PCI-Express 5.0, but perhaps that was a bit optimistic. That optimism was reflected in our August 2020 coverage of the PCI-Express 6.0 specification as it was moving towards ratification.

What we know for sure is that it can never take seven years again to do a PCI-Express speed hike. We also know that Intel is most definitely not alone in the CPU driver’s seat and needs PCI-Express to keep advancing steadily to support CXL-connected accelerators, storage, and main memory as much as any other compute engine vendor, and so it is now pushing PCI-Express as hard as others have been pulling it for years.

All’s well that ends better.

What we also know is that the advances in PCI-Express 6.0 lay a good foundation for PCI-Express to keep rolling well out into the next decade. That foundation includes the PAM-4 signaling that has made cheaper and cooler 100 Gb/sec Ethernet and InfiniBand possible and that laid the foundation for 200 Gb/sec, 400 Gb/sec, and now 800 Gb/sec ports on switches and routers. But it also includes lightweight forward error correction (FEC) that is necessary because signals are progressively fuzzier as bandwidth goes up with the addition of PAM-4. And of course, it includes the new flow control unit, or FLIT, way of encoding bits, which is radically different from how it has been done in the past on the PCI, PCI-X, and PCI-Express buses.

We have tweaked this bandwidth chart from PCI-SIG, which incorrectly showed PCI-Express 6.0 being released in 2021 rather than in 2022.

We said this two years ago, and it bears repeating now. On switch ASICs with PAM-4 encoding, there is a 100 nanosecond or so overhead that comes with forward error correction. The PCI-Express bus cannot sustain such a latency hit, and the PCI-Express 6.0 spec said it had to be under 10 nanoseconds, and in fact, the goal was to keep it down to 1 nanosecond or maybe 2 nanoseconds. And the engineers came up with the FLIT method of checking and encoding bits that overlays PAM-4 and that meets this ambitious – some might have said crazy – goal for error correction without a massive latency penalty.

As far as we can tell, they did it, but we won’t know for sure until the first PCI-Express 6.0 devices hit the streets in maybe early 2023 to late 2024. It usually takes 12 months to 18 months for new devices supporting the spec to get into the field, but a lot depends on when the CPUs get each generation, since that drives the peripherals. The desire to move to CXL main memory is pretty strong, and that requires lots of bandwidth and low latency, so we think engineers will be working on the PCI-Express specifications for 7.0, 8.0, and 9.0 with the kind of energy we have not seen in the past, and there will be a lot more of them, too, which increases the odds of breakthroughs.

The PCI-SIG has not released a lot of information about what the plan is for PCI-Express 7.0, but it will employ the same PAM-4 signaling and not move to PAM-8 or PAM-16 encoding, which the network ASIC folks have not moved to yet, either. (That could come in a few years, though, if clock speeds hit some walls.) A single lane of PCI-Express 7.0 will run at 128 Gb/sec without encoding overhead, which is four times PCI-Express 5.0 and two times PCI-Express 6.0, which was just ratified in January. (That was later than expected, but not hugely so.)
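The doubling arithmetic in the paragraph above can be checked with a short Python sketch. The per-lane figures are the raw signaling rates for each generation, with no encoding or FLIT overhead subtracted, and the 7.0 figure is the announced target rather than a ratified number. The function name and the x16 duplex default are our own illustrative choices, not anything defined by PCI-SIG.

```python
# Raw per-lane signaling rates in Gb/sec, ignoring encoding overhead.
# Generations 1.0 through 6.0 are published rates; 7.0 is the stated target.
RAW_LANE_GBPS = {
    "1.0": 2.5,
    "2.0": 5.0,
    "3.0": 8.0,
    "4.0": 16.0,
    "5.0": 32.0,
    "6.0": 64.0,
    "7.0": 128.0,
}


def slot_bandwidth_gbytes(generation: str, lanes: int = 16, duplex: bool = True) -> float:
    """Raw slot bandwidth in GB/sec for a generation and lane count (hypothetical helper)."""
    gbits = RAW_LANE_GBPS[generation] * lanes
    if duplex:
        gbits *= 2  # count both directions of the link
    return gbits / 8  # convert bits to bytes


if __name__ == "__main__":
    for gen, lane_rate in RAW_LANE_GBPS.items():
        x16 = slot_bandwidth_gbytes(gen)
        print(f"PCI-Express {gen}: x1 {lane_rate:6.1f} Gb/sec, x16 duplex {x16:6.1f} GB/sec")
```

Real-world throughput will come in lower than these raw numbers once encoding, FEC, and protocol overheads are taken out.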

Here is how the PCI-Express lanes map out over the generations:

At these bandwidths, you can see why everyone is excited about the prospect of having the PCI-Express bus replace DDR4 memory controllers as we know them, or at the very least, augment memory bandwidth on CPUs with CXL-attached memory. At the bandwidth and latencies that the PCI-SIG has been able to drive and is expected to drive, why shouldn’t there be one less thing to design in a CPU? Why shouldn’t there be generic PCI-Express controllers that can be used to implement memory, NUMA buses for CPU interconnect in a shared memory system, and peripheral attachment?

Here is what the past PCI, PCI-X, and PCI-Express speed jumps have looked like, and how we can roughly project out with a three-year cadence for specifications:

The PCI-Express 7.0 spec is not expected to be ratified until 2025, and that means we won’t see it appearing in systems until 2026 or 2027. That’s a long way off, of course. Beyond that, it is hard to say what will happen with electrical signaling for peripherals and we might find ourselves in a world where CXL is running over optical links, some with outboard lasers and some with silicon photonics on the die.

But assuming electrical signaling can keep moving ahead – it is a better than even assumption that it can – then PCI-Express 10.0 should be in products in 2035 or 2036 and should be driving 1 Tb/sec signaling lanes and 4 TB/sec across an x16 duplex slot in a server. If we even have a thing called a “server” then, that is. By then, a server might be an abstraction of interconnected components, with an interconnect hypervisor standing in for a printed circuit motherboard and slots.
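For those who want to check the projection, here is a rough Python sketch of the arithmetic. It assumes, as the text does, a 2025 ratification for PCI-Express 7.0 at 128 Gb/sec per lane, a doubling of the raw lane rate with each generation, a three year cadence between specifications, and products landing one to two years after each spec. The function and constant names are ours and purely illustrative.

```python
BASE_GENERATION = 7    # PCI-Express 7.0
BASE_SPEC_YEAR = 2025  # expected ratification year for 7.0
BASE_LANE_GBPS = 128   # raw per-lane rate for 7.0, in Gb/sec
CADENCE_YEARS = 3      # assumed gap between specifications


def project(generation: int) -> dict:
    """Project a future generation under the stated assumptions (hypothetical helper)."""
    steps = generation - BASE_GENERATION
    if steps < 0:
        raise ValueError("this sketch only projects forward from PCI-Express 7.0")
    lane_gbps = BASE_LANE_GBPS * 2 ** steps          # bandwidth doubles each generation
    spec_year = BASE_SPEC_YEAR + CADENCE_YEARS * steps
    x16_duplex_tbytes = lane_gbps * 16 * 2 / 8 / 1000  # 16 lanes, both directions, bits to TB
    return {
        "lane_gbps": lane_gbps,
        "spec_year": spec_year,
        "product_years": (spec_year + 1, spec_year + 2),
        "x16_duplex_tbytes": x16_duplex_tbytes,
    }


if __name__ == "__main__":
    for gen in range(7, 11):
        p = project(gen)
        print(f"PCI-Express {gen}.0: spec {p['spec_year']}, "
              f"products {p['product_years'][0]} or {p['product_years'][1]}, "
              f"{p['lane_gbps']} Gb/sec per lane, "
              f"{p['x16_duplex_tbytes']:.3f} TB/sec x16 duplex")
```

Under those assumptions, PCI-Express 10.0 works out to 1,024 Gb/sec per lane and a bit over 4 TB/sec across an x16 duplex slot, with a spec in 2034 and products in 2035 or 2036. If the cadence turns out to be closer to 30 months, the dates pull in accordingly.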



from Hacker News https://ift.tt/HxhgaGQ
