Tuesday, August 25, 2020

Lightmatter's Mars SoC Bends Light to Process Data

Slow transistors got you down? Just shoot lasers through silicon instead. That's the fundamental idea behind silicon photonics-based computing, and it appears to be coming closer to reality. Lightmatter takes that approach with its new Mars SoC, which is designed specifically for AI inference workloads, though the tech could eventually bleed over into general-purpose chips. The company unveiled its working Mars test chip, which bends laser-generated light to perform computation, at Hot Chips 2020. The chip touts some impressive specs.

The new Mars SoC marks several fundamental steps forward that could bring optical computing closer to the mainstream, and the company even shared a picture of a large wafer-scale optical device (above) that could hold multiple SoCs (below).

Lightmatter claims data moves through the optical tensor core at the speed of light, boosting bandwidth by a factor of ten while cutting latency from the typical 100ns of electronics-based chips to a staggering 100 picoseconds (a 1000X improvement). The chip can be made with standard CMOS manufacturing processes, meaning it doesn't require exotic materials and can be produced in existing fabs. Unlike quantum chips, it also runs at standard temperatures.

(Image credit: Lightmatter via Hot Chips 2020)

The core pulls only one microwatt of power, yet another 1000X improvement over the typical one milliwatt consumed by an electronics-based chip. The photonic module is part of a 3D-stacked device that includes a laser and a 14nm ASIC to handle digital work like I/O operations. The optical core itself uses almost zero power for computation, but zooming out to the SoC level, the entire device draws a mere 3W under load. Lightmatter claims to have pulled off this feat within the same die area as a comparable transistor-powered chip, meaning it's roughly as compact as a conventional processor.

The Mars SoC sets the stage for radical advances in compute efficiency, but the company hasn't shared final performance data yet beyond saying the end product is three orders of magnitude faster than electronics-based devices. The finished SoC rides on a standard PCIe-attached test device for now. Still, the company teased a wafer-scale switched optical interconnect that could house multiple photonics units, plus other elements like memory, connected via CoW (Chip-on-Wafer) 3D stacking. That means these chips are moving closer to real-world use, and the company outlined a few future improvements that could yield even more impressive performance.

Lightmatter shared in-depth details of the design during its Hot Chips presentation, and we've done our best to boil it down to understandable terms below.

(Image credit: Lightmatter via Hot Chips 2020)

The rationale for switching to optical computing is pretty simple: The rate of frequency improvements from moving to smaller, denser process nodes has declined, so performance gains are becoming less pronounced with each new generation of chips. While we're approaching the fundamental limits of switching efficiency in transistors, photons don't have to play by the same rules. To reset the performance clock, Lightmatter created a multi-chip design that fuses the benefits of transistor density (it still uses an ASIC as part of the solution) with the speed and efficiency of optical computing. 

It all starts in the MZI (Mach-Zehnder Interferometer, first image). A laser shoots a beam of photons into the device, which has a silicon waveguide that directs the light (yes, light can travel through silicon). The waveguide splits the stream into two beams, and the basic concept is to impart a different phase shift to each beam. This creates either constructive or destructive interference when the two beams recombine at the end of the waveguide, which is then observed/measured as the output. Sounds simple enough, right?
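
To make the interference math concrete, here's a rough Python sketch of an idealized, lossless MZI - a toy model, not Lightmatter's actual device - where the output intensity depends entirely on the phase difference between the two arms:

    # Toy model of a Mach-Zehnder interferometer: split a beam in two,
    # phase-shift one arm, recombine, and measure the resulting intensity.
    import numpy as np

    def mzi_output(phase_shift_rad, input_amplitude=1.0):
        """Output intensity of an ideal, lossless MZI (ranges from 0 to 1)."""
        # 50/50 split into two arms
        arm_a = input_amplitude / np.sqrt(2)
        arm_b = input_amplitude / np.sqrt(2)
        # one arm picks up a phase shift (e.g., from bending its waveguide)
        arm_b = arm_b * np.exp(1j * phase_shift_rad)
        # recombine and measure the intensity at one output port
        combined = (arm_a + arm_b) / np.sqrt(2)
        return np.abs(combined) ** 2

    for phase in (0, np.pi / 2, np.pi):
        print(f"phase {phase:.2f} rad -> intensity {mzi_output(phase):.2f}")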

The real innovation comes in creating the phase shifter. Lightmatter had a few options for how to do this, but found that simply bending the silicon waveguides, and thus the light traveling through them, creates a phase shift while still meeting the company's power and speed requirements. The company uses a nano-optical electro-mechanical system (NOEMS - yeah, that's a mouthful) to bend the waveguides in a pretty innovative way.

The waveguides are suspended in the air, and then a charge is applied to a group of surrounding capacitors, which causes the waveguides to bend. Lightmatter says this technique requires very little power ("nearly zero" - leakage is minimal), and the capacitors can operate at several hundred GHz.   

With this basic building block in hand, the company creates more complex structures via directional couplers that combine input signals into pairs, with the end result being the ability to perform matrix-vector multiply operations.

These structures are then combined into larger arrays (the design scales to thousands of units) to create more computational power, and the latency of the data traveling through the arrays is, well, the speed of light. The end result is a unit that multiplies a 64x64 matrix by a 64-element vector, performing eight operations per (the equivalent of a) cycle. Lightmatter hasn't specified the overall clock speed, but says it is in the GHz range.
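
For reference, here's the digital equivalent of what the optical array computes in a single pass; the 64x64 dimensions come from Lightmatter's figures, while the data itself is just illustrative:

    # Digital stand-in for one pass through the optical tensor core:
    # a 64x64 weight matrix times a 64-element input vector.
    import numpy as np

    N = 64  # array dimension per the article
    weights = np.random.randn(N, N).astype(np.float32)   # programmed via the phase shifters
    activations = np.random.randn(N).astype(np.float32)  # encoded onto the laser input

    result = weights @ activations  # one matrix-vector multiply per optical pass
    print(result.shape)  # (64,)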

Bandwidth through the array weighs in at terahertz, so the other electronics in the device become the limiting factor. Data is fed into the device with a fairly standard technique of using voltage to modulate the laser, and the light exiting the device is fed into a series of converters that bring it back into the digital domain. That's the obvious bottleneck, and conversion consumes most of the power for the end device.
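
Conceptually, that round trip looks something like the generic quantize/dequantize sketch below - a stand-in for the digital-to-analog and analog-to-digital conversion steps, not Lightmatter's actual signal chain:

    # Generic model of the digital -> analog -> digital round trip that
    # wraps the optical core (illustrative only).
    import numpy as np

    def quantize(x, bits=8):
        """DAC side: map a float vector onto discrete drive levels."""
        scale = np.max(np.abs(x))
        scale = scale if scale > 0 else 1.0
        levels = np.round(x / scale * (2 ** (bits - 1) - 1))
        return levels, scale

    def dequantize(levels, scale, bits=8):
        """ADC side: map measured levels back into floating point."""
        return levels / (2 ** (bits - 1) - 1) * scale

    x = np.random.randn(64)
    levels, scale = quantize(x)
    x_roundtrip = dequantize(levels, scale)
    print("max round-trip error:", np.max(np.abs(x - x_roundtrip)))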

(Image credit: Lightmatter via Hot Chips 2020)

The photonic devices' compute performance scales with area, just like in normal chips, so stacking up more arrays creates more performance. Latency also increases with more units, but Lightmatter claims it remains well under a nanosecond for a 1000x1000 array of units, which is 3X lower than standard chips.
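
A quick back-of-envelope calculation shows why propagation latency stays so low. The waveguide group index and path lengths below are our own assumptions for illustration, not figures from Lightmatter:

    # Propagation latency of light traveling through a silicon waveguide.
    C = 3.0e8          # speed of light in vacuum, m/s
    GROUP_INDEX = 4.0  # assumed effective group index of a silicon waveguide

    def propagation_latency_ps(path_length_mm):
        velocity = C / GROUP_INDEX  # speed inside the waveguide, m/s
        return path_length_mm * 1e-3 / velocity * 1e12  # picoseconds

    for length_mm in (1, 10, 50):
        print(f"{length_mm} mm path -> {propagation_latency_ps(length_mm):.0f} ps")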

Power used inside the array is negligible, and while laser power is typically the largest contributor to power consumption in photonic chips, the laser here consumes only a few milliwatts. Lightmatter says the only meaningful power consumption comes from converting the data to and from the optical signal, which happens on either side of the array.

In fact, adding more photonic compute units to the array increases efficiency: performance scales quadratically with the number of units while power consumption grows roughly linearly, so doubling the array size yields roughly four times the performance for only twice the power. In contrast, performance and power scale linearly with standard transistor-based chips, so efficiency gains aren't as pronounced.
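
Here's a toy model of that scaling relationship as described - compute growing with the square of the array dimension while the dominant conversion power grows linearly. The numbers are made up for illustration, not Lightmatter measurements:

    # In an N x N optical array, operations grow with N^2 (one multiply per
    # element) while the main power cost (converters at the edges) grows with N.
    def relative_efficiency(n):
        operations = n * n   # multiplies per pass
        edge_power = 2 * n   # converters on the input and output edges
        return operations / edge_power  # ops per unit of power grows with n

    for n in (64, 128, 256):
        print(f"N={n}: relative efficiency {relative_efficiency(n):.0f}")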

(Image credit: Lightmatter via Hot Chips 2020)

Lightmatter combines the 90mm2 photonic compute unit, which is built on a standard GlobalFoundries 12nm photonic process, with a 50mW laser and a 14nm ASIC (50mm2, 30MB of SRAM) into a 3D-stacked multi-chip module, all connected by a low-power analog I/O interface that reduces data travel to 1mm.

The entire device measures 150mm2, and total latency comes in under 200 picoseconds (a figure that covers only the analog and optical paths, not digital conversion). The ASIC handles some AI operations that the photonics core isn't well suited for, and also provides connections to external interfaces. The net effect is an SoC with a 3W TDP that runs at standard data center operating temperatures.

For now, the test chip rides on a PCIe-connected device, but as shown at the top of the article, it will eventually ride along with many others on a massive wafer-scale, dynamically switched optical interconnect. The photonics units will be mounted on the wafer using CoW (Chip-on-Wafer) 3D stacking. That will help address the power consumption concerns associated with data movement, which usually consumes more power than computation. Lightmatter claims that, using photonics, data transfers can be reduced from tens of watts to single-digit microwatts.

Lightmatter says the devices will interface with standard deep learning frameworks, compilers, and model exchange formats, such as TensorFlow, PyTorch, and ONNX.
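
Lightmatter hasn't published its software stack, but framework interoperability via ONNX typically looks like the sketch below: export a trained model to the exchange format, then hand it to the vendor's compiler (that last step is hypothetical here):

    # Export a small PyTorch model to ONNX; a vendor toolchain would then
    # map the model's matrix multiplies onto the optical tensor core.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
    model.eval()

    dummy_input = torch.randn(1, 64)
    torch.onnx.export(model, dummy_input, "model.onnx", opset_version=11)
    # model.onnx can now be consumed by any ONNX-compatible compiler.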

(Image credit: Lightmatter via Hot Chips 2020)

In the end, Lightmatter says the device is incredibly fast and power efficient, but there's room to grow, too. Even in today's fiber optic systems, different wavelengths (colors) of light can carry multiple data streams over a single path, thus multiplying throughput. Lightmatter says those same techniques could eventually be used in its photonics cores to multiply the performance of the device.
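
Conceptually, wavelength multiplexing would let each color carry its own input vector through the same optical array, so one physical pass yields several results. The sketch below is purely illustrative, not a description of Lightmatter's hardware:

    # Toy sketch of wavelength-division multiplexing applied to the workload:
    # four assumed wavelength channels, each carrying an independent vector.
    import numpy as np

    N = 64
    weights = np.random.randn(N, N)
    channels = [np.random.randn(N) for _ in range(4)]  # one vector per color

    results = [weights @ v for v in channels]  # one pass, four logical products
    print(len(results), "matrix-vector products per pass")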

The dream of optical computing, at least in a cost-effective enough form to see wide deployment, has long eluded the industry. Lightmatter's design uses standard CMOS manufacturing techniques, so it could conceivably be etched onto standard wafers. If the product makes it out of the lab, we'd naturally expect the leading devices to be quite expensive, but those costs could be offset by power savings in both computation and data movement.

The company hasn't shared hard performance data or final clock rates yet, beyond saying the solution is three orders of magnitude faster than electronics-based solutions, but it plans to share more information as it moves the product closer to market. Lightmatter says production units arrive in the fall of 2021.



from Hacker News https://ift.tt/2Ea4XnW
