Saturday, July 15, 2023

Linux Code for “Device Memory TCP” – Network to/from Accelerator RAM

Google engineers have published early code for "Device Memory TCP" (a.k.a. Devmem TCP), a proposal for transferring data to/from device memory efficiently by avoiding the need to bounce it through a host memory buffer.

The Device Memory TCP initiative aims to eliminate costly detours through host memory in cases such as moving large amounts of training data from storage into GPU/TPU memory, exchanging data between accelerator devices on different networked systems, and transferring data to/from remote SSDs that is consumed by accelerators without needing any host processing. If Devmem TCP takes flight, it will allow accelerator/GPU/device memory to be exposed directly to the network for inbound and outbound traffic.

[Image: accelerator chip next to switches]

Today most such workflows involve device-to-host or host-to-device copies plus host-to-host network transfers; Device Memory TCP hopes to radically improve on that. The proposal would give Linux socket APIs for sending device memory across the network directly as well as for receiving incoming packets straight into device memory. Devmem TCP would relieve a lot of host memory bandwidth pressure as well as PCI Express bandwidth pressure. With this proposed kernel code, Google engineers have achieved roughly 96.6% of the link rate with data being sent to/from device memory directly.
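
To make the intended flow concrete, below is a minimal, hypothetical sketch of a user-space receive loop for devmem TCP. The SO_DEVMEM_* / SCM_DEVMEM_* names, numeric values and struct layouts are taken from the upstream patch series and may differ from the initial RFC; treat this as an illustrative sketch, not a stable API. It assumes a dma-buf has already been bound to the NIC RX queue (a setup step not shown here), so the NIC DMAs TCP payloads straight into device memory and recvmsg() only delivers metadata.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#ifndef SO_DEVMEM_DMABUF
#define SO_DEVMEM_DMABUF   79	/* assumed value; check your kernel headers */
#define SCM_DEVMEM_DMABUF  SO_DEVMEM_DMABUF
#define SO_DEVMEM_DONTNEED 80	/* assumed value */
#endif

/* Metadata for one received fragment that landed in the bound dma-buf. */
struct dmabuf_cmsg {
	uint64_t frag_offset;	/* offset of the payload inside the dma-buf */
	uint32_t frag_size;	/* number of bytes received */
	uint32_t frag_token;	/* token to hand back once the data is consumed */
	uint32_t dmabuf_id;
	uint32_t flags;
};

/* Token range returned to the kernel so it can recycle the buffers. */
struct dmabuf_token {
	uint32_t token_start;
	uint32_t token_count;
};

static int recv_into_devmem(int sock)
{
	char linear[4096];	/* small bounce area for any non-devmem bytes */
	char ctrl[CMSG_SPACE(sizeof(struct dmabuf_cmsg)) * 16];
	struct iovec iov = { .iov_base = linear, .iov_len = sizeof(linear) };
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = ctrl, .msg_controllen = sizeof(ctrl),
	};
	struct cmsghdr *cm;

	/* Payload stays in device memory; only metadata arrives as cmsgs. */
	if (recvmsg(sock, &msg, 0) < 0)
		return -1;

	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
		if (cm->cmsg_level != SOL_SOCKET ||
		    cm->cmsg_type != SCM_DEVMEM_DMABUF)
			continue;

		struct dmabuf_cmsg frag;
		memcpy(&frag, CMSG_DATA(cm), sizeof(frag));

		/*
		 * The data already sits at frag_offset inside the dma-buf,
		 * i.e. in GPU/TPU memory.  Launch the accelerator work here
		 * instead of copying the bytes into a host buffer.
		 */
		printf("devmem frag: %u bytes at offset %llu\n",
		       frag.frag_size, (unsigned long long)frag.frag_offset);

		/* Tell the kernel this fragment can be reused. */
		struct dmabuf_token tok = {
			.token_start = frag.frag_token,
			.token_count = 1,
		};
		setsockopt(sock, SOL_SOCKET, SO_DEVMEM_DONTNEED,
			   &tok, sizeof(tok));
	}
	return 0;
}

The key point of the design is visible here: the payload never appears in a host buffer at all, so the only host-side work per fragment is parsing a control message and returning a token once the accelerator has consumed the data.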

More details on the Device Memory TCP proposal, along with the initial "request for comments" Linux patches, can be found on the dri-devel list.



from Hacker News https://ift.tt/uI5h9j1
