Recently I had the need to use several different DOS VMs that all used an SMB network client. Although I did not use networking heavily, I noticed that there are massive differences in performance between the VMs. Copying a circa 42 MB file would take anywhere between about 5 seconds and 49 minutes (not a typo). What’s more, some VMs were fast in both directions, while others were very slow sending and yet others were very slow receiving.
Since in all cases the VMs communicated with the same server (Synology NAS running Samba) from the same host (AMD Ryzen 3800X running Windows 10 and VirtualBox), there really should not be that much performance variation, and yet there it was.
In all cases, NetBIOS over TCP/IP was used as the protocol underlying SMB, and it should be said that TCP/IP greatly complicates the picture. I used three different software stacks, mostly to get some sanity checking:
- Microsoft Network Client 3.11 with included TCP/IP
- IBM DOS LAN Services (DLS) version 4.0 with IBM TCP/IP
- IBM DOS LAN Services 5.0 with Network TeleSystems TCP/IP
The SMB clients are in all cases very similar and in fact nearly identical. But the TCP/IP stacks are obviously different, and it matters.
It should be underscored that although the Microsoft Network Client and IBM DLS may give the impression of an integrated product, they are not. There are three separate layers:
- The SMB client and DOS redirector
- NetBIOS, either direct, over IPX, or over TCP/IP
- NDIS network driver
The layers communicate through well-defined interfaces and are interchangeable. From the very beginning (1984-85), the Microsoft and IBM SMB clients ran on top of NetBIOS and didn’t much care about the underlying machinery. It could be the NETBIOS in the actual ROM of the IBM PC Network Adapter, it could be the Token Ring NetBIOS emulator, it could be NetWare’s NetBIOS emulator, it could be any of the numerous NetBIOS over TCP/IP implementations.
Similarly the NDIS interface is well abstracted and does not require an actual NDIS driver. There could be a packet driver or a Novell ODI driver actually controlling the network hardware and exposing an NDIS interface through a shim. The upper layers don’t care.
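As a toy illustration of that layering, the Python sketch below gives the redirector nothing but a NetBIOS-style `send` call, so swapping the transport underneath changes nothing above it. All class and method names are hypothetical, and the real NetBIOS interface (Network Control Blocks) is far richer than one method; this only shows the shape of the abstraction.

```python
from __future__ import annotations

from typing import Protocol


class NetBIOSTransport(Protocol):
    """Hypothetical interface the SMB layer relies on; nothing below it is visible."""

    name: str

    def send(self, session: str, data: bytes) -> None: ...


class RecordingTransport:
    """Stand-in transport that records what it was asked to send."""

    name = "generic"

    def __init__(self) -> None:
        self.sent: list[tuple[str, int]] = []

    def send(self, session: str, data: bytes) -> None:
        # A real transport would frame and transmit the data; here we only log it.
        self.sent.append((session, len(data)))


class NetBEUITransport(RecordingTransport):
    name = "NetBEUI"


class NetBIOSOverTCPTransport(RecordingTransport):
    name = "NetBIOS over TCP/IP"


class Redirector:
    """Hypothetical SMB client: uses whatever transport it is handed."""

    def __init__(self, transport: NetBIOSTransport) -> None:
        self.transport = transport

    def write(self, server: str, data: bytes) -> str:
        self.transport.send(server, data)
        return f"{len(data)} bytes to {server} via {self.transport.name}"


# The same redirector code runs unchanged on either transport.
for transport in (NetBEUITransport(), NetBIOSOverTCPTransport()):
    print(Redirector(transport).write("NAS", b"x" * 512))
```

The point of the design is that `Redirector` never names a concrete transport, which mirrors how the DOS SMB clients could sit on NetBEUI, IPX, or TCP/IP without modification.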
Same or Different?
The Microsoft Network Client version 3.11 was the final DOS-based network client from Microsoft. It was freely available, had TCP/IP support with DHCP, and it worked pretty well. The TCP/IP stack was developed by Hewlett-Packard going back to the mid-1980s, and was once available as Hewlett-Packard ARPA Services for DOS. Microsoft first shipped the HP TCP/IP stack with the LAN Manager 2.1 client in 1991, and the final version from 1995 is not very different.
IBM was a little late to the TCP/IP game, but in 1994 IBM LAN Server 4.0 shipped DOS LAN Services that could work with a separately installed IBM TCP/IP package (also provided with LAN Server 4.0). IBM offered TCP/IP version 2.1.1 for DOS, and much like the OS/2 offering, the product was chopped up into several separately orderable packages. Apart from the required base kit, there was an NFS kit, a NetBIOS over TCP/IP kit, and a programming kit. LAN Server 4.0 shipped with just the base kit (which included NDIS and ODI drivers) and NetBIOS over TCP/IP.
In 1995, IBM’s Warp Server 4.0 (which somewhat confusingly included LAN Server 5.0) shipped with an updated DLS 5.0. The new DOS LAN Services included a completely different TCP/IP stack from Network TeleSystems (NTS). The NTS TCP/IP stack for DOS survived for a good while as part of Norton Ghost boot disks, well into the early 2000s.
Starting Numbers
Now let’s see some numbers. The first table shows how long it takes to copy a 42 MB compressed archive from and to a file server using the default configuration of the three DOS-based SMB clients. Again, the actual SMB clients are very very similar, but the underlying TCP/IP stacks are three completely unrelated implementations.
| Client + TCP/IP stack | Server to DOS client | DOS client to server |
|---|---|---|
| MS Net 3.11 + MS TCP/IP 1.0a | 4.2s | 48m 43s |
| IBM DLS 4.0 + IBM TCP/IP 2.1.1 | 3m 22s | 3m 5s |
| IBM DLS 5.0 + NTS TCP/IP 3.18 | 19s | 4.6s |
Note that the copy performance in one direction is in no way predictive of copy performance in the other direction. The MS client has the fastest server to client times but incredibly awful performance copying from client to server. DLS 4.0 has uniformly poor performance in both directions. DLS 5.0 takes several times longer than the MS client when copying from the server, but it’s quick copying to the server.
So what is going on? Wireshark to the rescue…
TCP/IP and SMB Tunables
Anyone who has had the pleasure of analyzing poor TCP/IP performance knows that there are lots of variables and a good deal of complexity. It gets even more complicated when SMB is thrown into the mix.
SMB Read/Write RAW
For a long time, SMB has had the capability to use “raw” read and write requests. Raw SMB requests are large block transfers that actually somewhat bypass the SMB protocol. Raw SMB transfers can be close to 64K in size and are notable for not including the SMB header in most packets (that’s why they’re called raw).
Note that raw SMB is a concept orthogonal to the underlying protocol (TCP/IP, NetBEUI, etc.). When a raw request is initiated, the sending side produces a stream of data packets with no SMB headers until the entire raw request is completed. Because those packets carry no SMB header to identify them, a raw transfer obviously cannot be interleaved with other SMB requests on the same connection.
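With NetBIOS over TCP/IP, each SMB message travels inside an RFC 1002 session message, which has a 4-byte header: a type byte (0x00 for a session message), a flags byte whose low bit extends the length to 17 bits, and a 16-bit length. The Python sketch below, assuming that framing and a zero-filled 32-byte SMB header starting with `\xffSMB`, contrasts a normal SMB message with raw data that carries no SMB header. The function names are hypothetical, and a real SMB header has meaningful command, status, flags, and ID fields rather than zeros.

```python
import struct

SMB_HEADER_LEN = 32  # fixed SMB1 header length; begins with b"\xffSMB"


def nbt_session_message(payload: bytes) -> bytes:
    """Wrap payload in an RFC 1002 NetBIOS session message header."""
    if len(payload) > 0x1FFFF:
        raise ValueError("payload too large for a NetBIOS session message")
    flags = (len(payload) >> 16) & 0x01  # length-extension bit
    return struct.pack(">BBH", 0x00, flags, len(payload) & 0xFFFF) + payload


def smb_message(command: int, params_and_data: bytes) -> bytes:
    """Normal SMB: 32-byte header (zeroed here, apart from the command) plus body."""
    header = b"\xffSMB" + bytes([command]) + bytes(SMB_HEADER_LEN - 5)
    return nbt_session_message(header + params_and_data)


def raw_data(block: bytes) -> bytes:
    """Raw transfer: the data block goes out with no SMB header at all."""
    return nbt_session_message(block)


block = bytes(65_024)
print(len(raw_data(block)))         # payload plus the 4-byte session header only
print(len(smb_message(0x0B, b"")))  # session header plus SMB header, before any data
```

For a 65,024-byte block, `raw_data` adds only the session header, while every normal SMB message also pays for the SMB header and its parameter words, and is typically limited to a much smaller payload.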
DLS 4.0 with IBM TCP/IP 2.1.1
After staring at the packet traces for a while and attempting various tweaks to both the DOS LAN Requester and the TCP/IP stack (the \TCPDOS\ETC\TCPDOS.INI has quite a lot of poorly documented tunables), I was not able to make any headway.
There was no obvious problem visible in the packet traces, notably no packet loss or retransmits. The DOS client was using SMB raw reads and writes, but somehow the transfers were just… slow. That is to say, the DOS client simply took relatively long (several milliseconds) to respond to network traffic. That very quickly adds up.
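To put "several milliseconds" in perspective, the DLS 4.0 times from the first table can be worked backwards into a per-segment cost. This is a rough sketch under loud assumptions: the file is taken as exactly 42 MiB, every TCP segment is assumed to carry 1,460 bytes of payload, and all of the transfer time is attributed to per-segment delays, ignoring everything else.

```python
import math

FILE_BYTES = 42 * 1024 * 1024  # assumption: "42 MB" taken as exactly 42 MiB
MSS = 1460                     # assumption: full-size Ethernet TCP segments


def per_segment_delay_ms(total_seconds: float, mss: int = MSS) -> float:
    """Average time per TCP segment if the whole transfer were per-segment delay."""
    segments = math.ceil(FILE_BYTES / mss)
    return total_seconds / segments * 1000


# DLS 4.0 + IBM TCP/IP 2.1.1 timings from the first table
print(f"receive: {per_segment_delay_ms(3 * 60 + 22):.1f} ms per segment")
print(f"send:    {per_segment_delay_ms(3 * 60 + 5):.1f} ms per segment")
```

Both directions come out at roughly 6 to 7 ms per segment across some 30,000 segments, which is how a delay that looks harmless on a single packet turns into minutes for the whole file.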
I know that the IBM DOS TCP/IP stack is not inherently slow and it performs reasonably well with NFS.
Given that this version of DLS was superseded and the DLS 4.0 + TCP/IP combination was surprisingly painful to install and configure, I gave up on it.
Note: After installing IBM’s TCP/IP 2.1.1 on top of PC DOS 2000, I was initially unable to get anywhere at all because every attempt to configure the default route failed with the following message:
Error: The route you are attempting to add already exists
even though the route most definitely did not exist. The answer was unexpectedly found in this document—the core DOSTCP.SYS driver needs to be updated to work with PC DOS 7 (there is no such problem with PC DOS 5.0). Updating the entire TCP/IP base and NetBIOS kits made no impact on the poor performance.
DLS 5.0 with NTS TCP/IP
Recall that DLS 5.0 turned out to have quite good send performance, but noticeably slower receive. Examining a Wireshark trace shows that sends are fast because they’re done efficiently using raw SMB requests. The DOS client sends 1,514-byte Ethernet frames with 1,460 bytes of payload in each TCP packet. Since the receiving side (file server) has a 65,535-byte TCP receive window, which is larger than the raw SMB write size (65,024 bytes), the DOS client is able to blast out the entire block of data as fast as it can.
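The arithmetic behind that observation can be sketched in a few lines of Python. It assumes no IP or TCP options (so a 1,514-byte frame leaves 1,460 bytes of TCP payload) and assumes the raw data is preceded by the 4-byte NetBIOS session header; the frame, write, and window sizes are the ones seen in the trace.

```python
import math

ETH_HEADER, IP_HEADER, TCP_HEADER = 14, 20, 20  # assumption: no IP or TCP options
FRAME = 1514          # Ethernet frame size seen in the trace (without FCS)
NBT_HEADER = 4        # assumption: RFC 1002 session message header before raw data

mss = FRAME - ETH_HEADER - IP_HEADER - TCP_HEADER  # TCP payload per frame
raw_write = 65_024                                 # raw SMB write size from the trace
receive_window = 65_535                            # server's advertised TCP window

on_wire = raw_write + NBT_HEADER
segments = math.ceil(on_wire / mss)
last_segment = on_wire - (segments - 1) * mss

print(f"MSS {mss}, {segments} segments, last one {last_segment} bytes")
print("fits in window without waiting for ACKs:", on_wire <= receive_window)
```

Because the whole raw block fits inside one receive window, the sender never has to stop and wait for the window to open mid-block.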
But on the receiving side, DLS 5.0 is not using raw SMB read. It’s instead using normal SMB read requests, which results in a lot of ping-pong traffic:
SMB Read Request →
TCP ACK ←
SMB Read Response ←
TCP ACK →
SMB Read Request →
TCP ACK ←
…
Each SMB Read Response contains only 1,024 bytes of payload. This is clearly not terribly efficient, and certainly nowhere near as efficient as an SMB raw read.
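A quick count shows why the ping-pong hurts. The sketch below assumes the file is exactly 42 MiB and, purely for comparison, assumes a raw read block the same size as the 65,024-byte raw writes seen earlier; the 19-second figure is the DLS 5.0 receive time from the first table.

```python
import math

FILE_BYTES = 42 * 1024 * 1024  # assumption: "42 MB" taken as exactly 42 MiB


def round_trips(block_size: int) -> int:
    """Request/response exchanges needed if every read returns block_size bytes."""
    return math.ceil(FILE_BYTES / block_size)


normal = round_trips(1_024)  # plain SMB reads, 1,024 bytes per response
raw = round_trips(65_024)    # assumption: raw reads as large as the raw writes
measured_seconds = 19        # DLS 5.0 server-to-client time from the first table

print(f"{normal} small reads vs. {raw} raw reads")
print(f"implied cost per small read: {measured_seconds / normal * 1000:.2f} ms")
```

Each small read costs well under a millisecond, but with tens of thousands of them the round trips dominate; raw reads cut the number of exchanges by a factor of about 64.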
The problem should be unrelated to the underlying TCP/IP transport, since it is the SMB client and not NetBIOS deciding whether raw reads will be used.
There’s a good number of tunables in DLS’s NETWORK.INI file. And since this was an IBM product, it’s quite well documented, in fact vastly better than Microsoft’s client.
This 1994 document (Understanding Performance Tuning Theory for IBM OS/2 LAN Server) explains in reasonable detail how “work buffers” and “big buffers” are used in SMB clients and servers.
This 1996 document (Inside OS/2 Warp Server, Volume 1: Exploring the Core Components) provided updated information for OS/2 Warp Server.
And finally this 1997 document (Network Clients for OS/2 Warp Server: OS/2 Warp 4, DOS/Windows, Windows 95/NT, and Apple Macintosh) covers among other things DLS 5.0 and the included NTS TCP/IP stack.
There I learned about the autocache, numbigbuf, sizbigbuf, numworkbuf, and sizworkbuf parameters. But changing those in NETWORK.INI made no difference to the DOS client, except that some combinations made it crash.
More or less by accident, I discovered something that none of the IBM documents mention: I was using the protected-mode redirector (predir), which is meant to be equivalent to the full redirector but use less conventional memory. But somehow it’s not quite equivalent.
As soon as I switched to the full redirector, two things happened: conventional memory usage went up by a couple dozen kilobytes, and suddenly the DOS client started issuing SMB raw read requests. This sped up large reads quite noticeably.
I also tried increasing the TCP window size for the NTS TCP/IP stack, but that did not bring any noticeable improvement.
At this point I did not try tweaking the redirector tunables further, in part because the performance seemed quite good, and in part because I could find no way to show what settings the DOS redirector is currently using. Besides (maybe) changes in performance, the only way to check if changing the settings made any difference is to see whether memory usage went up or down.
Microsoft Network Client 3.11
Looking at packet traces of Microsoft’s network client showed that, rather unsurprisingly, reads work quite well, using SMB raw reads and large packets with no packet loss, but writes show clear problems.
Wireshark was showing frequent “TCP previous segment not captured” warnings on the outgoing (DOS client) side, followed by duplicate ACKs and retransmissions. Since this was using the built-in VirtualBox packet tracing, it’s not plausible the packets would fail to be captured. Moreover the receiving side (file server) certainly behaved like it never saw them.
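Wireshark flags these cases with the `tcp.analysis.lost_segment` display filter. The underlying idea can be sketched in Python: track the next expected sequence number for one direction of a connection and report any forward jump. This is a simplified, hypothetical helper, not Wireshark’s actual analysis; it ignores sequence number wraparound and SACK, and treats anything at or below the expected sequence number as a retransmission.

```python
from typing import Iterable, List, Optional, Tuple


def find_gaps(segments: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (expected_seq, seen_seq) for every forward jump in sequence numbers.

    segments: (sequence number, payload length) pairs for one direction of a
    single TCP connection, in capture order.
    """
    gaps: List[Tuple[int, int]] = []
    next_seq: Optional[int] = None
    for seq, length in segments:
        if next_seq is not None and seq > next_seq:
            gaps.append((next_seq, seq))  # bytes [next_seq, seq) were never seen
        end = seq + length
        if next_seq is None or end > next_seq:
            next_seq = end  # advance only on new data, not on retransmissions
    return gaps


# Third segment jumps ahead; the fourth is the retransmission that fills the hole.
trace = [(1, 1460), (1461, 1460), (4381, 1460), (2921, 1460)]
print(find_gaps(trace))
```

Applied to a capture taken on the sending host itself, a non-empty result means the sender skipped data on the wire, which is what made the MS client’s traces so puzzling.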
After much head scratching, I was unable to determine what the problem was. In fact I saw the same behavior (incredibly slow writes) on physical DOS machines. That excludes the possibility of a virtualization/emulation bug.
In desperation, I tried the universal TCP/IP network boot disk, only to be shocked to discover that in the exact same VM, it’s not slow at all. It did not take me long to determine that the difference is the full vs. basic redirector. The full redirector is meant to have better performance, but something is going badly wrong in certain configurations.
That makes the problem even more mysterious, if anything: The environment is the same, the emulation is the same, the DOS network driver is the same, the TCP/IP stack is the same, but one redirector somehow fails to send packets every now and then and the other does not.
The issue also illustrates just how sensitive TCP/IP is to packet loss. Losing just one out of every 1,000 packets cuts the theoretical maximum TCP throughput by a factor of ten. In this scenario, the packet loss was much worse and the bandwidth was cut by a factor of several hundred.
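To get a feel for how loss limits TCP throughput, one widely cited back-of-the-envelope model is the Mathis et al. bound, rate ≤ (MSS / RTT) × C / √p, with C ≈ √(3/2). Under that model the bound scales as 1/√p, so a hundredfold increase in loss rate costs a factor of ten. The sketch below is only an illustration: the 1 ms round-trip time is an assumption, and the model presumes loss recovery via fast retransmit; a stack that instead waits out retransmission timeouts can do far worse than the bound suggests.

```python
import math


def mathis_bound_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Mathis et al. steady-state TCP throughput bound, in bits per second."""
    c = math.sqrt(3 / 2)  # constant for periodic loss with an ACK per segment
    return mss_bytes * 8 / rtt_s * c / math.sqrt(loss_rate)


# Illustrative inputs only: 1,460-byte segments and an assumed 1 ms LAN round trip.
for p in (1e-5, 1e-3, 1e-1):
    print(f"loss {p:g}: {mathis_bound_bps(1460, 0.001, p) / 1e6:,.0f} Mbit/s")
```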
Final Numbers
To summarize what I did or didn’t do with the three different DOS clients:
- IBM DLS 4.0: gave up
- IBM DLS 5.0: sped up reads by switching from protected-mode to full redirector
- MS Client 3.11: vastly sped up writes by switching from full to basic redirector
Omitting the DLS 4.0 client, here are the updated numbers for the same 42 MB file copy:
| Client + TCP/IP stack | Server to DOS client | DOS client to server |
|---|---|---|
| MS Net 3.11 + MS TCP/IP 1.0a | 3.9s | 3.8s |
| IBM DLS 5.0 + NTS TCP/IP 3.18 | 3.5s | 3.9s |
The DLS 5.0 performance went up at the cost of higher conventional memory usage. To be fair, for normal use even the protected-mode redirector is very usable, but for moving a lot of data to the DOS client side, the full redirector is roughly 5 times faster.
The Microsoft Network Client 3.11 had trouble in the opposite direction: with the full redirector, performance when moving data from the DOS client to a file server was just atrocious. The basic redirector has fewer features, but most users are unlikely to notice, and the performance differential is huge.
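The speedups can be checked against the before and after tables with a little arithmetic. This sketch treats the "circa 42 MB" file as exactly 42 MB and uses only the times reported in the two tables for the two fixed directions.

```python
FILE_MB = 42  # "circa 42 MB"; treated as exactly 42 here

# Copy times in seconds, from the first (default) and second (tuned) tables.
before = {
    ("MS Net 3.11", "client to server"): 48 * 60 + 43,
    ("DLS 5.0", "server to client"): 19.0,
}
after = {
    ("MS Net 3.11", "client to server"): 3.8,
    ("DLS 5.0", "server to client"): 3.5,
}

for (client, direction), old in before.items():
    new = after[(client, direction)]
    print(f"{client} {direction}: {FILE_MB / old:.3f} -> {FILE_MB / new:.1f} MB/s, "
          f"speedup x{old / new:.0f}")
```

The DLS 5.0 read speedup works out to about 5.4, in line with "roughly 5 times faster", while the MS client write speedup is in the hundreds.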
After switching redirector types, the IBM and Microsoft clients perform almost identically, despite their very different underlying TCP/IP stacks.
Yes, networking is complicated.
from Hacker News https://ift.tt/3u6YfmO