Saturday, April 24, 2021

Common interface for Linux NIC statistics

In Linux 5.13, ethtool gains an interface for querying IEEE and IETF statistics. This removes the need to parse vendor-specific strings in ethtool -S output.

Status quo

Linux has two sources of NIC statistics: the common interface stats (which show up in ifconfig, ip link, sysfs and a few other places) and ethtool -S. The former, the common interface stats, are a mix of basic info (packets, bytes, drops and errors in each direction) and a handful of lower-level stats like CRC errors, framing errors, collisions or FIFO errors. In modern NICs many of these statistics have become either irrelevant (collisions) or semantically unclear (FIFO errors).

This is why deployments increasingly depend on ethtool -S statistics for error tracking. ethtool -S is a free-form list of stats provided by the driver. It started out as a place for drivers to report custom, implementation-specific stats, but it also ended up serving as the place to report new statistics as networking standards developed.

Sadly, there is no commonality in how vendors name their ethtool statistics. The spelling and abbreviation of IEEE stats invariably differ between vendors, and sometimes the chosen names do not resemble the standard names at all (reportedly because vendors consider those names “too confusing” for users). This forces infrastructure teams to maintain translations and custom per-vendor logic to scrape ethtool -S output.

What changed

Starting with Linux 5.6, Michal Kubecek has been progressively porting ethtool from ioctls to a more structured and extensible netlink interface. Thanks to that, we can now augment the old commands to carry statistics. When the user specifies -I | --include-statistics on the command line (or sets the appropriate flag in netlink), the kernel will include relevant statistics in its response, e.g. for flow control:

 # ethtool -I -a eth0
 Pause parameters for eth0:
 Autonegotiate:    off
 RX:        off
 TX:        on
 Statistics:
   tx_pause_frames: 25545561
   rx_pause_frames: 0
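For scripts that still consume the text output, the counters sit in an indented block under the "Statistics:" heading. A minimal Python sketch, assuming the layout shown above (a "Statistics:" line followed by indented "name: value" lines) holds for your ethtool version; the function name is hypothetical:

```python
import re
from typing import Dict

# Matches indented "name: value" counter lines such as "  tx_pause_frames: 25545561".
_COUNTER = re.compile(r"^\s+(\w+):\s+(\d+)\s*$")


def parse_pause_statistics(output: str) -> Dict[str, int]:
    """Return the counters listed under the "Statistics:" heading of `ethtool -I -a`."""
    stats: Dict[str, int] = {}
    in_stats = False
    for line in output.splitlines():
        if line.strip() == "Statistics:":
            in_stats = True  # counters start on the next line
            continue
        if in_stats:
            match = _COUNTER.match(line)
            if match is None:
                break  # the first non-counter line ends the section
            stats[match.group(1)] = int(match.group(2))
    return stats
```

Lines before the heading (Autonegotiate, RX, TX) are ignored, so only the statistics section is returned.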

General statistics, such as PHY and MAC counters, are now available via ethtool -S under standards-based names through a new --groups switch, e.g.:

 # ethtool -S eth0 --groups eth-mac
 Standard stats for eth0:
 eth-mac-FramesTransmittedOK: 902623288966
 eth-mac-FramesReceivedOK: 28727667047
 eth-mac-FrameCheckSequenceErrors: 1
 eth-mac-AlignmentErrors: 0
 eth-mac-OutOfRangeLengthField: 0
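Because every line is "<group>-<StandardName>: <value>", one parser covers all vendors. A minimal Python sketch that groups the counters by prefix, assuming (as in the sample above) that the counter names themselves never contain a hyphen, so the last "-" separates group from name; the function name is hypothetical:

```python
from typing import Dict


def parse_standard_stats(output: str) -> Dict[str, Dict[str, int]]:
    """Group "<group>-<Name>: <value>" lines from `ethtool -S --groups` by group."""
    groups: Dict[str, Dict[str, int]] = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(": ")
        if not sep or not value.strip().isdigit():
            continue  # skips the "Standard stats for eth0:" header and anything else
        # Assumption: names are CamelCase without hyphens, so split on the last "-".
        group, dash, name = key.rpartition("-")
        if not dash:
            continue
        groups.setdefault(group, {})[name] = int(value)
    return groups
```

For the output above this yields a single "eth-mac" group whose keys match the IEEE attribute names minus the leading "a".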

Each of the commands supports JSON-formatted output for ease of parsing (--json).
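A minimal Python sketch of consuming the JSON output, assuming ethtool is on PATH and that --json is passed as a global option before the sub-command. It deliberately decodes the output without assuming a particular JSON layout, which is best checked against the installed ethtool version. The run parameter is a hypothetical hook so the decoding can be exercised without a NIC:

```python
import json
import subprocess
from typing import Any, Callable, List, Optional


def _run_ethtool(argv: List[str]) -> str:
    # Runs the command and returns stdout; raises CalledProcessError on failure.
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def ethtool_json(args: List[str], run: Optional[Callable[[List[str]], str]] = None) -> Any:
    """Run `ethtool --json <args>` and decode its output without assuming a schema."""
    argv = ["ethtool", "--json", *args]  # assumption: --json goes before the sub-command
    output = (run or _run_ethtool)(argv)
    return json.loads(output)
```

On a real host this would be called as ethtool_json(["-S", "eth0", "--groups", "eth-mac"]).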

So little, so late

Admittedly, the new interface is quite basic. It mostly includes statistics defined in IEEE or IETF standards, while NICs may report more interesting data. There is also no metadata about the “freshness” of the stats, and no filtering built into the interface.

The starting point was chosen to fulfill immediate needs, and we hope the interfaces will be extended as needed. Statistics can be made arbitrarily complex, so after a couple of false starts with complex interfaces we decided to let the use cases drive the interface.

It’s also very useful to lean on the standards for a clear definition of the semantics. Going forward, we can work with vendors on codifying the definitions of the other counters they have.

List of currently supported stats

IEEE 802.3 attributes:

 30.3.2.1.5 aSymbolErrorDuringCarrier
 30.3.1.1.2 aFramesTransmittedOK
 30.3.1.1.3 aSingleCollisionFrames
 30.3.1.1.4 aMultipleCollisionFrames
 30.3.1.1.5 aFramesReceivedOK
 30.3.1.1.6 aFrameCheckSequenceErrors
 30.3.1.1.7 aAlignmentErrors
 30.3.1.1.8 aOctetsTransmittedOK
 30.3.1.1.9 aFramesWithDeferredXmissions
 30.3.1.1.10 aLateCollisions
 30.3.1.1.11 aFramesAbortedDueToXSColls
 30.3.1.1.12 aFramesLostDueToIntMACXmitError
 30.3.1.1.13 aCarrierSenseErrors
 30.3.1.1.14 aOctetsReceivedOK
 30.3.1.1.15 aFramesLostDueToIntMACRcvError
 
 30.3.1.1.18 aMulticastFramesXmittedOK
 30.3.1.1.19 aBroadcastFramesXmittedOK
 30.3.1.1.20 aFramesWithExcessiveDeferral
 30.3.1.1.21 aMulticastFramesReceivedOK
 30.3.1.1.22 aBroadcastFramesReceivedOK
 30.3.1.1.23 aInRangeLengthErrors
 30.3.1.1.24 aOutOfRangeLengthField
 30.3.1.1.25 aFrameTooLongErrors

 30.3.3.3 aMACControlFramesTransmitted
 30.3.3.4 aMACControlFramesReceived
 30.3.3.5 aUnsupportedOpcodesReceived
 
 30.3.4.2 aPAUSEMACCtrlFramesTransmitted
 30.3.4.3 aPAUSEMACCtrlFramesReceived

 30.5.1.1.17 aFECCorrectedBlocks
 30.5.1.1.18 aFECUncorrectableBlocks
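Judging by the sample output earlier, the ethtool names are the IEEE attribute names with the leading "a" dropped and a group prefix added. A minimal Python sketch that maps a reported name back to its 802.3 clause, assuming that pattern holds generally and assuming the group prefixes eth-phy, eth-mac and eth-ctrl (only eth-mac appears in the sample). The table is a subset of the list above and the names are hypothetical:

```python
from typing import Optional, Tuple

# Subset of the IEEE 802.3 clause numbers listed above, keyed by attribute name.
IEEE_8023_CLAUSES = {
    "aSymbolErrorDuringCarrier": "30.3.2.1.5",
    "aFramesTransmittedOK": "30.3.1.1.2",
    "aFramesReceivedOK": "30.3.1.1.5",
    "aFrameCheckSequenceErrors": "30.3.1.1.6",
    "aAlignmentErrors": "30.3.1.1.7",
    "aOutOfRangeLengthField": "30.3.1.1.24",
    "aFrameTooLongErrors": "30.3.1.1.25",
    "aMACControlFramesTransmitted": "30.3.3.3",
    "aMACControlFramesReceived": "30.3.3.4",
    "aUnsupportedOpcodesReceived": "30.3.3.5",
}

# Assumed ethtool group prefixes; only "eth-mac-" is confirmed by the sample output.
GROUP_PREFIXES = ("eth-phy-", "eth-mac-", "eth-ctrl-")


def ieee_attribute(ethtool_name: str) -> Optional[Tuple[str, str]]:
    """Map e.g. "eth-mac-AlignmentErrors" to ("aAlignmentErrors", "30.3.1.1.7")."""
    for prefix in GROUP_PREFIXES:
        if ethtool_name.startswith(prefix):
            attribute = "a" + ethtool_name[len(prefix):]
            clause = IEEE_8023_CLAUSES.get(attribute)
            return (attribute, clause) if clause else None
    return None  # not a standard group counter, e.g. a driver-specific -S stat
```

Names that carry no known prefix, such as driver-specific ethtool -S strings, fall through to None.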

IETF RMON (RFC 2819):

 etherStatsUndersizePkts
 etherStatsOversizePkts
 etherStatsFragments
 etherStatsJabbers

 etherStatsPkts64Octets
 etherStatsPkts65to127Octets
 etherStatsPkts128to255Octets
 etherStatsPkts256to511Octets
 etherStatsPkts512to1023Octets
 etherStatsPkts1024to1518Octets
 (incl. further stats for jumbo MTUs)
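The RMON size buckets encode their octet range in the RFC 2819 name itself. A minimal Python sketch that recovers each bucket's range and its share of packets, assuming the counters have already been collected into a dict keyed by the RFC 2819 names above (the labels a given ethtool version prints may differ). A jumbo bucket following the same naming pattern, e.g. a hypothetical etherStatsPkts1519to9216Octets, would be picked up as well:

```python
import re
from typing import Dict, List, Tuple

# RFC 2819 bucket names encode their size range, e.g. "etherStatsPkts65to127Octets".
_BUCKET = re.compile(r"^etherStatsPkts(\d+)(?:to(\d+))?Octets$")


def histogram_shares(counters: Dict[str, int]) -> List[Tuple[int, int, float]]:
    """Return (low, high, share of packets) for each size bucket, sorted by size."""
    buckets = []
    for name, count in counters.items():
        match = _BUCKET.match(name)
        if match is None:
            continue  # not a size bucket, e.g. etherStatsJabbers
        low = int(match.group(1))
        high = int(match.group(2) or low)  # "Pkts64Octets" covers exactly 64 octets
        buckets.append((low, high, count))
    buckets.sort()
    total = sum(count for _, _, count in buckets)
    return [(low, high, count / total if total else 0.0) for low, high, count in buckets]
```

Error counters such as etherStatsJabbers are skipped, so the shares always sum to one across the size buckets (or are all zero when no packets were counted).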

Kernel-side changes: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=8203c7ce4ef2840929d38b447b4ccd384727f92b
