Some technical notes from IETF 108 by Kannan Jayaraman
The IETF 108 meetings took place from July 27 to 31. There were some very interesting presentations in various working groups. This is an attempt to document the highlights of the proceedings of some of the working group meetings, along with some observations on key ongoing work in these WGs. It does not cover all WGs, only those that the author attended. It is also only an overview of some of the ongoing work based on the author's understanding; for more details please see the respective IETF WG pages.
I enjoyed my first virtual IETF event. Every day had interesting presentations, and we could quickly switch from one meeting to another if we wanted to. Attendance was good and the timings also worked well for us in India. The downside was not being able to meet people in person. I also did not try the gather.town tool, which would have offered a chance to meet some folks. One observation: I felt we did not have enough attendees from India; hopefully we will have more participation next time. Attending IETF meetings and keeping abreast of working group discussions is a great way to track developments in our respective technology domains. In this context, I felt I learnt quite a few things from the IETF 108 meetings.
I look forward to the next meeting and hopefully will be able to attend IETF 110 in person next year!
1. Segment Routing (Spring WG)
This was one of the first meetings, and there were presentations of drafts on the SR policy architecture, service programming, and Seamless SR. In addition, there were a couple of interesting topics:
a) Formation of a design team to review the approaches to SRv6 header compression. There are two competing approaches: SRv6, led by Cisco, and SRm6, led by Juniper. The focus is on compressing the SRH to keep its size compact, since the header can become large when there are many segments, as each segment is 128 bits.
Two key drafts have been proposed to handle this:
- Compressed SRv6 Segment List Encoding in SRH
- Compact routing header (CRH).
The design team is to produce recommendations to the WG on two topics:
- What are the requirements for solutions to compressing segment routing information for use over IPv6?
- A comparison of the proposed approaches to compressing segment routing information for use over IPv6.
It will be interesting to follow this work and see which way it is headed. A draft on the above will be published by the design team soon.
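To see why compression matters, here is a back-of-the-envelope sketch of how the SRH grows with the segment list. The 8-byte fixed SRH header and 16-byte SIDs come from RFC 8754; the compressed case assumes hypothetical 32-bit compressed SIDs packed four per 128-bit container, which is only illustrative of the idea, not the exact encoding in either draft.

```python
# SRH size comparison: uncompressed 128-bit SIDs vs. hypothetical 32-bit
# compressed SIDs packed into 128-bit containers (illustrative only).

SRH_FIXED = 8        # bytes: fixed SRH fields (Next Header, Hdr Ext Len, ...)
SID_BYTES = 16       # each SRv6 SID is 128 bits
CSID_BYTES = 4       # assumed 32-bit compressed SID
CONTAINER = 16       # a 128-bit container holds 4 such compressed SIDs

def srh_size(num_segments: int) -> int:
    """Uncompressed SRH size in bytes."""
    return SRH_FIXED + num_segments * SID_BYTES

def compressed_srh_size(num_segments: int) -> int:
    """Compressed SRH size, packing compressed SIDs into containers."""
    per_container = CONTAINER // CSID_BYTES
    containers = -(-num_segments // per_container)   # ceiling division
    return SRH_FIXED + containers * CONTAINER

for n in (4, 10, 16):
    print(n, srh_size(n), compressed_srh_size(n))
```

For a 10-segment path the uncompressed SRH is already 168 bytes, while the compressed sketch needs only 56, which is the motivation behind both proposals.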
b) There was a proposal to use TWAMP / STAMP protocols for SR OAM.
The IETF Two-Way Active Measurement Protocol (TWAMP) defines a standard for measuring round-trip network performance between any two devices that support the TWAMP protocols.
The Simple Two-way Active Measurement Protocol (STAMP) enables the measurement of both one-way and round-trip performance metrics like delay, delay variation, and packet loss.
However, this was debated intensely, since neither TWAMP nor STAMP has a mechanism for liveness detection. It was proposed that this should be discussed with the IPPM WG and then taken forward.
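For a sense of how lightweight STAMP is, here is a sketch of building an unauthenticated STAMP Session-Sender test packet as described in RFC 8762: a 4-byte sequence number, an 8-byte NTP-format timestamp, a 2-byte error estimate, and 30 must-be-zero bytes, for a 44-byte base packet. The error-estimate value used here is an assumption for illustration.

```python
import struct
import time

def stamp_sender_packet(seq: int) -> bytes:
    """Build an unauthenticated STAMP Session-Sender test packet:
    Sequence Number (4B), Timestamp (8B, NTP format), Error Estimate (2B),
    then 30 zero (MBZ) bytes -- 44 bytes in total."""
    now = time.time()
    ntp_secs = int(now) + 2208988800            # Unix-to-NTP epoch offset
    ntp_frac = int((now % 1) * (1 << 32))       # fractional seconds
    error_estimate = 0x8001                     # illustrative value (S bit set)
    header = struct.pack("!IIIH", seq, ntp_secs, ntp_frac, error_estimate)
    return header + b"\x00" * 30

pkt = stamp_sender_packet(1)
```

The reflector echoes these fields back along with its own receive/transmit timestamps, which is what allows both one-way and round-trip measurement.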
2. IRTF - Measuring IP liveness
There was an interesting talk, presented by a researcher from Facebook, on Internet host scanning: measuring which hosts on the Internet are "live", e.g. does a target IP address respond to a probe packet? This has several applications, such as:
• Address space utilization
• Host reachability
• Topology discovery
• Service availability
How can you determine whether or not an IP address is in use?
A simple approach would be to ping every IP to check if there is a response.
However, this will not work well, as many systems do not respond to ICMP requests.
So the key questions are:
- What type of probe packets should we send if we want to maximize the responding host population?
- What type of responses can we expect, and which factors determine such responses?
- What degree of consistency can we expect when probing the same host with different probe packets?
According to this research, ICMP Echo probes are most effective at discovering network-active IPs (79% of all IPs), followed by TCP probes. 16% of all IPs can be discovered only via TCP, and 2% only via UDP probes. Only 24% of active hosts respond to probe packets on all five ports (potentially due to firewalling and/or filtering).
The results also depended on where the probes originated.
The presentation begins by building a taxonomy of liveness, describing the process used to determine whether an address is in use. The methodology systematically infers IP liveness by performing Internet-wide scans concurrently across a set of different protocols at various layers (ICMP, TCP, UDP).
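A minimal sketch of one such probe, not the researchers' actual tooling: a TCP probe using only the standard library. Note that a connection refused (RST) also proves the address is live even though the port is closed, which is one reason TCP probes discover hosts that ignore ICMP.

```python
import socket

def tcp_probe(ip: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if the target address shows signs of life on this port:
    either the TCP handshake completes, or we get an RST (refused)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except ConnectionRefusedError:
        return True        # RST received: port closed, but the host is live
    except OSError:
        return False       # timeout, unreachable, filtered, etc.

def probe_host(ip: str, ports=(80, 443, 22)) -> dict:
    """Probe several ports; a host responding on any port counts as live."""
    return {port: tcp_probe(ip, port) for port in ports}
```

A real liveness study would additionally send ICMP Echo and UDP probes (which need raw sockets or protocol-specific payloads) and compare the responding populations, as the talk described.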
The RIPE Atlas project also does some work on Internet measurement: the RIPE NCC is building the largest Internet measurement network ever made. RIPE Atlas employs a global network of probes that measure Internet connectivity and reachability, providing an unprecedented understanding of the state of the Internet in real time.
It will be interesting to see how the two approaches compare with each other.
3. RATS (Remote Attestation Procedures)
Summary of the work :
Remote attestation is a method by which a client authenticates its hardware and software configuration to a remote system (server). The goal of remote attestation is to enable a remote system (the challenger) to determine the level of trust in the integrity of the platform of another system (the attester).
Remote device attestation is a fundamental service that allows a remote device such as a mobile phone, an Internet-of-Things (IoT) device, or other endpoint to prove itself to a relying party, a server or a service. This allows the relying party to know some characteristics about the device and decide whether it trusts the device.
One way this is achieved is via the Entity Attestation Token (EAT). An Entity Attestation Token (EAT) provides a signed (attested) set of claims that describe state and characteristics of an entity, typically a device like a phone or an IoT device. These claims are used by a relying party to determine how much it wishes to trust the entity.
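To make the EAT idea concrete, here is a hypothetical, simplified claim set in JSON form. Real EATs are typically CBOR-encoded CWTs signed with the device's attestation key (COSE/JOSE), and the claim names below are illustrative, not normative.

```python
import json

# Hypothetical EAT-style claim set (illustrative claim names, not normative).
claims = {
    "ueid": "01deadbeef",         # device identifier (Universal Entity ID)
    "nonce": "h3x-n0nce",         # freshness value supplied by the verifier
    "secure-boot": True,          # device booted verified firmware
    "debug-status": "disabled",   # debug ports locked down
    "security-level": "hardware", # claims rooted in a hardware key
}

# In a real deployment this body would be signed (e.g. COSE) so the relying
# party can verify it came from the device's attestation key.
token_body = json.dumps(claims)
```

The relying party inspects such claims (after verifying the signature and the nonce) to decide how much it trusts the device.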
Trusted path routing
One of the proposals in this WG is trusted path routing. Below is an overview as described in the draft:
There are end-users who believe encryption technologies like IPsec alone are insufficient to protect the confidentiality of their highly sensitive traffic flows. This specification describes two alternatives for protecting these sensitive flows as they transit a network. In both alternatives, protection is accomplished by forwarding sensitive flows across network devices currently appraised as trustworthy.
The key aspects of this are:
- Trusted Topology dynamically maintained based on Attestation Results
- Sensitive flows bypass insecure / potentially compromised network devices
- Link adjacency added to Trusted Topology based on latest appraisal
- Existing routing protocol distribution
We are told there is currently a prototype demonstrating trusted path routing using SR Flex-Algo with IS-IS. It will be interesting to see what (if any) routing protocol enhancements may be needed to enable this.
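The trusted-topology idea can be sketched in a few lines. This is my own illustration, not code from the draft: keep only the links whose endpoint devices passed their latest attestation appraisal, then route sensitive flows over that sub-topology.

```python
from collections import deque

# Full topology as an adjacency list; device "B" failed its appraisal.
links = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
appraisal = {"A": True, "B": False, "C": True, "D": True}

def trusted_topology(links, appraisal):
    """Drop every node (and links to it) that failed attestation."""
    return {n: [m for m in nbrs if appraisal[m]]
            for n, nbrs in links.items() if appraisal[n]}

def shortest_path(topo, src, dst):
    """Plain BFS over the trusted sub-topology."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nbr in topo.get(path[-1], []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(path + [nbr])
    return None

print(shortest_path(trusted_topology(links, appraisal), "A", "D"))
```

The sensitive flow from A to D is steered via C, bypassing the untrusted device B; in the actual proposal this steering would be realized through existing routing machinery such as an SR Flex-Algo constraint rather than a separate BFS.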
4. DC routing (RIFT, LSVR)
RIFT is one of the approaches to building a DC fabric. It is a hybrid of link-state and distance-vector protocols and provides several benefits for IP fabrics, such as ease of management. There was an interesting discussion in the RIFT WG on positive and negative route advertisement and why the latter is needed in RIFT. One of the challenges with CLOS topologies is that route aggregation can cause traffic to black-hole when there are failures.
Automatic disaggregation is one of the most interesting innovations in the RIFT protocol. In most existing protocols (BGP, OSPF, ISIS) disaggregation requires extensive manual configuration. In RIFT, by contrast, disaggregation is fully automatic without the need for any configuration. In most existing protocols, disaggregation is an optional and manual feature that is mainly used for traffic engineering. RIFT, on the other hand, relies on disaggregation as an essential feature to recover from failures. In the absence of a failure, RIFT routers only have a default route for the north-bound direction. When a failure occurs, RIFT automatically triggers necessary disaggregation to install more specific north-bound routes to route traffic around the failure.
There are actually two flavors of disaggregation in RIFT:
Positive disaggregation is used to deal with most types of failures. It works by the repair path “attracting” traffic from the broken path by advertising more specific routes. Positive disaggregation in RIFT works very similar to how disaggregation works in existing protocols, except that it is triggered automatically instead of configured manually.
Negative disaggregation is used to deal with a very particular type of failure that only occurs in large data centers built as so-called multi-plane fat trees. It works by the broken path "repelling" traffic towards the repair path by advertising so-called negative routes. Negative disaggregation uses completely new mechanisms that do not have an equivalent in any widely deployed existing protocol.
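The two flavors can be illustrated with a toy route-resolution sketch (my simplification, not the RIFT specification's exact rules): a positive route adds next-hops, while a negative route for a more-specific prefix removes next-hops inherited from the covering route.

```python
import ipaddress

# Toy RIFT-style route table: (prefix, positive next-hops, negative next-hops).
routes = [
    ("0.0.0.0/0",   {"spine1", "spine2"}, set()),      # default route north
    ("10.1.0.0/16", set(),               {"spine2"}),  # spine2 lost 10.1/16
]

def resolve(dest: str) -> set:
    """Walk matching routes from least to most specific, applying positive
    next-hops first and then subtracting negative ones."""
    dest_ip = ipaddress.ip_address(dest)
    nexthops = set()
    ordered = sorted(routes, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)
    for prefix, pos, neg in ordered:
        if dest_ip in ipaddress.ip_network(prefix):
            nexthops |= pos
            nexthops -= neg
    return nexthops

print(resolve("10.1.2.3"))   # traffic to 10.1/16 is repelled from spine2
print(resolve("10.2.0.1"))   # other traffic still balances over both spines
```

Traffic to 10.1.2.3 resolves to spine1 only, while everything else keeps using both spines: the negative route steers traffic away from the broken path without disaggregating the whole default route.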
This was presented by Bruno Rijsman and here is a good description of how RIFT works: https://hikingandcoding.wordpress.com/2020/07/22/rift-disaggregation/
In Massive Data Centers, BGP-SPF and similar routing protocols are used to build topology and reachability databases. These protocols need to discover IP Layer 3 attributes of links, such as logical link IP encapsulation abilities, IP neighbor address discovery, and link liveness. This Layer 3 Discovery and Liveness protocol collects these data, which may then be disseminated using BGP-SPF and similar protocols.
Here is the draft: https://tools.ietf.org/html/draft-ietf-lsvr-l3dl-06
There is also work to set up BGP sessions automatically in large-scale networks such as cloud data centers and enterprise networks, where manual configuration would be cumbersome. One approach is to use LLDP for neighbor discovery. More details can be found in the IDR working group.
5. Transport WG (L4S) - Rethinking congestion control for the Internet
There was a set of interesting presentations on L4S which seeks to work on new ways of controlling queuing delay in the network.
Past work on scalable congestion control includes proposals such as Data Center TCP (DCTCP). This is, however, implemented only in controlled environments such as data centers, not in the public Internet.
As the DCTCP draft explains…
Network traffic in a data center is often a mix of short and long flows (aka mice and elephant flows), where the short flows require low latencies and the long flows require high throughputs. Data centers also experience incast bursts, where many servers send traffic to a single server at the same time.
These factors place some conflicting demands on the queue occupancy of a switch:
- The queue must be short enough that it does not impose excessive latency on short flows.
- The queue must be long enough to buffer sufficient data for the long flows to saturate the path capacity.
- The queue must be long enough to absorb incast bursts without excessive packet loss.
Standard TCP congestion control relies on packet loss to detect congestion. This does not meet the demands described above. First, short flows will start to experience unacceptable latencies before packet loss occurs. Second, by the time TCP congestion control kicks in on the senders, most of the incast burst has already been dropped.
RFC 3168 describes a mechanism for using Explicit Congestion Notification (ECN) from the switches for detection of congestion. However, this method only detects the presence of congestion, not its extent. In the presence of mild congestion, the TCP congestion window is reduced too aggressively, which unnecessarily reduces the throughput of long flows.
Data Center TCP changes traditional ECN processing by estimating the fraction of bytes that encounter congestion rather than simply detecting that some congestion has occurred. DCTCP then scales the TCP congestion window based on this estimate. This method achieves high burst tolerance, low latency, and high throughput with shallow-buffered switches. DCTCP is a modification to the processing of ECN by a conventional TCP and requires that standard TCP congestion control be used for handling packet loss.
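The scaling described above can be sketched in a few lines, following the description in RFC 8257 (the gain value and variable names are from my reading of the RFC):

```python
# Sketch of DCTCP's window scaling: the sender keeps a running estimate
# 'alpha' of the fraction of bytes marked with ECN CE, and cuts the
# congestion window in proportion to it instead of halving.

G = 1 / 16  # estimation gain suggested in RFC 8257

def update_alpha(alpha: float, marked_bytes: int, total_bytes: int) -> float:
    """EWMA of the fraction of bytes marked CE over the last window of data."""
    frac = marked_bytes / total_bytes if total_bytes else 0.0
    return (1 - G) * alpha + G * frac

def scale_cwnd(cwnd: float, alpha: float) -> float:
    """Mild congestion (small alpha) gives a small reduction; alpha == 1
    behaves like classic TCP's halving."""
    return cwnd * (1 - alpha / 2)
```

For example, with alpha = 0.1 a 100-segment window shrinks to 95 segments rather than 50, which is exactly how DCTCP keeps long flows near full throughput under mild congestion.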
The root of the problem is the presence of standard TCP congestion control. It has been demonstrated that if the sending host replaces a classic congestion control with a 'scalable' alternative, then, when a suitable AQM is deployed in the network, the performance under load of all the above interactive applications can be significantly improved.
The L4S architecture enables Internet applications to achieve Low Latency, Low Loss, and Scalable throughput (L4S). The insight on which L4S is based is that the root cause of queuing delay is in the congestion controllers of senders, not in the queue itself. The architecture is intended to enable all Internet applications to transition away from congestion control algorithms that cause queuing delay to a new class of congestion controls that utilize explicit congestion signaling provided by the network. This new class of congestion control can provide low latency for capacity-seeking flows, so applications can achieve both high bandwidth and low latency.
6. LOOPS BoF (Local Optimizations on Path Segments)
This is yet to be chartered: it started as a BoF, is now progressing, and there are a few drafts. The proposal aims to enhance the network so it can recover from packet loss itself instead of leaving recovery to the end hosts, i.e., reduce end-to-end packet loss by recovering locally where needed.
Here is a brief overview of its goals :
Traditional transport protocols (e.g., TCP) respond to packet loss slowly, especially in long-haul networks: they either wait for some signal from the receiver to indicate a loss and then retransmit from the sender, or rely on the sender's timeout, which is often quite long.
LOOPS (Local Optimizations on Path Segments) is a network-assisted performance enhancement over a path segment that aims to provide local in-network recovery, making packet loss recovery faster and data delivery better. LOOPS can be used with tunneling protocols to efficiently recover lost packets on a single segment of an end-to-end path, instead of leaving recovery to the end-to-end protocol traversing the entire path.
In a nutshell, within a path segment this works by the LOOPS ingress and egress nodes using signaling to detect packet loss, such as by acknowledging packet reception. The ingress retransmits when there is no ack from the egress node.
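The ingress-side behavior can be sketched as follows. This is my own toy illustration of the idea; the actual signaling and encapsulation are defined in the LOOPS drafts.

```python
# Toy sketch of LOOPS-style local recovery on one path segment: the segment
# ingress buffers packets it forwards, the egress acknowledges what it
# received, and unacknowledged packets are retransmitted locally rather
# than end-to-end.

class SegmentIngress:
    def __init__(self):
        self.unacked = {}          # seq -> packet, awaiting an egress ack

    def forward(self, seq, packet, send):
        """Forward a packet into the segment and remember it until acked."""
        self.unacked[seq] = packet
        send(seq, packet)

    def on_ack(self, seq):
        """Egress confirmed reception; drop our local copy."""
        self.unacked.pop(seq, None)

    def retransmit_timeouts(self, send):
        """On timeout, locally resend everything still unacknowledged."""
        for seq, packet in list(self.unacked.items()):
            send(seq, packet)
```

The end hosts never see the segment-local loss; that is precisely both the benefit (fast recovery over one short RTT) and, as noted below, the source of the congestion-signal concern.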
There are some challenges, however, since this disrupts the current way the Internet works:
- Concealing losses removes an important congestion signal that today's transport protocols rely on.
- End hosts would ramp up to higher rates, increasing congestion.
The work is still at an early stage, and there are a couple of drafts on the problem statement and use cases.
More details and slides for all sessions can be found at https://datatracker.ietf.org/meeting/materials/
About the Author: Kannan Jayaraman has been working in the networking industry for the last 25 years as an engineering leader. His interests are primarily IP routing and switching technologies, where he has spent much of his career. He keeps abreast of industry trends and new technologies (by reading various blogs) and also follows standards bodies, especially the IETF, where he tracks working groups such as SPRING, IDR, and LSR.