If you have a WAN, one very important concern should be latency. Latency, in this case, is the time a package of information takes to reach the other end of the slow link. This package could be a DNS query, a ping, a file, or a transaction in a client/server application. Note that the package has not fully arrived at its destination until all of its bits have arrived. It is tempting to assume that the only variable here is bandwidth: the fatter the pipe, the quicker the package arrives at its destination. For a large file, latency doesn’t play much of a part; for small packages like a DNS query, a ping, or a transaction, however, latency can kill your performance. It is very important to distinguish between bandwidth and latency.
Bandwidth is the capacity of the link to transfer a quantity of information in a given amount of time. A 128k link can transfer roughly 13 KB of information per second (this can vary with compression and other factors), so a 13 MB file would take about 1,000 seconds to cross the link. That figure doesn’t vary much between different kinds of slow links: Frame Relay, PPP, ISDN, and so on. Things get interesting, though, when you consider that many packages of information are relatively small. Different slow links have different latency characteristics, and as a slow link gets loaded down with many kinds of network traffic, latency can climb quickly.
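To see why the small packages matter, run the numbers both ways. A back-of-the-envelope sketch, using the 13 KB per second figure from above and assuming a 2.5-second round trip like the one in the ping trace below:

  13 MB file:      13,000 KB / 13 KB/s = 1,000 s of transfer + ~2.5 s of latency   (latency is noise)
  512-byte query:  0.5 KB / 13 KB/s    = ~0.04 s of transfer + ~2.5 s of latency   (latency is ~98% of the wait)

The big file is dominated by bandwidth; the little query is dominated almost entirely by latency.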
Latency is best seen with a simple ping test: ping -t -w 5000 <host>. This syntax is specific to Microsoft’s ping; it instructs ping to run continuously and to wait up to 5,000 ms for each reply before timing out.
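The sample output below happens to be in Unix format; on a Linux or BSD box, the rough equivalent (the host address is just a placeholder) is:

  ping -s 256 204.146.18.33

Unix ping runs continuously by default, and -s sets the payload size, 256 bytes here to match the sort of small package we are discussing.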
Here is a good example of a latent link:
64 bytes from 204.146.18.33: icmp_seq=53 ttl=243 time=2550.4 ms
64 bytes from 204.146.18.33: icmp_seq=54 ttl=243 time=2309.8 ms
64 bytes from 204.146.18.33: icmp_seq=55 ttl=243 time=2219.0 ms
64 bytes from 204.146.18.33: icmp_seq=56 ttl=243 time=3639.9 ms
64 bytes from 204.146.18.33: icmp_seq=57 ttl=243 time=3550.4 ms
64 bytes from 204.146.18.33: icmp_seq=58 ttl=243 time=3450.0 ms
64 bytes from 204.146.18.33: icmp_seq=59 ttl=243 time=3229.9 ms
64 bytes from 204.146.18.33: icmp_seq=60 ttl=243 time=3139.9 ms
64 bytes from 204.146.18.33: icmp_seq=61 ttl=243 time=3039.9 ms
64 bytes from 204.146.18.33: icmp_seq=62 ttl=243 time=2809.8 ms
64 bytes from 204.146.18.33: icmp_seq=63 ttl=243 time=3019.9 ms
64 bytes from 204.146.18.33: icmp_seq=64 ttl=243 time=3500.3 ms
64 bytes from 204.146.18.33: icmp_seq=65 ttl=243 time=4380.5 ms
64 bytes from 204.146.18.33: icmp_seq=66 ttl=243 time=4309.8 ms
64 bytes from 204.146.18.33: icmp_seq=67 ttl=243 time=3869.1 ms
In this example, if a DNS query were made by an application sharing the same slow link, and the DNS server were on the other side of that link, every single name-resolution request would take from two to four seconds. The same goes for client/server apps. A typical client/server app makes numerous individual requests for one screen of data; if each request takes four seconds, refreshing one screen could take a minute. Compound that with name-resolution and authentication latency over the slow link, and the network might well be unusable, despite the fact that the bandwidth is sufficient.
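You can measure the name-resolution piece directly. A quick sketch, assuming a Unix-style time command and a hypothetical host name:

  time nslookup appserver.example.com

If the lookup consistently takes multiple seconds whenever the DNS server is across the WAN, you have found a big part of your screen-refresh problem.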
There are several ways to combat latency: application design, network infrastructure, and traffic prioritizing. Application design is one of the more difficult battles. Minimizing the number of transactions a client/server app performs can help significantly; often, though, it is impractical for an IT shop to change the way a client/server app is written. Citrix and Terminal Server can help here, because the numerous transactions are performed across a fast link instead. ICA and RDP (the protocols Citrix and Terminal Server use) are relatively well behaved and hold up better across slow links that are prone to latency.
Network infrastructure design can also help with latency. One easy change to make is name resolution. Where on your network are name requests resolved? Where is your WINS server? Where is your DNS server? What is the primary means of name resolution for the applications you run? If your DNS server is on the other side of a slow link, it could cause some major delays for your applications. You might want to seriously consider a local DNS server configured as a secondary. The administration can still be done centrally, and the zone transfers don’t take much work to administer: just set up the servers once as secondaries, and they will merrily suck down the zones automatically. Put the local (secondary) DNS server in as the primary entry at each client, and put the central DNS server in as the secondary entry.
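Here is a minimal sketch of what that looks like in a BIND named.conf on the local secondary; the zone name and the central server’s address are placeholders:

  zone "corp.example.com" {
      type slave;                    // secondary: pulls the zone via zone transfer
      masters { 10.1.1.5; };         // the central (primary) DNS server
      file "sec/corp.example.com";   // local copy, refreshed automatically
  };

Set this up once and the local server answers queries at LAN speed while the zone data stays centrally managed.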
If you have a local WINS server that replicates with your central WINS server, you will not experience latency issues with NetBIOS name resolution. This is complicated with WINS, though, because WINS replication for large enterprises can easily puke. If you are tempted to implement a single central WINS server to alleviate replication and database-corruption issues, beware of latency! Another issue is how your clients authenticate. NT authentication across a slow, latent link can be painfully slow, and it is aggravated by a latent connection to the WINS database; consider a local BDC in addition to a local WINS server. One configuration we have speculated might work is using m-node instead of h-node. That way local clients use broadcasts first, but each client still registers with a central WINS server, so you can centrally manage all clients. The general wisdom (and the default!!) is h-node. With m-node it might be possible to dodge latency issues while also avoiding mondo WINS replication schemes and the associated database corruption. If anybody out there successfully implements m-node with a central WINS server on a large enterprise network, let us know!
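For the record, if you want to experiment with m-node, there are two common ways to set it on NT clients (a sketch; the values are standard, the rollout strategy is up to you): hand out DHCP option 46 (NetBIOS node type) with a value of 0x4, or set the registry value directly:

  HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NetBT\Parameters
      NodeType   REG_DWORD   0x4     (1 = b-node, 2 = p-node, 4 = m-node, 8 = h-node)

Either way, the client broadcasts first and only falls back to the WINS server when the broadcast fails.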
That leaves traffic prioritizing. Check out Packeteer and browse around their site. If you have Cisco equipment, there are several ways to prioritize traffic with Cisco products. Here is a nice RFC on the subject, RFC 2475: http://www.landfield.com/rfcs/rfc2475.html
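As one concrete illustration, here is a minimal sketch of old-style priority queuing in Cisco IOS, assuming a serial WAN interface; the idea is to push ICA traffic (TCP port 1494) to the head of the queue:

  ! classify ICA as high priority; unlisted traffic defaults to normal
  priority-list 1 protocol ip high tcp 1494
  !
  interface Serial0
   priority-group 1

Treat this as a starting point; whether priority, custom, or weighted fair queuing fits best depends on your traffic mix, so test against your own applications.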
Although bandwidth and latency are separate issues, using up bandwidth makes latency worse: packets queue up in the routers along the way. By prioritizing traffic, you can make sure that somebody downloading the latest Star Wars preview won’t bog down your corporate business users. Corporate policy is another way to prioritize traffic: tell your users to refrain from downloading those Star Wars previews on the company’s equipment. This is a tricky area; you might have better luck asking them to please, please download these kinds of items after hours. If you have multiple Exchange servers, be very conscious of exactly what happens when multiple emails with large attachments are transferred between MTAs. As the messages queue up, the MTA opens more connections and degrades the slow link even further.
One final note on latency. Check with your WAN service provider about what kind of service-level agreements they offer and what documentation they require before they will consider latency problems an issue. Ask them specifically: “If I buy this 128k link and use up three fourths of the bandwidth, what is the guaranteed turnaround time on a 256-byte ping?” Latency specifications vary quite a bit between WAN providers. And don’t believe them, even if you do get it in writing. Test the link with your applications before you deploy. See what it takes to kill the link: download IE 5 on three workstations and then try to use your application. Test, test, test. The wrong time to discover latency issues is after you are fully deployed. 🙂
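A simple torture-test recipe, as a sketch (server names are placeholders):

  window 1:     ping -t -w 5000 server.far.side       (watch the times)
  windows 2-4:  start the IE 5 download, or any large file transfer
  window 5:     run your client/server app and time a screen refresh

If the ping times climb into the thousands of milliseconds and the screen refresh crawls, you have just previewed what full deployment will feel like.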