Sunday, January 13, 2013

The Bastardization of 503

I've been noticing a trend with some SIP carriers, and it troubles me. They've begun to configure their gateways/SBCs to return the "503 Service Unavailable" response code at their discretion. On a whim, if you will. RFC 3261 describes the 503 this way: "The server is temporarily unable to process the request due to a temporary overloading or maintenance of the server." In my mind, we should take that literally: a 503 means there is an actual network problem. We measure the volume of such response codes so that we get fair warning when something is acting up. And it is through those measurements that we've come to know that the 503 has been bastardized.
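For the curious, here is a minimal sketch of the kind of measurement I mean. It is not our actual monitoring; the carrier names, the sample data, and the 5% threshold are all made up for illustration.

```python
from collections import Counter


def rate_503(status_codes):
    """Return the fraction of final responses that were 503."""
    if not status_codes:
        return 0.0
    counts = Counter(status_codes)
    return counts[503] / len(status_codes)


def carriers_over_threshold(responses_by_carrier, threshold=0.05):
    """Return, sorted by name, the carriers whose 503 share exceeds threshold.

    The 5% default is an arbitrary illustrative value, not a recommendation.
    """
    return sorted(
        carrier
        for carrier, codes in responses_by_carrier.items()
        if rate_503(codes) > threshold
    )


# Hypothetical sample: final SIP response codes seen per carrier.
sample = {
    "carrier_a": [200, 200, 486, 200, 200, 200, 200, 200, 200, 200],
    "carrier_b": [200, 503, 503, 200, 503, 200, 200, 480, 200, 503],
}

print(carriers_over_threshold(sample))
```

An alarm like this is only useful if a 503 really does mean trouble. Once a carrier sends 503s for full routes, the same counter fires on ordinary busy-hour traffic.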

After pursuing the topic with at least a couple of different carriers, we've found that they return the 503 when their routes (presumably their cheapest routes) are full. They send the 503 because, also according to the RFC, "a client (proxy or UAC) receiving a 503 (Service Unavailable) SHOULD attempt to forward the request to an alternate server." And it does. So, the offending carrier believes, no harm done. And technically they're right. The call attempts the next route and completes. No biggie, right?

Not so much.

In this scenario, [1] we've overflowed to a less preferred route (if the second route were preferred it would've been the first choice) and [2] the meaning of the 503 is completely bastardized, because it now means "Service Unavailable" but it also means "Go Away We Don't Want That Particular Call At Least Not Now." In my mind, the second is the bigger of the two problems.

So what should these carriers do? It would seem there is a shortcoming in SIP itself. If they were to return a "480 Temporarily Unavailable" or a "486 Busy Here" then the client would likely return a treatment (a busy tone or an announcement) to the caller, which is not what we want. To reliably bounce the call to another route we typically would want a 5xx response, but none seems to fit the bill. What we need is a "5xx Temporarily Busy" or a "5xx Not Now I Have A Headache" sort of response code.
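To make the dilemma concrete, here is a rough sketch of the client-side behavior described above. It is a simplification under stated assumptions, not how any particular proxy or SBC works: the function names are hypothetical, only the 503 triggers route advance, and every other failure ends the hunt and goes back to the caller as a treatment.

```python
# Assumption for this sketch: only a 503 moves the call to the next route.
ROUTE_ADVANCE_CODES = {503}


def place_call(routes, send_invite):
    """Try each route in preference order.

    routes: list of route names, most preferred first.
    send_invite: hypothetical callable(route) -> final SIP status code.
    Returns (route_used_or_None, final_status, routes_skipped_on_503).
    """
    skipped = []
    status = None
    for route in routes:
        status = send_invite(route)
        if status in ROUTE_ADVANCE_CODES:
            # 503: overflow to the next, less preferred route.
            skipped.append(route)
            continue
        if 200 <= status < 300:
            # The call completed on this route.
            return route, status, skipped
        # Any other failure (480, 486, ...) stops the hunt; the caller
        # gets a treatment instead of another attempt.
        return None, status, skipped
    # Every route answered 503, or there were no routes at all.
    return None, status, skipped


# Hypothetical carrier behavior: the cheap route is "full" and says 503.
responses = {"cheap_route": 503, "backup_route": 200}
print(place_call(["cheap_route", "backup_route"], responses.get))
```

Notice that nothing in this logic can tell whether the 503 came from a real outage or from a carrier whose cheap route was full. Both look identical on the wire, which is exactly the problem.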

1 comment:

Peter Eisengrein said...

It's 2019 and I still feel strongly about this. How are we supposed to tell the difference between a service provider that is having a problem and one that does a poor job of managing capacity?