Monday, October 21, 2013

Unity Connection Design Guide Drops 90ms from Latency Budget

This is just a quick note concerning a documentation change in the Unity Connection Design Guides. A few days ago, one of my teammates was working on a Unity Connection design for a customer and came across a change in the network requirements for clustering Unity Connection over the WAN. Apparently, Cisco modified the clustering requirements section in the Unity Connection Release 8x and Release 9x design guides.

The bandwidth requirements remain the same as they were when 8x was released. However, the round trip time (RTT) requirement has changed from 150 ms to 60 ms, which is quite a big drop. It looks like this change was made on August 29, 2013.

This is more than a little disconcerting because I was looking at the design guide in July and made design recommendations based on the previous RTT requirement. Fortunately, my customer's network can still accommodate the updated budget. But what if that weren't the case and I had to make a major design change in the middle of the project? Or worse, what if I hadn't gone back to re-read the design guide and the customer had run into an issue?

Basically, I would have been screwed. Again, I am fortunate that we are still within budget and that I generally pick the lowest common denominator for clustering-over-the-WAN designs, which, prior to August 29, 2013, was the 80 ms RTT budget for CUCM clustering. Interestingly enough, UCCX clustering also has an 80 ms RTT budget. It is odd (to me) that Unity Connection would drop below the 80 ms threshold.
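The "lowest common denominator" rule is easy to write down: the effective budget for a WAN-clustered design is the smallest RTT budget among the applications being clustered. Below is a minimal sketch, assuming the budgets quoted in this post (80 ms for CUCM and UCCX, 150 ms for Unity Connection before the August 29, 2013 change, 60 ms after it). The dictionary keys and function names are hypothetical, and it assumes an RTT exactly equal to the budget still counts as within it.

```python
# RTT budgets (ms) for clustering over the WAN, as quoted in this post.
# Hypothetical keys; verify values against the current Cisco design guides.
RTT_BUDGET_MS = {
    "CUCM": 80,
    "UCCX": 80,
    "CUC_before_2013_08_29": 150,
    "CUC_after_2013_08_29": 60,
}


def effective_budget(components):
    """Lowest common denominator: the smallest budget among the
    clustered applications that span the WAN."""
    if not components:
        raise ValueError("need at least one clustered component")
    return min(RTT_BUDGET_MS[c] for c in components)


def fits(measured_rtt_ms, components):
    """True if the measured WAN RTT is within the effective budget
    (assumes a value equal to the budget is acceptable)."""
    return measured_rtt_ms <= effective_budget(components)


if __name__ == "__main__":
    before = ["CUCM", "UCCX", "CUC_before_2013_08_29"]
    after = ["CUCM", "UCCX", "CUC_after_2013_08_29"]
    print(effective_budget(before), effective_budget(after))  # 80 60
    print(fits(70, before), fits(70, after))  # True False
```

A link measuring 70 ms RTT was fine under the old lowest common denominator (CUCM's 80 ms) but fails once Unity Connection's 60 ms budget becomes the smallest value in the set.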

Last point of interest. The UC 9x SRND still states that the maximum Round Trip Time (RTT) budget is 150 ms. That is probably still there to keep things interesting for the operators in the field! Obviously, it is best to err on the side of caution and assume the SRND recommendation is no longer valid.

I tried to find more information explaining the sudden drop in the RTT budget. I suspect there may be a defect or some other revelation. If anyone has more information, please post a comment. I am genuinely curious.


Thanks for reading. If you have time, post a comment!

5 comments:

  1. Response I got from Cisco:

    "There are two different requirements based on how you deploy and use a Unity Connection HA pair separated between data centers.

    Active / Active HA Model – When call load and web traffic is being sent to both servers the latency requirement is 60ms.

    Publisher / Subscriber Active / Standby Model – When call load and web traffic is only sent to the Publisher under normal conditions, but in a down / DR / maintenance scenario call load can be re-routed to the Subscriber, the latency requirement is 150ms.

    The lower latency limit for Active/Active is due to DB polling that occurs when requests hit the Sub, ensuring it has the most current information, which under high latency conditions can stack up and cause conversation delays. We are looking into fixing this limitation but due to some complexities it is not a near term item on our road map."

    They are supposed to be updating the SRND soon to reflect the distinction.

  2. Matt,

    Thanks for the post and for the background information.

    -Bill (@ucguerrilla)

  3. Matt,

    CUC ver 8 also had Active/Active but with a more lenient 150ms requirement:

    http://www.cisco.com/en/US/docs/voice_ip_comm/connection/8x/design/guide/8xcucdg060.pdf

    I would still be interested to see why this change for Active/Active now. I have posted this on the Cisco communities as well to see if maybe (hopefully!) this is a documentation discrepancy rather than a requirement change.

    1. Actually, the PDF referenced says 60 ms. I do have an offline copy of that design guide (and others) where the stated RTT budget is 150 ms. I am fairly certain they are going to impose 60 ms for Active/Active clustering over the WAN. It would be nice if they could get it up to at least 80 ms.

      What is interesting is that there are plenty of customers with Active/Active running over the WAN and following the previously accepted RTT budget. I have a few myself and am going to check with them to see if they have had any issues.

      -Bill

  4. Interesting post! Thanks much! I looked into the 9.x guide and saw only 60 ms, not 150 ms.
