Monday, February 8, 2016

Cisco Mobile and Remote Access Troubleshooting Basic Connectivity

The Cisco Mobile and Remote Access (MRA) feature is a "client edge" solution that allows external software and hardware clients to register to enterprise Cisco Unified Communications (UC) solutions without requiring a VPN. Like most things, there are a lot of moving parts working together to create a relatively seamless user experience. And, like most things, the first time you deploy MRA there are a few "gotchas" that can eat up a significant amount of troubleshooting time.

This blog entry captures procedures I use when troubleshooting or validating an MRA deployment. These procedures can be used to validate the initial deployment or to troubleshoot connectivity problems for an individual user.

Background

Proper troubleshooting technique requires that you have a thorough understanding of how things should work during normal operations. I presented on the MRA registration process during a NetCraftsmen Cisco Mid-Atlantic User Group (CMUG) meeting last year. If you need a review of the architecture, with a walk-through of the Jabber client discovery and registration process, a PDF of that presentation is available here: http://bit.ly/1BUHL4f

At a high-level, the MRA registration process follows this flow:
  1. Service Discovery
  2. Service Provisioning
  3. XMPP Registration
  4. SIP Registration
  5. Establish Visual Voicemail connectivity
This blog entry is focused on a scenario where we are using corporate presence services and UCM for call control. We are also roughly following the sequence of transactions that are actually used by a Jabber client. Procedures were originally developed with the 10.x version of Jabber running on Mac OS X and Windows. 

Overview of Process
Service Discovery

Upon initialization, the Jabber client enters into a "Service Discovery" mode. At this stage, the client is trying to determine whether it is inside or outside of the corporate network. The mechanism it uses is DNS. Specifically, the Jabber client will query for specific DNS service (SRV) records based on the assigned service domain.

The service domain is derived from the Jabber ID (JID) assigned to the end user. For example, if my JID is bill@company.com then the service domain is company.com. The service domain is usually specified by the user the first time they attempt to log into Jabber. However, the service domain can also be administratively assigned in the jabber-config.xml file.

Once the service domain is known, the Jabber client will go through the sub-process of Service Discovery. This starts with DNS SRV queries for:

_cisco-uds._tcp.company.com  : Points to the UDS service on a UCM cluster
_cuplogin._tcp.company.com : Legacy record that points to the XMPP service on an IM&P cluster

In an MRA scenario, where a client is outside of the corporate network, the above SRV records should not be resolvable. If the client fails to receive a positive response to the UDS/cuplogin queries, it will then send a DNS SRV query for _collab-edge._tls.company.com. In a properly implemented solution, this query should return one or more records that point to your Expressway Edge (or VCS-E) cluster.

Assuming everything is configured correctly and fully operational, the Jabber client will attempt to establish a TLS connection to one of your Edge appliances.


Service Provisioning

Once the client establishes a TLS connection to port 8443 on the Edge appliance, the user credentials are authenticated. At this point, the proxy connection is established and the client will start downloading configuration information from the UCM cluster. This configuration information is used to complete the service registration phases.


XMPP Registration

If the Jabber client is provisioned for IM&P presence services, the client will attempt to establish a connection on TCP port 5222. Registration requests are sent to the Edge appliance, which then proxies the transaction through the Core appliance to the IM&P cluster node(s).

SIP Registration

If the Jabber client is provisioned as a voice/video soft phone, the client will attempt to establish a connection on TCP port 5061. Registration requests are sent to the Edge appliance, which then proxies them through the Core appliance to the UCM cluster node(s). Successful registration is required for voice/video call functionality.

Visual Voicemail

If the Jabber client is provisioned with visual voicemail, the Jabber client will submit registration requests to the Edge appliance using the already established TLS connection on port 8443. The Edge appliance proxies the request through the Core to the REST API on Unity Connection.

Troubleshooting MRA Initialization Process

All of these procedures are performed from the client perspective.  

Service Discovery

This step is fairly straightforward. We need to determine if the client can resolve the proper DNS SRV records. Using dig or nslookup, verify that the client can resolve the collaboration edge SRV records. For example:

DIG

dig srv _collab-edge._tls.company.com

NSLOOKUP

nslookup -type=srv _collab-edge._tls.company.com

It is also a good idea to verify that the client is unable to resolve the UDS and cuplogin SRV records. 

If the client can resolve the UDS records then the Jabber client will never attempt to connect to the Edge. If the client receives a positive response to the UDS query and/or fails to receive a positive response to the _collab-edge query, review your external DNS configuration.
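The three lookups can be scripted so that the result is easy to read at a glance. The following is only a sketch: the function names are my own, it assumes dig is installed on the client, and the "internal / edge / none" labels are simply a restatement of the discovery behavior described above.

```shell
#!/usr/bin/env bash
# Sketch: report which Jabber discovery SRV records resolve from this client.
# Function names are hypothetical; pass your own service domain.

# Succeeds if the SRV query returns at least one answer line.
# Lines starting with ';' are dig diagnostics (e.g. timeouts), not answers.
srv_resolves() {
  [ -n "$(dig +short SRV "$1" 2>/dev/null | grep -v '^;')" ]
}

# Pure decision logic. Arguments are "yes"/"no" for
# _cisco-uds, _cuplogin and _collab-edge, in that order.
classify_discovery() {
  if [ "$1" = "yes" ] || [ "$2" = "yes" ]; then
    echo "internal"   # client believes it is inside; it will not try the Edge
  elif [ "$3" = "yes" ]; then
    echo "edge"       # client should attempt MRA through the Edge
  else
    echo "none"       # nothing resolves; check external DNS
  fi
}

check_discovery() {
  local domain="$1" uds=no cup=no edge=no
  srv_resolves "_cisco-uds._tcp.${domain}"   && uds=yes
  srv_resolves "_cuplogin._tcp.${domain}"    && cup=yes
  srv_resolves "_collab-edge._tls.${domain}" && edge=yes
  echo "cisco-uds=${uds} cuplogin=${cup} collab-edge=${edge}"
  classify_discovery "$uds" "$cup" "$edge"
}

# Usage (run from the external client): check_discovery company.com
```

For a working MRA client you would expect the last line of output to be "edge".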


Service Provisioning

This troubleshooting step is a little more involved. The web-based API on the Edge appliance uses URLs that contain Base64-encoded values. Therefore, you need a way to generate Base64 values (such as openssl). The API calls are also built using specific application hostnames in your environment, so you will need to have that information handy.

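Let's start with the base64 conversion process. On a Mac OS X system, you can use openssl to generate the base64 string. Feed the string with printf rather than echo -n, because echo -n is not handled the same way by every shell, and a stray newline or option string in the input changes the encoded value. For example:

printf '%s' 'human readable string' | openssl base64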

For Windows users, there are several tools that you can download and install. That is a pain, so you may be better off using this online encoder: https://www.base64encode.org/

Now that we have a method to create the base64 values for the API calls, we'll need to compile a set of values that we can use for our testing. Specifically, we'll need to create base64 values for the following strings:

String 1: company.com
String 2: company.com/https/ucmpub.company.com/8443
String 3: company.com/http/ucmtftp.company.com/6970

Where:
  • company.com : is the service domain
  • ucmpub.company.com : is the UCM publisher node (for UDS calls)
  • ucmtftp.company.com : is one of your TFTP service nodes (for service provisioning)
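As a sketch, the three values can be generated in one pass. The helper names below are hypothetical, and the hostnames are just the examples used in this post. The helper falls back to the base64 utility if openssl is not installed.

```shell
#!/usr/bin/env bash
# Sketch: build the three base64 values used in the Edge API URLs.
# mra_b64 and mra_tokens are hypothetical helper names.

# Encode a string without a trailing newline, output on a single line.
mra_b64() {
  if command -v openssl >/dev/null 2>&1; then
    printf '%s' "$1" | openssl base64 -A
  else
    printf '%s' "$1" | base64 | tr -d '\n'
  fi
}

# Arguments: service domain, UCM publisher FQDN, TFTP node FQDN.
mra_tokens() {
  local domain="$1" ucm="$2" tftp="$3"
  echo "String 1: $(mra_b64 "${domain}")"
  echo "String 2: $(mra_b64 "${domain}/https/${ucm}/8443")"
  echo "String 3: $(mra_b64 "${domain}/http/${tftp}/6970")"
}

# Usage: mra_tokens company.com ucmpub.company.com ucmtftp.company.com
```

If a value ends in "K" where you expected "=" padding, the input probably picked up a trailing newline before it was encoded.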

Basic TLS Connection

OK, we are now armed with almost all of the tools we need. The last tool you want to use is a web browser. I use Google Chrome (tested with version 45.0.2454.101). The first step is to confirm that we can establish the basic TLS connection and can authenticate the user through the Edge appliance. 

Sticking with our example, the base64 value for String 1 (above) is Y29tcGFueS5jb20=. Now, assume that the Edge appliance hostname is expway-e.company.com. Armed with this information, we can use our web browser to go to the following URL:

https://expway-e.company.com:8443/Y29tcGFueS5jb20=/get_edge_config?service_name=_cisco-uds&service_name=_cuplogin

If everything is provisioned correctly, your browser should render a login window where you will enter the Jabber user ID and assigned password, similar to the following.



After a successful login, the browser window should render the XML content that is returned from the Edge appliance. For example:



At this point, we are testing a few things:


  1. If you are not prompted for a login, receive a connection error, or the connection times out then you most likely have a firewall configuration issue (blocking port 8443).
  2. You should check to see if your browser prompts you with certificate errors or warnings. If there are cert errors then your Jabber client may not be able to complete the service provisioning phase. You should check the certificate on the Edge appliance and verify it is signed by a CA that is in the local client trust store.
  3. Your browser should render the complete XML response that identifies your _cuplogin, _cisco-uds, tftpserver, SIP Edge, and XMPP Edge services. If you don't get a response then you have an issue between the Core and Edge OR your Core is unable to resolve the proper DNS records.

A clue is revealed in item 3. The internal DNS SRV records we identified during the Service Discovery phase are used by the Core appliance to do its job. So, if you have misconfigurations on your internal DNS, errors will be seen in the XML response above.
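If you prefer a command line to a browser, the same test can be approximated with curl and openssl. This is a sketch under a few assumptions: the function names are mine, the login prompt is standard HTTP authentication that curl -u can answer, and the base64 value for the service domain has already been generated. Keep in mind that curl validates the certificate against its own CA bundle, which may not match the trust store your Jabber client uses.

```shell
#!/usr/bin/env bash
# Sketch: command-line version of the get_edge_config browser test.
# Function names are hypothetical.

# Arguments: Edge FQDN, base64 value of the service domain.
mra_edge_config_url() {
  printf 'https://%s:8443/%s/get_edge_config?service_name=_cisco-uds&service_name=_cuplogin\n' "$1" "$2"
}

# Arguments: Edge FQDN, base64 value, Jabber user ID.
# With -u and no password, curl prompts for the password.
# A connection error points at the firewall; a certificate error from curl
# is the equivalent of the browser warning.
mra_fetch_edge_config() {
  curl -sS -u "$3" -w '\nHTTP %{http_code}\n' "$(mra_edge_config_url "$1" "$2")"
}

# Show who issued the certificate presented on port 8443 and when it expires.
mra_show_cert() {
  openssl s_client -connect "$1:8443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -dates
}

# Usage:
#   mra_show_cert expway-e.company.com
#   mra_fetch_edge_config expway-e.company.com 'Y29tcGFueS5jb20=' bill
```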


Verify UDS Discovery

The next step in troubleshooting is to verify that your client can communicate with the UDS service on your UCM cluster. To do this we use the base64 value of String 2 (above). Using our example: 

Y29tcGFueS5jb20vaHR0cHMvdWNtcHViLmNvbXBhbnkuY29tLzg0NDM=

The URL we are going to test is:

https://expway-e.company.com:8443/Y29tcGFueS5jb20vaHR0cHMvdWNtcHViLmNvbXBhbnkuY29tLzg0NDM=/cucm-uds/clusterUser?username=bill

As with the previous test, a successful transaction will render an XML response. If you are running this after you completed the basic TLS connection then you won't be challenged for authentication credentials. 

If you are receiving a response then UDS is operational. If not then the UDS service on the UCM may be experiencing a problem.

You can also test querying a list of UCM UDS servers:

https://expway-e.company.com:8443/Y29tcGFueS5jb20vaHR0cHMvdWNtcHViLmNvbXBhbnkuY29tLzg0NDM=/cucm-uds/servers
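Both UDS queries can be run back to back with curl. Again, this is only a sketch: the function names are hypothetical, and it assumes the Edge accepts the credentials through curl -u. Since curl does not share a session with your browser, expect a password prompt for each request.

```shell
#!/usr/bin/env bash
# Sketch: run the two UDS checks through the Edge and print the HTTP status.
# Function names are hypothetical.

# Arguments: Edge FQDN, base64 value of String 2, UDS path.
mra_uds_url() {
  printf 'https://%s:8443/%s/cucm-uds/%s\n' "$1" "$2" "$3"
}

# Arguments: Edge FQDN, base64 value of String 2, Jabber user ID.
mra_check_uds() {
  local edge="$1" token="$2" user="$3" path code
  for path in "clusterUser?username=${user}" "servers"; do
    # -o /dev/null discards the XML; -w prints only the status code.
    code=$(curl -sS -u "$user" -o /dev/null -w '%{http_code}' \
      "$(mra_uds_url "$edge" "$token" "$path")") || true
    echo "${path}: HTTP ${code}"
  done
}

# Usage:
#   mra_check_uds expway-e.company.com \
#     'Y29tcGFueS5jb20vaHR0cHMvdWNtcHViLmNvbXBhbnkuY29tLzg0NDM=' bill
```

Drop the -o /dev/null option if you want to see the XML itself rather than just the status code.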


Verify TFTP Configurations

If the previous validation procedures are successful then you have determined that the Jabber client can communicate to the Edge appliance for the purposes of Service Provisioning. Certificates are validated, credentials are validated, and basic UDS functionality is confirmed. 

The next step that a Jabber client would take is to identify device configurations. As with standard telephony devices, the Cisco TFTP service has configuration files that Jabber can download to retrieve device specifications.

To do this test we use the base64 value of String 2 (above). The URL we will put in our browser to get a list of devices for the Jabber user is:

https://expway-e.company.com:8443/Y29tcGFueS5jb20vaHR0cHMvdWNtcHViLmNvbXBhbnkuY29tLzg0NDM=/cucm-uds/user/bill/devices

You may or may not be prompted to authenticate. If you are prompted, enter the same Jabber user credentials as before. If all goes well, you will receive a list of devices that are associated to the user in UCM (Edit User pages). For example:



Once you have a list of the devices associated with the user, you can then pull the detailed configuration for a specific device. The Jabber client (or DX80 or whatever you are using for the Edge registration) will be able to identify which device configuration to retrieve (by device type). To test this yourself, you will need to look at the "name" child node of the device whose model is "Cisco Unified Client Services Framework".

To test retrieval of the configuration file via the UCM TFTP service we use the base64 value of String 3 (above). Using our example:


Y29tcGFueS5jb20vaHR0cC91Y210ZnRwLmNvbXBhbnkuY29tLzY5NzA=

The URL we can test with is:

https://expway-e.company.com:8443/Y29tcGFueS5jb20vaHR0cC91Y210ZnRwLmNvbXBhbnkuY29tLzY5NzA=/devicename.cnf.xml

Where "devicename" is the name returned by the UDS device list query in the previous step. A successful response will return XML content containing the complete device configuration file.

If you get to this point then the Core/Edge proxy function is fully tested and functional. Next, we need to verify service registration.
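The device list and configuration file requests can be sketched with curl as well. The function names are hypothetical, and the device name in the usage line (CSFBILL) is made up for illustration; use the name returned by your own device list query.

```shell
#!/usr/bin/env bash
# Sketch: fetch the user's device list and one device configuration file.
# Function names and the CSFBILL device name are hypothetical.

# Arguments: Edge FQDN, base64 value of String 2, Jabber user ID.
mra_devices_url() {
  printf 'https://%s:8443/%s/cucm-uds/user/%s/devices\n' "$1" "$2" "$3"
}

# Arguments: Edge FQDN, base64 value of String 3, device name.
mra_device_config_url() {
  printf 'https://%s:8443/%s/%s.cnf.xml\n' "$1" "$2" "$3"
}

# Arguments: Edge FQDN, base64 value of String 3, Jabber user ID, device name.
# Saves the configuration to <devicename>.cnf.xml in the current directory.
mra_fetch_device_config() {
  curl -sS -u "$3" -o "${4}.cnf.xml" "$(mra_device_config_url "$1" "$2" "$4")"
}

# Usage:
#   curl -sS -u bill "$(mra_devices_url expway-e.company.com \
#     'Y29tcGFueS5jb20vaHR0cHMvdWNtcHViLmNvbXBhbnkuY29tLzg0NDM=' bill)"
#   mra_fetch_device_config expway-e.company.com \
#     'Y29tcGFueS5jb20vaHR0cC91Y210ZnRwLmNvbXBhbnkuY29tLzY5NzA=' bill CSFBILL
```

Note that the device list uses the String 2 value (UDS on the publisher) while the configuration file uses the String 3 value (TFTP node). Mixing them up is an easy mistake to make.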


Service Registration (XMPP and SIP)

We are now done with the funky base64 strings (yay!). To test basic XMPP and SIP connectivity we are going to dumb things down a bit. We can use telnet from a command prompt to verify connectivity to the appropriate ports. 

For example:


galactus-2:utils wjb$ telnet expway-e.company.com 5222
Trying a.b.c.d...
Connected to expway-e.company.com.
Escape character is '^]'.
^]
Connection closed by foreign host.
galactus-2:utils wjb$ telnet expway-e.company.com 5061
Trying a.b.c.d...
Connected to expway-e.company.com.


The fact that we received a "Connected" response means that we were able to connect to the Edge device using port 5222 (XMPP) and port 5061 (SIP/TLS). If you receive a Connection Refused response then you may be running into a firewall issue.
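If telnet is not installed on the client, netcat can run the same check across all three ports in one go. This is a sketch with hypothetical function names. It assumes an nc build that supports the -z (connect only) and -w (timeout) options, and timeout handling differs a little between netcat implementations.

```shell
#!/usr/bin/env bash
# Sketch: check the MRA ports on the Edge with netcat instead of telnet.
# Function names are hypothetical.

# Map a port number to the service it carries in this post.
mra_port_name() {
  case "$1" in
    8443) echo "HTTPS provisioning" ;;
    5222) echo "XMPP" ;;
    5061) echo "SIP/TLS" ;;
    *)    echo "unknown" ;;
  esac
}

# Argument: Edge FQDN. Only proves that the TCP connection can be opened.
mra_check_ports() {
  local edge="$1" port
  for port in 8443 5222 5061; do
    if nc -z -w 5 "$edge" "$port" >/dev/null 2>&1; then
      echo "${port} ($(mra_port_name "$port")): connected"
    else
      echo "${port} ($(mra_port_name "$port")): refused or timed out"
    fi
  done
}

# Usage: mra_check_ports expway-e.company.com
```

As with telnet, a successful connection only tells you that the port is reachable. It does not prove that XMPP or SIP registration will succeed.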

Conclusion

This covers all of the basic connectivity tests that you can use to verify or troubleshoot your MRA implementation. Used in conjunction with event logs and validation tools on the Expressway appliances you should be well on your way to buttoning this up and calling it a day.




Thanks for reading. If you have time, post a comment!

6 comments:

  1. Hey man, I love your SQL stuff! I found an old response of yours about running this command:

    admin:run sql select lg.name as LineGroup,n.dnorpattern,dhd.hlog from linegroup as lg inner join linegroupnumplanmap as lgmap on lgmap.fklinegroup=lg.pkid inner join numplan as n on lgmap.fknumplan = n.pkid inner join devicenumplanmap as dmap on dmap.fknumplan = n.pkid inner join device as d on dmap.fkdevice=d.pkid inner join devicehlogdynamic as dhd on dhd.fkdevice=d.pkid order by lg.name

    to get a list of ext of people logged into hlog or not. How would I use this to refine only a certain Line Group though and not everyone on my CUCM?

  2. Try this (replace LineGroup1 with the actual line group name):
    run sql select lg.name as LineGroup,n.dnorpattern,dhd.hlog from linegroup as lg inner join linegroupnumplanmap as lgmap on lgmap.fklinegroup=lg.pkid inner join numplan as n on lgmap.fknumplan = n.pkid inner join devicenumplanmap as dmap on dmap.fknumplan = n.pkid inner join device as d on dmap.fkdevice=d.pkid inner join devicehlogdynamic as dhd on dhd.fkdevice=d.pkid where lg.name='LineGroup1'

  3. The login screen comes up but doesn't accept my credentials. The user credentials are fine as checked using the self care portal.

    Exp-C keeps throwing a log:

    "Unable to determine home CUCM - Unknown CUCM cluster for node example-CUCM.domain.com"

    TAC is also having a hard time resolving it.

    CUCM is in standalone mode with ONE cluster, pub and sub
    CUCM pings from Exp-C domain lookup tool using FQDN
    DNS SRVs works fine

    Don't know how to go further with troubleshooting :-/

  4. "Unable to determine home CUCM - Unknown CUCM cluster for node example-CUCM.domain.com" I was able to resolve this by adding the CUCM as host name "CUCM.domain.com" in Expressway-C instead of the IP address.
