Saturday, February 4, 2012

Kerberos SPNs and Double Hops in a Nutshell

By Saleh Najar

You are implementing an enterprise distributed application and everything is working nicely on your development machine. All layers are communicating properly. Your web UI is talking nicely to your WCF service and your WCF service is talking nicely to the database. Suddenly, when you deploy to a multi-tier environment, very close to your production release deadline, all hell breaks loose. You start getting weird non-descriptive errors that you suspect have to do with permissions. You could swear that everything was working fine in your development environment. What happened?

Welcome to Kerberos authentication double-hop SPN issues! After a thorough and somewhat desperate analysis, you find that the only difference between your development environment and the UAT or test environment is the deployment architecture and topology. In your development environment, all your layers were running on the same tier or machine. In the test environment, your layers are distributed over three tiers: the web server on one machine, the WCF service on another machine and the database server on yet a third machine.

When everything runs on the same machine, Kerberos authentication is not engaged. When two machines are communicating, Kerberos is engaged, and when three machines are communicating, Kerberos SPNs come into play in the infamous double-hop scenario.

In the two-tier or two-machine scenario, Kerberos authentication should not cause a problem, as the authentication is straightforward. The problem starts when a third machine is involved and the second machine has to delegate to that third machine. In other words, you have a web server that needs to impersonate the user to the service machine where the business logic resides, and in turn that second machine or tier needs to communicate, still on the user's behalf, with the third machine: the data tier.

In this scenario, if SPNs (Service Principal Names) are not set up properly for the custom network account you are using, you will run into security issues. The best place to check for these issues is the Event Viewer, as they are marked clearly as Kerberos errors.

Before Kerberos SPNs, authentication was only one way: the server authenticating the client and user. With the rise of server spoofing, the need came to also have the client authenticate the server to make sure that it wasn't spoofed. This is what Kerberos SPNs were created to solve. Service Principal Names (SPNs) are a mechanism that enables the client to authenticate the server, to make sure it is talking to the genuine server on the network.

For a client to authenticate a server's service, it needs some kind of identifier to send to the authentication authority (the KDC, or Key Distribution Center). This identifier is what is called an SPN or Service Principal Name. The format of this SPN is predetermined by convention. For example, the browser knows how to put together that SPN from available information, such as the host name in the URL.
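As a rough illustration of that convention, the sketch below assembles an SPN string of the common form serviceclass/host[:port] from a URL. The function name spn_for_url and the contoso host names are made up for illustration. Real clients such as browsers or WCF apply their own rules (for example around DNS canonicalization and whether a port is included), so treat this as a sketch of the idea rather than what any particular client does.

```python
from urllib.parse import urlsplit

def spn_for_url(url, include_port=False):
    """Build an SPN of the form HTTP/host[:port] for a web URL (illustrative only)."""
    parts = urlsplit(url)
    host = parts.hostname  # host name without the port, lowercased
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    # The HTTP service class is used for both http:// and https:// URLs.
    spn = f"HTTP/{host}"
    if include_port and parts.port is not None:
        spn += f":{parts.port}"
    return spn

print(spn_for_url("https://web01.contoso.com/app/default.aspx"))
# HTTP/web01.contoso.com
print(spn_for_url("http://svc01.contoso.com:8080/OrderService.svc", include_port=True))
# HTTP/svc01.contoso.com:8080
```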

Once a client forms the SPN string, it sends a request to the authentication authority (the KDC) for a ticket to that SPN. The KDC checks its store (Active Directory) to see which network account this SPN is associated with, then uses a key derived from that account's password to encrypt a service ticket, which it returns to the client. The client presents the ticket to the service. If the service is able to decrypt the ticket (using its own account's password) and reply correctly, then the client knows that this is the right service for that SPN, and communication goes on. If the SPN is missing or registered to the wrong account, this step fails. That's Kerberos SPNs in a nutshell.
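To make the lookup-and-verify idea concrete, here is a toy model in Python. It is deliberately not the Kerberos protocol: an HMAC seal stands in for the encrypted ticket, a SHA-256 hash stands in for the password-derived key, and the class ToyKdc, the function names and the account names are all invented for this sketch. The only point it shows is that a ticket issued for an SPN can be validated solely by a service that holds the password of the account the SPN is mapped to.

```python
import hashlib
import hmac

def account_key(password):
    # Stand-in for the long-term key that is derived from an account password.
    return hashlib.sha256(password.encode("utf-8")).digest()

class ToyKdc:
    def __init__(self):
        self.spn_to_account = {}  # plays the role of the SPN lookup in Active Directory
        self.account_keys = {}

    def add_account(self, account, password):
        self.account_keys[account] = account_key(password)

    def register_spn(self, spn, account):
        self.spn_to_account[spn.lower()] = account

    def issue_ticket(self, spn, client):
        account = self.spn_to_account.get(spn.lower())
        if account is None:
            raise LookupError(f"unknown SPN {spn!r}")
        payload = f"{client}->{spn}".encode("utf-8")
        # Seal the ticket with the key of the account that owns the SPN.
        seal = hmac.new(self.account_keys[account], payload, hashlib.sha256).digest()
        return payload, seal

def service_accepts(ticket, service_password):
    # The service can validate the ticket only if it runs under the right account.
    payload, seal = ticket
    expected = hmac.new(account_key(service_password), payload, hashlib.sha256).digest()
    return hmac.compare_digest(seal, expected)

kdc = ToyKdc()
kdc.add_account("CONTOSO\\svc-web", "web-secret")
kdc.register_spn("HTTP/web01.contoso.com", "CONTOSO\\svc-web")
ticket = kdc.issue_ticket("HTTP/web01.contoso.com", "alice")
print(service_accepts(ticket, "web-secret"))    # True
print(service_accepts(ticket, "other-secret"))  # False
```

In this model, a service running under a different account than the one the SPN is mapped to cannot validate the ticket, and asking for an SPN that was never registered fails at the lookup.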

So back to our three-tier scenario. To have your distributed system work in a three-tier scenario with custom accounts (service accounts or user accounts), make sure to create an SPN for each of your first and second tiers, and make sure to associate each one with the account that is running your web application or your WCF service, respectively. You will need domain admin permission in order to do that using Microsoft's setspn tool. You also need to allow delegation in Active Directory for these machines. You will need a domain admin to work with you on this, so make sure to allow enough time in your project schedule: in a big enterprise, it typically takes around a couple of weeks to get a domain admin resource assigned to your request.
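Assuming the tool meant here is Microsoft's setspn utility, the registrations could look something like the command lines produced by the sketch below. The host names and account names (web01, svc01, CONTOSO\svc-web, CONTOSO\svc-wcf) are hypothetical, and the HTTP service class assumes both tiers are reached over HTTP. The -S switch asks setspn to add the SPN after checking that it is not already registered to another account. The Python only assembles the strings; a domain admin would run the resulting commands.

```python
def setspn_commands(registrations):
    """Return one 'setspn -S <spn> <account>' command line per (spn, account) pair."""
    return [f"setspn -S {spn} {account}" for spn, account in registrations]

# Hypothetical SPN-to-account pairs for the web tier and the WCF service tier.
registrations = [
    ("HTTP/web01.contoso.com", r"CONTOSO\svc-web"),
    ("HTTP/svc01.contoso.com", r"CONTOSO\svc-wcf"),
]

for line in setspn_commands(registrations):
    print(line)
```

Afterwards, setspn -L followed by the account name lists the SPNs registered to that account, which is a quick way to confirm the result.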

If your services run using system accounts such as NETWORK SERVICE, then you don't need to worry about creating new SPNs, as the default ones registered for the machine account will be used.