Understanding Lync 2013 Server Failover

Lync 2010 introduced the concept of a backup registrar.  Within Topology Builder, you can define a backup registrar and publish that information into the topology.  This was vital not only for the Survivable Branch Appliance; it also gave users the ability to fall back into limited functionality mode in the client.

At login, the client is given both the primary and backup registrar pools, tagged with q=0.7 and q=0.3 values: q=0.7 marks the primary server and q=0.3 marks the backup server.
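To make the q-value ordering concrete, here is a minimal sketch of how a client could rank the registrars it is handed.  It assumes the values arrive as SIP Contact-style entries with a q parameter; the hostnames, the exact header layout and the function names are hypothetical and are not taken from a real Lync trace.

```python
import re

# Hypothetical Contact-style values; the hostnames are made up for illustration.
CONTACTS = [
    "<sip:backup-pool.example.com:5061;transport=tls>;q=0.3",
    "<sip:primary-pool.example.com:5061;transport=tls>;q=0.7",
]


def parse_contact(value):
    """Return (uri, q) for one Contact-style value.

    A missing q parameter is treated as 1.0 here, which is an assumption
    of this sketch rather than documented Lync behavior.
    """
    uri_match = re.search(r"<([^>]+)>", value)
    if uri_match is None:
        raise ValueError(f"no URI found in {value!r}")
    # Only look for q after the closing bracket, so URI parameters are ignored.
    q_match = re.search(r";\s*q=([0-9.]+)", value[uri_match.end():])
    q = float(q_match.group(1)) if q_match else 1.0
    return uri_match.group(1), q


def order_registrars(values):
    """Sort registrar URIs from most preferred (highest q) to least preferred."""
    parsed = [parse_contact(v) for v in values]
    parsed.sort(key=lambda pair: pair[1], reverse=True)
    return [uri for uri, _ in parsed]


if __name__ == "__main__":
    for uri in order_registrars(CONTACTS):
        print(uri)
```

The higher q value wins, so the q=0.7 entry is tried first and the q=0.3 entry is only used when the primary cannot be reached.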

When the primary registrar’s services stop, the client goes into limited functionality mode.  In this mode, Enterprise Voice functionality still works, but features such as configuring Sim Ring, conferencing and other services that rely on the back-end database are no longer available.

This was very powerful and allowed Lync to be survivable for Enterprise Voice features.  The problem was that users would lose access to their contacts, conferences wouldn’t work, and response groups would not work.  Bringing these services back online on the far side required significant work to import contacts using the old dbimpexp.exe tool.

Lync 2013 changes all of these behaviors and makes survivability in Lync more competitive with other PBX vendors.

According to the “What’s New” article (http://technet.microsoft.com/en-us/library/jj204892%28v=ocs.15%29):

1. As in Lync Server 2010, the main high availability (HA) scheme for Lync Server 2013 Preview is based on server redundancy via pooling. If a server running a certain server role fails, the other servers in the pool running the same role take the load of that server. This applies to Front End Servers, Edge Servers, Mediation Servers, and Directors.

2. Lync Server 2013 Preview adds new disaster recovery measures by enabling you to pair Front End pools located in two datacenters. If one of the paired pools goes down, an administrator can fail over the users from that pool to the other pool in the pair, to provide continuation of service.

3. Lync Server 2013 Preview also adds Back End Server high availability. This is an optional topology in which you deploy two Back End Servers for a Front End pool, and set up synchronous SQL mirroring for all the Lync databases running on the Back End Servers. You may choose whether to deploy a witness for the mirror.

These three points are key to understanding how server failover works.

#1 is a recap of the feature we have today in Lync 2010.  It tells us that when a server within a pool fails, the other servers in the pool will take its place.  For example, if front-end server #1 fails, the AVMCU features that were hosted on it could move to front-end server #2.

#2 is the first look at a new feature of Lync Server 2013.  Unfortunately, we are only weeks into the public preview of Lync 2013 and I’ve already heard people mishandle the explanation of this feature.  So what is it exactly?  In Lync Server 2013, specifying a pool as the backup registrar for another pool does a lot more than it used to.

First off, this option is configured in Topology Builder exactly as before: we can specify automatic failover and failback for voice and set the time limits.  This is where things start to change, though.  If we take a peek at the services installed on the server, we see something completely new: the Lync Server Backup Service.  It is important to note that you must run the bootstrapper on the server after you specify a backup registrar.

Image at: http://masteringlync.com/wp-content/uploads/2012/07/image_thumb11.png

So what exactly is this service doing?  According to TechNet, “Lync Server Backup Service provides real-time data replication to keep the pools synchronized”.  So what data are we talking about here?  Pretty much everything in the database (with the exception of response group information): user information, contacts, conferencing data and more.

So what happens if the front-end service is turned off on every server in the pool?  This is where the early reports are slightly confusing.  Some have assumed that if services are offline on one pool, users will automagically fail over with everything.  That isn’t quite the case.  Users will fail over, but with the same limited feature set that they had in the previous version of Lync.

This should look very familiar to anyone who has used Lync previously: our users have the same limited set of features as before.  I can IM and make/receive phone calls, but items like conferencing still aren’t available.  So what exactly does #2 of the new features of Lync 2013 tell us?  An administrator can declare an emergency and fail over the pool to the backup pool.  That is done with the following command:

Invoke-CsPoolFailover -PoolFqdn <Pool fqdn> -DisasterMode -Verbose

There is one important note about the above command.  The -DisasterMode switch is required if the pool (or its front-end services) is down.  If you leave it off, you will receive an error in the Management Shell.  The great thing is that you could also use this command to force a failover to a far side pool so you can perform maintenance on all of the servers.  When the command completes, the Lync client will automatically refresh and come out of Limited Functionality Mode.  Contacts, groups, and everything else will come back as they were before.
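The behavior described so far can be summed up as a small decision function.  This is only a simplified model of what the article describes, not anything Lync itself exposes; the function name, its parameters and the three labels are hypothetical.

```python
def client_mode(home_pool_up, backup_pool_up, admin_failed_over):
    """Return the feature level a user sees: 'full', 'limited' or 'offline'.

    Simplified model of the article's description; all names are hypothetical.
    """
    if admin_failed_over:
        # An administrator has run Invoke-CsPoolFailover, so the backup
        # pool now serves the users with their full feature set.
        return "full" if backup_pool_up else "offline"
    if home_pool_up:
        # Normal operation on the user's home pool.
        return "full"
    # Home pool is down and nobody has invoked a failover yet:
    # the client registers with the backup registrar only.
    return "limited" if backup_pool_up else "offline"


if __name__ == "__main__":
    print(client_mode(True, True, False))   # normal day
    print(client_mode(False, True, False))  # outage, before the admin acts
    print(client_mode(False, True, True))   # outage, after the failover
```

The middle case is the one that trips people up: the outage alone only gets users to limited functionality, and it takes the administrator's failover to bring everything back.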

So what have we learned thus far?  First, even though Lync 2013 includes this exciting new feature for DR purposes, some administrator intervention is still required.  Personally, I think this is a great decision, because we don’t want users simply flipping between servers because of latency issues.  Second, this completely changes how HA/DR is done today.  Consider this HA/DR scenario: you are a small business and you want to implement some sort of HA/DR.  In the past, you needed to spin up an Enterprise pool, clustered SQL and much more.  Now, you can spin up two Standard Edition servers and point them at each other for failover.

So what happens when your servers are back online and you want to fail back?  Again, we do this through PowerShell:

Invoke-CsPoolFailback -PoolFqdn <Pool fqdn> -Verbose

Now, one last interesting item: the failback also has the -DisasterMode option.  I’m not 100% sure what the use case for this would be.  Maybe you failed over to the backup pool, but now that pool has failed and you are going back to your production pool, which is back with all of its data?  I guess it could happen, but if you were suffering that much failure I have a gut feeling you have other issues.

8 thoughts on “Understanding Lync 2013 Server Failover”

  • November 13, 2013 at 12:09 pm

    Quick question: we talk a lot about the FE here, but what about the SQL layer? Is there one per DC, or is there just one instance in a single DC?

    • November 13, 2013 at 1:44 pm

      From a domain controller layout, you would have DCs in both the primary and secondary locations, presumably for authentication.

      Thanks,

      Richard

  • April 30, 2014 at 9:23 pm

    Hi,

    I do not understand the part about automatic failover with limited features. In my case, when I shut down the primary pool, the Lync client is unable to automatically connect to the backup pool. I have to manually fail over the CMS to the backup server, and I also have to manually fail over users to the backup pool, in order for the Lync client to connect to it. I have two SRV records set up with different priorities to enable HA between the two pools.

  • July 24, 2014 at 5:51 am

    Dear Lync Admin,
    Could you please help me with the scenario below?
    I am building a Lync 2013 paired pool environment in two different datacenter locations. Before planning, I need some clarifications.
    1 – I have a single forest and single domain environment.
    2 – If one site is down, how will the other site take over (manually or automatically)?
    3 – Do I need a strong WAN connection (point to point) between the datacenters?
    4 – If my WAN connection fails, how do the two sites replicate?
    My major concern is how Lync replicates if the WAN connection between the two sites fails, since it is a single forest and single domain environment.

  • May 13, 2016 at 3:23 pm

    Very nicely written article. It grew my understanding of Lync datacenter failover, and I am glad failover doesn’t happen automatically. Thank you for the post.
