Categories
Digital Transformation Modern Workplace

Staying current in the new world

(Originally published on LinkedIn)

In this post, I'll continue covering our digital transformation. If you haven't read the previous parts, you can find the first part here and the second here. This is the story of how we left a legacy workplace in 2018 and started building for the future.

One thing I've noticed when working on bigger changes, and especially when moving to new technology, is that you often come across variations of the phrase "yeah, we don't do it like that here, it would never work".

If you have never tried it, and you don't really know what it is or what it means, how can you be so sure that it will not work?

I quite often play the "hey, I'm a millennial" card when discussing change (it works surprisingly well), especially when I talk about things that might be a bit naive and oversimplified. But it's an effective way to push forward and skip over some of the road bumps you tend to get stuck on.

We now live in a world where the workplace is ever-changing: the Office suite can be updated every month, and Windows feature updates are released every six months. This is quite different from the past.

So how did we decide to navigate this?

The first step we took was to accept that this is what the world looks like now. No matter how much we complain by the coffee machine, this is our new reality.

The second step is to sell this to the organization, especially key stakeholders such as application owners and senior management. This is the tricky part, since it is not so much about technology as about politics.

Instead of treating each upgrade as a project of its own, we built a process to support this evergreen flow. This means that once we have finished the last step in the process, it's time to start over again. Our process contains the following steps (imagine this as a circle):

  1. Inform stakeholders that a new release is coming in 2-3 weeks.
  2. Release update to first evaluation group (ring 0) to clear any compatibility issues in the environment.
  3. Release update to second evaluation group (ring 1), which contains application testers for business-critical applications, to give them as much time as possible to evaluate.
  4. Release update to third evaluation group (ring 2), which contains application testers for important business applications that are not deemed critical but still want to evaluate at an early stage.
  5. Release update to the first pilot group for broad deployment (ring 3) to make sure that deployment works on a global scale. This step is estimated to happen 2-3 months after the Windows 10 feature upgrade is released, but it also depends on the outcome of the previous steps.
  6. Release update to broad production (ring 4).

During this entire process, we monitor the deployments and make sure nothing breaks. If an application is identified as problematic, the computers can simply be rolled back to the previous version of Windows 10, and that application will be put on an exclusion list (basically placed in ring 5) until the application owner has taken action on it. This has, however, not happened yet.
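The ring flow and the exclusion-list rule above can be sketched as a small scheduling model. This is a minimal illustration only: the day offsets, ring labels, and function names are my own assumptions, not our actual tooling (in practice, rings 3-4 depend on the outcome of the earlier rings, not just the calendar).

```python
from dataclasses import dataclass

@dataclass
class Ring:
    """One deployment ring with an offset (in days) from release."""
    name: str
    offset_days: int

# Illustrative offsets only.
RINGS = [
    Ring("ring 0 - first evaluation", 0),
    Ring("ring 1 - critical-app testers", 7),
    Ring("ring 2 - important-app testers", 14),
    Ring("ring 3 - broad pilot", 60),
    Ring("ring 4 - broad production", 90),
]

# "Ring 5": applications currently blocking the upgrade.
excluded_apps: set[str] = set()

def eligible_rings(days_since_release: int) -> list[str]:
    """Rings that should have received the update by now."""
    return [r.name for r in RINGS if r.offset_days <= days_since_release]

def can_upgrade(device_apps: set[str]) -> bool:
    """A device is held back while it runs an excluded application."""
    return device_apps.isdisjoint(excluded_apps)
```

Once an application owner has remediated a problematic app, removing it from `excluded_apps` lets its devices rejoin the normal ring flow in the next cycle.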

Does this process work in the real world?

Yes. We ran through this, albeit at a slightly higher pace, when moving from Windows 10 1709/1803 to Windows 10 1809. To our knowledge, we did not have any major incidents where we broke an end user's computer. We upgraded roughly 18 000 computers in a matter of a few weeks.

We did have errors, though, and a lot of them during the first week. But all of them indicated that users were not able to run the upgrade (it was blocked). This was expected based on the tests we had run with the earlier rings, and nothing we couldn't handle. Everyone was confident in the servicing, and all errors either resolved themselves or were fixed by our technicians, in bulk or case by case.

After our first major Windows as a Service experience, we still trust the servicing; if anything, the upgrade left us even more confident that the Windows as a Service process works.

BUT, having static rings as we do today is far from ideal. Until we have better tools (such as Microsoft Desktop Analytics) to create dynamic rings, this is our approach. We will spend some time fine-tuning the setup and move to dynamic rings once we have the tools.

The outcome

  • Users had the update available for 21 days; after that, the installation was mandatory
  • We upgraded roughly 18 000 computers in about a month
  • No major application compatibility issues
  • Branch Cache handled about 50-60% of the workload
  • No reported network disturbances during this time caused by SCCM
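To get an intuition for what that Branch Cache share means in bytes, here is a rough back-of-the-envelope estimate. The ~4 GB payload per feature update is an assumed figure for illustration; only the 18 000 computers and the 50-60% share come from the numbers above.

```python
# Rough WAN savings estimate for the 1809 rollout.
computers = 18_000
payload_gb = 4.0      # assumed feature-update payload per PC (illustrative)
cache_share = 0.55    # midpoint of the measured 50-60% Branch Cache share

total_gb = computers * payload_gb
peer_gb = total_gb * cache_share          # served locally by peers
wan_gb = total_gb - peer_gb               # pulled over the WAN / from DPs

print(f"total: {total_gb:,.0f} GB, from peers: {peer_gb:,.0f} GB, "
      f"over the WAN: {wan_gb:,.0f} GB")
```

Even with these assumed numbers, tens of terabytes of content never have to cross the WAN, which is why the deployment caused no reported network disturbances.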

Bonus learning

One thing we realized quite early on was that the phrase "application testing" scares people, especially management. The general feeling is that testing is expensive and time-consuming, which causes unwanted friction when you want to pick up the pace. Therefore, we decided to rephrase it: we were not aiming to do "application testing" in rings 1 and 2, we were aiming to do "application verification". This minor change in wording changed the dialogue a lot, and people became less scared of the flow we had set up. Verification is less scary than testing.


Deploying the future

(Originally published on LinkedIn)

This is the second part of a series about the digital transformation journey we are on at Sandvik. You can find the first part here, Leaving legacy in 2018.

When I joined Sandvik back in 2017, we were right in the middle of upgrading our Configuration Manager environment from SCCM 2007 to SCCM Current Branch. This was a huge project into which we, together with our delivery partner, invested a lot of money and time.

We finally pulled through. Everyone involved in the project made a huge effort to get us there, from the SCCM delivery team and technicians to local IT. This was our first step toward the future for our clients, and it meant we could start working on Windows 10.

Configuration Manager and application deployment were, however, still somewhat of a struggle for us. Every other large deployment had to be done in waves, and we spent a lot of time and effort on not "killing" the slower sites, which often meant deploying at odd hours and asking users to leave their machines on overnight at the office. More than once we had to pull the plug on a deployment because we were consuming all the network bandwidth for some sites, even the bigger ones. We did have a peer-to-peer solution, but it was not rolled out to all sites and machines.

We had to fix this.

Since we had moved to SCCM CB, a lot of new opportunities opened up (maybe not from day one, though), which meant we actually had tools in our toolbox to solve this in a new way, such as Branch Cache and Peer Cache (which in themselves are not new features).

We decided to start with Branch Cache, since our biggest problem was application distribution. We piloted Branch Cache at a few sites to see if we could actually gain something from it, and the results were really promising, so we rolled it out across our whole environment, starting with the most prioritized sites without local distribution points and then moving on to all sites. When Branch Cache was widely deployed, we scaled down our 1E Nomad solution and eventually removed it.

We managed to do the following larger deployments without causing network interference, and we could see Branch Cache being utilized:

  • Deploy Office 365 ProPlus update to > 25 000 computers
  • Deploy Windows 10 feature update to > 18 000 computers

And then there is the one we are most proud of to date: we deployed Teams to > 25 000 users, with a Branch Cache utilization of 70%. This is our best number so far for applications, and we are not even using phased deployments in Configuration Manager yet.
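To put that 70% utilization in perspective, here is a quick sketch of the traffic split for the Teams rollout. The package size is an assumed figure for illustration; only the 25 000 users and the 70% peer share come from the numbers above.

```python
# Traffic split for the Teams deployment.
clients = 25_000
package_mb = 120      # assumed Teams package size (illustrative)
peer_share = 0.70     # measured Branch Cache utilization

total_mb = clients * package_mb
wan_mb = total_mb * (1 - peer_share)      # content still served by DPs / WAN

print(f"~{total_mb/1024:,.0f} GB total, only ~{wan_mb/1024:,.0f} GB over the WAN")
```

Under these assumptions, roughly two thirds of the content is served by nearby peers instead of distribution points, which is what keeps large application deployments off the WAN.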

Our next step is to get Peer Cache out to a few sites, especially sites with bad connections to the closest distribution point. The reason we want Peer Cache in the environment is to ease PXE installation at our smaller and remote sites. In parallel, we are also investigating how we could use LEDBAT for the traffic between our SCCM servers. This, however, requires that our SCCM servers run at least Windows Server 2016, and we are not completely there yet. But there is still a lot of 2019 left!

The takeaway from this

The biggest takeaway: Branch Cache works, and it works really well. If you have not yet started investigating Branch Cache, I would advise you to do so. It has saved us a lot of headaches and time, since we can now deploy with great confidence that our (less critical) deployment traffic will not disturb critical business systems. The fact that we have managed to reduce WAN traffic by up to 70% for larger deployments has also improved other teams' trust that we can deploy things in a disturbance-free way.

I also want to point out that our team of technicians and architects has done tremendous work making this possible.