The WAN That Time Forgot

When large organizations deploy cloud-based software, it’s easy to overlook the smaller offices with lousy connectivity, or to postpone dealing with their low-bandwidth WANs until the end of the project. BAD IDEA, say Yadin Porter de León and Tony Piscopo. Here’s how you really ought to go about it.

Moving applications into the cloud is a lot like deploying software from an in-house data center, except when it’s not. One issue that system administrators must deal with is the network availability and connectivity in the company’s field offices. As we’ve learned from working with many corporate customers, sysadmins tend to make wrong assumptions about the remote offices’ WANs, and too often those assumptions come back to bite them in the project delivery schedule.

Most large enterprises are dispersed across the world, and naturally the bandwidth in each company office varies. IT staff members form their assumptions about each office’s network and connectivity tacitly, largely because the people close to the regional data centers usually have ridiculously high available bandwidth. You could throw a thousand people a day on it and it wouldn’t impact the network. Satellite sites in the U.S. and in parts of Europe might have less bandwidth, say a 50 Mbps WAN pipe.

And then you start getting into small sites. We run into them in every organization, no matter the company size. For instance, one of our largest customers has a European office where 550 users all compete for a 3 Mbps WAN pipe.

That matters all the more because of a competing trend: large enterprises are taking hardware out of the edge, out of small local sites, and moving it into the regional data centers. They do so for several reasons, primarily cost and support, and it really does benefit them. With 10 servers spread across several sites, you need remote storage, and you end up with a mish-mash of hardware all over the world, different standards everywhere, and a support person trying to manage it all. So organizations are consolidating that storage pool in the regional data centers and accessing it through the cloud. (In this context we mean office desktops and laptops, not so much mobile devices.)

But that raises new questions about supporting these slow WAN links. How do you back up those 550 users on a 3 Mbps WAN pipe? Those users can’t even surf the Internet or check their email, so you can’t expect 20 or 30 of them to regularly run backups, restores, or other tasks that depend on fast connectivity.
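To put rough numbers on the problem, here is a small back-of-the-envelope sketch. It assumes the pipe sizes quoted above are megabits per second, that the link is fully available to backup traffic, and that each laptop holds a hypothetical 20 GB to back up; none of these figures come from a real site survey.

```python
# Back-of-the-envelope WAN arithmetic (decimal units: 1 GB = 10**9 bytes).
# Assumptions: link speeds are in megabits per second, and the whole link is
# available to backup traffic unless a lower utilization is passed in.

def per_user_kbps(link_mbps: float, users: int) -> float:
    """Average share of the link, in kbit/s, if every user is active at once."""
    return link_mbps * 1000 / users


def transfer_hours(data_gb: float, link_mbps: float, utilization: float = 1.0) -> float:
    """Hours needed to push data_gb gigabytes through a link_mbps link."""
    bits = data_gb * 8 * 10**9
    bits_per_second = link_mbps * 10**6 * utilization
    return bits / bits_per_second / 3600


if __name__ == "__main__":
    # 550 users sharing a 3 Mbps pipe get roughly 5.5 kbit/s each.
    print(f"{per_user_kbps(3, 550):.1f} kbit/s per user")
    # One hypothetical 20 GB laptop: about 15 hours at 3 Mbps, under 1 hour at 50 Mbps.
    print(f"{transfer_hours(20, 3):.1f} h at 3 Mbps, {transfer_hours(20, 50):.1f} h at 50 Mbps")
    # Thirty such laptops seeding their first backup over the small pipe.
    print(f"{transfer_hours(30 * 20, 3):.0f} h for 30 users at 3 Mbps")
```

Even before email and web traffic compete for the link, thirty users seeding a first backup would occupy the 3 Mbps pipe for roughly 440 hours, or more than two weeks of round-the-clock transfer.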

Pipe Dreams

Given that most Druva inSync deployments are to a cloud environment, the two of us have a lot of experience guiding our customers through rolling out the software worldwide in the most efficient manner. That includes deciding when to bring those small-pipe offices into the deployment plan. We expect this advice applies to anybody rolling out cloud-based software, whether it’s in-house software served from your own data center or a vendor’s cloud-based application.

Most organizations begin with a pilot project, so that they can work out the kinks in the workflow, settle on the best configuration (such as which files to include or exclude), and determine the right level of client-side bandwidth throttling to ensure good network performance. It’s easier to get feedback from a limited set of users, and to fix problems when an “Oops!” affects relatively few people. Ideally, the sites you choose for the pilot, which likely has two or three stages, reflect the organization’s diverse infrastructure and endpoint topology (for example, one office that’s primarily Windows and another where most employees use macOS).
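Client-side bandwidth throttling is commonly built on a token bucket, which caps the average upload rate while still allowing short bursts. The sketch below is a generic token bucket for illustration only; it is not inSync’s actual throttling mechanism, and the class name, parameters, and example rate are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Generic token-bucket rate limiter (a sketch, not inSync's implementation).

    Tokens are bytes. They refill continuously at `rate` bytes per second, up to
    `capacity` bytes, so the long-run average send rate never exceeds `rate`
    while short bursts of up to `capacity` bytes are allowed.
    """

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start full so the first chunk goes out immediately
        self.clock = clock       # injectable so tests don't need to sleep
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_consume(self, nbytes: int) -> bool:
        """Deduct nbytes and return True if they may be sent now; else return False."""
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

    def wait_time(self, nbytes: int) -> float:
        """Seconds until nbytes can be sent; assumes nbytes <= capacity."""
        self._refill()
        return max(0.0, (nbytes - self.tokens) / self.rate)
```

For a 3 Mbps site (375,000 bytes per second), capping each client at a hypothetical 10% of the link would look like `TokenBucket(37_500, 64 * 1024)` with uploads sent in chunks of 64 KiB or less; when `try_consume()` rejects a chunk, the uploader would sleep for `wait_time()` and retry.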

You end up with a bell curve of deployment: you start out slow while you complete your pilot, and then you try to get as many people as you possibly can into the next round. After that peak, you end up with lots of small sites piled up toward the end of the project. Usually that’s because you said, “I don’t want to deal with them right now; I’ll deal with them later.” The end result is that you get 70% of the user base done in the first three months, and then spend the next six months dealing with all the small sites and their more troublesome resource constraints, user issues, and bandwidth bottlenecks.

Obviously, you want to resolve problems sooner rather than later. The costs of fixing the small-site issues, such as project delays or unplanned infrastructure improvements, are not zero. They could seriously derail the budget, and major delays in the project plan could hurt your team and your credibility. Plus, some architectural decisions can’t really be undone; once they’re in place, you have to start patching around them.

With Druva inSync, there’s another reason for rolling out the software in waves: our data deduplication features. Because data already stored on the server doesn’t need to be sent again, it makes sense for the enterprise to collect the initial copies of data from the office locations with the best bandwidth. If you’re only going to copy a huge file to the server once, it’s sensible to do so when the data transfer takes seconds, not minutes.
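Why seeding order matters is easiest to see in a content-addressed deduplication pool: the server keeps fingerprints of the chunks it already stores, and a client transmits only chunks whose fingerprints are new. This sketch uses fixed-size chunks and SHA-256 fingerprints as a generic illustration; it does not describe how inSync chunks or indexes data, and the 4 MiB chunk size and the `DedupePool` name are assumptions.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB fixed-size chunks


class DedupePool:
    """Generic content-addressed dedupe pool (illustrative only).

    The server remembers a SHA-256 fingerprint for every chunk it stores.
    In a real system the client would send fingerprints first and upload only
    the chunks the server reports as missing; here both sides are collapsed
    into one method that counts the bytes that would cross the WAN.
    """

    def __init__(self) -> None:
        self.fingerprints: set[str] = set()

    def upload(self, data: bytes, chunk_size: int = CHUNK_SIZE) -> int:
        """Store data and return how many bytes actually had to be transferred."""
        sent = 0
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.fingerprints:
                # New content: the chunk itself must travel over the link.
                self.fingerprints.add(fp)
                sent += len(chunk)
            # Known content: only a reference to the existing chunk is recorded.
        return sent
```

Uploading a 10 MiB file from a fast site sends all 10 MiB; uploading the same file later from a slow site sends nothing, and a copy with one changed byte re-sends only the 4 MiB chunk that contains it. Fixed-size chunking is the simplest choice, but an insertion near the start of a file shifts every later chunk boundary, which is why many backup systems use content-defined (variable-size) chunking instead.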

Finding the Sweet Spot

Sysadmins have to decide when to bring these smaller sites into a deployment. You have to include the less technologically enabled sites early enough in the project that you can figure out what to take into account (which is why you do a pilot). On the other hand, you don’t want to include them so early that users on slow WANs miss out on the benefit of deduplication; they would have to send more of the data on their hard drives across that limited pipe. In essence: How much data do I need to collect to build a really good dedupe pool, so that these users don’t have to send a ton of data?
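One way to frame that question is to ask what fraction of a slow site’s data must already be in the pool for its initial backup to finish within an acceptable window. The sketch below does that arithmetic; the 20 GB per user, 50% link utilization, and 30-day window are illustrative assumptions, and it treats the 3MB pipe as 3 megabits per second.

```python
def required_hit_ratio(users: int, gb_per_user: float, link_mbps: float,
                       window_hours: float, utilization: float = 0.5) -> float:
    """Minimum fraction of a site's data that must already be in the dedupe
    pool so the remainder fits through the WAN link within window_hours.

    Units are decimal (1 GB = 10**9 bytes); link_mbps is megabits per second,
    and utilization is the share of the link the backup is allowed to use.
    """
    total_gb = users * gb_per_user
    if total_gb <= 0:
        return 0.0
    # Gigabytes the link can carry for backups during the window.
    capacity_gb = link_mbps * 10**6 * utilization * window_hours * 3600 / 8 / 10**9
    needed = 1 - capacity_gb / total_gb
    return min(1.0, max(0.0, needed))  # 0.0 means the pool isn't needed at all


if __name__ == "__main__":
    month = 30 * 24
    for mbps in (3, 50):
        ratio = required_hit_ratio(550, 20, mbps, month)
        print(f"{mbps:>2} Mbps: {ratio:.0%} of the data must already be in the pool")
```

Under these assumptions, the 550-user office would need roughly 96% of its data already in the pool to seed within a month over 3 Mbps, versus about 26% over 50 Mbps, which is why the slow sites benefit most from waiting until the pool has been built up elsewhere.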

A lot of the project’s success comes in the planning stage. Consider the things you take for granted. For example, you make assumptions about where users currently store their data. Is it on the local machine, or on a network home drive? Where is that home drive located: at the local site or in the original data center? One way or another, to back it up, that data needs to be transferred from wherever it lives.

The key is to work at least some of the low-bandwidth sites into the schedule, just not in the first round. Let’s assume your IT shop has to deploy to 100 sites, and you plan three rounds of deployment.

  • In the initial pilot deployment (the first 10 company offices), go only to high-bandwidth sites. You want to get as much data onto the servers as you can, and you want the results to represent as many users as possible.
  • In the second round (say, 30 to 40 of the remote offices), start looking for lower-bandwidth sites to include. Include at least a half dozen medium-bandwidth sites as well as four or five sites with low bandwidth and limited WAN capabilities.
  • Finally, the third (and hopefully final) round should include all other sites, which by this point should be a random mix of office sizes and network capabilities.
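The three-round pattern above can be sketched as a simple scheduling function. In this sketch, the bandwidth tiers (100 Mbps and up is “high,” under 10 Mbps is “low”), the round-two size of 35, and the `Site` type are all hypothetical; a real plan would also weigh user counts, OS mix, and where each site’s data lives.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str        # assumed unique
    wan_mbps: float


def tier(site: Site, high: float = 100.0, low: float = 10.0) -> str:
    """Classify a site by WAN speed; the thresholds are hypothetical."""
    if site.wan_mbps >= high:
        return "high"
    if site.wan_mbps < low:
        return "low"
    return "medium"


def plan_rounds(sites, pilot_size=10, round2_size=35, round2_medium=6, round2_low=5):
    """Split sites into three deployment rounds.

    Round 1: the pilot, drawn only from the fastest sites to seed the dedupe pool.
    Round 2: guaranteed slots for some medium- and low-bandwidth sites, topped up
             with the fastest remaining sites.
    Round 3: everything else.
    """
    by_speed = sorted(sites, key=lambda s: s.wan_mbps, reverse=True)
    round1, remaining = by_speed[:pilot_size], by_speed[pilot_size:]

    mediums = [s for s in remaining if tier(s) == "medium"][:round2_medium]
    lows = [s for s in remaining if tier(s) == "low"][:round2_low]
    picked = {s.name for s in mediums + lows}
    fill = [s for s in remaining if s.name not in picked]
    round2 = mediums + lows + fill[:max(0, round2_size - len(picked))]

    in_round2 = {s.name for s in round2}
    round3 = [s for s in remaining if s.name not in in_round2]
    return round1, round2, round3
```

Sorting by bandwidth keeps the pilot on the fattest pipes, where seeding the dedupe pool is cheapest, while the reserved medium and low slots in round two surface small-site problems months before the final round.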

That all sounds blazingly obvious. But we have seen customers with massive global infrastructures deploy Druva inSync and, even though their IT teams work hard to avoid mistakes and lay everything out right, get stuck in a phase two that includes only U.S. high- and medium-bandwidth sites. They put off international deployment until the final phase… whereupon they discover how many wrong assumptions they made, and how much they need to go back and change. You don’t want to discover that you need to ask for more budget because you need to add another storage node in South America, or because the company’s sales force does most of its work over mobile Wi-Fi hotspots tethered to laptops, which is obviously untenable for backing up to the cloud, or for most other serious cloud computing.

Plus, you don’t know how long it will take to bring in that unplanned-for infrastructure. If you discover the need in your first round and it takes three months to install, that work can be underway while you handle other parts of the deployment.

From an overall project perspective, completing all the easy high-bandwidth sites first may seem like the path of least resistance; in fact, it’s not. A blended approach makes more and more sense as you gather information from each remote office, because you begin to see the issues that could arise at the more resource-constrained sites. As with any other project, working early on solutions to the issues you can see coming is the best way to ensure that your team, your infrastructure, and your budget account for everything.

Cloud considerations