Enterprise organizations already struggle with the mass of data they need to manage, and everyone knows that, “big data” buzzwords notwithstanding, it’s only going to get worse. As it turns out, a physics analogy may help you visualize the data problem and work toward better solutions.
Some large enterprise organizations are caught off guard by the pace at which data is being generated internally and externally. Without proper planning, IT teams can find themselves painted into a corner with limited options. They need to understand what can be done with growing data sets, and they must also deal with the severe constraints on deploying new and existing applications that depend on being close to the data.
As these data sets grow ever larger, they accumulate mass. The greater the mass of data, the stronger its pull on a host of network resources.
If words like “mass” and “pull” make you think of physics, you aren’t alone. Data gravity, a term coined by Dave McCrory, compares the pull that all objects in our physical universe exert on each other to the forces exerted by data sets within the digital universe of the global enterprise. Much as gravity draws matter together in the physical universe, the data gravity analogy applies some of the same properties to the behavior observed in the consolidation of traditional data repositories and workloads.
It’s not just an intellectual conversation-starter. Those who consider data gravity an important element in IT planning (and we think you should be among them) see it as a way to address ongoing challenges in enterprise computing.
For example, one result of intense data growth is that the data becomes harder and harder to move. LAN and WAN bandwidth become barriers to migrating entire data sets, and they restrict how the data is accessed, manipulated, and analyzed. IT professionals spend significant time and effort redesigning applications with distributed architectures to cope with the sheer size of the data. They have to, both to save bandwidth and to provision additional storage resources in the same geographic area as the data.
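To see why bandwidth becomes a barrier, it helps to estimate how long a bulk transfer actually takes. The sketch below is a back-of-the-envelope model, not a measurement tool: it assumes a sustained link with a fixed efficiency factor (a hypothetical 80% of line rate to account for protocol overhead and contention), and the function name is illustrative. Real transfers vary with latency, concurrency, and competing traffic.

```python
def transfer_time_seconds(size_bytes: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Estimate the time to copy size_bytes over a link of link_gbps.

    efficiency is an assumed fraction of line rate actually achieved
    (protocol overhead, contention); it is a planning guess, not a measured value.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = size_bytes * 8                          # bytes -> bits
    effective_bps = link_gbps * 1e9 * efficiency   # usable bits per second
    return bits / effective_bps


if __name__ == "__main__":
    TB = 10**12  # decimal terabyte
    seconds = transfer_time_seconds(100 * TB, link_gbps=1.0, efficiency=0.8)
    print(f"100 TB over 1 Gbps at 80%: {seconds / 86400:.1f} days")
```

Under these assumptions, moving 100 TB ties up a 1 Gbps link for about 11.6 days, which is why teams so often move the application instead of the data.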
Data analytics is also pulled close to the data, to maximize performance or simply to reduce bandwidth utilization. At the very least, the application may be moved closer to the data to ensure that its prescribed performance metrics are met. We can observe the pull of data gravity as applications are forced to move physically closer to the data, either to use network resources more efficiently or simply to meet the application's basic service level requirements.
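Bandwidth is only part of the story. For "chatty" applications that issue many small, sequential requests, round-trip latency can dominate. The figures below (10,000 sequential queries, a 40 ms WAN round trip versus a 0.5 ms LAN round trip) are hypothetical, and the model deliberately ignores server processing time, pipelining, and parallelism; it is a sketch to show the shape of the effect, nothing more.

```python
def sequential_wait_ms(num_requests: int, rtt_ms: float) -> float:
    """Total time spent waiting on the network when each request must
    complete before the next is sent (no pipelining or parallelism)."""
    return num_requests * rtt_ms


if __name__ == "__main__":
    queries = 10_000                                 # hypothetical workload
    remote = sequential_wait_ms(queries, rtt_ms=40.0)  # app far from the data (WAN)
    local = sequential_wait_ms(queries, rtt_ms=0.5)    # app next to the data (LAN)
    print(f"remote: {remote / 1000:.1f} s, local: {local / 1000:.1f} s")
```

In this example, moving the application next to the data cuts network wait time from roughly 400 seconds to 5 seconds without changing any application logic, which is exactly the pull described above.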
The most significant effect on the data’s value is that, as larger sets of data are collected in a single location, additional applications become more likely to be attracted to that data because of the potential value of analyzing it. This can be described as the pull of the data.
This powerful pull is what drives the Edge-to-Core initiatives that many large enterprises are now undertaking, in which they move data and infrastructure out of local sites and into regional data centers. Doing so enables centralized data management as well as the benefits of data deduplication. (Most data isn’t unique to a single user; with global deduplication, storage costs can be drastically reduced.) The similarities between this phenomenon and the properties of gravity are noteworthy: just as an object's gravitational pull grows with its mass, the pull of a data set grows as more data accumulates in one place.
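The storage savings from deduplication come from storing each unique chunk of data once and keeping lightweight references for every copy. The following is a minimal sketch of content-addressed, fixed-size chunk deduplication using SHA-256. The `DedupStore` class is hypothetical, and the 4 KiB chunk size is an arbitrary assumption; production systems commonly use variable-size (content-defined) chunking, persistent indexes, and garbage collection, none of which appear here.

```python
import hashlib
from typing import Dict, List


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept once,
    keyed by its SHA-256 digest; files are recorded as lists of digests."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}      # digest -> chunk bytes (stored once)
        self.files: Dict[str, List[str]] = {}   # file name -> ordered chunk digests
        self.logical_bytes = 0                  # bytes written by users before dedup

    def put(self, name: str, data: bytes) -> None:
        digests = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store the chunk only if it is new
            digests.append(digest)
        self.files[name] = digests
        self.logical_bytes += len(data)

    def get(self, name: str) -> bytes:
        # Reassemble a file from its chunk references.
        return b"".join(self.chunks[d] for d in self.files[name])

    @property
    def physical_bytes(self) -> int:
        # Bytes actually stored after deduplication.
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore()
    report = b"quarterly numbers " * 1000             # 18,000-byte example file
    store.put("alice/report.txt", report)
    store.put("bob/report.txt", report)               # same file saved by a second user
    print(store.logical_bytes, store.physical_bytes)
```

In this example, two users write 36,000 logical bytes, but only 18,000 bytes are physically stored, because the second copy consists entirely of chunks the store already holds. Consolidating data in one place is what makes this kind of global deduplication possible.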
Build a Data Gravity Profile Before Starting Your Next Project
Regardless of how large the next application may be, it’s always best to consider the long-term effects of your user population growth as well as the resources the application will consume. If the data grows, how long will it be before it cannot be moved easily? At that point, what effect will its data gravity have on other application resources? How important will that data become to other services that may be located farther away? What issues will arise if the data cannot be moved?
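One way to put a rough number on how long data can keep growing before it becomes hard to move is to project that growth against a migration window. The sketch below rests on several assumptions: compound monthly growth, a sustained link rate, and a hypothetical rule that a data set counts as "hard to move" once a full copy would take longer than a chosen window (for example, a 48-hour weekend). The function name and all inputs are illustrative planning values, not recommendations.

```python
from typing import Optional


def months_until_immovable(
    size_tb: float,
    monthly_growth: float,
    link_gbps: float,
    window_hours: float,
    efficiency: float = 0.8,
    horizon_months: int = 240,
) -> Optional[int]:
    """Return the first month in which a full copy of the data set would
    exceed window_hours, or None if that does not happen within the horizon.

    Assumes compound growth (size_tb * (1 + monthly_growth) ** month) and a
    sustained transfer rate of link_gbps * efficiency; both are estimates.
    """
    bytes_per_tb = 10**12
    effective_bps = link_gbps * 1e9 * efficiency
    size = size_tb
    for month in range(horizon_months + 1):
        hours = size * bytes_per_tb * 8 / effective_bps / 3600  # full-copy time
        if hours > window_hours:
            return month
        size *= 1 + monthly_growth  # grow the data set by one month
    return None


if __name__ == "__main__":
    # Hypothetical inputs: 10 TB today, 10% monthly growth, 10 Gbps link,
    # 48-hour migration window, full line rate.
    print(months_until_immovable(10, 0.10, 10, 48, efficiency=1.0))
```

With those inputs the function returns 33: after roughly two and three-quarter years, a full copy no longer fits in the weekend window. Rerunning it with your own growth and bandwidth figures is a quick way to start sketching a data gravity profile.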
Choosing the right software, architecture, storage, and vendors becomes critical when you consider any company-wide solution, no matter how small it may seem at the beginning. The design decisions you make early on have far-reaching effects on your ability to manipulate, move, or analyze data, and thus on your ability to turn that information into timely and relevant business intelligence. Since company value is the ultimate goal, careful planning with data gravity in mind can be a critical step in every project from here on.