Finding The Needle In the Haystack – How Druva’s federated search aids in investigations

Sahil Goyal, Senior Product Manager

There are times when the information security team of an organization needs to conduct internal investigations. There may be a departing employee suspected of potentially transferring sensitive data to their personal email account, or there has been a data leak where a file containing a critical trade secret is accidentally shared with a wider employee base in the organization.

In such scenarios, organizations may have only days to investigate before the employee leaves, or only a few crucial hours before a data leak spreads beyond recovery. An organization can’t afford to waste any of that time. Yet in most cases, critical hours are lost chasing dead ends because of bureaucracy and legacy IT systems.

What makes investigations so difficult?

The biggest issue with digital investigations faced by firms today is the distributed nature of the data sources employees can use: desktops, cloud shares, emails, and team drives. When an investigation is needed, searching all of these data sources quickly and efficiently is like searching for a needle in a haystack.

Additionally, separate admins and departments often manage different data sources, which adds bureaucratic and logistical delays: each investigation request has to be routed to a different admin in a different department. To compound the problem, no single person typically knows the entire system.

Finally, the legacy search capability provided by most data source vendors makes finding that needle nearly impossible. Either there is a limit on how far back in time a particular email or file can be searched, or there is a limit on what metadata is available.

How can Druva help?

During investigations, working quickly and efficiently is key. Druva’s search capability has been designed for the scale and diversity of a large enterprise. Druva’s federated search functionality collects data from multiple sources and provides near real-time results across all of them in a single pane of glass. The data sources supported include laptops, mobile devices, Box folders, G Suite, and Office 365. Some of Druva’s key capabilities to streamline search include:

Search at scale on a near real-time basis

With a focus on scale, Druva’s solution indexes data during backup as opposed to indexing the data when it is searched. However complex the query, Druva’s shard-based indexing mechanism ensures that search results show up in a matter of a few seconds.

More importantly, data is available to search as soon as it has been backed up. The net result is that it saves investigators a significant amount of time.
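
To make the idea of indexing at backup time concrete, here is a minimal sketch in Python. It is not Druva’s implementation; the class name, the shard count, and the whitespace tokenizer are all assumptions chosen for illustration. It only shows why a query becomes a cheap lookup when the index is built up front and split into shards.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen only for the example


class ShardedIndex:
    """Toy inverted index that is populated at backup (ingest) time."""

    def __init__(self, num_shards=NUM_SHARDS):
        # One "term -> set of paths" map per shard.
        self.shards = [defaultdict(set) for _ in range(num_shards)]

    def _shard_for(self, term):
        # crc32 is a stable hash, so a term always lands on the same shard.
        return self.shards[zlib.crc32(term.encode("utf-8")) % len(self.shards)]

    def index_on_backup(self, path, text):
        # Called once when a file is backed up; the cost is paid here,
        # not when an investigator runs a query.
        for term in set(text.lower().split()):
            self._shard_for(term)[term].add(path)

    def search(self, query):
        # AND query: each term is a direct lookup in exactly one shard.
        terms = query.lower().split()
        if not terms:
            return set()
        hits = [self._shard_for(t).get(t, set()) for t in terms]
        return set.intersection(*hits)
```

With this sketch, calling `index_on_backup` for each file as it is backed up means a later `search("acme plan")` never has to read file contents; it only intersects the precomputed sets for each term.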

Historical investigations made easy

Historical investigations are always a concern. We are often asked by our customers “What happens when a case is 6 months old and the employee no longer works for the organization?”

Fortunately, Druva maintains historic “As Is” copies of data, making it possible to go back in time and search within a specific investigative time window. Even if an employee clears all traces, it is still possible to search historic snapshots to determine whether a file or email existed at a specific point in time.
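
As a rough illustration of a point-in-time search, the sketch below keeps a chronological list of snapshots and answers the question “did this file exist on that date?”. The `SnapshotHistory` class and its method names are hypothetical and only model the concept; they do not describe how Druva stores snapshots.

```python
from bisect import bisect_right
from datetime import datetime


class SnapshotHistory:
    """Toy store of point-in-time snapshots for one device."""

    def __init__(self):
        self._times = []     # snapshot timestamps, kept sorted
        self._contents = []  # frozenset of file paths present in each snapshot

    def add_snapshot(self, taken_at, paths):
        # Assumption: snapshots arrive in chronological order.
        if self._times and taken_at <= self._times[-1]:
            raise ValueError("snapshots must be added in chronological order")
        self._times.append(taken_at)
        self._contents.append(frozenset(paths))

    def existed_at(self, path, when):
        # Use the latest snapshot taken at or before `when`.
        i = bisect_right(self._times, when)
        if i == 0:
            return False  # no snapshot is that old
        return path in self._contents[i - 1]

    def snapshots_containing(self, path, start, end):
        # Timestamps of snapshots within [start, end] that contain the path.
        return [t for t, c in zip(self._times, self._contents)
                if start <= t <= end and path in c]
```

Because older snapshots are never modified, a file that was deleted from the device later still shows up when the query date falls before the deletion.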

Extended metadata coverage

Finally, Druva captures all relevant metadata of a file and email. This means search is quick, efficient and consistent. Through a single query, it is possible to search across data sources and potentially reduce the time to investigate from days to a few seconds.

Druva federated search today supports the following metadata parameters:

  • File metadata parameters:
    • Filename
    • Extension
    • Checksum
    • Modified Time
    • Create Time
    • File Size
  • Email metadata parameters:
    • Email Subject
    • Email Recipients (from/to/cc/bcc)
    • Email Receive Time
    • Attachment Name
    • Attachment Extension
    • Attachment Checksum
    • Attachment Size
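
One way to picture these parameters is as fields on two record types, with a single query that runs over both. The sketch below is an assumption-laden illustration: the dataclass names, the `source` field, and the `find_by_checksum` helper are invented for the example and are not part of any Druva API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class FileRecord:
    source: str  # illustrative label, e.g. "laptop" or "cloud-share"
    filename: str
    extension: str
    checksum: str
    modified_time: datetime
    create_time: datetime
    file_size: int


@dataclass
class Attachment:
    name: str
    extension: str
    checksum: str
    size: int


@dataclass
class EmailRecord:
    source: str
    subject: str
    sender: str
    recipients: List[str]  # to/cc/bcc flattened for simplicity
    receive_time: datetime
    attachments: List[Attachment] = field(default_factory=list)


def find_by_checksum(files, emails, checksum):
    """Return (source, description) pairs for every file or attachment
    whose checksum matches, regardless of which data source holds it."""
    hits = [(f.source, f.filename) for f in files if f.checksum == checksum]
    for e in emails:
        for a in e.attachments:
            if a.checksum == checksum:
                hits.append((e.source, f"{a.name} attached to '{e.subject}'"))
    return hits
```

Because a checksum depends only on content, the same query catches a document that was renamed on a laptop and then mailed as an attachment.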

With all this in mind, consider the following three scenarios:

Scenario 1: HR and Legal Investigation

Max, an employee, gives notice that he is leaving the company. Max has been working with several high-profile customers in his role as a systems engineer. His manager suspects that he will be joining a competitor and alerts HR. HR asks John, the data protection analyst, to investigate and run a risk assessment of Max. John is given a couple of keywords, including the names of the companies that Max was working with.

Figure: HR and legal investigation

Scenario 2: Track Malware and Infected files

Bob, the information security admin, is investigating which machines contain a recently detected trojan. He has been provided with the following IOCs (indicators of compromise): the SHA-1 hash of the file, the time period, and the file extension. His objective is to figure out which machines are infected with the trojan and determine the sequence of events leading up to the initial compromise (i.e., which machine got infected first and when).

Bob can use Druva’s federated search functionality to search file hashes to track an infected file across data sources and users. Bob can further narrow down his search using the file extension along with when the files were modified or created.
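
Bob’s workflow can be sketched in a few lines of Python. The record layout (dicts with `device`, `sha1`, `extension`, and `create_time` keys) and the function names are assumptions made for this example; only `hashlib.sha1` is a real standard-library call. The sketch filters metadata by hash, extension, and time window, then orders devices by when the file first appeared.

```python
import hashlib
from datetime import datetime


def sha1_of_bytes(data):
    # hashlib.sha1 is part of the Python standard library.
    return hashlib.sha1(data).hexdigest()


def infection_timeline(records, ioc_sha1, extension, start, end):
    """records: iterable of dicts with the hypothetical keys
    'device', 'sha1', 'extension', 'create_time'.
    Returns (first_seen, device) pairs, earliest first, one per device."""
    first_seen = {}
    for r in records:
        if (r["sha1"] == ioc_sha1 and r["extension"] == extension
                and start <= r["create_time"] <= end):
            prev = first_seen.get(r["device"])
            if prev is None or r["create_time"] < prev:
                first_seen[r["device"]] = r["create_time"]
    return sorted((t, d) for d, t in first_seen.items())
```

The first entry in the returned list is the earliest machine on which the matching file was seen, which is a reasonable starting point for reconstructing the initial compromise.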

Figure: Track malware and infected files

Scenario 3: Forensic Investigations

Company X is planning an acquisition, and information about the deal is supposed to be a tightly held secret. All documents associated with the deal are supposed to be available only to a small set of employees, but it is discovered that a couple of sensitive documents are now in the hands of others.

Forensic investigation teams can use Druva’s federated search to identify files or emails edited or sent by a custodian during an investigative window. This data can also be made available via APIs for third-party consumption. Druva also provides a chain of custody report of all user data to ensure the data is admissible.
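
The custodian-and-window query described above can be sketched as a merge of two event streams into one timeline. The tuple layouts and the `custodian_activity` function are illustrative assumptions, not the shape of any real Druva API response.

```python
from datetime import datetime


def custodian_activity(file_events, email_events, custodian, start, end):
    """Merge file edits and sent emails for one custodian into one timeline.

    file_events:  iterable of (owner, modified_time, filename)
    email_events: iterable of (sender, time, subject)
    Both tuple layouts are assumptions made for this example.
    """
    timeline = []
    for owner, when, name in file_events:
        if owner == custodian and start <= when <= end:
            timeline.append((when, "edited", name))
    for sender, when, subject in email_events:
        if sender == custodian and start <= when <= end:
            timeline.append((when, "sent", subject))
    # Tuples sort by timestamp first, giving a chronological view.
    return sorted(timeline)
```

Seeing an edit to a sensitive document followed shortly by an outgoing email from the same custodian is the kind of pattern this combined view is meant to surface.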

Figure: Forensic investigations


The data protection industry continues to see exponential data growth, which means products are backing up more data every year. At the same time, businesses need to reduce costs.

The nature of backup lends itself to unique data-focused value propositions, such as the ability to facilitate investigations on a large and dispersed data set. Searching across the entirety of your enterprise data set, no matter the data source, to assist with HR and legal investigations, track malware and infected files, and support forensic investigations is a capability beyond standard backup and restore.
