Want To Leverage Your Dark Data? 5 Steps to Get Started


Traditional analytics have become commonplace in today’s business world. There’s hardly a retailer around that doesn’t survey inventory and sales, or a service provider that isn’t analyzing transactional data. The majority of businesses that compete in the marketplace have some type of basic data reporting set up to glean insights about their operations.

In this environment, however, it takes a bolder approach to gain a leg up. Mining your “dark data” (data that you currently collect and store but haven’t yet put to use) can give you the fresh, actionable information about your customers that you need to compete. So, how can you start extracting value from this data?

To help you get started, we spoke to dark data experts to learn what best practices you should employ when beginning your first project.

Why Should You Dig Into Your Dark Data?

Dark data is simply structured or unstructured data that is gathered during the course of regular business operations, but that is not actively being used. In many organizations, this can refer to unused information contained within call center logs, social media feeds or website metrics.

Dark data will vary from business to business, but some examples might include a caller’s time zone information found in call center logs, or user engagement on your company’s Facebook page. While you may not formally be using these sources of information, they could be leveraged to help you determine something valuable about callers in certain time zones, or customers who engage with your brand at certain points.

Digging into your dark data can be beneficial because it allows you to fill in the holes often left by traditional sources of “light data,” such as sales reports or shipping records. A sales report is rather one-dimensional on its own—but if you were able to pair customer sentiment from Twitter feeds with particular dips in sales, that information would suddenly become a lot more meaningful.
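To make the pairing idea concrete, here is a minimal Python sketch. It assumes you already have weekly sales totals and a weekly average sentiment score for brand mentions, with the sentiment scoring itself done upstream by some other tool. All figures and function names are hypothetical. The sketch joins the two sources on the week and computes a Pearson correlation coefficient. A strong correlation is a lead worth investigating, not proof that sentiment caused the sales dip.

```python
from math import sqrt

# Hypothetical "light" data: units sold per week, as in a sales report.
weekly_sales = {"2015-W01": 120, "2015-W02": 118, "2015-W03": 95,
                "2015-W04": 90, "2015-W05": 121}

# Hypothetical "dark" data: average sentiment (-1 to 1) of brand mentions
# per week, assumed to be produced upstream by a sentiment-scoring tool.
weekly_sentiment = {"2015-W01": 0.4, "2015-W02": 0.35, "2015-W03": -0.2,
                    "2015-W04": -0.3, "2015-W05": 0.45, "2015-W06": 0.1}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def pair_sources(light, dark):
    """Inner-join two week-keyed sources, keeping only weeks present in both."""
    weeks = sorted(set(light) & set(dark))
    return weeks, [light[w] for w in weeks], [dark[w] for w in weeks]

weeks, sales, sentiment = pair_sources(weekly_sales, weekly_sentiment)
r = pearson(sales, sentiment)
print(f"{len(weeks)} overlapping weeks, r = {r:.2f}")

# Show the sentiment score for each week where sales fell below average.
avg_sales = sum(sales) / len(sales)
for week, units, score in zip(weeks, sales, sentiment):
    if units < avg_sales:
        print(week, units, score)
```

Week 2015-W06 appears only in the sentiment data, so the inner join drops it; in practice, deciding how to align sources that cover different periods is part of the work.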

Indeed, putting dark data to use is about solving vexing problems, particularly when the data you are currently using only provides a glimpse into the whole story. But identifying your dark data is often the easiest part. Harnessing its true worth can be much more challenging.

“When you’re talking about getting value out of your data, it’s not just about understanding what it is and where it is,” says Joe Garber, the vice president of information governance for HP Autonomy, a division of Hewlett-Packard that specializes in data analytics. “It’s about the ability to migrate that data to a consolidated repository where you can look at it together with other information, and start to gain strategic insight from it.”

In other words, there are tools you can use to pair up different dark data sources and look for correlations between them. But before you do that, you must identify why you need to use your dark data in the first place.

Step 1: Determine the Challenge You Want to Address

In the simplest terms, a dark data project must begin with a problem.

That problem could be a clear and present one: for example, quarterly sales reports reveal that a particular customer segment is dropping off for no apparent reason. Or it could be more obscure, such as why your organization hasn’t been able to perform well in a certain market.

Either way, it’s critical that you clearly define what you want to know before you get started, not least so you don’t overlook readily available data that might already provide the answer. Dark data projects are expensive, and they require costly personnel, either as members of your full-time staff or as consultants brought in for particular projects. It’s important to view these projects as investments rather than as experiments.

“You don’t need to use dark data to solve basic questions that you can solve with regular sales data or point-of-sale data,” says Bala Chandran, the senior product and strategy lead for MicroStrategy, a global provider of enterprise software platforms. “The symptoms of a business problem that would be interesting [to solve] with dark data are things that you notice in your business that you can’t explain.”

Basic questions you could solve with regular (light) sales data might include how much of a certain product your business sold in a particular month, or how much revenue was earned in a particular quarter. However, determining why sales went a certain direction or revenue slumped in a certain quarter may require more than those regular data sources to answer.

Step 2: Create a Team That Includes Old and New Blood

Once you’ve chosen a problem to solve, Chandran advises staffing your dark data project with a mix of people who are familiar with the problem and people from other departments who may not be. Bringing in outside business consultants who can view the problem from a fresh perspective may also be beneficial.

“Recruit from within and hire from the outside,” Chandran says of an ideal dark data project team. “What I’ve seen is that people who know the business tend to look at the problem the same way they’ve always looked at it, and look at the same data sources they’ve always looked at.”

In other words, you need people who understand the business, but you also need fresh blood to ask questions from outside the box. Taking this point further, Chandran advises appointing an overall leader and organizing the team into multiple “pods.” Preferably, he says, each pod should consist of one inside person and one outside person.

In this model, each pod is independently tasked with creating prototypes for how they plan to tackle the dark data project. After one to three months, Chandran recommends testing each pod’s ideas against one another to determine which ideas to move forward with.

For example, one pod might decide that the team’s problem can be solved using only dark data from social media feeds. They would need to chart out why this is and how they plan to execute. Meanwhile, another pod may determine that they need social media feeds paired with additional customer information, and would explain how they plan to use this data. The pods’ plans would then need to compete against one another with input from the entire team, with the most effective plan emerging as the best way to move forward.

Additionally, your team will need to include programmers who write map-reduce code, which is used for processing large data sets. If you don’t have employees who can write this code, it might be something to outsource to a freelancer.
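For readers unfamiliar with the model, map-reduce splits a job into a map step, which turns each record into key/value pairs, and a reduce step, which combines all values that share a key. A shuffle step in between groups the pairs by key. The single-process Python sketch below counts word mentions in a few hypothetical call center notes. In a real deployment, a framework such as Hadoop distributes the map and reduce steps across many machines and performs the shuffle itself; the function names here are illustrative only.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical call center notes, standing in for a much larger log.
notes = [
    "customer upset about late delivery",
    "late delivery again, asked for refund",
    "refund processed, customer satisfied",
]

def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one record."""
    for word in record.lower().replace(",", " ").split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

pairs = chain.from_iterable(map_phase(note) for note in notes)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(sorted(counts.items(), key=lambda kv: -kv[1])[:3])
```

Because each map call sees only one record and each reduce call sees only one key, both steps can run in parallel, which is what makes the model suitable for very large data sets.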

Furthermore, Chandran cautions, “The people who tend to know how to write map-reduce code aren’t necessarily the people who are thinking about dark data the right way.”

Therefore, he also recommends bringing on team members, such as data scientists and experts from within the business, who can critically analyze the data once the code has processed it. Again, if these people are not part of your full-time staff, hiring from the outside may be necessary.

Step 3: Identify the Right Dark Data Sources

After you’ve identified a challenge, you must find the best places to look for the data that will address it. Chandran says he most often finds dark data to be underutilized customer information, and therefore advises first looking to your customer-facing departments as a prime dark data source.

“The customer service department, the marketing department, the customer retention department—those tend to typically be where the most payoff occurs,” he says. In other words: these departments are likely places to find the answers you seek.

Chandran further explains that these departments hold large stores of historical customer data, including “digital exhaust,” or data from call center notes and social media in which customers express their opinion of a product. Advertising data is another valuable dark data source, which can allow you to track how many customers viewed a particular ad, and for how long.

Chandran advises asking managers in these departments about all the ways a customer or a potential customer might interact with the brand and company, as information from these interactions is often recorded and is frequently a good source of dark data.

“This includes how the customer interacts with the company online, offline, how they use the product and how they perceive and talk about the product,” he says.

Step 4: Use Hadoop to Mine Your Dark Data

So, now that you’ve identified and gathered your dark data, you’ll need some tools to dig into it. At this point, advanced algorithms and data mining techniques need to be put to use, together with the map-reduce code written by your programmers.

To do this, Chandran suggests using open-source big data storage and processing tools such as Apache Hadoop, primarily because they don’t require the end user to organize information before loading it. These tools can also store and process images, audio files, videos and other unstructured information.
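This flexibility is often described as “schema-on-read”: raw data is stored as it arrives, and structure is applied only when a particular analysis reads it. The plain-Python sketch below illustrates the idea rather than Hadoop itself; the record formats and the read_with_schema function are hypothetical.

```python
import json

# Raw records kept exactly as they arrived; no schema was imposed at load time.
raw_records = [
    '{"channel": "twitter", "text": "Love the new app", "user": "a1"}',
    '{"channel": "call_center", "notes": "Billing question", "duration_sec": 240}',
    "web|2015-03-02|/pricing|35",  # hypothetical pipe-delimited web log line
    "not parseable ###",
]

def read_with_schema(line):
    """Apply a schema at read time: return (channel, text) or None if unrecognized."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError:
        rec = None
    if isinstance(rec, dict):
        # JSON records from different channels use different field names.
        text = rec.get("text") or rec.get("notes") or ""
        return rec.get("channel", "unknown"), text
    parts = line.split("|")
    if len(parts) == 4 and parts[0] == "web":
        return "web", parts[2]  # keep the page path as the "text"
    return None  # the raw line stays stored; this analysis just skips it

parsed = [r for r in map(read_with_schema, raw_records) if r is not None]
for channel, text in parsed:
    print(channel, text)
```

A different analysis could read the same raw records with a different schema, for example pulling out call durations, without reloading anything.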

Once correlations within your data begin to emerge using Hadoop or a similar program, looking at the data critically may involve an additional tool.

“These tend to be statistical or data analytics software programs where [data scientists or business analysts] can generate their ideas, and those ideas get translated by the tool into executable code,” Chandran says. In practice, this lets analysts manipulate and test the data in multiple ways without having to write the code behind each manipulation themselves.

Beyond that, predictive analytics software, which allows users to run a variety of hypotheses and tests, can give dark data projects a significant boost.

“And in a lot of cases,” Chandran says, “you need to be able to act on this information quickly. So you may also want a tool that can stream the data that’s coming in from all these sources and analyze it in real time.”
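Streaming analysis is usually handled by a dedicated stream processing engine, but one of its common building blocks, a sliding time window, is simple enough to sketch in plain Python. The example below assumes a hypothetical stream of timestamps for customer complaints that mention an outage, and it raises an alert whenever more than three arrive within 60 seconds. The class name, window size and threshold are all illustrative assumptions.

```python
from collections import deque

class SlidingWindowCounter:
    """Count events whose timestamps fall in the window (ts - window, ts]."""

    def __init__(self, window, threshold):
        self.window = window        # window length in seconds
        self.threshold = threshold  # alert when the count exceeds this
        self.events = deque()       # timestamps, oldest first

    def add(self, ts):
        """Record an event at time ts; return True if the window exceeds the threshold."""
        self.events.append(ts)
        # Evict events that have fallen out of the window.
        while self.events and self.events[0] <= ts - self.window:
            self.events.popleft()
        return len(self.events) > self.threshold

# Hypothetical stream: arrival times (in seconds) of "outage" complaints.
stream = [0, 5, 12, 61, 62, 63, 64]
counter = SlidingWindowCounter(window=60, threshold=3)

alerts = []
for ts in stream:
    if counter.add(ts):
        alerts.append(ts)
print(alerts)
```

Each event is appended and evicted at most once, so the counter keeps up with a stream using memory proportional only to the number of events inside the window.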

How long this process takes depends on the project at hand and will differ from business to business.

MicroStrategy’s analytics platform includes many of the tools you can use to extract value from your dark data.

Step 5: Test Results Against Existing Sets of Data

Ideally, the processes above will begin to yield some of the answers you are looking for. Once they do, however, it’s important to be cautious and keep in mind the difference between correlation and causation.

As an example, Chandran points to an early dark data project in which people were trying to figure out what caused the Standard & Poor’s (S&P) stock market index to move the way it does.

“It turned out that the closest correlation to the S&P was butter prices in Bangladesh,” Chandran says. “It just happened to be that way. But butter prices in Bangladesh obviously don’t cause the S&P to move, so you have to beware of that on the tactical side. When you do get [one of these] ‘aha’ moment[s], you should step back and determine: is this random correlation, or is this actual causation?”
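A short simulation shows how easily this kind of coincidence turns up when you search through many data series. The hypothetical Python sketch below builds one random-walk series standing in for a market index, plus 500 unrelated random walks, and then keeps whichever candidate correlates most strongly with the target. Because every series is generated independently, any strong correlation it reports is chance by construction.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def random_walk(rng, n):
    """A series built from independent random steps, like many price series."""
    level, series = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        series.append(level)
    return series

rng = random.Random(42)
target = random_walk(rng, 100)  # stands in for an index such as the S&P
candidates = [random_walk(rng, 100) for _ in range(500)]  # unrelated series

# "Dredge": keep whichever candidate happens to track the target most closely.
best_i, best_r = max(
    enumerate(pearson(target, c) for c in candidates),
    key=lambda pair: abs(pair[1]),
)
print(f"candidate {best_i}: r = {best_r:.2f}")
```

The more candidate series you test, the stronger the best chance correlation tends to be, which is why a striking correlation found by searching widely deserves extra skepticism.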

The “aha moments” Chandran refers to are the points when you find a correlation in your data that may help explain the larger problem. “Aha criteria,” on the other hand, are the specific factors that might be driving that correlation. One practical way to check whether an “aha moment” reflects more than random correlation is to test your “aha criteria” against a different subset of data.

For example, suppose a group of analysts is trying to explain the movements of a particular stock price. They look at data from 1995 through 2005 and identify five factors (or “aha criteria”) they think might be moving the stock (e.g., oil prices, employment numbers, military tensions, etc.). Their next step would be to test those same factors against data from 2005 to 2010 to see whether the factors track the stock’s movements in that dataset as well.
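The check described above can be sketched in a few lines of Python. The sketch assumes each candidate factor and the stock price are aligned yearly series. It computes the correlation separately in the discovery period and in the later, held-out period, and it treats a factor as holding up only if the relationship keeps the same direction and stays reasonably strong. The data, the function name and the 0.5 threshold are hypothetical. Passing this test raises confidence, but it does not by itself prove causation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def check_aha_criterion(factor, target, split, min_abs_r=0.5):
    """Correlate a factor with the target before and after `split`.

    The factor passes only if the later (held-out) correlation has the same
    sign as the earlier one and an absolute value of at least min_abs_r.
    """
    r_in = pearson(factor[:split], target[:split])
    r_out = pearson(factor[split:], target[split:])
    passed = abs(r_out) >= min_abs_r and (r_in > 0) == (r_out > 0)
    return r_in, r_out, passed

# Hypothetical yearly data: first 5 values = discovery period, last 5 = later period.
stock = [10, 12, 13, 15, 18, 19, 21, 24, 25, 27]
factors = {
    "factor_a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "factor_b": [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
}
for name, series in factors.items():
    r_in, r_out, passed = check_aha_criterion(series, stock, split=5)
    print(f"{name}: in-sample r={r_in:.2f}, out-of-sample r={r_out:.2f}, holds={passed}")
```

Both hypothetical factors look equally convincing in the discovery period, but factor_b reverses direction in the later period and fails the check, which is exactly the kind of “aha” result this test is meant to catch.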

“That’s the sort of test you would do if you get an ‘aha’ moment, or you think you’ve got an ‘aha’ set of criteria,” Chandran says. “If it’s still valid, you’re still getting the same ‘aha’ results, then you can be more confident and begin to take action.”



Be Prepared for the Long Haul

At this point, your dark data project is well underway. Testing different sets of data against one another until you’ve arrived at insights that may explain your problem will take some time. And you may have to go back and start over again. But it’s important to let this process play out, and remember that failure is a real and common aspect of dark data projects.

“A dark data project has to be something you’re willing to commit yourself to for a good year, or a year and a half, before you can judge whether or not it’s a success,” Chandran says.

It’s also important to remain flexible and be willing to adjust course if the results you’re getting point in an unexpected direction.

“The whole point of dark data is that you don’t know what’s out there and what you might find,” Chandran says. “And you don’t want to have too rigid a plan that boxes you into what you might end up doing.”

Once you’ve begun to solve those seemingly impossible problems, however, there’s really no limit to how far your dark data can take you.


About the Author

Abe Selig joined Software Advice in 2014. As the Managing Editor of Plotting Success, Abe analyzes and writes about BI trends and tools. He also writes content related to supply chain management.
