Buzzword Breakdown 2.0: 5 Baffling BI Terms ExplainedJuly 17, 2014 by Abe Selig
While the software market in general can be full of confusing terms and acronyms, the realm of business intelligence (BI) may be one of the most confusing. Take, for example, this 2013 report from Gartner Inc., which highlights a slight growth slowdown in the otherwise-booming BI software market. One of the top reasons for that slowdown? “Confusion related to emerging technology terms.”
And who can blame those who get confused? Terms such as “data marts” and acronyms like “OLTP” abound in the BI world, and if your boss or CEO has asked you to get up to speed on business intelligence, the amount of buzzwords out there can be downright daunting.
But don’t despair. Just as we introduced you to some basic BI terminology in our first buzzword breakdown article, this second installment will shed light on additional BI terms—from basic concepts that still need clarification to newer terminology. We’ve asked BI experts Ray Major, Jonathan Mills and Ben Plummer to weigh in and help explain these terms in a way that non-BI professionals can understand.
What Is ETL?
Mills: ETL stands for “extract, transform and load.” This is a process for combining data from more than one source with different types of information.
Say, for example, that you had an order system that had order information. You want to extract the contact information on the people who placed the order, and load it into a new contacts database. The struggle is that, in the order system, you have a “Name” field that has both first and last names in [the same] field. [But] the new contacts database has those separated into a “First Name” and “Last Name” field. In this case, your ETL process would “extract” the data from the order database. It would then “transform” the name field into two fields by separating on the space, and finally, [would] “load” the newly formatted data into the contacts database.
Plummer: ETL is most often used [to restructure] data into a source or format that can be used for analytics. People tend to think of ETL in a traditional fashion, which is, going from data to re-organized data—but what’s happening now is that because of “big data,” or the introduction of many new data sources to be analyzed, the definition is broadening. So it’s moving from just [transforming] data to data, to [transforming] documents to data, or from sound to data or even from tweets to data.
The definition has definitely begun to change. People have always thought of ETL as transforming [one type of] structured data to [another type of] structured data; now, it’s transforming many different types of data to structured data.
Major: Typically, data is collected by companies through various systems. It could be from scanners, an HR system, their website or clerks entering customer information via keyboards. At the end of the day, the company has a bunch of data in different formats: some are in spreadsheets, and others are in databases. The challenge is that the data can’t be analyzed unless it is standardized.
Ergo, the need for ETL. “Extract” involves pulling all the information from all the disparate formats and getting them in a single common format. During the “transform” process, the data is transformed to meet a company’s data needs. For instance, data entered as “M” and “F” in one system and as “male” and “female” in another might be standardized in the system as “MALE” and “FEMALE,” so that there is consistency among the data. Finally, the “load” process is where the data is written into a data warehouse system. The data is usually refreshed on a periodic basis, so the load process can happen nightly, weekly or hourly.
What Is OLAP?
Mills: “Online analytical processing,” or OLAP, is a way of analyzing multiple dimensions of data in a neat and clean way. Usually this is done in a “[data] cube,” and often through [Microsoft] Excel. “Multidimensional data” is just a complicated way of saying “data that can be looked at from several different perspectives.”
[For example,] think “sales of products,” “locations” and “time.” In a cube with those dimensions, I could look at sales of products at a specific location, then expand that to see sales over time. Or I could rotate the cube and look at total sales over time by location, then narrow down to a specific product. The idea of an OLAP cube is that it allows you to “slice and dice” that data to explore possible correlations [within] it.
Plummer: OLAP gives you the ability to organize data for quicker analytics, typically in a [data] cube structure. Once again, this has changed as data sizes have grown. Traditionally, what [would happen] is [that] you [would] use some ETL process to load an OLAP data structure so you could do analysis. What’s happening now is [that] because people are utilizing ETL to translate so many types of data, to organize it into a nice, neat cube is becoming more difficult.
Major: OLAP is essentially a way of storing data that enables a user to easily and selectively extract and view data by different attributes. It is stored in a multidimensional database. [Conversely,] a relational database can be thought of as a two-dimensional database: the rows and columns in [Microsoft] Excel are a perfect example.
A two-dimensional structure is limiting in terms of the questions an analyst can ask. On a spreadsheet, you could collect data on “sales” of products (one product per row) or collect sales data by date (each month’s sales in a column). [But] more complicated queries, such as “sales by product by region” or “sales by product by sales rep,” require a multidimensional structure or database.
A multidimensional database considers each data attribute (such as product, geographic region, sales rep and time period) as a separate “dimension.” OLAP software can locate the intersection of dimensions (all products sold in California by Bob Smith in February) and retrieve just that data for display purposes.
What Is Dashboard Reporting?
Mills: “Dashboard reporting” is essentially what it sounds like. Think about the dashboard of your car. All of the information you need to know about your car is right there in one place for quick review. Speed, gas levels, engine RPMs and maybe some error indicators, all within a glance. That is what reporting dashboards are trying to replicate. A good dashboard … will give you all the information you need to know about how well you are performing with a quick glance.
Plummer: Dashboards really began by offering users the ability to view data visualizations as part of their BI reporting. This is changing, as now users want not only to see an aggregate view of their data—which is typically what you get in a traditional BI dashboard—but to use their dashboard reporting to visually discover data on a very granular level. That means taking out data points, and then interacting with visualizations [of those data points] to focus on outliers.
When I look at a dashboard report, I want to know why some metric is way out to one side … or I want to know why some other metric is way down here, and try to eliminate that. You can’t do that when you roll it all up and view it in the aggregate, because things average out. That part of dashboard reporting has changed tremendously, so that dashboards are becoming less about the static uses of information and more about granular views of visualizations you can interact with.
Major: A dashboard is simply a way to provide a visualization of data to end users in the form of charts, maps, graphs, tables etc. Volumes of numbers, and rows and columns in spreadsheets, are hard to interpret and make actionable. People “see” [graphical data] much more easily. Dashboards give signs about a business, letting the user know [if] something is wrong or something is right.
Dashboards can be either strategic, analytic or operational in nature. Strategic dashboards … provide an overview that decision-makers need to monitor the health and opportunities of the business. Strategic dashboards provide snapshots of data, such as “YTD sales by month.” Dashboards for analytical purposes allow users to interact with the data, slicing and dicing and drilling down into the underlying details. Dashboards for monitoring operations track events that are constantly changing, and might require attention and response at a moment’s notice.
What Is the “R” Programming Language?
Mills: Every programming language out there has its strengths and weaknesses. R’s strength is data; it is a language built to analyze and manipulate data. You will see this used by true, hardcore data scientists, but it does not currently have much traction outside of that area.
Plummer: R is a language that’s going to allow you to take big data sets and reduce them down to something meaningful, and apply algorithmic processes against them to get more value and understanding out of them. It’s a [programming] language, not dissimilar to any other language, but it’s been specifically designed to reduce, simplify and organize data for you. It’s really designed for dealing with larger data sets, and it’s better for, say, data science—meaning aggregating and interacting with data—than just retrieving [data].
Major: R programming language is a statistical software programming language that allows data analysts, data scientists and statisticians to apply sophisticated mathematical models to their data for [analytical] purposes.
As an example, a company might have raw data that measures sales information for every quarter for the past five years. That data can be easily viewed in a dashboard, but it is only valuable from a historical perspective; to get insights into what might happen with sales in the future, a company would want to produce a forecast. R contains the statistical algorithms to forecast the historic data, and develop insights into what might happen in the coming quarters.
What Is Data Exhaust?
Mills: “Data exhaust” is a term that is used to describe all of the extra data that is generated by a user over the course of a transaction. This is data that is not necessarily related to the task you were performing, but [that] may be of interest to the site you [were] using.
Think about a simple trip to an online retailer where you select an item, add it to your cart and then pay for it. Over the course of this transaction, a typical user has the ability to generate a large amount of data exhaust: [For example,] what search terms did you use to find what you ended up purchasing? Did you add more than one thing to your cart, and then remove some things? Did you leave the site and return later to complete the transaction?
All of these nuggets of information make up the “data exhaust” of your transaction. [And] these are things that can be mined later by the application, to help improve the user experience moving forward.
Plummer: In the sense that [data exhaust] is data that isn’t always of immediate or obvious use, it’s similar to “dark data,” which we define as data [that] you should be using, but that you think you’ve given up on, either because of its structure or [its] perceived lack of value.
Often times, this [exhaust] data will fall by the wayside, because people don’t understand how to bring it to life; they don’t know how to analyze it. Here, it’s important to be able to work with data at a very granular level—and it’s why people discard data quite frequently. They give up on data because of its lack of structure, or they give up on it because they don’t know how to see it.
Major: Data exhaust refers to data, usually unstructured in nature, that is collected as a byproduct of just doing business. Many retailers collecting large quantities of retail data often lack the capability to process it, and much of the data collected is being ignored.
There is value in the exhaust data, but most companies don’t know how to make it actionable. One simple example [of a company that has made its data exhaust actionable] is Amazon. They not only track the books you purchase, but also keep a record of the ones you browse but do not buy, to help recommend other books to you.
Meet Our Experts
Jon Mills is the director of Paige Technologies, an IT staffing firm based in Kansas City, Missouri. Jon is an accomplished thought leader with nearly 20 years of experience working with big data and BI. He was recently featured in a Kansas City Business Journal article that focused on key principles of big data, and has spoken on the topic at the Kauffman Foundation, an organization that promotes entrepreneurship.
Ben Plummer is a veteran thought leader of the technology industry. At current company Datawatch, he is responsible for worldwide marketing and communications strategies as well as business and partner development. Plummer joined Datawatch from IBM Cognos, where he was vice president of data discovery and analytics. During his 13 years at Cognos, he held a number of executive positions, including vice president of customer operations, vice president of North American marketing, vice president of global marketing and vice president of global partner operations.
Ray Major is currently the chief strategist of Halo Business Intelligence. A data scientist, economist and statistician by training, he’s a life-long practitioner of the mysterious arts of data intelligence and analytics. Prior to taking the reins at Halo Business Intelligence as CEO, Ray was a senior vice president at the Nielsen Company, where he oversaw the integration of Claritas, a data collection and analytics firm, into the global Nielsen brand.