Business Intelligence 101 – A Beginner’s Guide to BI SoftwareDecember 16, 2011 by Houston Neal
If the words “extract transform load” sound like a foreign language, you’re in the right place. Making sense of the business intelligence (BI) software market can be challenging for even the most technical of software buyers. It’s complex stuff.
It’s important, however, to grasp the main concepts and to understand what BI tools do before you dive headfirst into a purchase. You’ll want to be literate when discussing a BI purchase with your management team, IT staff, or vendors’ sales reps.
We’re here to help you understand basic BI concepts. In this Beginner’s Guide to Business Intelligence, we’ll introduce three foundational BI components, explain why and when you would need to use them, and share examples of vendors that offer these capabilities. The three components we’ll cover are:
A data warehouse stores… data. But its technology is much more sophisticated than its name. A data warehouse allows you to consolidate data from several sources (i.e. other software systems) and then perform queries and analyses.
Let’s say, for example, you want to determine the profitability of a new product line in Q3 of 2011. You would need to pull manufacturing cost information from your enterprise resource planning (ERP) software, distribution and transportation costs from your supply chain management system, and finally, marketing campaign costs from your marketing software system. You would then need to compare this with revenue data from your accounting system. All this data would be extracted from these operational systems and then aggregated and organized in a data warehouse. Users would then run queries from the data warehouse, or more accurately, from data marts which are considered to be the access layer.
A data warehouse is organized in a way that is optimized for complex analysis of data from multiple systems, whereas the underlying operational systems are optimized to handle a high volume of transactions specific to their function.
What’s the difference?
An operational system supports daily activities such as entering sales interactions in a sales automation app or expenses in an accounting system. You can run simple queries and analyses on an operational system’s database. However, you may impede performance and slow down other processes it was originally intended for. For example, picture a line of frustrated customers waiting for your point of sale (POS) system to process a sale while your manager is in the back office analyzing how many yellow t-shirts to order next month for your three retail stores. These activities should not be hitting the same database at the same time.
Also, that POS system’s operational database probably doesn’t have all of your accounting data; that data is in the accounting system. And data from your distribution center is probably in your supply chain management system. So if you are just analyzing the POS database, you won’t get a comprehensive view of the whole process.
Data warehouses, on the other hand, are specifically designed to run complex analysis on large volumes of historical data originating from multiple source systems. Data warehouses can be built on standard relational database management systems (RDBMS) or on a database designed specifically for data warehouse applications. Analysts may refer to this as a “data warehouse database management system," while some vendors (e.g. Netezza, Vertica and Greenplum) use "high-performance data warehouse." There are a variety of vendors, large (e.g. IBM, Oracle and Teradata) to small (e.g. illuminate and SAND technology), that offer a data warehouse DBMS.
ETL tools perform three functions to move data from one place to another:
- Extract data from sources such as ERP or CRM applications;
- Transform that data into a common format that fits with other data in the warehouse; and,
- Load the data into the data warehouse for analysis.
The ETL concept sounds easy, but the execution is complex. We’re not talking about simple copy and paste stuff here. Each step in the process has its challenges. For example, during the extract step, data may come from different source systems (e.g. Oracle, SAP, Microsoft) and different file formats such as XML, flat files with delimiters (e.g. CSV), or the worst – old legacy systems that store data in arcane formats no one else uses anymore.
The transform step may include multiple data manipulations such as splitting, translating, merging, sorting, pivoting and more. For example, a customer name might be split into first and last name, or dates might be changed to the standard ISO format (e.g. from 11-21-11 to 2011-11-21). The final step, load, involves loading the transformed data into the data warehouse. This can either be done in batch processes or row by row, more or less in real-time.
ETL tools often come bundled with databases or sold as bolt-on tools. For example, Microsoft, Oracle and IBM all offer some type of ETL capabilities with their databases. Meanwhile, third-party ETL vendors offer tools that will support a variety of disparate applications and data structures. As a final option, some BI buyers choose to build their own custom ETL tools.
We should mention that despite being a core component of data warehouse environments, ETL is not unique to data warehousing. This concept and technology has existed in some form or fashion for a long time. It can be used to move data between databases, transactional systems (e.g. ERP to CRM) and of course, data warehouses.
Online analytical processing (OLAP) is another key process and technology found in almost all data warehouse environments and business intelligence systems. But while the data warehouse and ETL tools support “back end” processes, OLAP tools support the presentation layer or “front end” processes such as querying, analysis and reporting. OLAP tools provide the impressive tables, charts and visualizations that make BI exciting!
But OLAP isn’t just about good looks; it’s got some serious smarts under the hood. OLAP allows users to perform sophisticated, multidimensional analysis. In other words, it allows you to analyze more than one dimension of data at a time. For example, you could analyze how many yellow t-shirts were sold during the month of July in Texas and compare it against other t-shirt sales during the same month.
An in-depth conversation about OLAP will almost always include some discussion of “schemas” and “cubes.” Stay with us here. A schema describes how tables are arranged in a database. Star and snowflake schemas are two common styles. They consist of “fact” tables and “dimension tables.” Here is a visual representation of a star schema:
A fact table is essentially a single table with rows and columns – think spreadsheet – that contains business data. For example, one table could include data about sales:
A dimension table contains information that describes the records in the fact table. It contains textual attributes, and those attributes may be descriptive or may provide instructions on how the fact table data should be summarized. Additionally, the information in dimension tables is independent of information in other dimension tables. For example, a product dimension table has information about products while a customer dimension table has information about customers.
Now onto cubes.
Cubes are the core components of OLAP systems. They aggregate facts from every level in a dimension provided in a schema. For example, they could take data about products, units sold and sales value, then add them up by month, by store, by month and store and all other possible combinations. They’re called cubes because the end data structure resembles a cube.
OLAP systems are able to provide fast responses to queries because of this cube data structure. They essentially already have all the answers to your queries. For example, if you ask for two years of revenue data for store “A” and for product “B,” the cube already has this information aggregated and can spit it back out in seconds.
Almost all BI vendors will offer an OLAP tool or a similar type of analytical tool. Similarly, most BI buyers will need OLAP or a similar tool for analysis and reporting. OLAP tools can run on data warehouses or on transactional databases. So, companies can purchase OLAP without having to invest in a complete data warehouse environment. Though, as mentioned earlier, there could be performance issues when taking this approach.
Business Intelligence Trends in 2011 and Beyond
An introduction to business intelligence wouldn’t be complete without some discussion of trends. Here are four you should know about:
Predictive Analytics – While traditional analytic tools allow you to glean insights from historical data, predictive analytic tools allow you to look into the future and identify potential opportunities and threats. It allows you to build predictive models using rules and “predictors.” For example, you could predict which customers are most likely to respond to a marketing campaign based on data such as their demographics (e.g. income and locale), purchase statistics (e.g. frequency of purchase) and more. Predictive analytics will actually build these models for you and identify those that best meet your objective.
Predictive analytic capabilities have existed for many years, though interest has piqued just within the last decade. Important acquisitions (e.g. IBM and SPSS) and publicized success stories – such as those discussed in Tom Davenport’s bestseller “Competing On Analytics” – are thought to be key drivers of this trend.
Software as a Service – This is a trend playing out in nearly every software market. Instead of installing software on-premise, more and more organizations are considering web-based systems that are hosted by the software vendor. This method of deployment is called Software as a Service (SaaS), and as the name suggests, the software is essentially rented as a service on a monthly or annual basis. This appeals to many buyers because upfront costs can be lower, there is little to no maintenance required, web-based systems can be deployed quicker and they can be accessed from any location with broadband Internet.
There is a burgeoning web-based BI software market with vendors such as Tableau, Jaspersoft, MicroStrategy and PivotLink leading the way. Meanwhile, veteran BI players such as Oracle, SAP and SAS are entering the market with hosted versions of their software.
Big Data – Dubbed as “the next frontier for business intelligence,” the big data trend is getting considerable attention from industry pundits, major media and business intelligence buyers. The idea is that modern data sets are becoming extremely large and unwieldy. Too large, in fact, to be managed with conventional databases and software.
Take, for example, the online retailer EBay. They have over 200 million items for sale, separated into 50,000 categories, and bought and sold by 100 million registered users. This amounts to 9 petabytes of data, according to a recent article from The New York Times. They’re not alone. Google is said to process ~24 petabytes of data per day; AT&T processes 19 petabytes through their networks each day, and; the video game World of Warcraft uses 1.3 petabytes of storage.
That’s a ton of data to store, process and manage. The analyst that can wield this data stands to make some interesting, if not profitable, discoveries. Enterprises are privy to this, and so are software vendors. So several BI vendors are developing technology to support demand for this growing market.
In-Memory Processing – A classic case of Moore’s Law, in-memory processing is gaining traction because of the reduced costs and increased power of random access memory (RAM). Instead of loading data onto disks (i.e. hard drives) in the form of tables and cubes, data is loaded into RAM.
Accessing data in memory is “literally millions of times faster than accessing data from disk,” suggests BI analyst Cindi Howson. Not to mention, it provides more flexibility. Users don’t have to pre-process data and organize it into cubes. They can perform ad hoc queries to make quick, insightful business decisions. Many vendors now offer some type of in-memory solution, including Tibco, Tableau, QlikTech, SAP and more.
This concludes our beginner’s guide to business intelligence. If you need additional help with your software research, call us for a free consultation. If there are other tools you’d like to learn about, or if you have an idea for a future report, leave us a comment below or get in touch through Google+.