5 Quick Fixes For Common Database Reporting Slowdowns
April 9, 2014 by Abe Selig
Effective database management is a central component of any successful business intelligence (BI) system. Your database stores crucial (and often large) volumes of data that your BI software utilizes for reporting, and accessing it without delay is key to running an efficient operation.
But what happens when your database reporting system begins to slow down? Data takes longer to pull and analyze, reports may time out and tables may lock—all of which can compromise the efficiency of your system and considerably slow down your business.
To prevent this from happening, we’ve identified five quick fixes you can use to greatly speed up your database reporting and help your BI operations run faster and more smoothly.
Purge Infrequently Accessed Data
Chuck Ezell, a SQL tuner with Datavail Corporation, which provides database management services, explains that purging nonessential data from your system can be a good way to boost reporting performance.
“It’s one of the first things I look for,” Ezell says. “What’s your data retention policy, and do you need ten years worth of data in your tables?”
Ezell explains that data retention policies will vary by company. For example, a large, publicly traded company will likely have a firm policy in place that limits what type and how much data they can get rid of, while a smaller, private organization might not have one at all.
“Publicly traded companies are definitely going to have a five, 10, maybe even a 15-year retention policy, particularly for data such as accounting records that they may need to show SOX [Sarbanes-Oxley Act] compliance,” he says, referring to the federal law that requires upper management of a publicly-held company to individually certify financial information.
But beyond data needed for daily, quarterly or annual reporting, there may be a large amount of information in your database that can be purged. Purging is just what it sounds like: completely erasing data from your system. Unlike a simple delete, purging renders the information completely unsalvageable.
One example of purgeable data Ezell recommends looking for is machine data, or data that was created without human interaction (e.g. web server logs or call detail records). After that, it’s all about figuring out what data you need to access most frequently, then clearing out the rest.
“Most of the time you want to keep data from the past 24 months easily accessible,” Ezell says. “But beyond that, you can purge the data that you know you won’t need again, or archive data that you may need to access infrequently on a separate server.”
One way to purge data involves using simple utilities within your database management system. According to Ezell, “Most database management systems will have a ‘purge out from table’ command, or a ‘purge from this date to this date’ type of command.”
While purging itself is rather straightforward, Ezell says it’s important to make sure that the tables being affected by the purge are still operational after the unwanted data is gone.
“Because a lot of the tables in most relational database systems are going to be connected, you should make sure the tables you’re purging don’t have relational data connected to other tables that you’ll still need to use,” he says.
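The ranged delete Ezell describes can be sketched with Python’s built-in sqlite3 module. This is an illustrative assumption, not his client’s setup: the exact purge syntax varies by DBMS, and a plain DELETE is not a true purge until the freed space is reclaimed, which is why the sketch finishes with a VACUUM.

```python
import sqlite3

# In-memory database standing in for a reporting system (assumption:
# a simple "events" table with an ISO-format date column).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce relational constraints
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, occurred_on TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (occurred_on, payload) VALUES (?, ?)",
    [("2010-03-01", "old"), ("2013-06-15", "recent"), ("2014-01-20", "current")],
)

# A "purge from this date to this date" command boils down to a ranged
# DELETE against the retention cutoff.
cutoff = "2012-01-01"
cur = conn.execute("DELETE FROM events WHERE occurred_on < ?", (cutoff,))
conn.commit()
purged = cur.rowcount          # rows removed: 1
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]  # 2

# Reclaim the freed space so the purged rows are truly gone.
conn.execute("VACUUM")
print(purged, remaining)
```

Before running anything like this against real tables, take Ezell’s advice above: confirm that no other table you still need holds foreign keys into the rows being removed.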
When done right, Ezell says purging can be an extremely effective way to speed up your database reporting.
“A recent client of mine was experiencing reporting slowdowns and needed to purge data and reorganize his tables,” Ezell says. “After we did that—and it took 36 hours of work, as his tables were huge and took large amounts of time to reorganize—he benefited from a 20-30 percent speed increase, simply from purging old data out of his system.”
Add Indices to Improve Data Retrieval Times
Another fix for database slowdowns is remedying poor or non-existent indices, which can add a significant boost to data retrieval times. When data sets are relatively small, inefficient queries can go unnoticed as they may only be adding fractions of a second. But without indices, full table scans are required to retrieve data. This means that as data continues to grow, queries become slower, and the seconds can really start to add up.
“An index gives your database a hint as to where data is, and lets you bypass unneeded data to get right where you need to be,” explains Tim Henrich, owner of Turret.IO, an email marketing platform for web developers.
“Maybe you have a database that lists thousands of cities worldwide—it’s huge. If you don’t index that type of location data properly, trying to pull it back from a database could grind your system to a halt.”
In this case, adding indices to help skip all of the cities that aren’t associated with a particular query can help you access the data you need more quickly. “Oftentimes, you can see a 50 percent speed increase just by adding proper indices when you didn’t have them in there before,” Henrich says.
To further explain this example, Henrich points to a theoretical database with 3.25 million rows for cities across the globe.
“If you removed the index you had for ‘city name’ and then issued a query to find the latitude and longitude of the town ‘Shoemakersville,’ you might get the following result: ‘one row in set, 5.08 seconds,’” he says. “This means it took the database 5 seconds to find the town of ‘Shoemakersville’ within 3.25 million records.”
But, Henrich says, if you create an index for city name (essentially letting the database jump straight to the matching row rather than scanning every city) and perform the same query, you might instead see a result of “one row in set, 0.00 seconds.”
“In this example, it took less than one hundredth of a second using an index on the same query,” he explains.
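Henrich’s city-lookup example can be reproduced in miniature with Python’s sqlite3 (the table and index names are illustrative assumptions); asking the database for its query plan shows the full scan disappearing once the index exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [(f"City{i}", 0.0, 0.0) for i in range(10000)],
)
conn.execute("INSERT INTO cities VALUES ('Shoemakersville', 40.50, -75.97)")

query = "SELECT lat, lon FROM cities WHERE name = 'Shoemakersville'"

# Without an index, the plan is a full-table SCAN: every row is checked.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan[0][3])

# With an index on the filtered column, the plan becomes an index SEARCH
# that jumps straight to the matching row.
conn.execute("CREATE INDEX idx_city_name ON cities (name)")
plan_indexed = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan_indexed[0][3])
```

The same before-and-after check (EXPLAIN in MySQL and PostgreSQL, execution plans in SQL Server) is how you verify that a new index is actually being used rather than ignored by the optimizer.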
Henrich notes that the indices you use will vary depending on the type of data being stored and queried. He warns not to overuse them—the correct application of indices is the key to optimizing results.
“A novice mistake would be to index every column in a table just in case the data is used to filter a query,” he says. “The problem with doing this is two-fold. Each time a row is updated or a new row is added, the indices must be updated, too. Plus indices consume memory, which is a closely guarded asset when working with large amounts of data.”
Replicate Data Across Multiple Platforms
Replicating data across multiple databases can also be a good fix for reporting slowdowns, especially for users who run many simultaneous reads and writes to their system as part of daily operations.
While replicating data will require at least a second database (sometimes referred to as a “slave”), it can also remedy common causes of slowdowns—such as query timeouts—that may plague overworked tables by distributing an otherwise untenable workload across multiple databases.
The speed factor here is gained by freeing up each database to perform certain workload tasks, instead of having one database perform them all.
“[Replication] allows your application, for example, to spread its ‘read operations,’ or data retrieval, to different servers,” Henrich explains.
“When locks are created on an entire table, it forces other queries to wait until that query has finished before proceeding. So spreading these potentially-locking queries across several read-only databases can result in improved overall scalability because a single query isn’t blocking the application’s one-and-only source of data.”
When using the MyISAM storage engine, for example (MySQL’s default storage engine in older versions), you might want to perform a long running “read operation” while an update on the same table is attempted simultaneously.
Without replication, “the long running read operation will block the update from occurring until it has finished, and any further read operations that occur during this time would be forced to wait for both the initial read operation and update to occur before executing,” Henrich explains. As a result, the single database would likely grind to a halt.
However, if replication is used so those read operations take place on read-only slave databases, while the update operation takes place on a master database dedicated to writes, “that contention can be reduced,” Henrich says.
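The read/write split Henrich describes can be sketched as routing logic, assuming one master and two replicas. In this sketch, SQLite databases stand in for real servers and replication is simulated by replaying each write, so only the routing is illustrative; a real deployment would use the engine’s own replication.

```python
import sqlite3
import itertools

# One master for writes, two read-only replicas for reads (assumption:
# the replica set is static and all copies start empty).
master = sqlite3.connect(":memory:")
replicas = [sqlite3.connect(":memory:") for _ in range(2)]
for db in [master] + replicas:
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

def write(sql, params=()):
    # All writes go to the master.
    master.execute(sql, params)
    master.commit()
    # Stand-in for real replication: replay the statement on each replica.
    for r in replicas:
        r.execute(sql, params)
        r.commit()

_next_replica = itertools.cycle(replicas)

def read(sql, params=()):
    # Reads rotate round-robin across replicas, keeping long-running
    # queries off the master so they cannot block its writes.
    return next(_next_replica).execute(sql, params).fetchall()

write("INSERT INTO users (name) VALUES (?)", ("alice",))
print(read("SELECT name FROM users"))  # [('alice',)]
```

One caveat the sketch glosses over: real replication is usually asynchronous, so a read issued immediately after a write may briefly see stale data on a replica.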
Partition Frequently Accessed Data
Some businesses will have no choice but to retain decades worth of data and keep it easily accessible, which means that indexing will only go so far, and replicating, purging or even archiving it elsewhere might not be an option. In cases like these, using a method called partitioning for portions of data needed for daily operations can go a long way in speeding up a bogged down system.
A vertical partition is often used to reduce the width of a table by splitting it vertically, so that only certain columns remain included in a particular dataset. This method is ideal for databases with many rows, where it makes sense to partition frequently and infrequently accessed data.
“Let’s say you have a table with 30 columns,” Ezell explains. “Maybe there’s a necessity for having all that data, but often, when developers are writing their SQL query to access that data, they’re actually only accessing 10-15 columns at a time.”
In this case, Ezell advises grouping those 10-15 columns with a “vertical partition.” You still have a physical table of 30 columns, but it responds programmatically as though you have a table of 15 when you write your code, because the partition tells your reporting system to query only those 15 at a time.
A horizontal partition, on the other hand, divides rows in a table so that particular row-based datasets are created that can then be addressed individually or collectively. An example of this would be dividing a table that contains twenty years of historical data into twenty partitions, with each one containing one year’s worth of data.
“The idea with either partition is that it improves your performance because you’re only dealing with so many rows and so many columns at one time,” Ezell says.
“You’re getting your system to perform better for daily or weekly operations, but you still have that other data on hand for when you need to do year-end reporting or even reporting on multiple years.”
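Horizontal partitioning by year can be sketched as follows. SQLite has no declarative partitioning, so this assumed setup creates one table per year and routes to it manually; most major engines can manage year-based partitions for you behind a single table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
YEARS = range(2012, 2015)  # assumption: three years of history
for y in YEARS:
    conn.execute(f"CREATE TABLE sales_{y} (sold_on TEXT, amount REAL)")

def insert_sale(sold_on, amount):
    # Route each row to the partition for its year.
    year = sold_on[:4]
    conn.execute(f"INSERT INTO sales_{year} VALUES (?, ?)", (sold_on, amount))

def yearly_total(year):
    # Daily or weekly reporting touches only one small partition...
    return conn.execute(
        f"SELECT COALESCE(SUM(amount), 0) FROM sales_{year}"
    ).fetchone()[0]

def all_time_total():
    # ...while year-end reporting can still union every partition.
    union = " UNION ALL ".join(f"SELECT amount FROM sales_{y}" for y in YEARS)
    return conn.execute(
        f"SELECT COALESCE(SUM(amount), 0) FROM ({union})"
    ).fetchone()[0]

insert_sale("2013-05-01", 100.0)
insert_sale("2014-02-10", 50.0)
print(yearly_total(2014))  # 50.0
print(all_time_total())    # 150.0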
Use Sharding to Reduce Bulky Sets of Data
While partitioning and replication are more general terms for splitting up data, sharding is a more specific type of horizontal partitioning. It breaks up data structures that would normally reside on a single server and places them on multiple servers to reduce the amount of data that each node must handle.
This essentially takes the partition or replication a step further, and slices up the partitioned data into even smaller “shards.”
For example, let’s say you have a “user” table that stores information such as a person’s name, email and phone number. Using a simple replication method, you might divide your data across three servers, one master and two slaves, each of which has 6 million users.
With this method, Henrich says, each database has less data, which allows it to perform faster queries, and “it gives you the ability to have a single node fail without affecting the entire application.”
A challenge with this setup, however, is that your system must be intelligent enough to know which node to use at which time. For example, in the sharding method above, the users are split into two shards, which means some type of “lookup” is required to determine where a particular user is stored.
“How this is handled depends on the particular implementation,” Henrich says. “One solution is to create a non-sharded lookup table containing every user and their particular shard, but that of course has its own scaling concerns.”
A better approach, Henrich points out, would involve generating a “shard key” that uses an algorithm to determine which shard a particular user is on instead of resorting to a lookup table. “This creates another layer of complexity and additional queries,” Henrich says, “but it can offer greater scalability as well.”
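The shard-key approach Henrich prefers can be sketched as a stable hash of some user attribute. Here the two-shard count, the email-based key and the in-memory dictionaries are all assumptions standing in for real database nodes:

```python
import hashlib

NUM_SHARDS = 2  # assumption: two shards, as in the example above

def shard_for(email):
    # A stable hash of the shard key means the same user always maps to
    # the same shard, with no lookup table to maintain.
    digest = hashlib.md5(email.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Dictionaries standing in for the user databases on each shard.
shards = {i: {} for i in range(NUM_SHARDS)}

def save_user(email, record):
    shards[shard_for(email)][email] = record

def load_user(email):
    # The same hash routes the read back to the shard holding the user.
    return shards[shard_for(email)].get(email)

save_user("alice@example.com", {"name": "Alice", "phone": "555-0100"})
print(load_user("alice@example.com"))
```

The trade-off Henrich mentions shows up when NUM_SHARDS changes: a plain modulo remaps most keys, which is why production systems often layer consistent hashing on top of this idea.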
What quick fixes have you used to successfully speed up your database reporting? Let us know by leaving a comment in the section below.