When data gets huge, most BI tools start to struggle: they get stuck or become very slow. The reason is that most of these tools use an "in-memory" approach, meaning the data and the calculations live in the RAM of the machine, and with huge data the available memory becomes the limit.
There are several methods to deal with this limitation:
Working directly with the database engine
Some BI tools can work directly against the database: the tool sends queries to the database instead of storing the data in its own local memory. In this mode the calculations are also performed by the database engine, and the tool only presents the results.
This approach bypasses the RAM limit, and the report can run very fast.
It has some disadvantages, though. Performance still depends on the calculations performed by the database, and with very large amounts of data they might still be slow. Another problem is that every action taken in the report triggers a new calculation on the database, which consumes the database's resources.
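To make the difference concrete, here is a minimal Python sketch. It uses the built-in sqlite3 module as a stand-in for a real database engine, and the "sales" table and its columns are made up for illustration. The first query pulls every raw row into the tool's memory and aggregates it there; the second pushes the GROUP BY down to the database, so only the small result set ever reaches the tool.

```python
import sqlite3

# A tiny stand-in database; in practice this would be a large warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 100.0), (1, 250.0), (2, 80.0), (3, 40.0), (3, 60.0)],
)

# In-memory approach: every raw row is fetched into the tool's memory,
# and the aggregation happens on the client side.
rows = conn.execute("SELECT customer_id, amount FROM sales").fetchall()
totals = {}
for customer_id, amount in rows:
    totals[customer_id] = totals.get(customer_id, 0) + amount

# Direct-query approach: the GROUP BY runs inside the database engine,
# and only the summarized result set is returned to the tool.
pushed_down = conn.execute(
    "SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id"
).fetchall()

print(totals)       # {1: 350.0, 2: 80.0, 3: 100.0}
print(pushed_down)  # e.g. [(1, 350.0), (2, 80.0), (3, 100.0)]
```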
Working with aggregate tables
Instead of fetching all the data from the database and then using aggregate functions to summarize it, you can create small aggregate tables in the database and read the summarized data from them. That way the BI tool handles much smaller amounts of data and works much faster (see the sketch below).
The disadvantage of this method is that the BI tool can no longer show the data at the row level: if we want to see the data of a specific customer, for example, we won't be able to do so.
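Here is a minimal sketch of the aggregate-table approach, again using sqlite3 and a hypothetical "sales" table purely for illustration. A small monthly summary table is built inside the database (and could be rebuilt on a schedule), and the BI tool queries only that table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2024-01-05", 100.0),
        (1, "2024-01-20", 250.0),
        (2, "2024-02-03", 80.0),
        (3, "2024-02-14", 60.0),
    ],
)

# Build (or rebuild) the aggregate table: one row per month instead of one
# row per sale. With millions of sales this summary table stays tiny.
conn.execute("DROP TABLE IF EXISTS sales_monthly")
conn.execute(
    """
    CREATE TABLE sales_monthly AS
    SELECT substr(sale_date, 1, 7) AS month,
           COUNT(*)                AS num_sales,
           SUM(amount)             AS total_amount
    FROM sales
    GROUP BY substr(sale_date, 1, 7)
    """
)

# The BI tool reads only the summary rows, not the raw sales...
print(conn.execute("SELECT * FROM sales_monthly").fetchall())
# [('2024-01', 2, 350.0), ('2024-02', 2, 140.0)]

# ...which is exactly why row-level questions, such as "show me the sales of
# customer 1", can no longer be answered from this table.
```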
The article was written by Yuval Marnin.
For data analyst mentoring services click here.
For more articles on this subject:
A beginner's guide for data analysis with Power BI.
The pros and cons of Power BI