Power BI Incremental Refresh for Faster Dataset Refreshes
Author: Ralph Gonzales
In today’s day and age, data is the new gold. Companies keep track of all the data they generate as it aids in understanding their strengths, weaknesses, opportunities, and efficiency. Many companies have adopted a data-driven approach in their strategies and decision-making to further grow their business. This leads to more data being captured, stored, and needing to be analysed. As data volume increases so is the demand to have them available anytime and at any place.
The date was May 8, 2018 – Microsoft released a Power BI Feature that would help deal with large datasets, reduce the time it takes to refresh data models and have up-to-date data more available. Since then, it’s gone from being a preview feature to a staple in optimising performance in data modelling.
Microsoft introduced Power BI Incremental Refresh, it’s a feature that allows you to refresh large datasets without having to reload historical data each time. Your refreshes become more reliable as you no longer need to keep long-running data connections open. It also frees up resources used to refresh your data – smaller amounts of data to refresh means less consumption of overall memory and compute capacity. If your dataset is in Power BI Premium, Premium per user, Power BI Pro and Power BI Embedded – this feature is available for you to use. Plus, it is incredibly easy to set up!
There are 4 steps to properly set up incremental refresh:
- Define your filter parameters
- Use the parameters to apply a filter
- Define the incremental refresh policy
- Publish your dataset to Power BI Service
Step 1 – Define the Parameters
You will need to set the start and end dates of the data that you want to be loaded in Power BI Desktop. Ideally, we want to set this to the most recent values of our data and limit it to a small sample amount.
We achieve this in Power Query by creating 2 Date/Time parameters; RangeStart and RangeEnd. These 2 parameters are reserved for Incremental refresh and are case sensitive.
When we publish to Power BI Service, these values will be overridden by the incremental refresh policy that we set up.
Step 2 – Use the Parameters to apply a filter
In Power Query, we then need to apply the parameters as date filters in the table that contains the large volumes. Ideally, these would be fact tables.
In my example, I have a table called Orders with a column named OrderDate. I applied the filters on this column as per the screenshot below.
Step 3 – Define the incremental refresh policy
After the parameters have been created and applied, the next step is to create the refresh policy.
This is done outside of Power Query in the report view. Right click the table that we applied the parameter filters to and select Incremental refresh.
This should open a new window where you can define the refresh policy. The window should look like the example below with a few areas that need to be filled in.
In the example above, the incremental refresh is set up for the Orders table and it archives 2 years’ worth of data. This means that any data that falls between the dates 01/01/2021 and 31/01/2023 will be loaded into the dataset once and not be refreshed anymore. Any data that falls between 01/02/2023 and 28/02/2023 will be refreshed according to the refresh frequency you’ve set for this dataset. When we enter a new month, the date ranges for archiving and data refreshes will move forward as well.
Step 4 – Publish your dataset
This is the last step where you publish your dataset in Power BI Service. Remember to set the refresh schedule for the dataset so it’s constantly up to date. The initial load of the data is quite long depending on how big the archive data was; however, the succeeding refreshes will only load a small amount and therefore will be faster.
There are a few limitations to be aware of when implementing Incremental Refresh in your dataset. These are important points to note as they can lead to issues if not considered.
- Any data that falls outside of the incremental refresh policy is not refreshed, and any data that falls outside of the archive range is not kept in the dataset. If a portion of your data needs to be updated/included and falls outside of the refresh policies, then we would need to amend the policy.
- Any changes to the Power Query transformation steps to the table on which the incremental refresh is implemented would require another initial load of the data, which as we know, can be quite lengthy.
- Once incremental refresh has been implemented and the dataset is published in Power BI Service, the option to download the dataset as a PBIX file is no longer available. In the event that you can no longer access the Power BI Desktop file, this is a major drawback of the feature as you’re not able to export the dataset as a PBIX from Power BI Service.
- Your data source must support “query folding”. In most cases, this does not apply as data sources are usually SQL databases.
- Although this is designed to handle large datasets, there is a limit of 1GB for your dataset size. This can be upscaled if you’re in Premium Capacity.
In conclusion, Power BI Incremental Refresh is an incredible feature that can help improve your dataset refreshes by only refreshing a portion of your data and leaving historical data archived and untouched. With 4 steps to consider, it is very easy to implement in your dataset and opens up the opportunity for faster, more reliable and more frequent refresh times to keep your dataset up to date.