Author: Etienne Oosthuysen
This blog traces a journey that starts with Excel and its promotion to a BI tool, moves on to Power BI version 2, and concludes with the much larger Azure analytics story.
Let’s start this journey with some topical statistics.
- US$97bn: the projected annual spend on BI services by 2016. Good.
- 19%: the annual growth rate of the analytics market. Good.
- 21%: the average adoption rate of BI products. The 18-24% range has remained unchanged since 2005. Bad.
http://www.thoughtspot.com/blog/15-game-changing-bi-stats-2015
The story of Excel’s promotion into a BI tool
We all have war stories of huge amounts of effort and spend on large BI projects that often included a Data Warehouse and some Data Marts, perhaps a couple of Cubes and then some Reports – only for the business to ignore all the hard work and use one feature of the BI solution above any other: the reports’ “export button”. The business wants the data – they want to do stuff with it.
So although the general adoption rate of BI products was, and remains, very low, the notable exception is Excel, as this is where the data mostly ends up after the report export, for further analysis and manipulation.
The Microsoft epiphany:
Excel is pervasive in most organizations, and it is already used in a quasi-self-service BI capacity – refer to the reports’ “export button” story. So why not take Excel and transform it into a proper BI tool: remove the “but it’s only Excel” stigma in favour of “hey, I did not know you could do that with Excel”, provide a clear distinction between solution Discovery and solution Production, and yet still retain the familiarity of Excel?
As a result, MS set out to develop the “Power-” components (Power Pivot, Power Map, Power Query and Power View, with a local instance of the Analysis Services engine to support Power Pivot) and to provide a deployment roadmap into SharePoint and/or SSAS. This was a stroke of genius, as it could even satisfy at least some of the skeptics over the fence in ICT.
The story was a simple yet powerful one. Your BI journey starts life in Excel, in the hands of some kind of power user with intimate domain knowledge, i.e. that guy or gal who does some pretty intense data stuff in Excel as part of their day-to-day job – think of risk analysts, actuaries, etc. Let them use their domain knowledge and skills to evolve a solution (which includes a performant semantic model in the form of Power Pivot) in a discovery mode that keeps their business capability front of mind. Then hand the solution over the fence to ICT, who enable it for a corporate audience, i.e. deploy it to an SSAS Tabular instance, or to SharePoint or SharePoint Online for smaller models, wrap security around it, etc.
This, in turn, allows the data professionals in ICT to discover what is required in the data platform space to complement and further enable the user experience – for example, data quality, master data and shared dimensions, slowly changing history tracking, live data streams, unstructured data, etc. No longer are large data warehouse deployments a prerequisite for delivering useful solutions and insights to the business – in fact, the opposite becomes the norm, i.e. let the daily analytical tasks drive what is needed in the data platform domain.
Along comes Power BI V2
Step forward to today and Power BI for Excel has a new(ish) and much better twin in the form of Power BI version 2. But why create a new and better “twin” rather than just improve the existing guy?
As the adoption of Power BI in Excel increased, so too did the challenges. These challenges needed to be resolved for Power BI to become a serious player in a relatively crowded market.
Power BI in Excel vs. Power BI V2
With the ability to easily access datasets, including very large ones, came a requirement for more compute power. 32-bit versions of Excel are limited to 2 GB of virtual address space, shared by all the workbook’s components, including the spreadsheets, the “Power-” add-ins and the subsequently created models. 64-bit versions impose no such limit, and the workbook size is instead constrained only by local memory and resources. The problem is that the 32-bit version of Office is the more common deployment in most organizations, and it is simply not reliable enough for serious data workloads.
The second limitation relates to upgrades. Excel is built on a solid and stable code platform and changing that is no easy feat. So pushing periodic updates in the Excel “Power-” space was always going to be an issue.
Power BI V2 removes these limitations: no longer is 32- vs. 64-bit an issue, and MS can now push constant updates out to the user community with ease (the September 2015 update (2.27.4163.351), for example, included 44 additional features). It also ensures Power BI is backed by the huge computational engines in MS’s data centers.
Does this mean Power BI in Excel no longer applies? No, not at all, but we just need to recognize that there are two flavours of the same product available, the Excel version and the Power BI V2 version, and that they will likely serve two different audience profiles: the power analysts in organisations used to getting their hands dirty with Excel, and the data-proficient business users who prefer drag-and-drop environments.
We also need to recognize that the improvements to the Excel version will always lag behind those to the Power BI Vn version, and as a result the capabilities of the latter will most likely always trump those of the former.
The two versions will also have slightly different deployment stories: The Power Pivot part of Excel becomes the semantic layer in a deployed SSAS Tabular solution, whereas the Query and Modelling features of Power BI V2 become the semantic model out there in Azure-land.
The two versions are, however, very similar under the covers, as shown in this table:
* Please remember that this version will always be an improvement on the Excel version, due to the regular updates pushed to the user community as discussed earlier.
The bigger Azure analytics story
Please note that this is not an in-depth discussion of Cortana Analytics or the IoT suites in Azure (follow-up blogs will cover these topics); the focus here is on Power BI’s integration with some of the key products in these suites.
I think it’s safe to assume that all vendors, or their vendor partners, representing Qlik, Tableau, Oracle, etc. will be able to make a strong case for their specific technologies, and some will fare better than others in terms of characteristics such as visualisations, breadth of data sources, cost, etc., so I will not delve into the advantages and/or disadvantages of Power BI V2 (hereafter simply referred to as “Power BI”) vs. its competitors. Also, if you consider the manner in which Power BI receives constant pushed updates, a disadvantage today may no longer be one tomorrow, so such comparisons become largely superfluous. What does remain relevant is the bigger landscape in which Power BI operates, and I think this is where it has a clear advantage over competitors.
Windows Azure
In order to understand this advantage of Power BI, we need to consider some topical concepts: big data, live streams of data (including IoT), predictive analytics and mobility.
This could involve ingesting high-volume structured or unstructured data through the data platform – data generated by devices (IoT) – putting it in the context of more structured data from internal and external sources in interactive metadata models, and exposing it to users through easy-to-use and performant self-service BI platforms.
Use case
Think of a real-world use case where a government transport agency wants to improve traffic flow through a city and, of course, the public perception thereof and the costs associated with it. For monitoring and analysis, they would need:
- Real-time flow of vehicles to show what is happening right now – measured by sensors at road intersections.
- Social media and its sentiment, so as to monitor perception – for example, tweets monitored for keywords, @-mentions and #-tags.
- Road maintenance operational data captured in an on-premise operational maintenance system.
- Cost information obtained from a General Ledger system.
- Demographics by postcode obtained from the Australian Bureau of Statistics and Australian Tax Office.
- Access to historical views of in-flight data (real-time flow of vehicles and social media sentiment) and some key historical values from the on-premise systems to track changes over time.
Power BI has deep integration with the relevant services in the Azure ecosystem. A key selling point for Power BI is that it can deliver, on a range of devices: visualization of real-time data flows; predictive analytics that uses real-time and historical views of the data to predict future trends; reporting and analysis of real-time and historical data, augmented with prediction scores, in a self-service and performant platform; and proactive alerts when thresholds are breached.
The overall solution architecture is shown in the diagram below. Please note that the Azure products and services shown are not comprehensive, but the architecture will satisfy the requirements of the use case.
Architectures for the constituent use case parts are shown thereafter. You can see what a prominent role Power BI plays in the overall architecture as well as in the architectures for the constituent parts.
- Use case parts 1 & 2 involve a more conventional (recognisable) deployment of BI.
- Use case part 3 includes live data feeds (into live visualizations and as a historical view).
- Use case part 4 involves big data.
- Use case part 5 closes the loop and provides predictive analytics.
Use case 1 – create useful analytics across sourced data
Maintenance data is sourced directly from the Operational Maintenance system, associated costs from the General Ledger system and demographics (such as population and average incomes) from web sources such as the ATO and the ABS websites.
Data is mashed together and enhanced to create business-friendly Semantic Models, which are then used to visualize the data in reports and dashboards.
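To make this concrete, here is a minimal sketch of the kind of mash-up involved, expressed as a T-SQL view over the landed source data. All table and column names are hypothetical, and in Power BI itself the equivalent shaping would typically be done in the query and modelling layer rather than in SQL:

```sql
-- Hypothetical mash-up: maintenance jobs joined to their General Ledger
-- costs and to ABS demographics, aggregated per postcode so the semantic
-- model sees one business-friendly shape.
CREATE VIEW rpt.MaintenanceCostByPostcode AS
SELECT  m.Postcode,
        d.Population,
        d.AvgIncome,
        COUNT(m.JobId) AS MaintenanceJobs,
        SUM(g.Amount)  AS MaintenanceCost
FROM    stg.MaintenanceJob  AS m
JOIN    stg.GeneralLedger   AS g ON g.JobId    = m.JobId
JOIN    stg.AbsDemographics AS d ON d.Postcode = m.Postcode
GROUP BY m.Postcode, d.Population, d.AvgIncome;
```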
Visualisations are accessed by users on any device and alerts (such as when certain cost thresholds are breached) are managed at the user end via their own devices.
A note on Semantic Models – Power BI uses the SSAS Tabular/xVelocity data engine under the covers, and as it is built on Azure, it is supported by literally thousands of Azure servers, so performance was always going to be good.
Use case 2 – store key data in data mart structures to allow for history tracking where the source system does not.
Source systems often do not track non-transactional history (for example, in this scenario, changes to maintenance contractors’ names). History tracking is achieved by way of SCD2 (slowly changing dimension type 2) using SSIS and a data store hosted in SQL DB.
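As a rough illustration of the SCD2 pattern (the real implementation sits inside an SSIS package, and all table and column names below are hypothetical), the logic executed against SQL DB looks something like this:

```sql
-- Step 1: close off the current dimension row when a contractor's
-- name has changed in the source.
UPDATE dim
SET    dim.ValidTo   = GETDATE(),
       dim.IsCurrent = 0
FROM   dbo.DimContractor AS dim
JOIN   stg.Contractor    AS src ON src.ContractorId = dim.ContractorId
WHERE  dim.IsCurrent = 1
  AND  dim.ContractorName <> src.ContractorName;

-- Step 2: insert a new current row for new and changed contractors
-- (changed contractors no longer have a current row after step 1).
INSERT INTO dbo.DimContractor
        (ContractorId, ContractorName, ValidFrom, ValidTo, IsCurrent)
SELECT  src.ContractorId, src.ContractorName, GETDATE(), '9999-12-31', 1
FROM    stg.Contractor AS src
LEFT JOIN dbo.DimContractor AS dim
       ON  dim.ContractorId = src.ContractorId
       AND dim.IsCurrent    = 1
WHERE   dim.ContractorId IS NULL;
```

Each contractor row thus carries a ValidFrom/ValidTo pair, so the model can report against the name that applied at the time of the transaction.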
The Semantic Model is now extended to include SCD2 history tracked data.
Use case 3 – add live data streams to live visualizations
High volume data from the traffic sensors and social media is ingested by Event Hubs and passed to the complex event processing engine, Stream Analytics, from where queries push the data directly into Power BI for live visualization alongside the existing visualizations.
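A Stream Analytics query is essentially SQL over a time window. The sketch below is hypothetical (the input, output and column names are placeholders), but it shows the shape: vehicle counts per intersection over a 60-second tumbling window, pushed straight to a Power BI output defined on the job:

```sql
-- Count vehicles per intersection in 60-second tumbling windows and
-- push each window's result directly to the Power BI output sink.
SELECT
    IntersectionId,
    COUNT(*)         AS VehicleCount,
    System.Timestamp AS WindowEnd
INTO
    PowerBIOutput                              -- Power BI output on the job
FROM
    TrafficSensorInput TIMESTAMP BY EventTime  -- Event Hubs input
GROUP BY
    IntersectionId,
    TumblingWindow(second, 60)
```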
Alerts also become very useful in this context, when thresholds such as traffic volumes are breached.
Use case 4 – the same live data streams are passed to cost-effective storage for a historical capture of the data.
The same high volume data from the traffic sensors and social media is also pushed to cost-effective Azure Storage, rather than only directly into live visualizations in Power BI, so as to accumulate history over time. This history is now used to augment the existing visualizations to show not only what is happening right now, but also what has happened in the past.
The sheer volume of data accumulated in this way may require the help of a big data cluster, which is achieved through HDInsight.
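For example, assuming Stream Analytics lands the windowed sensor output as delimited files in a blob container (all names below are hypothetical), Hive on HDInsight can lay a table over that history and query it at scale:

```sql
-- Expose the accumulated sensor history in Azure Storage as a Hive table.
CREATE EXTERNAL TABLE TrafficHistory (
    IntersectionId STRING,
    VehicleCount   INT,
    WindowEnd      TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'wasb://traffic@mystorageaccount.blob.core.windows.net/sensor-history/';

-- Average traffic by intersection and hour of day across all history.
SELECT  IntersectionId,
        HOUR(WindowEnd)   AS HourOfDay,
        AVG(VehicleCount) AS AvgVehicles
FROM    TrafficHistory
GROUP BY IntersectionId, HOUR(WindowEnd);
```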
A note on Azure Data Sources – The diagram below (taken from https://support.powerbi.com/knowledgebase/articles/568614-azure-and-power-bi ) shows some of the Azure Data Sources accessed through Power BI (not a comprehensive list). It shows both Storage and HDInsight as sources.
Notes on big data beyond HDInsight – big data integration with Power BI can be achieved via other means too, for example Spark on HDInsight, and indeed through SQL Data Warehouse (which provides “massively parallel processing” power in the Azure ecosystem).
Use case 5 – predictive analytics
Use case 2 provides for operational and finance data in a data repository.
Use case 4 provides for a historical view on large volumes of unstructured data.
Other publicly available data sources provide historical data on weather, population demographics, macroeconomic indicators such as unemployment, etc.
This data can now be brought together and passed through Machine Learning algorithms to achieve a range of predictive outcomes. In this scenario, we train on a whole range of attributes, including planned maintenance, weather, employment and social media sentiment, and then predict traffic flows based on forecast weather, current employment, social media sentiment and planned maintenance. Proactive alerts become very relevant in this scenario.
In this scenario, predictive scores are stored in the cost-effective Azure Storage accounts from where they are included in the Visualizations and the Alerts.
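Assuming the scores land in storage in the same delimited form as the sensor history (again, all names are hypothetical, and PredictedTraffic would be an external table defined like TrafficHistory in use case 4), a simple join lets the visualizations put predictions alongside actuals:

```sql
-- Compare predicted and actual traffic per intersection and window.
SELECT  a.IntersectionId,
        a.WindowEnd,
        a.VehicleCount   AS ActualCount,
        p.PredictedCount
FROM    TrafficHistory   AS a
JOIN    PredictedTraffic AS p
    ON  p.IntersectionId = a.IntersectionId
   AND  p.WindowEnd      = a.WindowEnd;
```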
Notes on other routes for predictive analytics – Machine Learning can, of course, distribute scores to Stream Analytics and/or web services to achieve a real-time flow of predictions (either to Power BI or to other applications), or they can be passed to SQL DB or SQL DW for inclusion in other datasets.
Conclusion
You can see how the tight integration between Power BI and the Azure Analytics ecosystem services places it front and center in the Microsoft Analytics story, whether it be through the Cortana Analytics suite, or even the IoT suite (http://www.informationweek.com/cloud/infrastructure-as-a-service/microsofts-azure-iot-suite-is-now-available/d/d-id/1322388 ).
You can also see how factors such as the deep integration with topical and relevant cloud services, the simplicity with which this integration can be achieved, the constant updates that keep the product cutting edge, and the low price point make Power BI a very big deal.