Author: Etienne Oosthuysen
Welcome back to Part 2 of this blog on Azure Purview. Continuing from Part 1, in this instalment I go through a detailed review of Purview and its highlights.
Azure Purview – classifications and sensitivity
- Classifications, either Purview’s own 100+ prebuilt ones or your own (BYO) ones, mark and identify data of a specific type found within your data estate. We could easily see where all the credit card numbers and Australian Tax File Numbers were located across the data lake.
- Sensitivity labels define how sensitive certain data is, and they are applied when one or more classifications and conditions are found together. We could clearly find the data with a sensitivity of ‘secret’. An immediate application of this could be to support the IRAP data classifications defined by the Australian Signals Directorate, and PCI requirements in the financial sector.
When a scan completes, all data meeting classification rules is discoverable, whereas Purview’s sensitivity labelling needs a couple of hours to reflect the new assets and auto-label their sensitivity, after which that data is discoverable too. Insight reports are also available for classifications and sensitivity labels.
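Purview’s prebuilt classifiers are a managed service and their internal rules are not described in this review, but the general idea of matching a value against a pattern plus a check digit, then deriving a sensitivity label from the classifications found together in an asset, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Purview’s implementation: the classification names `CREDIT_CARD_NUMBER` and `AU_TAX_FILE_NUMBER`, the label names and the labelling rule are all hypothetical. It relies on two well-known check-digit schemes: the Luhn algorithm used by payment card numbers and the weighted modulus-11 check used by Australian Tax File Numbers.

```python
import re
from typing import Iterable, Set

# Hypothetical classification names; Purview's real rule names may differ.
CREDIT_CARD = "CREDIT_CARD_NUMBER"
AU_TFN = "AU_TAX_FILE_NUMBER"

def luhn_valid(digits: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def tfn_valid(digits: str) -> bool:
    """Weighted modulus-11 check for a 9-digit Australian Tax File Number."""
    weights = (1, 4, 3, 7, 5, 8, 6, 9, 10)
    if len(digits) != 9:
        return False
    return sum(int(d) * w for d, w in zip(digits, weights)) % 11 == 0

def classify(value: str) -> Set[str]:
    """Return the (hypothetical) classifications that match a single value."""
    digits = re.sub(r"[ -]", "", value)  # ignore spaces and hyphens
    found: Set[str] = set()
    if not re.fullmatch(r"[0-9]+", digits):
        return found
    if 13 <= len(digits) <= 19 and luhn_valid(digits):
        found.add(CREDIT_CARD)
    if tfn_valid(digits):
        found.add(AU_TFN)
    return found

def sensitivity_label(values: Iterable[str]) -> str:
    """Hypothetical auto-labelling rule applied to all values found in one asset."""
    found: Set[str] = set()
    for v in values:
        found |= classify(v)
    if {CREDIT_CARD, AU_TFN} <= found:  # both classifications found together
        return "Secret"
    if found:
        return "Confidential"
    return "General"
```

For example, `sensitivity_label(["4111 1111 1111 1111", "123 456 782"])` returns `"Secret"` because a valid card number and a valid TFN occur in the same asset, while an asset containing only the card number would be labelled `"Confidential"` under this made-up rule.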
Azure Purview – business glossary
We could easily employ a glossary to overlay business-friendly terms on the metadata, translating the physical vocabulary into a standard vocabulary the business can understand. Remember that data is the business’s asset more so than ICT’s, so a business vocabulary is important.
As I mentioned previously, Purview has more than 100 system classifiers that it uses during scans to classify data automatically, and it can also apply your own BYO classifications to data assets and schemas. It was easy to override these with the business glossary, and anything a human overrode was never replaced by subsequent automated scans. For example, we overrode CC# with Credit Card Number.
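The precedence rule described above, where a human override survives later automated scans, can be illustrated with a small data structure. This is a hypothetical sketch, not how Purview stores its metadata: the `ColumnAnnotation` class, its fields and methods, and the `"CREDIT_CARD_NUMBER"` scan result are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ColumnAnnotation:
    """Hypothetical metadata for one column; not Purview's actual data model."""
    physical_name: str             # name as found in the source, e.g. "CC#"
    scanned: Optional[str] = None  # latest result from an automated scan
    human: Optional[str] = None    # override set by a data steward

    def apply_scan(self, detected: Optional[str]) -> None:
        # A re-scan only refreshes its own field; the human override is never touched.
        self.scanned = detected

    def override(self, term: str) -> None:
        # Record a steward's choice, e.g. a business glossary term.
        self.human = term

    def effective_label(self) -> str:
        # Precedence: human override, then scan result, then the physical name.
        return self.human or self.scanned or self.physical_name

col = ColumnAnnotation("CC#")
col.apply_scan("CREDIT_CARD_NUMBER")
col.override("Credit Card Number")
col.apply_scan("CREDIT_CARD_NUMBER")  # a later scan runs again
print(col.effective_label())
```

Keeping the scanned and human values in separate fields, rather than letting a scan overwrite a single label, is what makes the override durable in this sketch: scans can keep refreshing their own result without ever clobbering a steward’s decision.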
The glossary is also where data stewards and owners are set, two core elements of effective data governance.
Terms can also be subject to workflows and approvals so that the glossary does not become a free-for-all.
Azure Purview – show lineage at various levels
We could see a bird’s-eye view of the data estate, including very impressive lineage at the asset, column and process levels, as well as the full data map:
- At the data asset/entity level, i.e., where the entity/asset came from.
- At the column level, i.e., where the attribute came from.
- At the process level, i.e., how data was moved/orchestrated.
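Lineage at these levels is essentially a directed graph that can be walked upstream. The sketch below is a hypothetical illustration, not Purview’s lineage API: the asset paths and process names in `LINEAGE` are made up, and each edge records the process that moved the data, so one traversal answers both where an asset came from and how it was moved. The same traversal would apply unchanged to a column-level graph whose nodes are columns rather than assets.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical lineage: each node maps to the (process, source) pairs that produced it.
LINEAGE: Dict[str, List[Tuple[str, str]]] = {
    "lake/curated/sales": [
        ("join_orders_customers", "lake/raw/orders"),
        ("join_orders_customers", "lake/raw/customers"),
    ],
    "lake/raw/orders": [("ingest_orders", "crm/orders")],
    "lake/raw/customers": [("ingest_customers", "crm/customers")],
}

def trace_upstream(
    node: str, lineage: Dict[str, List[Tuple[str, str]]]
) -> Tuple[Set[str], Set[str]]:
    """Breadth-first walk upstream; returns (all source nodes, all processes involved)."""
    sources: Set[str] = set()
    processes: Set[str] = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for process, source in lineage.get(current, []):
            processes.add(process)
            if source not in sources:  # visit each source once, which also stops cycles
                sources.add(source)
                queue.append(source)
    return sources, processes

sources, processes = trace_upstream("lake/curated/sales", LINEAGE)
print(sorted(sources))
print(sorted(processes))
```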
Azure Purview – gain insights
It is also easy to gain insights across many of these concepts: data assets, glossary, scans, classifications, sensitivity and file extensions.
The images below show just a few insight examples, for the glossary, classifications and sensitivity:
Conclusion – is Azure Purview a worthy data governance tool?
Does Purview hit the mark? As I said before, at the time of writing there were some kinks Microsoft needed to sort out, and my correspondence with their product team suggests they are working on them. Even so, from a pure data cataloguing perspective, it ticks many boxes, and at a very compelling price point.
But data governance is broader than just cataloguing. Even though Purview crosses the boundary into some aspects that would not normally sit within a data catalogue (which is a good thing), other areas still require attention, notably master data, data quality and data security. But we all know this is just the first module, so watch this space!
Originally published by Etienne Oosthuysen at https://www.makingmeaning.info/post/azure-purview-does-it-fill-the-data-governance-blind-spot-for-microsoft