In today’s data landscape, organisations often find themselves operating across multiple platforms to meet diverse analytical needs. For many enterprises, this means running Databricks for data engineering and machine learning alongside Snowflake for enterprise analytics. However, this multi-platform approach can create data silos, leading to redundant data storage, complex ETL pipelines, and increased costs.
Fortunately, recent advancements have made it easier than ever to bridge these platforms. In this article, we’ll explore the options for reading Databricks Unity Catalog tables in Snowflake, with special focus on the powerful new Delta Direct feature that’s transforming cross-platform data access.
The Multi-Platform Data Challenge
Before diving into solutions, let’s understand the challenge. Many organisations adopt Databricks for its powerful data processing capabilities and ML toolkit, storing data in Delta Lake format. Simultaneously, business analysts and data consumers prefer Snowflake’s ease of use and performance for analytics. This creates a fundamental question: How do we provide seamless access to the same data across both platforms without duplication?
Traditionally, organisations dealt with this by:
- Duplicating data through ETL processes (expensive and creates latency)
- Using standard external tables (performance issues at scale)
- Converting to different formats (adds complexity)
None of these approaches is ideal, especially for utility companies dealing with massive volumes of operational data that need to be accessible for both real-time operations and business intelligence.
Three Approaches to Integration
There are three main approaches to accessing Databricks data from Snowflake:
1. Delta Direct
Delta Direct is Snowflake’s newest and most streamlined approach for reading Delta Lake tables. It enables direct querying of Delta tables without format conversion and with significantly better performance than traditional external tables.
Key advantages:
- Simplified architecture with direct access to Delta tables
- Improved query performance compared to external tables
- No need for data conversion
- Cost efficiency through reduced data movement
- Near real-time analytics capabilities
Limitations:
- Read-only integration (no write capability from Snowflake)
- Requires scheduling metadata refreshes
- Currently doesn’t support deletion vectors
2. Apache Iceberg with UniForm
This approach leverages Databricks’ Universal Format (UniForm) feature to generate Iceberg metadata alongside Delta metadata, allowing Snowflake to read Delta tables through its Iceberg integration.
Key advantages:
- One copy of the data accessible by both platforms
- Performance comparable to native Snowflake tables
- No data duplication required
Limitations:
- Asynchronous metadata generation can cause inconsistencies
- Limited support for certain data types
- Advanced Delta features are not fully available
- No write operations from Snowflake
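On the Databricks side, UniForm is enabled through Delta table properties. The sketch below is a minimal illustration, assuming a hypothetical Unity Catalog table energy_data.smart_meter.meter_readings; confirm the exact property names and the required Databricks Runtime version against the current Databricks documentation.

-- Run in Databricks SQL (hypothetical table name): enable Iceberg metadata generation via UniForm
ALTER TABLE energy_data.smart_meter.meter_readings
  SET TBLPROPERTIES (
    'delta.enableIcebergCompatV2' = 'true',
    'delta.universalFormat.enabledFormats' = 'iceberg'
  );

Once UniForm is active, the Iceberg metadata is generated asynchronously after each Delta commit, which is the source of the metadata-lag caveat noted above.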
3. Native Apache Iceberg
This approach involves using Apache Iceberg directly within Databricks and connecting to these tables from Snowflake using Iceberg catalog integration.
Key advantages:
- Complete snapshot capture for transaction history
- Robust time-travel capabilities
- Support for all Iceberg data types
- Write operations supported from Iceberg-compatible tools
Limitations:
- MERGE INTO operations not supported in Databricks’ Spark
- Requires additional configuration in Databricks
- No access to Delta-specific features
- More complex metadata management
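To give a sense of what this looks like from the Snowflake side, the sketch below points an Iceberg REST catalog integration at Unity Catalog’s Iceberg endpoint. Treat it as indicative only: the endpoint path, parameter names, and authentication options vary between Snowflake and Databricks versions, and every placeholder value is hypothetical.

-- Indicative sketch only: verify parameter names against current Snowflake documentation
CREATE OR REPLACE CATALOG INTEGRATION UNITY_ICEBERG_REST
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'smart_meter'   -- default namespace (schema) to resolve tables against
  REST_CONFIG = (
    CATALOG_URI = 'https://<workspace-url>/api/2.1/unity-catalog/iceberg'   -- Unity Catalog Iceberg REST endpoint
    CATALOG_NAME = '<unity-catalog-name>'
  )
  REST_AUTHENTICATION = (
    TYPE = BEARER
    BEARER_TOKEN = '<databricks-personal-access-token>'
  )
  ENABLED = TRUE;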
Implementing Delta Direct with Azure Databricks
Let’s walk through a practical implementation of Delta Direct with Azure Databricks.
Step 1: Create an External Volume in Snowflake
First, create an external volume pointing to the storage location where your Delta tables reside:
-- Switch to ACCOUNTADMIN role
USE ROLE ACCOUNTADMIN;

-- Create external volume pointing to Azure storage
CREATE OR REPLACE EXTERNAL VOLUME DELTA_VOLUME_AZURE
  STORAGE_LOCATIONS = (
    (
      NAME = 'delta-azure-eastus'
      STORAGE_PROVIDER = 'AZURE'
      STORAGE_BASE_URL = 'abfss://<container>@<storage-account-name>.dfs.core.windows.net/delta'
      AZURE_TENANT_ID = '<azure-tenant-id>'
      STORAGE_AZURE_MANAGED_IDENTITY = '<azure-managed-identity>'
    )
  )
  ALLOW_WRITES = TRUE;

-- Grant usage permissions
GRANT USAGE ON EXTERNAL VOLUME DELTA_VOLUME_AZURE TO ROLE SYSADMIN;
After creating the volume, you’ll need to configure authentication between Snowflake and Azure:
- Note the consent URL from the external volume creation output
- Visit the URL in a browser, log in with an Azure administrator account
- Grant the requested permissions
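If you need to retrieve the consent URL again later, it is exposed in the external volume’s metadata. A quick check (property names may vary slightly between Snowflake releases):

-- Inspect the external volume to find the Azure consent URL and Snowflake's multi-tenant app name
DESC EXTERNAL VOLUME DELTA_VOLUME_AZURE;
-- Look for AZURE_CONSENT_URL and AZURE_MULTI_TENANT_APP_NAME in the output, then grant that
-- application an appropriate role on the storage account (for example, Storage Blob Data Reader).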
Step 2: Create a Delta Catalog Integration
Next, create a catalog integration to enable reading Delta table metadata:
USE ROLE ACCOUNTADMIN;
-- Create catalog integration for Delta tables
CREATE OR REPLACE CATALOG INTEGRATION DELTA_CATALOG_INTEGRATION
  CATALOG_SOURCE = OBJECT_STORE
  TABLE_FORMAT = DELTA
  ENABLED = TRUE;

-- Grant usage permissions
GRANT USAGE ON INTEGRATION DELTA_CATALOG_INTEGRATION TO ROLE SYSADMIN;
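It can be worth confirming the integration exists and is enabled before moving on:

-- Optional verification
SHOW CATALOG INTEGRATIONS LIKE 'DELTA_CATALOG_INTEGRATION';
DESCRIBE CATALOG INTEGRATION DELTA_CATALOG_INTEGRATION;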
Step 3: Create a Snowflake Table Pointing to the Delta Table
Now, create a Snowflake table that points to your Delta table:
USE ROLE SYSADMIN;
USE DATABASE ENERGY_DATA;
USE SCHEMA SMART_METER;
-- Create table with volume and catalog integration
CREATE OR REPLACE ICEBERG TABLE meter_readings
  CATALOG = DELTA_CATALOG_INTEGRATION
  EXTERNAL_VOLUME = DELTA_VOLUME_AZURE
  BASE_LOCATION = 'meter_readings';
The BASE_LOCATION should be the relative path under the STORAGE_BASE_URL specified in the external volume.
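Before automating refreshes, a one-off refresh and a quick row count are a simple way to confirm that the table resolves to the expected Delta data:

-- Pull in the latest Delta metadata and sanity-check the table
ALTER ICEBERG TABLE meter_readings REFRESH;
SELECT COUNT(*) AS row_count FROM meter_readings;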
Step 4: Set Up Automatic Refresh for Delta Table Metadata
To ensure access to the latest data, set up a task to periodically refresh the metadata:
-- Create refresh task
CREATE OR REPLACE TASK refresh_meter_readings
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
  SCHEDULE = 'USING CRON */15 * * * * America/Los_Angeles'  -- Every 15 minutes
AS
BEGIN
  ALTER ICEBERG TABLE meter_readings REFRESH;
END;

-- Activate the task
ALTER TASK refresh_meter_readings RESUME;
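Once the task is running, its executions can be monitored from the task history. A minimal sketch (column availability may vary by Snowflake edition):

-- Check recent runs of the refresh task
SELECT name, state, scheduled_time, completed_time, error_message
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(TASK_NAME => 'REFRESH_METER_READINGS'))
ORDER BY scheduled_time DESC
LIMIT 20;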
Step 5: Query the Delta Table from Snowflake
Now you can query the Delta table directly from Snowflake:
-- Example query for meter readings analysis
SELECT
    reading_date,
    meter_id,
    AVG(consumption_kwh) AS avg_consumption,
    MAX(consumption_kwh) AS peak_consumption
FROM meter_readings
WHERE reading_date >= DATEADD(day, -7, CURRENT_DATE())
GROUP BY reading_date, meter_id
ORDER BY reading_date, avg_consumption DESC;
Implementation Best Practices
Based on our experience implementing cross-platform data access, we recommend the following best practices:
- Partition your Delta tables strategically: Partitioning by date, region, or other commonly filtered fields that match your query patterns will improve performance significantly; avoid very high-cardinality partition columns, which tend to produce large numbers of small files.
- Optimise refresh schedules: Align metadata refresh schedules with your data update frequency. For near real-time data, consider more frequent refreshes (5-15 minutes), while daily batch loads may only need a single refresh after the load completes.
- Implement proper RBAC: Ensure your Snowflake roles have appropriate access to the external volumes and catalog integrations, following the principle of least privilege.
- Monitor query performance: Regularly analyse query performance to identify optimisation opportunities. Snowflake’s query history can help identify access patterns that might benefit from different partitioning or indexing strategies in Databricks.
- Create optimised views: Consider creating materialised views or regular views in Snowflake that optimise common query patterns against your Delta tables.
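As an illustration of the last point, here is a simple view over the meter_readings table from the walkthrough above (the view name and aggregation are illustrative, not part of the source data model):

-- Hypothetical pre-aggregated view for dashboard-style queries
CREATE OR REPLACE VIEW daily_meter_consumption AS
SELECT
    reading_date,
    meter_id,
    SUM(consumption_kwh) AS total_consumption_kwh,
    MAX(consumption_kwh) AS peak_consumption_kwh
FROM meter_readings
GROUP BY reading_date, meter_id;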
Conclusion: Breaking Down Data Silos
The ability to directly query Delta tables from Snowflake represents a significant advancement in breaking down data silos between Databricks and Snowflake environments. For data-intensive companies, Delta Direct offers a compelling solution that provides:
- Seamless access to operational data across platforms
- Improved performance for analytics workloads
- Reduced storage costs by eliminating data duplication
- Simplified data architecture
As data environments continue to evolve, the bridges between platforms will become increasingly important. By implementing Delta Direct today, organisations can establish a foundation for a more integrated, efficient, and cost-effective data ecosystem.
Whether you’re just starting your multi-platform journey or looking to optimise an existing implementation, the approaches outlined in this article provide practical solutions for breaking down the barriers between your Databricks and Snowflake environments.
If you have any questions or need assistance solving your Data Challenges, contact our friendly team for a no-obligation discussion.