In today’s data landscape, organisations often find themselves operating across multiple platforms to meet diverse analytical needs. For many enterprises, this means running Databricks for data engineering and machine learning alongside Snowflake for enterprise analytics. However, this multi-platform approach can create data silos, leading to redundant data storage, complex ETL pipelines, and increased costs.
Fortunately, recent advancements have made it easier than ever to bridge these platforms. In this article, we’ll explore the options for reading Databricks Unity Catalog tables in Snowflake, with special focus on the powerful new Delta Direct feature that’s transforming cross-platform data access.
The Multi-Platform Data Challenge
Before diving into solutions, let’s understand the challenge. Many organisations adopt Databricks for its powerful data processing capabilities and ML toolkit, storing data in Delta Lake format. Simultaneously, business analysts and data consumers prefer Snowflake’s ease of use and performance for analytics. This creates a fundamental question: How do we provide seamless access to the same data across both platforms without duplication?
Traditionally, organisations dealt with this by:
- Duplicating data through ETL processes (expensive and creates latency)
- Using standard external tables (performance issues at scale)
- Converting to different formats (adds complexity)
None of these approaches is ideal, especially for utility companies dealing with massive volumes of operational data that need to be accessible for both real-time operations and business intelligence.
Three Approaches to Integration
There are three main approaches to accessing Databricks data from Snowflake:
1. Delta Direct
Delta Direct is Snowflake’s newest and most streamlined approach for reading Delta Lake tables. It enables direct querying of Delta tables without format conversion and with significantly better performance than traditional external tables.
Key advantages:
- Simplified architecture with direct access to Delta tables
- Improved query performance compared to external tables
- No need for data conversion
- Cost efficiency through reduced data movement
- Near real-time analytics capabilities
Limitations:
- Read-only integration (no write capability from Snowflake)
- Requires scheduling metadata refreshes
- Currently doesn’t support deletion vectors
2. Apache Iceberg with UniForm
This approach leverages Databricks’ Universal Format (UniForm) feature to generate Iceberg metadata alongside Delta metadata, allowing Snowflake to read Delta tables through its Iceberg integration.
Key advantages:
- One copy of the data accessible by both platforms
- Performance comparable to native Snowflake tables
- No data duplication required
Limitations:
- Asynchronous metadata generation can cause inconsistencies
- Limited support for certain data types
- Advanced Delta features are not fully available
- No write operations from Snowflake
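On the Databricks side, UniForm is enabled through Delta table properties. The sketch below is a minimal illustration, assuming a hypothetical Unity Catalog table energy_data.smart_meter.meter_readings; confirm the exact property names and the required Databricks Runtime version against the current Databricks documentation.

-- Run in Databricks SQL (hypothetical table name): enable Iceberg metadata generation via UniForm
ALTER TABLE energy_data.smart_meter.meter_readings
  SET TBLPROPERTIES (
    'delta.enableIcebergCompatV2' = 'true',
    'delta.universalFormat.enabledFormats' = 'iceberg'
  );

Once UniForm is active, the Iceberg metadata is generated asynchronously after each Delta commit, which is the source of the metadata-lag caveat noted above.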
3. Native Apache Iceberg
This approach involves using Apache Iceberg directly within Databricks and connecting to these tables from Snowflake using Iceberg catalog integration.
Key advantages:
- Complete snapshot capture for transaction history
- Robust time-travel capabilities
- Support for all Iceberg data types
- Write operations supported from Iceberg-compatible tools
Limitations:
- MERGE INTO operations not supported in Databricks’ Spark
- Requires additional configuration in Databricks
- No access to Delta-specific features
- More complex metadata management
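To give a sense of what this looks like from the Snowflake side, the sketch below points an Iceberg REST catalog integration at Unity Catalog’s Iceberg endpoint. Treat it as indicative only: the endpoint path, parameter names, and authentication options vary between Snowflake and Databricks versions, and every placeholder value is hypothetical.

-- Indicative sketch only: verify parameter names against current Snowflake documentation
CREATE OR REPLACE CATALOG INTEGRATION UNITY_ICEBERG_REST
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'smart_meter'   -- default namespace (schema) to resolve tables against
  REST_CONFIG = (
    CATALOG_URI = 'https://<workspace-url>/api/2.1/unity-catalog/iceberg'   -- Unity Catalog Iceberg REST endpoint
    CATALOG_NAME = '<unity-catalog-name>'
  )
  REST_AUTHENTICATION = (
    TYPE = BEARER
    BEARER_TOKEN = '<databricks-personal-access-token>'
  )
  ENABLED = TRUE;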
Implementing Delta Direct with Azure Databricks
Let’s walk through a practical implementation of Delta Direct with Azure Databricks.
Step 1: Create an External Volume in Snowflake
First, create an external volume pointing to the storage location where your Delta tables reside:
-- Switch to ACCOUNTADMIN role
USE ROLE ACCOUNTADMIN;

-- Create external volume pointing to Azure storage
CREATE OR REPLACE EXTERNAL VOLUME DELTA_VOLUME_AZURE
  STORAGE_LOCATIONS = (
    (
      NAME = 'delta-azure-eastus'
      STORAGE_PROVIDER = 'AZURE'
      STORAGE_BASE_URL = 'abfss://<container>@<storage-account-name>.dfs.core.windows.net/delta'
      AZURE_TENANT_ID = '<azure-tenant-id>'
      STORAGE_AZURE_MANAGED_IDENTITY = '<azure-managed-identity>'
    )
  )
  ALLOW_WRITES = TRUE;

-- Grant usage permissions
GRANT USAGE ON EXTERNAL VOLUME DELTA_VOLUME_AZURE TO ROLE SYSADMIN;
After creating the volume, you’ll need to configure authentication between Snowflake and Azure:
- Note the consent URL from the external volume creation output
- Visit the URL in a browser, log in with an Azure administrator account
- Grant the requested permissions
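If you need to retrieve the consent URL again later, it is exposed in the external volume’s metadata. A quick check (property names may vary slightly between Snowflake releases):

-- Inspect the external volume to find the Azure consent URL and Snowflake's multi-tenant app name
DESC EXTERNAL VOLUME DELTA_VOLUME_AZURE;
-- Look for AZURE_CONSENT_URL and AZURE_MULTI_TENANT_APP_NAME in the output, then grant that
-- application an appropriate role on the storage account (for example, Storage Blob Data Reader).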
Step 2: Create a Delta Catalog Integration
Next, create a catalog integration to enable reading Delta table metadata:
USE ROLE ACCOUNTADMIN;
-- Create catalog integration for Delta tables
CREATE OR REPLACE CATALOG INTEGRATION DELTA_CATALOG_INTEGRATION
  CATALOG_SOURCE = OBJECT_STORE
  TABLE_FORMAT = DELTA
  ENABLED = TRUE;

-- Grant usage permissions
GRANT USAGE ON INTEGRATION DELTA_CATALOG_INTEGRATION TO ROLE SYSADMIN;
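It can be worth confirming the integration exists and is enabled before moving on:

-- Optional verification
SHOW CATALOG INTEGRATIONS LIKE 'DELTA_CATALOG_INTEGRATION';
DESCRIBE CATALOG INTEGRATION DELTA_CATALOG_INTEGRATION;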
Step 3: Create a Snowflake Table Pointing to the Delta Table
Now, create a Snowflake table that points to your Delta table:
USE ROLE SYSADMIN;
USE DATABASE ENERGY_DATA;
USE SCHEMA SMART_METER;
-- Create table with volume and catalog integration
CREATE OR REPLACE ICEBERG TABLE meter_readings
  CATALOG = DELTA_CATALOG_INTEGRATION
  EXTERNAL_VOLUME = DELTA_VOLUME_AZURE
  BASE_LOCATION = 'meter_readings';
The BASE_LOCATION should be the relative path under the STORAGE_BASE_URL specified in the external volume.
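Before automating refreshes, a one-off refresh and a quick row count are a simple way to confirm that the table resolves to the expected Delta data:

-- Pull in the latest Delta metadata and sanity-check the table
ALTER ICEBERG TABLE meter_readings REFRESH;
SELECT COUNT(*) AS row_count FROM meter_readings;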
Step 4: Set Up Automatic Refresh for Delta Table Metadata
To ensure access to the latest data, set up a task to periodically refresh the metadata:
-- Create refresh task
CREATE OR REPLACE TASK refresh_meter_readings
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
  SCHEDULE = 'USING CRON */15 * * * * America/Los_Angeles'  -- Every 15 minutes
AS
BEGIN
  ALTER ICEBERG TABLE meter_readings REFRESH;
END;

-- Activate the task
ALTER TASK refresh_meter_readings RESUME;
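Once the task is running, its executions can be monitored from the task history. A minimal sketch (column availability may vary by Snowflake edition):

-- Check recent runs of the refresh task
SELECT name, state, scheduled_time, completed_time, error_message
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(TASK_NAME => 'REFRESH_METER_READINGS'))
ORDER BY scheduled_time DESC
LIMIT 20;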
Step 5: Query the Delta Table from Snowflake
Now you can query the Delta table directly from Snowflake:
-- Example query for meter readings analysis
SELECT
    reading_date,
    meter_id,
    AVG(consumption_kwh) AS avg_consumption,
    MAX(consumption_kwh) AS peak_consumption
FROM meter_readings
WHERE reading_date >= DATEADD(day, -7, CURRENT_DATE())
GROUP BY reading_date, meter_id
ORDER BY reading_date, avg_consumption DESC;
Implementation Best Practices
Based on our experience implementing cross-platform data access, we recommend the following best practices:
- Partition your Delta tables strategically: Partitioning by date, region, or other commonly filtered fields that match your query patterns will improve performance significantly; avoid very high-cardinality partition columns, which tend to produce large numbers of small files.
- Optimise refresh schedules: Align metadata refresh schedules with your data update frequency. For near real-time data, consider more frequent refreshes (5-15 minutes), while daily batch loads may only need a single refresh after the load completes.
- Implement proper RBAC: Ensure your Snowflake roles have appropriate access to the external volumes and catalog integrations, following the principle of least privilege.
- Monitor query performance: Regularly analyse query performance to identify optimisation opportunities. Snowflake’s query history can help identify access patterns that might benefit from different partitioning or indexing strategies in Databricks.
- Create optimised views: Consider creating materialised views or regular views in Snowflake that optimise common query patterns against your Delta tables.
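As an illustration of the last point, here is a simple view over the meter_readings table from the walkthrough above (the view name and aggregation are illustrative, not part of the source data model):

-- Hypothetical pre-aggregated view for dashboard-style queries
CREATE OR REPLACE VIEW daily_meter_consumption AS
SELECT
    reading_date,
    meter_id,
    SUM(consumption_kwh) AS total_consumption_kwh,
    MAX(consumption_kwh) AS peak_consumption_kwh
FROM meter_readings
GROUP BY reading_date, meter_id;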
Conclusion: Breaking Down Data Silos
The ability to directly query Delta tables from Snowflake represents a significant advancement in breaking down data silos between Databricks and Snowflake environments. For data-intensive companies, Delta Direct offers a compelling solution that provides:
- Seamless access to operational data across platforms
- Improved performance for analytics workloads
- Reduced storage costs by eliminating data duplication
- Simplified data architecture
As data environments continue to evolve, the bridges between platforms will become increasingly important. By implementing Delta Direct today, organisations can establish a foundation for a more integrated, efficient, and cost-effective data ecosystem.
Whether you’re just starting your multi-platform journey or looking to optimise an existing implementation, the approaches outlined in this article provide practical solutions for breaking down the barriers between your Databricks and Snowflake environments.
If you have any questions or need assistance solving your Data Challenges, contact our friendly team for a no-obligation discussion.