We’re excited to announce the Public Preview of LakeFlow Connect for SQL Server, Salesforce, and Workday. These ingestion connectors enable simple and efficient ingestion from databases and enterprise apps, powered by incremental data processing and smart optimizations under the hood. LakeFlow Connect is also native to the Data Intelligence Platform, so it offers both serverless compute and Unity Catalog governance. Ultimately, this means organizations can spend less time moving their data and more time getting value from it.
More broadly, this is a key step towards realizing the future of data engineering on Databricks with LakeFlow: the unified solution for ingestion, transformation and orchestration that we announced at Data + AI Summit. LakeFlow Connect will work seamlessly with LakeFlow Pipelines for transformation and LakeFlow Jobs for orchestration. Together, these will enable customers to deliver fresher and higher-quality data to their businesses.
Challenges in data ingestion
Organizations have a wide range of data sources: enterprise apps, databases, message buses, cloud storage, and more. To address the nuances of each source, they often build and maintain custom ingestion pipelines, which introduces several challenges.
- Complex configuration and maintenance: It’s difficult to connect to databases, especially without impacting the source system. It’s also hard to learn and keep up with ever-changing application APIs. Therefore, custom pipelines require a lot of effort to build, optimize, and maintain, which can, in turn, limit performance and increase costs.
- Dependencies on specialized teams: Given this complexity, ingestion pipelines often require highly skilled data engineers. This means that data consumers (e.g., HR analysts and financial planners) depend on specialized engineering teams, thus limiting productivity and innovation.
- Patchwork solutions with limited governance: With a patchwork of pipelines, it’s hard to build governance, access control, observability, and lineage. This opens the door to security risks and compliance challenges, as well as difficulties in troubleshooting any issues.
LakeFlow Connect: simple and efficient ingestion for every team
LakeFlow Connect addresses these challenges so that any practitioner can easily build incremental data pipelines at scale.
LakeFlow Connect is simple to configure and maintain
To start, the connectors take only a few steps to set up. Moreover, once you’ve set up a connector, it’s fully managed by Databricks. This lowers the cost of maintenance. It also means that ingestion no longer requires specialized knowledge, and that data can be democratized across your organization.
“The Salesforce connector was simple to set up and provides the ability to sync data to our data lake. This has saved a great deal of development time and ongoing support time, making our migration faster”
— Martin Lee, Technology Lead Software Engineer, Ruffer
LakeFlow Connect is efficient
Under the hood, LakeFlow Connect pipelines are built on Delta Live Tables, which are designed for efficient incremental processing. Moreover, many of the connectors read and write only the data that’s changed in the source system. Finally, we leverage Arcion’s source-specific technology to optimize each connector for performance and reliability while also limiting impact on the source system.
Because ingestion is just the first step, we don’t stop there. You can also construct efficient materialized views that incrementally transform your data as it works its way through the medallion architecture. Specifically, Delta Live Tables can process updates to your views incrementally, updating only the rows that need to change rather than fully recomputing all rows. Over time, this can significantly improve the performance of your transformations, which in turn makes your end-to-end ETL pipelines that much more efficient.
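To make the idea of incremental processing concrete, here is a minimal, self-contained Python sketch. It is an illustration only and does not use the LakeFlow Connect or Delta Live Tables APIs. The `Change` record, the in-memory `feed`, and the helper names are all hypothetical. It shows two things: reading only the changes recorded after a stored cursor, and adjusting an aggregate by the delta instead of recomputing it from every row.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One record from a hypothetical source change feed (illustrative only)."""
    version: int      # monotonically increasing position in the feed
    op: str           # "upsert" or "delete"
    key: str          # primary key of the affected row
    region: str = ""  # unused for deletes
    amount: int = 0   # unused for deletes


def read_changes_since(feed, cursor):
    """Return only the changes recorded after the stored cursor."""
    return [c for c in feed if c.version > cursor]


def apply_changes(table, totals, changes, cursor):
    """Apply changes to the target table and adjust per-region totals by the delta.

    Only rows named in `changes` are touched. A region whose rows are all
    deleted keeps a total of 0 in this sketch.
    """
    for c in sorted(changes, key=lambda c: c.version):
        old = table.pop(c.key, None)
        if old is not None:
            old_region, old_amount = old
            totals[old_region] -= old_amount  # retract the row's old contribution
        if c.op == "upsert":
            table[c.key] = (c.region, c.amount)
            totals[c.region] += c.amount      # add the row's new contribution
        cursor = c.version
    return cursor


def full_recompute(table):
    """Baseline for comparison: rebuild the totals by scanning every row."""
    totals = defaultdict(int)
    for region, amount in table.values():
        totals[region] += amount
    return totals


# Initial load: two rows arrive and the cursor advances to version 2.
feed = [
    Change(1, "upsert", "a", "EMEA", 100),
    Change(2, "upsert", "b", "APAC", 50),
]
table, totals = {}, defaultdict(int)
cursor = apply_changes(table, totals, read_changes_since(feed, 0), 0)

# Later run: only the three new changes are read and applied.
feed += [
    Change(3, "upsert", "a", "EMEA", 120),
    Change(4, "delete", "b"),
    Change(5, "upsert", "c", "APAC", 70),
]
pending = read_changes_since(feed, cursor)
cursor = apply_changes(table, totals, pending, cursor)
```

In this toy model the second run does work proportional to the three pending changes rather than to the size of the table, and the incrementally maintained totals match what `full_recompute` would produce. A managed connector also has to handle concerns this sketch ignores, such as schema changes, retries, and ordering guarantees.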
“The connector enhances our ability to transfer data by providing a seamless and robust integration between Salesforce and Databricks. […] The time required to extract and prepare data has been reduced from approximately 3 hours to just 30 minutes”
— Amber Howdle-Fitton, Data and Analytics Manager, Kotahi
LakeFlow Connect is native to the Data Intelligence Platform
LakeFlow Connect is fully integrated with the rest of your Databricks tooling. Like the rest of your data and AI assets, it’s governed by Unity Catalog, powered by Delta Live Tables using serverless compute, and orchestrated with Databricks Workflows. This enables features like unified monitoring across your ingestion pipelines. Moreover, because it’s all part of the same platform, you can then use Databricks SQL, AI/BI and Mosaic AI to get the most out of your data.
“With Databricks’ new LakeFlow Connector for SQL Server, we can eliminate […] intermediary products between our source database and Databricks. This means faster data ingestion, reduced costs, and less effort spent configuring, maintaining, and monitoring third-party CDC solutions. This feature will greatly benefit us by streamlining our data pipeline.”
— Kun Lee, Senior Director Database Administrator, CoStar
An exciting LakeFlow roadmap
The first wave of connectors can create SQL Server, Salesforce, and Workday pipelines via API. But this Public Preview is only the beginning. In the coming months, we plan to begin Private Previews of connectors to additional data sources, such as:
- ServiceNow
- Google Analytics 4
- SharePoint
- PostgreSQL
- SQL Server on-premises
The roadmap also includes a deeper feature set for each connector. This may include:
- UI for connector creation
- Data lineage
- SCD type 2
- Robust schema evolution
- Data sampling
More broadly, LakeFlow Connect is just the first component of LakeFlow. Later this year, we plan to preview LakeFlow Pipelines for transformation and LakeFlow Jobs for orchestration (the evolution of Delta Live Tables and Workflows, respectively). Once they’re available, they will not require any migration. The best way to prepare for these new additions is to start using Delta Live Tables and Workflows today.
Getting started with LakeFlow Connect
SQL Server connector: Supports ingestion from Azure SQL Database and AWS RDS for SQL Server, with incremental reads that use change data capture (CDC) and change tracking technology. Learn more about the SQL Server connector.
Salesforce connector: Supports ingestion from Salesforce Sales Cloud, allowing you to join these CRM insights with data in the Data Intelligence Platform to deliver additional insights and more accurate predictions. Learn more about the Salesforce connector.
Workday connector: Supports ingestion from Workday Reports-as-a-Service (RaaS), allowing you to analyze and enrich your reports. Learn more about the Workday connector.
“The Salesforce connector provided in LakeFlow Connect has been crucial for us, enabling direct connections to our Salesforce databases and eliminating the need for an additional paid intermediate service.”
— Amine Hadj-Youcef, Solution Architect, Engie
To get access to the preview, contact your Databricks account team.
Note that LakeFlow Connect uses serverless compute for Delta Live Tables. Therefore:
- Serverless compute must be enabled in your account (see how to do so for Azure or AWS, and see a list of serverless-enabled regions for Azure or AWS).
- Your workspace must be enabled for Unity Catalog.
For further guidance, refer to the LakeFlow Connect documentation.