At Databricks, we know that data is one of your most valuable assets. Our product and security teams work together to deliver an enterprise-grade Data Intelligence Platform that enables you to defend against security risks and meet your compliance obligations. Over the past year, we’re proud to have delivered new capabilities and resources such as securing data access with Azure Private Link for Databricks SQL Serverless, keeping data private with Azure firewall support for Workspace storage, protecting data in use with Azure confidential computing, achieving FedRAMP High Agency ATO on AWS GovCloud, publishing the Databricks AI Security Framework, and sharing details on our approach to Responsible AI.
According to the 2024 Verizon Data Breach Investigations Report, the number of data breaches has increased by 30% since last year. We believe it is crucial for you to understand and appropriately use our security features and adopt recommended security best practices to mitigate data breach risks effectively.
In this blog, we’ll explain how you can leverage some of our platform’s top controls and recently released security features to establish a robust defense-in-depth posture that protects your data and AI assets. We will also provide an overview of our security best practices resources so that you can get up and running quickly.
Protect your data and AI workloads across the Databricks Data Intelligence Platform
The Databricks Platform provides security guardrails to defend against account takeover and data exfiltration risks at each access point. In the image below, we outline a typical lakehouse architecture on Databricks with 3 surfaces to secure:
- Your clients, users and applications, connecting to Databricks
- Your workloads connecting to Databricks services (APIs)
- Your data being accessed from your Databricks workloads
Let’s now walk through, at a high level, some of the top controls (either enabled by default or available for you to turn on) and new security capabilities for each connection point. Our full list of recommendations based on different threat models can be found in our security best practice guides.
Connecting users and applications into Databricks (1)
To protect against access-related risks, you should use multiple factors for both authentication and authorization of users and applications into Databricks. Using only passwords is inadequate because of their susceptibility to theft, phishing, and weak user management. In fact, as of July 10, 2024, Databricks-managed passwords reached end-of-life and are no longer supported in the UI or via API authentication. Beyond this additional default security, we advise you to implement the controls below:
- Authenticate via single sign-on at the account level for all user access (AWS, SSO is automatically enabled on Azure/GCP)
- Leverage multi-factor authentication offered by your IdP to verify all users and applications that are accessing Databricks (AWS, Azure, GCP)
- Enable unified login for all workspaces using a single account-level SSO and configure SSO Emergency access with MFA for streamlined and secure access management (AWS, Databricks integrates with built-in identity providers on Azure/GCP)
- Use front-end private link on workspaces to restrict access to trusted private networks (AWS, Azure, GCP)
- Configure IP access lists on workspaces and for your account to only allow access from trusted network locations, such as your corporate network (AWS, Azure, GCP)
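
To make the IP access list control more concrete, here is a minimal Python sketch that validates a set of trusted CIDR ranges and assembles a request body for an allow list. The field names (`label`, `list_type`, `ip_addresses`) follow our understanding of the workspace IP access lists REST API; the helper names, the label, and the example ranges (taken from documentation-reserved address space) are illustrative assumptions, and the sketch deliberately stops short of sending the request. Please check the API reference for your cloud before using it.

```python
import ipaddress


def build_ip_access_list_payload(label, cidrs, list_type="ALLOW"):
    """Validate CIDR ranges and build a request body for an IP access list.

    Assumption: the body shape (label / list_type / ip_addresses) mirrors the
    workspace IP access lists API; confirm against the current API reference.
    """
    if list_type not in ("ALLOW", "BLOCK"):
        raise ValueError(f"unsupported list_type: {list_type!r}")
    for cidr in cidrs:
        # Raises ValueError if the entry is not a valid IP address or CIDR range.
        ipaddress.ip_network(cidr, strict=False)
    return {"label": label, "list_type": list_type, "ip_addresses": list(cidrs)}


def is_ip_covered(client_ip, payload):
    """Local sanity check: does client_ip fall inside any listed range?"""
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(cidr, strict=False)
        for cidr in payload["ip_addresses"]
    )


# Hypothetical corporate egress ranges (documentation-reserved addresses).
corporate_allow_list = build_ip_access_list_payload(
    label="corporate-network",
    cidrs=["203.0.113.0/24", "198.51.100.7"],
)
```

Running a check such as `is_ip_covered("203.0.113.42", corporate_allow_list)` before applying the list is a simple way to avoid locking yourself out by omitting your own address.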
Connecting your workloads to Databricks services (2)
To prevent workload impersonation, Databricks authenticates workloads with multiple credentials during the lifecycle of the cluster. Our recommendations and available controls depend on your deployment architecture. At a high level:
- For Classic clusters that run on your network, we recommend configuring a back-end private link between the compute plane and the control plane. Configuring the back-end private link ensures that your cluster can only be authenticated over that dedicated and private channel.
- For Serverless, Databricks automatically provides a defense-in-depth security posture on our platform using a combination of application-level credentials, mTLS client certificates and private links to mitigate Workspace impersonation risks.
Connecting from Databricks to your storage and data sources (3)
To ensure that data can only be accessed by the right user and workload on the right Workspace, and that workloads can only write to authorized storage locations, we recommend leveraging the following features:
- Using Unity Catalog to govern access to data: Unity Catalog provides several layers of protection, including fine-grained access controls and time-bound down-scoped credentials that are only accessible to trusted code by default.
- Leveraging Mosaic AI Gateway: Now in Public Preview, Mosaic AI Gateway allows you to monitor and control the usage of both external models and models hosted on Databricks across your enterprise.
- Configuring access from authorized networks: You can configure access policies using S3 bucket policies on AWS, Azure storage firewall and VPC Service Controls on GCP.
- With Classic clusters, you can lock down access to your network via the above-listed controls.
- With Serverless, you can lock down access to the Serverless network (AWS, Azure) or to a dedicated private endpoint on Azure. On Azure, you can now enable the storage firewall for your Workspace storage (DBFS root) account.
- Resources external to Databricks, such as external models or storage accounts, can be configured with dedicated and private connectivity. Here is a deployment guide for accessing Azure OpenAI, one of our most requested scenarios.
- Configuring egress controls to prevent access to unauthorized storage locations: With Classic clusters, you can configure egress controls on your network. With SQL Serverless, Databricks does not allow internet access from untrusted code such as Python UDFs. To learn how we’re enhancing egress controls as you adopt more Serverless products, please fill out this form to join our previews.
The diagram below outlines how you can configure a private and secure environment for processing your data as you adopt Databricks Serverless products. As described above, multiple layers of protection secure all access to and from this environment.
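
As an illustration of the “access from authorized networks” control on AWS, the Python sketch below assembles an S3 bucket policy that denies any request not arriving through a specific VPC endpoint, using the standard `aws:SourceVpce` condition key. The bucket name and endpoint ID are hypothetical placeholders, and this is a simplified pattern under those assumptions rather than a complete Databricks-recommended policy. A blanket deny like this also blocks console and other out-of-network access, so a real policy usually needs carefully scoped exceptions.

```python
import json


def build_vpce_only_bucket_policy(bucket_name, vpc_endpoint_id):
    """Build an S3 bucket policy that denies access from outside one VPC endpoint.

    Assumption: a single gateway endpoint fronts all legitimate traffic.
    Real deployments typically need additional exceptions (for example,
    administrative roles), which are omitted from this sketch.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                # Deny whenever the request did not come through this endpoint.
                "Condition": {
                    "StringNotEquals": {"aws:SourceVpce": vpc_endpoint_id}
                },
            }
        ],
    }


# Hypothetical bucket and endpoint identifiers, for illustration only.
policy = build_vpce_only_bucket_policy("example-lakehouse-bucket", "vpce-0123456789abcdef0")
policy_json = json.dumps(policy, indent=2)
```

The resulting `policy_json` string is the document you would attach to the bucket; the Azure storage firewall and VPC Service Controls on GCP play the equivalent role on the other clouds.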
Define, deploy and monitor your data and AI workloads with industry-leading security best practices
Now that we have outlined a set of key controls available to you, you are probably wondering how you can quickly operationalize them for your business. Our Databricks Security team recommends taking a “define, deploy, and monitor” approach using the resources they’ve developed from their experience working with hundreds of customers.
- Define: You should configure your Databricks environment by reviewing our best practices together with the risks specific to your organization. We have crafted comprehensive best practice guides for Databricks deployments on all three major clouds. These documents offer a checklist of security practices, threat models, and patterns distilled from our enterprise engagements.
- Deploy: Terraform templates make deploying secure Databricks workspaces easy. You can programmatically deploy workspaces and the required cloud infrastructure using the official Databricks Terraform provider. These unified Terraform templates are preconfigured with hardened security settings similar to those used by our most security-conscious customers. View our GitHub to get started on AWS, Azure, and GCP.
- Monitor: The Security Analysis Tool (SAT) can be used to monitor adherence to security best practices in Databricks workspaces on an ongoing basis. We recently upgraded the SAT to streamline setup and enhance checks, aligning them with the Databricks AI Security Framework (DASF) for improved coverage of AI security risks.
Stay ahead in data and AI security
The Databricks Data Intelligence Platform provides an enterprise-grade defense-in-depth approach for protecting data and AI assets. For recommendations on mitigating security risks, please refer to our security best practices guides for your chosen cloud(s). For a summarized checklist of controls related to unauthorized access, please refer to this document.
We continuously enhance our platform based on your feedback, evolving industry standards, and emerging security threats to better meet your needs and stay ahead of potential risks. To stay informed, bookmark our Security and Trust blog, head over to our YouTube channel, and visit the Databricks Security and Trust Center.