Skip to content

Commit

Permalink
docs: start databricks page
Browse files Browse the repository at this point in the history
Part of #4724
  • Loading branch information
efiop committed Sep 5, 2023
1 parent 9d9f229 commit c9deb47
Show file tree
Hide file tree
Showing 2 changed files with 63 additions and 0 deletions.
4 changes: 4 additions & 0 deletions content/docs/sidebar.json
Original file line number Diff line number Diff line change
Expand Up @@ -213,6 +213,10 @@
{
"label": "Amazon SageMaker",
"slug": "sagemaker"
},
{
"label": "Databricks",
"slug": "databricks"
}
]
},
Expand Down
59 changes: 59 additions & 0 deletions content/docs/user-guide/integrations/databricks.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
# Databricks

As of September 2023 Databricks doesn't expose the underlying GIT repo in your
project, so GIT-related DVC functionality within the repo provided by Databricks
is not supported (e.g. [experiments], `--rev/--all-commits/--all-tags/etc`). But
everything will operate as normal if you `git clone` a project yourself or use
remote projects with DVC directly.

## Install

```bash
%pip install dvc
```

## DVC API

You can use your existing DVC projects through [Python API] as normal, for example:

```python
import dvc.api

with dvc.api.open(
'get-started/data.xml',
repo='https://github.com/iterative/dataset-registry',
) as fobj:
...
```

### Secrets

If you need to use secrets to access your data, first add them to databricks
secrets https://docs.databricks.com/en/security/secrets/index.html and then use
them with DVC, for example:

```python
import dvc.api

remote_config = {
'access_key_id': dbutils.secrets.get(scope='test_scope', key='aws_access_key_id'),
'secret_access_key': dbutils.secrets.get(scope='test_scope', key='aws_secret_access_key'),
}

with dvc.api.open(
'recent-grads.csv',
repo='https://github.com/efiop/mydataregistry',
remote_config=remote_config
) as fobj:
...
```

## Running DVC commands

Databricks doesn't provide a classic terminal, so you'll need to use [magic
commands] to run it, e.g. `!dvc add data`.

[experiments]: /doc/start/experiments
[Python API]: /doc/api-reference
[magic commands]:
https://ipython.readthedocs.io/en/stable/interactive/magics.html

0 comments on commit c9deb47

Please sign in to comment.