The {brickster} package connects to a Databricks workspace in two ways:
(1) OAuth user-to-machine (U2M) authentication
(2) Personal access tokens (PAT)
Option (1) is recommended when using {brickster} interactively. If you need to run code via an automated process, the only option currently is (2).
{brickster} will automatically detect when a session has Posit Workbench managed Databricks OAuth credentials enabled. For more information about this authentication flow see the section Posit Workbench Managed Databricks OAuth Credentials.
Personal access tokens can be generated in a few steps; for a step-by-step breakdown, refer to the documentation.
Once you have a token you’ll be able to store it alongside the workspace URL in an .Renviron file. The .Renviron file is used for storing variables, such as those which may be sensitive (e.g. credentials), and decouples them from the code (additional reading).
To get started, add the following to your .Renviron:
DATABRICKS_HOST: The workspace URL
DATABRICKS_TOKEN: Personal access token (not required if using OAuth U2M)
DATABRICKS_WSID: The workspace ID (docs)
DATABRICKS_WSID is only required for the RStudio IDE integration with the connection pane.
Example of entries in .Renviron:
DATABRICKS_HOST=xxxxxxx.cloud.databricks.com
DATABRICKS_TOKEN=dapi123456789012345678a9bc01234defg5
DATABRICKS_WSID=123123123123123
Note: Creating an .Renviron for each project is recommended. You can create an .Renviron within your user home directory if required.
Restarting your R session will allow those variables to be picked up by the {brickster} package.
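If restarting is inconvenient, base R can also re-read an environment file in a running session. The following is a minimal sketch using base R’s readRenviron() and Sys.getenv(); the temporary file and placeholder values are assumptions made for illustration and stand in for a real project .Renviron.

```r
# Write a throwaway .Renviron-style file (placeholder values, not real credentials)
renviron_path <- tempfile(fileext = ".Renviron")
writeLines(
  c(
    "DATABRICKS_HOST=xxxxxxx.cloud.databricks.com",
    "DATABRICKS_WSID=123123123123123"
  ),
  renviron_path
)

# Load the variables into the current session without restarting R
loaded <- readRenviron(renviron_path)

# Confirm the variables are now visible to the session
host <- Sys.getenv("DATABRICKS_HOST")
wsid <- Sys.getenv("DATABRICKS_WSID")
```

In a real project you would call readRenviron(".Renviron") instead of writing a temporary file. Restarting the session remains the simplest way to be sure everything is picked up.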
{brickster} authentication should now be possible without specifying the credentials in your R code. You can load {brickster} and list the clusters within the workspace using db_cluster_list(). To access the host/token, use db_host()/db_token() respectively.
library(brickster)
# using db_host() and db_token() to get credentials
clusters <- db_cluster_list(host = db_host(), token = db_token())
All {brickster} functions have their host/token parameters default to calling db_host()/db_token(), so the explicit calls to these functions can be omitted.
When using OAuth U2M authentication you don’t define a token in .Renviron, and therefore db_token() will return NULL.
There are two methods that {brickster} supports to simplify switching of credentials within an R project/session:
(1) .Renviron: each additional set of credentials is differentiated via a suffix (e.g. DATABRICKS_TOKEN_DEV)
(2) .databrickscfg file (the primary method in the Databricks CLI)
To differentiate between (1) and (2), the option use_databrickscfg is used. The following example shows how to switch the session to use .databrickscfg.
# will use the `DEFAULT` profile in `.databrickscfg`
options(use_databrickscfg = TRUE)
# values returned should be those in profile of `.databrickscfg`
db_host()
db_token()
The default behaviour is to read credentials from .Renviron. If you wish to change this, it’s recommended to set the option within .Rprofile so that it’s set during initialization of the R session.
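As a sketch, a project-level .Rprofile could contain the options() call below. The getOption() check is base R and is included only to confirm the setting; it is not required in the file.

```r
# In .Rprofile: read credentials from .databrickscfg in every session of this project
options(use_databrickscfg = TRUE)

# Confirm the option is set (base R)
using_cfg <- isTRUE(getOption("use_databrickscfg"))
```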
The db_profile option controls which profile’s credentials are returned by db_host()/db_token()/db_wsid().
Profiles enable you to switch contexts between:
Different workspaces (e.g. development or production)
Different permissions (e.g. admin or restricted user)
This behaviour works when using credentials specified in either .Renviron or .databrickscfg:
# using .Renviron
db_host() # returns `DATABRICKS_HOST` (.Renviron)
# switch profile to 'prod'
options(db_profile = "prod")
db_host() # returns `DATABRICKS_HOST_PROD` (.Renviron)
# set back to default (NULL)
options(db_profile = NULL)
# use .databrickscfg
options(use_databrickscfg = TRUE)
db_host() # returns host from `DEFAULT` profile (.databrickscfg)
options(db_profile = "prod")
db_host() # returns host from `prod` profile (.databrickscfg)
It is expected that profiles in .Renviron will adhere to the same naming convention as the default but add a suffix. Here is an example of an .Renviron file that has three profiles (default, dev, prod):
# default
DATABRICKS_HOST=xxxxxxx.cloud.databricks.com
DATABRICKS_TOKEN=dapixxxxxxxxxxxxxxxxxxxxxxxxx
DATABRICKS_WSID=123123123123123
# dev
DATABRICKS_HOST_DEV=xxxxxxx-dev.cloud.databricks.com
DATABRICKS_TOKEN_DEV=dapixxxxxxxxxxxxxxxxxxxxxxxxx
DATABRICKS_WSID_DEV=123123123123124
# prod
DATABRICKS_HOST_PROD=xxxxxxx-prod.cloud.databricks.com
DATABRICKS_TOKEN_PROD=dapixxxxxxxxxxxxxxxxxxxxxxxxx
DATABRICKS_WSID_PROD=123123123123125
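The naming convention can be made concrete with a small helper. This is an illustrative sketch only: profile_env_var() is a hypothetical function written for this example and is not part of {brickster}. It assumes the suffix is the upper-cased profile name, as the DEV and PROD entries above suggest.

```r
# Hypothetical helper (not part of {brickster}): build the variable name
# that a given profile would use under the suffix convention
profile_env_var <- function(name, profile = NULL) {
  if (is.null(profile)) {
    return(name)  # default profile: no suffix
  }
  paste0(name, "_", toupper(profile))
}

default_var   <- profile_env_var("DATABRICKS_HOST")
prod_var      <- profile_env_var("DATABRICKS_HOST", "prod")
dev_token_var <- profile_env_var("DATABRICKS_TOKEN", "dev")
```

Under this assumption, the prod profile resolves to DATABRICKS_HOST_PROD and the dev token to DATABRICKS_TOKEN_DEV, matching the entries in the example file.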
For details on configuring .databrickscfg, please refer to the documentation from the Databricks CLI.
There is only one {brickster}-specific feature: the inclusion of wsid alongside host/token. wsid is used by the connections pane integration in RStudio, as the underlying APIs require it.
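A sketch of what such a file might contain, written from R so the example stays self-contained. The [DEFAULT] and [prod] section names follow the profiles used elsewhere in this document, and all values are placeholders. The file is written to a temporary path, not to the real .databrickscfg location, and the exact keys and host format your setup expects should be checked against the Databricks CLI documentation.

```r
# Placeholder profiles: host/token as used by the Databricks CLI, plus wsid
cfg_lines <- c(
  "[DEFAULT]",
  "host = xxxxxxx.cloud.databricks.com",
  "token = dapixxxxxxxxxxxxxxxxxxxxxxxxx",
  "wsid = 123123123123123",
  "",
  "[prod]",
  "host = xxxxxxx-prod.cloud.databricks.com",
  "token = dapixxxxxxxxxxxxxxxxxxxxxxxxx",
  "wsid = 123123123123125"
)

# Write to a temporary location for illustration only
cfg_path <- file.path(tempdir(), ".databrickscfg")
writeLines(cfg_lines, cfg_path)

# Read the file back to inspect what was written
written <- readLines(cfg_path)
```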
Posit Workbench has a managed Databricks OAuth credentials feature, which allows users to sign into a Databricks workspace from the home page of Workbench when launching a session and then access Databricks resources as their own identity. When in an RStudio Pro session running on Posit Workbench with managed Databricks OAuth credentials selected, {brickster} functions that rely on db_host()/db_token() should just work without needing to specify any credentials in your R code. See the code below as an example.
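A minimal sketch of such a session, assuming {brickster} is installed and the session was launched with managed credentials selected. It reuses db_cluster_list(), the function shown earlier in this document, and needs a live workspace to return anything.

```r
library(brickster)

# No host/token arguments: they default to db_host() / db_token(),
# so no credentials appear in the code
clusters <- db_cluster_list()
```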
{brickster} will automatically detect when a session has Workbench managed OAuth credentials and then use the workbench profile defined in a .databrickscfg file at the location specified by DATABRICKS_CONFIG_FILE. Workbench generates this .databrickscfg file in a temporary directory, and it should not be modified directly.
To use an alternative .databrickscfg file, a different profile, an alternative DATABRICKS_HOST env variable, or to set a DATABRICKS_TOKEN env variable, launch an RStudio Pro session without the Databricks managed credentials box selected.