Introduction to rchroma

Introduction

The rchroma package provides an R interface to ChromaDB, a vector database for storing and querying embeddings. This vignette demonstrates the basic usage of the package.

Installation

You can install the development version of rchroma from GitHub:

# install.packages("remotes")
remotes::install_github("cynkra/rchroma")

Installing ChromaDB

Before using rchroma, you need a running ChromaDB instance. The easiest way to get started is to use Docker:

docker pull chromadb/chroma
docker run -p 8000:8000 chromadb/chroma

This will start a ChromaDB server on http://localhost:8000.

For other installation methods and configuration options, please refer to the ChromaDB documentation.
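
Before connecting from R, it can help to confirm that something is actually listening on port 8000. The following is a minimal sketch in base R and does not use rchroma; the helper name `is_port_open()` is hypothetical. It only checks that a TCP connection can be opened, not that the listener is ChromaDB.

```r
# Hypothetical helper (not part of rchroma): TRUE if a TCP connection to
# host:port can be opened within `timeout` seconds, FALSE otherwise.
is_port_open <- function(host = "localhost", port = 8000, timeout = 2) {
  tryCatch(
    suppressWarnings({
      con <- socketConnection(
        host = host, port = port,
        blocking = TRUE, open = "r+", timeout = timeout
      )
      close(con)
      TRUE
    }),
    error = function(e) FALSE
  )
}

# Check the default address used by the Docker command above
server_up <- is_port_open("localhost", 8000)
print(server_up)
```

If this prints FALSE, the container is probably not running or the port mapping differs from `-p 8000:8000`.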

Basic Usage

Connecting to ChromaDB

First, we need to establish a connection to ChromaDB:

library(rchroma)

# Connect to a local ChromaDB instance
client <- chroma_connect()

# Check the connection
heartbeat(client)
version(client)

Managing Collections

Collections are the main way to organize your data in ChromaDB:

# Create a new collection
create_collection(client, "my_collection")

# List all collections
list_collections(client)

# Get a specific collection
get_collection(client, "my_collection")

Working with Documents

Documents are the basic unit of data in ChromaDB. In this example, each document has text content, a unique ID, and an associated embedding:

# Define example documents and their toy 3-dimensional embeddings
docs <- c(
  "apple fruit",
  "banana fruit",
  "carrot vegetable"
)
embeddings <- list(
  c(1.0, 0.0, 0.0),  # apple
  c(0.8, 0.2, 0.0),  # banana (similar to apple)
  c(0.0, 0.0, 1.0)   # carrot (different)
)

# Add documents to the collection
add_documents(
  client,
  "my_collection",
  documents = docs,
  ids = c("doc1", "doc2", "doc3"),
  embeddings = embeddings
)

# Query similar documents using embeddings
results <- query(
  client,
  "my_collection",
  query_embeddings = list(c(1.0, 0.0, 0.0)),  # should match apple best
  n_results = 2
)
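
To see why this query should rank apple first, the ranking can be reproduced locally in base R without a ChromaDB server. This is a minimal sketch that assumes the collection uses squared Euclidean (L2) distance, which we understand to be ChromaDB's default metric; a collection configured with another metric may produce different distance values. The helper `squared_l2()` and the variable names are illustrative and not part of rchroma.

```r
# Local sketch only: none of this calls rchroma or ChromaDB.
doc_embeddings <- list(
  doc1 = c(1.0, 0.0, 0.0),  # apple
  doc2 = c(0.8, 0.2, 0.0),  # banana
  doc3 = c(0.0, 0.0, 1.0)   # carrot
)
query_embedding <- c(1.0, 0.0, 0.0)

# Illustrative helper: squared Euclidean distance between two vectors
squared_l2 <- function(a, b) sum((a - b)^2)

# One distance per document, keeping the document IDs as names
distances <- vapply(doc_embeddings, squared_l2, numeric(1), b = query_embedding)

# Smallest distance first; keep the two nearest, mirroring n_results = 2
nearest <- head(sort(distances), 2)
print(names(nearest))
```

Under this assumption the two nearest IDs are doc1 (distance 0) and doc2 (distance about 0.08), while doc3 is far away at distance 2.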

Updating and Deleting

You can update or delete documents as needed:

# Update only the embedding of an existing document
update_documents(
  client,
  "my_collection",
  ids = "doc1",
  embeddings = list(c(0.9, 0.1, 0.0))  # slightly different from original apple
)

# Delete documents
delete_documents(client, "my_collection", ids = "doc2")  # removes banana
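
The effect of these two calls on a later query can be checked locally without a server. The sketch below mirrors the update and the delete on a plain named list that stands in for the collection. It does not call rchroma, it assumes squared Euclidean distance, and `squared_l2()` is an illustrative helper rather than a package function.

```r
# Local sketch only: a named list stands in for the collection.
doc_embeddings <- list(
  doc1 = c(1.0, 0.0, 0.0),  # apple
  doc2 = c(0.8, 0.2, 0.0),  # banana
  doc3 = c(0.0, 0.0, 1.0)   # carrot
)

# Mirror the update: replace the embedding stored for doc1
doc_embeddings[["doc1"]] <- c(0.9, 0.1, 0.0)

# Mirror the delete: drop doc2 from the list
doc_embeddings[["doc2"]] <- NULL

# Illustrative helper: squared Euclidean distance between two vectors
squared_l2 <- function(a, b) sum((a - b)^2)

query_embedding <- c(1.0, 0.0, 0.0)
distances <- vapply(doc_embeddings, squared_l2, numeric(1), b = query_embedding)

# Remaining documents, nearest first
ranked <- sort(distances)
print(ranked)
```

In this simulation doc1 is still the nearest match, although its distance is now about 0.02 instead of 0, and doc2 no longer appears among the candidates.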