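Fetches an embedding for each text unit from a Hugging Face inference endpoint and returns the embeddings as a numeric matrix, one row per unit. Row names are taken from the by columns when corpus is a data frame, or from names(corpus) (falling back to the strings themselves if unnamed) when corpus is a character vector. Pass the result to search_vector for semantic search.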

util_fetch_embeddings(
  corpus,
  by = NULL,
  api_token,
  api_url = "https://router.huggingface.co/hf-inference/models/BAAI/bge-small-en-v1.5"
)

Arguments

corpus

Either a data frame with a text column plus the identifier columns named in by, or a character vector of texts. For a named character vector, the names become row names; for an unnamed one, the strings themselves are used as row names.

by

Character vector naming the identifier column(s) used to build row names. Required when corpus is a data frame; ignored when corpus is a character vector.

api_token

Hugging Face API token.

api_url

URL of the inference endpoint. Defaults to the hosted BAAI/bge-small-en-v1.5 model.
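The row-name rules for corpus and by can be sketched as follows. unit_ids() is a hypothetical helper written for illustration, not a function exported by this package. The "_" separator used to join several by columns is an assumption, since this page does not say how multiple identifier columns are combined.

```r
# Hypothetical helper sketching the documented row-name rules (not package code).
# ASSUMPTION: multiple `by` columns are joined with "_"; the real separator is undocumented.
unit_ids <- function(corpus, by = NULL) {
  if (is.data.frame(corpus)) {
    if (is.null(by)) stop("`by` is required when `corpus` is a data frame")
    # Paste the identifier columns together row by row into one id string per unit
    return(do.call(paste, c(unname(as.list(corpus[by])), sep = "_")))
  }
  # Character vector: use its names if present, otherwise the strings themselves
  if (!is.null(names(corpus))) names(corpus) else corpus
}

unit_ids(c(a = "hello", b = "world"))   # names become ids
unit_ids(c("hello", "world"))           # strings become ids
```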

Value

A numeric matrix with one row per text unit and one column per embedding dimension; row names are the unit ids.
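To illustrate what semantic search over the returned matrix involves, here is a minimal sketch that ranks rows by cosine similarity to a query vector. cosine_rank() is hypothetical and does not necessarily reflect how search_vector is implemented. The toy 3-dimensional matrix stands in for real embeddings; in practice both the matrix and the query vector would come from the same embedding model.

```r
# Hypothetical sketch of cosine-similarity ranking over an embedding matrix
# shaped like util_fetch_embeddings() output (NOT search_vector's implementation).
cosine_rank <- function(emb, query_vec, k = 3) {
  # L2-normalise each row and the query so a dot product equals cosine similarity
  emb_n <- emb / sqrt(rowSums(emb^2))
  q_n <- query_vec / sqrt(sum(query_vec^2))
  sims <- drop(emb_n %*% q_n)             # one score per row, named by row names
  head(sort(sims, decreasing = TRUE), k)  # top-k unit ids with their scores
}

# Toy "embeddings" with unit ids as row names
emb <- rbind(
  cats = c(1, 0.1, 0),
  dogs = c(0.9, 0.3, 0),
  cars = c(0, 0.2, 1)
)
cosine_rank(emb, query_vec = c(1, 0, 0), k = 2)
```

Because the row names carry the unit ids, the ranked scores come back already labelled, which is why the matrix keeps them.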