Retrieves embeddings for text data from the Hugging Face Inference API. It can embed a batch of texts or a single query. Intended mostly for demo purposes.

api_huggingface_embeddings(
  tif,
  text_hierarchy,
  api_token,
  api_url =
    "https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2",
  query = NULL,
  dims = 384,
  batch_size = 250,
  sleep_duration = 1,
  verbose = TRUE
)

Arguments

tif

A data frame containing text data.

text_hierarchy

A character vector indicating the columns used to create row names.

api_token

Token for accessing the Hugging Face API.

api_url

The URL of the Hugging Face API endpoint (default is all-MiniLM-L6-v2).

query

An optional single text query for which embeddings are required.

dims

The dimension of the output embeddings.

batch_size

Number of rows in each batch sent to the API.

sleep_duration

Duration in seconds to pause between processing batches.

verbose

A logical value indicating whether to display a progress bar.
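
The interaction between batch_size and sleep_duration can be illustrated with a minimal sketch. This is not the package's implementation: fake_embed() and embed_in_batches() are hypothetical helpers, the API call is replaced by a stand-in that returns zeros, and whether the real function pauses after the final batch is an assumption.

```r
# Hypothetical stand-in for the API call: one row of `dims` numbers per text.
# The real function would send each batch to `api_url` instead.
fake_embed <- function(texts, dims) {
  matrix(0, nrow = length(texts), ncol = dims)
}

# Hypothetical helper showing batching with a pause between requests
embed_in_batches <- function(texts, dims = 384, batch_size = 250,
                             sleep_duration = 1) {
  # Assign each text to a batch: 1, 1, ..., 2, 2, ...
  batch_id <- ceiling(seq_along(texts) / batch_size)
  batches <- split(texts, batch_id)

  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- fake_embed(batches[[i]], dims)
    # Assumption: pause between batches, but not after the last one
    if (i < length(batches)) Sys.sleep(sleep_duration)
  }

  # Stack the per-batch matrices into one matrix, one row per text
  do.call(rbind, results)
}

emb <- embed_in_batches(paste("text", 1:5), dims = 4, batch_size = 2,
                        sleep_duration = 0)
dim(emb)
```

With five texts and a batch size of two, the sketch issues three requests (two, two, and one text) and stacks the results in the original order.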

Value

A matrix containing embeddings, with each row corresponding to a text input.
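
As a rough picture of the returned shape, the sketch below builds a placeholder matrix with one row per text and dims columns, and derives row names from the text_hierarchy columns. The sentence_id column, the "." separator, and the zero values are assumptions made for illustration; the function's actual row-name format is not specified here.

```r
# Illustrative input; `sentence_id` is a hypothetical second hierarchy column
tif <- data.frame(
  doc_id = c("1", "1", "2"),
  sentence_id = c("1", "2", "1"),
  text = c("Hello world.", "Goodbye.", "Hello again."),
  stringsAsFactors = FALSE
)
text_hierarchy <- c("doc_id", "sentence_id")

# Assumption: hierarchy columns are joined with "." to form row names
row_ids <- do.call(paste, c(tif[text_hierarchy], sep = "."))

# Placeholder values standing in for embeddings returned by the API
dims <- 4
embeddings <- matrix(0, nrow = nrow(tif), ncol = dims,
                     dimnames = list(row_ids, NULL))

rownames(embeddings)
dim(embeddings)
```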

Examples

if (FALSE) { # \dontrun{
# Assumption: the token is read from an environment variable; the name is illustrative
api_token <- Sys.getenv("HUGGINGFACE_API_TOKEN")
api_url <- "https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2"
tif <- data.frame(doc_id = c("1"), text = c("Hello world."))
embeddings <- api_huggingface_embeddings(tif,
                                         text_hierarchy = "doc_id",
                                         api_token = api_token,
                                         api_url = api_url)
} # }