Convert the token list returned by nlp_tokenize_text into a data frame (long format), with identifiers and optional spans.

nlp_cast_tokens(tok)

Arguments

tok

A list with at least a tokens element and, optionally, a spans element, e.g. the output of nlp_tokenize_text(..., include_spans = TRUE).

Value

A long-format data frame with one row per token: columns for the unit id, the token, and, when spans are present in tok, the start and end of each span.
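
To make the long format concrete, the following base-R sketch mimics the reshaping on a tokens-only input. It is not the package's implementation: the helper name cast_tokens_sketch and the column names unit_id, token_id, and token are assumptions chosen for illustration, and the columns actually returned by nlp_cast_tokens may be named differently.

```r
# Hypothetical stand-in for the reshaping step; NOT the package's code.
# Column names (unit_id, token_id, token) are illustrative assumptions.
cast_tokens_sketch <- function(tok) {
  stopifnot(is.list(tok), !is.null(tok$tokens))
  ids <- names(tok$tokens)
  n <- lengths(tok$tokens, use.names = FALSE)  # number of tokens per unit
  data.frame(
    unit_id  = rep(ids, times = n),            # repeat each unit id once per token
    token_id = sequence(n),                    # position of the token within its unit
    token    = unlist(tok$tokens, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

tok <- list(
  tokens = list(
    "1.1" = c("Hello", "world", "."),
    "1.2" = c("This", "is", "an", "example", ".")
  )
)
long <- cast_tokens_sketch(tok)
print(long)
```

Each element of tok$tokens contributes as many rows as it has tokens, so the two units above yield eight rows in total, with the list names carried into the id column.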

Examples

tok <- list(
  tokens = list(
    "1.1" = c("Hello", "world", "."),
    "1.2" = c("This", "is", "an", "example", "."),
    "2.1" = c("This", "is", "a", "party", "!")
  )
)
# The result is a long data frame (one row per token), not a document-term matrix.
tokens_df <- nlp_cast_tokens(tok)