Convert the token list returned by nlp_tokenize_text into a long-format
data frame, with unit identifiers and optional start/end spans.
Arguments
- tok
A list containing at least a tokens element and, optionally, a spans element; for example, the output of nlp_tokenize_text(..., include_spans = TRUE).
Value
Data frame with columns for unit id, token, and optionally start/end spans.
Examples
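# Token list keyed by unit id; no spans element is supplied here
tok <- list(
  tokens = list(
    "1.1" = c("Hello", "world", "."),
    "1.2" = c("This", "is", "an", "example", "."),
    "2.1" = c("This", "is", "a", "party", "!")
  )
)
# The result is a long-format data frame of tokens
tok_df <- nlp_cast_tokens(tok)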
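
To make the long format concrete, the reshaping can be approximated in base R. This is only a sketch, not the package's implementation: the helper name cast_tokens_sketch and the column names id and token are assumptions chosen for illustration, and spans are left out because their layout is not described here.

```r
# Hypothetical helper for illustration; not part of the package.
cast_tokens_sketch <- function(tok) {
  stopifnot(is.list(tok), !is.null(tok$tokens))
  n_per_unit <- lengths(tok$tokens)  # number of tokens in each unit
  data.frame(
    # repeat each unit id once per token (column name is an assumption)
    id = rep(names(tok$tokens), times = n_per_unit),
    # flatten all token vectors into one column, dropping element names
    token = unlist(tok$tokens, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

tok <- list(
  tokens = list(
    "1.1" = c("Hello", "world", "."),
    "1.2" = c("This", "is", "an", "example", "."),
    "2.1" = c("This", "is", "a", "party", "!")
  )
)

long <- cast_tokens_sketch(tok)
head(long, 3)
```

Each unit id is repeated once per token, so the three units above give 3 + 5 + 5 = 13 rows.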