Converts a list of tokens into a data frame. When the element names encode both a document and a sentence identifier (such as '1.2' in the example below), the identifiers are extracted and separated.

nlp_cast_tokens(tok)

Arguments

tok

A named list in which each element is a character vector holding the tokens of one document or one sentence.

Value

A data frame with columns for token name and token.

Examples

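# Each element holds the tokens of one sentence.
# Names follow a "document.sentence" pattern, e.g. "1.2".
tokens <- list(c("Hello", "world", "."),
               c("This", "is", "an", "example", "."),
               c("This", "is", "a", "party", "!"))
names(tokens) <- c("1.1", "1.2", "2.1")
tokens_df <- nlp_cast_tokens(tokens)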
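
To make the conversion described above more concrete, here is a minimal base-R sketch of how a named token list could be flattened into one row per token, with the "document.sentence" names split into two identifier columns. It is an illustration only, not the package's implementation: the helper name cast_tokens_sketch and the column names doc_id, sentence_id, and token are assumptions, and the actual columns returned by nlp_cast_tokens may differ. The sketch also assumes every element of the list is named.

```r
# Hypothetical helper (not part of the package): flattens a named list of
# character vectors into a data frame with one row per token.
cast_tokens_sketch <- function(tok) {
  # Repeat each element name once per token it contains.
  ids <- rep(names(tok), lengths(tok))

  # Split "document.sentence" names on the literal dot.
  parts <- strsplit(ids, ".", fixed = TRUE)

  data.frame(
    # Assumed column names, chosen for illustration only.
    doc_id      = vapply(parts, `[`, character(1), 1),
    # NA when a name has no sentence part (e.g. "3").
    sentence_id = vapply(parts, `[`, character(1), 2),
    token       = unlist(tok, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

tokens <- list(c("Hello", "world", "."),
               c("This", "is", "an", "example", "."),
               c("This", "is", "a", "party", "!"))
names(tokens) <- c("1.1", "1.2", "2.1")

sketch_df <- cast_tokens_sketch(tokens)
head(sketch_df)
```

Keeping the identifiers as character strings avoids silently coercing non-numeric names; they can be converted with as.integer() afterwards if the names are known to be numeric.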