This function melts a token-level data frame: it takes the values in the column named by `melt_col` and groups them by the one or more columns named in `parent_cols`.
nlp_melt_tokens(
  df,
  melt_col = "token",
  parent_cols = c("doc_id", "sentence_id")
)
A list of vectors, each containing the tokens of one group defined by the `parent_cols` parameter.
dtm <- data.frame(
  doc_id = as.character(c(1, 1, 1, 1, 1, 1, 1, 1)),
  sentence_id = as.character(c(1, 1, 1, 2, 2, 2, 2, 2)),
  token = c("Hello", "world", ".", "This", "is", "an", "example", ".")
)
tokens <- nlp_melt_tokens(
  dtm,
  melt_col = "token",
  parent_cols = c("doc_id", "sentence_id")
)
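For readers who want to see what "a list of vectors, one per group" looks like without the package installed, the grouping can be approximated in base R. This is only a sketch under stated assumptions, not the package's implementation: the `key` variable, the `"."` separator, and the resulting element names are illustrative choices, and the names that `nlp_melt_tokens()` actually assigns to its list elements may differ.

```r
# Base-R approximation of grouping tokens by parent columns.
# Assumption: this mirrors the documented return shape only; it is not
# how nlp_melt_tokens() is implemented.
df <- data.frame(
  doc_id = as.character(c(1, 1, 1, 1, 1, 1, 1, 1)),
  sentence_id = as.character(c(1, 1, 1, 2, 2, 2, 2, 2)),
  token = c("Hello", "world", ".", "This", "is", "an", "example", "."),
  stringsAsFactors = FALSE
)

# Build one grouping key per row from the parent columns
# (the "." separator is an illustrative choice).
key <- paste(df$doc_id, df$sentence_id, sep = ".")

# split() returns a named list of character vectors, one per key.
# Fixing the factor levels to unique(key) keeps groups in first-appearance order.
grouped <- split(df$token, factor(key, levels = unique(key)))

# Tokens of the second sentence of document 1
grouped[["1.2"]]
```

Each element of `grouped` is a character vector holding the tokens of one document and sentence pair, in their original row order.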