Search a corpus with a regular expression. Best for specific strings or patterns; well suited to KWIC-style (keyword-in-context) results. Returns matches with optional highlighting.
Data frame or data.table with a text column and the identifier columns specified in by.
Search pattern (regex).
Character vector of identifier columns that define the text unit (e.g. doc_id or c("url", "node_id")). Default c("doc_id").
Length-two character vector of opening and closing strings to wrap around each match (default c("<b>", "</b>")).
A data.table with columns id, the by columns, text, start, end, and pattern.
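
The arguments and return value described above can be sketched with base R regex functions and data.table. This is only an illustration, not the package's implementation: the name regex_hits is hypothetical, and the sketch assumes that start and end are 1-based character offsets and that text holds the matched string wrapped in the highlight markers.

```r
library(data.table)

# Hypothetical helper (not the documented function): one row per regex match.
regex_hits <- function(corpus, query, by = c("doc_id"),
                       highlight = c("<b>", "</b>")) {
  dt <- as.data.table(corpus)
  id_cols <- by

  # Locate every match in every text unit.
  m <- gregexpr(query, dt$text, perl = TRUE)
  n_hits <- vapply(m, function(x) sum(x > 0L), integer(1))

  # Flatten match positions and lengths, skipping texts with no match (-1).
  s <- as.integer(unlist(lapply(m, function(x) x[x > 0L])))
  len <- as.integer(unlist(lapply(m, function(x) attr(x, "match.length")[x > 0L])))
  e <- s + len - 1L

  # Assumption: the text column holds the match wrapped in the highlight pair.
  matched <- unlist(regmatches(dt$text, m))
  wrapped <- sprintf("%s%s%s", highlight[1], matched, highlight[2])

  # Repeat the identifier columns once per match, then attach match details.
  out <- dt[rep(seq_len(nrow(dt)), n_hits), id_cols, with = FALSE]
  out[, c("text", "start", "end", "pattern") :=
        list(wrapped, s, e, rep(query, length(s)))]
  out[, id := seq_len(.N)]
  setcolorder(out, "id")
  out[]
}

corpus <- data.table(
  doc_id = c("a", "b"),
  text = c("the cat sat on the mat", "a dog barked")
)
hits <- regex_hits(corpus, "[cm]at")
```

Here hits has one row per match, so a text unit with no match contributes no rows. Passing a different by vector, such as c("url", "node_id"), would carry those columns through instead of doc_id.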