All functions

abbreviations

Common abbreviations for NLP

dict_generations

Demo dictionary of generation-name variants for NER

dict_political

Demo dictionary of political / partisan term variants for NER

fetch_urls()

Fetch URLs from a search engine

fetch_wiki_refs()

Fetch external citation URLs from Wikipedia article(s)

fetch_wiki_urls()

Fetch Wikipedia page URLs by search query

nlp_cast_tokens()

Convert token list to data frame

nlp_index_tokens()

Build a BM25 index for ranked keyword search

nlp_roll_chunks()

Roll units into fixed-size chunks with optional context

nlp_split_paragraphs()

Split text into paragraphs

nlp_split_sentences()

Split text into sentences

nlp_tokenize_text()

Tokenize text into a clean token stream

read_urls()

Read content from URLs

search_dict()

Exact phrase / multiword expression (MWE) matcher

search_index()

Search the BM25 index

search_regex()

Search corpus by regex

search_vector()

Semantic search by cosine similarity

util_fetch_embeddings()

Fetch embeddings from a Hugging Face inference endpoint
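
As a rough illustration of what "semantic search by cosine similarity" (the search_vector() entry above) involves, the sketch below ranks a handful of toy vectors against a query vector. It is not this package's code or API: the function names `cosine_similarity` and `rank_by_cosine`, the `top_n` parameter, and the two-dimensional example vectors are all hypothetical, and real embeddings would typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, doc_vecs, top_n=3):
    """Score every document vector against the query and keep the best top_n."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy "embeddings" (hypothetical values, for illustration only).
docs = {
    "d1": [1.0, 0.0],
    "d2": [0.0, 1.0],
    "d3": [1.0, 1.0],
}

ranked = rank_by_cosine([1.0, 0.0], docs, top_n=2)
for doc_id, score in ranked:
    print(doc_id, round(score, 4))
```

Because cosine similarity depends only on direction, not magnitude, `d3` scores about 0.71 against the query even though it is a longer vector than `d1`.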