puremoe ships three MeSH reference tables: a thesaurus
of descriptors and entry terms, a tree of hierarchical classifications,
and a bundled table of MeSH annotation counts per descriptor across
PubMed.
data_mesh_thesaurus() downloads and combines the MeSH
Descriptor Thesaurus and Supplementary Concept Records (SCR). The result
has one row per term, including the synonyms and entry terms for each
descriptor.
thesaurus <- puremoe::data_mesh_thesaurus()
data_mesh_trees() provides the hierarchical
classification structure. Each descriptor can appear in multiple
branches; tree_location encodes the full path (e.g.,
I01.880.604 = Social Sciences > Political Science >
Political Systems).
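Because tree_location encodes the full path, the ancestors of a node can be recovered by truncating the code at each dot. The sketch below uses only base R; tree_ancestors() is a hypothetical helper written for illustration, not a puremoe function.

```r
# Hypothetical helper (not part of puremoe): expand one tree location
# into the vector of its ancestor paths, ending with the location itself.
tree_ancestors <- function(tree_location) {
  # Split on literal dots, e.g. "I01.880.604" -> "I01", "880", "604"
  parts <- strsplit(tree_location, ".", fixed = TRUE)[[1]]
  # Rebuild each prefix of the path: first 1 part, first 2 parts, ...
  vapply(
    seq_along(parts),
    function(i) paste(parts[seq_len(i)], collapse = "."),
    character(1)
  )
}

tree_ancestors("I01.880.604")
# "I01" "I01.880" "I01.880.604"
```

Matching these prefixes against the tree_location values in the trees table would give the parent descriptors, assuming the table stores one location per row.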
trees <- puremoe::data_mesh_trees()
data_mesh_frequencies is a bundled dataset giving the
annotation frequency of each MeSH descriptor across the full PubMed
corpus (39.7 M PMIDs, April 2026). Counts reflect the number of records
indexed with each descriptor by NLM curators, not text frequency, making
them suitable as a baseline for enrichment analyses against arbitrary
PubMed subsets.
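As a rough illustration of the baseline idea, the sketch below compares descriptor counts in a subset against corpus-wide counts. Both data frames are toy stand-ins with invented column names and counts; the real columns of data_mesh_frequencies may differ. The hypergeometric test is one common choice for this comparison, not necessarily the one puremoe intends, and it assumes the subset is drawn from the same corpus.

```r
# Toy stand-in for the bundled frequency table.
# Column names and counts are invented for illustration only.
baseline <- data.frame(
  descriptor = c("Humans", "Political Systems"),
  n_corpus   = c(20000000, 10000)
)
corpus_size <- 39.7e6  # corpus size quoted in the text above

# Toy descriptor counts for a hypothetical PubMed subset of 1,000 records.
subset_counts <- data.frame(
  descriptor = c("Humans", "Political Systems"),
  n_subset   = c(500, 50)
)
subset_size <- 1000

enrich <- merge(subset_counts, baseline, by = "descriptor")

# Count expected in the subset if it mirrored the corpus-wide rate.
enrich$expected <- subset_size * enrich$n_corpus / corpus_size

# Observed / expected: values above 1 suggest over-representation.
enrich$ratio <- enrich$n_subset / enrich$expected

# Upper-tail hypergeometric p-value, P(X >= n_subset).
enrich$p_value <- phyper(
  enrich$n_subset - 1,
  enrich$n_corpus,
  corpus_size - enrich$n_corpus,
  subset_size,
  lower.tail = FALSE
)

enrich
```

With many descriptors tested at once, the p-values would normally be corrected for multiple comparisons, for example with p.adjust().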
The thesaurus and tree datasets are ~10 MB and are fetched from GitHub on
each call by default. To avoid re-downloading every session, set
use_persistent_storage = TRUE; the files are then cached to a
system data directory and reused on subsequent calls.
thesaurus <- puremoe::data_mesh_thesaurus(use_persistent_storage = TRUE)
trees <- puremoe::data_mesh_trees(use_persistent_storage = TRUE)