Scores the MeSH descriptors of a retrieved corpus against PubMed-wide
descriptor frequencies, identifying the terms that are over- or
under-represented relative to PubMed as a whole. This is a local transform of
the pubmed_abstracts output – it makes no API calls – and is intended
to characterise a corpus and to guide search refinement and expansion.
mesh_keyness(
records,
frequencies = NULL,
measure = c("log_odds", "g2"),
smoothing = 0.5,
min_count = 1L
)
records
A pubmed_abstracts table from
get_records(endpoint = "pubmed_abstracts") (with its
annotations list-column), or a long data.frame already exposing
pmid and DescriptorUI (optionally DescriptorName and a
type column, in which case only type == "MeSH" rows are used).
frequencies
Baseline descriptor frequencies. Defaults to the bundled
data_mesh_frequencies; must contain DescriptorUI,
n_pmids, and prop_total.
measure
Keyness statistic: "log_odds" (default) for a
Haldane-corrected log odds ratio with standard error and z-score, or
"g2" for the signed Dunning log-likelihood ratio.
smoothing
Positive continuity correction added to each cell of the
2x2 incidence table for measure = "log_odds" (default 0.5,
the Haldane-Anscombe correction).
min_count
Drop descriptors indexed in fewer than min_count
corpus PMIDs before scoring (default 1).
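To make the smoothing argument concrete, here is a minimal sketch of a Haldane-Anscombe corrected log odds ratio for a single descriptor. The helper haldane_log_odds() is hypothetical and not part of the package. The 2x2 layout (corpus vs. baseline, indexed vs. not indexed) and the treatment of corpus and baseline as separate samples are assumptions, and the counts are made up for illustration.

```r
# Hypothetical helper (not the package's implementation): corrected log odds
# ratio, standard error and z-score for one descriptor.
haldane_log_odds <- function(corpus_count, corpus_total,
                             baseline_count, baseline_total,
                             smoothing = 0.5) {
  # Cells of the assumed 2x2 incidence table, each with the correction added.
  n11 <- corpus_count + smoothing                       # corpus, indexed
  n12 <- corpus_total - corpus_count + smoothing        # corpus, not indexed
  n21 <- baseline_count + smoothing                     # baseline, indexed
  n22 <- baseline_total - baseline_count + smoothing    # baseline, not indexed

  # Work on the log scale to avoid overflow with PubMed-sized totals.
  log_odds  <- log(n11) + log(n22) - log(n12) - log(n21)
  std_error <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)

  data.frame(log_odds = log_odds, std_error = std_error,
             z = log_odds / std_error)
}

# Illustrative numbers only: 40 of 200 corpus PMIDs vs. 50,000 of 36 million.
res <- haldane_log_odds(40, 200, 50000, 36e6)

# A descriptor absent from the corpus still gets a finite score.
res_zero <- haldane_log_odds(0, 200, 50000, 36e6)
```

Because the correction is added to every cell, a zero count never produces an infinite log odds ratio or standard error, which is the usual motivation for it.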
A data.table, one row per scored descriptor, ordered by keyness
(descending). Common columns: DescriptorUI, DescriptorName,
corpus_count, corpus_total, corpus_prop,
baseline_count, baseline_total, baseline_prop, and
direction ("over"/"under"). With
measure = "log_odds": log_odds, std_error, z.
With measure = "g2": g2.
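For orientation, the g2 column can be pictured with the following sketch of a signed log-likelihood ratio for one descriptor. The helper signed_g2() is hypothetical, and the four-cell form of the statistic, the sign rule (positive when the corpus proportion is at least the baseline proportion), and the example counts are all assumptions rather than the package's confirmed behaviour.

```r
# Hypothetical helper (not the package's implementation): signed G2 for one
# descriptor, using observed vs. expected counts in an assumed 2x2 table.
signed_g2 <- function(corpus_count, corpus_total,
                      baseline_count, baseline_total) {
  # Coerce to double so large products cannot overflow integer arithmetic.
  corpus_count   <- as.numeric(corpus_count)
  corpus_total   <- as.numeric(corpus_total)
  baseline_count <- as.numeric(baseline_count)
  baseline_total <- as.numeric(baseline_total)

  observed <- c(corpus_count, corpus_total - corpus_count,
                baseline_count, baseline_total - baseline_count)

  n         <- corpus_total + baseline_total
  with_term <- corpus_count + baseline_count

  # Expected counts if the descriptor were equally frequent in both samples.
  expected <- c(corpus_total * with_term,
                corpus_total * (n - with_term),
                baseline_total * with_term,
                baseline_total * (n - with_term)) / n

  # Empty cells contribute nothing to the sum.
  terms <- ifelse(observed > 0, observed * log(observed / expected), 0)
  g2 <- 2 * sum(terms)

  # Attach the direction: over-represented positive, under-represented negative.
  over <- corpus_count / corpus_total >= baseline_count / baseline_total
  if (over) g2 else -g2
}

g2_over  <- signed_g2(40, 200, 50000, 36e6)  # illustrative counts only
g2_under <- signed_g2(0, 200, 50000, 36e6)
g2_equal <- signed_g2(10, 100, 100, 1000)    # identical proportions
```

Unlike the log odds ratio, the unsigned statistic is never negative, so the sign has to be attached separately to distinguish over- from under-representation.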
Keyness is computed on document incidence: for each descriptor, the number of
distinct corpus PMIDs indexed with it is compared against the number of
distinct PubMed PMIDs indexed with it (data_mesh_frequencies).
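Document incidence on the corpus side can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the toy table and its descriptor identifiers ("D_A", "D_B", "D_C") are placeholders, and only the pmid and DescriptorUI column names come from the description above.

```r
# Toy long table: one row per (pmid, descriptor) annotation. All values are
# made-up placeholders, including a deliberately repeated pair for PMID "1".
annotations <- data.frame(
  pmid         = c("1", "1", "1", "2", "2", "3"),
  DescriptorUI = c("D_A", "D_A", "D_B", "D_A", "D_C", "D_B"),
  stringsAsFactors = FALSE
)

# Keep each (pmid, DescriptorUI) pair once so a document counts at most once
# per descriptor, however many times the pair is repeated.
incidence <- unique(annotations[, c("pmid", "DescriptorUI")])

# Number of distinct corpus PMIDs, and distinct PMIDs per descriptor.
corpus_total <- length(unique(incidence$pmid))
corpus_count <- table(incidence$DescriptorUI)
corpus_prop  <- corpus_count / corpus_total
```

Counting documents rather than raw annotation rows keeps a heavily annotated record from inflating a descriptor's apparent frequency.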
if (FALSE) { # \dontrun{
pmids <- search_pubmed('"doxorubicin"[TiAb] AND "cardiotoxicity"[TiAb]')
records <- get_records(pmids, endpoint = "pubmed_abstracts")
mesh_keyness(records) # most over-represented descriptors
mesh_keyness(records, measure = "g2") # signed log-likelihood ratio instead of log odds
} # }