Count PubTator Entity Co-occurrence from Sentence Context

Counts pairs of biomedical entities that co-occur in the same sentence (window = 0) or within window sentences of each other, using the contextualized entity table returned by pubtator_context. Co-occurrence is computed within each pmid/tiab passage; title and abstract sentence IDs are not compared to one another.

pubtator_cooccurrence(x, window = 0L, by = c("type", "entity"))

Arguments

x: A PubTator context list returned by pubtator_context, or a contextualized entity data.frame with pmid, tiab, type, identifier, text, and sentence_id.
window: Non-negative integer sentence distance. 0 counts entities in the same sentence; n counts entities whose sentences are at most n apart within the same pmid/tiab passage.
by: One of "type" (default) or "entity". "type" aggregates counts by entity-type pair; "entity" aggregates by pair of specific entities, each identified by its (type, identifier, text).
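The window rule can be sketched in a few lines of base R. This is only an illustration of the distance check described above, not the package's code: the mentions table is made-up toy data and within_window() is a hypothetical helper.

```r
# Toy mentions (hypothetical values, not real PubTator output).
mentions <- data.frame(
  pmid        = c("1", "1", "1", "1"),
  tiab        = c("abstract", "abstract", "abstract", "title"),
  text        = c("BRCA1", "breast cancer", "tamoxifen", "BRCA1"),
  sentence_id = c(1L, 1L, 3L, 1L)
)

# within_window() is a hypothetical helper: TRUE when two mentions share a
# pmid/tiab passage and their sentences are at most `window` apart.
within_window <- function(a, b, window = 0L) {
  a$pmid == b$pmid && a$tiab == b$tiab &&
    abs(a$sentence_id - b$sentence_id) <= window
}

within_window(mentions[1, ], mentions[2, ], window = 0L)  # TRUE: same sentence
within_window(mentions[1, ], mentions[3, ], window = 1L)  # FALSE: two sentences apart
within_window(mentions[1, ], mentions[3, ], window = 2L)  # TRUE
within_window(mentions[1, ], mentions[4, ], window = 2L)  # FALSE: abstract vs. title
```

The last call shows why title and abstract mentions never pair: the passages differ, so the sentence distance is not consulted.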

Value

A data.table. With by = "type": type_x, type_y, n (co-occurrence instances), and n_pmids (distinct documents), ordered by n. With by = "entity": the same columns plus identifier_x, text_x, identifier_y, and text_y.

Details

Entities are de-duplicated to one mention per sentence before pairing, and pairs of the same entity (identical type, identifier, and text) are dropped.
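The following base R sketch walks through these steps on a toy table to show how n and n_pmids can differ from the raw mention count. It is an illustration under stated assumptions, not the package's implementation: toy_cooccurrence() is a hypothetical helper, the data are made up, each unordered pair is counted once (alphabetically by entity key), and the result is sorted by descending n.

```r
# Toy contextualized entity table (hypothetical values) with the columns
# listed under Arguments. BRCA1 is mentioned twice in sentence 1 of pmid 1.
ents <- data.frame(
  pmid        = c("1", "1", "1", "1", "2", "2"),
  tiab        = "abstract",
  type        = c("Gene", "Gene", "Disease", "Chemical", "Gene", "Disease"),
  identifier  = c("672", "672", "D001943", "D013629", "672", "D001943"),
  text        = c("BRCA1", "BRCA1", "breast cancer", "tamoxifen",
                  "BRCA1", "breast cancer"),
  sentence_id = c(1L, 1L, 1L, 2L, 1L, 1L)
)

# Hypothetical helper illustrating by = "type" counting.
toy_cooccurrence <- function(ents, window = 0L) {
  # 1. Keep one mention of each entity per sentence.
  ents <- unique(ents)
  # 2. Pair every mention with every mention in the same pmid/tiab passage.
  pairs <- merge(ents, ents, by = c("pmid", "tiab"), suffixes = c("_x", "_y"))
  # 3. Keep pairs whose sentences are at most `window` apart.
  pairs <- pairs[abs(pairs$sentence_id_x - pairs$sentence_id_y) <= window, ]
  # 4. Drop self-pairs and mirrored duplicates (assumption: keep the pair
  #    whose x key sorts before its y key).
  key_x <- paste(pairs$type_x, pairs$identifier_x, pairs$text_x, sep = "|")
  key_y <- paste(pairs$type_y, pairs$identifier_y, pairs$text_y, sep = "|")
  pairs <- pairs[key_x < key_y, ]
  # 5. Count instances and distinct documents per type pair.
  type_pair <- paste(pairs$type_x, pairs$type_y, sep = " / ")
  n       <- tapply(pairs$pmid, type_pair, length)
  n_pmids <- tapply(pairs$pmid, type_pair, function(p) length(unique(p)))
  out <- data.frame(type_pair = names(n), n = as.vector(n),
                    n_pmids = as.vector(n_pmids))
  out[order(-out$n), ]
}

toy_cooccurrence(ents, window = 0L)  # one row: Disease / Gene, n = 2, n_pmids = 2
toy_cooccurrence(ents, window = 1L)  # adds the tamoxifen pairs from pmid 1
```

Because the duplicate BRCA1 mention is removed in step 1, the Disease / Gene pair is counted once per document rather than twice in pmid 1.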

Examples

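if (FALSE) { # \dontrun{
# Search PubMed for records mentioning both terms in the title or abstract
pmids <- search_pubmed('"biomarker"[TiAb] AND "cancer"[TiAb]')

# Fetch PubTator annotations and attach sentence context to each entity
ctx <- pmids |>
  get_records(endpoint = "pubtator") |>
  pubtator_context()

# Same-sentence co-occurrence, aggregated by entity-type pair
ctx |> pubtator_cooccurrence(window = 0, by = "type")
# Co-occurrence within one sentence of each other, aggregated by entity pair
ctx |> pubtator_cooccurrence(window = 1, by = "entity")
} # }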