Starting from an icites data.table returned by get_records(endpoint = "icites"), citation_snowball() follows the citation links already present in the citation_net column and returns a table of seed and candidate PMIDs. The function does not call iCite again; if metadata is needed for the expanded corpus, call get_records(endpoint = "icites") explicitly on the returned PMIDs.

citation_snowball(
  icites,
  max_nodes = 2000,
  direction = c("both", "citing", "cited"),
  min_links = 2
)

Arguments

icites

A data.table returned by get_records(endpoint = "icites"). Must contain pmid and citation_net columns.

max_nodes

Hard ceiling on the total number of PMIDs in the returned corpus (seed + discovered). Candidates are filtered by min_links, ranked by citation-link evidence, and then truncated to the remaining slots after all seed PMIDs are retained. Publication year is not used for this cap because citation_snowball() does not fetch metadata for newly discovered PMIDs. Default 2000.

direction

One of "both" (default), "citing", or "cited". "cited" expands to papers referenced by the seeds; "citing" expands to papers that cite the seeds; "both" combines both directions.

min_links

Minimum number of seed papers a candidate must be linked to in order to be included. Default 2. Higher values yield a smaller, more focused expansion.
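
The way min_links and max_nodes interact can be sketched on a toy table. This is a minimal illustration, not the package's implementation: the table cand, its values, the NA placeholder for seed rows, and the use of a plain descending link_count ordering as the ranking rule are all assumptions made for the example.

```r
library(data.table)

# Hypothetical toy data: 3 seeds and 5 discovered candidates.
# link_count for seed rows is a placeholder here; the real output may differ.
cand <- data.table(
  pmid       = c("s1", "s2", "s3", "c1", "c2", "c3", "c4", "c5"),
  seed       = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  link_count = c(NA_integer_, NA_integer_, NA_integer_, 3L, 1L, 2L, 5L, 2L)
)

min_links <- 2
max_nodes <- 5

# All seeds are retained; only the leftover slots go to candidates.
seeds <- cand[seed == TRUE]
slots <- max(max_nodes - nrow(seeds), 0)

# Filter by min_links, rank by link evidence (assumed: descending link_count),
# then truncate to the remaining slots.
kept <- cand[seed == FALSE & link_count >= min_links][order(-link_count)]
kept <- head(kept, slots)

corpus <- rbind(seeds, kept)
print(corpus)
```

In this toy case the three seeds are kept, c2 is dropped by min_links, and only the two best-linked candidates fit under the cap.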

Value

A data.table with one row per seed or candidate PMID. Columns are pmid, seed, cited_links, citing_links, and link_count. cited_links counts seed papers that cite the candidate; citing_links counts seed papers cited by the candidate.
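
To make the two link columns concrete, the sketch below counts them from a small edge list. It is an illustration under stated assumptions, not the function's actual code: the edges table and its from/to layout (from cites to) are hypothetical and may not match the structure of citation_net, and treating link_count as the sum of the two columns is likewise an assumption.

```r
library(data.table)

# Hypothetical edge list: the paper in `from` cites the paper in `to`.
edges <- data.table(
  from = c("s1", "s2", "s1", "x9", "x9"),
  to   = c("c1", "c1", "c2", "s1", "s2")
)
seeds <- c("s1", "s2")

# cited_links: number of distinct seeds that cite each non-seed candidate.
cited <- edges[from %in% seeds & !(to %in% seeds),
               .(cited_links = uniqueN(from)), by = .(pmid = to)]

# citing_links: number of distinct seeds cited by each non-seed candidate.
citing <- edges[to %in% seeds & !(from %in% seeds),
                .(citing_links = uniqueN(to)), by = .(pmid = from)]

# Combine both directions; a candidate missing from one side gets 0 there.
links <- merge(cited, citing, by = "pmid", all = TRUE)
setnafill(links, fill = 0L, cols = c("cited_links", "citing_links"))

# Assumption: total evidence is the sum of both directions.
links[, link_count := cited_links + citing_links]
print(links)
```

Here c1 is referenced by two seeds, c2 by one, and x9 cites two seeds, so c1 and x9 would pass min_links = 2 under the summed count while c2 would not.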

Examples

if (FALSE) { # \dontrun{
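# Seed PMIDs from a PubMed search
pmids <- search_pubmed("metformin AND PCOS [TiAb]")

# Fetch iCite records for the seeds, then expand to papers the seeds reference
snowball <- pmids |>
  get_records(endpoint = "icites") |>
  citation_snowball(direction = "cited", min_links = 2)

# Fetch abstracts for every PMID in the expanded corpus (seeds and candidates)
snowball$pmid |> get_records(endpoint = "pubmed_abstracts")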
} # }