R/citation_snowball.R
citation_snowball.Rd
Starting from an icites data.table returned by
get_records(endpoint = "icites"), follows the citation links
already present in the citation_net column and returns a candidate
table. The function does not call iCite again; use
get_records(endpoint = "icites") explicitly on the returned
PMIDs if metadata is needed for the expanded corpus.
citation_snowball(
  icites,
  max_nodes = 2000,
  direction = c("both", "citing", "cited"),
  min_links = 2
)
icites
A data.table returned by
get_records(endpoint = "icites"). Must contain pmid and
citation_net columns.
max_nodes
Hard ceiling on the total number of PMIDs in the returned
corpus (seed + discovered). Candidates are filtered by min_links,
ranked by citation-link evidence, and then truncated to the remaining
slots after all seed PMIDs are retained. Publication year is not used for
this cap because citation_snowball() does not fetch metadata for
newly discovered PMIDs. Default 2000.
direction
One of "both" (default), "citing", or
"cited". "cited" expands to papers referenced by the seeds;
"citing" expands to papers that cite the seeds;
"both" combines both directions.
min_links
Minimum number of seed papers a candidate must be linked
to in order to be included. Default 2. Higher values yield a
smaller, more focused expansion.
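The way max_nodes and min_links interact can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy values are invented, link_count is assumed to be the sum of cited_links and citing_links, and "ranked by citation-link evidence" is assumed to mean descending link_count with ties broken by PMID.

```r
# Toy candidate table. Column names follow the documented return value;
# the values are invented for illustration.
candidates <- data.frame(
  pmid = c("101", "102", "103", "104", "105"),
  cited_links = c(3L, 1L, 0L, 2L, 1L),
  citing_links = c(1L, 0L, 2L, 2L, 0L),
  stringsAsFactors = FALSE
)
# Assumption: link_count is the sum of the two directional counts.
candidates$link_count <- candidates$cited_links + candidates$citing_links

seed_pmids <- c("1", "2", "3")  # hypothetical seed PMIDs
max_nodes <- 5L
min_links <- 2L

# Step 1: drop candidates linked to fewer than min_links seeds.
kept <- candidates[candidates$link_count >= min_links, ]

# Step 2: rank by link evidence, strongest first (ties broken by pmid;
# the tie-break rule is an assumption of this sketch).
kept <- kept[order(-kept$link_count, kept$pmid), ]

# Step 3: all seeds are retained, so candidates only fill the remaining slots.
slots <- max(0L, max_nodes - length(seed_pmids))
kept <- head(kept, slots)

# Final corpus: seeds first, then the surviving candidates.
corpus <- c(seed_pmids, kept$pmid)
```

With three seeds and max_nodes = 5, only two candidate slots remain, so a candidate that passes min_links can still be dropped by the cap.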
A data.table with one row per seed or candidate PMID.
Columns are pmid, seed, cited_links,
citing_links, and link_count. cited_links counts seed
papers that cite the candidate; citing_links counts seed papers
cited by the candidate.
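As a hedged illustration of working with the returned columns, the sketch below builds a toy table by hand and pulls out the newly discovered PMIDs. The values are invented, seed is assumed to be a logical flag, and the zero counts on the seed rows are placeholders, since the documentation does not say what seed rows contain in those columns.

```r
# Hand-built stand-in for the documented return value (values are invented).
snowball <- data.frame(
  pmid = c("1", "2", "101", "104"),
  seed = c(TRUE, TRUE, FALSE, FALSE),  # assumption: logical seed flag
  cited_links = c(0L, 0L, 3L, 1L),     # seed papers that cite the candidate
  citing_links = c(0L, 0L, 1L, 2L),    # seed papers cited by the candidate
  link_count = c(0L, 0L, 4L, 3L),
  stringsAsFactors = FALSE
)

# PMIDs discovered by the expansion (everything that is not a seed).
new_pmids <- snowball$pmid[!snowball$seed]

# Candidates referenced by at least two seed papers.
referenced <- snowball$pmid[!snowball$seed & snowball$cited_links >= 2L]
```

A vector such as new_pmids is what would be passed to get_records(endpoint = "icites") when metadata for the expanded corpus is needed.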
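if (FALSE) { # \dontrun{
  # Search PubMed for the seed PMIDs.
  pmids <- search_pubmed("metformin AND PCOS [TiAb]")
  # Fetch iCite records for the seeds, then expand to papers
  # referenced by at least two of them.
  snowball <- pmids |>
    get_records(endpoint = "icites") |>
    citation_snowball(direction = "cited", min_links = 2)
  # Retrieve abstracts for the expanded corpus (seeds plus candidates).
  snowball$pmid |> get_records(endpoint = "pubmed_abstracts")
} # }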