puremoe provides a unified interface to PubMed and the wider NIH/NLM data stack. It is organized in two layers: a retrieval layer that pulls records from public services with search_pubmed() and get_records(), and an analysis layer that works on the returned tables locally, with no further API calls. PMIDs are the common key throughout. This vignette covers the retrieval layer end to end; the analysis layer is covered in the topic vignettes listed under Next steps.

search_pubmed() accepts standard PubMed query syntax and returns a vector of PMIDs.

pmids <- puremoe::search_pubmed('("political ideology"[TiAb])')
length(pmids)
#> [1] 963
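Because the query is an ordinary string, it can be assembled programmatically. The sketch below uses a hypothetical helper, build_tiab_query(), which is not part of puremoe. It ORs several quoted phrases restricted to title/abstract and, assuming PubMed's colon syntax for date ranges, optionally adds a publication-year filter. The search_pubmed() call is left commented out so the snippet runs offline.

```r
# Hypothetical helper (not a puremoe function): OR together quoted phrases
# restricted to title/abstract ([TiAb]) and optionally AND a year range ([dp]).
# Phrases that themselves contain double quotes are not escaped.
build_tiab_query <- function(phrases, years = NULL) {
  terms <- sprintf('"%s"[TiAb]', phrases)
  query <- sprintf("(%s)", paste(terms, collapse = " OR "))
  if (!is.null(years)) {
    # Assumed PubMed range syntax, e.g. 2015:2024[dp]
    query <- sprintf("%s AND %d:%d[dp]", query,
                     as.integer(years[1]), as.integer(years[2]))
  }
  query
}

# Reproduces the single-phrase query used above
build_tiab_query("political ideology")

q <- build_tiab_query(c("political ideology", "political orientation"),
                      years = c(2015, 2024))
q
# pmids <- puremoe::search_pubmed(q)
```

With a single phrase and no years, the helper returns exactly the query string passed to search_pubmed() above.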

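Abstracts

get_records() takes a vector of identifiers (PMIDs for most endpoints) and an endpoint name; cores sets the number of parallel workers and sleep the delay between requests, in seconds.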

abstracts <- puremoe::get_records(
  pmids,
  endpoint = "pubmed_abstracts",
  cores    = 1L,
  sleep    = 0.5
)

Abstract records

library(dplyr)  # select(), filter(), mutate(), bind_rows() are used below

abstracts |>
  select(pmid, year, journal, articletitle) |>
  head(10) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))

Annotations

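The annotations column of abstracts is a list-column: one data frame per article holding its MeSH terms, chemical names, and keywords. Binding the data frames gives one row per term; the table below keeps only rows with a non-empty MeSH DescriptorName.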

Annotation terms

bind_rows(abstracts$annotations) |>
  filter(!is.na(DescriptorName), nzchar(DescriptorName)) |>
  head(20) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))
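The table above lists individual term rows. To see which MeSH descriptors dominate the corpus, one simple summary is document frequency: the number of articles carrying each descriptor, counting a descriptor at most once per article. The sketch below computes it in base R over a small hypothetical list shaped like abstracts$annotations; only the DescriptorName column comes from this vignette, and with real data you would pass abstracts$annotations instead. This is plain counting, not puremoe's MeSH frequency baseline or keyness tools.

```r
# Hypothetical stand-in for abstracts$annotations: one data frame per article.
# Real annotation frames carry more columns (chemicals, keywords, ...).
annotations <- list(
  data.frame(DescriptorName = c("Politics", "Humans", "Humans"),
             stringsAsFactors = FALSE),
  data.frame(DescriptorName = c("Politics", ""), stringsAsFactors = FALSE),
  data.frame(DescriptorName = c(NA, "Attitude"), stringsAsFactors = FALSE)
)

# Document frequency: drop NA/empty names, de-duplicate within each article,
# then count how many articles carry each descriptor.
descriptor_doc_freq <- function(annotation_list) {
  per_article <- lapply(annotation_list, function(a) {
    terms <- a$DescriptorName
    unique(terms[!is.na(terms) & nzchar(terms)])
  })
  counts <- table(unlist(per_article))
  counts <- setNames(as.integer(counts), names(counts))
  sort(counts, decreasing = TRUE)
}

freq <- descriptor_doc_freq(annotations)
freq
```

De-duplicating within each article keeps a descriptor listed twice in one record (here "Humans") from inflating its count.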

Affiliations

affiliations <- puremoe::get_records(
  pmids,
  endpoint = "pubmed_affiliations",
  cores    = 1L,
  sleep    = 0.5
)

Author affiliations

affiliations |>
  head(15) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))

iCite metrics

iCite lags behind PubMed indexing, so the most recent PMIDs in a search may not have citation metrics yet; the table below keeps only rows with a non-missing citation_count.

icites <- puremoe::get_records(
  pmids,
  endpoint = "icites",
  cores    = 1L,
  sleep    = 0.25
)
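Because of that lag, it can be useful to check which search PMIDs came back without metrics. The sketch below is a hypothetical helper, icite_coverage(), run on made-up inputs; it handles both ways a lagging PMID might appear, either absent from the iCite response or present with a missing citation_count, since this vignette does not say which applies. PMIDs are compared as character, the pmid type documented by endpoint_info("icites").

```r
# Hypothetical inputs (not real puremoe output): four search PMIDs and an
# icites-like table with only the pmid and citation_count columns.
pmids_mock <- c("101", "102", "103", "104")
icites_mock <- data.frame(
  pmid = c("101", "102", "103"),
  citation_count = c(12L, NA, 0L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort search PMIDs by iCite status.
icite_coverage <- function(pmids, icites) {
  pmids <- as.character(pmids)
  has_metrics <- !is.na(icites$citation_count)
  list(
    not_returned = setdiff(pmids, icites$pmid),  # no iCite row at all
    no_metrics   = icites$pmid[!has_metrics],    # row present, count missing
    scored       = icites$pmid[has_metrics]      # zero citations still counts
  )
}

coverage <- icite_coverage(pmids_mock, icites_mock)
str(coverage)
```

With real data, the call would be icite_coverage(pmids, icites).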

Indexed records

icites |>
  filter(!is.na(citation_count)) |>
  mutate(field_citation_rate = round(field_citation_rate, 3)) |>
  select(-citation_net, -cited_by_clin) |>
  head(10) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))
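Since PMIDs are the common key, tables from different endpoints combine with an ordinary join. The sketch below left-joins a few iCite metrics onto abstract metadata with base R's merge(); the small data frames are hypothetical stand-ins using column names that appear in this vignette, and with real data you would join abstracts and icites the same way.

```r
# Hypothetical stand-ins with a few columns named in this vignette;
# real tables have more columns and many more rows.
abstracts_mock <- data.frame(
  pmid    = c("101", "102", "103"),
  year    = c(2019L, 2021L, 2024L),
  journal = c("Journal A", "Journal B", "Journal A"),
  stringsAsFactors = FALSE
)
icites_mock <- data.frame(
  pmid = c("101", "102"),
  citation_count = c(12L, 3L),
  relative_citation_ratio = c(1.8, 0.6),
  stringsAsFactors = FALSE
)

# Left join on the shared pmid key: every abstract is kept, and articles
# without an iCite row get NA metrics rather than being dropped.
joined <- merge(abstracts_mock, icites_mock, by = "pmid", all.x = TRUE)
joined
```

A left join (all.x = TRUE) is used so that articles iCite has not yet scored remain in the combined table.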

PubTator annotations

pubtator <- puremoe::get_records(
  pmids,
  endpoint = "pubtator",
  cores    = 1L
)

Entity mentions

pubtator$entities |>
  select(-passage_text, -passage_offset) |>
  head(15) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))

Full text

Full-text retrieval is limited to open-access PMC articles. pmid_to_ftp() resolves PMIDs to full-text XML URLs via the PMC Cloud Service on AWS S3 and returns only the articles with open-access full text available; here it is run on the first 25 PMIDs. In August 2026, NCBI will complete its migration from the legacy PMC FTP Service to the Cloud Service; puremoe already uses the new service.

ftp <- puremoe::pmid_to_ftp(pmids = head(pmids, 25L))

Open-access URLs

ftp |>
  head(5L) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))

Section-level text

fulltext <- puremoe::get_records(
  head(ftp$url, 5L),
  endpoint = "pmc_fulltext",
  cores    = 1L
)

fulltext |>
  # Truncate each section to its first 15 words for display
  mutate(text = vapply(
    strsplit(text, "\\s+"),
    function(w) paste0(paste(head(w, 15L), collapse = " "), "..."),
    character(1)
  )) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))

Endpoint schemas

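Called with no arguments, endpoint_info() lists the available endpoints; given an endpoint name, it returns that endpoint's description, source, input, column definitions, parameters, rate limit, and notes.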

puremoe::endpoint_info()
#> [1] "pubmed_abstracts"    "pubmed_affiliations" "icites"             
#> [4] "pubtator"            "pmc_fulltext"
puremoe::endpoint_info("icites")
#> $description
#> [1] "NIH iCite citation metrics, influence scores, and citation links"
#> 
#> $source
#> [1] "NIH iCite"
#> 
#> $input
#> [1] "PMIDs"
#> 
#> $returns
#> [1] "data.table; one row per PMID returned by iCite"
#> 
#> $columns
#> $columns$pmid
#> [1] "PubMed ID; join key for other puremoe endpoints (character)"
#> 
#> $columns$citation_count
#> [1] "Total citations received"
#> 
#> $columns$relative_citation_ratio
#> [1] "Relative Citation Ratio (RCR), rounded to three decimals"
#> 
#> $columns$nih_percentile
#> [1] "Percentile rank relative to NIH-funded publications"
#> 
#> $columns$field_citation_rate
#> [1] "Expected citation rate for the article's co-citation field"
#> 
#> $columns$is_research_article
#> [1] "Flag indicating whether iCite classifies the article as research"
#> 
#> $columns$is_clinical
#> [1] "Flag indicating whether iCite classifies the article as clinical"
#> 
#> $columns$provisional
#> [1] "Flag indicating provisional RCR status for recent publications"
#> 
#> $columns$citation_net
#> [1] "List-column of directed citation edges with 'from' and 'to' PMIDs, built from iCite cited-by and reference fields. Covers PubMed-indexed articles only; citations from preprints or sources outside PubMed are not included."
#> 
#> $columns$cited_by_clin
#> [1] "Clinical citing PMIDs as returned by iCite"
#> 
#> 
#> $parameters
#> $parameters$cores
#> [1] "parallel workers"
#> 
#> $parameters$sleep
#> [1] "delay between requests, in seconds"
#> 
#> 
#> $rate_limit
#> [1] "Relatively permissive"
#> 
#> $notes
#> [1] "Title, journal, publication year, authors, and abstracts are intentionally omitted to avoid duplicating pubmed_abstracts metadata. Use citation_net with citation_snowball() or citation_network(). iCite citation links cover PubMed-indexed articles only; citations from preprints or sources outside PubMed are not included."
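The citation_net entry above describes a list-column of per-article edge tables. Before handing it to citation_snowball() or citation_network(), it can help to inspect the edges directly. The sketch below uses a small hypothetical icites-like table and assumes that `from` is the citing article and `to` the cited one (the endpoint description does not state the direction explicitly). It stacks the per-article tables into one edge list, drops duplicate edges (an edge between two corpus articles can be listed under both), keeps edges internal to the corpus, and counts citations received per corpus article.

```r
# Hypothetical stand-in for the icites table: two corpus PMIDs, each with a
# citation_net data frame of directed edges (assumed: from = citing, to = cited).
icites_mock <- data.frame(pmid = c("100", "200"), stringsAsFactors = FALSE)
icites_mock$citation_net <- list(
  data.frame(from = c("300", "200", "100"), to = c("100", "100", "400"),
             stringsAsFactors = FALSE),
  data.frame(from = "200", to = "100", stringsAsFactors = FALSE)
)

# Stack the per-article tables; the 200 -> 100 edge appears under both
# articles, so exact duplicates are removed.
edges <- unique(do.call(rbind, icites_mock$citation_net))
rownames(edges) <- NULL

# Edges whose endpoints are both in the retrieved corpus
corpus <- icites_mock$pmid
internal <- edges[edges$from %in% corpus & edges$to %in% corpus, ]

# Citations received per corpus article, counted over all returned edges
in_degree <- vapply(corpus, function(p) sum(edges$to == p), integer(1))
in_degree
```

With real data, icites$citation_net and icites$pmid take the place of the mock columns.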

Next steps

The tables retrieved here feed puremoe’s local analysis layer – transforms over the same data frames, with no further API calls:

  • MeSH tables – look up descriptors, navigate the hierarchy, and use the PubMed-wide frequency baseline.
  • Citation snowballing – expand a seed corpus along icites citation links and characterize the result with MeSH keyness.
  • PubTator context and relation networks – anchor entity mentions to sentences, count co-occurrence, and build relation networks with evidence.