puremoe (PubMed Unified REtrieval for Multi-Output Exploration) is an R package that provides a single interface for accessing a range of NLM/PubMed databases, including:
PubMed abstract records,
iCite bibliometric data,
PubTator3 named entity annotations, and
full-text entries from PubMed Central (PMC).
This unified interface simplifies data retrieval: users can work with multiple PubMed services, APIs, and output formats through a single retrieval function, get_records().
Also included are MeSH thesaurus resources as simple data frames: Descriptor Terms, Descriptor Tree Structures, and Supplementary Concept Terms, via the mesh-resources library.
The package provides a straightforward retrieval interface for PubMed literature, with utility in LLM workflows and RAG applications requiring access to abstracts, full-text articles, entity annotations, and bibliometric data.
Get the released version from CRAN:
install.packages('puremoe')
Or the development version from GitHub with:
remotes::install_github("jaytimm/puremoe")
The package has two basic functions: search_pubmed and get_records. The former fetches PMIDs from the PubMed API based on a user search; the latter retrieves records for those PMIDs from a user-specified endpoint: pubmed_abstracts, pubmed_affiliations, pubtations, icites, or pmc_fulltext.
Search syntax is the same as that implemented in standard PubMed search.
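Because queries are plain PubMed search strings, they can be assembled programmatically before being passed to search_pubmed(). The following is a minimal sketch in base R; tag_term() and combine_terms() are hypothetical helpers written for illustration and are not part of puremoe.

```r
# Hypothetical helper: quote a phrase and attach a PubMed field tag
tag_term <- function(term, field = "TiAb") {
  sprintf('"%s"[%s]', term, field)
}

# Hypothetical helper: join tagged terms with a Boolean operator
combine_terms <- function(terms, operator = "OR") {
  paste0("(", paste(terms, collapse = paste0(" ", operator, " ")), ")")
}

# Title/abstract terms combined with OR, then restricted by a MeSH heading
topic <- combine_terms(c(tag_term("political ideology"),
                         tag_term("partisanship")))
query <- paste(topic, "AND", tag_term("United States", "MeSH Terms"))
print(query)

# The resulting string could then be used as the search term, e.g.:
# pmids <- puremoe::search_pubmed(query)
```

Building the string in pieces keeps quoting and parentheses consistent when the term list changes.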
# Assumption: an NCBI API key is stored in the NCBI_API_KEY environment variable
ncbi_key <- Sys.getenv('NCBI_API_KEY')
pmids <- puremoe::search_pubmed('("political ideology"[TiAb])')
pubmed <- pmids |>
  puremoe::get_records(endpoint = 'pubmed_abstracts',
                       cores = 3,
                       sleep = 1,
                       ncbi_key = ncbi_key)
affiliations <- pmids |>
  puremoe::get_records(endpoint = 'pubmed_affiliations',
                       cores = 1,
                       sleep = 0.5)
icites <- pmids |>
  puremoe::get_records(endpoint = 'icites',
                       cores = 3,
                       sleep = 0.25)
pubtations <- pmids |>
  puremoe::get_records(endpoint = 'pubtations',
                       cores = 2)
Full-text articles can be retrieved for PMIDs if they are available in PMC’s open-access collection. Use pmid_to_ftp() to get download URLs, then pass these to get_records(endpoint = 'pmc_fulltext'); this is useful for quick retrieval in LLM/chat contexts.
For bulk downloads, use data_pmc_list().
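The pmc_fulltext endpoint is documented as returning one row per section, with pmid, section, and text columns. For LLM or RAG use, it can help to collapse those rows into one document per article once records are retrieved. The sketch below uses a small mock data frame with that column layout; collapse_sections() is a hypothetical helper, not a puremoe function.

```r
# Mock data with the documented pmc_fulltext columns (values are made up)
mock_fulltext <- data.frame(
  pmid = c("111", "111", "222"),
  section = c("Introduction", "Methods", "Introduction"),
  text = c("Intro text A.", "Methods text A.", "Intro text B."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one document per PMID, sections kept in row order
collapse_sections <- function(df) {
  # Prefix each section's text with its heading
  chunks <- paste0(df$section, "\n", df$text)
  # Group chunks by PMID and join them with blank lines
  docs <- vapply(split(chunks, df$pmid), paste, character(1),
                 collapse = "\n\n")
  data.frame(pmid = names(docs), document = unname(docs),
             stringsAsFactors = FALSE)
}

docs <- collapse_sections(mock_fulltext)
print(docs$pmid)
```

Keeping the section heading inside each chunk preserves some structure for downstream prompts; whether to keep or drop headings is a design choice left to the application.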
pmcs <- puremoe::pmid_to_ftp(pmids = pmids, ncbi_key = ncbi_key)
pmc_fulltext <- puremoe::get_records(pmcs[1:5]$url, endpoint = 'pmc_fulltext', cores = 1)
endpoint_info() returns the schema, columns, and rate limits for each endpoint, which is useful when building tool schemas for LLM applications.
Called with no arguments, endpoint_info() lists the available endpoints; endpoint_info('endpoint_name') returns details for one endpoint; format = 'json' gives machine-readable output.
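The details for an endpoint print as a nested list, so the column descriptions can be flattened into a small table for documentation or a tool schema. The sketch below works on a hand-written mock list that mirrors the structure this README prints for pmc_fulltext; in practice the list would presumably come from puremoe::endpoint_info('pmc_fulltext').

```r
# Mock of the structure printed for the pmc_fulltext endpoint
info <- list(
  description = "Full-text articles from PubMed Central",
  returns = "data.frame",
  columns = list(
    pmid = "PubMed ID (character)",
    section = "Section heading (character)",
    text = "Section text content (character)"
  )
)

# Flatten the named list of column descriptions into a two-column table
columns_table <- data.frame(
  column = names(info$columns),
  description = unlist(info$columns, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(columns_table)
```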
puremoe::endpoint_info()
puremoe::endpoint_info('pmc_fulltext')
## $description
## [1] "Full-text articles from PubMed Central"
##
## $returns
## [1] "data.frame"
##
## $columns
## $columns$pmid
## [1] "PubMed ID (character)"
##
## $columns$section
## [1] "Section heading (character)"
##
## $columns$text
## [1] "Section text content (character)"
##
##
## $parameters
## $parameters$cores
## [1] "parallel workers"
##
##
## $input
## [1] "Requires FTP URLs from pmid_to_ftp()"
##
## $rate_limit
## [1] "NCBI FTP: be respectful"
##
## $notes
## [1] "One row per section; use after pmid_to_ftp() to get URLs. Not all PMIDs have PMC full text available."
Report bugs or request features at https://github.com/username/puremoe/issues