Attempts to extract a publication date from the HTML content of a web page, using sources such as JSON-LD, OpenGraph meta tags, standard meta tags, and common HTML elements.

extract_date(site)

Arguments

site

An HTML document (as parsed by xml2 or rvest) from which to extract the date.
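
As an illustration of how a suitable `site` value might be produced, the sketch below parses a literal HTML string with `xml2::read_html()`. The HTML snippet and the variable names are made up for this example; the commented call at the end assumes the package providing `extract_date()` is already attached.

```r
library(xml2)

# Literal HTML used only for illustration; read_html() also accepts
# a URL or a file path.
html <- '<html><head>
  <meta property="article:published_time" content="2024-03-15T08:00:00Z">
</head><body><p>Hello</p></body></html>'

# Parse the string into an xml_document, the kind of object `site` expects.
site <- read_html(html)
class(site)  # includes "xml_document"

# Assumes the package providing extract_date() is attached:
# extract_date(site)
```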

Value

A data.frame with two columns, `date` and `source`: the extracted date and the source it was extracted from (e.g., JSON-LD or OpenGraph). If no date is found, both columns are NA.
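
To make the shape of the return value concrete, here is a minimal sketch of a fallback lookup that yields a data.frame of this form. It is not the implementation of `extract_date()`: the helper name `sketch_extract_date`, the XPath selectors, and the `source` labels are all assumptions chosen for illustration. JSON-LD handling is omitted for brevity, and the date is returned as an unparsed character string.

```r
library(xml2)

# Hypothetical helper, NOT the package's implementation.
sketch_extract_date <- function(site) {
  # Candidate XPath string queries, tried in order. Both the selectors and
  # the labels are assumptions made for this sketch.
  candidates <- list(
    OpenGraph = "string(//meta[@property='article:published_time']/@content)",
    meta      = "string(//meta[@name='date']/@content)",
    time      = "string(//time/@datetime)"
  )

  for (src in names(candidates)) {
    # string() yields "" when no node matches, so nzchar() detects a hit.
    value <- xml_find_chr(site, candidates[[src]])
    if (nzchar(value)) {
      return(data.frame(date = value, source = src, stringsAsFactors = FALSE))
    }
  }

  # Nothing matched: NA in both columns, mirroring the documented behaviour.
  data.frame(date = NA_character_, source = NA_character_,
             stringsAsFactors = FALSE)
}

page <- read_html(
  '<html><head><meta name="date" content="2023-11-02"></head><body></body></html>'
)
result <- sketch_extract_date(page)
```

For the page above, `result` has one row with `date` equal to "2023-11-02" and `source` equal to "meta"; a page with none of the queried elements gives one row of NA values.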