Introduction

rdocdump is an R package that combines an R package’s documentation and vignettes into a single plain text file. This is particularly useful for feeding complete package documentation to large language models (LLMs) or for archiving it. rdocdump works with installed packages, source directories, tar.gz archives, and package sources available on CRAN.

Installation

To install the development version of rdocdump from GitHub, run the following commands:

# install.packages("pak") # Uncomment if pak is not installed
pak::pak("e-kotov/rdocdump")

Setting the Cache Path

By default, rdocdump stores temporary files in a directory within R’s temporary directory. You can override this by setting a custom cache path using the helper function rdd_set_cache_path(). For example:

library(rdocdump)

# Set a custom cache directory
cache_dir <- file.path(tempdir(), "my_rdocdump_cache")
rdd_set_cache_path(cache_dir)
rdocdump.cache_path set to: /private/var/folders/gb/t5zr5rn15sldqybrmqbyh6y80000gn/T/Rtmpp5wRxV/my_rdocdump_cache

This ensures that temporary tar.gz archives and extracted files are stored in your specified directory.
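
Because R deletes its temporary directory when the session ends, a cache that should outlive the session needs a location elsewhere. The following is a minimal sketch that uses base R's tools::R_user_dir() (available since R 4.0.0) to pick a per-user cache directory. It creates the directory explicitly, since whether rdd_set_cache_path() creates a missing directory itself is not stated here and is treated as unknown.

```r
library(rdocdump)

# Per-user cache directory for rdocdump, chosen by base R (R >= 4.0.0).
persistent_cache <- tools::R_user_dir("rdocdump", which = "cache")

# Create it up front; this sketch does not assume rdocdump does so itself.
dir.create(persistent_cache, recursive = TRUE, showWarnings = FALSE)

# Point rdocdump at the persistent location.
rdd_set_cache_path(persistent_cache)
```

Whether the setting carries over to later sessions is also not covered here, so it may be necessary to repeat the call at the start of each session.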

Extracting Documentation

The main function in rdocdump is rdd_to_txt(), which accepts several types of inputs for the pkg argument:

  • An installed package name (e.g., "stats").

  • A full path to a package source directory.

  • A full path to a package archive (tar.gz).

  • A package name available on CRAN (downloaded automatically if not installed).

Here is an example that downloads the source for the package rJavaEnv from CRAN, extracts its documentation, and saves it to a file:

# Extract documentation for 'rJavaEnv' and save to a text file.
rdd_to_txt(
  pkg = "rJavaEnv",
  file = "rJavaEnv_docs.txt",
  force_fetch = TRUE,   # Force download even if the package is installed.
  keep_files = "none"   # Delete temporary files after extraction.
)
Fetching package source from CRAN...
trying URL 'https://cran.rstudio.com/src/contrib/rJavaEnv_0.2.2.tar.gz'
Content type 'application/x-gzip' length 138667 bytes (135 KB)
==================================================
downloaded 135 KB

[1] "rJavaEnv_docs.txt"
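
The same call can be repeated to dump several packages, one text file each. The sketch below uses only the pkg and file arguments shown above. The package list and the output directory name are illustrative choices, and the packages are base packages that ship with R, so no download should be needed.

```r
library(rdocdump)

# Illustrative selection of packages that ship with R.
pkgs <- c("stats", "utils", "tools")

# Illustrative output directory inside the session's temporary directory.
out_dir <- file.path(tempdir(), "package_docs")
dir.create(out_dir, showWarnings = FALSE)

# Write one combined documentation file per package.
for (p in pkgs) {
  rdd_to_txt(
    pkg = p,
    file = file.path(out_dir, paste0(p, "_docs.txt"))
  )
}

# List the files that were written.
list.files(out_dir)
```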

If you prefer to simply get the combined documentation as a character string, call the function without the file argument:

# Extract and capture the combined documentation in a variable.
docs <- rdd_to_txt(pkg = "stats")
cat(substr(docs, 1, 1000))  # Print the first 1000 characters for a preview.
DESCRIPTION:
Package: stats
Version: 4.4.3
Priority: base
Title: The R Stats Package
Author: R Core Team and contributors worldwide
Maintainer: R Core Team <[email protected]>
Contact: R-help mailing list <[email protected]>
Description: R statistical functions.
License: Part of R 4.4.3
Imports: utils, grDevices, graphics
Suggests: MASS, Matrix, SuppDists, methods, stats4
NeedsCompilation: yes
Encoding: UTF-8
Enhances: Kendall, coin, multcomp, pcaPP, pspearman, robustbase
Built: R 4.4.3; aarch64-apple-darwin20; 2025-02-28 21:20:18 UTC; unix

--------------------------------------------------------------------------------
Function: AIC()
Akaike's An Information Criterion

Description:

     Generic function calculating Akaike's ‘An Information Criterion’
     for one or several fitted model objects for which a log-likelihood
     value can be obtained, according to the formula -2*log-likelihood
     + k*npar, where npar represents the number of parameters in the

Choosing What to Dump to Text

Use the content argument to choose whether to combine just the package documentation, just the vignettes, or both:

docs <- rdd_to_txt(
  pkg = "utils",
  content = "vignettes"
)
cat(substr(docs, 1, 1000))  # Print the first 1000 characters for a preview.

As you can see below, only the vignettes were combined:

--------------------------------------------------------------------------------
Vignette: Sweave.Rnw

% File src/library/utils/vignettes/Sweave.Rnw
% Part of the R package, https://www.R-project.org
% Copyright 2002-2022 Friedrich Leisch and the R Core Team
% Distributed under GPL 2 or later

\documentclass[a4paper]{article}

%\VignetteIndexEntry{Sweave User Manual}
%\VignettePackage{utils}
%\VignetteDepends{tools, datasets, stats, graphics}

\title{Sweave User Manual}
\author{Friedrich Leisch and R Core Team}

\usepackage[round]{natbib}
\usepackage{graphicx, Rd}
\usepackage{listings}

\lstset{frame=trbl,basicstyle=\small\tt}
\usepackage{hyperref}
\usepackage{color}
\definecolor{Blue}{rgb}{0,0,0.8}
\hypersetup{%
colorlinks,%
plainpages=true,%
linkcolor=black,%
citecolor=black,%
urlcolor=Blue,%
%pdfstartview=FitH,% or Fit
pdfstartview={XYZ null null 1},%
pdfview={XYZ null null null},%
pdfpagemode=UseNone,% for no outline
pdfauthor={Friedrich Leisch and R Core Team},%
pdftitle={Sweave 
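
The previews in this article suggest a regular layout: each section starts after a line of hyphens and opens with a header such as "Function: AIC()" or "Vignette: Sweave.Rnw". Assuming that layout holds for the whole dump (it is inferred from the previews, not from a documented format), the combined string can be split back into sections with base R. The sketch below runs on a small stand-in string so that it works without rdocdump. In practice, docs would be the value returned by rdd_to_txt().

```r
# Stand-in for the string returned by rdd_to_txt(); replace with a real dump.
rule <- strrep("-", 80)
docs <- paste(
  c(
    "DESCRIPTION:", "Package: demo", "",
    rule, "Function: foo()", "Does foo.", "",
    rule, "Vignette: intro.Rmd", "Some vignette text."
  ),
  collapse = "\n"
)

# Split into lines and mark separator lines (assumed: hyphens only).
lines <- strsplit(docs, "\n", fixed = TRUE)[[1]]
is_rule <- grepl("^-{20,}$", lines)

# Each separator starts a new section; group the remaining lines accordingly.
section_id <- cumsum(is_rule)
sections <- split(lines[!is_rule], section_id[!is_rule])

# The first line of each section names it.
headers <- unname(vapply(sections, function(s) s[1], character(1)))
print(headers)

# Keep only the function help pages, for example to shorten an LLM prompt.
function_sections <- sections[startsWith(headers, "Function: ")]
length(function_sections)
```

A vignette that itself contains a line made up only of hyphens would be split at that line as well, so the result is best treated as approximate.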

Handling Temporary Files

The argument keep_files controls whether temporary files (the downloaded archive and/or extracted directory) are retained:

  • "none" (default): Delete both the tar.gz archive and the extracted files.

  • "tgz": Keep only the tar.gz archive.

  • "extracted": Keep only the extracted files.

  • "both": Keep both the tar.gz archive and the extracted files.

Choose the option that best fits your workflow.
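
As an illustration, the sketch below combines a custom cache path with keep_files = "tgz" so that the downloaded archive stays available after extraction. It uses only the functions and arguments described in this article and needs network access to CRAN. The file names that end up in the cache directory are not specified here, so the listing is shown without expected output.

```r
library(rdocdump)

# Use a known cache directory so the retained archive is easy to find.
cache_dir <- file.path(tempdir(), "my_rdocdump_cache")
rdd_set_cache_path(cache_dir)

# Fetch the source from CRAN and keep only the tar.gz archive afterwards.
docs <- rdd_to_txt(
  pkg = "rJavaEnv",
  force_fetch = TRUE,
  keep_files = "tgz"
)

# Inspect what was left in the cache directory.
list.files(cache_dir, recursive = TRUE)
```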