40 changes: 22 additions & 18 deletions .Rbuildignore
Original file line number Diff line number Diff line change
@@ -1,18 +1,22 @@
^.*\.Rproj$
^\.Rproj\.user$
^_pkgdown\.yml$
^docs$
^pkgdown$
^\.github$
^README.rmd
^data-raw
^demo.rmd
^testing.md
^LICENSE\.md$
^vignettes/articles$
^data-raw$
^demo\.rmd$
^nccsdata\.Rproj$
^README\.Rmd$
^testing\.md$
^hex$
^.*\.Rproj$
^\.Rproj\.user$
^_pkgdown\.yml$
^docs$
^pkgdown$
^\.github$
^README.rmd
^data-raw
^demo.rmd
^testing.md
^LICENSE\.md$
^vignettes/articles$
^data-raw$
^demo\.rmd$
^nccsdata\.Rproj$
^README\.Rmd$
^testing\.md$
^hex$
^\.claude$
^test\.R$
^CLAUDE\.md$
^\.Rproj\.user
2 changes: 1 addition & 1 deletion .github/.gitignore
@@ -1 +1 @@
*.html
*.html
98 changes: 49 additions & 49 deletions .github/workflows/R-CMD-check.yaml
@@ -1,49 +1,49 @@
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]

name: R-CMD-check

jobs:
  R-CMD-check:
    runs-on: ${{ matrix.config.os }}

    name: ${{ matrix.config.os }} (${{ matrix.config.r }})

    strategy:
      fail-fast: false
      matrix:
        config:
          - {os: macos-latest, r: 'release'}
          - {os: windows-latest, r: 'release'}
          - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
          - {os: ubuntu-latest, r: 'release'}
          - {os: ubuntu-latest, r: 'oldrel-1'}

    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      R_KEEP_PKG_SOURCE: yes

    steps:
      - uses: actions/checkout@v3

      - uses: r-lib/actions/setup-pandoc@v2

      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: ${{ matrix.config.r }}
          http-user-agent: ${{ matrix.config.http-user-agent }}
          use-public-rspm: true

      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::rcmdcheck
          needs: check

      - uses: r-lib/actions/check-r-package@v2
        with:
          upload-snapshots: true
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]
name: R-CMD-check
jobs:
  R-CMD-check:
    runs-on: ${{ matrix.config.os }}
    name: ${{ matrix.config.os }} (${{ matrix.config.r }})
    strategy:
      fail-fast: false
      matrix:
        config:
          - {os: macos-latest, r: 'release'}
          - {os: windows-latest, r: 'release'}
          - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
          - {os: ubuntu-latest, r: 'release'}
          - {os: ubuntu-latest, r: 'oldrel-1'}
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      R_KEEP_PKG_SOURCE: yes
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-pandoc@v2
      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: ${{ matrix.config.r }}
          http-user-agent: ${{ matrix.config.http-user-agent }}
          use-public-rspm: true
      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::rcmdcheck
          needs: check
      - uses: r-lib/actions/check-r-package@v2
        with:
          upload-snapshots: true
100 changes: 50 additions & 50 deletions .github/workflows/test-coverage.yaml
@@ -1,50 +1,50 @@
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]

name: test-coverage

jobs:
  test-coverage:
    runs-on: ubuntu-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}

    steps:
      - uses: actions/checkout@v3

      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true

      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::covr
          needs: coverage

      - name: Test coverage
        run: |
          covr::codecov(
            quiet = FALSE,
            clean = FALSE,
            install_path = file.path(normalizePath(Sys.getenv("RUNNER_TEMP"), winslash = "/"), "package")
          )
        shell: Rscript {0}

      - name: Show testthat output
        if: always()
        run: |
          ## --------------------------------------------------------------------
          find ${{ runner.temp }}/package -name 'testthat.Rout*' -exec cat '{}' \; || true
        shell: bash

      - name: Upload test results
        if: failure()
        uses: actions/upload-artifact@v3
        with:
          name: coverage-test-failures
          path: ${{ runner.temp }}/package
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]
name: test-coverage
jobs:
  test-coverage:
    runs-on: ubuntu-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true
      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::covr
          needs: coverage
      - name: Test coverage
        run: |
          covr::codecov(
            quiet = FALSE,
            clean = FALSE,
            install_path = file.path(normalizePath(Sys.getenv("RUNNER_TEMP"), winslash = "/"), "package")
          )
        shell: Rscript {0}
      - name: Show testthat output
        if: always()
        run: |
          ## --------------------------------------------------------------------
          find ${{ runner.temp }}/package -name 'testthat.Rout*' -exec cat '{}' \; || true
        shell: bash
      - name: Upload test results
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: coverage-test-failures
          path: ${{ runner.temp }}/package
7 changes: 7 additions & 0 deletions .gitignore
@@ -0,0 +1,7 @@
.Rproj.user
.Rhistory
.RData
.Ruserdata
docs
inst/doc
..Rcheck
75 changes: 75 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,75 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## About

`nccsdata` is an R package, published by the Urban Institute, for downloading, filtering, and analyzing nonprofit organization data from NCCS (the National Center for Charitable Statistics).

## Common Commands

```bash
# Full package check (runs examples, tests, and R CMD check's static code checks)
R CMD check .

# Run all tests
Rscript -e 'devtools::test()'

# Run a single test file
Rscript -e 'testthat::test_file("tests/testthat/test-<name>.R")'

# Regenerate documentation from roxygen2 comments
Rscript -e 'devtools::document()'

# Build pkgdown documentation site
Rscript -e 'pkgdown::build_site()'

# Install package locally
R CMD INSTALL .
```

## Architecture

### Data Source

BMF parquet files are stored in a public S3 bucket at:
`s3://nccsdata/geocoding/bmf/{YYYY_MM}/merged/bmf_{YYYY_MM}_geocoded.parquet`

The `arrow` package reads these directly via S3 URIs — no authentication needed.
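To illustrate the path pattern, the sketch below builds the URI for a single release month. `bmf_s3_uri()` is a hypothetical helper, not a function exported by nccsdata, and it assumes `{YYYY_MM}` means a four-digit year and a zero-padded two-digit month (for example `2024_06`).

```r
# Hypothetical helper (not part of nccsdata): build the S3 URI for one BMF
# release. Assumes {YYYY_MM} = four-digit year + "_" + zero-padded month.
bmf_s3_uri <- function(year, month) {
  stopifnot(month >= 1, month <= 12)
  ym <- sprintf("%04d_%02d", as.integer(year), as.integer(month))
  sprintf(
    "s3://nccsdata/geocoding/bmf/%s/merged/bmf_%s_geocoded.parquet",
    ym, ym
  )
}

# Returns "s3://nccsdata/geocoding/bmf/2024_06/merged/bmf_2024_06_geocoded.parquet"
bmf_s3_uri(2024, 6)
```

The helper only builds a string and never touches the network; the resulting URI is the kind of source that could be handed to `arrow::open_dataset()`.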

### Package Functions

- `R/nccs_read.R` — Core function. Reads BMF parquet from S3 with optional predicate-pushdown filters (state, county, NTEE subsector, exempt org type). Supports column selection and lazy Arrow queries.
- `R/nccs_summary.R` — Grouped count summaries on collected data.
- `R/nccs_catalog.R` — Lists valid filter values (offline, no network needed).
- `R/nccs_dictionary.R` — Returns BMF data dictionary as a tibble with optional grep filtering. Also documents the `bmf_dictionary` dataset.
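The lazy, predicate-pushdown query that `nccs_read()` is described as using can be sketched with arrow and dplyr directly. To stay offline, this example writes a toy parquet file to a temporary path rather than reading from S3. The column names `EIN`, `STATE`, and `SUBSECTOR` are hypothetical and do not come from the real BMF schema, and the final `count()` only approximates what `nccs_summary()` might do.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# Toy stand-in for a BMF file; column names are hypothetical.
bmf_toy <- data.frame(
  EIN = c("000000001", "000000002", "000000003", "000000004"),
  STATE = c("MD", "VA", "MD", "MD"),
  SUBSECTOR = c("ART", "EDU", "HEA", "ART")
)
path <- tempfile(fileext = ".parquet")
write_parquet(bmf_toy, path)

# open_dataset() reads only metadata; no rows are loaded yet.
ds <- open_dataset(path)

# filter() and select() build a lazy query. When it runs, Arrow applies the
# row predicate and the column projection during the Parquet scan.
md_query <- ds %>%
  filter(STATE == "MD") %>%
  select(EIN, SUBSECTOR)

# collect() executes the scan and returns an in-memory tibble.
md_orgs <- collect(md_query)

# Grouped count on the collected data (one row per SUBSECTOR, column `n`).
md_counts <- count(md_orgs, SUBSECTOR)
```

Against the real bucket, the temporary path would be replaced by an S3 URI following the Data Source pattern; the dplyr verbs stay the same, and only the filtered columns and rows are materialized by `collect()`.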

### Package Data

- `data/bmf_dictionary.rda` — 97-row tibble with column_name, description, type for all BMF columns.
- `data-raw/bmf_data_dictionary.csv` — Source CSV from S3.
- `data-raw/data_generation.R` — Script to regenerate `bmf_dictionary.rda` from CSV.
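A minimal sketch of regenerating the dataset from the CSV, assuming the CSV contains the three documented columns. `regenerate_bmf_dictionary()` is hypothetical: the actual `data-raw/data_generation.R` may call `usethis::use_data()` instead of `save()`, and may convert the result to a tibble, whereas this sketch stores a plain data frame.

```r
# Hypothetical sketch of data-raw/data_generation.R (the real script may differ).
regenerate_bmf_dictionary <- function(csv_path, out_dir = "data") {
  bmf_dictionary <- utils::read.csv(csv_path, stringsAsFactors = FALSE)
  # Keep only the documented columns, in a fixed order.
  bmf_dictionary <- bmf_dictionary[, c("column_name", "description", "type")]
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  out_path <- file.path(out_dir, "bmf_dictionary.rda")
  # save() stores the object under its own name, so load() recreates
  # an object called `bmf_dictionary`.
  save(bmf_dictionary, file = out_path, compress = "bzip2")
  invisible(out_path)
}
```

Run from the package root, a call such as `regenerate_bmf_dictionary("data-raw/bmf_data_dictionary.csv")` would overwrite `data/bmf_dictionary.rda`.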

### Structure

- `R/` — Function files (nccs_read.R, nccs_summary.R, nccs_catalog.R, nccs_dictionary.R)
- `man/` — Generated by roxygen2
- `tests/testthat/` — Tests for each function
- `vignettes/getting-started.Rmd` — Introductory vignette (eval=FALSE to avoid S3 calls during build)
- `_pkgdown.yml` — Site config with function references
- `hex/` — Branding assets
- `.github/` — CI workflows

### Dependencies

- **Imports**: `arrow`, `dplyr`, `utils`
- **Suggests**: `knitr`, `rmarkdown`, `testthat`

## Testing

Tests use testthat edition 3. Network-dependent integration tests use `skip_on_cran()` and `skip_if_offline()`.

- `test-nccs_dictionary.R` — Dictionary dataset structure and filtering (offline)
- `test-nccs_catalog.R` — Valid filter values and error handling (offline)
- `test-nccs_summary.R` — Count summaries, grouping, CSV output (offline)
- `test-nccs_read.R` — S3 path construction, input validation (offline), integration reads (network)
44 changes: 21 additions & 23 deletions DESCRIPTION
@@ -1,34 +1,32 @@
Package: nccsdata
Type: Package
Title: What the Package Does (Title Case)
Version: 0.1.0
Author: Who wrote it
Maintainer: The package maintainer <yourself@somewhere.net>
Description: More about what it does (maybe more than one line)
    Use four spaces when indenting paragraphs within the Description.
Title: Access and Analyze NCCS Nonprofit Data
Version: 2.0.0
Author: Jesse Lecy [aut],
    Thiyaghessan Poongundranar [aut, cre],
    Urban Institute [cph, fnd]
Maintainer: Thiyaghessan Poongundranar <tpoongundranar@urban.org>
Description: Download, filter, and analyze nonprofit organization data from
    the National Center for Charitable Statistics (NCCS). Reads IRS Business
    Master File (BMF) data stored as parquet files in a public S3 bucket,
    with support for predicate-pushdown filtering by state, county, NTEE
    subsector, and exempt organization type.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Depends:
    R (>= 3.50)
Imports:
    bit64,
Depends:
    R (>= 4.0)
Imports:
    arrow,
    dplyr,
    stringr,
    data.table,
    purrr,
    rlang,
    utils,
    RCurl,
    jsonlite,
    stats,
    utils
Suggests:
    curl,
    knitr,
    curl
RoxygenNote: 7.2.3
URL: https://github.com/UrbanInstitute/nccsdata, https://urbaninstitute.github.io/nccsdata/
BugReports: https://github.com/UrbanInstitute/nccsdata/issues
Suggests:
    rmarkdown,
    testthat (>= 3.0.0)
RoxygenNote: 7.3.3
URL: https://github.com/UrbanInstitute/nccsdata, https://urbaninstitute.github.io/nccsdata/
BugReports: https://github.com/UrbanInstitute/nccsdata/issues
Config/testthat/edition: 3
VignetteBuilder: knitr
4 changes: 2 additions & 2 deletions LICENSE
@@ -1,2 +1,2 @@
YEAR: 2023
COPYRIGHT HOLDER: nccsdata authors
YEAR: 2023
COPYRIGHT HOLDER: nccsdata authors