Add read_waterdata_nearest_continuous helper #881
Draft
thodson-usgs wants to merge 7 commits into DOI-USGS:develop from feat/nearest-continuous
Conversation
Every OGC `read_waterdata_*` function (continuous, daily, field_measurements, monitoring_location, ts_meta, latest_continuous, latest_daily, channel) now accepts `filter` and `filter_lang` arguments that are forwarded as the OGC `filter` / `filter-lang` query parameters. The R argument `filter_lang` is translated to the hyphenated `filter-lang` URL parameter that the service expects.

When a filter is a top-level OR chain that exceeds a conservative URI-length budget (5 KB), the library transparently splits it into multiple sub-requests and concatenates (and deduplicates) the results. This keeps the common multi-interval use case out of the caller's way: callers don't need to know about the server's 414 boundary.

Mirrors dataretrieval-python PR DOI-USGS#238.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rame handling

Addresses feedback on the companion Python PR (DOI-USGS/dataretrieval-python#238):

- Skip chunking when `filter_lang` is not `cql-text`. The splitter is text- and single-quote-aware and would corrupt cql-json. Non-cql-text filters are now forwarded as-is.
- Budget each chunk against the server's URL byte limit (`.WATERDATA_URL_BYTE_LIMIT = 8000`, matching the observed HTTP 414 cliff of ~8,200 bytes) rather than a fixed raw filter length. `effective_filter_budget` probes the non-filter URL, subtracts, and converts back to raw CQL bytes using the max per-clause encoding ratio, with the " OR " joiner included: in R's percent-encoding the joiner inflates 2x, heavier than typical clause ratios, and the previous clause-only max let chunks overflow the URL cap.
- When the non-filter URL already exceeds the byte limit, return a budget larger than the filter so it passes through unchanged; one clear 414 is better feedback than N failing sub-requests.
- Move filter chunking out of the recursive `get_ogc_data` path and into the post-transform branch, so the probe sees the real request args. Collect raw frames, drop empty ones before `rbind` (a plain empty frame first would downgrade a later sf result and drop geometry/CRS), and dedup on the pre-rename feature `id`.
- Add regression tests for the doubled single-quote CQL escape, the URL byte budget guarantee, and non-cql-text pass-through.
- Document CQL filter usage with two examples on `read_waterdata_continuous`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
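The chunk-splitting idea described above can be sketched in base R. This is a hypothetical illustration, not the package's internals: `split_top_level_or` and `chunk_clauses` are made-up names, and the real budget is computed against percent-encoded URL bytes rather than the raw-byte check shown here.

```r
# Hypothetical sketch: split a cql-text filter on top-level " OR ",
# respecting single-quoted literals (doubled '' escapes toggle twice,
# so they are handled naturally), then greedily pack clauses into
# chunks whose raw byte size stays under a budget.
split_top_level_or <- function(filter) {
  chars <- strsplit(filter, "")[[1]]
  in_quote <- FALSE
  clauses <- character()
  start <- 1
  i <- 1
  n <- length(chars)
  while (i <= n) {
    ch <- chars[i]
    if (ch == "'") in_quote <- !in_quote
    if (!in_quote && i + 3 <= n &&
        paste(chars[i:(i + 3)], collapse = "") == " OR ") {
      clauses <- c(clauses, paste(chars[start:(i - 1)], collapse = ""))
      i <- i + 4
      start <- i
      next
    }
    i <- i + 1
  }
  c(clauses, paste(chars[start:n], collapse = ""))
}

chunk_clauses <- function(clauses, budget) {
  chunks <- list()
  current <- character()
  for (cl in clauses) {
    candidate <- paste(c(current, cl), collapse = " OR ")
    if (length(current) > 0 && nchar(candidate, type = "bytes") > budget) {
      chunks[[length(chunks) + 1]] <- paste(current, collapse = " OR ")
      current <- cl
    } else {
      current <- c(current, cl)
    }
  }
  chunks[[length(chunks) + 1]] <- paste(current, collapse = " OR ")
  unlist(chunks)
}
```

Note that a quote-naive splitter would break the third clause below at the `OR` inside the string literal, which is exactly the corruption the commit's quote-aware scanner avoids.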
Mirrors the helper organization in the merged Python PR (DOI-USGS/dataretrieval-python#238) so the per-language implementations stay easy to read alongside each other.

The single-vs-fanned distinction is now expressed once, in `plan_filter_chunks`, which always returns a list of "chunk overrides": `list(NULL)` for "send `args` as-is", or a list of chunked cql-text expressions otherwise. `fetch_chunks` issues one request per entry and returns the per-chunk frames plus the first sub-request (for the `request` attribute). `combine_chunk_frames` handles the empty-frame and dedup-by-`id` cases. `get_ogc_data` is now a linear pipeline:

```r
chunks <- plan_filter_chunks(args)
fetched <- fetch_chunks(args, chunks)
return_list <- combine_chunk_frames(fetched$frames)
req <- fetched$req
# ... post-processing ...
```

Behavior unchanged: same chunk sizing (URL-byte-budget aware), same cql-text-only guard, same empty-frame and id-dedup handling. The only observable difference is that the `request` attribute now points at the first sub-request instead of the last (matching Python's choice of representative metadata), which is a debugging-only change for the chunked path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
For each target timestamp, returns the single continuous observation
closest to that timestamp, fetched in one HTTP round-trip (auto-chunked
when the underlying CQL filter gets long).
Why: the Water Data API's `time=` parameter treats a single instant as
an *exact match*, not a nearest-match -- `time=2023-06-15T10:30:31Z` on
a 15-minute gauge returns 0 rows. The advertised `sortby` parameter
would make "nearest" expressible as
`filter=time <= 'target' & sortby=-time & limit=1`, but `sortby` is
per-query, so N targets would mean N HTTP round-trips. There is no
`T_NEAREST` CQL function either.
The narrow-window + client-side reduction implemented here is the one
pattern that folds N targets into a single request today, made
possible by the CQL filter passthrough + auto-chunking on the preceding
filter PR.
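The OR-of-AND filter this pattern produces can be sketched as follows. `build_nearest_filter` is a hypothetical name for illustration; the real construction lives inside `read_waterdata_nearest_continuous` and the timestamp format shown is assumed, not confirmed from the source.

```r
# Hypothetical sketch: one bracketed AND clause per target, joined by
# OR, with the half-window applied on both sides. UTC formatting is
# an assumption for the illustration.
build_nearest_filter <- function(targets, window_secs = 450) {
  fmt <- function(t) format(t, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
  clauses <- character(length(targets))
  for (i in seq_along(targets)) {
    t <- targets[i]  # single-bracket indexing keeps the POSIXct class
    clauses[i] <- sprintf("(time >= '%s' AND time <= '%s')",
                          fmt(t - window_secs), fmt(t + window_secs))
  }
  paste(clauses, collapse = " OR ")
}
```

With three targets this yields the three-clause filter the PR description shows going out in a single request; past the URL byte budget, the auto-chunking from the preceding filter PR splits it transparently.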
Knobs:
- `window` (default 450s, i.e. 7.5 min, half of the 15-min continuous
cadence) -- accepts numeric seconds, a difftime, a lubridate
Period/Duration, or a string coercible to one.
- `on_tie` in {"first", "last", "mean"} controls behavior when a target
sits exactly at the midpoint between two observations.
Passing `time`, `filter`, or `filter_lang` raises an error -- this
function builds those itself.
Mirrors dataretrieval-python PR DOI-USGS#239, renamed from `get_nearest_continuous`
to `read_waterdata_nearest_continuous` to match R package conventions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The primary way to specify `window` is now an `"HH:MM:SS"` string:
window = "00:07:30" # default (7.5 min, half of 15-min cadence)
window = "00:15:00"
window = "00:30:00"
window = "01:00:00"
Reads more cleanly than raw seconds (`450`) or a loose time-unit string
(`"7.5 mins"`) when comparing windows at a glance. Programmatic callers
can still pass a number of seconds, a `difftime`, or a
`lubridate::Period`/`Duration` -- the fuzzy `"7.5 mins"` /
`lubridate::duration` string path is dropped in favor of the unambiguous
`HH:MM:SS` form.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Default is now `"07:30"` (MM:SS) instead of `"00:07:30"` -- reads
cleanly at a glance for the common sub-hour case and matches how
people write cadence offsets for 15-minute gauges.
- The parser now accepts:
* MM:SS / HH:MM:SS clock-style strings (new MM:SS form for brevity),
* ISO 8601 duration strings (`"PT7M30S"`, `"PT15M"`, `"PT1H"`, ...)
or any other string `lubridate::duration()` parses,
* numeric seconds, `difftime`, `lubridate::Period`/`Duration`
(unchanged).
- Error message and tests updated accordingly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All non-ISO string forms (MM:SS, HH:MM:SS, natural-language via lubridate) still parse; only the declared default changes. Picks the unambiguous, internationally-standard form for what shows up in the function signature and the generated help page. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
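The accepted `window` shapes can be illustrated with a base-R normalizer. This is a hypothetical sketch: the package goes through `lubridate` for the natural-language path, and `parse_window` is a made-up name covering only the shapes the commits above describe.

```r
# Hypothetical sketch: normalize a window spec to seconds.
# Covers numeric seconds, difftime, MM:SS / HH:MM:SS clock strings,
# and ISO 8601 durations of the PT#H#M#S form.
parse_window <- function(window) {
  if (is.numeric(window)) return(as.numeric(window))
  if (inherits(window, "difftime")) {
    return(as.numeric(window, units = "secs"))
  }
  if (is.character(window) && length(window) == 1) {
    # Clock-style: MM:SS or HH:MM:SS, fractional seconds allowed
    if (grepl("^[0-9]+(:[0-9]+){1,2}(\\.[0-9]+)?$", window)) {
      parts <- as.numeric(strsplit(window, ":", fixed = TRUE)[[1]])
      mult <- rev(60^(seq_along(parts) - 1))  # ..., 3600, 60, 1
      return(sum(parts * mult))
    }
    # ISO 8601 duration: PT#H#M#S (each part optional)
    m <- regmatches(window, regexec(
      "^PT(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+(?:\\.[0-9]+)?)S)?$",
      window, perl = TRUE))[[1]]
    if (length(m)) {
      nums <- suppressWarnings(as.numeric(m[2:4]))
      nums[is.na(nums)] <- 0
      return(nums[1] * 3600 + nums[2] * 60 + nums[3])
    }
  }
  stop("window must be numeric seconds, a difftime, ",
       "'MM:SS'/'HH:MM:SS', or an ISO 8601 duration")
}
```

Under this sketch, `"07:30"`, `"00:07:30"`, `"PT7M30S"`, and `450` all normalize to the same 450-second half-window, which is why the declared default could move between forms without changing behavior.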
Summary
Adds `read_waterdata_nearest_continuous(targets, ...)`: for each target timestamp, returns the single continuous observation closest to that timestamp, fetched in one HTTP round-trip (auto-chunked when the CQL filter gets long).

Try it
Copy-paste into an R session — installs this branch and runs one end-to-end call:
One HTTP request goes out, carrying a three-clause `(time >= t-window AND time <= t+window) OR ...` CQL filter; three rows come back, one per target. Each `time` is the nearest observation on the 15-minute grid; `target_time` identifies which target the row corresponds to.

(No `API_USGS_PAT` needed to run the snippet: the Water Data API serves unauthenticated requests at a lower rate limit. Set it if you're iterating.)

Tie-mode and wider-window variations:
Why
The Water Data API's `time=` parameter treats a single instant as an exact match, not a nearest-match: `time = "2023-06-15T10:30:31Z"` on a 15-minute gauge returns 0 rows. The advertised `sortby` parameter would make "nearest" expressible as `filter = "time <= 'target'"` + `sortby = -time` + `limit = 1`, but `sortby` is per-query, so N targets would mean N HTTP round-trips. There is no `T_NEAREST` CQL function either.

The narrow-window + client-side reduction implemented here is the one pattern that folds N targets into a single request today.
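The client-side reduction can be sketched as follows. `reduce_nearest` is a hypothetical stand-in with illustrative column names (`time`, `value`), not the helper's actual internals.

```r
# Hypothetical sketch: given the observations that came back for one
# target, keep the row nearest to the target, resolving exact-midpoint
# ties per `on_tie`. Column names are illustrative assumptions.
reduce_nearest <- function(obs, target,
                           on_tie = c("first", "last", "mean")) {
  on_tie <- match.arg(on_tie)
  if (nrow(obs) == 0) return(obs)  # empty-window target is dropped
  d <- abs(as.numeric(difftime(obs$time, target, units = "secs")))
  nearest <- obs[d == min(d), , drop = FALSE]
  if (nrow(nearest) == 1) return(nearest)
  switch(on_tie,
    first = nearest[1, , drop = FALSE],
    last  = nearest[nrow(nearest), , drop = FALSE],
    mean  = {
      out <- nearest[1, , drop = FALSE]
      out$value <- mean(nearest$value)  # average the numeric column
      out$time <- target                # set time to the target itself
      out
    })
}
```

The `mean` branch matches the documented tie behavior: numeric columns are averaged and `time` is set to the target, since neither tied observation is more representative than the other.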
Knobs
- `window = "PT7M30S"`: half-window around each target (7.5 minutes, ISO 8601; half of the 15-minute continuous cadence, so most windows contain exactly one observation). Accepts:
  - ISO 8601 duration strings (`"PT7M30S"`, `"PT15M"`, `"PT1H"`, ...) or any other string `lubridate::duration()` parses (e.g. `"7 minutes 30 seconds"`)
  - `"MM:SS"` or `"HH:MM:SS"` clock-style strings (e.g. `"07:30"`, `"15:00"`, `"00:30:00"`, `"01:00:00"`)
  - numeric seconds, a `difftime`, or a `lubridate::Period`/`Duration`
- `on_tie = "first"`: how to resolve ties when a target falls at the midpoint between two grid points (rare but possible). Alternatives: `"last"` (keep the later observation), `"mean"` (average numeric columns; set `time` to the target).

Multi-site calls return one row per `(target, monitoring_location_id)` pair. Targets with no observations in their window are silently dropped. Passing `time`, `filter`, or `filter_lang` raises an error; the helper builds those itself.

Naming
Renamed from the Python `get_nearest_continuous` to `read_waterdata_nearest_continuous` to match the R package's convention (`read_waterdata_*` for OGC-backed functions).

Relationship to #880
This PR is built on top of #880 (Add CQL filter passthrough to OGC waterdata functions) and will look lighter once that lands. The helper's core trick, fanning N targets into one request, is only possible because #880 adds `filter`/`filter_lang` support plus automatic URL-length-safe chunking to `read_waterdata_continuous`. The branch `feat/nearest-continuous` is stacked on `feat/cql-filter-passthrough`, so until #880 merges the diff here will include both changesets; after #880 merges, the commits on its branch become common ancestors and this PR's diff reduces to the one commit introducing `read_waterdata_nearest_continuous` and its tests. Please merge #880 first.
Test plan
- Unit tests with `with_mocked_bindings`: 44/44 pass. Covers filter construction (one bracketed AND clause per target, joined by OR), nearest-observation reduction, all three `on_tie` modes (first/last/mean), missing-window drop, multi-site fan-out, empty targets, forbidden-kwarg validation, and `window` input shapes (ISO 8601 like `"PT7M30S"`/`"PT15M"`/`"PT1H"`, natural-language strings like `"7 minutes 30 seconds"`, `"MM:SS"` and `"HH:MM:SS"` including fractional seconds, numeric seconds, `difftime`, `lubridate::Period`).
- `R CMD check`: 0 errors, 0 warnings, 3 unrelated NOTEs.
- Live call against `USGS-0223850000060` with three off-grid targets (output shown in the Try it section above). One HTTP request, three rows returned, `time` snapped to the 15-minute grid, `target_time` preserved as `POSIXct`.

Marked as draft pending maintainer review.
🤖 Generated with Claude Code