Add tidy-r skill for modern tidyverse R development #43
statzhero wants to merge 7 commits into posit-dev:main
gadenbuie left a comment
Thanks @statzhero! I have a few comments, but the underlying theme is that the most important decision in designing a skill is the context in which it will be invoked; the skill should then be organized around that use case. Currently this skill lands somewhere between a reference guide and a workflow skill, but I think a reference guide is likely the better fit.
It would also be useful if you could describe how you prepared the skill, what materials you referenced, and which versions of the related packages you covered.
| description: |
| Modern tidyverse patterns, style guide, and migration guidance for R development. Use when writing, reviewing, or refactoring tidyverse code. Covers native pipe, join_by(), .by grouping, pick/across/reframe, filter_out/when_any/when_all, recode_values/replace_values/replace_when, tidy selection, stringr, naming conventions, and migration from base R or older tidyverse APIs.
The primary goal of the description field is to help the model decide when to load the skill. The model decides whether or when to load the skill based only on this description, so it's important to be as clear as possible about the scenarios in which the skill should be proactively loaded.
My sense, without having tested the skill, is that this description lists the topics the resource covers rather than giving guidance that ensures the skill is reliably loaded under the right circumstances.
| r_version: "4.5+"
| tidyverse_version: "2.0+"
| dplyr_version: "1.2+"
| allowed-tools: Read, Edit, Write, Grep, Glob, Bash, mcp__r-btw__*
I don't think we need to include allowed-tools here, or at least I wouldn't recommend it unless you have a very specific reason why it's required.
| metadata:
|   r_version: "4.5+"
|   tidyverse_version: "2.0+"
|   dplyr_version: "1.2+"
I'm not sure how or where this metadata is shown to the model, so it might be better to simply skip this section. If you do keep it, I'd prefer to reuse the version syntax from R's DESCRIPTION files.
Suggested change:
  metadata:
-   r_version: "4.5+"
-   tidyverse_version: "2.0+"
-   dplyr_version: "1.2+"
+   r_version: ">=4.5.0"
+   tidyverse_version: ">=2.0.0"
+   dplyr_version: ">=1.2.0"
| # Writing Modern Tidyverse R
|
| This skill covers modern tidyverse patterns for R 4.5+ and tidyverse 2.0+, style guidelines, and migration from legacy patterns.
|
| ## Core philosophy
|
| R's tidyverse evolves. Code from blog posts and Stack Overflow often uses deprecated APIs, magrittr pipes, or base R patterns where a modern tidyverse function exists. This skill encodes the current recommended approach so the model writes code that experienced R developers would recognize as idiomatic.
|
| ## When to use this skill
|
| - Writing new R code with dplyr, tidyr, stringr, purrr, or other tidyverse packages
| - Reviewing or refactoring existing R code for modern patterns
| - Migrating from base R, magrittr pipes, or older tidyverse APIs
| - Applying tidyverse style conventions (naming, spacing, error handling)
| - Choosing between similar functions (e.g., `case_when` vs `recode_values`)
| - Working with joins, grouping, recoding, or string manipulation in R
|
| ## When NOT to use this skill
|
| - Writing data.table code (different paradigm)
| - Pure base R projects that intentionally avoid tidyverse
| - Shiny UI/server logic (use a Shiny-specific skill)
| - Package development internals (NAMESPACE, DESCRIPTION, roxygen)
| - ggplot2 visualization (use the socviz skill)
| - Statistical modeling or Bayesian analysis
The model doesn't see any of this advice until after it has already fully loaded the skill. This is the kind of information you need to compress into the ~100 allowed tokens of the description. Once it has moved into the description, most of this can be dropped for brevity, on the assumption that if the model is seeing this text, it has already decided to load the skill.
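For illustration, here is a minimal sketch of one rewrite the "Core philosophy" paragraph describes: an older join written with the magrittr pipe and a named character vector for `by`, next to the same join written with the native pipe and `join_by()`. It assumes R >= 4.1 (native pipe) and dplyr >= 1.1.0 (`join_by()`); the `orders` and `customers` tables are hypothetical. dplyr re-exports `%>%`, so magrittr does not need to be loaded separately.

```r
library(dplyr)

# Hypothetical lookup data for illustration only
orders <- tibble(order_id = 1:4, cust = c("a", "b", "a", "c"))
customers <- tibble(customer_id = c("a", "b", "c"), region = c("N", "S", "N"))

# Older idiom: magrittr pipe and a named character vector for `by`
legacy <- orders %>%
  left_join(customers, by = c("cust" = "customer_id"))

# Current idiom: native pipe (R >= 4.1) and join_by() (dplyr >= 1.1.0)
modern <- orders |>
  left_join(customers, by = join_by(cust == customer_id))

print(modern)
```

Both calls return the same tibble. The `join_by()` form also accepts inequality and rolling conditions (for example `join_by(date >= start)` or `closest()`), which a character vector cannot express.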
| When you receive a request, classify it and consult the appropriate reference:
|
| ### Step 1: Classify the request
This section feels a little too prescriptive. It reads as if the skill is trying to define a workflow, but it's more of a reference skill than a workflow skill. In general, I think you can just lay out the references and best practices; the model will find its own way through the files.
| | **Joins** | [join-examples.md](references/join-examples.md) | Merging data, `*_join`, `join_by`, matching rows, lookup tables |
| | **Grouping & columns** | [grouping-examples.md](references/grouping-examples.md) | `.by`, `group_by`, `across`, `pick`, `reframe`, column operations |
| | **Recoding & replacing** | [recode-replace-examples.md](references/recode-replace-examples.md) | `case_when`, `recode_values`, `replace_values`, `replace_when`, `filter_out`, `when_any`, `when_all`, recoding, replacing, conditional updates |
| | **Strings** | [stringr-examples.md](references/stringr-examples.md) | String manipulation, regex, `str_*` functions, text processing |
| | **Style** | [tidyverse-style.md](references/tidyverse-style.md) | Naming, formatting, spacing, error messages, `cli::cli_abort` |
| | **Migration** | [migration-examples.md](references/migration-examples.md) | Updating old code, base R conversion, deprecated functions |
I'd recommend dropping the -examples suffix from all of the file names.
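The "Grouping & columns" row above names `.by`, `across()`, `pick()`, and `reframe()`. A minimal sketch of each, assuming dplyr >= 1.1.0 (the release that introduced `.by`, `pick()`, and `reframe()`); the `scores` data is hypothetical.

```r
library(dplyr)

# Hypothetical data for illustration only
scores <- tibble(
  team = c("x", "x", "y", "y", "y"),
  a = c(1, 3, 2, 4, 6),
  b = c(10, 30, 20, 40, 60)
)

# across(): apply one function to several columns; .by groups for this call only
means <- scores |>
  summarise(across(c(a, b), mean), .by = team)

# reframe(): like summarise(), but each group may return any number of rows
ranges <- scores |>
  reframe(stat = c("min", "max"), a = range(a), .by = team)

# pick(): select columns as a data frame inside a data-masking verb
totals <- scores |>
  mutate(total = rowSums(pick(a, b)))
```

`reframe()` is used for the min/max rows because returning more than one row per group from `summarise()` was deprecated in dplyr 1.1.0.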
| ### Step 2: Read the reference file(s)
|
| Use the Read tool to load the relevant reference. For requests that span multiple categories (e.g., "rewrite this old code" touches migration + style), read multiple files.
This is another example of what I mean by "overly prescriptive". I don't think you need to tell the model to use the Read tool; that's baseline model behavior at this point.
| 1. **Use modern tidyverse patterns** - Prioritize dplyr 1.2+ features, native pipe, and current APIs
| 2. **Write readable code first** - Optimize only when necessary
| 3. **Follow tidyverse style guide** - Consistent naming, spacing, and structure
| 4. **Use R MCP tools** - Automatically resolve function documentation and library references without being asked. If the `mcp__r-btw__*` tools are unavailable, fall back to running R help via Bash (see below)
I think mentioning btw via MCP (or the new btw CLI tool) is fine, but we shouldn't assume it's available or that it will be registered with a name that matches Claude Code's conventions.
I wasn't aware of the new btw CLI (thanks for pointing it out).
| | `filter(x != val \| is.na(x))` | `filter_out(x == val)` |
| | `coalesce(x, default)` | `replace_values(x, NA ~ default)` |
| | `na_if(x, val)` | `replace_values(x, val ~ NA)` |
| | `qs::qsave()` / `qs::qread()` | `qs2::qs_save()` / `qs2::qs_read()` |
The qs2 and tidylog recommendations give me pause. They're very valid choices, but they may be over-indexing on your personal preferences.
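The first row of the migration table above turns on how `filter()` treats missing values: it keeps only rows where the condition is `TRUE`, so rows where `x` is `NA` are silently dropped unless an `is.na()` clause is added. A minimal sketch with a hypothetical `df`; the `filter_out()` line follows the table's mapping and is guarded because it assumes dplyr >= 1.2.0.

```r
library(dplyr)

# Hypothetical column with a missing value
df <- tibble(x = c("keep", "drop", NA))

# filter() keeps only rows where the condition is TRUE; NA rows are dropped
naive <- df |> filter(x != "drop")             # loses the NA row
legacy <- df |> filter(x != "drop" | is.na(x)) # keeps the NA row explicitly

# The table's replacement; filter_out() assumes dplyr >= 1.2.0
if (packageVersion("dplyr") >= "1.2.0") {
  modern <- df |> filter_out(x == "drop")
}
```

Per the table, `filter_out()` names the rows to remove and keeps rows where the condition is `NA`, which is usually what the double-negative `filter()` call was trying to express.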
| ## Complete workflow example
I think you could likely drop this section without losing any performance benefits.
Thank you for the thoughtful comments. Let me address this first:
A fair point. As I understand it, @gadenbuie, you prefer a slimmer skill in the form of a reference guide? On the sources: as I said elsewhere, I regret not keeping better records, because this was built over many iterations. The ancestor of this skill is most likely Sarah Johnson's Claude R Tidyverse Expert gist. Many references are distilled from the tidyverse style guide (chapters 1-4 and 9), the dplyr 1.2.0 release post, and the dplyr reference documentation. Some idioms were added manually, e.g., "-
Yes, I think I prefer a reference-guide-style skill, which doesn't necessarily mean "slimmer". More than anything, it means structuring the content to be used as a reference guide and giving the model appropriate instructions to navigate and use the reference material as well as possible.
The latest commits are a major update to the PR. I also ran a code audit/review with the corresponding tools in Claude Code and Codex.
Purpose
This skill teaches Claude current tidyverse idioms. R syntax patterns have changed substantially since 2023, and Claude's defaults often lag behind. For example, it still reaches for `group_by() |> summarise() |> ungroup()` instead of `.by`, uses the deprecated `recode()`, and so forth. This skill corrects those habits. (NB: This supersedes #11, so it is a follow-up, even if unsolicited.)
Use cases
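A minimal sketch of the correction described under Purpose, using hypothetical `sales` data and assuming dplyr >= 1.1.0 for `.by`. Here `case_when()` stands in for the flagged `recode()` call because it is available in older dplyr releases; the skill also covers the newer `recode_values()`, which is not shown.

```r
library(dplyr)

# Hypothetical data for illustration only
sales <- tibble(
  region = c("s", "n", "s", "n"),
  amount = c(10, 20, 30, 40)
)

# Older habit: group, summarise, then remember to ungroup
legacy <- sales |>
  group_by(region) |>
  summarise(total = sum(amount)) |>
  ungroup()

# Per-call grouping with .by: the result is never left grouped
modern <- sales |>
  summarise(total = sum(amount), .by = region)

# Explicit conditional relabelling in place of recode()
labelled <- sales |>
  mutate(region = case_when(
    region == "n" ~ "North",
    region == "s" ~ "South"
  ))
```

Note the row order: `group_by()` sorts the groups, whereas `.by` keeps them in order of first appearance, so the two summaries hold the same totals in a different order.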
Testing
I installed the skill locally and verified that it activates on tidyverse-related prompts. Claude generally invokes the skill on its own when writing R code, though I would recommend adding an explicit pointer to it in CLAUDE.md.
Dependencies
None. This is a pure Markdown skill that requires no external tools or packages. Some examples suggest other packages (e.g., tidylog), but these are optional.
Documentation
The main SKILL.md covers core rules and style conventions. Six reference files provide worked examples for grouping, joins, migration, recode/replace, stringr, and style. They are loaded only when the topic comes up.
Token count
tidy-r (1,178 lines, 8,531 tokens)
Skill description: 84 tokens