Add tidy-r skill for modern tidyverse R development#43

Open
statzhero wants to merge 7 commits into posit-dev:main from statzhero:feat/tidy-r

@statzhero

Purpose

This skill teaches Claude current tidyverse idioms. R syntax patterns have changed substantially since 2023, and Claude's defaults often lag behind. For example, it still reaches for group_by() |> summarise() |> ungroup() instead of .by, uses the deprecated recode(), and so forth. This skill corrects those habits. (NB: Supersedes #11. So this is a follow-up, even if unsolicited.)
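To make the first habit concrete, here is a minimal sketch comparing the two grouping styles. The `sales` data frame and its columns are made up for illustration; the `.by` argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical data, for illustration only
sales <- tibble(
  region = c("north", "north", "south", "south"),
  amount = c(10, 20, 5, 15)
)

# Older habit: persistent grouping, followed by an explicit ungroup()
old_style <- sales |>
  group_by(region) |>
  summarise(total = sum(amount)) |>
  ungroup()

# Per-operation grouping with .by: the result is never grouped
new_style <- sales |>
  summarise(total = sum(amount), .by = region)
```

Both calls return one row per region with the same totals. One difference to keep in mind: `group_by()` sorts the output by the grouping key, whereas `.by` keeps groups in order of first appearance.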

Use cases

  • Writing or reviewing R code that uses dplyr, tidyr, stringr, or purrr.
  • Migrating older tidyverse code to current patterns.
  • Enforcing consistent style across a project.

Testing

I installed the skill locally and verified that it activates on tidyverse-related prompts. Claude generally invokes the skill on its own when writing R code, though I would recommend putting an explicit pointer in CLAUDE.md.

Dependencies

None. This is a pure markdown skill with no external tools or packages required. Some examples suggest other packages (e.g., tidylog), but these are optional.

Documentation

The main SKILL.md covers core rules and style conventions. Six reference files provide worked examples for grouping, joins, migration, recode/replace, stringr, and style. They are loaded only when the topic comes up.

Token count

tidy-r (1,178 lines, 8,531 tokens)

Skill description: 84 tokens

| File | Lines | Tokens |
| --- | --- | --- |
| SKILL.md | 210 | 2,245 |
| references/grouping-examples.md | 175 | 848 |
| references/join-examples.md | 123 | 781 |
| references/migration-examples.md | 165 | 1,365 |
| references/recode-replace-examples.md | 188 | 1,283 |
| references/stringr-examples.md | 102 | 912 |
| references/tidyverse-style.md | 215 | 1,097 |

Collaborator

@gadenbuie gadenbuie left a comment

Thanks @statzhero! I have a few comments, but the underlying theme is that the most important decision to make in designing a skill is around the context in which the skill will be invoked and to then organize the skill around that use-case. Currently this skill seems to land somewhere between reference guide and a workflow skill, but I think the reference guide is likely a better fit.

It would also be useful if you could describe how you prepared the skill, what materials you referenced, what versions of the related packages you covered, etc.

Comment thread tidyverse/tidy-r/SKILL.md Outdated
Comment on lines +3 to +4
description: |
  Modern tidyverse patterns, style guide, and migration guidance for R development. Use when writing, reviewing, or refactoring tidyverse code. Covers native pipe, join_by(), .by grouping, pick/across/reframe, filter_out/when_any/when_all, recode_values/replace_values/replace_when, tidy selection, stringr, naming conventions, and migration from base R or older tidyverse APIs.
Collaborator

The primary goal of this description field is to help the model make a decision about when to load the skill. The model will decide whether or when to load the skill based only on this description, so it's important to be as clear as possible about the scenarios in which the skill should be proactively loaded.

My sense, without testing the skill, is that this description is more about providing a description of what topics are covered in this resource and less about providing guidance to ensure that the skill is reliably loaded under the right circumstances.

Comment thread tidyverse/tidy-r/SKILL.md Outdated
  r_version: "4.5+"
  tidyverse_version: "2.0+"
  dplyr_version: "1.2+"
allowed-tools: Read, Edit, Write, Grep, Glob, Bash, mcp__r-btw__*
Collaborator

I don't think we need to include allowed-tools here, or at least I wouldn't recommend it unless you have a very specific reason why it's required.

Comment thread tidyverse/tidy-r/SKILL.md Outdated
Comment on lines +5 to +8
metadata:
  r_version: "4.5+"
  tidyverse_version: "2.0+"
  dplyr_version: "1.2+"
Collaborator

I'm not sure how or where this metadata shows up to the model, so it might be better to simply skip this section. If you do keep it, I'd prefer to re-use R's description syntax.

Suggested change
- metadata:
-   r_version: "4.5+"
-   tidyverse_version: "2.0+"
-   dplyr_version: "1.2+"
+ metadata:
+   r_version: ">=4.5.0"
+   tidyverse_version: ">=2.0.0"
+   dplyr_version: ">=1.2.0"
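
The `>=` form mirrors the version constraints used in the `Depends` and `Imports` fields of an R package DESCRIPTION file. As a rough sketch of how such a constraint could be checked in base R (the helper `meets_constraint()` is hypothetical and not part of the skill or of any package):

```r
# Hypothetical helper, for illustration only
meets_constraint <- function(installed, constraint) {
  # Split a constraint such as ">=1.2.0" into operator and version
  op <- sub("^([<>=]+).*$", "\\1", constraint)
  required <- sub("^[<>=]+\\s*", "", constraint)

  # Compare as version objects rather than as strings,
  # so that "1.10.0" is correctly treated as newer than "1.2.0"
  compare <- match.fun(op)
  compare(package_version(installed), package_version(required))
}

ok_dplyr <- meets_constraint("1.2.0", ">=1.2.0")
old_dplyr <- meets_constraint("1.1.4", ">=1.2.0")
```

In practice the `installed` argument would come from something like `as.character(packageVersion("dplyr"))`.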

Comment thread tidyverse/tidy-r/SKILL.md Outdated
Comment on lines +12 to +36
# Writing Modern Tidyverse R

This skill covers modern tidyverse patterns for R 4.5+ and tidyverse 2.0+, style guidelines, and migration from legacy patterns.

## Core philosophy

R's tidyverse evolves. Code from blog posts and StackOverflow often uses deprecated APIs, magrittr pipes, or base R patterns where a modern tidyverse function exists. This skill encodes the current recommended approach so the model writes code that experienced R developers would recognize as idiomatic.

## When to use this skill

- Writing new R code with dplyr, tidyr, stringr, purrr, or other tidyverse packages
- Reviewing or refactoring existing R code for modern patterns
- Migrating from base R, magrittr pipes, or older tidyverse APIs
- Applying tidyverse style conventions (naming, spacing, error handling)
- Choosing between similar functions (e.g., `case_when` vs `recode_values`)
- Working with joins, grouping, recoding, or string manipulation in R

## When NOT to use this skill

- Writing data.table code (different paradigm)
- Pure base R projects that intentionally avoid tidyverse
- Shiny UI/server logic (use a Shiny-specific skill)
- Package development internals (NAMESPACE, DESCRIPTION, roxygen)
- ggplot2 visualization (use the socviz skill)
- Statistical modeling or Bayesian analysis
Collaborator

The model doesn't see any of this advice until after it has already fully loaded the skill. This is the kind of information you need to compress into the ~100 allowed tokens of the description. Once it has moved into the description, most of this can be dropped for brevity, under the assumption that if the model is seeing this text it has already decided to load the skill.

Comment thread tidyverse/tidy-r/SKILL.md Outdated
Comment on lines +40 to +42
When you receive a request, classify it and consult the appropriate reference:

### Step 1: Classify the request
Collaborator

This section feels a little too prescriptive, as if the skill is trying to define a workflow when it's really more of a reference skill than a workflow skill. I think in general you can just lay out the references and best practices; the model will find its own way through the files.

Comment thread tidyverse/tidy-r/SKILL.md Outdated
Comment on lines +46 to +51
| **Joins** | [join-examples.md](references/join-examples.md) | Merging data, `*_join`, `join_by`, matching rows, lookup tables |
| **Grouping & columns** | [grouping-examples.md](references/grouping-examples.md) | `.by`, `group_by`, `across`, `pick`, `reframe`, column operations |
| **Recoding & replacing** | [recode-replace-examples.md](references/recode-replace-examples.md) | `case_when`, `recode_values`, `replace_values`, `replace_when`, `filter_out`, `when_any`, `when_all`, recoding, replacing, conditional updates |
| **Strings** | [stringr-examples.md](references/stringr-examples.md) | String manipulation, regex, `str_*` functions, text processing |
| **Style** | [tidyverse-style.md](references/tidyverse-style.md) | Naming, formatting, spacing, error messages, `cli::cli_abort` |
| **Migration** | [migration-examples.md](references/migration-examples.md) | Updating old code, base R conversion, deprecated functions |
Collaborator

I'd recommend dropping -examples from all of the file names.

Comment thread tidyverse/tidy-r/SKILL.md Outdated

### Step 2: Read the reference file(s)

Use the Read tool to load the relevant reference. For requests that span multiple categories (e.g., "rewrite this old code" touches migration + style), read multiple files.
Collaborator

This is another example of what I mean by "overly prescriptive". I don't think you need to tell the model to use the Read tool; that's baseline model behavior at this point.

Comment thread tidyverse/tidy-r/SKILL.md Outdated
1. **Use modern tidyverse patterns** - Prioritize dplyr 1.2+ features, native pipe, and current APIs
2. **Write readable code first** - Optimize only when necessary
3. **Follow tidyverse style guide** - Consistent naming, spacing, and structure
4. **Use R MCP tools** - Automatically resolve function documentation and library references without being asked. If the `mcp__r-btw__*` tools are unavailable, fall back to running R help via Bash (see below)
Collaborator

I think mentioning btw via MCP (or the new btw CLI tool) is fine, but we shouldn't assume it's available or that it will be registered with a name that matches Claude Code's conventions.

Author

I wasn't aware of the new btw CLI (thanks for pointing it out).

Comment thread tidyverse/tidy-r/SKILL.md Outdated
| `filter(x != val \| is.na(x))` | `filter_out(x == val)` |
| `coalesce(x, default)` | `replace_values(x, NA ~ default)` |
| `na_if(x, val)` | `replace_values(x, val ~ NA)` |
| `qs::qsave()` / `qs::qread()` | `qs2::qs_save()` / `qs2::qs_read()` |
Collaborator

The qs2 and tidylog recommendations give me pause. They're very valid choices, but they may be over-indexing on your personal preferences.

Comment thread tidyverse/tidy-r/SKILL.md Outdated
| `na_if(x, val)` | `replace_values(x, val ~ NA)` |
| `qs::qsave()` / `qs::qread()` | `qs2::qs_save()` / `qs2::qs_read()` |

## Complete workflow example
Collaborator

I think you could likely drop this section without losing any performance benefits.

@statzhero
Author

statzhero commented Apr 1, 2026

Thank you for the thoughtful comments. Let me address this first:

> Currently this skill seems to land somewhere between reference guide and a workflow skill, but I think the reference guide is likely a better fit.

A fair point, I understand it as: @gadenbuie you prefer a slimmer skill in the form of a reference guide?

On the sources: as I said elsewhere, I regret not keeping better records, because this was built over many iterations. The ancestor skill must be Sarah Johnson's Claude R Tidyverse Expert gist. Many references are simply distilled from the tidyverse style guide (chapters 1-4, 9), the dplyr 1.2.0 release post, and the dplyr reference documentation. Some idioms were added manually, e.g., "- set.seed() with date-time, never 42".

@gadenbuie
Collaborator

> A fair point, I understand it as: @gadenbuie you prefer a slimmer skill in the form of a reference guide?

Yes, I think I prefer a reference guide-style skill, which doesn't necessarily mean "slimmer". More than anything, it means structuring the content to be used as a reference guide and giving the model appropriate instructions to navigate and use the reference material as well as possible.

@statzhero
Author

statzhero commented Apr 7, 2026

The latest commits are a major update to the PR. I also did a code audit / review with the respective tools in Claude Code and Codex.

  • The description is now 98 tokens and condenses the verbose section in SKILL.md

  • Removed the anti-patterns table from SKILL.md (all items were duplicated in the quick reference or the reference files)

  • Removed the best practices section from SKILL.md (all items are covered in the reference files)

  • Softened the stance on group_by() (still useful when grouping must persist across multiple operations)

  • Added a tidyselect reference

  • Added the _ placeholder rule for the native pipe

  • Added the stringr 1.6.0 case conversion functions

  • Added a readr section: read.csv() to read_csv(), read_tsv(), read_csv2(), with vroom::vroom() noted for large files

  • Fixed the example so that it actually runs

Would appreciate further input from you, @gadenbuie, or the community.
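
For readers unfamiliar with the `_` placeholder mentioned in the list above, a minimal base R sketch follows. It shows the language's behaviour (R 4.2.0 or later), not the wording of the skill's rule, which is not quoted in this thread; the variable names are illustrative.

```r
# The native pipe passes the left-hand side as the first argument by default
n_rows <- mtcars |> nrow()

# To send it elsewhere, use `_` with a named argument (R >= 4.2.0)
fit <- mtcars |>
  lm(mpg ~ wt, data = _)

# The placeholder must be named: `lm(mpg ~ wt, _)` is a syntax error
slope <- coef(fit)[["wt"]]
```

Here `lm()` needs the data in its `data` argument rather than its first argument, which is the typical case where the placeholder is required.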
