Add tidy-r skill for modern tidyverse R development by statzhero · Pull Request #43 · posit-dev/skills

statzhero · 2026-03-31T17:17:01Z

Purpose

This skill teaches Claude current tidyverse idioms. R syntax patterns have changed substantially since 2023, and Claude's defaults often lag behind. For example, it still reaches for group_by() |> summarise() |> ungroup() instead of .by, uses the deprecated recode(), and so forth. This skill corrects those habits. (NB: Supersedes #11. So this is a follow-up, even if unsolicited.)
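For readers unfamiliar with the first example, here is a minimal sketch of the two forms. It assumes dplyr 1.1.0 or later (where `.by` was introduced) and uses the built-in `mtcars` data; the column choices are illustrative and not taken from the skill.

```r
library(dplyr)

# Older idiom: group, summarise, then explicitly drop the grouping.
old <- mtcars |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg)) |>
  ungroup()

# Per-operation grouping: `.by` applies to this call only and
# returns an ungrouped result, so no ungroup() is needed.
new <- mtcars |>
  summarise(mean_mpg = mean(mpg), .by = cyl)
```

One difference to keep in mind: `group_by()` sorts the result by the grouping keys, while `.by` keeps groups in order of first appearance, so the two results can differ in row order.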

Use cases

Writing or reviewing R code that uses dplyr, tidyr, stringr, or purrr.
Migrating older tidyverse code to current patterns.
Enforcing consistent style across a project.

Testing

I installed the skill locally and verified that it activates on tidyverse-related prompts. Claude generally invokes the skill on its own when writing R code, though I would recommend adding an explicit pointer in CLAUDE.md.

Dependencies

None. This is a pure markdown skill with no external tools or packages required. Some examples suggest other packages (e.g., tidylog), but these are optional.

Documentation

The main SKILL.md covers core rules and style conventions. Six reference files provide worked examples for grouping, joins, migration, recode/replace, stringr, and style. They are loaded only when the topic comes up.

Token count

tidy-r (1,178 lines, 8,531 tokens)

Skill description: 84 tokens

| File | Lines | Tokens |
|---|---|---|
| SKILL.md | 210 | 2,245 |
| references/grouping-examples.md | 175 | 848 |
| references/join-examples.md | 123 | 781 |
| references/migration-examples.md | 165 | 1,365 |
| references/recode-replace-examples.md | 188 | 1,283 |
| references/stringr-examples.md | 102 | 912 |
| references/tidyverse-style.md | 215 | 1,097 |

gadenbuie

Thanks @statzhero! I have a few comments, but the underlying theme is that the most important decision in designing a skill is the context in which the skill will be invoked; the skill should then be organized around that use case. Currently this skill seems to land somewhere between a reference guide and a workflow skill, but I think the reference guide is likely a better fit.

It would also be useful if you could describe how you prepared the skill, what materials you referenced, what versions of the related packages you covered, etc.

gadenbuie · 2026-03-31T20:38:46Z

+description: |
+  Modern tidyverse patterns, style guide, and migration guidance for R development. Use when writing, reviewing, or refactoring tidyverse code. Covers native pipe, join_by(), .by grouping, pick/across/reframe, filter_out/when_any/when_all, recode_values/replace_values/replace_when, tidy selection, stringr, naming conventions, and migration from base R or older tidyverse APIs.


The primary goal of this description field is to help the model make a decision about when to load the skill. The model will decide whether or when to load the skill based only on this description, so it's important to be as clear as possible about the scenarios in which the skill should be proactively loaded.

My sense, without testing the skill, is that this description is more about providing a description of what topics are covered in this resource and less about providing guidance to ensure that the skill is reliably loaded under the right circumstances.

gadenbuie · 2026-03-31T20:39:32Z

+  r_version: "4.5+"
+  tidyverse_version: "2.0+"
+  dplyr_version: "1.2+"
+allowed-tools: Read, Edit, Write, Grep, Glob, Bash, mcp__r-btw__*


I don't think we need to include allowed-tools here, or at least I wouldn't recommend it unless you have a very specific reason why it's required.

gadenbuie · 2026-03-31T20:41:04Z

+metadata:
+  r_version: "4.5+"
+  tidyverse_version: "2.0+"
+  dplyr_version: "1.2+"


I'm not sure how or where this metadata shows up to the model, so it might be better to simply skip this section. If you do keep it, I'd prefer to re-use R's description syntax.

Suggested change

-metadata:
-  r_version: "4.5+"
-  tidyverse_version: "2.0+"
-  dplyr_version: "1.2+"
+metadata:
+  r_version: ">=4.5.0"
+  tidyverse_version: ">=2.0.0"
+  dplyr_version: ">=1.2.0"

gadenbuie · 2026-03-31T20:43:23Z

+# Writing Modern Tidyverse R
+
+This skill covers modern tidyverse patterns for R 4.5+ and tidyverse 2.0+, style guidelines, and migration from legacy patterns.
+
+## Core philosophy
+
+R's tidyverse evolves. Code from blog posts and StackOverflow often uses deprecated APIs, magrittr pipes, or base R patterns where a modern tidyverse function exists. This skill encodes the current recommended approach so the model writes code that experienced R developers would recognize as idiomatic.
+
+## When to use this skill
+
+- Writing new R code with dplyr, tidyr, stringr, purrr, or other tidyverse packages
+- Reviewing or refactoring existing R code for modern patterns
+- Migrating from base R, magrittr pipes, or older tidyverse APIs
+- Applying tidyverse style conventions (naming, spacing, error handling)
+- Choosing between similar functions (e.g., `case_when` vs `recode_values`)
+- Working with joins, grouping, recoding, or string manipulation in R
+
+## When NOT to use this skill
+
+- Writing data.table code (different paradigm)
+- Pure base R projects that intentionally avoid tidyverse
+- Shiny UI/server logic (use a Shiny-specific skill)
+- Package development internals (NAMESPACE, DESCRIPTION, roxygen)
+- ggplot2 visualization (use the socviz skill)
+- Statistical modeling or Bayesian analysis


The model doesn't see any of this advice until after it has already fully loaded the skill. This is the kind of information you need to compress into the ~100 allowed tokens of the description. Once moved into description, most of this can be dropped for conciseness and brevity under the assumption that if the model is seeing this text it's already made the decision to load the skill.

gadenbuie · 2026-03-31T20:47:27Z

+When you receive a request, classify it and consult the appropriate reference:
+
+### Step 1: Classify the request


This section feels a little too prescriptive. Like the skill is trying to define a workflow, but it's more of a reference skill than a workflow skill. I think in general you can just lay out the references and best practices; the model will find its own way through the files.

gadenbuie · 2026-03-31T20:47:58Z

+| **Joins** | [join-examples.md](references/join-examples.md) | Merging data, `*_join`, `join_by`, matching rows, lookup tables |
+| **Grouping & columns** | [grouping-examples.md](references/grouping-examples.md) | `.by`, `group_by`, `across`, `pick`, `reframe`, column operations |
+| **Recoding & replacing** | [recode-replace-examples.md](references/recode-replace-examples.md) | `case_when`, `recode_values`, `replace_values`, `replace_when`, `filter_out`, `when_any`, `when_all`, recoding, replacing, conditional updates |
+| **Strings** | [stringr-examples.md](references/stringr-examples.md) | String manipulation, regex, `str_*` functions, text processing |
+| **Style** | [tidyverse-style.md](references/tidyverse-style.md) | Naming, formatting, spacing, error messages, `cli::cli_abort` |
+| **Migration** | [migration-examples.md](references/migration-examples.md) | Updating old code, base R conversion, deprecated functions |


I'd recommend dropping -examples from all of the file names.

gadenbuie · 2026-03-31T20:48:49Z

+
+### Step 2: Read the reference file(s)
+
+Use the Read tool to load the relevant reference. For requests that span multiple categories (e.g., "rewrite this old code" touches migration + style), read multiple files.


This is another example of what I mean by "overly prescriptive". I don't think you need to tell the model to use the Read tool, that's baseline model behavior at this point.

gadenbuie · 2026-03-31T20:50:37Z

+1. **Use modern tidyverse patterns** - Prioritize dplyr 1.2+ features, native pipe, and current APIs
+2. **Write readable code first** - Optimize only when necessary
+3. **Follow tidyverse style guide** - Consistent naming, spacing, and structure
+4. **Use R MCP tools** - Automatically resolve function documentation and library references without being asked. If the `mcp__r-btw__*` tools are unavailable, fall back to running R help via Bash (see below)


I think mentioning btw via MCP (or the new btw CLI tool) is fine, but we shouldn't assume it's available or that it will be registered with a name that matches Claude Code's conventions.

I wasn't aware of the new btw CLI (thanks for pointing it out).

gadenbuie · 2026-03-31T20:56:07Z

+| `filter(x != val \| is.na(x))` | `filter_out(x == val)` |
+| `coalesce(x, default)` | `replace_values(x, NA ~ default)` |
+| `na_if(x, val)` | `replace_values(x, val ~ NA)` |
+| `qs::qsave()` / `qs::qread()` | `qs2::qs_save()` / `qs2::qs_read()` |


The qs2 and tidylog recommendations give me pause. They're very valid choices, but they may be over-indexing on your personal preferences.
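As context for the first three rows of the quoted table, the sketch below shows what the left-hand (older) forms do, using only long-standing dplyr functions. The right-hand replacements (`filter_out()`, `replace_values()`) come from the skill's table and are not run here; the tibble `df` and its values are made up for illustration.

```r
library(dplyr)

df <- tibble(x = c("a", "b", NA))

# filter() keeps only rows where the condition is TRUE, so a bare
# `x != "a"` silently drops the NA row; `| is.na(x)` keeps it.
dropped_na <- df |> filter(x != "a")
kept_na <- df |> filter(x != "a" | is.na(x))

# coalesce() fills missing values with a default;
# na_if() does the reverse and turns a given value into NA.
filled <- coalesce(df$x, "unknown")
blanked <- na_if(df$x, "b")
```

The extra `| is.na(x)` clause is the part that is easy to forget, which is presumably why the table pairs it with a dedicated "drop matching rows" verb.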

gadenbuie · 2026-03-31T21:03:27Z

+| `na_if(x, val)` | `replace_values(x, val ~ NA)` |
+| `qs::qsave()` / `qs::qread()` | `qs2::qs_save()` / `qs2::qs_read()` |
+
+## Complete workflow example


I think you could likely drop this section without losing any performance benefits.

statzhero · 2026-04-01T16:19:13Z

Thank you for the thoughtful comments. Let me address this first:

> Currently this skill seems to land somewhere between reference guide and a workflow skill, but I think the reference guide is likely a better fit.

A fair point, I understand it as: @gadenbuie you prefer a slimmer skill in the form of a reference guide?

On the sources: as I said elsewhere, I regret not keeping better records because this was built over many iterations. The ancestor skill must be Sarah Johnson's Claude R Tidyverse Expert gist. Many references are simply distilled from the tidyverse style guide (chapters 1-4, 9), the dplyr 1.2.0 release post, and dplyr reference documentation. Some idioms are manually added, e.g., "- set.seed() with date-time, never 42".

gadenbuie · 2026-04-01T17:08:36Z

> A fair point, I understand it as: @gadenbuie you prefer a slimmer skill in the form of a reference guide?

Yes, I think I prefer a reference guide-style skill, which doesn't necessarily mean "slimmer". More than anything, it means structuring the content to be used as a reference guide and giving the model appropriate instructions to navigate and use the reference material as well as possible.

statzhero · 2026-04-07T21:45:16Z

The latest commits are a major update to the PR. I also did a code audit / review with the respective tools in Claude Code and Codex.

Description is now 98 tokens and condenses the verbose section in SKILL.md
Removed anti-patterns table from SKILL.md (all items duplicated in quick reference or reference files)
Removed best practices section from SKILL.md (all items covered in reference files)
Softened the stance on group_by() (still useful when grouping must persist across multiple operations)
Added tidyselect reference
Added _ placeholder rule for R pipe
Added stringr 1.6.0 case conversion functions
Added readr section: read.csv() to read_csv(), read_tsv(), read_csv2(), with vroom::vroom() noted for large files
Fixed the example so it actually runs
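
Two of the items above are easy to illustrate. The sketch below is not taken from the skill; it assumes R 4.2 or later (for the `_` placeholder) and dplyr 1.1.0 or later, and uses the built-in `mtcars` data with illustrative column choices.

```r
library(dplyr)

# The native pipe's `_` placeholder must be passed to a named argument.
fit <- mtcars |> lm(mpg ~ wt, data = _)

# A persistent group_by() avoids repeating `.by` when several
# consecutive verbs share the same grouping.
heaviest <- mtcars |>
  group_by(cyl) |>
  mutate(wt_rank = min_rank(desc(wt))) |>
  filter(wt_rank == 1) |>
  ungroup()
```

With a single grouped verb, `.by` would be the shorter choice; the persistent grouping only pays off once two or more steps depend on it.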

Would appreciate further input from you @gadenbuie or the community.

gadenbuie reviewed Mar 31, 2026


statzhero added 6 commits April 7, 2026 17:24

feat: Add tidy-r skill for modern tidyverse R development

656ad7a

Add redundant groupings

3740f72

Fixes most comments

2ece3a3

Remove linebreaks

698c090

Soften grouping stance

73950f1

Audit and improvements

2de0d63

statzhero force-pushed the feat/tidy-r branch from 60b01a4 to 2de0d63 on April 7, 2026 21:32

Fix example

416ab0c
