Add python support to quarto-authoring skill #44
AlejandroGomezFrieiro wants to merge 1 commit into posit-dev:main
gadenbuie
left a comment
Thanks @AlejandroGomezFrieiro, this looks great. I agree that this was a missing aspect of the skill and appreciate you bringing more Python into the examples!
I took a quick look and have a couple small comments. @mcanouil would you like to review, too?
| R: | ||
|
|
Rather than showing two examples of the same thing, I'd rather we diversify the examples to use languages in addition to R.
In other words, in this case, we're trying to show the model the set of cell options that are relevant for figures. LLMs can generalize from one example showing the code cell options; we don't need two examples that contain identical options.
That said, I do think it's helpful for there to be diversity in the examples, which also helps the model generalize. So I'd prefer if, in cases like this example, rather than adding a new identical example we were to simply rewrite some examples in other languages. We should prefer R and Python, but it'd be nice to have Julia or another language often used by Quarto authors in the examples.
| Python (pandas — `output: asis` renders markdown table in all formats): | ||
|
|
||
| ````markdown | ||
| ```{python} | ||
| #| label: tbl-summary | ||
| #| tbl-cap: "Summary statistics by group." | ||
| #| output: asis | ||
|
|
||
| print(summary_df.to_markdown(index=False)) | ||
| ``` | ||
| ```` |
This is a good example of a case where it is quite helpful to have an extra example in Python, because the additional example shows something new.
| Python (pandas — plain `df` auto-displays as HTML table in HTML output): | ||
|
|
||
| ````markdown | ||
| ```{python} | ||
| #| label: tbl-summary | ||
| #| tbl-cap: "Summary statistics by group." | ||
|
|
||
| summary_df | ||
| ``` | ||
| ```` |
I'd recommend removing this example, but keeping the comment about pandas data frames auto-displaying as HTML in HTML output (either above or below the asis-output example)
|
@gadenbuie I'll take a look this weekend. A quick comment: change the examples so they do not use actual code at all. Briefly, this means something like:
```{language}
#| option-one: value
CODE
```
where each option line uses the language-specific inline comment symbol plus a pipe (`|`).
Quarto has three native engines: `knitr`, `jupyter`, and `julia`.
The `jupyter` engine defaults to the `python` Jupyter kernel.
```yaml
engine: ...
kernel: ...
```
Also worth noting that libraries' behaviour inside Quarto depends on the libraries themselves and on the engine/kernel being used (e.g., Python code outputs via |
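The language-neutral cell skeleton described in this comment can be sketched in Python. This is only an illustration: `cell_skeleton` and `COMMENT_PREFIX` are hypothetical names invented here, not part of Quarto, and the prefix table is an assumed, deliberately incomplete subset (`#|` for R, Python, and Julia cells, `//|` for `ojs` and `dot`, `%%|` for `mermaid`).

```python
# Hypothetical helper: build a language-neutral Quarto cell skeleton.
# COMMENT_PREFIX is an assumed, deliberately incomplete lookup table.
COMMENT_PREFIX = {
    "r": "#",
    "python": "#",
    "julia": "#",
    "ojs": "//",
    "dot": "//",
    "mermaid": "%%",
}

FENCE = "`" * 3  # built at runtime so this example nests inside a fenced block


def cell_skeleton(language: str, options: dict[str, str]) -> str:
    """Return a cell whose option lines use the comment symbol plus a pipe."""
    prefix = COMMENT_PREFIX[language]  # KeyError for languages not listed above
    lines = [f"{FENCE}{{{language}}}"]
    lines += [f"{prefix}| {key}: {value}" for key, value in options.items()]
    lines += ["CODE", FENCE]  # CODE is a placeholder, as in the comment above
    return "\n".join(lines)


if __name__ == "__main__":
    print(cell_skeleton("python", {"label": "fig-example", "echo": "false"}))
    print(cell_skeleton("mermaid", {"label": "fig-flow"}))
```

Keeping the body as the literal placeholder `CODE` mirrors the suggestion: the example teaches the option syntax without tying it to any one language's code.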
| YAML metadata configuration, and Quarto extensions. Also covers converting and | ||
| migrating R Markdown (.Rmd), bookdown, blogdown, xaringan, and distill projects | ||
| to Quarto, and creating Quarto websites, books, presentations, and reports. | ||
| Writing and authoring Quarto documents (.qmd) with R (knitr) and Python (Jupyter) |
Two issues with the description rewrite.
First, "R (knitr) and Python (Jupyter) engines" is not quite accurate. Knitr runs any language registered as a knitr language engine (Python, Julia, Bash, SQL, Stan, and more), and jupyter runs any registered kernel (R via IRkernel, Julia via IJulia, Bash, and more). Pinning the description to two language/engine pairs paints Quarto as narrower than it is. There are three native engines (knitr, jupyter, julia), and in 1.9 the julia engine was refactored to sit on top of the new engine-extension mechanism, so it is both a native engine and the reference implementation for third-party engine extensions. The skill is pinned to 1.9.36 (line 17), so leaving julia and engine extensions out is a dated inaccuracy. Quarto is language-agnostic anyway, so any enumeration of languages will bit-rot the next time a new kernel or engine ships. And since Quarto 1.9 ships engine extensions, the set of computing engines behind a code cell is no longer closed: in principle anything can end up there, so engines are the stable abstraction to describe in the skill.
Second, the trigger wording. "Migrating R Markdown, bookdown, ..." narrows a key trigger, because users are far more likely to say "convert my old .Rmd" than "migrate my .Rmd", and the description is the main signal the LLM uses to pick this skill. Please keep "converting and migrating".
One more thing: the rewrite shortens the keyword surface ("code cell options, figure and table captions, cross-references, callout blocks (notes, warnings, tips), citations and bibliography" becomes "code cells, figures, tables, cross-references, callouts, citations"). Shorter descriptions match less reliably on keyword-rich queries, so the longer form was closer to what I would want.
Suggested rewording that keeps the enumerative surface, restores "converting", adds the missing .ipynb trigger, and refers to engines rather than languages:
description: >
  Writing and authoring Quarto documents (.qmd) with the knitr, jupyter, and
  julia engines (and any Quarto 1.9+ engine extensions). Covers code cell
  options, figure and table captions, cross-references, callout blocks
  (notes, warnings, tips), citations and bibliography, page layout and
  columns, Mermaid diagrams, YAML metadata configuration, and Quarto
  extensions. Also covers converting and migrating R Markdown (.Rmd),
  bookdown, blogdown, xaringan, distill projects, and Jupyter notebooks
  (.ipynb) to Quarto, and creating Quarto websites, books, presentations,
  and reports.
| ```{python} | ||
| #| label: tbl-summary | ||
| #| tbl-cap: "Summary statistics by group." | ||
| #| output: asis |
output: asis shows up seven times across this PR (here, in tables.md, and in layout.md) without a single line explaining what it does. The skill will end up learning "add output: asis when doing Python tables" as a ritual rather than understanding the underlying model.
Briefly, asis tells Quarto to pass the cell output through as raw markdown that should not be processed further. That matters because:
- The content must already be valid markdown (pipe table, headings, prose).
- Non-markdown content needs a raw block: `` ```{=html} ``, `` ```{=latex} ``, and so on.
- It interacts with `tbl-cap` in format-specific ways. Captions on `output: asis` markdown tables do not behave identically to knitr's table path.
Could we add one explainer subsection in this file, under "Execution Options", that defines asis and its requirements, then reference it from the table and layout examples instead of repeating the option verbatim in every Python snippet?
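To make the "already valid markdown" requirement concrete, here is a minimal, standard-library-only sketch of what a cell under `output: asis` has to print. `to_pipe_table` is a hypothetical helper written for this illustration, not a Quarto or pandas API, and the `summary` rows are made-up data; in practice a library call such as the `to_markdown()` shown in the diff plays this role.

```python
# Hypothetical helper: turn a list of row dicts into a Markdown pipe table.
# Under `output: asis`, whatever the cell prints is passed through as raw
# markdown, so the printed text must already be a valid table.


def to_pipe_table(rows: list[dict[str, object]]) -> str:
    """Render rows (all sharing the same keys) as a pipe-table string."""
    if not rows:
        raise ValueError("need at least one row to infer the header")
    headers = list(rows[0].keys())
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


# Made-up illustrative data, not taken from the PR.
summary = [
    {"group": "a", "n": 3, "mean": 1.5},
    {"group": "b", "n": 5, "mean": 2.0},
]

print(to_pipe_table(summary))
```

The point of the sketch is the contract, not the helper: the cell's printed output is the markdown, so anything that is not markdown (HTML, LaTeX) would need a raw block instead.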
| ````markdown | ||
| ```{python} | ||
| #| label: slow-computation | ||
| #| cache: true |
Cell-level #| cache: true is a knitr feature. With the jupyter engine, caching is handled by jupyter-cache and is enabled at document level via execute: cache: true in YAML (or freeze: auto in _quarto.yml for project-level freezing). Individual Python cells can only opt out with #| cache: false. They cannot opt in.
As written this example is silently inert: it will not cache, and it will teach the skill a pattern that does nothing, which is worse than omitting Python caching entirely.
Suggest dropping the code example and replacing it with a short prose note pointing at https://quarto.org/docs/projects/code-execution.html#cache (use the .llms.md URL instead of .html).
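The rule described in this comment (under the jupyter engine a cell can opt out of caching but cannot opt in) can be sketched as a small check over `.qmd` text. Everything here is hypothetical: `inert_cache_cells` is a name invented for illustration, the parser only understands `#|` option lines, and the jupyter behaviour is taken as given from the comment rather than verified against Quarto.

```python
import re

FENCE = "`" * 3  # built at runtime so this example nests inside a fenced block

# An executable cell: opening fence with {language}, a body, a closing fence.
CELL_RE = re.compile(
    rf"^{FENCE}\{{(\w+)[^}}\n]*\}}\n(.*?)^{FENCE}\s*$",
    re.MULTILINE | re.DOTALL,
)


def inert_cache_cells(qmd_text: str, engine: str) -> list[str]:
    """Return labels of cells whose `cache: true` is assumed to do nothing."""
    if engine != "jupyter":
        return []  # assumption: only jupyter ignores cell-level opt-in
    flagged = []
    for match in CELL_RE.finditer(qmd_text):
        options = {}
        for line in match.group(2).splitlines():
            if not line.startswith("#|"):
                break  # option lines come first in a cell
            key, _, value = line[2:].partition(":")
            options[key.strip()] = value.strip()
        if options.get("cache") == "true":
            flagged.append(options.get("label", "<unlabelled>"))
    return flagged


doc = "\n".join(
    [FENCE + "{python}", "#| label: slow-computation", "#| cache: true", "", "x = 1", FENCE, ""]
)
print(inert_cache_cells(doc, "jupyter"))
```

A check like this only flags the pattern; the fix suggested above is still to move caching to the document-level `execute` options.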
| #| code-annotations: hover | ||
|
|
||
| import pandas as pd # <1> | ||
| df = pd.read_csv("data.csv") # <2> |
Minor, but the example is self-inconsistent: it reads data.csv and then uses mtcars column names (mpg, cyl). Either load the dataset under a filename that matches (mtcars.csv) or switch to a source that actually has those columns, like seaborn.load_dataset("mpg"). As-is, the annotation "Load the dataset from a CSV file" leaves the skill guessing at what is in data.csv.
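One way to keep such an example self-consistent is to make the data source and the columns used downstream agree. The sketch below uses only the standard library and an in-memory stand-in for a hypothetical `mtcars.csv`; the rows and the helper name `mean_mpg_by_cyl` are illustrative assumptions, not content from the PR.

```python
import csv
import io
from collections import defaultdict

# In-memory stand-in for a hypothetical `mtcars.csv`; rows are illustrative.
MTCARS_CSV = """model,mpg,cyl
car_a,21.0,6
car_b,22.8,4
car_c,18.7,8
car_d,24.4,4
"""


def mean_mpg_by_cyl(csv_text: str) -> dict[int, float]:
    """Average the `mpg` column within each `cyl` group."""
    groups: dict[int, list[float]] = defaultdict(list)
    # Load the dataset: the columns read here are the ones the source declares.
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[int(row["cyl"])].append(float(row["mpg"]))
    # Summarise: one mean per cylinder count, in ascending order.
    return {cyl: sum(vals) / len(vals) for cyl, vals in sorted(groups.items())}


print(mean_mpg_by_cyl(MTCARS_CSV))
```

Because the header row is visible right next to the code, an annotation such as "Load the dataset" no longer leaves the reader guessing which columns exist.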
| @@ -0,0 +1,154 @@ | |||
| # Converting Jupyter Notebooks to Quarto | |||
Could we scope this reference down quite a bit? Several sections work against the skill's purpose.
- `quarto convert` is bidirectional. `.qmd` to `.ipynb` is just as valid as `.ipynb` to `.qmd`. The current doc presents it as one-way, which is factually wrong. See inline below.
- Scope drift into notebook JSON editing. This skill is about authoring, not hand-editing `.ipynb` JSON. The "Cell Option Migration" section with JSON examples (`jupyter.source_hidden`, and so on) and the "Common Metadata Mappings" table push the skill towards editing notebook JSON, when `quarto convert` handles that automatically. The mapping table is not even a 1:1 mapping (those are nbconvert tags, not Quarto options), so training on it would teach the wrong behaviour.
- Edit-experience and version-control comparison table. Partially accurate, but not about authoring. If we want that framing, it belongs as a one-line note rather than a feature table.
- Duplication with existing references. "Controlling Re-execution", "freeze for Collaborative Projects", and "Specifying the Jupyter Kernel" all repeat content already in `code-cells.md` and `yaml-front-matter.md`. Since this file is only loaded when the user asks to convert an ipynb (per `SKILL.md:76`), the duplicated content will never be read on the normal authoring path and only adds token cost when converting.
- Attached to the wrong trigger. Everything except the conversion instructions is irrelevant to "convert ipynb to qmd" but would be useful for authoring, so the possibly useful content (not already mentioned in other files) is sitting behind a trigger that will not fire for authoring tasks.
Suggest cutting this file to roughly 30 lines: direct rendering of .ipynb, bidirectional quarto convert, and one paragraph on why .qmd is usually preferred for version control. Anything kernel or execution related would move into code-cells.md or yaml-front-matter.md so it is reachable during authoring.
| Use the Quarto CLI to convert: | ||
|
|
||
| ```bash | ||
| quarto convert notebook.ipynb |
quarto convert is bidirectional: quarto convert notebook.qmd also works and produces a .ipynb. Worth showing alongside the .ipynb to .qmd direction so the skill does not invent a more awkward path when the user asks for .qmd to .ipynb.
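The two directions can be sketched side by side. The helper below is hypothetical (`convert_plan` is a name invented here); it only builds the command and the expected output name, assuming `quarto convert` writes the converted file next to the input with the extension swapped.

```python
from pathlib import Path


def convert_plan(source: str) -> tuple[list[str], str]:
    """Return the `quarto convert` command and the file it should produce."""
    path = Path(source)
    targets = {".ipynb": ".qmd", ".qmd": ".ipynb"}  # the same command, both ways
    if path.suffix not in targets:
        raise ValueError(f"expected a .ipynb or .qmd file, got {source!r}")
    command = ["quarto", "convert", str(path)]
    # Assumption: output lands beside the input with the swapped extension.
    return command, str(path.with_suffix(targets[path.suffix]))


if __name__ == "__main__":
    for name in ("notebook.ipynb", "notebook.qmd"):
        command, produced = convert_plan(name)
        print(" ".join(command), "->", produced)
```

On a machine with Quarto installed, the returned command list could be handed to `subprocess.run(command, check=True)`; the sketch stops short of that so it runs anywhere.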
| ``` | ||
| ```` | ||
|
|
||
| ### Python Examples |
This section now walks through four Python plotting libraries (matplotlib, seaborn, plotnine, plotly) with usage examples. Per my earlier note on the PR thread, library-specific behaviour is out of scope for a Quarto authoring skill: behaviour depends on the library, engine, and format matrix (for example plotly's fig.show() differs between jupyter and knitr+reticulate, and plotnine's figure-size inheritance is non-obvious), and that matrix is hard to keep accurate over time.
Could we collapse this to one minimal matplotlib example showing the Quarto cell options (which is what this file is actually about), and let the library docs cover library behaviour? The upstream Quarto docs take the same approach: https://quarto.org/docs/computations/python.html#overview.
| ``` | ||
| ```` | ||
|
|
||
| ### Python with polars |
Two concerns with the polars and great-tables sections.
- `df.to_pandas().to_markdown(index=False)` is not a pattern to teach. Round-tripping polars through pandas just to render markdown is wasteful and it will disappear the moment polars' own output improves. If we really need to show polars to markdown, `tabulate(df.rows(named=True), headers="keys", tablefmt="pipe")` is closer to idiomatic.
- Scope. Same concern as `figures.md`: library-specific instructions (polars, great-tables) expand the surface area the skill has to maintain. This file should document Quarto's table options and cross-referencing mechanics and leave library choice to the author.
| #| tbl-cap: "Long table." | ||
| #| output: asis | ||
|
|
||
| print(long_df.to_markdown(index=False)) |
The caption "use output: asis so the markdown table is recognized across pages" is not correct. longtable is a LaTeX package, and in the R example above it is knitr::kable(..., longtable = TRUE) that triggers the LaTeX longtable environment. A pandas or polars markdown table emitted via output: asis is processed as a standard Pandoc pipe-table and will not automatically break across pages in PDF.
The honest Python equivalents for PDF long tables are:
- use `tabulate` with `tablefmt="latex_longtable"` and emit the raw LaTeX in a `` ```{=latex} `` block, or
- render with `great_tables` (`GT.as_latex(self, use_longtable=False, tbl_pos=None)`) and let the format engine handle pagination, or
- simply document that long-table pagination is not handled automatically on the jupyter engine.
Suggest either dropping the "recognized across pages" claim or deleting the Python long-table example entirely.
Finally, longtable is a very specific LaTeX case with several rough edges in the LaTeX ecosystem. I don't think it's a good idea to take the skill into those deep, dark waters, since maintaining this part would require validation in LaTeX and against the libraries.
Authors who need this should consult their library of choice if and when they choose to output to LaTeX, which I don't expect to be the default now that there is Typst.
| progress: true | ||
| ``` | ||
|
|
||
| ## Execution Engine |
Two issues.
- Engine is not a kernel. `engine: knitr # R (knitr)` and `engine: jupyter # Python, ...` conflate engines with languages. Knitr handles Python, Julia, Bash, SQL, Stan via its language engines; jupyter handles any registered kernel. Quarto ships three native engines (`knitr`, `jupyter`, `julia`), and in 1.9 the julia engine was refactored to sit on top of the new engine-extension mechanism, so the set of engines is no longer closed, and third-party extensions can register more. Both the third native engine and the extension mechanism are missing here.
- Location. Execution options (`cache`, `freeze`, `echo`, and so on) affect code cells and are already authored in `code-cells.md`. Duplicating them here risks drift. Either move them over to `code-cells.md` and leave a `See:` pointer, or drop the new "Execution Engine" subsection and fold engine selection into the existing "Execution Options" block.
Suggested rewording of the engine block, if we keep it here:
engine: knitr # native; runs R plus any knitr language engine (Python, Julia, Bash, SQL, Stan, ...)
engine: jupyter # native; runs any registered Jupyter kernel
engine: julia # native; implemented on top of the engine-extension mechanism since Quarto 1.9
Plus a note that Quarto 1.9+ supports engine extensions, so additional engines can be registered the same way as any other Quarto extension.
(The engine-extension mechanism will let Julia move out of the Quarto development cycle, so its maintainers can ship new versions more easily.)
|
(Context for my review above, which I accidentally submitted with an empty body.)
Thanks @AlejandroGomezFrieiro for picking this up, and a meta suggestion before the inline comments.
I would like to propose widening the scope of this PR from "add Python support" to "make the skill language-agnostic, as it should have been from the start". The original R-centric framing of this skill was an oversight in my own initial proposal: Quarto was already language-agnostic at the time, and I should have written it that way rather than anchoring everything on R and knitr. Adding Python alongside R fixes part of that oversight but also bakes the same bias in a new shape (now two privileged languages instead of one), which is why so much of this PR ends up as R/Python pairs.
Quarto 1.9 ships engine extensions, so the set of computing engines behind a code cell is no longer closed: knitr, jupyter, julia, and anything a third-party extension registers, all sit behind the same
Concretely this touches most of the reference files (
Individual issues follow inline. Hope this helps understand my initial oversight when writing the skill and how to correct the bias instead of building on it. |
|
Thanks for the review, I will look into the changes in a few days. Making it language agnostic sounds great. |
|
@AlejandroGomezFrieiro Let me know how you want to proceed as it's a consequence of my initial oversight. I can take over and make another PR to fix all of the above if needed and if you want. |
|
@mcanouil I don't have lots of time at hand and did this to be able to use it myself. If you want to take over I would gladly accept that and can test it out once ready. |
|
Ok, I'll do this and ping you on the new PR making the skill more agnostic and more engine-aware. Keeping the PR until I supersede it. |
The current skill covers authoring Quarto documents with R well, but it does not cover Python, which is another well-supported language.