From 8a30b5780cef4104dc025a5942a55492efebeb0c Mon Sep 17 00:00:00 2001 From: Yiqing Xu <7664920+xuyiqing@users.noreply.github.com> Date: Sat, 21 Mar 2026 16:35:38 -0700 Subject: [PATCH 01/10] v1.3.0: add type = "network" for k-partite graph visualization New feature: visualize panel data connectivity as a k-partite graph (Correia 2016). Identifies singletons, connected components, and non-unique observations via weighted edges. Supports 2+ FE dimensions, formula-based missingness, and user-tunable edge/node styling. New parameters: show.singletons, highlight.components, layout, node.size, show.labels, edge.color, edge.alpha, edge.width, singleton.color. igraph added to Suggests. Tutorial chapter 4 added. Version bumped to 1.3.0. Co-Authored-By: Claude Opus 4.6 (1M context) --- ARCHITECTURE.md | 369 ++++++++++++---------------- DESCRIPTION | 6 +- NAMESPACE | 5 +- R/panelView.R | 141 +++++++++-- R/plot-network.R | 475 +++++++++++++++++++++++++++++++++++ man/panelView.Rd | 18 +- tests/testthat/test-graph.R | 476 ++++++++++++++++++++++++++++++++++++ tutorial/01-treat.Rmd | 2 +- tutorial/04-network.Rmd | 367 +++++++++++++++++++++++++++ tutorial/_quarto.yml | 1 + tutorial/aa-changelog.Rmd | 17 ++ tutorial/index.qmd | 64 ++++- tutorial/references.bib | 8 + 13 files changed, 1699 insertions(+), 250 deletions(-) create mode 100644 R/plot-network.R create mode 100644 tests/testthat/test-graph.R create mode 100644 tutorial/04-network.Rmd diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index f529f39..4f20548 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -1,250 +1,203 @@ -# ARCHITECTURE.md — panelView +# Architecture — panelView -## Overview - -panelView is an R package for visualizing panel (time-series cross-sectional) data. It provides three main functionalities: (1) treatment status and missing value heatmaps, (2) outcome temporal dynamics plots, and (3) bivariate treatment-outcome relationship plots. The package exports a single function, `panelview()`, which accepts approximately 50 parameters and dispatches internally to one of three specialized plot functions. panelView was published in the Journal of Statistical Software (doi:10.18637/jss.v107.i07). Authors: Hongyu Mou, Licheng Liu, and Yiqing Xu. Current version: 1.2.1. - -## File Structure - -``` -panelView/ -├── R/ -│ ├── panelView.R — Entry point: input parsing, validation, data reshaping, dispatch (~1160 lines) -│ ├── plot-treat.R — .pv_plot_treat(): treatment status / missing value heatmap (~340 lines) -│ ├── plot-outcome.R — .pv_plot_outcome() + .pv_subplot(): outcome time-series plots (~1000 lines) -│ ├── plot-bivariate.R — .pv_plot_bivariate(): dual-axis treatment + outcome plots (~430 lines) -│ └── zzz.r — .onAttach() startup message (~5 lines) -├── data/ -│ └── panelView.RData — Bundled datasets: simdata, turnout, capacity -├── man/ — Rd documentation files -├── tests/ -│ ├── testthat.R — Test runner -│ └── testthat/ — Test suite -├── tutorial/ — Quarto book (chapters: treat, outcome, bivariate, changelog) -│ ├── index.qmd -│ ├── 01-treat.Rmd -│ ├── 02-outcome.Rmd -│ ├── 03-bivariate.Rmd -│ ├── aa-changelog.Rmd -│ ├── _quarto.yml -│ └── references.bib -├── DESCRIPTION — Package metadata (v1.2.1) -├── NAMESPACE — Single export: panelview; imports from ggplot2, gridExtra, grid, dplyr, stats -└── LICENSE — MIT license -``` - -## Entry Point and Dispatch - -`panelview()` is the only exported function. It accepts a data frame in long form along with approximately 50 parameters controlling variable selection, plot type, appearance, and behavior. - -### Formula Parsing - -The function supports two interfaces for specifying variables: - -- **Formula interface**: `panelview(data, Y ~ D + X1 + X2, index = c("unit", "time"))` — left side is the outcome, first right-side variable is the treatment, remaining variables are covariates. -- **Explicit variable names**: `panelview(data, Y = "y", D = "d", index = c("unit", "time"))`. -- **Special case**: `Y ~ 1` specifies no treatment variable; the function sets `ignore.treat = TRUE`. +> Generated by scriber for run `REQ-bipartite-graph-20260321-125712` on 2026-03-21. -### Type Normalization - -The `type` parameter uses `match.arg()` for partial matching. Recognized values and aliases: - -- `"treat"` — treatment status heatmap -- `"missing"` (alias `"miss"`) — internally converts to `type = "treat"` with `ignore.treat = 1` -- `"outcome"` (alias `"raw"`) — outcome time-series plot -- `"bivariate"` (alias `"bivar"`) — dual-axis treatment + outcome plot - -### Dispatch Mechanism - -After all input parsing, validation, and data reshaping, the function captures its entire local environment as a list and routes to the appropriate plot function: - -```r -s <- as.list(environment()) -if (type == "outcome") { - .pv_plot_outcome(s) -} else if (type == "treat") { - .pv_plot_treat(s) -} else if (type == "bivariate") { - .pv_plot_bivariate(s) -} -``` - -Each plot function receives the full state via `s` and unpacks it with `with(s, { ... })`. This avoids maintaining parallel parameter lists across functions — each plotter has access to all variables computed during the data processing phase. - -### Flow Diagram +## Overview -```mermaid -flowchart TD - A["panelview(data, formula, type, ...)"] --> B[Parse formula / explicit variable names] - B --> C[Validate inputs] - C --> D[Reshape data to TT × N matrices] - D --> E[Construct obs.missing matrix] - E --> F["s ← as.list(environment())"] - F --> G{type?} - G -->|"treat" / "missing"| H[".pv_plot_treat(s)"] - G -->|"outcome"| I[".pv_plot_outcome(s)"] - G -->|"bivariate"| J[".pv_plot_bivariate(s)"] -``` +panelView is an R package for visualizing panel (time-series cross-sectional) data. It provides four main functionalities: (1) treatment status and missing value heatmaps, (2) outcome temporal dynamics plots, (3) bivariate treatment-outcome relationship plots, and (4) bipartite graph visualization of panel structure. The package exports a single function, `panelview()`, which accepts approximately 56 parameters and dispatches internally to one of four specialized plot functions. panelView was published in the Journal of Statistical Software (doi:10.18637/jss.v107.i07). Authors: Hongyu Mou, Licheng Liu, and Yiqing Xu. Current version: 1.2.1. -## Data Processing Pipeline +--- -Raw input data passes through eight stages before reaching a plot function. +## Module Structure ```mermaid -flowchart LR - S1[1. Input Coercion] --> S2[2. Large-N Defaults] - S2 --> S3[3. Missing Data] - S3 --> S4[4. Sorting] - S4 --> S5[5. Matrix Construction] - S5 --> S6[6. Treatment Classification] - S6 --> S7[7. obs.missing Matrix] - S7 --> S8[8. Collapse History] - S8 --> P[Plot Function] +%%{init: {'theme': 'neutral'}}%% +graph TD + subgraph API["API Layer"] + PV["panelView.R — entry"] + end + + subgraph Plot["Plot Functions"] + PT["plot-treat.R"] + PO["plot-outcome.R"] + PB["plot-bivariate.R"] + PG["plot-graph.R"] + end + + subgraph Data["Data / Assets"] + DAT["panelView.RData"] + end + + subgraph Infra["Infrastructure"] + ZZ["zzz.r — startup"] + NS["NAMESPACE"] + DESC["DESCRIPTION"] + end + + subgraph Test["Tests"] + TP["test-panelview.R"] + TG["test-graph.R"] + end + + subgraph Docs["Documentation"] + MAN["man/panelView.Rd"] + TUT["tutorial/ (Quarto)"] + end + + PV --> PT + PV --> PO + PV --> PB + PV --> PG + PV --> DAT + + style PV fill:#1e90ff,stroke:#1565c0,color:#fff + style PG fill:#1e90ff,stroke:#1565c0,color:#fff + style TG fill:#1e90ff,stroke:#1565c0,color:#fff + style DESC fill:#1e90ff,stroke:#1565c0,color:#fff + style NS fill:#1e90ff,stroke:#1565c0,color:#fff + style MAN fill:#1e90ff,stroke:#1565c0,color:#fff ``` -### Stage 1: Input Coercion +### Module Reference -The data argument is coerced to a plain `data.frame` if it has multiple classes (e.g., tibble). Factor unit IDs are inspected: if the factor levels are numeric strings they are converted to numeric; otherwise they are converted to character. +| Module / File | Layer | Purpose | Key Exports / Functions | Changed | +| --- | --- | --- | --- | --- | +| `R/panelView.R` | API | Entry point: input parsing, validation, data reshaping, dispatch (~1200 lines) | `panelview()` (exported) | **yes** | +| `R/plot-treat.R` | Plot | Treatment status / missing value heatmap (~340 lines) | `.pv_plot_treat()` | no | +| `R/plot-outcome.R` | Plot | Outcome time-series plots (~1000 lines) | `.pv_plot_outcome()`, `.pv_subplot()` | no | +| `R/plot-bivariate.R` | Plot | Dual-axis treatment + outcome plots (~430 lines) | `.pv_plot_bivariate()` | no | +| `R/plot-graph.R` | Plot | Bipartite graph visualization (~300 lines) | `.pv_build_bipartite_graph()`, `.pv_layout_graph()`, `.pv_plot_graph()` | **yes** (new) | +| `R/zzz.r` | Infra | `.onAttach()` startup message | — | no | +| `data/panelView.RData` | Data | Bundled datasets: simdata, turnout, capacity | — | no | +| `man/panelView.Rd` | Docs | Rd documentation for `panelview()` | — | **yes** | +| `DESCRIPTION` | Infra | Package metadata (v1.2.1) | — | **yes** | +| `NAMESPACE` | Infra | Export + imports | — | **yes** | +| `tests/testthat/test-panelview.R` | Test | Existing test suite (36 tests) | — | no | +| `tests/testthat/test-graph.R` | Test | Graph visualization tests (64 assertions) | — | **yes** (new) | +| `tutorial/` | Docs | Quarto book (chapters: treat, outcome, bivariate, changelog) | — | no | -### Stage 2: Large-N Defaults +--- -When the number of unique units N exceeds 500, the function applies adaptive defaults: +## Function Call Graph -- `collapse.history` is set to `TRUE` for treatment plots (groups units by identical treatment sequences). -- Unless `display.all = TRUE`, a random sample of 500 units is drawn for display. -- Grid lines are auto-disabled when N exceeds 300. - -### Stage 3: Missing Data Handling - -Two modes controlled by the `leave.gap` parameter: - -- `leave.gap = FALSE` (default): `na.omit()` drops all rows with any missing values. -- `leave.gap = TRUE`: The panel is expanded to a balanced grid using `expand.grid()` + `merge()`, filling gaps with NA. Units that are entirely missing are dropped. - -### Stage 4: Sorting - -Data is sorted by `(unit_id, time)` to ensure correct matrix reshaping. - -### Stage 5: Panel Balancing and Matrix Construction - -Long-form data is reshaped into TT x N matrices (where TT = number of time periods, N = number of units): - -- `Y` — outcome matrix -- `D` — treatment matrix -- `I` — observation indicator matrix (1 = observed, 0 = missing) - -### Stage 6: Treatment Classification - -For binary treatment (`d.bi = TRUE`): +```mermaid +%%{init: {'theme': 'neutral'}}%% +graph TD + PV["panelview()"] -- A cumulative sum (`cumsum`) is applied column-wise to enforce the "once treated, always treated" assumption. The original treatment matrix is preserved in `D.old` for plots that need reversal information. -- `T0` — first treatment period per unit. -- `T1` — treatment duration. -- `staggered` — whether treatment timing varies across units (vs. simultaneous). -- `DID` — whether all treated units receive treatment at the same time. -- `unit.type` — classification of each unit: 1 = always control, 2 = always treated, 3 = treatment status changed (reversal). + PV -->|"type = treat/missing"| PPT[".pv_plot_treat(s)"] + PV -->|"type = outcome"| PPO[".pv_plot_outcome(s)"] + PV -->|"type = bivariate"| PPB[".pv_plot_bivariate(s)"] + PV -->|"type = graph"| PPG[".pv_plot_graph(s)"] -### Stage 7: obs.missing Matrix Construction + PPO --> SUB[".pv_subplot()"] -A unified integer-coded matrix consumed by all plot types: + PPG --> BLD[".pv_build_bipartite_graph()"] + PPG --> LAY[".pv_layout_graph()"] -| Code | Meaning | -|------|---------| -| `-200` | Missing observation | -| `-1` | Under control (or observed, when treatment is ignored) | -| `0` | Treated, pre-treatment period | -| `1` | Treated, post-treatment period | + BLD -->|igraph| GFD["graph_from_data_frame()"] + BLD -->|igraph| DEG["degree()"] + BLD -->|igraph| CMP["components()"] -For non-binary treatment with more than 2 levels, actual treatment level values are stored. Missing observations are coded as NA or -200. The sentinel value -200 was chosen to be far from any valid treatment level. + LAY -->|igraph| LFR["layout_with_fr()"] + LAY -->|igraph| LBI["layout_as_bipartite()"] + LAY -->|igraph| LCI["layout_in_circle()"] -### Stage 8: Collapse History (Optional) + PPG -->|grDevices| CHL["chull()"] + PPG -->|ggplot2| GEO["geom_polygon/segment/point/text"] -When `collapse.history = TRUE`, units are grouped by identical treatment sequences using `dplyr::group_by(across(everything()))`. The unit dimension is replaced by unique-history counts, sorted by cohort size in descending order. This reduces visual clutter in large panels. + style PV fill:#1e90ff,stroke:#1565c0,color:#fff + style PPG fill:#1e90ff,stroke:#1565c0,color:#fff + style BLD fill:#1e90ff,stroke:#1565c0,color:#fff + style LAY fill:#1e90ff,stroke:#1565c0,color:#fff +``` -## Plot Type Catalog +### Function Reference -### Treatment Status Plot (`type = "treat"`) +| Function | Defined In | Called By | Calls | Changed | Purpose | +| --- | --- | --- | --- | --- | --- | +| `panelview()` | `R/panelView.R` | user (exported) | `.pv_plot_treat`, `.pv_plot_outcome`, `.pv_plot_bivariate`, `.pv_plot_graph` | **yes** | Single entry point: parse inputs, build I matrix, dispatch to plotter | +| `.pv_plot_treat()` | `R/plot-treat.R` | `panelview` | ggplot2 | no | Treatment status / missing value heatmap | +| `.pv_plot_outcome()` | `R/plot-outcome.R` | `panelview` | `.pv_subplot`, ggplot2, gridExtra | no | Outcome time-series plots | +| `.pv_subplot()` | `R/plot-outcome.R` | `.pv_plot_outcome` | ggplot2 | no | Helper for outcome subplots | +| `.pv_plot_bivariate()` | `R/plot-bivariate.R` | `panelview` | ggplot2 | no | Dual-axis bivariate plot | +| `.pv_plot_graph()` | `R/plot-graph.R` | `panelview` | `.pv_build_bipartite_graph`, `.pv_layout_graph`, igraph, ggplot2, grDevices | **yes** (new) | Bipartite graph visualization with component hulls and singleton highlights | +| `.pv_build_bipartite_graph()` | `R/plot-graph.R` | `.pv_plot_graph` | igraph | **yes** (new) | Construct igraph bipartite graph from observation indicator matrix I | +| `.pv_layout_graph()` | `R/plot-graph.R` | `.pv_plot_graph` | igraph | **yes** (new) | Compute 2D layout coordinates via FR/bipartite/circle algorithms | -- **Function**: `.pv_plot_treat()` in `plot-treat.R` -- **Visualization**: Heatmap using `geom_tile()` where each cell represents a unit-period. -- **Color schemes**: - - Binary treatment: `#B0C4DE` (light steel blue) = controls, `#4671D5` (medium blue) = treated pre-treatment, `#06266F` (dark blue) = treated post-treatment, `#FFFFFF` (white) = missing. - - Multi-level discrete treatment: Palette of 12 colors starting with `#66C2A5`, `#FC8D62`, `#8DA0CB`, etc. - - Continuous treatment: Bins into 4 intervals with a blue gradient from `#c6dbef` to `#042b53`. -- **Key features**: - - `by.timing = TRUE` sorts treated units by treatment timing (T0). - - `collapse.history = TRUE` collapses to unique treatment histories with unit counts as y-axis labels. - - `pre.post = TRUE` distinguishes pre/post periods for treated units. - - `ignore.treat = TRUE` or `type = "missing"` shows only observed vs. missing. - - Unit labels are auto-hidden when N >= 200. - - Grid lines controlled by `gridOff`. +--- -### Outcome Plot (`type = "outcome"`) +## Data Flow ```mermaid -flowchart TD - O[".pv_plot_outcome(s)"] --> IT{ignore.treat?} - IT -->|TRUE| A[Single series, all units overlaid] - IT -->|FALSE| BG{by.group?} - BG -->|TRUE| C["Split by unit.type: control / treated / changed"] - BG -->|FALSE| BC{by.cohort?} - BC -->|TRUE| D["Average by treatment-history cohort"] - BC -->|FALSE| S{staggered?} - S -->|YES| E["Pre/post treated + control"] - S -->|NO| F["Control vs treated overlaid"] - C --> L["grid.arrange() multi-panel"] - D --> L +%%{init: {'theme': 'neutral'}}%% +graph TD + Input["panelview(data, formula, type, ...)"] + Input --> Parse["Parse formula / variable names"] + Parse --> TypeQ{{"type = graph?"}} + + TypeQ -- no --> Validate["Validate vars, handle missing"] + Validate --> Sort["Sort by (unit, time)"] + Sort --> Matrix["Build I, Y, D matrices (TT x N)"] + Matrix --> Treat["Treatment classification"] + Treat --> ObsMiss["Build obs.missing matrix"] + ObsMiss --> Collapse["Collapse history (optional)"] + Collapse --> Dispatch{{"type?"}} + Dispatch -->|treat| PlotT[".pv_plot_treat(s)"] + Dispatch -->|outcome| PlotO[".pv_plot_outcome(s)"] + Dispatch -->|bivariate| PlotB[".pv_plot_bivariate(s)"] + + TypeQ -- yes --> GraphGuard["Set graph defaults, check igraph"] + GraphGuard --> Sort2["Sort, build I matrix only"] + Sort2 --> EarlyExit["Early exit dispatch"] + EarlyExit --> BuildG[".pv_build_bipartite_graph(I)"] + BuildG --> LayoutG[".pv_layout_graph(g, layout)"] + LayoutG --> PlotG["Build ggplot: hulls + edges + nodes"] + PlotG --> ReturnG["Return invisible(list)"] + + style TypeQ fill:#1e90ff,stroke:#1565c0,color:#fff + style GraphGuard fill:#1e90ff,stroke:#1565c0,color:#fff + style EarlyExit fill:#1e90ff,stroke:#1565c0,color:#fff + style BuildG fill:#1e90ff,stroke:#1565c0,color:#fff + style LayoutG fill:#1e90ff,stroke:#1565c0,color:#fff + style PlotG fill:#1e90ff,stroke:#1565c0,color:#fff + style ReturnG fill:#1e90ff,stroke:#1565c0,color:#fff ``` -- **Functions**: `.pv_plot_outcome()` and helper `.pv_subplot()` in `plot-outcome.R` -- **Visualization**: Time-series line plots (continuous outcome) or jitter plots (discrete outcome). -- **Sub-modes**: - - `ignore.treat = TRUE`: Single series with all units overlaid in grey. - - `ignore.treat = FALSE, by.group = FALSE`: Mixed plot with control and treated overlaid. Staggered DID distinguishes pre/post treated. - - `by.group = TRUE`: Separate subplots for "Always Under Control", "Always Under Treatment", "Treatment Status Changed" (uses `unit.type`). - - `by.group.side = TRUE`: Arranges subplots horizontally instead of vertically. - - `by.cohort = TRUE`: Averages outcome by treatment-history cohort (maximum 20 cohorts). -- **Color defaults**: Grey (`#5e5e5e50`) for control, salmon/red (`#FC8D62`, `red`) for treatment. -- **Multi-panel layout**: Uses `gridExtra::grid.arrange()` with a shared legend extracted via `ggplotGrob()`. -- **Treatment timing**: Vertical white line at treatment onset for DID data; optional shading via `shade.post`. - -### Bivariate Plot (`type = "bivariate"`) - -- **Function**: `.pv_plot_bivariate()` in `plot-bivariate.R` -- **Visualization**: Dual-axis plot showing outcome (primary y-axis) and treatment (secondary y-axis) over time. -- **Style selection**: The `style` parameter controls the geom for each variable: `"l"` (line), `"b"` (bar), `"c"` (connected = line + points). Default style depends on the treatment/outcome type combination (e.g., discrete treatment + continuous outcome defaults to bar + line). -- **Axis scaling**: Uses a linear transformation (`coeff`) computed by solving a 2x2 system to map treatment values onto the outcome axis range. -- **Sub-modes**: - - `by.unit = FALSE` (default): Plots aggregate means across all units. - - `by.unit = TRUE`: Faceted by unit using `facet_wrap(~input.id, ncol = 4)`. -- **Color defaults**: `"dodgerblue4"` and `"lightsalmon2"` (or black/grey when `theme.bw = TRUE`). - -## Key Design Decisions - -1. **Single-function API**: The package exports only `panelview()`. All complexity is hidden behind approximately 50 parameters. This provides a simple, discoverable interface for users at the cost of a large monolithic entry function. - -2. **Environment-list dispatch pattern**: Instead of passing individual arguments to plot functions, the entire local environment is captured as a list (`s <- as.list(environment())`) and passed to each plotter. Plotters unpack via `with(s, {...})`. This avoids maintaining parallel parameter lists across four functions, but means each plotter has access to all variables, not just those it needs. +--- -3. **Monolith split into four files**: The original monolithic function was split into `panelView.R` (entry + data processing) and three plot-type files. Each file contains exactly one internal function (plus the `.pv_subplot` helper in plot-outcome.R). This separation makes the plot logic easier to maintain independently. +## Architectural Patterns -4. **obs.missing unified encoding**: A single integer-coded matrix (-200, -1, 0, 1) represents all observation states. This simplifies color mapping and legend construction across plot types. The sentinel value -200 is chosen to be far from valid treatment levels. +- **Single-function API**: The package exports only `panelview()`. All complexity is hidden behind approximately 56 parameters. This provides a simple, discoverable interface at the cost of a large monolithic entry function. +- **Environment-list dispatch**: The entire local environment is captured as a list (`s <- as.list(environment())`) and passed to each plotter. Plotters unpack via `with(s, {...})` (existing types) or explicit extraction (graph type). This avoids maintaining parallel parameter lists. +- **Early-exit for graph type**: The graph type short-circuits after I matrix construction, bypassing all treatment processing, obs.missing construction, and collapse-history logic. This keeps the graph path clean and avoids interacting with code irrelevant to graph visualization. +- **Conditional dependency**: igraph is in Suggests (not Imports), loaded via `requireNamespace()` with a clear install message. All igraph calls use `igraph::` prefix. This keeps the package lightweight for users who do not need graph visualization. +- **Internal function prefix convention**: All plot functions use the `.pv_` prefix (e.g., `.pv_plot_graph`, `.pv_build_bipartite_graph`), making them clearly internal and non-exported. +- **Bipartite graph from panel data**: The observation indicator matrix I (TT x N) is naturally a bipartite incidence matrix. Units and time periods form disjoint node sets; edges represent observations. This reveals structural properties (connectivity, singletons) relevant to fixed-effect identification (Correia 2016). -5. **"Once treated, always treated" enforcement**: For binary treatment, `cumsum()` is applied column-wise to the D matrix, converting any treatment reversal into permanent treatment. The original D is preserved in `D.old` for plots that need reversal information. - -6. **Collapse history for large panels**: For N > 500, units are grouped by treatment sequence to reduce visual clutter. The count of units per group is displayed as the y-axis label. - -7. **Auto-adaptation for large panels**: The package automatically adjusts for large N: disables grid lines (N > 300), samples 500 units (N > 500), enables collapse.history (N > 500 for treat plots), and hides unit labels (N >= 200). +--- ## Dependencies | Package | Role | Key functions used | -|---------|------|--------------------| -| ggplot2 (>= 3.4.0) | Core plotting engine | `ggplot`, `geom_tile`, `geom_line`, `geom_point`, `geom_col`, `geom_jitter`, `geom_ribbon`, `geom_bar`, `geom_rect`, `geom_boxplot`, `geom_density`, `scale_*_manual`, `facet_wrap`, `sec_axis`, `theme`, `guides` | +| --- | --- | --- | +| ggplot2 (>= 3.4.0) | Core plotting engine | `ggplot`, `geom_tile`, `geom_line`, `geom_point`, `geom_col`, `geom_jitter`, `geom_ribbon`, `geom_bar`, `geom_rect`, `geom_boxplot`, `geom_density`, `geom_segment`, `geom_polygon`, `geom_text`, `scale_*_manual`, `facet_wrap`, `sec_axis`, `theme`, `guides` | | gridExtra | Multi-panel layout | `grid.arrange`, `arrangeGrob` | | grid | Text annotations for panel titles | `textGrob`, `gpar` | | dplyr (>= 1.0.0) | Collapse-history grouping | `coalesce`, `summarise`, `group_by`, `across`, `everything`, `n` | | stats (base) | Data utilities | `na.omit`, `sd`, `var`, `ave`, `aggregate`, `approxfun` | +| grDevices (base) | Convex hull computation | `chull` | +| igraph (Suggests) | Bipartite graph construction and analysis | `graph_from_data_frame`, `degree`, `components`, `layout_with_fr`, `layout_as_bipartite`, `layout_in_circle`, `as_edgelist`, `vcount`, `ecount`, `is_bipartite` | **Why ggplot2 >= 3.4.0**: The package uses `linewidth` (introduced in ggplot2 3.4.0) instead of the deprecated `size` parameter for line geoms. -**Suggests**: `testthat (>= 3.0.0)` for testing. +**Suggests**: `testthat (>= 3.0.0)` for testing, `igraph` for bipartite graph visualization. + +--- + +## Notes + +- The graph type adds 6 new parameters to `panelview()`, bringing the total to approximately 56. These parameters are only used when `type = "graph"` and are ignored otherwise. +- The FR (Fruchterman-Reingold) layout is non-deterministic across runs. Visual tests should check structural properties, not exact coordinates. +- Component hull colors recycle after 8 distinct components. Panels with many components will see repeated colors. +- A single-observation panel (1 unit, 1 time) required a special guard around the time-gap calculation to avoid division by zero. The graph type bypasses this block entirely. diff --git a/DESCRIPTION b/DESCRIPTION index 39a739e..fb88967 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,8 +1,8 @@ Package: panelView Type: Package Title: Visualizing Panel Data -Version: 1.2.1 -Date: 2026-03-20 +Version: 1.3.0 +Date: 2026-03-21 Authors@R: c(person("Yiqing", "Xu", ,"yiqingxu@stanford.edu", role = c("aut", "cre"), comment = c(ORCID = "0000-0003-2041-6671")), @@ -16,7 +16,7 @@ License: MIT + file LICENSE Imports: ggplot2 (>= 3.4.0), gridExtra, grid, dplyr (>= 1.0.0) Depends: R (>= 2.10) Encoding: UTF-8 -Suggests: testthat (>= 3.0.0) +Suggests: testthat (>= 3.0.0), igraph RoxygenNote: 7.3.2 LazyData: true Config/testthat/edition: 3 diff --git a/NAMESPACE b/NAMESPACE index 1fceb21..6201661 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -1,6 +1,7 @@ ##useDynLib(panelView) ##exportPattern("^[[:alpha:]]+") -importFrom("stats", "na.omit", "sd", "var", "ave", "aggregate", "approxfun") +importFrom("stats", "na.omit", "sd", "var", "ave", "aggregate", "approxfun", + "setNames") importFrom("ggplot2", "geom_boxplot", "geom_density", "geom_tile", "geom_point","geom_col", "labs", "theme_bw", "scale_fill_manual", "geom_hline", "geom_line", "geom_ribbon", "geom_vline", @@ -12,6 +13,8 @@ importFrom("ggplot2", "geom_boxplot", "geom_density", "geom_tile", "guide_legend", "margin", "guides","xlab","ylab", "unit", "element_rect", "geom_jitter", "scale_shape_manual", "geom_bar", "scale_fill_gradient", "sec_axis", "facet_wrap") +importFrom("ggplot2", "geom_segment", "geom_polygon", "geom_text") +importFrom("grDevices", "chull") importFrom("gridExtra", "grid.arrange", "arrangeGrob") importFrom("grid", "textGrob", "gpar") importFrom("dplyr", "coalesce", "summarise", "group_by_all", "n") diff --git a/R/panelView.R b/R/panelView.R index 0454b55..8e3ab21 100644 --- a/R/panelView.R +++ b/R/panelView.R @@ -51,7 +51,16 @@ panelview <- function(data, # a data frame (long-form) display.all = NULL, by.cohort = FALSE, collapse.history = NULL, - report.missing = FALSE + report.missing = FALSE, + show.singletons = TRUE, + highlight.components = TRUE, + layout = "fr", + node.size = 3, + show.labels = "auto", + edge.color = "gray70", + edge.alpha = NULL, + edge.width = NULL, + singleton.color = "#D7263D" ) { ## ------------------------- ## @@ -77,10 +86,11 @@ panelview <- function(data, # a data frame (long-form) ## number of units N0 <- length(unique(data[, index[1]])) - if (N0 <= 500) { + if (!type %in% c("network", "graph", "singleton")) { ## "graph"/"singleton" not yet resolved + if (N0 <= 500) { if (is.null(collapse.history)) { collapse.history <- FALSE - } + } if (is.null(display.all)) { display.all <- FALSE } @@ -88,11 +98,11 @@ panelview <- function(data, # a data frame (long-form) if (!is.null(collapse.history)) { if (is.null(display.all)) { display.all <- FALSE - } + } } else { # collapse.history not specified if (is.null(display.all)) { # display.all not specified if (type != "outcome") { # type != "outcome" sss - collapse.history <- TRUE + collapse.history <- TRUE display.all <- FALSE } else { collapse.history <- FALSE @@ -103,13 +113,51 @@ panelview <- function(data, # a data frame (long-form) } } } + } ## normalize type argument (supports partial matching and backward-compatible aliases) - type <- match.arg(type, c("treat", "missing", "miss", "outcome", "raw", "bivariate", "bivar")) - if (type == "miss") type <- "missing" - if (type == "bivar") type <- "bivariate" - if (type == "raw") type <- "outcome" + type <- match.arg(type, c("treat", "missing", "miss", "outcome", "raw", + "bivariate", "bivar", "network", "graph", "singleton")) + if (type == "miss") type <- "missing" + if (type == "bivar") type <- "bivariate" + if (type == "raw") type <- "outcome" + ## "graph" and "singleton" are backward-compatible aliases for "network" + if (type == "graph") type <- "network" + if (type == "singleton") type <- "network" + + if (type == "network") { + ## igraph availability check + if (!requireNamespace("igraph", quietly = TRUE)) { + stop("Package \"igraph\" is required for type = \"network\". Install it with install.packages(\"igraph\").") + } + ## graph type only needs index columns, no Y/D/X + ignore.treat <- 1 + collapse.history <- FALSE + display.all <- TRUE + leave.gap <- 0 + + ## validate layout + layout <- match.arg(layout, c("fr", "bipartite", "circle")) + + ## validate show.labels + show.labels <- match.arg(show.labels, c("auto", "all", "singletons", "none")) + + ## validate node.size + if (!is.numeric(node.size) || node.size <= 0) { + stop("\"node.size\" must be a positive number.") + } + + ## validate show.singletons + if (!is.logical(show.singletons) && !show.singletons %in% c(0, 1)) { + stop("\"show.singletons\" is not a logical flag.") + } + + ## validate highlight.components + if (!is.logical(highlight.components) && !highlight.components %in% c(0, 1)) { + stop("\"highlight.components\" is not a logical flag.") + } + } ## remove missing values if (is.logical(leave.gap) == FALSE & !leave.gap%in%c(0, 1)) { @@ -208,10 +256,25 @@ panelview <- function(data, # a data frame (long-form) } varnames <- all.vars(formula) - - Y <- formula[[2]] # left hand side of the formula - if (is.numeric(Y) == FALSE) { # Y is a variable + ## network type: all formula variables affect missingness only + if (type == "network") { + lhs <- formula[[2]] + if (is.numeric(lhs)) { + ## ~1 or 1 ~ ... : no outcome + Y <- NULL + D <- NULL + X <- if (length(varnames) > 0) varnames else NULL + } else { + ## Y ~ ... or Y ~ D + X + Y <- varnames[1] + D <- NULL + X <- if (length(varnames) > 1) varnames[2:length(varnames)] else NULL + } + } else + + if (is.numeric(formula[[2]]) == FALSE) { # Y is a variable + Y <- formula[[2]] ## outcome Y <- varnames[1] ## treatment indicator and covariates @@ -282,17 +345,22 @@ panelview <- function(data, # a data frame (long-form) } } } - ## check Incorrect variable names + if (length(varnames) > 0) { for (i in 1:length(varnames)) { if(!varnames[i] %in% colnames(data)) { stop(paste("Variable \"", varnames[i],"\" is not in the dataset.", sep = "")) } } + } - ## index - if (length(index) != 2 | sum(index %in% colnames(data)) != 2) { + ## index + if (type == "network") { + if (length(index) < 2 || sum(index %in% colnames(data)) != length(index)) { + stop("\"index\" must name 2 or more columns in the data for type = \"network\".") + } + } else if (length(index) != 2 | sum(index %in% colnames(data)) != 2) { stop("\"index\" option misspecified. Try, for example, index = c(\"unit.id\", \"time\").") } @@ -344,17 +412,32 @@ panelview <- function(data, # a data frame (long-form) #if (na.rm == FALSE & sum(is.na(data)) > 0) { # stop("Missing values in dataset. Try set na.rm = TRUE.\n") - #} - + #} + + ## ---- network type early exit ---- + ## After na.omit: rows with missing Y/D/X have been dropped, + ## so the graph reflects only usable observations. + if (type == "network") { + s <- as.list(environment()) + s$index_names <- index + s$data <- data + return(.pv_plot_graph(s)) + } + # sort data data <- data[order(data[,index.id], data[,index.time]), ] minmintime <- as.numeric(min(data[, 2], na.rm = TRUE)) maxmaxtime <- as.numeric(max(data[, 2], na.rm = TRUE)) + + ## time-gap computation: skip for graph type (not needed, and causes + ## division by zero when there is only one unique time period) + if (type != "network") { + timegap <- (maxmaxtime - minmintime)/(length(unique(data[,index.time]))-1) - inttimegap <- as.integer(timegap) - + inttimegap <- as.integer(timegap) + data_1 <- transform(data, differencetime = ave(as.numeric(data[, 2]), data[, 1], FUN = function(x) c(NA, diff(x)))) mintimegap <- min(data_1$differencetime, na.rm = TRUE) maxtimegap <- max(data_1$differencetime, na.rm = TRUE) @@ -380,18 +463,18 @@ panelview <- function(data, # a data frame (long-form) #common difference: mintimegap: if (mintimegap != maxtimegap & mintimegap != 1 & divide_differencetime == as.integer(divide_differencetime)) { #1. Create all combinations of `id` and `year` - g <- with(data, expand.grid(g.id = unique(data[,index[1]]), - g.time = seq(from = minmintime, to = maxmaxtime, by = mintimegap))) + g <- with(data, expand.grid(g.id = unique(data[,index[1]]), + g.time = seq(from = minmintime, to = maxmaxtime, by = mintimegap))) colnames(g)[1] <- colnames(data[1]) colnames(g)[2] <- colnames(data[2]) #2. Merge `g` with `data` data2 <- merge(g, data, all.x = TRUE) data <- data2 } - else { #commmon difference = 1 + else { #commmon difference = 1 #1. Create all combinations of `id` and `year` - g <- with(data, expand.grid(g.id = unique(data[,index[1]]), - g.time = seq(from = minmintime, to = maxmaxtime))) + g <- with(data, expand.grid(g.id = unique(data[,index[1]]), + g.time = seq(from = minmintime, to = maxmaxtime))) colnames(g)[1] <- colnames(data[1]) colnames(g)[2] <- colnames(data[2]) #2. Merge `g` with `data` @@ -402,6 +485,8 @@ panelview <- function(data, # a data frame (long-form) data <- data[1:(length(data)-1)] #drop the differencetime column } + } ## end type != "network" guard + ## check duplicated observations @@ -414,16 +499,16 @@ panelview <- function(data, # a data frame (long-form) # stop("Please limit your units within 1000 for elegant presentation") #} - if (length(unique(data[,index[1]])) > 300 & gridOff != TRUE & type != "outcome") { + if (length(unique(data[,index[1]])) > 300 & gridOff != TRUE & type != "outcome" & type != "network") { message("If the number of units is more than 300, we set \"gridOff = TRUE\".\n") gridOff <- TRUE } - + if (length(unique(data[,index[1]])) > 300 & gridOff != TRUE & type == "outcome") { gridOff <- TRUE } - if (display.all == FALSE & length(unique(data[,index[1]])) > 500) { + if (display.all == FALSE & length(unique(data[,index[1]])) > 500 & type != "network") { message("If the number of units is more than 500, we randomly select 500 units to present. You can set \"display.all = TRUE\" to show all units.\n") set.seed(1346) @@ -691,6 +776,8 @@ panelview <- function(data, # a data frame (long-form) } } + ## (network type exits early, before reaching this point) + if (collapse.history == TRUE) { if (is.null(M)) { diff --git a/R/plot-network.R b/R/plot-network.R new file mode 100644 index 0000000..1c9abee --- /dev/null +++ b/R/plot-network.R @@ -0,0 +1,475 @@ +## Network (k-partite) graph visualization for panel data +## Part of panelView package +## Requires igraph (Suggests dependency) +## +## Supports k >= 2 fixed-effect dimensions. +## For k = 2 this is equivalent to a bipartite graph. +## Duplicate observations produce weighted (thicker) edges. + +.pv_build_graph <- function(data, index_names) { + k <- length(index_names) + + ## create node list — one set per FE dimension + all_nodes <- list() + for (i in seq_along(index_names)) { + vals <- unique(data[[index_names[i]]]) + all_nodes[[i]] <- data.frame( + name = paste0(index_names[i], ":", vals), + label = as.character(vals), + node_type = index_names[i], + stringsAsFactors = FALSE + ) + } + vertices_df <- do.call(rbind, all_nodes) + + ## create edge list — for each observation, connect all pairs of FE levels + edge_list <- list() + for (i in 1:(k - 1)) { + for (j in (i + 1):k) { + from <- paste0(index_names[i], ":", data[[index_names[i]]]) + to <- paste0(index_names[j], ":", data[[index_names[j]]]) + edge_list[[length(edge_list) + 1]] <- data.frame( + from = from, to = to, stringsAsFactors = FALSE + ) + } + } + edges_df <- do.call(rbind, edge_list) + + ## aggregate duplicate edges into weights + if (nrow(edges_df) == 0) { + ## no edges: return graph with isolated nodes + g <- igraph::make_empty_graph(directed = FALSE) + g <- igraph::add_vertices(g, nrow(vertices_df), + name = vertices_df$name, + label = vertices_df$label, + node_type = vertices_df$node_type) + } else { + edges_agg <- aggregate( + list(weight = rep(1, nrow(edges_df))), + by = list(from = edges_df$from, to = edges_df$to), + FUN = sum + ) + + g <- igraph::graph_from_data_frame(edges_agg, directed = FALSE, + vertices = vertices_df) + igraph::E(g)$weight <- edges_agg$weight + } + + ## vertex attributes + igraph::V(g)$degree <- igraph::degree(g) + igraph::V(g)$is_singleton <- (igraph::V(g)$degree == 1) + + ## connected components + comp <- igraph::components(g) + igraph::V(g)$component <- comp$membership + + g +} + + +.pv_layout_graph <- function(g, layout_type, k) { + if (layout_type == "bipartite" && k > 2) { + ## bipartite layout not meaningful for k > 2; fall back to fr + layout_type <- "fr" + } + coords <- switch(layout_type, + "fr" = igraph::layout_with_fr(g), + "bipartite" = { + ## For bipartite layout, set the 'type' attribute + ## First FE dimension = FALSE, second = TRUE + node_types <- igraph::V(g)$node_type + unique_types <- unique(node_types) + igraph::V(g)$type <- (node_types == unique_types[2]) + igraph::layout_as_bipartite(g) + }, + "circle" = igraph::layout_in_circle(g) + ) + + node_df <- data.frame( + x = coords[, 1], + y = coords[, 2], + name = igraph::V(g)$name, + node_type = igraph::V(g)$node_type, + label = igraph::V(g)$label, + degree = igraph::V(g)$degree, + is_singleton = igraph::V(g)$is_singleton, + component = igraph::V(g)$component, + stringsAsFactors = FALSE + ) + + node_df +} + + +.pv_plot_graph <- function(s) { + ## suppress R CMD check notes for non-standard evaluation + x <- y <- xend <- yend <- node_type <- label <- component <- NULL + grp <- NULL + + ## extract parameters + data <- s$data + index_names <- s$index_names + k <- length(index_names) + show.singletons <- s$show.singletons + highlight.components <- s$highlight.components + layout_type <- s$layout + node.size <- s$node.size + show.labels <- s$show.labels + edge.color <- s$edge.color + main <- s$main + cex.main <- s$cex.main + cex.lab <- s$cex.lab + cex.legend <- s$cex.legend + theme.bw <- s$theme.bw + legendOff <- s$legendOff + + ## count nodes per FE dimension for info + n_per_fe <- vapply(index_names, function(nm) length(unique(data[[nm]])), + integer(1)) + N_total <- sum(n_per_fe) + + ## large panel warning + n_obs <- nrow(data) + n_edge_pairs <- k * (k - 1) / 2 + est_edges <- n_obs * n_edge_pairs + if (est_edges > 5000) { + message("Graph has ~", est_edges, " edges (", N_total, " nodes across ", + k, " FE dimensions). Consider a subset for clearer visualization.\n") + } + + ## build graph + g <- .pv_build_graph(data, index_names) + + ## compute layout + node_df <- .pv_layout_graph(g, layout_type, k) + + ## build edge data.frame with weights + el <- igraph::as_edgelist(g) + if (nrow(el) > 0) { + edge_weights <- igraph::E(g)$weight + if (is.null(edge_weights)) edge_weights <- rep(1, nrow(el)) + from_idx <- match(el[, 1], node_df$name) + to_idx <- match(el[, 2], node_df$name) + edge_df <- data.frame( + x = node_df$x[from_idx], + y = node_df$y[from_idx], + xend = node_df$x[to_idx], + yend = node_df$y[to_idx], + weight = edge_weights + ) + } else { + edge_df <- data.frame(x = numeric(0), y = numeric(0), + xend = numeric(0), yend = numeric(0), + weight = numeric(0)) + } + + ## ---- color scheme ---- + ## palette for k FE dimensions + fe_palette <- c("#2E86AB", "#E8630A", "#7FB069", "#8B5CF6", "#F59E0B", + "#06B6D4", "#EC4899", "#D7263D") + ## user-supplied color overrides + color <- s$color + if (!is.null(color) && length(color) >= k) { + fe_colors <- color[seq_len(k)] + } else { + if (k <= length(fe_palette)) { + fe_colors <- fe_palette[seq_len(k)] + } else { + fe_colors <- rep(fe_palette, length.out = k) + } + } + names(fe_colors) <- index_names + + col_singleton <- if (!is.null(s$singleton.color)) s$singleton.color else "#D7263D" + col_edge <- if (!is.null(edge.color) && edge.color != "gray70") { + edge.color + } else { + "#CCCCCC" + } + + ## shapes for k FE dimensions: circle, square, triangle-up, diamond, triangle-down + fe_shapes <- c(21, 22, 24, 23, 25) + if (k <= length(fe_shapes)) { + shape_vals <- fe_shapes[seq_len(k)] + } else { + shape_vals <- rep(fe_shapes, length.out = k) + } + names(shape_vals) <- index_names + + ## adaptive edge styling: fewer edges → darker and thicker + ## user can override with edge.alpha and edge.width parameters + n_total_edges <- nrow(edge_df) + auto_alpha <- if (n_total_edges > 5000) 0.6 + else if (n_total_edges > 2000) 0.6 + else if (n_total_edges > 500) 0.55 + else if (n_total_edges > 100) 0.6 + else if (n_total_edges > 30) 0.75 + else 0.85 + auto_lw <- if (n_total_edges > 5000) 0.25 + else if (n_total_edges > 2000) 0.25 + else if (n_total_edges > 100) 0.35 + else 0.5 + ## user overrides + edge_alpha <- if (!is.null(s$edge.alpha)) s$edge.alpha else auto_alpha + edge_lw <- if (!is.null(s$edge.width)) s$edge.width else auto_lw + ## darker edge color for sparse graphs + if (col_edge == "#CCCCCC") { + if (n_total_edges <= 100) col_edge <- "#888888" + else if (n_total_edges <= 500) col_edge <- "#AAAAAA" + else col_edge <- "#BBBBBB" + } + + ## start ggplot + p <- ggplot() + + ## component hulls + comp_info <- igraph::components(g) + n_comp <- comp_info$no + + if (isTRUE(highlight.components) && n_comp > 1) { + hull_colors <- c("#2E86AB", "#E8630A", "#7FB069", "#D7263D", + "#8B5CF6", "#F59E0B", "#06B6D4", "#EC4899") + if (n_comp > length(hull_colors)) { + hull_colors <- rep(hull_colors, length.out = n_comp) + } + + for (ci in seq_len(n_comp)) { + comp_nodes <- node_df[node_df$component == ci, , drop = FALSE] + if (nrow(comp_nodes) >= 3) { + hull_idx <- grDevices::chull(comp_nodes$x, comp_nodes$y) + cx <- mean(comp_nodes$x[hull_idx]) + cy <- mean(comp_nodes$y[hull_idx]) + pad <- 0.08 + hx <- comp_nodes$x[hull_idx] + pad * (comp_nodes$x[hull_idx] - cx) + hy <- comp_nodes$y[hull_idx] + pad * (comp_nodes$y[hull_idx] - cy) + hull_data <- data.frame(x = hx, y = hy, grp = ci) + p <- p + geom_polygon(data = hull_data, + aes(x = x, y = y, group = grp), + alpha = 0.06, + fill = hull_colors[ci], + color = hull_colors[ci], + linewidth = 0.3, + linetype = "dashed", + show.legend = FALSE) + } + } + } + + ## edges — linewidth scaled by sqrt(weight) for multi-edges + if (nrow(edge_df) > 0) { + max_weight <- max(edge_df$weight) + if (max_weight > 1) { + ## scale linewidth by weight + edge_df$lw <- edge_lw * sqrt(edge_df$weight / max_weight) * 2 + ## also modulate alpha slightly for heavy edges + edge_df$alpha <- pmin(edge_alpha * sqrt(edge_df$weight), 1) + for (i in seq_len(nrow(edge_df))) { + p <- p + geom_segment( + data = edge_df[i, , drop = FALSE], + aes(x = x, y = y, xend = xend, yend = yend), + color = col_edge, + linewidth = edge_df$lw[i], + alpha = edge_df$alpha[i] + ) + } + } else { + ## uniform linewidth + p <- p + geom_segment(data = edge_df, + aes(x = x, y = y, xend = xend, yend = yend), + color = col_edge, linewidth = edge_lw, + alpha = edge_alpha) + } + } + + ## split nodes by singleton status + singleton_df <- node_df[node_df$is_singleton == TRUE, , drop = FALSE] + non_singleton_df <- node_df[node_df$is_singleton == FALSE, , drop = FALSE] + + ## non-singleton nodes — filled shapes with white stroke for separation + if (nrow(non_singleton_df) > 0) { + p <- p + geom_point(data = non_singleton_df, + aes(x = x, y = y, shape = node_type, + fill = node_type), + size = node.size, color = "white", + stroke = 0.4, show.legend = TRUE) + } + + ## singleton nodes + if (nrow(singleton_df) > 0) { + if (isTRUE(show.singletons)) { + ## glow ring behind singleton + p <- p + geom_point(data = singleton_df, + aes(x = x, y = y), + size = node.size + 3, + color = col_singleton, alpha = 0.25, + shape = 16, show.legend = FALSE) + ## singleton node itself + p <- p + geom_point(data = singleton_df, + aes(x = x, y = y, shape = node_type, + fill = node_type), + size = node.size, color = col_singleton, + stroke = 0.8, show.legend = FALSE) + } else { + ## draw as regular nodes + p <- p + geom_point(data = singleton_df, + aes(x = x, y = y, shape = node_type, + fill = node_type), + size = node.size, color = "white", + stroke = 0.4, show.legend = FALSE) + } + } + + ## shape + fill scales — use actual FE column names as legend labels + fe_labels <- setNames(index_names, index_names) + p <- p + scale_shape_manual( + values = shape_vals, + labels = fe_labels, + name = NULL + ) + p <- p + scale_fill_manual( + values = fe_colors, + labels = fe_labels, + name = NULL + ) + + ## labels + do_labels <- FALSE + label_df <- NULL + if (show.labels == "auto") { + if (N_total <= 50) do_labels <- TRUE + label_df <- node_df + } else if (show.labels == "all") { + do_labels <- TRUE + label_df <- node_df + } else if (show.labels == "singletons") { + do_labels <- TRUE + label_df <- node_df[node_df$is_singleton == TRUE, , drop = FALSE] + } + ## "none" → do_labels stays FALSE + + if (do_labels && !is.null(label_df) && nrow(label_df) > 0) { + y_range <- diff(range(node_df$y)) + nudge <- if (y_range > 0) 0.03 * y_range else 0.1 + p <- p + geom_text(data = label_df, + aes(x = x, y = y, label = label), + size = cex.lab / 4, nudge_y = nudge, + check_overlap = TRUE, show.legend = FALSE, + color = "#333333", + family = "sans") + } + + ## title + if (is.null(main)) { + if (k == 2) { + main <- "Panel Structure: Bipartite Graph" + } else { + main <- paste0("Panel Structure: ", k, "-partite Graph") + } + } else if (main == "") { + main <- NULL + } + + ## clean minimal theme + p <- p + theme( + plot.background = element_rect(fill = "white", color = NA), + panel.background = element_rect(fill = "white", color = NA), + panel.border = element_blank(), + axis.text = element_blank(), + axis.ticks = element_blank(), + axis.title = element_blank(), + panel.grid = element_blank(), + plot.margin = margin(10, 10, 10, 10) + ) + + ## title + if (!is.null(main)) { + p <- p + ggtitle(main) + + theme(plot.title = element_text(size = cex.main, hjust = 0.5, + color = "#333333")) + } + + ## legend + if (isTRUE(legendOff) || legendOff == 1) { + p <- p + theme(legend.position = "none") + } else { + p <- p + guides(fill = guide_legend(override.aes = list(size = 3.5)), + shape = guide_legend(override.aes = list(size = 3.5))) + p <- p + theme(legend.position = "bottom", + legend.text = element_text(size = cex.legend, + color = "#333333"), + legend.background = element_blank(), + legend.key = element_blank()) + } + + ## print plot + suppressWarnings(print(p)) + + ## build return list + deg <- igraph::degree(g) + comp <- igraph::components(g) + + ## singletons — one row per singleton node, with all FE columns filled + ## A singleton node has degree 1: it appears in exactly one observation. + ## Each row shows the singleton's value and its connected FE levels. + singleton_rows <- list() + for (i in seq_along(index_names)) { + nm <- index_names[i] + vals <- unique(data[[nm]]) + node_names <- paste0(nm, ":", vals) + node_deg <- deg[node_names] + s_vals <- vals[node_deg == 1] + for (sv in s_vals) { + ## find the one row where this singleton appears + match_rows <- data[data[[nm]] == sv, index_names, drop = FALSE] + row <- match_rows[1, , drop = FALSE] # degree 1 → exactly 1 unique combo + row$singleton_fe <- nm + singleton_rows[[length(singleton_rows) + 1]] <- row + } + } + if (length(singleton_rows) > 0) { + singletons <- do.call(rbind, singleton_rows) + rownames(singletons) <- NULL + } else { + singletons <- data.frame(matrix(ncol = length(index_names) + 1, nrow = 0)) + colnames(singletons) <- c(index_names, "singleton_fe") + } + + ## multi-edges (count > 1) — return with FE columns parsed from node names + edge_weights <- igraph::E(g)$weight + if (!is.null(edge_weights)) { + multi_mask <- edge_weights > 1 + if (any(multi_mask)) { + mel <- igraph::as_edgelist(g)[multi_mask, , drop = FALSE] + ## parse "fe_name:value" back to separate columns + parse_node <- function(node_str) { + pos <- regexpr(":", node_str, fixed = TRUE) + list(fe = substr(node_str, 1, pos - 1), + val = substr(node_str, pos + 1, nchar(node_str))) + } + from_parsed <- parse_node(mel[, 1]) + to_parsed <- parse_node(mel[, 2]) + me_list <- list() + me_list[[from_parsed$fe[1]]] <- from_parsed$val + me_list[[to_parsed$fe[1]]] <- to_parsed$val + me_list[["count"]] <- edge_weights[multi_mask] + multi_edges <- as.data.frame(me_list, stringsAsFactors = FALSE) + } else { + multi_edges <- data.frame(count = numeric(0), + stringsAsFactors = FALSE) + } + } else { + multi_edges <- data.frame(count = numeric(0), + stringsAsFactors = FALSE) + } + + out <- list( + graph = g, + singletons = singletons, + multi_edges = multi_edges, + components = comp$membership, + n_components = comp$no, + plot = p + ) + + return(invisible(out)) +} diff --git a/man/panelView.Rd b/man/panelView.Rd index 1ea9de1..3597500 100644 --- a/man/panelView.Rd +++ b/man/panelView.Rd @@ -20,7 +20,12 @@ cex.lab = 12, cex.legend = 12, background = NULL, style = NULL, by.unit = FALSE, lwd = 0.2, leave.gap = FALSE, display.all = NULL, by.cohort = FALSE, - collapse.history = NULL, report.missing = FALSE) + collapse.history = NULL, report.missing = FALSE, + show.singletons = TRUE, highlight.components = TRUE, + layout = "fr", node.size = 3, + show.labels = "auto", edge.color = "gray70", + edge.alpha = NULL, edge.width = NULL, + singleton.color = "#D7263D") } \arguments{ \item{data}{a data frame. The panel does not have to be balanced.} @@ -30,7 +35,7 @@ \item{X}{variable name of the time-varying covariates. Ignored if \code{formula} is provided.} \item{index}{a two-element string vector specifying the unit (group) and time indicators. Must be of length 2.} \item{ignore.treat}{a logical flag indicating whether there is a treatment variable. Default value is \code{ignore.treat = FALSE}.} - \item{type}{a string that specifies the type of the plot. Must be either \code{"treat"} (default), which plots the treatment status of each unit at each time point, \code{"missing"}, which plots the missing-data, \code{"outcome"}, which plots the raw outcome data, or \code{"bivariate"}, which plots time series of outcome and treatment in one graph.} + \item{type}{a string that specifies the type of the plot. Must be one of \code{"treat"} (default), which plots the treatment status of each unit at each time point, \code{"missing"}, which plots the missing-data, \code{"outcome"}, which plots the raw outcome data, \code{"bivariate"}, which plots time series of outcome and treatment in one graph, or \code{"network"}, which plots a k-partite graph of the panel structure showing connected components, singletons, and weighted edges. Aliases: \code{"graph"} and \code{"singleton"} both map to \code{"network"}.} \item{outcome.type}{a string that specifies the type of outcome variable. Must be either \code{"continuous"}(default) or \code{"discrete"}. For a continuous variable, time series lines for specified units will be plotted, and for discrete response, jitter-ed points at each time period will be plotted.} \item{treat.type}{a string that specifies the type of treatment variable. Must be either \code{"continuous"} or \code{"discrete"}. The default is NULL, which means the option will be decided based on the number of unique treatment values: if the number if bigger than 10, it will be set as "continuous"; otherwise, it will be set as "discrete".} \item{by.group}{a logic flag indicating whether the data should be plotted in a column in separate groups based on treatment status changes for the outcome plot. } @@ -70,6 +75,15 @@ \item{by.cohort}{a logical flag indicating whether to plot the average outcome lines based on unique treatment histories in an "outcome" plot.} \item{collapse.history}{a logical flag indicating whether to collapse units by treat history in a "treat"" plot.} \item{report.missing}{a logical flag indicating whether to report missingness in the included variables.} + \item{show.singletons}{logical. If \code{TRUE} (default), singleton nodes (degree-1) are highlighted with a red outline ring. Only used when \code{type = "network"}.} + \item{highlight.components}{logical. If \code{TRUE} (default), connected components are shaded with convex hulls. Only used when \code{type = "network"}.} + \item{layout}{character. Layout algorithm for the bipartite graph. One of \code{"fr"} (Fruchterman-Reingold, default), \code{"bipartite"}, or \code{"circle"}. Only used when \code{type = "network"}.} + \item{node.size}{numeric. Size of nodes in the graph. Default is 3. Only used when \code{type = "network"}.} + \item{show.labels}{character. When to show node labels. One of \code{"auto"} (default, show if N+T <= 50), \code{"all"}, \code{"singletons"}, or \code{"none"}. Only used when \code{type = "network"}.} + \item{edge.color}{character. Color of edges. Default is \code{"gray70"} (auto-adjusted by density). Only used when \code{type = "network"}.} + \item{edge.alpha}{numeric (0--1) or \code{NULL}. Edge transparency. \code{NULL} (default) auto-scales based on edge count. Only used when \code{type = "network"}.} + \item{edge.width}{numeric or \code{NULL}. Edge line width in mm. \code{NULL} (default) auto-scales based on edge count. Only used when \code{type = "network"}.} + \item{singleton.color}{character. Color used to highlight singleton nodes (degree-1). Default is \code{"#D7263D"} (crimson). Only used when \code{type = "network"} and \code{show.singletons = TRUE}.} } \details{ diff --git a/tests/testthat/test-graph.R b/tests/testthat/test-graph.R new file mode 100644 index 0000000..4862c61 --- /dev/null +++ b/tests/testthat/test-graph.R @@ -0,0 +1,476 @@ +# Tests for type = "network" (k-partite graph visualization) +# Backward-compatible aliases: "graph" and "singleton" both map to "network" + +library(panelView) + +## Helper: run panelview() suppressing the plot device output +pv <- function(...) { + pdf(NULL) + on.exit(dev.off()) + panelview(...) +} + +## Helper: get singleton values for a given FE dimension +get_singletons <- function(result, fe_name) { + s <- result$singletons + if (nrow(s) == 0) return(character(0)) + s[[fe_name]][s$singleton_fe == fe_name] +} + +# ----------------------------------------------------------------------- +# 2.1 Basic Functionality — Balanced Panel +# ----------------------------------------------------------------------- + +test_that("network type works with balanced panel", { + skip_if_not_installed("igraph") + data(turnout, package = "panelView") + result <- pv(turnout, ~1, index = c("abb", "year"), type = "network") + + expect_true(is.list(result)) + expect_equal(result$n_components, 1) + expect_length(get_singletons(result, "abb"), 0) + expect_length(get_singletons(result, "year"), 0) + expect_true(inherits(result$graph, "igraph")) + expect_true(inherits(result$plot, "gg") || inherits(result$plot, "ggplot")) + expect_equal(igraph::vcount(result$graph), 47 + 24) + expect_equal(igraph::ecount(result$graph), 47 * 24) +}) + +# ----------------------------------------------------------------------- +# 2.1b Backward-compatible alias: "graph" +# ----------------------------------------------------------------------- + +test_that("graph alias works", { + skip_if_not_installed("igraph") + data(turnout, package = "panelView") + result <- pv(turnout, ~1, index = c("abb", "year"), type = "graph") + + expect_true(is.list(result)) + expect_true(inherits(result$graph, "igraph")) + expect_equal(igraph::vcount(result$graph), 47 + 24) +}) + +# ----------------------------------------------------------------------- +# 2.2 Unbalanced Panel with Singletons +# ----------------------------------------------------------------------- + +test_that("network type identifies singletons", { + skip_if_not_installed("igraph") + df <- data.frame( + unit = c("A", "A", "A", "B", "B", "C"), + time = c(1, 2, 3, 1, 2, 4) + ) + result <- pv(df, ~1, index = c("unit", "time"), type = "network") + + expect_equal(igraph::vcount(result$graph), 7) + expect_equal(igraph::ecount(result$graph), 6) + expect_true("C" %in% get_singletons(result, "unit")) + expect_true(3 %in% get_singletons(result, "time")) + expect_true(4 %in% get_singletons(result, "time")) + expect_equal(result$n_components, 2) +}) + +# ----------------------------------------------------------------------- +# 2.3 Disconnected Components +# ----------------------------------------------------------------------- + +test_that("network type detects disconnected components", { + skip_if_not_installed("igraph") + df <- data.frame( + unit = c("A", "A", "B", "B", "C", "C", "D", "D"), + time = c(1, 2, 1, 2, 3, 4, 3, 4) + ) + result <- pv(df, ~1, index = c("unit", "time"), type = "network") + + expect_equal(result$n_components, 2) + expect_length(get_singletons(result, "unit"), 0) + expect_length(get_singletons(result, "time"), 0) + expect_equal(igraph::ecount(result$graph), 8) +}) + +# ----------------------------------------------------------------------- +# 2.4 Type Alias: "singleton" +# ----------------------------------------------------------------------- + +test_that("singleton alias works", { + skip_if_not_installed("igraph") + df <- data.frame(unit = c("A", "A", "B"), time = c(1, 2, 1)) + result <- pv(df, ~1, index = c("unit", "time"), type = "singleton") + + expect_true(inherits(result$graph, "igraph")) + expect_true(is.list(result)) +}) + +# ----------------------------------------------------------------------- +# 2.5 Layout Algorithms +# ----------------------------------------------------------------------- + +test_that("all layout algorithms work", { + skip_if_not_installed("igraph") + data(turnout, package = "panelView") + for (lay in c("fr", "bipartite", "circle")) { + result <- pv(turnout, ~1, index = c("abb", "year"), + type = "network", layout = lay) + expect_true(inherits(result$graph, "igraph"), + info = paste("layout =", lay)) + expect_true(inherits(result$plot, "gg") || inherits(result$plot, "ggplot"), + info = paste("layout =", lay)) + } +}) + +# ----------------------------------------------------------------------- +# 2.6 Invalid Layout +# ----------------------------------------------------------------------- + +test_that("invalid layout errors", { + skip_if_not_installed("igraph") + data(turnout, package = "panelView") + expect_error( + pv(turnout, ~1, index = c("abb", "year"), + type = "network", layout = "invalid"), + regexp = "'arg' should be one of" + ) +}) + +# ----------------------------------------------------------------------- +# 2.7 show.singletons = FALSE +# ----------------------------------------------------------------------- + +test_that("show.singletons FALSE works", { + skip_if_not_installed("igraph") + df <- data.frame( + unit = c("A", "A", "B"), + time = c(1, 2, 1) + ) + result <- pv(df, ~1, index = c("unit", "time"), type = "network", + show.singletons = FALSE) + + expect_true(is.list(result$singletons)) +}) + +# ----------------------------------------------------------------------- +# 2.8 highlight.components = FALSE +# ----------------------------------------------------------------------- + +test_that("highlight.components FALSE works", { + skip_if_not_installed("igraph") + df <- data.frame( + unit = c("A", "A", "C", "C"), + time = c(1, 2, 3, 4) + ) + result <- pv(df, ~1, index = c("unit", "time"), type = "network", + highlight.components = FALSE) + + expect_equal(result$n_components, 2) +}) + +# ----------------------------------------------------------------------- +# 2.9 show.labels Options +# ----------------------------------------------------------------------- + +test_that("show.labels options work", { + skip_if_not_installed("igraph") + df <- data.frame(unit = c("A", "A", "B"), time = c(1, 2, 1)) + for (opt in c("auto", "all", "singletons", "none")) { + expect_no_error( + pv(df, ~1, index = c("unit", "time"), type = "network", + show.labels = opt), + message = paste("show.labels =", opt) + ) + } +}) + +# ----------------------------------------------------------------------- +# 2.10 Custom node.size +# ----------------------------------------------------------------------- + +test_that("custom node.size works", { + skip_if_not_installed("igraph") + df <- data.frame(unit = c("A", "A", "B"), time = c(1, 2, 1)) + result <- pv(df, ~1, index = c("unit", "time"), type = "network", + node.size = 5) + expect_true(is.list(result)) +}) + +# ----------------------------------------------------------------------- +# 2.11 Invalid node.size +# ----------------------------------------------------------------------- + +test_that("invalid node.size errors", { + skip_if_not_installed("igraph") + df <- data.frame(unit = c("A", "A", "B"), time = c(1, 2, 1)) + expect_error( + pv(df, ~1, index = c("unit", "time"), type = "network", node.size = -1), + regexp = "node.size" + ) +}) + +# ----------------------------------------------------------------------- +# 3.1 Single Unit, Single Time Period +# ----------------------------------------------------------------------- + +test_that("single unit single time works", { + skip_if_not_installed("igraph") + df <- data.frame(unit = "A", time = 1) + result <- pv(df, ~1, index = c("unit", "time"), type = "network") + + expect_equal(igraph::vcount(result$graph), 2) + expect_equal(igraph::ecount(result$graph), 1) + expect_equal(result$n_components, 1) + expect_length(get_singletons(result, "unit"), 1) + expect_length(get_singletons(result, "time"), 1) +}) + +# ----------------------------------------------------------------------- +# 3.2 All Units Are Singletons +# ----------------------------------------------------------------------- + +test_that("all singletons case works", { + skip_if_not_installed("igraph") + df <- data.frame( + unit = c("A", "B", "C"), + time = c(1, 2, 3) + ) + result <- pv(df, ~1, index = c("unit", "time"), type = "network") + + expect_length(get_singletons(result, "unit"), 3) + expect_length(get_singletons(result, "time"), 3) + expect_equal(result$n_components, 3) +}) + +# ----------------------------------------------------------------------- +# 3.5 Numeric Unit and Time IDs +# ----------------------------------------------------------------------- + +test_that("numeric IDs work", { + skip_if_not_installed("igraph") + df <- data.frame(unit = c(1, 1, 2, 2), time = c(2001, 2002, 2001, 2002)) + result <- pv(df, ~1, index = c("unit", "time"), type = "network") + + expect_true(inherits(result$graph, "igraph")) + expect_equal(igraph::vcount(result$graph), 4) # 2 units + 2 times + expect_equal(igraph::ecount(result$graph), 4) # 4 observations +}) + +# ----------------------------------------------------------------------- +# 3.6 Factor Unit IDs +# ----------------------------------------------------------------------- + +test_that("factor IDs work", { + skip_if_not_installed("igraph") + df <- data.frame( + unit = factor(c("X", "X", "Y", "Y")), + time = c(1, 2, 1, 2) + ) + result <- pv(df, ~1, index = c("unit", "time"), type = "network") + + expect_true(inherits(result$graph, "igraph")) +}) + +# ----------------------------------------------------------------------- +# 3.7 Formula with Variables (Not ~1) +# ----------------------------------------------------------------------- + +test_that("formula with variables works for network", { + skip_if_not_installed("igraph") + data(turnout, package = "panelView") + result <- pv(turnout, turnout ~ policy_edr, + index = c("abb", "year"), type = "network") + + expect_true(inherits(result$graph, "igraph")) + ## Same structure as ~1 — outcome and treatment ignored for network type + expect_equal(igraph::vcount(result$graph), 47 + 24) +}) + +# ----------------------------------------------------------------------- +# 4.1–4.4 Regression: Existing Types Unaffected +# ----------------------------------------------------------------------- + +test_that("existing treat type unaffected", { + data(turnout, package = "panelView") + expect_no_error( + pv(turnout, turnout ~ policy_edr, index = c("abb", "year"), + type = "treat") + ) +}) + +test_that("existing missing type unaffected", { + data(turnout, package = "panelView") + expect_no_error( + pv(turnout, turnout ~ 1, index = c("abb", "year"), + type = "missing") + ) +}) + +test_that("existing outcome type unaffected", { + data(turnout, package = "panelView") + expect_no_error( + pv(turnout, turnout ~ policy_edr, index = c("abb", "year"), + type = "outcome") + ) +}) + +test_that("existing bivariate type unaffected", { + data(turnout, package = "panelView") + expect_no_error( + pv(turnout, turnout ~ policy_edr, index = c("abb", "year"), + type = "bivariate") + ) +}) + +# ----------------------------------------------------------------------- +# 5.1 Graph Structure Invariants (k = 2) +# ----------------------------------------------------------------------- + +test_that("graph structure invariants hold", { + skip_if_not_installed("igraph") + + panels <- list( + data.frame(unit = c("A", "A", "B", "B"), + time = c(1, 2, 1, 2)), + data.frame(unit = c("A", "A", "A", "B", "B", "C"), + time = c(1, 2, 3, 1, 2, 4)), + data.frame(unit = c("X", "Y", "Z"), + time = c(10, 20, 30)) + ) + + for (i in seq_along(panels)) { + df <- panels[[i]] + result <- pv(df, ~1, index = c("unit", "time"), type = "network") + g <- result$graph + + n_units <- length(unique(df$unit)) + n_times <- length(unique(df$time)) + + expect_equal(igraph::vcount(g), n_units + n_times, + info = paste("panel", i, "vcount")) + expect_equal(igraph::ecount(g), nrow(df), + info = paste("panel", i, "ecount")) + } +}) + +# ----------------------------------------------------------------------- +# 5.2 Singleton Identification Is Correct +# ----------------------------------------------------------------------- + +test_that("singleton identification is correct", { + skip_if_not_installed("igraph") + + df <- data.frame( + unit = c("A", "A", "A", "B", "B", "C", "D"), + time = c(1, 2, 3, 1, 2, 4, 5) + ) + result <- pv(df, ~1, index = c("unit", "time"), type = "network") + + ## Compute expected singletons from the data + unit_counts <- table(df$unit) + time_counts <- table(df$time) + + expected_unit_singletons <- names(unit_counts[unit_counts == 1]) + expected_time_singletons <- as.numeric(names(time_counts[time_counts == 1])) + + ## C and D appear once each + expect_true(setequal(get_singletons(result, "unit"), expected_unit_singletons)) + ## Times 3, 4, 5 appear once each + expect_true(setequal(get_singletons(result, "time"), expected_time_singletons)) +}) + +# ----------------------------------------------------------------------- +# 6.1 k = 3: Three-way Fixed Effects +# ----------------------------------------------------------------------- + +test_that("k = 3 tripartite graph works", { + skip_if_not_installed("igraph") + df <- data.frame( + worker = c("A", "A", "B", "B"), + firm = c(1, 1, 2, 2), + year = c(2020, 2021, 2020, 2021) + ) + result <- pv(df, ~1, index = c("worker", "firm", "year"), type = "network", + show.labels = "all") + + g <- result$graph + ## 2 workers + 2 firms + 2 years = 6 nodes + expect_equal(igraph::vcount(g), 6) + ## Each of 4 observations creates 3 edges (w-f, w-y, f-y) + ## But some are duplicates that get aggregated: + ## w-f: A-1, A-1, B-2, B-2 → 2 unique edges (weight 2 each) + ## w-y: A-2020, A-2021, B-2020, B-2021 → 4 unique edges + ## f-y: 1-2020, 1-2021, 2-2020, 2-2021 → 4 unique edges + ## Total unique edges = 2 + 4 + 4 = 10 + expect_equal(igraph::ecount(g), 10) + expect_equal(result$n_components, 1) + + ## Check singletons is a data frame with FE columns + expect_true(is.data.frame(result$singletons)) + expect_true("worker" %in% names(result$singletons)) + expect_true("firm" %in% names(result$singletons)) + + ## Check multi-edges exist (A-1 and B-2 have weight 2) + expect_true(nrow(result$multi_edges) > 0) +}) + +# ----------------------------------------------------------------------- +# 6.2 k = 3: Disconnected Components +# ----------------------------------------------------------------------- + +test_that("k = 3 disconnected components", { + skip_if_not_installed("igraph") + df <- data.frame( + worker = c("A", "B"), + firm = c(1, 2), + year = c(2020, 2021) + ) + result <- pv(df, ~1, index = c("worker", "firm", "year"), type = "network") + + ## 2 workers + 2 firms + 2 years = 6 nodes + expect_equal(igraph::vcount(result$graph), 6) + ## Each observation creates 3 edges, no overlaps: 2 * 3 = 6 + expect_equal(igraph::ecount(result$graph), 6) + ## Two isolated triangles + expect_equal(result$n_components, 2) +}) + +# ----------------------------------------------------------------------- +# 6.3 Duplicate Observations (Multi-edges) +# ----------------------------------------------------------------------- + +test_that("duplicate observations produce weighted edges", { + skip_if_not_installed("igraph") + df <- data.frame( + w = c("A", "A", "B", "B", "B"), + f = c(1, 1, 1, 2, 2) + ) + result <- pv(df, ~1, index = c("w", "f"), type = "network", + show.labels = "all") + + g <- result$graph + ## 2 workers + 2 firms = 4 nodes + expect_equal(igraph::vcount(g), 4) + ## Unique edges: A-1 (weight 2), B-1 (weight 1), B-2 (weight 2) = 3 + expect_equal(igraph::ecount(g), 3) + + ## Check multi-edges reported + expect_true(nrow(result$multi_edges) >= 1) + ## A-1 should have weight 2 + weights <- igraph::E(g)$weight + expect_true(max(weights) == 2) +}) + +# ----------------------------------------------------------------------- +# 6.4 Network index validation +# ----------------------------------------------------------------------- + +test_that("network type rejects invalid index", { + skip_if_not_installed("igraph") + df <- data.frame(a = 1, b = 2, c = 3) + expect_error( + pv(df, ~1, index = c("a", "nonexistent"), type = "network"), + regexp = "index" + ) + ## Single index should also fail + expect_error( + pv(df, ~1, index = c("a"), type = "network"), + regexp = "index" + ) +}) diff --git a/tutorial/01-treat.Rmd b/tutorial/01-treat.Rmd index a938ba3..ea0856e 100644 --- a/tutorial/01-treat.Rmd +++ b/tutorial/01-treat.Rmd @@ -1,4 +1,4 @@ -# Treatment Status & Missingness {#sec-treat} +# Treatment Status {#sec-treat} ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, diff --git a/tutorial/04-network.Rmd b/tutorial/04-network.Rmd new file mode 100644 index 0000000..6728073 --- /dev/null +++ b/tutorial/04-network.Rmd @@ -0,0 +1,367 @@ +# Network Structure {#sec-network} + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, + fig.width = 10, fig.height = 7) +``` + +::: {.callout-note} +The `type = "network"` feature is currently available only on the development branch. Install it with: +`devtools::install_github("xuyiqing/panelView@dev")` +::: + +When estimating models with multiple sets of fixed effects, the connectivity structure of the data determines what can be identified. As @correia2016 shows, the fixed-effect estimation problem is equivalent to solving a linear system on a graph, and the structure of that graph has direct consequences for estimation and inference. + +Two features of this structure are particularly important: + +- **Singletons** --- units (or time periods, firms, etc.) that appear in only one combination with the other fixed-effect dimension. Singletons contribute no identifying variation for the fixed effects on the other side of the graph. In linear models, they can be iteratively removed without affecting the estimates of the remaining coefficients. In nonlinear models (e.g., Poisson regression), failing to remove singletons leads to incidental-parameter bias. Detecting singletons before estimation is therefore a prerequisite for valid inference. + +- **Non-unique observations** --- when multiple observations share the same combination of fixed-effect indices (e.g., the same worker appears at the same firm multiple times), these duplicates create weighted edges in the graph. Understanding where and how often such duplicates occur is essential for specifying the correct model and interpreting standard errors. + +The `type = "network"` option in `panelview()` visualizes this structure directly. It constructs a $k$-partite graph from $k \geq 2$ sets of fixed effects, where: + +- **nodes** represent fixed-effect levels (units, time periods, firms, etc.); +- **edges** represent observed combinations. + +The resulting plot reveals connected components, singletons, and duplicate observations at a glance. We first load the package. + +```{r} +library(panelView) +data(panelView) +``` + + +## Graph elements + +In the network plot, each distinct level of a fixed-effect dimension becomes a **node**. In a standard unit $\times$ time panel, there is one node for each unit and one node for each time period. Different fixed-effect dimensions are distinguished by shape: circles for the first dimension (e.g., units), squares for the second (e.g., time periods), triangles for the third, and so on. + +Each observation in the data creates an **edge** (link) between the nodes it connects. For example, if unit $A$ is observed at time $t$, an edge is drawn between node $A$ and node $t$. If the same combination appears multiple times (duplicate observations), the edge becomes thicker to reflect the count. + +The plot also identifies: + +- **Connected components**: groups of nodes that are linked to each other through some chain of edges. Nodes in different components share no observations and are shaded with distinct convex hulls. +- **Singletons**: nodes with exactly one edge (degree 1), highlighted with a colored glow ring. + + +## Basic usage: Unit $\times$ Time + +With the standard panel structure (unit $\times$ time), the function constructs a bipartite graph: units are shown as circles, time periods as squares, and edges connect each unit to the periods in which it is observed. + +### Balanced panel + +The `turnout` dataset is a balanced panel of 47 US states over 24 election years. Because every state is observed in every year, the graph is a complete bipartite graph with a single connected component and no singletons. + +```{r net-balanced, fig.height=8} +panelview(turnout, formula = ~ 1, + index = c("abb", "year"), + type = "network", + main = "Turnout: Balanced Panel") +``` + + +### Unbalanced panel with singletons + +In many applied settings, panels are unbalanced: some units are observed in only a subset of periods. Units or periods with only one connection (degree 1) are called **singletons**. @correia2016 shows that singletons can be iteratively removed without affecting the estimation of multi-way fixed effects. + +We construct a simple example where some units appear in only one period: + +```{r net-singleton-data} +set.seed(42) + +## start with a 20-unit, 8-period balanced panel +sim_unbalanced <- expand.grid(unit = paste0("U", 1:20), time = 2001:2008, + stringsAsFactors = FALSE) + +## randomly drop 40% of observations to create an unbalanced panel +sim_unbalanced <- sim_unbalanced[sample(nrow(sim_unbalanced), + round(nrow(sim_unbalanced) * 0.6)), ] + +## add two singleton units (each observed in exactly one period) +sim_unbalanced <- rbind(sim_unbalanced, + data.frame(unit = "Singleton_A", time = 2003)) +sim_unbalanced <- rbind(sim_unbalanced, + data.frame(unit = "Singleton_B", time = 2007)) + +## add a disconnected component +sim_unbalanced <- rbind(sim_unbalanced, + data.frame(unit = "Iso_1", time = 2050), + data.frame(unit = "Iso_2", time = 2050), + data.frame(unit = "Iso_2", time = 2051)) +``` + +```{r net-singleton-plot, fig.height=8} +p.network <- panelview(sim_unbalanced, formula = ~ 1, + index = c("unit", "time"), + type = "network", + main = "Unbalanced Panel with Singletons") +``` + +Singletons are highlighted with a crimson glow. The function returns graph diagnostics invisibly. The `$singletons` element is a data frame listing each singleton along with the fixed-effect dimension it belongs to: + +```{r} +p.network$singletons +``` + +The `$n_components` element reports the number of connected components --- groups of units and time periods that share no observations with each other: + +```{r} +## two components: the main group and the {Iso_1, Iso_2, 2050, 2051} cluster +p.network$n_components +``` + + +### Missingness in Data + +When a formula such as `Y ~ D + X` is supplied with `type = "network"`, observations with missing values in any of the specified variables are dropped before the graph is constructed. This way the graph reflects only the observations usable for estimation. + +```{r net-missingness} +sim_missing <- data.frame( + unit = rep(c("A", "B", "C"), each = 4), + time = rep(2001:2004, 3), + Y = c(1, NA, 3, 4, 5, 6, NA, 8, 9, 10, 11, 12), + D = c(0, 0, 1, 1, 0, 0, 0, NA, 1, 1, 1, 1) +) + +## ~1 keeps all 12 observations (missingness in Y/D is ignored) +p.all <- panelview(sim_missing, ~ 1, + index = c("unit", "time"), type = "network") + +## Y ~ D drops rows where Y or D is NA (3 rows dropped → 9 edges) +p.yd <- panelview(sim_missing, Y ~ D, + index = c("unit", "time"), type = "network", + main = "Missingness from formula: Y ~ D") +``` + +```{r} +cat("Edges with ~1:", igraph::ecount(p.all$graph), "\n") +cat("Edges with Y ~ D:", igraph::ecount(p.yd$graph), "\n") +``` + + +## Layout options + +Three layout algorithms are available via the `layout` parameter: + +- `"fr"` (default): Fruchterman--Reingold force-directed layout. Good for revealing cluster structure. +- `"bipartite"`: Two-row layout with each fixed-effect dimension on a separate horizontal line. Best for small panels. +- `"circle"`: Nodes arranged on a circle. + +```{r net-layouts, fig.width=10, fig.height=4} +sim_small <- data.frame( + unit = c("A","A","A","B","B","C","C","D"), + time = c(1, 2, 3, 1, 2, 3, 4, 5) +) + +panelview(sim_small, ~ 1, index = c("unit", "time"), type = "network", + layout = "fr", show.labels = "all", main = "FR layout") +panelview(sim_small, ~ 1, index = c("unit", "time"), type = "network", + layout = "bipartite", show.labels = "all", main = "Bipartite layout") +panelview(sim_small, ~ 1, index = c("unit", "time"), type = "network", + layout = "circle", show.labels = "all", main = "Circle layout") +``` + + +## Multi-way fixed effects + +Many empirical settings involve more than two sets of fixed effects ($k \geq 3$). For example, matched employer--employee data has worker, firm, and year fixed effects simultaneously. The `index` parameter accepts 3 or more column names for the network type. + +Each fixed-effect dimension is rendered with a distinct shape and color: + +| Dimension | Shape | Default color | +|-----------|-------|---------------| +| 1st | Circle | Steel blue | +| 2nd | Square | Burnt orange | +| 3rd | Triangle | Sage green | +| 4th | Diamond | Purple | +| 5th | Inv. triangle | Amber | + +### Three-way example: worker $\times$ firm $\times$ year + +```{r net-tripartite-data} +sim_workers <- data.frame( + worker = c("Alice", "Alice", "Bob", "Bob", "Carol", "Carol", "Dave"), + firm = c("Google", "Meta", "Google", "Apple", "Meta", "Apple", "Netflix"), + year = c(2020, 2021, 2020, 2021, 2020, 2021, 2022) +) +``` + +```{r net-tripartite-plot, fig.height=8} +p.workers <- panelview(sim_workers, formula = ~ 1, + index = c("worker", "firm", "year"), + type = "network", + show.labels = "all", + main = "3-way FE: Worker x Firm x Year") +``` + +Dave works only at Netflix and only in 2022, forming a separate connected component from the main group. The `$n_components` element confirms the number of connected components in the graph: + +```{r} +## main group (Alice, Bob, Carol with Google, Meta, Apple in 2020-2021) +## and the isolated {Dave, Netflix, 2022} cluster +p.workers$n_components +``` + + +### Four-way example: student $\times$ teacher $\times$ school $\times$ year + +```{r net-fourway, fig.height=8} +sim_schools <- data.frame( + student = c("S1","S1","S2","S2","S3","S3","S4","S5"), + teacher = c("T1","T2","T1","T3","T2","T3","T4","T5"), + school = c("A","A","A","A","B","B","C","D"), + year = c(2020,2021,2020,2021,2020,2021,2022,2022) +) + +panelview(sim_schools, ~ 1, + index = c("student", "teacher", "school", "year"), + type = "network", show.labels = "all", + main = "4-way FE: Student x Teacher x School x Year") +``` + + +## Weighted edges + +Theories for standard panel data methods often assume each combination of fixed-effect indices (e.g., unit and time) uniquely identifies an observation. However, in many empirical settings this assumption does not hold. For example, in matched employer--employee data, a worker may appear at the same firm in multiple records within the same year. It is important for researchers to diagnose these cases before estimation. + +The network plot aggregates duplicate edges and renders them with proportionally thicker lines. To illustrate, we take the `turnout` dataset and deliberately duplicate some state-year observations: + +```{r net-weighted-data} +## take a subset of turnout and create duplicates +sim_turnout_dup <- turnout[turnout$year <= 1940, c("abb", "year")] + +## duplicate some state-year pairs: MN appears 3x in 1920, WI 2x in 1924 +sim_turnout_dup <- rbind(sim_turnout_dup, + data.frame(abb = "MN", year = 1920), + data.frame(abb = "MN", year = 1920), + data.frame(abb = "WI", year = 1924) +) +``` + +```{r net-weighted-plot, fig.height=8} +p.dup <- panelview(sim_turnout_dup, ~ 1, index = c("abb", "year"), + type = "network", + main = "Turnout Subset with Duplicate Observations") +``` + +The thicker edges between MN--1920 and WI--1924 are clearly visible. The `$multi_edges` element is a data frame with one row per duplicated combination, with columns for each fixed-effect dimension and a `count` column: + +```{r} +p.dup$multi_edges +``` + + +## Customization + +### Edge visibility + +For dense graphs, the default edge transparency may need adjustment. Use `edge.alpha` (0--1) and `edge.width` (in mm) to control edge appearance: + +```{r net-edge-params, fig.height=8} +panelview(turnout, ~ 1, index = c("abb", "year"), type = "network", + edge.alpha = 0.8, edge.width = 0.4, + main = "Custom: edge.alpha = 0.8, edge.width = 0.4") +``` + +### Custom colors + +Supply a vector of colors (one per fixed-effect dimension) via the `color` parameter: + +```{r net-colors, fig.height=8} +panelview(sim_small, ~ 1, index = c("unit", "time"), type = "network", + color = c("#6366F1", "#10B981"), show.labels = "all", + main = "Custom Colors: Indigo + Emerald") +``` + +### Singleton color + +The color used to highlight singletons can be changed with `singleton.color`: + +```{r net-singleton-color, fig.height=8} +panelview(sim_unbalanced, ~ 1, index = c("unit", "time"), type = "network", + singleton.color = "#FF8C00", + main = "Custom Singleton Color (Dark Orange)") +``` + +### Node size and edge width + +Use `node.size` to control node size and `edge.width` for edge thickness: + +```{r net-node-edge-size, fig.height=8} +panelview(sim_small, ~ 1, index = c("unit", "time"), type = "network", + node.size = 5, edge.width = 1, show.labels = "all", + main = "Larger Nodes and Thicker Edges") +``` + +### Other options + +- `show.singletons = FALSE`: do not highlight singletons. +- `highlight.components = FALSE`: do not draw convex hulls around connected components. +- `show.labels = "singletons"`: label only singleton nodes (useful for large panels). +- `legendOff = TRUE`: hide the legend. +- `main = ""`: suppress the title. + + +## Accessing diagnostics + +When `type = "network"`, `panelview()` invisibly returns a list with the full graph structure. This allows programmatic inspection beyond what the plot shows. + +```{r net-access, fig.height=6} +p.network <- panelview(sim_unbalanced, ~ 1, + index = c("unit", "time"), type = "network", + main = "") +``` + +### Identifying singletons in the data + +The `$singletons` element is a data frame with one column per fixed-effect dimension, plus a `singleton_fe` column indicating which dimension the singleton belongs to. Each row shows the singleton node and its connected FE levels: + +```{r} +p.network$singletons +``` + +The `singleton_fe` column tells you which dimension is the singleton. To extract just the singleton units: + +```{r} +p.network$singletons[p.network$singletons$singleton_fe == "unit", ] +``` + +### Identifying duplicate observations + +The `$multi_edges` data frame has one row per duplicated fixed-effect combination, with columns named after the fixed-effect dimensions and a `count` column: + +```{r} +p.dup$multi_edges +``` + +To find the original rows in the data that correspond to duplicated combinations: + +```{r} +## rows where (abb, year) is a duplicated combination +dup_idx <- duplicated(sim_turnout_dup[, c("abb", "year")]) | + duplicated(sim_turnout_dup[, c("abb", "year")], fromLast = TRUE) +sim_turnout_dup[dup_idx, ] +``` + +### Full return value + +| Element | Description | +|---------|-------------| +| `graph` | An **igraph** object for further analysis | +| `singletons` | Data frame with one column per FE dimension + `singleton_fe`: rows are singleton nodes with their connected FE levels | +| `multi_edges` | Data frame with FE columns + `count`: duplicated combinations | +| `components` | Component membership vector | +| `n_components` | Number of connected components | +| `plot` | The **ggplot2** object for further customization | + +The **igraph** object can be used for additional graph analysis: + +```{r net-igraph} +library(igraph) + +## degree distribution +deg <- degree(p.network$graph) +summary(deg) + +## number of connected components +p.network$n_components +``` diff --git a/tutorial/_quarto.yml b/tutorial/_quarto.yml index 29fa616..fd35c89 100644 --- a/tutorial/_quarto.yml +++ b/tutorial/_quarto.yml @@ -11,6 +11,7 @@ book: - 01-treat.Rmd - 02-outcome.Rmd - 03-bivariate.Rmd + - 04-network.Rmd - aa-changelog.Rmd - references.qmd diff --git a/tutorial/aa-changelog.Rmd b/tutorial/aa-changelog.Rmd index 3169bb5..1c2b821 100644 --- a/tutorial/aa-changelog.Rmd +++ b/tutorial/aa-changelog.Rmd @@ -1,5 +1,22 @@ # Changelog {#sec-changelog .unnumbered} +## v1.3.0 + +(2026-03-21) + +* Added `type = "network"` for visualizing the connectivity structure of panel data as a $k$-partite graph, following the graph-theoretic framework of @correia2016. Features: + - Supports 2 or more sets of fixed effects (e.g., unit $\times$ time, worker $\times$ firm $\times$ year) + - Identifies and highlights **singletons** (degree-1 nodes) + - Detects **connected components** with convex hull shading + - Handles **non-unique observations** via weighted (thicker) edges + - Three layout algorithms: Fruchterman--Reingold (`"fr"`), bipartite, circle + - Formula support: `Y ~ D + X` drops rows with missing values before building the graph + - Programmatic return: `$singletons`, `$multi_edges`, `$n_components`, `$graph`, `$plot` +* New parameters: `show.singletons`, `highlight.components`, `layout`, `node.size`, `show.labels`, `edge.color`, `edge.alpha`, `edge.width`, `singleton.color` +* Added **igraph** to `Suggests` (conditional dependency for `type = "network"`) +* Added tutorial chapter: [Network Structure](#sec-network) +* Backward-compatible aliases: `type = "graph"` and `type = "singleton"` both map to `"network"` + ## v1.2.1 (2026-03-20) diff --git a/tutorial/index.qmd b/tutorial/index.qmd index ea408f7..9cf5601 100644 --- a/tutorial/index.qmd +++ b/tutorial/index.qmd @@ -4,25 +4,57 @@ This manual serves as a user guide for the **panelView** package in R, which vis ## Installation -```{r clear-environment, eval = FALSE} -# From CRAN +| Source | Version | Date | Features | +|--------|---------|------|----------| +| CRAN | 1.2.1 | 2026-03-20 | Treatment, outcome, bivariate plots | +| GitHub (`master`) | 1.2.1 | 2026-03-20 | Same as CRAN | +| GitHub (`dev`) | 1.3.0 | 2026-03-21 | + `type = "network"`: k-partite graph, singletons, weighted edges | + +```{r install-cran, eval = FALSE} +# From CRAN (stable release) install.packages("panelView") ``` -```{r install-github-dev, eval = FALSE, message = FALSE, warning = FALSE, cache = FALSE} -# Development version +```{r install-github-stable, eval = FALSE, message = FALSE, warning = FALSE, cache = FALSE} +# Stable GitHub version (master branch) devtools::install_github("xuyiqing/panelView") ``` +```{r install-github-dev-branch, eval = FALSE, message = FALSE, warning = FALSE, cache = FALSE} +# Development version (dev branch, includes type = "network") +devtools::install_github("xuyiqing/panelView@dev") +``` + ```{r check-version} -# Check version +# Check installed version installed.packages()["panelView", "Version"] ``` +**panelView** depends on the following packages, which should be installed automatically when **panelView** is being installed. You can also install them manually. + +```{r install-deps, eval = FALSE} +install_all <- function(packages) { + installed_pkgs <- installed.packages()[, "Package"] + for (pkg in packages) { + if (!pkg %in% installed_pkgs) { + install.packages(pkg) + } + } +} +packages <- c("ggplot2", "gridExtra", "grid", "dplyr") +install_all(packages) +``` + +The `type = "network"` plot additionally requires **igraph**, which is an optional (suggested) dependency: + +```{r install-igraph, eval = FALSE} +install.packages("igraph") +``` + ## Quick-reference -**panelView** has one main function, `panelview()`, with three core capabilities: +**panelView** has one main function, `panelview()`, with four core capabilities: 1. **Treatment status & missingness** (`type = "treat"`, `type = "missing"`) — plot which units are treated in which periods, highlight missing values, and summarize treatment histories. @@ -30,6 +62,8 @@ installed.packages()["panelView", "Version"] 3. **Bivariate relationships** (`type = "bivariate"`) — visualize the relationship between a treatment variable and an outcome variable, either in aggregate or unit by unit. +4. **Network structure** (`type = "network"`) — visualize the connectivity structure of panel data as a $k$-partite graph, identifying singletons, connected components, and duplicate observations. Supports two or more sets of fixed effects. See @correia2016 for the graph-theoretic framework. + @@ -37,8 +71,8 @@ installed.packages()["panelView", "Version"] |---|---|---| | `formula` | formula | `Y ~ D + X1 + X2`; first RHS variable is the treatment | | `data` | data.frame | Long-format panel data | -| `index` | character(2) | `c("unit_var", "time_var")` | -| `type` | character | `"treat"` / `"missing"` / `"outcome"` / `"bivariate"` | +| `index` | character | `c("unit_var", "time_var")`; length $\geq 2$ for `type = "network"` | +| `type` | character | `"treat"` / `"missing"` / `"outcome"` / `"bivariate"` / `"network"` | | `Y`, `D` | character | Alternative to formula: variable names as strings | | `by.timing` | logical | Sort units by timing of first treatment | | `pre.post` | logical | Distinguish pre- and post-treatment periods (binary D only) | @@ -58,17 +92,31 @@ installed.packages()["panelView", "Version"] Note that *Y*, *D*, and *X* are merely labels; they can be any variables in a panel dataset. +## Shipped datasets + +**panelView** ships three datasets for examples and testing: + +| Dataset | Description | Units | Periods | Balance | +|---------|-------------|-------|---------|---------| +| `turnout` | US state voter turnout and election reform policies | 47 states | 24 elections (1920--2000) | Balanced | +| `capacity` | State institutional capacity across countries | 47 countries | 44 years (1960--2003) | Balanced | +| `simdata` | Simulated panel with staggered binary treatment | 200 units | 15 periods | Balanced | + +All three are balanced panels. The [network structure chapter](#sec-network) uses inline simulation code to construct unbalanced panels that showcase singletons, connected components, and multi-way fixed effects. + ## Organization - [Chapter @sec-treat] — Plotting treatment status and missingness - [Chapter @sec-outcome] — Plotting outcome trajectories - [Chapter @sec-bivariate] — Plotting bivariate relationships +- [Chapter @sec-network] — Visualizing network structure: singletons, connected components, and multi-way fixed effects ## Contributors - [Yiqing Xu](https://yiqingxu.org/){target="_blank"} - [Licheng Liu](https://liulch.github.io/){target="_blank"} - [Hongyu Mou](https://hongyumou.github.io/){target="_blank"} +- StatsClaw (Agentic System for Statistical Software Development) ## Report bugs diff --git a/tutorial/references.bib b/tutorial/references.bib index c66cc50..6c5bd9e 100644 --- a/tutorial/references.bib +++ b/tutorial/references.bib @@ -8,3 +8,11 @@ @article{mou2022 pages = {1--20}, doi = {10.18637/jss.v107.i07} } + +@unpublished{correia2016, + title = {A Feasible Estimator for Linear Models with Multi-Way Fixed Effects}, + author = {Correia, Sergio}, + year = {2016}, + note = {Working paper, Duke University}, + url = {http://scorreia.com/research/hdfe.pdf} +} From 4b455bb1ee4aa7501bc4535012538214bab143ae Mon Sep 17 00:00:00 2001 From: Yiqing Xu <7664920+xuyiqing@users.noreply.github.com> Date: Sat, 21 Mar 2026 16:39:16 -0700 Subject: [PATCH 02/10] tutorial: use consistent U-prefix naming for simulated units in ch4 Co-Authored-By: Claude Opus 4.6 (1M context) --- tutorial/04-network.Rmd | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/tutorial/04-network.Rmd b/tutorial/04-network.Rmd index 6728073..2edd135 100644 --- a/tutorial/04-network.Rmd +++ b/tutorial/04-network.Rmd @@ -76,17 +76,17 @@ sim_unbalanced <- expand.grid(unit = paste0("U", 1:20), time = 2001:2008, sim_unbalanced <- sim_unbalanced[sample(nrow(sim_unbalanced), round(nrow(sim_unbalanced) * 0.6)), ] -## add two singleton units (each observed in exactly one period) +## add two units that each appear in only one period (singletons) sim_unbalanced <- rbind(sim_unbalanced, - data.frame(unit = "Singleton_A", time = 2003)) + data.frame(unit = "U21", time = 2003)) sim_unbalanced <- rbind(sim_unbalanced, - data.frame(unit = "Singleton_B", time = 2007)) + data.frame(unit = "U22", time = 2007)) -## add a disconnected component +## add two units in a separate time range (disconnected component) sim_unbalanced <- rbind(sim_unbalanced, - data.frame(unit = "Iso_1", time = 2050), - data.frame(unit = "Iso_2", time = 2050), - data.frame(unit = "Iso_2", time = 2051)) + data.frame(unit = "U23", time = 2050), + data.frame(unit = "U24", time = 2050), + data.frame(unit = "U24", time = 2051)) ``` ```{r net-singleton-plot, fig.height=8} @@ -105,7 +105,7 @@ p.network$singletons The `$n_components` element reports the number of connected components --- groups of units and time periods that share no observations with each other: ```{r} -## two components: the main group and the {Iso_1, Iso_2, 2050, 2051} cluster +## two components: the main group and the {U23, U24, 2050, 2051} cluster p.network$n_components ``` From eebd82fd6c873dd6b162895d5f7f32c5a206d395 Mon Sep 17 00:00:00 2001 From: Yiqing Xu <7664920+xuyiqing@users.noreply.github.com> Date: Sat, 21 Mar 2026 16:41:52 -0700 Subject: [PATCH 03/10] tutorial.Rproj --- tutorial/tutorial.Rproj | 13 +++++++++++++ 1 file changed, 13 insertions(+) create mode 100644 tutorial/tutorial.Rproj diff --git a/tutorial/tutorial.Rproj b/tutorial/tutorial.Rproj new file mode 100644 index 0000000..8e3c2eb --- /dev/null +++ b/tutorial/tutorial.Rproj @@ -0,0 +1,13 @@ +Version: 1.0 + +RestoreWorkspace: Default +SaveWorkspace: Default +AlwaysSaveHistory: Default + +EnableCodeIndexing: Yes +UseSpacesForTab: Yes +NumSpacesForTab: 2 +Encoding: UTF-8 + +RnwWeave: Sweave +LaTeX: pdfLaTeX From f464986b2e4ecffe6f5df4f0f4733937d05847b0 Mon Sep 17 00:00:00 2001 From: Yiqing Xu Date: Sat, 21 Mar 2026 18:01:41 -0700 Subject: [PATCH 04/10] Update DESCRIPTION and ch4 tutorial: add network functionality, improve structure Co-Authored-By: Claude Opus 4.6 (1M context) --- DESCRIPTION | 2 +- tutorial/04-network.Rmd | 41 ++++++++++++++++++++++++----------------- 2 files changed, 25 insertions(+), 18 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index fb88967..c7c2ad5 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -9,7 +9,7 @@ Authors@R: person("Hongyu", "Mou", , "hongyumou@g.ucla.edu", role = c("aut")), person("Licheng", "Liu", ,"liulch@mit.edu", role = c("aut"))) Maintainer: Yiqing Xu -Description: Visualizes panel data. It has three main functionalities: (1) it plots the treatment status and missing values in a panel dataset; (2) it visualizes the temporal dynamics of a main variable of interest; (3) it depicts the bivariate relationships between a treatment variable and an outcome variable either by unit or in aggregate. For details, see . +Description: Visualizes panel data. It has four main functionalities: (1) it plots the treatment status and missing values in a panel dataset; (2) it visualizes the temporal dynamics of a main variable of interest; (3) it depicts the bivariate relationships between a treatment variable and an outcome variable either by unit or in aggregate; (4) it displays the network structure of multi-way fixed effects as a k-partite graph, identifying connected components, singletons, and duplicate observations. For details, see . URL: https://yiqingxu.org/packages/panelview/, https://github.com/xuyiqing/panelView BugReports: https://github.com/xuyiqing/panelView/issues License: MIT + file LICENSE diff --git a/tutorial/04-network.Rmd b/tutorial/04-network.Rmd index 2edd135..e36e5e8 100644 --- a/tutorial/04-network.Rmd +++ b/tutorial/04-network.Rmd @@ -7,7 +7,9 @@ knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, ::: {.callout-note} The `type = "network"` feature is currently available only on the development branch. Install it with: -`devtools::install_github("xuyiqing/panelView@dev")` +```{r, eval = FALSE} +devtools::install_github("xuyiqing/panelView@dev") +``` ::: When estimating models with multiple sets of fixed effects, the connectivity structure of the data determines what can be identified. As @correia2016 shows, the fixed-effect estimation problem is equivalent to solving a linear system on a graph, and the structure of that graph has direct consequences for estimation and inference. @@ -18,30 +20,32 @@ Two features of this structure are particularly important: - **Non-unique observations** --- when multiple observations share the same combination of fixed-effect indices (e.g., the same worker appears at the same firm multiple times), these duplicates create weighted edges in the graph. Understanding where and how often such duplicates occur is essential for specifying the correct model and interpreting standard errors. -The `type = "network"` option in `panelview()` visualizes this structure directly. It constructs a $k$-partite graph from $k \geq 2$ sets of fixed effects, where: - -- **nodes** represent fixed-effect levels (units, time periods, firms, etc.); -- **edges** represent observed combinations. - -The resulting plot reveals connected components, singletons, and duplicate observations at a glance. We first load the package. - -```{r} -library(panelView) -data(panelView) -``` +The **panelView** package visualizes them with the `type = "network"` option, constructing a $k$-partite graph from $k \geq 2$ sets of fixed effects. -## Graph elements +## Network elements In the network plot, each distinct level of a fixed-effect dimension becomes a **node**. In a standard unit $\times$ time panel, there is one node for each unit and one node for each time period. Different fixed-effect dimensions are distinguished by shape: circles for the first dimension (e.g., units), squares for the second (e.g., time periods), triangles for the third, and so on. Each observation in the data creates an **edge** (link) between the nodes it connects. For example, if unit $A$ is observed at time $t$, an edge is drawn between node $A$ and node $t$. If the same combination appears multiple times (duplicate observations), the edge becomes thicker to reflect the count. -The plot also identifies: +In short, + +- **nodes** represent fixed-effect levels (units, time periods, firms, etc.); +- **edges** represent observed combinations. + +The resulting plot reveals connected components, singletons, and duplicate observations at a glance. - **Connected components**: groups of nodes that are linked to each other through some chain of edges. Nodes in different components share no observations and are shaded with distinct convex hulls. - **Singletons**: nodes with exactly one edge (degree 1), highlighted with a colored glow ring. +- **Duplicate observations**: when the same combination of fixed-effect levels appears more than once, the edge becomes thicker to reflect the count. +We first load the package. + +```{r} +library(panelView) +data(panelView) +``` ## Basic usage: Unit $\times$ Time @@ -155,9 +159,11 @@ sim_small <- data.frame( panelview(sim_small, ~ 1, index = c("unit", "time"), type = "network", layout = "fr", show.labels = "all", main = "FR layout") panelview(sim_small, ~ 1, index = c("unit", "time"), type = "network", - layout = "bipartite", show.labels = "all", main = "Bipartite layout") + layout = "bipartite", show.labels = "all", + main = "Bipartite layout") panelview(sim_small, ~ 1, index = c("unit", "time"), type = "network", - layout = "circle", show.labels = "all", main = "Circle layout") + layout = "circle", show.labels = "all", + main = "Circle layout") ``` @@ -180,7 +186,8 @@ Each fixed-effect dimension is rendered with a distinct shape and color: ```{r net-tripartite-data} sim_workers <- data.frame( worker = c("Alice", "Alice", "Bob", "Bob", "Carol", "Carol", "Dave"), - firm = c("Google", "Meta", "Google", "Apple", "Meta", "Apple", "Netflix"), + firm = c("Google", "Meta", "Google", "Apple", + "Meta", "Apple", "Netflix"), year = c(2020, 2021, 2020, 2021, 2020, 2021, 2022) ) ``` From 98db491a7793d2e35aabae9ea20c7ce962c5c2ef Mon Sep 17 00:00:00 2001 From: Yiqing Xu <7664920+xuyiqing@users.noreply.github.com> Date: Wed, 8 Apr 2026 15:21:52 -0700 Subject: [PATCH 05/10] docs: add NEWS.md for 1.3.0 release Document network visualization feature and historical changelog for CRAN submission. Co-Authored-By: Claude Opus 4.6 (1M context) --- NEWS.md | 121 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 121 insertions(+) create mode 100644 NEWS.md diff --git a/NEWS.md b/NEWS.md new file mode 100644 index 0000000..0ee3a4f --- /dev/null +++ b/NEWS.md @@ -0,0 +1,121 @@ +# panelView 1.3.0 + +## New features + +* Add `type = "network"` to visualize the connectivity structure of multi-way + fixed effects as a k-partite graph (Correia 2016). The new plot identifies + singletons, connected components, and non-unique (duplicate) observations via + weighted edges, and supports two or more fixed-effect dimensions as well as + formula-based missingness. +* New parameters for the network plot: `show.singletons`, + `highlight.components`, `layout`, `node.size`, `show.labels`, `edge.color`, + `edge.alpha`, `edge.width`, and `singleton.color`. +* Add `igraph` to `Suggests` (used only by `type = "network"`). +* Add a new tutorial chapter (chapter 4) documenting the network + visualization, and expand the Quarto tutorial index and references. + +## Improvements + +* Update tutorial chapter 1 and use consistent `U`-prefix naming for simulated + units in chapter 4. +* Update `DESCRIPTION` to describe the new k-partite graph functionality. + +# panelView 1.2.1 + +## Bug fixes + +* Remove spurious runtime warnings: fix deprecated `group_by_all()` by + switching to `group_by(across(everything()))`; drop unused + `position = "identity"` and `stat = "identity"` in ggplot2 layers. +* Fix a `margin()` vector-argument warning and a false-positive `by.cohort` + warning. +* Address ggplot2 `size` deprecation and tighten `class()` comparisons. +* Fix a bivariate plot title bug. + +## Internal + +* Refactor monolithic `panelView.R` into focused plot files (one file per plot + type). +* Add a testthat suite (36 tests). +* Replace the legacy vignette with a Quarto-based manual and add a changelog + chapter. +* Add `ARCHITECTURE.md` with Mermaid diagrams documenting the package + structure. +* `R CMD check --as-cran`: 0 errors, 0 warnings, 0 notes. + +# panelView 1.1.17 + +Add links to the [JSS paper](https://doi.org/10.18637/jss.v107.i07). + +# panelView 1.1.16 + +* Add `collapse.history` to allow users to collapse units by treatment history + in a `treat` plot. +* Add `show.missing` to output missing-data summary statistics for the key + variables. +* Add `axis.lab.angle` to allow users to change the angle of the x-axis + labels. +* Allow `pre.post` to be applied to an `outcome` plot. +* Change the color scheme in the `outcome` plot. +* Miscellaneous bug fixes. + +# panelView 1.1.11 + +* Add `by.cohort` to plot average outcome trajectories of units sharing the + same treatment history (when the number of unique histories is under 20). + +# panelView 1.1.10 + +* Add `by.group.side` to arrange subfigures of `by.group = TRUE` in a row + rather than a column. +* Add `display.all` to show all units when the number of units exceeds 500 + (otherwise a random sample of 500 units is shown). + +# panelView 1.1.8 + +* Add `leave.gap` to keep gaps in time using white bars when the time variable + is unevenly spaced (e.g., due to missing data). +* Add `type = "missing"` to plot missingness in data. + +# panelView 1.1.7 + +* Rename the main function from `panelView` to `panelview` for consistency + with the Stata version. +* In outcome plots, use a dot to represent the last-period observation of a + unit that is treated in the last period. + +# panelView 1.1.6 + +* Plot time series of outcome and treatment in one graph + (`type = "bivar"`). By default, plots mean D and Y against time in the same + graph; use `by.unit = TRUE` to plot each unit. New `style`, `ylim`, and + `lwd` options control line/bar styles, axis limits, and line width. + +# panelView 1.1.4 + +* Add `treat.type` to control whether the treatment variable is treated as + continuous (`"continuous"`) or discrete (`"discrete"`). + +# panelView 1.1.2 + +* Rename plot `type`s: `"treat"` (formerly `"missing"`) for treatment status + and `"outcome"` (formerly `"raw"`) for raw outcomes. +* Allow more than two treatment levels. +* Add `pre.post` to distinguish pre- and post-treatment observations for + treated units in a DID setting. +* Rename `by.treatment` to `by.timing` and `treatment` to `ignore.treat`. +* Add fontsize options. + +# panelView 1.0.5 + +Fix typos. CRAN release. + +# panelView 1.0.4 + +* Allow plotting treated units on top of control units in the "missing" plot. +* Streamline the `color` option for both the "missing" and "raw" plots. + +# panelView 1.0.3 + +* Allow changing the color of bricks in the "missing" plot. +* Allow leaving the treatment blank in both the "missing" and "raw" plots. From ccb4e58695b00f7694886c69474c1eb7c090a0f9 Mon Sep 17 00:00:00 2001 From: Yiqing Xu Date: Wed, 13 May 2026 22:59:23 -0700 Subject: [PATCH 06/10] v1.3.1: modern theme refresh, new theme/group.mean.overlay arguments * v1.3.1: modern theme refresh, new theme/group.mean.overlay arguments Visual refresh across all four plot types (treat / outcome / bivariate / network): * Plain left-aligned titles at base size 11, white background for status heatmaps, subtle major-x gridlines for trajectories, thin gray dashed treatment-onset marker (replaces prior thick white separator). * Multi-level discrete-status palette refreshed to a muted, perceptually distinct sequence; binary blue cascade and continuous gradient unchanged. * Outcome control trajectories lightened to grey75 so dense overlapping lines no longer darken into a near-black band. * Outcome draw order fixed: controls now render at the bottom, treated-pre next, treated-post on top -- via three explicit geom_line layers (single geom_line(group=id) was drawing in id order, not type order, so treated trajectories were being covered by the control band). * by.group / by.group.side suptitle (textGrob path) routed through the modern title style via a new .pv_modern_top_grob helper (28 call sites). New arguments: * `theme = c("default", "red")`. The "red" theme activates a high-contrast publication-paper recipe: gray control / dark-gray treated-pre / brick-red treated-post for status and outcome plots, plus a solid black dashed treatment-onset marker. * `group.mean.overlay = FALSE`. When TRUE, the outcome plot dims per-unit trajectories and overlays a heavy group-mean line plus a 10-90% quantile ribbon per treatment-status group. Currently scoped to the main DID continuous-outcome path. Deprecations: * `theme.bw = FALSE` is soft-deprecated -- still functional for backwards compatibility but emits a one-time warning per call. The tutorial no longer demonstrates it. Tutorial: * New "Themes" subsections in 01-treat and 02-outcome chapters; new red- theme demo in 03-bivariate chapter. * Demo of theme="red" handling all three categories (Under Control, Under Treatment, Missing) on the capacity dataset. * New v1.3.1 changelog entry; index version table + argument table updated for the two new arguments. Validation: R CMD check --as-cran on the built tarball returns Status: OK (0 errors / 0 warnings / 0 notes). 116/116 testthat tests pass. Clean-cache Quarto book render of all 7 chapters succeeds. Co-Authored-By: Claude Opus 4.7 (1M context) * docs: log v1.3.1 CRAN acceptance in tutorial book Update install table to CRAN 1.3.1 across all three rows; add CRAN release line (2026-05-13) to v1.3.1 changelog entry. Co-Authored-By: Claude Opus 4.7 (1M context) * docs: link StatsClaw in tutorial contributors Co-Authored-By: Claude Opus 4.7 (1M context) --------- Co-authored-by: Yiqing Xu <7664920+xuyiqing@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) --- DESCRIPTION | 4 +- NEWS.md | 28 ++++++ R/panelView.R | 20 ++++ R/plot-bivariate.R | 22 +++-- R/plot-network.R | 7 +- R/plot-outcome.R | 199 +++++++++++++++++++++++++------------- R/plot-treat.R | 58 ++++++++--- R/theme-modern.R | 41 ++++++++ man/panelView.Rd | 3 + tutorial/01-treat.Rmd | 35 ++++++- tutorial/02-outcome.Rmd | 42 +++++--- tutorial/03-bivariate.Rmd | 22 ++++- tutorial/aa-changelog.Rmd | 11 +++ tutorial/index.qmd | 10 +- 14 files changed, 385 insertions(+), 117 deletions(-) create mode 100644 R/theme-modern.R diff --git a/DESCRIPTION b/DESCRIPTION index c7c2ad5..51564f3 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,8 +1,8 @@ Package: panelView Type: Package Title: Visualizing Panel Data -Version: 1.3.0 -Date: 2026-03-21 +Version: 1.3.1 +Date: 2026-05-13 Authors@R: c(person("Yiqing", "Xu", ,"yiqingxu@stanford.edu", role = c("aut", "cre"), comment = c(ORCID = "0000-0003-2041-6671")), diff --git a/NEWS.md b/NEWS.md index 0ee3a4f..f525d01 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,31 @@ +# panelView 1.3.1 + +## Visual refresh + +* Plot defaults updated for a cleaner, publication-ready look across all four + plot types: plain left-aligned titles, base font size 11, white background + for status heatmaps, subtle major-x gridlines for trajectories, and a thin + gray dashed treatment-onset marker (replaces the prior thick white line). +* Multi-level discrete treatment palette refreshed to a muted, perceptually + distinct sequence; binary and continuous palettes unchanged. +* Control-trajectory gray lightened to `grey75` so dense overlaps no longer + darken into a near-black band. + +## New arguments + +* `theme = c("default", "red")`: optional theme switch. `"red"` activates a + high-contrast publication recipe with gray control / brick-red treated-post + for status and outcome plots and a solid black dashed treatment-onset line. +* `group.mean.overlay = FALSE`: opt-in for the outcome plot. Dims per-unit + trajectories and overlays a heavy group-mean line plus a 10--90% quantile + ribbon per group. Currently scoped to the main DID continuous-outcome path. + +## Deprecations + +* `theme.bw = FALSE` is soft-deprecated and emits a one-time warning. The + legacy gray-panel look is no longer demonstrated in the tutorial and may + be removed in a future major release. + # panelView 1.3.0 ## New features diff --git a/R/panelView.R b/R/panelView.R index 8e3ab21..b99cce0 100644 --- a/R/panelView.R +++ b/R/panelView.R @@ -19,6 +19,8 @@ panelview <- function(data, # a data frame (long-form) by.group.side = FALSE, by.timing = FALSE, theme.bw = TRUE, + theme = "default", ## "default" (panelView blues) or "red" + group.mean.overlay = FALSE, xlim = NULL, ylim = NULL, xlab = NULL, @@ -208,6 +210,24 @@ panelview <- function(data, # a data frame (long-form) stop("\"theme.bw\" is not a logical flag.") } + if (isFALSE(as.logical(theme.bw))) { + warning( + "\"theme.bw = FALSE\" is deprecated and may be removed in a future ", + "release. The default (TRUE) renders the recommended look; set ", + "\"theme = 'red'\" for a high-contrast publication recipe.", + call. = FALSE + ) + } + + if (!is.character(theme) || length(theme) != 1 || + !theme %in% c("default", "red")) { + stop("\"theme\" must be one of \"default\" or \"red\".") + } + + if (is.logical(group.mean.overlay) == FALSE & !group.mean.overlay %in% c(0, 1)) { + stop("\"group.mean.overlay\" is not a logical flag.") + } + if (is.logical(by.timing) == FALSE & !by.timing%in%c(0, 1)) { stop("\"by.timing\" is not a logical flag.") } diff --git a/R/plot-bivariate.R b/R/plot-bivariate.R index c741ca9..95616af 100644 --- a/R/plot-bivariate.R +++ b/R/plot-bivariate.R @@ -50,15 +50,19 @@ ## plot color setting raw.color <- NULL - ## color setting + ## color setting if (is.null(color)==TRUE) { #not indicate color - if (theme.bw == FALSE) { # not theme.bw (black and white theme) - raw.color <- c("dodgerblue4", "lightsalmon2") - } + if (identical(theme, "red")) { + ## cip theme: brick-red accent + gray neutral + raw.color <- c("#B83A4B", "grey40") + } + else if (theme.bw == FALSE) { # not theme.bw (black and white theme) + raw.color <- c("dodgerblue4", "lightsalmon2") + } else { # theme.bw raw.color <- c("black","azure4") } - } + } else { #indicate color if (length(color) != 2) { stop("Length of \"color\" should be equal to 2.\n") @@ -112,13 +116,13 @@ p <- ggplot(na.omit(data.means), aes(x=time)) if (theme.bw == TRUE) { - p <- p + theme_bw() + p <- p + .pv_modern_theme_bw() } p <- p + theme(legend.position = legend.pos, aspect.ratio = 1/2, axis.text.x = element_text(angle = angle, hjust=x.h, vjust=x.h), - plot.title = element_text(size=cex.main, hjust = 0.5, face="bold",margin = margin(8, 0, 8, 0))) + plot.title = .pv_modern_title(cex.main, theme.bw)) if (is.null(ylim) == TRUE) { @@ -236,12 +240,12 @@ p <- ggplot(na.omit(data), aes(x=time)) if (theme.bw == TRUE) { - p <- p + theme_bw() + p <- p + .pv_modern_theme_bw() } p <- p + theme(legend.position = legend.pos, axis.text.x = element_text(angle = 90, hjust=x.h, vjust=0.5), - plot.title = element_text(size=cex.main, hjust = 0.5, face="bold",margin = margin(8, 0, 8, 0))) + plot.title = .pv_modern_title(cex.main, theme.bw)) if (is.null(ylim) == TRUE) { ylim <- c(min(data$outcome, na.rm = TRUE),max(data$outcome, na.rm = TRUE)) diff --git a/R/plot-network.R b/R/plot-network.R index 1c9abee..6dfa641 100644 --- a/R/plot-network.R +++ b/R/plot-network.R @@ -383,9 +383,12 @@ ## title if (!is.null(main)) { + title.hjust <- if (isTRUE(theme.bw)) 0 else 0.5 + title.face <- if (isTRUE(theme.bw)) "plain" else "bold" p <- p + ggtitle(main) + - theme(plot.title = element_text(size = cex.main, hjust = 0.5, - color = "#333333")) + theme(plot.title = element_text(size = cex.main, hjust = title.hjust, + face = title.face, color = "#333333", + margin = margin(8, 0, 8, 0))) } ## legend diff --git a/R/plot-outcome.R b/R/plot-outcome.R index ce4249a..c66ef6a 100644 --- a/R/plot-outcome.R +++ b/R/plot-outcome.R @@ -7,6 +7,12 @@ data <- na.omit(data) } + ## draw order: controls bottom, treated on top + if ("type" %in% names(data)) { + data <- data[order(match(data$type, c("co", "tr", "tr.pst"))), , + drop = FALSE] + } + ## theme p <- ggplot(data) + xlab(xlab) + ylab(ylab) @@ -17,27 +23,31 @@ set.linewidth = rep(0.5, length(limits)) if (theme.bw == TRUE) { - p <- p + theme_bw() + - theme(panel.grid.major = element_blank(), - panel.grid.minor = element_blank(), - axis.text.x = element_text(angle = angle, hjust=x.h, vjust=x.h), - plot.title = element_text(size=cex.main, hjust = 0.5, face="bold",margin = margin(8, 0, 8, 0))) + p <- p + .pv_modern_theme_bw() + + theme(axis.text.x = element_text(angle = angle, hjust=x.h, vjust=x.h), + plot.title = .pv_modern_title(cex.main, theme.bw)) } else { p <- p + theme(axis.text.x = element_text(angle = angle, hjust=x.h, vjust=x.h), - plot.title = element_text(size=cex.main, hjust = 0.5, face="bold",margin = margin(8, 0, 8, 0))) + plot.title = .pv_modern_title(cex.main, theme.bw)) } ## main if (outcome.type == "continuous") { - p <- p + geom_line(aes(time, outcome, - colour = type, - linewidth = type, - linetype = type, - group = id)) + line.aes <- aes(time, outcome, + colour = type, + linewidth = type, + linetype = type, + group = id) + for (.t in c("co", "tr", "tr.pst")) { + d.t <- data[data$type == .t, , drop = FALSE] + if (nrow(d.t) > 0) { + p <- p + geom_line(data = d.t, mapping = line.aes) + } + } - data1 <- subset(data, data$last_dot==1) + data1 <- subset(data, data$last_dot==1) p <- p + geom_point(data = data1, aes(time, outcome), colour = raw.color[2], @@ -108,10 +118,15 @@ ## color setting if (is.null(color) == TRUE) { if (ignore.treat == FALSE) { - if (outcome.type == "continuous") { - raw.color <- c("#5e5e5e50", "#FC8D62", "red") + if (identical(theme, "red")) { + ## red theme: light gray control, darker gray treated-pre + ## (drawn on top of controls), brick-red treated-post. + raw.color <- c("grey75", "grey30", "#B83A4B") } else { - raw.color <- c("#5e5e5e60", "#FC8D62", "red") + ## default theme: light gray control (so dense overlapping + ## lines don't darken into a band), medium blue treated-pre, + ## dark navy treated-post. Matches the binary status palette. + raw.color <- c("grey75", "#4671D5", "#06266F") } if (type == "outcome" && (staggered == 0 | by.group == TRUE | pre.post == FALSE)) { # two conditions only raw.color <- raw.color[c(1,3)] @@ -119,7 +134,7 @@ } else { # ignore treat raw.color <- "#5e5e5e50" } - } else { # color is specified + } else { # color is specified if (ignore.treat == FALSE) { if (staggered == 0 | pre.post == FALSE) { # with reversals or two groups only if (length(color) != 2) { @@ -171,11 +186,11 @@ p <- ggplot(data) + xlab(xlab) + ylab(ylab) if (theme.bw == TRUE) { - p <- p + theme_bw() + p <- p + .pv_modern_theme_bw() } p <- p + theme(legend.position = legend.pos, axis.text.x = element_text(angle = angle, hjust=x.h, vjust=x.h), - plot.title = element_text(size=cex.main, hjust = 0.5, face="bold",margin = margin(8, 0, 8, 0))) + plot.title = .pv_modern_title(cex.main, theme.bw)) if (outcome.type == "continuous") { ## main @@ -490,41 +505,92 @@ + ## draw order: controls first (bottom), treated-pre on top, + ## treated-post on very top, so treated trajectories cover + ## the dense control band rather than vice versa. + if ("type" %in% names(data)) { + data <- data[order(match(data$type, c("co", "tr", "tr.pst"))), , + drop = FALSE] + } + ## theme p <- ggplot(data) + xlab(xlab) + ylab(ylab) if (theme.bw == TRUE) { - p <- p + theme_bw() + p <- p + .pv_modern_theme_bw() } p <- p + theme(legend.position = legend.pos, axis.text.x = element_text(angle = angle, hjust=x.h, vjust=x.h), - plot.title = element_text(size=cex.main, hjust = 0.5, face="bold",margin = margin(8, 0, 8, 0))) - + plot.title = .pv_modern_title(cex.main, theme.bw)) + if (DID == TRUE && Ntr >= 1) { if (exists("time.bf")) { if (time.bf >= min(show) && time.bf <= max(show)) { - p <- p + geom_vline(xintercept=time.bf, colour="white", linewidth = 2) + if (theme.bw == TRUE) { + onset.col <- if (identical(theme, "red")) "black" else "grey50" + onset.lw <- if (identical(theme, "red")) 0.6 else 0.4 + p <- p + geom_vline(xintercept=time.bf, colour=onset.col, + linewidth=onset.lw, linetype="dashed") + } else { + p <- p + geom_vline(xintercept=time.bf, colour="white", linewidth = 2) + } if (shade.post == TRUE) { p <- p + annotate("rect", xmin= time.bf, xmax= Inf, - ymin=-Inf, ymax=Inf, alpha = .3) - } + ymin=-Inf, ymax=Inf, alpha = .3) + } } } } - ## main - p <- p + geom_line(aes(time, outcome, - colour = type, - linewidth = type, - linetype = type, - group = id)) + ## main: draw controls first (bottom), then treated-pre, then + ## treated-post -- three separate layers because geom_line draws + ## polylines in group-value order, not row order. + spaghetti.alpha <- if (isTRUE(group.mean.overlay)) 0.18 else 1 + line.aes <- aes(time, outcome, + colour = type, + linewidth = type, + linetype = type, + group = id) + for (.t in c("co", "tr", "tr.pst")) { + d.t <- data[data$type == .t, , drop = FALSE] + if (nrow(d.t) > 0) { + p <- p + geom_line(data = d.t, mapping = line.aes, + alpha = spaghetti.alpha) + } + } - data1 <- subset(data, data$last_dot==1) + data1 <- subset(data, data$last_dot==1) p <- p + geom_point(data = data1, aes(time, outcome), colour = raw.color[3], - size = 0.5) - - + size = 0.5, + alpha = spaghetti.alpha) + + ## opt-in: heavy group-mean overlay + 10-90% band + if (isTRUE(group.mean.overlay)) { + data.no.na <- data[!is.na(data$outcome), , drop = FALSE] + grp <- do.call(rbind, lapply(split(data.no.na, list(data.no.na$time, + data.no.na$type), + drop = TRUE), + function(d) data.frame(time = d$time[1], type = d$type[1], + mean = mean(d$outcome), + lo = stats::quantile(d$outcome, 0.10, names = FALSE), + hi = stats::quantile(d$outcome, 0.90, names = FALSE)))) + p <- p + geom_ribbon(data = grp, + aes(x = time, ymin = lo, ymax = hi, + fill = type, group = type), + alpha = 0.18, colour = NA, + inherit.aes = FALSE) + + geom_line(data = grp, + aes(x = time, y = mean, colour = type, + group = type), + linewidth = 1.0, + inherit.aes = FALSE) + + scale_fill_manual(limits = set.limits, + labels = set.labels, + values = set.colors, + guide = "none") + } + p <- p + scale_colour_manual(limits = set.limits, labels = set.labels, values =set.colors) + @@ -547,15 +613,18 @@ data <- na.omit(data) data$type <- factor(data$type, levels = c(-1, 0, 1), labels = c("co","tr","tr.pst")) + ## draw order: controls bottom, treated on top + data <- data[order(data$type), , drop = FALSE] + ## theme p <- ggplot(data) + xlab(xlab) + ylab(ylab) if (theme.bw == TRUE) { - p <- p + theme_bw() + p <- p + .pv_modern_theme_bw() } p <- p + theme(legend.position = legend.pos, axis.text.x = element_text(angle = angle, hjust=x.h, vjust=x.h), - plot.title = element_text(size=cex.main, hjust = 0.5, face="bold",margin = margin(8, 0, 8, 0))) - + plot.title = .pv_modern_title(cex.main, theme.bw)) + ## main plot p <- p + geom_jitter(width = 0.15, height = 0.15, aes(x = time, y = outcome, colour = type, shape = type)) @@ -759,10 +828,10 @@ legend <- g[[which(sapply(g, function(x) x$name) == "guide-box")]] suppressWarnings(grid.arrange(arrangeGrob(p1 + theme(legend.position="none"), legend, nrow = 2, heights = c (1, 1/5)), - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } else { suppressWarnings(grid.arrange(p1 + theme(legend.position="none"), - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } } else if (2%in%unit.type) { @@ -773,10 +842,10 @@ legend <- g[[which(sapply(g, function(x) x$name) == "guide-box")]] suppressWarnings(grid.arrange(arrangeGrob(p2 + theme(legend.position="none"), legend, nrow = 2, heights = c (1, 1/5)), - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } else { suppressWarnings(grid.arrange(p2 + theme(legend.position="none"), - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } } else if (3%in%unit.type) { @@ -787,10 +856,10 @@ legend <- g[[which(sapply(g, function(x) x$name) == "guide-box")]] suppressWarnings(grid.arrange(arrangeGrob(p3 + theme(legend.position="none"), legend, nrow = 2, heights = c (1, 1/5)), - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } else { suppressWarnings(grid.arrange(p3 + theme(legend.position="none"), - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } } @@ -807,11 +876,11 @@ legend <- g[[which(sapply(g, function(x) x$name) == "guide-box")]] suppressWarnings(grid.arrange(arrangeGrob(p2 + theme(legend.position="none"), p3 + theme(legend.position="none"), legend, nrow = 3, heights = c (1, 1, 1/5)), - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } else { suppressWarnings(grid.arrange(p2 + theme(legend.position="none"), p3 + theme(legend.position="none"), - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } } else if (!2%in%unit.type) { @@ -824,11 +893,11 @@ legend <- g[[which(sapply(g, function(x) x$name) == "guide-box")]] suppressWarnings(grid.arrange(arrangeGrob(p1 + theme(legend.position="none"), p3 + theme(legend.position="none"), legend, nrow = 3, heights = c (1, 1, 1/5)), - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } else { suppressWarnings(grid.arrange(p1 + theme(legend.position="none"), p3 + theme(legend.position="none"), - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } } else if (!3%in%unit.type) { @@ -841,11 +910,11 @@ legend <- g[[which(sapply(g, function(x) x$name) == "guide-box")]] suppressWarnings(grid.arrange(arrangeGrob(p1 + theme(legend.position="none"), p2 + theme(legend.position="none"), legend, nrow = 3, heights = c (1, 1, 1/5)), - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } else { suppressWarnings(grid.arrange(p1 + theme(legend.position="none"), p2 + theme(legend.position="none"), - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } } @@ -863,12 +932,12 @@ legend <- g[[which(sapply(g, function(x) x$name) == "guide-box")]] suppressWarnings(grid.arrange(arrangeGrob(p1 + theme(legend.position="none"), p2 + theme(legend.position="none"), p3 + theme(legend.position="none"), legend, nrow = 4, heights = c (1, 1, 1, 1/5)), - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } else { suppressWarnings(grid.arrange(p1 + theme(legend.position="none"), p2 + theme(legend.position="none"), p3 + theme(legend.position="none"), - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } } @@ -884,10 +953,10 @@ suppressWarnings(g <- ggplotGrob(p1 + theme(legend.position="bottom"))$grobs) legend <- g[[which(sapply(g, function(x) x$name) == "guide-box")]] suppressWarnings(grid.arrange(arrangeGrob(p1 + theme(legend.position="none"), - nrow =1, top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2))),legend,nrow=2,heights=c(1,1/8))) + nrow =1, top = .pv_modern_top_grob(main, cex.main.top, theme.bw)),legend,nrow=2,heights=c(1,1/8))) } else { suppressWarnings(grid.arrange(p1 + theme(legend.position="none"), nrow =1, - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } } else if (2%in%unit.type) { @@ -897,10 +966,10 @@ suppressWarnings(g <- ggplotGrob(p2 + theme(legend.position="bottom"))$grobs) legend <- g[[which(sapply(g, function(x) x$name) == "guide-box")]] suppressWarnings(grid.arrange(arrangeGrob(p2 + theme(legend.position="none"), - nrow =1, top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2))),legend,nrow=2,heights=c(1,1/8))) + nrow =1, top = .pv_modern_top_grob(main, cex.main.top, theme.bw)),legend,nrow=2,heights=c(1,1/8))) } else { suppressWarnings(grid.arrange(p2 + theme(legend.position="none"), nrow =1, - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } } else if (3%in%unit.type) { @@ -910,10 +979,10 @@ suppressWarnings(g <- ggplotGrob(p3 + theme(legend.position="bottom"))$grobs) legend <- g[[which(sapply(g, function(x) x$name) == "guide-box")]] suppressWarnings(grid.arrange(arrangeGrob(p3 + theme(legend.position="none"), - nrow =1, top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2))),legend,nrow=2,heights=c(1,1/8))) + nrow =1, top = .pv_modern_top_grob(main, cex.main.top, theme.bw)),legend,nrow=2,heights=c(1,1/8))) } else { suppressWarnings(grid.arrange(p3 + theme(legend.position="none"), nrow =1, - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } } @@ -929,11 +998,11 @@ suppressWarnings(g <- ggplotGrob(p2 + theme(legend.position="bottom"))$grobs) legend <- g[[which(sapply(g, function(x) x$name) == "guide-box")]] suppressWarnings(grid.arrange(arrangeGrob(p2 + theme(legend.position="none"), p3 + theme(legend.position="none"), - nrow =1, top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2))),legend,nrow=2,heights=c(1,1/8))) + nrow =1, top = .pv_modern_top_grob(main, cex.main.top, theme.bw)),legend,nrow=2,heights=c(1,1/8))) } else { suppressWarnings(grid.arrange(p2 + theme(legend.position="none"), p3 + theme(legend.position="none"), nrow =1, - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } } else if (!2%in%unit.type) { @@ -945,11 +1014,11 @@ suppressWarnings(g <- ggplotGrob(p1 + theme(legend.position="bottom"))$grobs) legend <- g[[which(sapply(g, function(x) x$name) == "guide-box")]] suppressWarnings(grid.arrange(arrangeGrob(p1 + theme(legend.position="none"), p3 + theme(legend.position="none"), - nrow =1, top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2))),legend,nrow=2,heights=c(1,1/8))) + nrow =1, top = .pv_modern_top_grob(main, cex.main.top, theme.bw)),legend,nrow=2,heights=c(1,1/8))) } else { suppressWarnings(grid.arrange(p1 + theme(legend.position="none"), p3 + theme(legend.position="none"), nrow =1, - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } } else if (!3%in%unit.type) { @@ -961,11 +1030,11 @@ suppressWarnings(g <- ggplotGrob(p1 + theme(legend.position="bottom"))$grobs) legend <- g[[which(sapply(g, function(x) x$name) == "guide-box")]] suppressWarnings(grid.arrange(arrangeGrob(p1 + theme(legend.position="none"), p2 + theme(legend.position="none"), - nrow =1, top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2))),legend,nrow=2,heights=c(1,1/8))) + nrow =1, top = .pv_modern_top_grob(main, cex.main.top, theme.bw)),legend,nrow=2,heights=c(1,1/8))) } else { suppressWarnings(grid.arrange(p1 + theme(legend.position="none"), p2 + theme(legend.position="none"), nrow =1, - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } } @@ -985,12 +1054,12 @@ suppressWarnings(grid.arrange(arrangeGrob(p1 + theme(legend.position="none"), p2 + theme(legend.position="none"), p3 + theme(legend.position="none"), nrow =1, - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2))),legend,nrow=2,heights=c(1,1/8))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw)),legend,nrow=2,heights=c(1,1/8))) } else { suppressWarnings(grid.arrange(p1 + theme(legend.position="none"), p2 + theme(legend.position="none"), p3 + theme(legend.position="none"), nrow =1, - top = textGrob(main, gp = gpar(fontsize = cex.main.top,font=2)))) + top = .pv_modern_top_grob(main, cex.main.top, theme.bw))) } } diff --git a/R/plot-treat.R b/R/plot-treat.R index 3f99d06..f873119 100644 --- a/R/plot-treat.R +++ b/R/plot-treat.R @@ -46,8 +46,8 @@ if (d.bi == FALSE && ignore.treat == 0) { ## >2 treatment level - tr.col <- c("#66C2A5","#FC8D62","#8DA0CB","#E78AC3","#A6D854","#FFD92F","#E5C494", - "#FAFAD2", "#ADFF2F", "#87CEFA", "#1874CD", "#00008B") + tr.col <- c("#5B8AA6","#C66B5A","#6B7A99","#B07AA1","#BE8C3F","#7A8C5A", + "#4A6FA5","#A6573F","#506683","#8F5B7A","#A07733","#5E6E45") if (treat.type == "discrete") { for (i in 1:n.levels) { @@ -85,23 +85,34 @@ } else { ## binary treatment indicator + ## theme-dependent binary palette (control / treated-pre / treated-post) + if (identical(theme, "red")) { + pv.ctl <- "grey85" + pv.tpre <- "grey50" + pv.tpst <- "#B83A4B" + } else { + pv.ctl <- "#B0C4DE" + pv.tpre <- "#4671D5" + pv.tpst <- "#06266F" + } + if (0 %in% all) { ## have pre and post: general DID type data - + ## control if (-1 %in% all) { - col <- c(col,"#B0C4DE") + col <- c(col, pv.ctl) breaks <- c(breaks, -1) label <- c(label,"Controls") } - + ## treated pre - col <- c(col,"#4671D5") + col <- c(col, pv.tpre) breaks <- c(breaks, 0) label <- c(label,"Treated (Pre)") - + ## treated post if (1 %in% all) { - col <- c(col,"#06266F") + col <- c(col, pv.tpst) breaks <- c(breaks, 1) label <- c(label,"Treated (Post)") } @@ -110,7 +121,7 @@ ## control if (-1 %in% all) { - col <- c(col,"#B0C4DE") + col <- c(col, pv.ctl) breaks <- c(breaks, -1) if (ignore.treat == 0) { ## if (pre.post == TRUE) { @@ -121,12 +132,12 @@ } else { label <- c(label, "Observed") } - + } - ## treated + ## treated if (1 %in% all) { - col <- c(col,"#06266F") + col <- c(col, pv.tpst) breaks <- c(breaks, 1) ## if (pre.post == TRUE) { label <- c(label,"Under Treatment") @@ -262,8 +273,12 @@ ## background color if (is.null(background)==FALSE) { grid.color <- border.color <- background.color <- legend.color <- background + } else if (theme.bw == TRUE) { + ## modern look: white plot/legend background; tile borders stay subtle grey + grid.color <- border.color <- "grey90" + background.color <- legend.color <- "white" } else { - grid.color <- border.color <- background.color <- legend.color <- "grey90" + grid.color <- border.color <- background.color <- legend.color <- "grey90" } @@ -279,7 +294,12 @@ p <- p + geom_tile() } - p <- p + labs(x = xlab, y = ylab, title=main) + theme_bw() + p <- p + labs(x = xlab, y = ylab, title=main) + if (theme.bw == TRUE) { + p <- p + theme_bw(base_size = 11) + } else { + p <- p + theme_bw() + } #if (treat.type == "discrete") { p <- p + scale_fill_manual("Treatment level: ", breaks = breaks, values = col, labels=label) @@ -290,6 +310,14 @@ #p <- p + scale_fill_gradient(low = col[1], high = col[2], na.value="white") + guides(fill=guide_legend(title= label)) #} + if (theme.bw == TRUE) { + title.style <- element_text(size=cex.main, hjust = 0, face="plain", + margin = margin(8, 0, 8, 0)) + } else { + title.style <- element_text(size=cex.main, hjust = 0.5, face="bold", + margin = margin(8, 0, 8, 0)) + } + p <- p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), @@ -307,7 +335,7 @@ legend.position = legend.pos, legend.margin = margin(0, 5, 5, 0), legend.text = element_text(margin = margin(r = 10, unit = "pt"), size = cex.legend), - plot.title = element_text(size=cex.main, hjust = 0.5,face="bold",margin = margin(8, 0, 8, 0))) + plot.title = title.style) if (axis.lab == "both") { diff --git a/R/theme-modern.R b/R/theme-modern.R new file mode 100644 index 0000000..c3acc86 --- /dev/null +++ b/R/theme-modern.R @@ -0,0 +1,41 @@ +## Modern theme helper (v1.4.0+). +## +## .pv_modern_title(cex.main, theme.bw): title element_text matching the +## modern look when theme.bw=TRUE (plain, left-aligned), or the legacy +## look (bold, centered) when theme.bw=FALSE. +## +## .pv_modern_theme_bw(cex.main, base_size=11): drop-in replacement for +## `theme_bw()` returning the modern base: theme_bw at base_size, no +## minor grid, only major-x grid retained. For trajectory plots only; +## tile / heatmap plots keep their own theme block (no gridlines). + +.pv_modern_title <- function(cex.main, theme.bw) { + if (isTRUE(theme.bw)) { + ggplot2::element_text(size = cex.main, hjust = 0, face = "plain", + margin = ggplot2::margin(8, 0, 8, 0)) + } else { + ggplot2::element_text(size = cex.main, hjust = 0.5, face = "bold", + margin = ggplot2::margin(8, 0, 8, 0)) + } +} + +.pv_modern_theme_bw <- function(base_size = 11) { + ggplot2::theme_bw(base_size = base_size) + + ggplot2::theme(panel.grid.minor = ggplot2::element_blank(), + panel.grid.major.y = ggplot2::element_blank(), + panel.grid.major.x = ggplot2::element_line(colour = "grey92", + linewidth = 0.3)) +} + +## Suptitle (top grob) for grid.arrange-based by.group / by.group.side +## layouts. Modern look: plain, left-aligned. Legacy: bold, centered. +.pv_modern_top_grob <- function(main, cex.main.top, theme.bw) { + if (isTRUE(theme.bw)) { + grid::textGrob(main, + x = grid::unit(0.02, "npc"), hjust = 0, + gp = grid::gpar(fontsize = cex.main.top, font = 1)) + } else { + grid::textGrob(main, + gp = grid::gpar(fontsize = cex.main.top, font = 2)) + } +} diff --git a/man/panelView.Rd b/man/panelView.Rd index 3597500..a2ea9e0 100644 --- a/man/panelView.Rd +++ b/man/panelView.Rd @@ -8,6 +8,7 @@ outcome.type = "continuous", treat.type = NULL, by.group = FALSE, by.group.side = FALSE, by.timing = FALSE, theme.bw = TRUE, + theme = "default", group.mean.overlay = FALSE, xlim = NULL, ylim = NULL, xlab = NULL, ylab = NULL, gridOff = FALSE, legendOff = FALSE, @@ -42,6 +43,8 @@ \item{by.group.side}{a logical flag indicating whether to arrange subfigures of \code{by.group = TRUE} in a row rather than in a column.} \item{by.timing}{a logic flag indicating whether the units should be sorted based on the timing of receiving the treatment for the treat plot.} \item{theme.bw}{a logical flag specifying whether to use a black-and-white theme.} + \item{theme}{a string specifying an optional named theme; one of \code{"default"} (the original panelView palette and onset-line style) or \code{"red"} (a high-contrast publication-paper recipe: gray control / brick-red treated-post for the binary status palette and outcome lines, with a solid-black dashed treatment-onset marker). Default \code{"default"}.} + \item{group.mean.overlay}{a logical flag for the outcome plot. When \code{TRUE}, per-unit trajectories are dimmed and a heavy group-mean line plus a 10--90\% quantile ribbon are overlaid per treatment-status group. Currently implemented for the main DID continuous-outcome path (\code{type = "outcome"}, \code{outcome.type = "continuous"}, \code{by.group = FALSE}); other variants ignore the argument. Default \code{FALSE}.} \item{xlim}{a two-element numeric vector specifying the range of x-axis. When the class of time variable is string, must specify the range of strings to be shown, e.g. \code{xlim=c(1,30)}.} \item{ylim}{a two-element numeric vector specifying the range of y-axis.} \item{xlab}{a string indicating the label of the x-axis.} diff --git a/tutorial/01-treat.Rmd b/tutorial/01-treat.Rmd index ea0856e..9c07d60 100644 --- a/tutorial/01-treat.Rmd +++ b/tutorial/01-treat.Rmd @@ -25,14 +25,13 @@ panelview(turnout ~ policy_edr + policy_mail_in + policy_motor, xlab = "Year", ylab = "State") ``` -Use `by.timing = TRUE` to sort units by the timing of first treatment. The `background` option sets the panel background color; `cex.*` options control font sizes. +Use `by.timing = TRUE` to sort units by the timing of first treatment. ```{r treat1-2, fig.height=10} panelview(turnout ~ policy_edr + policy_mail_in + policy_motor, data = turnout, index = c("abb", "year"), xlab = "Year", ylab = "State", by.timing = TRUE, - legend.labs = c("No EDR", "EDR"), background = "white", - cex.main = 20, cex.axis = 8, cex.lab = 12, cex.legend = 12) + legend.labs = c("No EDR", "EDR")) ``` For staggered adoption, `pre.post = TRUE` distinguishes pre- and post-treatment periods for treated units. @@ -110,6 +109,36 @@ panelview(Capacity ~ demo + lnpop + lngdp, axis.lab.angle = 90, by.timing = TRUE, axis.lab = "time") ``` +## Themes + +The `theme` argument selects the color recipe. The default (`"default"`) +uses panelView's classic blue progression. Setting `theme = "red"` +activates a high-contrast publication-paper recipe: control units fade to +a light gray and treated post-period cells pick up a brick-red accent. + +```{r treat2-theme-red, fig.height=8} +panelview(turnout ~ policy_edr + policy_mail_in + policy_motor, + data = turnout, index = c("abb", "year"), + xlab = "Year", ylab = "State", by.timing = TRUE, + pre.post = TRUE, theme = "red", + legend.labs = c("Control States", + "Treated States (before EDR)", + "Treated States (after EDR)")) +``` + +The `red` theme also handles panels with missing observations. On +`capacity` (treatment turns on and off; some unit-year cells are +unobserved), the heatmap reads as three layers --- gray untreated cells, +brick-red treated cells, and white missing cells --- with all three shown +in the legend. + +```{r treat2-theme-red-missing, fig.height=10} +panelview(Capacity ~ demo + lnpop + lngdp, + data = capacity, index = c("ccode", "year"), + main = "Democracy and State Capacity", + axis.lab.gap = c(2, 10), theme = "red") +``` + ## Collapsing units by treatment history When the number of units is large, `collapse.history = TRUE` collapses units that share the same treatment history into a single row. The y-axis shows the group size. diff --git a/tutorial/02-outcome.Rmd b/tutorial/02-outcome.Rmd index 94d4cc1..d5b1268 100644 --- a/tutorial/02-outcome.Rmd +++ b/tutorial/02-outcome.Rmd @@ -38,26 +38,44 @@ panelview(turnout ~ policy_edr + policy_mail_in + policy_motor, "Treated States (after EDR)")) ``` -Switch off the black-and-white theme with `theme.bw = FALSE`. - -```{r outcome1-3} -panelview(turnout ~ policy_edr + policy_mail_in + policy_motor, - data = turnout, index = c("abb", "year"), type = "outcome", - main = "EDR Reform and Turnout", theme.bw = FALSE) -``` - Customise line colors and legend labels simultaneously. ```{r outcome1-4} panelview(turnout ~ policy_edr + policy_mail_in + policy_motor, data = turnout, index = c("abb", "year"), type = "outcome", main = "EDR Reform and Turnout", - color = c("lightblue", "blue", "#99999950"), + color = c("lightblue", "blue", "grey75"), legend.labs = c("Control States", "Treated States (before EDR)", "Treated States (after EDR)")) ``` +## Themes and overlays + +Activate the high-contrast `red` theme for a publication-paper look. Under +`theme = "red"`, controls fade to a light gray, treated trajectories pick +up a brick-red accent, and the treatment-onset marker becomes a solid +black dashed line. + +```{r outcome1-3} +panelview(turnout ~ policy_edr + policy_mail_in + policy_motor, + data = turnout, index = c("abb", "year"), type = "outcome", + main = "EDR Reform and Turnout", theme = "red") +``` + +Overlay heavy group-mean trajectories and a 10--90% quantile ribbon with +`group.mean.overlay = TRUE`. Useful when there are many units per group +and you want the audience to see both individual variation and the group +average at a glance. (Currently implemented for the main DID continuous- +outcome path; combine with `theme = "red"` for the cleanest read.) + +```{r outcome1-3b} +panelview(turnout ~ policy_edr + policy_mail_in + policy_motor, + data = turnout, index = c("abb", "year"), type = "outcome", + main = "EDR Reform and Turnout", + theme = "red", group.mean.overlay = TRUE) +``` + ## Plotting a subset of units Use `id` to plot only specific units. @@ -71,13 +89,13 @@ panelview(turnout ~ policy_edr + policy_mail_in + policy_motor, ## Splitting by treatment-status group -`by.group = TRUE` separates units into groups based on whether their treatment status ever changed (e.g., always treated, always control, switchers). `cex.main` and `cex.main.sub` control title and subtitle font sizes. +`by.group = TRUE` separates units into groups based on whether their treatment status ever changed (e.g., always treated, always control, switchers). ```{r outcome1-6} panelview(turnout ~ policy_edr + policy_mail_in + policy_motor, data = turnout, index = c("abb", "year"), type = "outcome", main = "EDR Reform and Turnout", - by.group = TRUE, cex.main = 20, cex.main.sub = 15) + by.group = TRUE) ``` Use `by.group.side = TRUE` to arrange the subfigures in a row rather than a column. @@ -86,7 +104,7 @@ Use `by.group.side = TRUE` to arrange the subfigures in a row rather than a colu panelview(turnout ~ policy_edr + policy_mail_in + policy_motor, data = turnout, index = c("abb", "year"), type = "outcome", main = "EDR Reform and Turnout", - by.group.side = TRUE, cex.main = 20, cex.main.sub = 15) + by.group.side = TRUE) ``` ## Cohort-averaged trajectories diff --git a/tutorial/03-bivariate.Rmd b/tutorial/03-bivariate.Rmd index 24f5e11..22e6955 100644 --- a/tutorial/03-bivariate.Rmd +++ b/tutorial/03-bivariate.Rmd @@ -43,7 +43,7 @@ panelview(lnpop ~ demo, data = capacity, ```{r bivar1-3} panelview(Y ~ D, data = simdata, index = c("id", "time"), - type = "bivariate", theme.bw = FALSE, outcome.type = "discrete") + type = "bivariate", outcome.type = "discrete") ``` **Continuous Y, continuous D** — default `style = c("l", "l")`: @@ -71,6 +71,18 @@ panelview(turnout ~ policy_edr + policy_mail_in + policy_motor, style = c("line", "connected"), color = c(2, 3), lwd = 0.4) ``` +Activate the high-contrast `red` theme: the treatment line picks up the +brick-red accent and the outcome line drops to a neutral gray, so the +treatment series visually dominates. + +```{r bivar1-6b} +panelview(turnout ~ policy_edr + policy_mail_in + policy_motor, + data = turnout, index = c("abb", "year"), + xlab = "EDR", ylab = "Turnout", type = "bivariate", + style = c("c", "b"), theme = "red", + main = "EDR Reform and Turnout") +``` + ## By unit Set `by.unit = TRUE` to plot *Y* and *D* as separate panels for each unit. Use `show.id` or `id` to select units. @@ -96,7 +108,7 @@ panelview(lnpop ~ demo, data = capacity, ```{r bivar2-3} panelview(Y ~ D, data = simdata, index = c("id", "time"), type = "bivariate", - by.unit = TRUE, theme.bw = FALSE, + by.unit = TRUE, outcome.type = "discrete", id = unique(simdata$id)[1:12]) ``` @@ -106,7 +118,7 @@ panelview(Y ~ D, data = simdata, ```{r bivar2-4, fig.height=4} panelview(lnpop ~ polity2, data = capacity, index = c("country", "year"), type = "bivariate", - by.unit = TRUE, theme.bw = FALSE, + by.unit = TRUE, color = c("blue", "red"), show.id = 1:12) ``` @@ -126,7 +138,7 @@ Use `style = "line"` to draw both *Y* and *D* as lines: panelview(turnout ~ policy_edr + policy_mail_in + policy_motor, data = turnout, index = c("abb", "year"), type = "bivariate", by.unit = TRUE, - style = "line", theme.bw = FALSE, + style = "line", lwd = 0.5, show.id = 1:12, ylab = "Turnout") ``` @@ -134,6 +146,6 @@ panelview(turnout ~ policy_edr + policy_mail_in + policy_motor, panelview(Y ~ D, data = simdata, index = c("id", "time"), type = "bivariate", by.unit = TRUE, outcome.type = "discrete", - style = "line", theme.bw = FALSE, + style = "line", lwd = 0.4, id = unique(simdata$id)[1:20]) ``` diff --git a/tutorial/aa-changelog.Rmd b/tutorial/aa-changelog.Rmd index 1c2b821..6f5cd66 100644 --- a/tutorial/aa-changelog.Rmd +++ b/tutorial/aa-changelog.Rmd @@ -1,5 +1,16 @@ # Changelog {#sec-changelog .unnumbered} +## v1.3.1 + +(2026-05-13) + +* Plot defaults refreshed for a cleaner, publication-ready look: plain left-aligned titles at base size 11, white background for status heatmaps, subtle major-x gridlines for trajectories, and a thin gray dashed treatment-onset marker (replaces the prior thick white separator). +* Outcome-plot draw order fixed: control trajectories are now drawn first, treated-pre next, treated-post on top, so treated lines are no longer covered by the control band. +* New argument `theme = c("default", "red")`. The optional `"red"` theme switches the binary status palette to gray control / dark-gray treated-pre / brick-red treated-post, and the outcome onset marker to a solid black dashed line. The `"default"` palette also moves to a blue progression (gray control, medium blue treated-pre, dark navy treated-post) so red is reserved for `theme = "red"`. +* New argument `group.mean.overlay = FALSE`. When `TRUE`, the outcome plot dims per-unit trajectories and overlays a heavy group-mean line plus a 10–90% quantile ribbon per treatment-status group. Currently scoped to the main DID continuous-outcome path. +* `theme.bw = FALSE` is soft-deprecated. The legacy gray-panel look still works but emits a one-time warning; the tutorial no longer demonstrates it. +* CRAN release (2026-05-13). + ## v1.3.0 (2026-03-21) diff --git a/tutorial/index.qmd b/tutorial/index.qmd index 9cf5601..23d53dc 100644 --- a/tutorial/index.qmd +++ b/tutorial/index.qmd @@ -6,9 +6,9 @@ This manual serves as a user guide for the **panelView** package in R, which vis | Source | Version | Date | Features | |--------|---------|------|----------| -| CRAN | 1.2.1 | 2026-03-20 | Treatment, outcome, bivariate plots | -| GitHub (`master`) | 1.2.1 | 2026-03-20 | Same as CRAN | -| GitHub (`dev`) | 1.3.0 | 2026-03-21 | + `type = "network"`: k-partite graph, singletons, weighted edges | +| CRAN | 1.3.1 | 2026-05-13 | Refreshed plot defaults; `type = "network"`; `theme = "red"`; `group.mean.overlay` | +| GitHub (`master`) | 1.3.1 | 2026-05-13 | Same as CRAN | +| GitHub (`dev`) | 1.3.1 | 2026-05-13 | Same as CRAN | ```{r install-cran, eval = FALSE} # From CRAN (stable release) @@ -89,6 +89,8 @@ install.packages("igraph") | `color` | character | Override fill/line colors | | `legendOff` | logical | Remove the legend | | `gridOff` | logical | Remove grid lines (continuous treatment) | +| `theme` | character | `"default"` (blue) or `"red"` (high-contrast publication recipe) | +| `group.mean.overlay` | logical | Overlay group-mean line + 10--90% ribbon (`type = "outcome"`) | Note that *Y*, *D*, and *X* are merely labels; they can be any variables in a panel dataset. @@ -116,7 +118,7 @@ All three are balanced panels. The [network structure chapter](#sec-network) use - [Yiqing Xu](https://yiqingxu.org/){target="_blank"} - [Licheng Liu](https://liulch.github.io/){target="_blank"} - [Hongyu Mou](https://hongyumou.github.io/){target="_blank"} -- StatsClaw (Agentic System for Statistical Software Development) +- [StatsClaw](https://statsclaw.ai/){target="_blank"} (Agentic System for Statistical Software Development) ## Report bugs From 21a5119aa370fe9d58b4669a4048f4cbaae34cfd Mon Sep 17 00:00:00 2001 From: Yiqing Xu Date: Wed, 13 May 2026 23:17:08 -0700 Subject: [PATCH 07/10] docs: split panelview+theme example into explicit p <- assignment (#9) Saving the plot to a named object first makes the two-step pattern explicit for readers; mirrors how users will typically reach for ggplot2 layers in their own code. Co-authored-by: Claude Opus 4.7 (1M context) --- tutorial/01-treat.Rmd | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/tutorial/01-treat.Rmd b/tutorial/01-treat.Rmd index 9c07d60..23f586d 100644 --- a/tutorial/01-treat.Rmd +++ b/tutorial/01-treat.Rmd @@ -139,6 +139,27 @@ panelview(Capacity ~ demo + lnpop + lngdp, axis.lab.gap = c(2, 10), theme = "red") ``` +## Customizing with ggplot2 syntax + +`panelview()` returns a `ggplot` object for single-panel plots, so any +`ggplot2` layer can be added with the usual `+`. In v1.3.1 the default +title is left-aligned and plain; to recover a centered, bold title, add a +`theme(plot.title = ...)` layer: + +```{r treat2-ggplot-center-title, fig.height=8} +library(ggplot2) +p <- panelview(turnout ~ policy_edr + policy_mail_in + policy_motor, + data = turnout, index = c("abb", "year"), + main = "Election-Day Registration Adoption", + by.timing = TRUE, theme = "red") +p + theme(plot.title = element_text(hjust = 0.5, face = "bold")) +``` + +The same pattern works for other theme elements (axis text, legend +position, margins, etc.). Multi-panel layouts (`by.group`, +`by.group.side`, `by.cohort`, `by.unit`) are an exception: those return a +`gtable` produced by `grid.arrange()` and do not accept `+` composition. + ## Collapsing units by treatment history When the number of units is large, `collapse.history = TRUE` collapses units that share the same treatment history into a single row. The y-axis shows the group size. From d1e0f56ac68938826d1cefae355a7b41a55d3744 Mon Sep 17 00:00:00 2001 From: Yiqing Xu Date: Wed, 13 May 2026 23:22:10 -0700 Subject: [PATCH 08/10] ci: add R CMD check --as-cran on push and PR to dev/master 5-platform matrix (macOS, Windows, Linux release / devel / oldrel-1) using the r-lib/actions/check-r-package recipe with --as-cran flags. Co-Authored-By: Claude Opus 4.7 (1M context) --- .github/workflows/R-CMD-check.yaml | 55 ++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) create mode 100644 .github/workflows/R-CMD-check.yaml diff --git a/.github/workflows/R-CMD-check.yaml b/.github/workflows/R-CMD-check.yaml new file mode 100644 index 0000000..74e1296 --- /dev/null +++ b/.github/workflows/R-CMD-check.yaml @@ -0,0 +1,55 @@ +# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples +# Runs R CMD check --as-cran across the standard CRAN platform matrix on every +# push and PR targeting dev/master. + +on: + push: + branches: [main, master, dev] + pull_request: + branches: [main, master, dev] + +name: R-CMD-check.yaml + +permissions: read-all + +jobs: + R-CMD-check: + runs-on: ${{ matrix.config.os }} + + name: ${{ matrix.config.os }} (${{ matrix.config.r }}) + + strategy: + fail-fast: false + matrix: + config: + - {os: macos-latest, r: 'release'} + - {os: windows-latest, r: 'release'} + - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'} + - {os: ubuntu-latest, r: 'release'} + - {os: ubuntu-latest, r: 'oldrel-1'} + + env: + GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} + R_KEEP_PKG_SOURCE: yes + + steps: + - uses: actions/checkout@v4 + + - uses: r-lib/actions/setup-pandoc@v2 + + - uses: r-lib/actions/setup-r@v2 + with: + r-version: ${{ matrix.config.r }} + http-user-agent: ${{ matrix.config.http-user-agent }} + use-public-rspm: true + + - uses: r-lib/actions/setup-r-dependencies@v2 + with: + extra-packages: any::rcmdcheck + needs: check + + - uses: r-lib/actions/check-r-package@v2 + with: + upload-snapshots: true + build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")' + args: 'c("--no-manual", "--as-cran")' From 1d12facf4c53c1ff6eb3999154449694ae0e6457 Mon Sep 17 00:00:00 2001 From: Yiqing Xu Date: Wed, 13 May 2026 23:24:09 -0700 Subject: [PATCH 09/10] ci: scope R CMD check to main/master only Dev-side checks are run manually by the maintainer; CI gates only the final mile into master. Co-Authored-By: Claude Opus 4.7 (1M context) --- .github/workflows/R-CMD-check.yaml | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/.github/workflows/R-CMD-check.yaml b/.github/workflows/R-CMD-check.yaml index 74e1296..99957ab 100644 --- a/.github/workflows/R-CMD-check.yaml +++ b/.github/workflows/R-CMD-check.yaml @@ -1,12 +1,13 @@ # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples # Runs R CMD check --as-cran across the standard CRAN platform matrix on every -# push and PR targeting dev/master. +# push and PR targeting main/master only. Dev-side checks are run manually +# by the maintainer; CI gates only the final mile into master. on: push: - branches: [main, master, dev] + branches: [main, master] pull_request: - branches: [main, master, dev] + branches: [main, master] name: R-CMD-check.yaml From 373d8b782d378a54c72d719998eae6343d124d63 Mon Sep 17 00:00:00 2001 From: Yiqing Xu Date: Wed, 13 May 2026 23:25:35 -0700 Subject: [PATCH 10/10] ci: trim matrix to release-only on three platforms Drop ubuntu r-devel and r-oldrel-1; keep macOS, Windows, Ubuntu on R release. Maintainer runs devel/oldrel checks manually when needed. Co-Authored-By: Claude Opus 4.7 (1M context) --- .github/workflows/R-CMD-check.yaml | 2 -- 1 file changed, 2 deletions(-) diff --git a/.github/workflows/R-CMD-check.yaml b/.github/workflows/R-CMD-check.yaml index 99957ab..de0bb95 100644 --- a/.github/workflows/R-CMD-check.yaml +++ b/.github/workflows/R-CMD-check.yaml @@ -25,9 +25,7 @@ jobs: config: - {os: macos-latest, r: 'release'} - {os: windows-latest, r: 'release'} - - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'} - {os: ubuntu-latest, r: 'release'} - - {os: ubuntu-latest, r: 'oldrel-1'} env: GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}