feat: add extra columns and on conflict update #4

Merged
sncalvo merged 1 commit into main from feat/add-extra-columns on May 26, 2026
@sncalvo (Member) commented Apr 21, 2026

Description

Adds two related features to the staging workflow:

Extra columns on the staging table. Declare additional columns that exist only on the staging table — not on the source model — via an extra_columns: option. Useful for tracking import metadata, priorities, batch identifiers, or processing flags that shouldn't be persisted to the final destination. Each column is specified as either a simple type symbol or a hash with :type, :default, and :null options. Supported types cover the common ActiveRecord set (:string, :text, :integer, :bigint, :float, :decimal, :boolean, :datetime, :date, :time, :binary, :json, :jsonb, :uuid), with per-adapter SQL type mapping for PostgreSQL, MySQL, and SQLite. Extra columns are automatically excluded during transfer to the source table — the transfer strategies (Insert and Upsert) now intersect staging and source column names so any staging-only column is silently skipped, even if added outside this feature.
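A minimal sketch of how the two column-spec forms and the transfer-time intersection could work. The helper names `normalize_extra_columns` and `transferable_columns`, and the default of `null: true` with no column default, are assumptions made for illustration; they are not taken from the diff.

```ruby
# Hypothetical helpers, not the gem's actual code.
SUPPORTED_TYPES = %i[
  string text integer bigint float decimal boolean
  datetime date time binary json jsonb uuid
].freeze

# Turns {priority: :integer, score: {type: :integer, default: 0}} into a
# uniform hash where every column has :type, :default, and :null.
def normalize_extra_columns(extra_columns)
  extra_columns.to_h do |name, spec|
    # A bare symbol such as :integer is shorthand for {type: :integer}.
    spec = {type: spec} if spec.is_a?(Symbol)
    type = spec.fetch(:type)
    unless SUPPORTED_TYPES.include?(type)
      raise ArgumentError, "unsupported type #{type.inspect} for #{name}"
    end
    # Assumption: nullable with no default unless the caller says otherwise.
    [name.to_s, {type: type, default: spec[:default], null: spec.fetch(:null, true)}]
  end
end

# Columns copied at transfer time: only names present on both tables,
# so any staging-only column drops out without extra bookkeeping.
def transferable_columns(staging_columns, source_columns)
  staging_columns & source_columns
end
```

With staging columns `id, email, name, priority, score` and source columns `id, email, name, age`, the intersection keeps `id, email, name`, which is why `priority` and `score` never reach the destination table.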

Declarative conflict resolution for staging inserts. A new insert_on_conflict: option lets you aggregate data into the staging table across multiple inserts with explicit per-column strategies instead of hand-written SQL. Strategies include :greatest, :least, :new, :existing, :sum, :coalesce, and raw SQL passthrough. The logic lives in a new ConflictResolver class that generates adapter-specific clauses: ON CONFLICT … DO UPDATE SET … for PostgreSQL and SQLite, ON DUPLICATE KEY UPDATE … for MySQL. Validation happens at construction time so misconfiguration surfaces early.
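To make the strategy-to-SQL mapping concrete, here is a rough sketch of a PostgreSQL clause builder. It is not the ConflictResolver from this PR: the method name `pg_on_conflict_clause`, the argument order inside `COALESCE`, and the `DO NOTHING` fallback are assumptions, and identifier quoting is omitted for brevity.

```ruby
# Hypothetical sketch of per-column conflict strategies for PostgreSQL.
# A real implementation would quote identifiers through the adapter.
def pg_on_conflict_clause(table:, target:, update:)
  raise ArgumentError, "target must not be empty" if target.empty?

  assignments = update.filter_map do |column, strategy|
    expr =
      case strategy
      when :existing then nil # keep the stored value: emit no assignment
      when :new      then "EXCLUDED.#{column}"
      when :greatest then "GREATEST(#{table}.#{column}, EXCLUDED.#{column})"
      when :least    then "LEAST(#{table}.#{column}, EXCLUDED.#{column})"
      when :sum      then "#{table}.#{column} + EXCLUDED.#{column}"
      # Assumption: prefer the incoming value, fall back to the stored one.
      when :coalesce then "COALESCE(EXCLUDED.#{column}, #{table}.#{column})"
      when String    then strategy # raw SQL passthrough, used verbatim
      else raise ArgumentError, "unknown strategy #{strategy.inspect} for #{column}"
      end
    "#{column} = #{expr}" if expr
  end

  conflict_target = "ON CONFLICT (#{target.join(', ')})"
  return "#{conflict_target} DO NOTHING" if assignments.empty?

  "#{conflict_target} DO UPDATE SET #{assignments.join(', ')}"
end
```

Columns marked `:existing` simply produce no assignment, so the stored value is left alone. SQLite has no `GREATEST` or `LEAST`, so an adapter for it would need different expressions (for example the two-argument scalar `MAX` and `MIN`).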

Together these let callers use the staging table as a scratchpad for multi-source aggregation — dedupe, merge, and enrich records before transfer — without the result columns leaking into the destination schema.

Example

StagingTable.stage(User,
  extra_columns: {
    priority: :integer,
    score: {type: :integer, default: 0},
    import_batch: {type: :string, default: "batch_1"}
  },
  insert_on_conflict: {
    target: [:email],
    update: {
      priority: :greatest,   # keep the larger value
      score: :sum,           # add values together
      name: :new,            # overwrite with incoming
      age: :existing         # keep existing (skip this column)
    }
  }
) do |staging|
  # First source
  staging.insert([
    {email: "john@example.com", name: "John", age: 30, priority: 1, score: 10}
  ])

  # Second source — same email, conflict resolved per-column
  staging.insert([
    {email: "john@example.com", name: "Johnny", age: 99, priority: 5, score: 20}
  ])

  # Query using the extra columns
  staging.where(priority: 5).find_each do |record|
    # record.priority => 5  (greatest)
    # record.score    => 30 (sum)
    # record.name     => "Johnny" (new)
    # record.age      => 30 (existing)
  end

  # Flag rows before transfer
  staging.where(score: 30..).update_all(import_batch: "high_value")
end
# priority, score, and import_batch are automatically excluded
# from the INSERT INTO users … SELECT … FROM staging_users_… transfer.

Review Notes

  • Adapter coverage. All three supported adapters (PostgreSQL, MySQL, SQLite) have their own sql_type_for and quote_default overrides because types and boolean literals differ. Changes to the shared Base defaults will cascade unless explicitly overridden per adapter.
  • MySQL conflict-resolution semantics. ON DUPLICATE KEY UPDATE fires on any violated unique constraint on the table, not only the ones listed in :target. If a staging table has multiple unique indexes, the update may trigger on conflicts the caller didn't intend. The README calls this out explicitly. PostgreSQL and SQLite use :target literally via ON CONFLICT (col, …).
  • Transfer filter is a safety net. The staging.column_names & source.column_names intersection in both transfer strategies protects against any staging-only column — not just extra_columns. This means extra columns don't need special bookkeeping at transfer time, and the feature composes cleanly with future additions.
  • Raw SQL strategy passthrough. The insert_on_conflict :update hash accepts a raw SQL string as a strategy for cases the built-in strategies don't cover. This is interpolated verbatim — callers must not pass user-controlled input. Intended for developer-authored expressions like "COALESCE(excluded.col, staging.col) * 2".
  • Validation timing. ConflictResolver validates options in its constructor, so a malformed insert_on_conflict: raises ConfigurationError at StagingTable.stage call time rather than when the first INSERT runs. Misconfigurations won't silently ship bad SQL.
  • Test strategy. ConflictResolver is covered by a dedicated unit spec that asserts generated SQL shape per adapter via shared examples. extra_columns_spec.rb runs integration tests through the full StagingTable.stage path on real databases. MySQL examples are marked pending in CI environments where MySQL isn't available; they run only when a MySQL server is reachable.
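The MySQL note above is easier to see in code. The following is an illustrative sketch only (the method name `mysql_on_duplicate_clause` is hypothetical, and it uses the older `VALUES(col)` form rather than the row-alias syntax newer MySQL versions prefer): the generated clause has no place to name a conflict target, so any violated unique index triggers it.

```ruby
# Hypothetical sketch of the MySQL variant. There is no :target parameter
# because ON DUPLICATE KEY UPDATE cannot name the index that conflicted.
def mysql_on_duplicate_clause(update)
  assignments = update.filter_map do |column, strategy|
    case strategy
    when :existing then nil # keep the stored value: emit no assignment
    when :new      then "#{column} = VALUES(#{column})"
    when :greatest then "#{column} = GREATEST(#{column}, VALUES(#{column}))"
    when :least    then "#{column} = LEAST(#{column}, VALUES(#{column}))"
    when :sum      then "#{column} = #{column} + VALUES(#{column})"
    when String    then "#{column} = #{strategy}" # raw SQL, used verbatim
    else raise ArgumentError, "unknown strategy #{strategy.inspect} for #{column}"
    end
  end
  # MySQL needs at least one assignment after ON DUPLICATE KEY UPDATE.
  raise ArgumentError, "no columns to update" if assignments.empty?

  "ON DUPLICATE KEY UPDATE #{assignments.join(', ')}"
end
```

Because the clause is target-agnostic, a staging table that will be used with MySQL is safest when it carries exactly one unique index, the one the caller means to conflict on.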

@sncalvo force-pushed the feat/add-extra-columns branch 2 times, most recently from c8ca964 to d86c011 (April 21, 2026 15:06)
@sncalvo changed the title from "feat: add extra columns" to "feat: add extra columns and on conflict update" (Apr 21, 2026)
@sncalvo force-pushed the feat/add-extra-columns branch from d86c011 to be37c7e (April 21, 2026 15:17)
@sncalvo marked this pull request as ready for review (April 21, 2026 15:22)
@sncalvo merged commit bdfc3eb into main (May 26, 2026)
11 checks passed