Draft. I have a proof-of-concept implementation for PostgreSQL that I'm getting into a publishable state and will share here soon. It already works end to end, so this proposal is based on an actual artifact, not just handwaving through the idea.
Summary
Add a backend-agnostic direct codec layer to persistent that bypasses PersistValue entirely. On the decode side, typed query results go from the database's wire format to Haskell records without intermediate representations. On the encode side, Haskell values go from record fields to the wire format without detouring through PersistValue.
The decode path uses continuation-passing style (CPS) throughout, so the success path allocates zero Either constructors and zero PersistValue wrappers. The encode path operates similarly, but in the opposite direction.
The existing PersistValue-based path is unchanged. All current code continues to compile and run without modification.
Motivation
Allocation overhead
Every row decoded from any persistent backend follows this path today:
```mermaid
flowchart LR
    A["wire format"] -->|"decode to PersistValue"| B["PersistValue per column"]
    B -->|"cons cell"| C["[PersistValue]"]
    C -->|"fromPersistValue per field"| D["Haskell record"]
```
For a 10-column row, this allocates:
- 10 `PersistValue` heap objects (tagged union, 2+ words each)
- 10 list cons cells (3 words each)
- ~9 intermediate `Either Text (a -> b -> ...)` values from the applicative `fromPersistValues` chain
- Then `fromPersistValue` pattern-matches each `PersistValue` again to extract the payload
That works out to roughly 70 words (~560 bytes) of pure overhead per row, all short-lived. For a 100k row result set, this produces ~56MB of intermediate garbage that exists only to be collected.
Encoding has a symmetric problem: toPersistFields converts each record field into a PersistValue, then the backend immediately inspects it to produce the wire format. Every insert and update pays this cost.
Limited type vocabulary
PersistValue has a fixed set of constructors that cannot represent backend-specific types natively. This design dates from the early 2010s, when the most common backends were MySQL and PostgreSQL and the most common use cases were simple CRUD operations. Production-grade applications today often need richer types with database-specific support. To support these types, we currently have to shoehorn them into the existing constructors, losing type safety and precision. Examples from the PostgreSQL backend include:
- JSON/JSONB: stored as `PersistByteString`, losing the distinction between JSON and raw bytes
- UUID: stored as `PersistLiteralEscaped` with hex-encoded bytes, requiring a text-to-UUID conversion on each read
- Intervals: stored as `PersistRational`, losing semantic meaning (PostgreSQL's interval has year/month/day/time components that `Rational` cannot faithfully represent)
- Composite types: PostgreSQL composite values have no `PersistValue` representation at all
- Inet/cidr: IP address types round-trip through `PersistLiteralEscaped`
- Arrays of custom types: `PersistList`/`PersistArray` elements must themselves be `PersistValue`, so arrays of domain types require double conversion
- Enums: backend enums are sent as text with no type safety at the Haskell boundary
Furthermore, heavy use of the custom type support makes it harder to improve the underlying library. For example, we could switch to the binary protocol that PostgreSQL supports, but every custom value would need to be converted to the binary protocol, or at least audited for compatibility.
This means that nearly any change to the underlying library constitutes a breaking change for all users of the library, which is unsustainable long-term.
With the proposed FieldDecode/FieldEncode classes, backend packages can provide instances for any Haskell type directly, and the type mapping is no longer constrained by PersistValue's fixed vocabulary. This gives us significant flexibility to improve the underlying library without introducing breaking changes for users.
Runtime type checks
fromPersistValue performs runtime pattern matching on the PersistValue constructor at every field — effectively a type check that the schema already guarantees at compile time. The direct path eliminates this redundancy: TH generates code that calls the correct decoder for each field's statically-known type.
Extensibility without breaking changes
When a database adds a new type (or a backend author wants to support an existing type more faithfully), persistent's current architecture requires either adding a constructor to PersistValue (a breaking change to every backend and every PersistField instance) or cramming the value into an existing constructor like PersistLiteralEscaped (lossy, with no type safety).
With FieldDecode/FieldEncode, a backend package can add support for any new type by publishing a new instance, without touching persistent core, without coordinating with other backends, and without breaking any downstream users. A PostgreSQL backend could add FieldDecode PgRowEnv HStore or FieldDecode PgRowEnv TSVector the day PostgreSQL ships them. Users who don't reference these types are completely unaffected.
Stable SQL for prepared statements and pipelining
Today, persistent generates bulk inserts by expanding INSERT INTO t VALUES (?,?,?), (?,?,?), ... with one bind parameter per value. For 10k rows of 10 columns, that's 100k bind parameters and a SQL string whose length changes with every batch — defeating prepared statement caching and forcing the database to re-plan the query each time.
Similarly, WHERE id IN (?,?,?,...) expands to a variable number of bind parameters.
The direct encode path opens the door to column-oriented encoding: a single fixed SQL template like INSERT ... SELECT * FROM UNNEST($1, $2, ...) where each parameter is an array of all values for one column, or WHERE id = ANY($1) with a single array parameter. The SQL string stays the same regardless of how many rows or values are involved, which is a prerequisite for proper prepared statement reuse.1
I propose introducing three new types/classes in persistent core, all backend-agnostic:
```haskell
-- CPS column-cursor monad
newtype RowReader env a = RowReader
  { unRowReader
      :: forall r. env -> Counter
      -> (Text -> IO r)  -- on error
      -> (a -> IO r)     -- on success
      -> IO r
  }

-- Per-field direct decoding, split into prepare + run.
class FieldDecode env a where
  -- | Inspect column metadata once per result set, returning a
  -- specialized runner that can decode every row without
  -- re-checking types.
  prepareField
    :: env -> FieldNameDB -> Int
    -> (Text -> IO r) -> (FieldRunner env a -> IO r) -> IO r

-- | The product of 'prepareField'. Runs on each row with only the
-- row-varying data (row index, raw bytes, etc.) – column type
-- dispatch has already happened.
newtype FieldRunner env a = FieldRunner
  { runField :: forall r. env -> (Text -> IO r) -> (a -> IO r) -> IO r
  }

-- Per-entity direct decoding. TH generates instances.
class FromRow env a where
  rowReader :: RowReader env a
```
env is an opaque, backend-specific type that carries whatever the backend needs to read a column: a result pointer and row index for SQL databases, a BSON document for MongoDB, a key-value list for Redis, etc.
The key insight behind the prepare/run split is that within a single result set, the column types are fixed: every row has the same OIDs (PostgreSQL), the same field types (MySQL), or the same BSON structure (MongoDB). The current design would check the column type on every row × every column. With prepareField, the type dispatch happens once when the first row (or the result metadata) is inspected, producing a FieldRunner that has already resolved which decoder to use. The per-row hot loop then just calls runField without branching on column types at all.
For a 10-column entity over 100k rows, this eliminates a million branches that all take the same path.
FieldDecode takes both a FieldNameDB (for document stores that look up fields by name) and an Int column index (for SQL backends that read by position). Each backend uses whichever access pattern is natural and ignores the other.
Why CPS?
A naive IO (Either Text a) return type allocates a Right constructor at every step in the applicative chain, even on the success path where the error case is never used. With CPS, the success continuation is passed directly and the Either is never constructed:
```haskell
RowReader ff <*> RowReader fa = RowReader $ \env ctr onErr onOk ->
  ff env ctr onErr $ \f ->
    fa env ctr onErr $ \a ->
      onOk (f a)
```
Both prepareField and FieldRunner are CPS too, so the backend's decoder feeds its result directly into the continuation. Callers use runRowReaderCPS to supply their success and failure actions without materializing unnecessary values:
```haskell
val <- runRowReaderCPS rowReader env ctr
         (throwIO . PersistMarshalError)  -- on error: throw directly
         pure                             -- on success: return value directly
yield val
```
The entire path from wire format to yield produces zero Either constructors and zero PersistValue wrappers.
TH generates FromRow alongside PersistEntity
For an entity User { userName :: Text, userAge :: Maybe Int }, mkPersist generates both the existing PersistEntity instance (unchanged) and a new FromRow instance:
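Concretely, the generated instance looks roughly like this (a sketch based on the prototype; `nextField` is an assumed helper that reads the field at the current counter position and advances the counter):

```haskell
instance (FieldDecode env Text, FieldDecode env (Maybe Int))
  => FromRow env User where
  rowReader = User <$> nextField <*> nextField
```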
Under the hood, nextField calls prepareField on the first row and caches the resulting FieldRunner, then applies runField for each subsequent row. The instance is polymorphic over env — the FieldDecode constraints resolve when a concrete backend is chosen, and the entity definition itself remains backend-agnostic.
Backend implementations
Each backend provides its own env type and FieldDecode instances. The same TH-generated FromRow code works across all of them without modification.
PostgreSQL
I've got a prototype implementation that fully passes all tests for persistent and esqueleto. The env environment value wraps a PGresult handle, a row index, and a vector of column OIDs classified into a PgType ADT. FieldDecode instances inspect the PgType once during prepareField and return a FieldRunner that calls the appropriate postgresql-binary decoder directly, without re-checking the OID on each row. Instances cover Bool, Int16–Int64, Int, Double, Scientific, Rational, Text, ByteString, Day, TimeOfDay, UTCTime, and Maybe a. Backend-specific types like UUID, IPRange, DiffTime, Value (JSON), and composite types can be added as further instances without any changes to persistent core.
```haskell
-- Sketch (simplified from the prototype):
instance FieldDecode PgRowEnv Text where
  prepareField env _ col onErr onOk = do
    pgType <- columnType env col
    case pgType of
      ScalarPgText ->
        onOk (FieldRunner $ \env' onErr' onOk' ->
          readBytes env' col >>= decodeWith textDecoder onErr' onOk')
      ScalarPgVarchar ->
        onOk (FieldRunner $ \env' onErr' onOk' ->
          readBytes env' col >>= decodeWith textDecoder onErr' onOk')
      _ -> onErr ("type mismatch: expected text, got " <> show pgType)
```
The case pgType of branch runs once. On subsequent rows, only the FieldRunner executes, since it already knows which decoder to use.
SQLite
The environment wraps a Sqlite.Statement. SQLite's dynamic typing means the column type can technically vary per row, so prepareField here is lightweight, but it still validates that the column index is in range and captures the statement reference, keeping the per-row FieldRunner simple.
```haskell
instance FieldDecode SqliteRowEnv Text where
  prepareField (SqliteRowEnv stmt) _ col onErr onOk =
    onOk $ FieldRunner $ \(SqliteRowEnv stmt') onErr' onOk' -> do
      ty <- Sqlite.columnType stmt' (colIdx col)
      case ty of
        Sqlite.NullColumn -> onErr' "unexpected NULL for Text"
        _ -> Sqlite.columnText stmt' (colIdx col) >>= onOk'
```
MySQL
The environment wraps a vector of MySQLBase.Field metadata and a vector of Maybe ByteString row data. prepareField captures the field metadata once, and the returned FieldRunner uses it to call MySQL.convert on each row's data without re-reading the metadata.
```haskell
instance FieldDecode MySQLRowEnv Text where
  prepareField env _ col onErr onOk =
    let field = mysqlFields env V.! col
    in onOk $ FieldRunner $ \env' onErr' onOk' ->
         case mysqlRow env' V.! col of
           Nothing -> onErr' "unexpected NULL for Text"
           Just bs -> case MySQL.convert field bs of
             Just t  -> onOk' t
             Nothing -> onErr' "MySQL: cannot convert to Text"
```
MongoDB
The environment wraps a BSON Document. FieldDecode looks up fields by name (ignoring the column index), which is why the class takes both parameters. For MongoDB, the prepare step is essentially a no-op since each document is self-describing, but the split still keeps the interface uniform.
```haskell
instance FieldDecode MongoRowEnv Text where
  prepareField _ name _ onErr onOk =
    onOk $ FieldRunner $ \(MongoRowEnv doc) onErr' onOk' ->
      case DB.look (unFieldNameDB name) doc of
        Just (DB.String t) -> onOk' t
        Just DB.Null -> onErr' "unexpected NULL for Text"
        Nothing -> onErr' ("missing field: " <> unFieldNameDB name)
        _ -> onErr' "expected String"

-- Distinguishes absent fields from null fields:
instance FieldDecode MongoRowEnv a => FieldDecode MongoRowEnv (Maybe a) where
  prepareField env name col onErr onOk =
    prepareField env name col onErr $ \inner ->
      onOk $ FieldRunner $ \(MongoRowEnv doc) onErr' onOk' ->
        case DB.look (unFieldNameDB name) doc of
          Nothing -> onOk' Nothing
          Just DB.Null -> onOk' Nothing
          _ -> runField inner (MongoRowEnv doc) onErr' (onOk' . Just)
```
Redis
The environment wraps binary-encoded key-value pairs. Like MongoDB, FieldDecode looks up by field name. The prepare step captures the encoded field name to avoid re-encoding it on every lookup.
```haskell
instance FieldDecode RedisRowEnv Text where
  prepareField _ name _ onErr onOk =
    let key = encodeUtf8 (unFieldNameDB name)
    in onOk $ FieldRunner $ \(RedisRowEnv pairs) onErr' onOk' ->
         case V.find (\(k, _) -> k == key) pairs of
           Just (_, bs) -> case Binary.decode (L.fromStrict bs) of
             BinPersistText t -> onOk' t
             _ -> onErr' "expected Text"
           Nothing -> onErr' ("missing field: " <> unFieldNameDB name)
```
How FromRow unifies all backends
The same TH-generated instance resolves to different concrete code depending on the env type. The prepare/run split means that backends with uniform column types (PostgreSQL, MySQL) get the type dispatch out of the hot loop, while document stores (MongoDB, Redis) still benefit from the uniform interface even though their "prepare" step is lighter.
| `env` | What `prepareField` inspects | What `FieldRunner` reads from |
|---|---|---|
| PostgreSQL | column OID from `PGresult` metadata | binary row data via `postgresql-binary` decoders |
| SQLite | (lightweight — validates column index) | `sqlite3_column_*` C API per row |
| MySQL | `MySQLBase.Field` metadata | `Maybe ByteString` row data + `MySQL.convert` |
| MongoDB | (no-op — documents are self-describing) | `DB.look` on BSON `Document` (by field name) |
| Redis | encodes field name to `ByteString` | `Binary.decode` from key-value pair |
The entity definition contains no backend-specific code, and PersistValue appears nowhere in the decode path.
Specialization: eliminating dictionary overhead
The FromRow and FieldDecode instances are polymorphic over env, which means that in the general case GHC passes typeclass dictionaries at runtime: an indirect function call per field, per row. For a 10-column entity over 100k rows, that's a million indirect calls that could be direct invocations instead.
The standard fix is SPECIALIZE pragmas. Backend packages can emit specializations for their concrete env type, and we will extend TH to generate them automatically from the MkPersistSettings.
In order to achieve this, mkPersist will gain a new configuration field mpsDirectEnvTypes :: [Name] for environments that consumers want to be specialized. When this list contains 'PgRowEnv, the generated FromRow instance for an entity User looks like:
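For example (a sketch; the exact pragma form GHC accepts inside an instance body is `SPECIALIZE instance`, though the generated output may differ in detail):

```haskell
instance (FieldDecode env Text, FieldDecode env (Maybe Int))
  => FromRow env User where
  {-# SPECIALIZE instance FromRow PgRowEnv User #-}
  rowReader = User <$> nextField <*> nextField
```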
The pragma is placed inside the body of the instance declaration. When GHC sees it, it creates a monomorphic copy of rowReader @PgRowEnv @User, and is then able to inline all the FieldRunner calls and eliminate dictionary indirection entirely. The per-row loop collapses to a straight sequence of concrete decoder calls with no polymorphism left at runtime.
Because the list lives in MkPersistSettings, code that uses persistent against several different backends from a single module (e.g. one module that needs both PgRowEnv and SqliteRowEnv) can set mpsDirectEnvTypes = [''PgRowEnv, ''SqliteRowEnv] and TH will emit specializations for both. Backends that don't appear in the list do not get specialized code generated; yet the instance remains polymorphic and continues to compile normally.
This matters most for the FieldRunner closures produced by prepareField. After specialization, GHC can see through the closure and inline the decoder body directly into the row-reading loop, which in turn enables further optimizations like unboxing intermediate results and eliminating redundant null checks across adjacent fields. Without specialization, the closure is opaque to the optimizer because its concrete type isn't known at the call site.
The encode path benefits similarly: ToRow's toRowBuilder and each FieldEncode instance can be specialized per Param type as soon as TH generates SPECIALIZE pragmas for the encoding side.
Each backend implements HasDirectQuery (naming subject to change; "Direct" just indicates that we're bypassing PersistValue) to send a query and yield one Env backend per result row. The conduit consumer runs prepareField on the first row (or the result metadata) to obtain FieldRunners, then applies them to every subsequent row via runField. The type dispatch happens once and the per-row loop is a straight-line decode.
HasDirectInsert is the encoding counterpart, accepting pre-encoded parameters and sending them to the database. A backend instance for PostgreSQL, for example, would define type Param PgBackend = PgParam where PgParam carries the OID, binary payload, and format code for each parameter.
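As a sketch, the two classes might look like this (signatures are illustrative; only the names `HasDirectQuery`, `HasDirectInsert`, `Env`, and `Param` come from this proposal):

```haskell
class HasDirectQuery backend where
  -- The backend-specific row environment (PgRowEnv, SqliteRowEnv, ...).
  type Env backend
  -- Send a query and yield one 'Env backend' per result row.
  directQueryRows
    :: Text -> [PersistValue]
    -> backend
    -> Acquire (ConduitM () (Env backend) IO ())

class HasDirectInsert backend where
  -- The backend-specific pre-encoded parameter type.
  type Param backend
  -- Send pre-encoded parameters with a fixed SQL template.
  directExecute :: Text -> SmallArray (Param backend) -> backend -> IO ()
```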
User-facing API
In order to make migration as straightforward as possible, the .Experimental modules export the same function names as the originals, with additional constraints that indicate that the direct path should be used:
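On the esqueleto side, the class in question is SqlSelectDirect; a provisional sketch (the name comes from this proposal, the shape is illustrative):

```haskell
-- Produces a row reader for result type 'r', given the query's
-- selected SQL expression 'a', parameterized by the backend's 'env'.
class SqlSelectDirect a r env where
  sqlSelectDirectReader :: a -> RowReader env r
```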
This class is parameterized by env rather than backend — the mapping from backend to env happens at the call site via the Env type family. This means SqlSelectDirect instances are written per environment type, which is the right granularity: the row format depends on the env, not on which backend wrapper is in use.
Instances for Entity, Value, Maybe (Entity), and tuples. The .Experimental.Direct module exports select with the extra constraint — same name, same query DSL:
```haskell
-- Database.Esqueleto.Experimental.Direct:
import Database.Esqueleto.Experimental.Direct

users <- select $ do
  p <- from $ table @Person
  where_ (p ^. PersonAge >=. val 18)
  return p
-- Identical syntax. The direct path is chosen by the import, not the function name.
```
The backend type variable already carried by esqueleto resolves Env backend via the associated type family in HasDirectQuery, which then satisfies the SqlSelectDirect a r (Env backend) constraint.
Design: Encoding
The decode side has a working prototype. The encode side follows the same principles.
The problem
```mermaid
flowchart LR
    A["Haskell record"] -->|"toPersistFields"| B["[PersistValue]"]
    B -->|"encode per field"| C["wire format"]
```
Every field is boxed into PersistValue and then immediately unboxed. For bulk inserts of 10k rows × 10 columns, that's 100k unnecessary PersistValue allocations.
FieldEncode: one class, one method
```haskell
class FieldEncode param a where
  encodeField :: a -> param
```
param is a backend-specific encoded parameter type. Each backend decides what param looks like: a PostgreSQL backend might use a type carrying an OID, encoded bytes, and a format tag; an SQLite backend might use a sum type mirroring SQLite's type affinity (SqliteInt !Int64 | SqliteText !Text | ...).
The class is deliberately minimal: one class, one method. Even so, it enables a wide range of backends to be supported without breaking changes.
Alternative: contravariant encoders (hasql-style)
It's worth considering the approach that hasql takes here, which uses contravariant functors to compose encoders:
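For readers unfamiliar with it, a hasql encoder for our running User example looks roughly like this (assuming hasql's `Hasql.Encoders` module; `userAge` is widened to `Int64` here to fit hasql's `int8` primitive):

```haskell
import Data.Functor.Contravariant ((>$<))
import qualified Hasql.Encoders as E

-- data User = User { userName :: Text, userAge :: Maybe Int64 }
userParams :: E.Params User
userParams =
     (userName >$< E.param (E.nonNullable E.text))
  <> (userAge  >$< E.param (E.nullable E.int8))
```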
The >$< operator (contramap from Contravariant) lets you project a field out of a record and feed it into an encoder, and <> sequences them. The result is a single Params User value that knows how to encode an entire record in one pass.
This approach is advantageous in several ways:
- The encoder is a first-class value that can be composed, stored, and reused; you can build an encoder for a composite type by combining encoders for its parts, and the types enforce that every field is accounted for.
- It separates the description of the encoding from the execution, which pairs well with the prepare/run split on the decode side: you could prepare a Params once and run it per row.
My main concern around this is the learning curve: Contravariant and Divisible are less familiar than Applicative to most Haskell developers, and the corpus of good intuition around them is a bit lacking. For a library like persistent, whose user base spans from beginners to experts, that friction matters. hasql's encoding API is one of the things I've found to be a barrier to its adoption, even though I'm relatively fluent in Haskell and appreciate the type safety it provides.
That said, if we're already asking users to change imports and adopt new constraints, the marginal cost of learning contravariant composition might be acceptable? Especially since TH can generate the encoders automatically for Entity types, meaning most users would only encounter the raw API when writing custom queries. The generated code would look something like:
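In that world, the TH output might look like the following (purely illustrative; `userEncoder` and `fieldEncoder` are assumed names for a hypothetical contravariant analogue of the builder API, not an existing interface):

```haskell
-- Hypothetical TH-generated contravariant encoder for User.
userEncoder :: (FieldEncode param Text, FieldEncode param (Maybe Int))
            => Params param User
userEncoder =
     (userName >$< fieldEncoder)
  <> (userAge  >$< fieldEncoder)
```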
This is an area where we should probably prototype both approaches and see which one leads to clearer error messages and more natural composition in practice. The simple FieldEncode class is easier to explain and implement first, but if we're going to make breaking changes at some point anyway, the contravariant approach might be the better long-term bet. It may also be the case that we can build a small DSL on top of Contravariant that doesn't feel as foreign to users, which may give us the best of both worlds.
ToRow: TH-generated, produces a builder
```haskell
-- Writes encoded params into a SmallMutableArray, avoiding intermediate lists.
newtype ParamBuilder param =
  ParamBuilder (SmallMutableArray RealWorld param -> Int -> IO Int)

instance Monoid (ParamBuilder param)

writeParam  :: FieldEncode param a => a -> ParamBuilder param
buildParams :: Int -> ParamBuilder param -> IO (SmallArray param)
```
TH generates:
```haskell
instance (FieldEncode param Text, FieldEncode param (Maybe Int))
  => ToRow param User where
  toRowBuilder (User name age) = writeParam name <> writeParam age
```
Usage:
```haskell
params <- buildParams 2 (toRowBuilder user)
-- params :: SmallArray param, ready for the backend
```
Typed query parameters
FieldEncode also gives us typed query parameters without routing through [PersistValue]:
```haskell
rawQueryDirectTyped @(Entity User)
  "SELECT ?? FROM user WHERE age > $1 AND name LIKE $2"
  (writeParam (18 :: Int) <> writeParam ("A%" :: Text))
```
Each Haskell value goes directly to the backend's encoded format, skipping toPersistValue entirely.
Column-oriented encoding: UNNEST and = ANY
The direct encode path is designed to support column-oriented parameter encoding, which enables two important patterns:
- Bulk inserts via UNNEST: instead of `INSERT INTO t VALUES (?,?,?), (?,?,?), ...` with a dynamic SQL string, use `INSERT INTO t (c1, c2, ...) SELECT * FROM UNNEST($1, $2, ...)` where each `$N` is an array containing all values for that column across all rows. The SQL template is fixed regardless of batch size, enabling prepared statement reuse.
- IN-clause via = ANY: instead of `WHERE id IN (?,?,?,...)` with a variable number of parameters, use `WHERE id = ANY($1)` with one array parameter. Again, the SQL is fixed.
Both patterns require encoding a collection of Haskell values directly into the database's binary array format:
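A sketch of what the bulk-insert path could look like under this design (all names here, `rawExecuteDirect` and `encodeArray`, are assumptions for illustration, not an existing API):

```haskell
-- Hypothetical column-oriented bulk insert: one binary array per column.
insertManyUnnest :: [User] -> SqlPersistT IO ()
insertManyUnnest users =
  rawExecuteDirect
    "INSERT INTO users (name, age) SELECT * FROM UNNEST($1, $2)"
    [ encodeArray (map userName users)  -- $1: text[]
    , encodeArray (map userAge  users)  -- $2: int[]
    ]
```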
For 10k rows × 10 columns, this means that instead of 100k PersistValue objects transposed into PersistArray lists, the direct path builds 10 binary column arrays from the record fields with no intermediate representation.
Allocation comparison (decode path)
For a 10-column entity, per row:
| | PersistValue path | Direct CPS path |
|---|---|---|
| `PersistValue` objects | 10 | 0 |
| List cons cells | 10 | 0 |
| `Either` from applicative chain | ~9 | 0 |
| `Either` from field decode | 10 | 0 (CPS) |
| `Either` at boundary | 0 | 0 (`runRowReaderCPS`) |
| Boxed `Int` (column counter) | 10 | 0 (unboxed counter) |
| Column type dispatch | 10 per row | 10 once (prepare/run) |
| Total intermediate objects | ~49 | 0 |
For 100k rows, that's roughly 4.9 million intermediate objects eliminated.
Migration strategy: .Experimental modules
Following the precedent set by esqueleto (which introduced Database.Esqueleto.Experimental for its new FROM syntax), the direct codec API lives in .Experimental modules alongside the existing API. Users opt in by changing their imports. The existing modules are unchanged.
persistent core
| Module | Contents |
|---|---|
| `Database.Persist.Sql` | Unchanged. `selectList`, `get`, `rawSql`, etc. continue to use `[PersistValue]`. |
| `Database.Persist.Sql.Experimental` | New. Re-exports everything from `Database.Persist.Sql`, plus direct-path variants with additional `FromRow`/`ToRow` constraints. Same names, same signatures — just extra constraints. |
The experimental module provides versions of the standard operations that take the direct path when the constraints are satisfied:
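For concreteness, a direct-path `selectList` might carry constraints like the following (a sketch only; per the note below, these higher-level signatures are not implemented yet):

```haskell
-- Same name as the classic API, with extra constraints (hypothetical):
selectList
  :: ( MonadIO m
     , PersistRecordBackend record backend
     , HasDirectQuery backend
     , FromRow (Env backend) (Entity record) )
  => [Filter record] -> [SelectOpt record]
  -> ReaderT backend m [Entity record]
```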
Note on Standard Operations:
As of this RFC, only raw query functions are implemented (rawQueryDirect and rawSqlDirect). The higher-level operations listed above (selectList, get, insertMany_) are planned but not yet implemented. These would be implemented in experimental modules to give users typed, zero-allocation versions of core persistent operations.
Users switch by changing one import:
```haskell
-- Before:
import Database.Persist.Sql

-- After (direct path, zero PersistValue):
import Database.Persist.Sql.Experimental
```
All existing code continues to work because the experimental signatures are strictly more constrained: they accept a subset of the original callers (those whose backend supports direct codecs). If a backend doesn't have HasDirectQuery/FromRow instances, etc., the experimental variants simply won't type-check, and the user stays on the original import.
esqueleto
| Module | Contents |
|---|---|
| `Database.Esqueleto.Experimental` | Unchanged. |
| `Database.Esqueleto.Experimental.Direct` | New. Re-exports everything from `Database.Esqueleto.Experimental`, plus `selectDirect`, `selectOneDirect`, etc. with `SqlSelectDirect` constraints. |
As long as we provide the same function names, the same return types, and the same query DSL, the only difference is the extra constraints ensuring the direct path is used. The one place users will need to do real work is converting their custom type instances to the new direct encoding/decoding APIs.
Backend packages
Each backend package that supports direct codecs exports its env type from the primary existing module, as well as its FieldDecode/FieldEncode instances. These don't need their own .Experimental modules: the instances are always available, and they are picked up via normal instance resolution when the user uses the experimental persistent/esqueleto modules.
Graduation path
Once the direct path is proven stable:
1. The .Experimental signatures move into the main modules. The extra constraints are additive: they narrow the accepted backends but don't change behavior for backends that satisfy them.
2. Backends that don't support direct codecs continue to work via the [PersistValue] path.
3. Eventually, PersistValue-based operations could be deprecated in favor of the direct path.
This mirrors esqueleto's migration of its FROM syntax from .Experimental to the recommended default.
Relationship to SqlBackend
A key design tension that I think requires some discussion: much of the persistent ecosystem (esqueleto's rawSelectSource, persistent's default PersistQueryRead implementation, user code using SqlPersistT) projects down to bare SqlBackend, erasing the concrete backend type. The HasDirectQuery backend constraint needs to reduce Env backend to a concrete type, but SqlBackend doesn't correspond to any particular backend, so Env SqlBackend has no meaningful definition in terms of the way we want to use it in this proposal.
I've explored a few approaches, but I think only the first is particularly viable.
Approach 1: Backend-specific types
Users and libraries work with the concrete backend type (WriteBackend PostgreSQLBackend, SqliteBackend, etc.) instead of SqlBackend. The HasDirectQuery constraint resolves naturally.
For most application code, the concrete type appears only in the runner (withPostgresqlPool, withSqliteConn, etc.) and the ReaderT backend m type. Switching from SqlPersistT to a backend-specific ReaderT is a relatively straightforward find-and-replace change.
Approach 2: A direct-codec field on SqlBackend
For code that flows through SqlBackend (and there's a lot of it), we'd ideally bridge the gap by adding a dedicated field that carries the direct codec machinery from whatever backend created the connection. SqlBackend already works this way for other operations: connInsertSql, connPrepare, connInsertManySql, etc. are all function-valued fields that the backend populates at connection setup time, closing over its own concrete types. The direct path would fit naturally into the same pattern.
The naive idea I had was to use a rank-2 type, so the caller passes a polymorphic RowReader and the backend instantiates it at its own concrete env, but it doesn't seem viable in practice:
```haskell
-- ⚠ THIS DOES NOT TYPECHECK: see below
data DirectCodecs = DirectCodecs
  { dcQueryAndDecode
      :: forall a m. MonadIO m
      => (forall env. FromRow env a => RowReader env a)
      -> Text -> [PersistValue]
      -> Acquire (ConduitM () a m ())
  }
```
Why the rank-2 approach fails
The argument type (forall env. FromRow env a => RowReader env a) is fine from the caller's perspective: the TH-generated rowReader class method has exactly this type (polymorphic in env, constrained by FromRow env a), so the caller can provide it.
The problem is on the consumption side. The backend's implementation needs to instantiate env at its concrete type:
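The missing implementation would presumably look something like this (illustrative only: this is precisely the code that GHC rejects, and `runPgQuery` is an assumed helper):

```haskell
pgDirectCodecs :: DirectCodecs
pgDirectCodecs = DirectCodecs
  { dcQueryAndDecode = \polyReader sql params ->
      -- ✗ rejected: GHC cannot discharge (FromRow PgRowEnv a) here,
      -- because 'a' is a rigid variable from the outer forall.
      runPgQuery sql params (polyReader @PgRowEnv)
  }
```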
When the backend writes polyReader @PgRowEnv, GHC needs to discharge the constraint FromRow PgRowEnv a. But a is a rigid type variable from the outer forall a: the backend's lambda must work for all a, and GHC has no instance FromRow PgRowEnv a that covers every possible type. The fact that FieldDecode PgRowEnv Text etc. are "in scope" doesn't help; GHC can't perform instance resolution for FromRow PgRowEnv a without knowing what a is. This produces:

```
Could not deduce (FromRow PgRowEnv a)
  arising from a use of 'polyReader'
```
This is a fundamental tension: you can't erase both env (via the closure in SqlBackend) and a (via forall a) while still relying on typeclass instance resolution to connect them. The dictionary for something like FromRow PgRowEnv User exists at each specific call site, but nobody at the backend's definition site can work with it.
Possible directions
Several alternative encodings were considered, but each has significant drawbacks:
- Parameterize DirectCodecs by env: If DirectCodecs env carries the env type, the backend can accept RowReader env a directly. But SqlBackend would need to hold DirectCodecs env for some env, meaning either SqlBackend becomes parameterized (a massive breaking change to the entire ecosystem) or the env is hidden behind an existential... at which point the caller can't construct a RowReader for an env it doesn't know.
- Compile the decoder at the call site: The call site knows both a and the backend, so it could resolve all constraints and produce a fully monomorphic decoding function to pass into DirectCodecs. But this requires the call site to know the concrete env type, which means it already has the concrete backend type — and if it has the concrete backend type, it doesn't need the SqlBackend bridge.
- Add a FromRow constraint to dcQueryAndDecode: This just pushes the problem around: DirectCodecs is stored in SqlBackend where a isn't in scope, so there's nowhere to put the constraint.
Current recommendation
For now, Approach 2 remains an open problem. The direct codec path works cleanly with concrete backend types (Approach 1). Code that flows through SqlBackend continues to use the PersistValue path, which would remain fully supported.
If a viable encoding for DirectCodecs is found that bridges the SqlBackend gap without parameterizing it, this section will be updated. Potential avenues to explore might include:
A reflection-style approach where the backend reifies its env's dictionaries into a runtime value that the caller threads through
A Dict-based encoding where the caller provides Dict (FromRow env a) explicitly alongside the reader
Restructuring SqlBackend itself to carry a type parameter (a long-term project that defeats most of the attempts at backwards compatibility outlined in this proposal)
In the meantime, the practical impact is limited: most application code already has access to the concrete backend type from the connection runner (withPostgresqlPool, withSqliteConn, etc.), and the direct path is fully available there.
Footnotes
Fixed SQL templates are also a prerequisite for prepared statements, which parse and plan the SQL once and execute it many times. Variable-length SQL strings change with every batch size, so they can never be prepared: every call is a cold parse and plan. Prepared statements in turn enable protocol-level pipelining: sending multiple execute messages to the database without waiting for each response before sending the next (see PostgreSQL's pipeline mode). Pipeline mode doesn't require prepared statements, but unless prepared statements can be run with a consistent set of parameters, you can't use them effectively in pipeline mode since you have to create a huge number of them. ↩
Limited type vocabulary
`PersistValue` has a fixed set of constructors that cannot represent backend-specific types natively. This design was chosen back in the 2010s, when the most common backends were MySQL and PostgreSQL and the most common use cases were simple CRUD operations. Nowadays, production-grade applications often need richer types, often with database-specific support, and to support them we currently have to shoehorn values into the existing constructors, losing type safety and precision. Examples from the PostgreSQL backend include:

- JSON values travel as `PersistByteString`, losing the distinction between JSON and raw bytes
- UUID values travel as `PersistLiteralEscaped` with hex-encoded bytes, requiring a text-to-UUID conversion on each read
- `interval` values travel as `PersistRational`, losing semantic meaning (PostgreSQL's `interval` has year/month/day/time components that `Rational` cannot faithfully represent)
- Types with no natural `PersistValue` representation at all must be forced through `PersistLiteralEscaped`
- `PersistList`/`PersistArray` elements must themselves be `PersistValue`, so arrays of domain types require double conversion

Furthermore, heavy use of the custom type support makes it harder to improve the underlying library; for example, we could switch to the binary protocol that PostgreSQL supports, but custom values would all need to be converted to the binary protocol, or at least audited to ensure they are compatible.
This means that nearly any change to the underlying library constitutes a breaking change for all users of the library, which is unsustainable in the long term.
With the proposed `FieldDecode`/`FieldEncode` classes, backend packages can provide instances for any Haskell type directly, and the type mapping is no longer constrained by `PersistValue`'s fixed vocabulary. This gives us significant flexibility to improve the underlying library without introducing breaking changes for users.

Runtime type checks

`fromPersistValue` performs runtime pattern matching on the `PersistValue` constructor at every field, effectively a type check that the schema already guarantees at compile time. The direct path eliminates this redundancy: TH generates code that calls the correct decoder for each field's statically-known type.

Extensibility without breaking changes
When a database adds a new type (or a backend author wants to support an existing type more faithfully), persistent's current architecture requires either adding a constructor to `PersistValue` (a breaking change to every backend and every `PersistField` instance) or cramming the value into an existing constructor like `PersistLiteralEscaped` (lossy, with no type safety).

With `FieldDecode`/`FieldEncode`, a backend package can add support for any new type by publishing a new instance, without touching persistent core, without coordinating with other backends, and without breaking any downstream users. A PostgreSQL backend could add `FieldDecode PgRowEnv HStore` or `FieldDecode PgRowEnv TSVector` the day PostgreSQL ships them. Users who don't reference these types are completely unaffected.

Stable SQL for prepared statements and pipelining
Today, persistent generates bulk inserts by expanding `INSERT INTO t VALUES (?,?,?), (?,?,?), ...` with one bind parameter per value. For 10k rows of 10 columns, that's 100k bind parameters and a SQL string whose length changes with every batch, defeating prepared statement caching and forcing the database to re-plan the query each time.

Similarly, `WHERE id IN (?,?,?,...)` expands to a variable number of bind parameters.

The direct encode path opens the door to column-oriented encoding: a single fixed SQL template like `INSERT ... SELECT * FROM UNNEST($1, $2, ...)` where each parameter is an array of all values for one column, or `WHERE id = ANY($1)` with a single array parameter. The SQL string stays the same regardless of how many rows or values are involved, which is a prerequisite for proper prepared statement reuse.[^1]

Design: Decoding
Core abstraction: `RowReader` + `FieldDecode` + `FromRow`

I propose introducing three new types/classes in persistent core, all backend-agnostic:
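The shapes suggested by the prose might look roughly like this. This is a sketch, not the proposed API verbatim: `String` stands in for `Text`/`FieldNameDB`, and the constructor and field names are assumptions.

```haskell
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE MultiParamTypeClasses #-}

-- A prepared per-column decoder: the column-type dispatch has already
-- happened, and the result is delivered in CPS (failure continuation,
-- then success continuation), so no Either is allocated on success.
newtype FieldRunner env a = FieldRunner
  { runField :: forall r. env -> (String -> IO r) -> (a -> IO r) -> IO r }

-- Per-field decoding, split into a one-time prepare step (runs once per
-- result set) and a per-row run step (the FieldRunner above).
class FieldDecode env a where
  prepareField
    :: env      -- backend environment (result handle, metadata, ...)
    -> String   -- field name, used by document stores
    -> Int      -- column index, used by SQL backends
    -> IO (FieldRunner env a)

-- A CPS row decoder: the success continuation is applied directly.
newtype RowReader env a = RowReader
  { runRowReaderCPS :: forall r. env -> (String -> IO r) -> (a -> IO r) -> IO r }

-- TH generates an instance of this class alongside PersistEntity.
class FromRow env a where
  rowReader :: RowReader env a
```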
`env` is an opaque, backend-specific type that carries whatever the backend needs to read a column: a result pointer and row index for SQL databases, a BSON document for MongoDB, a key-value list for Redis, etc.

The key insight behind the prepare/run split is that within a single result set, the column types are fixed: every row has the same OIDs (PostgreSQL), the same field types (MySQL), or the same BSON structure (MongoDB). The current design would check the column type on every row × every column. With `prepareField`, the type dispatch happens once when the first row (or the result metadata) is inspected, producing a `FieldRunner` that has already resolved which decoder to use. The per-row hot loop then just calls `runField` without branching on column types at all. For a 10-column entity over 100k rows, this eliminates a million branches that all take the same path.

`FieldDecode` takes both a `FieldNameDB` (for document stores that look up fields by name) and an `Int` column index (for SQL backends that read by position). Each backend uses whichever access pattern is natural and ignores the other.

Why CPS?
A naive `IO (Either Text a)` return type allocates a `Right` constructor at every step in the applicative chain, even on the success path where the error case is never used. With CPS, the success continuation is passed directly and the `Either` is never constructed.

Both `prepareField` and `FieldRunner` are CPS too, so the backend's decoder feeds its result directly into the continuation. Callers use `runRowReaderCPS` to supply their success and failure actions without materializing unnecessary values.

The entire path from wire format to `yield` produces zero `Either` constructors and zero `PersistValue` wrappers.

TH generates `FromRow` alongside `PersistEntity`

For an entity `User { userName :: Text, userAge :: Maybe Int }`, `mkPersist` generates both the existing `PersistEntity` instance (unchanged) and a new `FromRow` instance.
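As a sketch of what that generated instance might look like, assuming simplified CPS types (reproduced inline, with `String` standing in for `Text`) and a hypothetical `nextField` helper that threads the column index through the success continuation:

```haskell
{-# LANGUAGE RankNTypes, MultiParamTypeClasses, FlexibleInstances, FlexibleContexts #-}

-- Simplified sketch types; the column counter rides in the continuation.
newtype RowReader env a = RowReader
  { runRowReader
      :: forall r. env -> Int        -- environment, current column index
      -> (String -> IO r)            -- failure continuation
      -> (a -> Int -> IO r)          -- success continuation (value, next column)
      -> IO r }

class FieldDecode env a where
  decodeField :: env -> Int -> (String -> IO r) -> (a -> IO r) -> IO r

class FromRow env a where
  rowReader :: RowReader env a

-- Hypothetical helper the TH output would use: decode the field at the
-- current column, then advance the counter.
nextField :: FieldDecode env a => RowReader env a
nextField = RowReader $ \env ix err ok ->
  decodeField env ix err (\a -> ok a (ix + 1))

data User = User { userName :: String, userAge :: Maybe Int }
  deriving (Eq, Show)

-- What mkPersist-generated code might look like for User: one nextField
-- per column, success continuations chained, no Either on the happy path.
instance (FieldDecode env String, FieldDecode env (Maybe Int))
      => FromRow env User where
  rowReader = RowReader $ \env ix err ok ->
    runRowReader nextField env ix err $ \name ix' ->
      runRowReader nextField env ix' err $ \age ix'' ->
        ok (User name age) ix''
```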
Under the hood, `nextField` calls `prepareField` on the first row and caches the resulting `FieldRunner`, then applies `runField` for each subsequent row. The instance is polymorphic over `env`: the `FieldDecode` constraints resolve when a concrete backend is chosen, and the entity definition itself remains backend-agnostic.

Backend implementations
Each backend provides its own `env` type and `FieldDecode` instances. The same TH-generated `FromRow` code works across all of them without modification.

PostgreSQL

I've got a prototype implementation that fully passes all tests for persistent and esqueleto. The `env` environment value wraps a `PGresult` handle, a row index, and a vector of column OIDs classified into a `PgType` ADT. `FieldDecode` instances inspect the `PgType` once during `prepareField` and return a `FieldRunner` that calls the appropriate `postgresql-binary` decoder directly, without re-checking the OID on each row. Instances cover `Bool`, `Int16`–`Int64`, `Int`, `Double`, `Scientific`, `Rational`, `Text`, `ByteString`, `Day`, `TimeOfDay`, `UTCTime`, and `Maybe a`. Backend-specific types like `UUID`, `IPRange`, `DiffTime`, `Value` (JSON), and composite types can be added as further instances without any changes to persistent core.
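As a toy illustration of that one-time dispatch, with stand-ins throughout: a small `PgType` ADT instead of real OIDs, and `String` cells instead of `postgresql-binary` payloads.

```haskell
-- Toy version of the PostgreSQL decode path: the PgType check runs once
-- in the prepare step; the per-row FieldRunner just decodes.
data PgType = PgBool | PgInt8 | PgText deriving (Eq, Show)

data PgRowEnv = PgRowEnv
  { colTypes :: [PgType]   -- column types, fixed for the whole result set
  , curRow   :: [String]   -- the current row's raw cells
  }

newtype FieldRunner a = FieldRunner { runField :: PgRowEnv -> Either String a }

-- Runs once per result set: dispatch on the column's classified type.
prepareIntField :: [PgType] -> Int -> Either String (FieldRunner Int)
prepareIntField tys ix =
  case pgType of
    PgInt8 -> Right (FieldRunner (\env -> Right (read (curRow env !! ix))))
    other  -> Left ("column " ++ show ix ++ ": expected PgInt8, got " ++ show other)
  where
    pgType = tys !! ix

-- Runs per row: no type branching left, just the chosen decoder.
decodeColumn :: FieldRunner a -> [PgType] -> [[String]] -> Either String [a]
decodeColumn runner tys rows =
  traverse (\row -> runField runner (PgRowEnv tys row)) rows
```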
The `case pgType of` branch runs once. On subsequent rows, only the `FieldRunner` executes, since it already knows which decoder to use.

SQLite
The environment wraps a `Sqlite.Statement`. SQLite's dynamic typing means the column type can technically vary per row, so `prepareField` here is lightweight, but it still validates that the column index is in range and captures the statement reference, keeping the per-row `FieldRunner` simple.

MySQL
The environment wraps a vector of `MySQLBase.Field` metadata and a vector of `Maybe ByteString` row data. `prepareField` captures the field metadata once, and the returned `FieldRunner` uses it to call `MySQL.convert` on each row's data without re-reading the metadata.

MongoDB
The environment wraps a BSON `Document`. `FieldDecode` looks up fields by name (ignoring the column index), which is why the class takes both parameters. For MongoDB, the prepare step is essentially a no-op since each document is self-describing, but the split still keeps the interface uniform.

Redis
The environment wraps binary-encoded key-value pairs. Like MongoDB, `FieldDecode` looks up by field name. The prepare step captures the encoded field name to avoid re-encoding it on every lookup.

How `FromRow` unifies all backends

The same TH-generated instance resolves to different concrete code depending on the `env` type. The prepare/run split means that backends with uniform column types (PostgreSQL, MySQL) get the type dispatch out of the hot loop, while document stores (MongoDB, Redis) still benefit from the uniform interface even though their "prepare" step is lighter.

| Backend | `env` wraps | `prepareField` inspects | `FieldRunner` reads from |
|---|---|---|---|
| PostgreSQL | `PGresult` handle, row index, OIDs | `PGresult` metadata | `postgresql-binary` decoders |
| SQLite | `Sqlite.Statement` | column index range only | `sqlite3_column_*` C API per row |
| MySQL | field metadata, row data | `MySQLBase.Field` metadata | `Maybe ByteString` row data + `MySQL.convert` |
| MongoDB | BSON `Document` | (essentially a no-op) | `DB.look` on BSON `Document` (by field name) |
| Redis | encoded key-value pairs | encoded field name | `ByteString` `Binary.decode` from key-value pair |

The entity definition contains no backend-specific code, and `PersistValue` appears nowhere in the decode path.

Specialization: eliminating dictionary overhead
The `FromRow` and `FieldDecode` instances are polymorphic over `env`, which means that in the general case GHC passes typeclass dictionaries at runtime: an indirect function call per field, per row. For a 10-column entity over 100k rows, that's a million indirect calls that could be direct invocations instead.

The standard fix is `SPECIALIZE` pragmas. Backend packages can emit specializations for their concrete `env` type, and we will extend TH to generate them automatically from the `MkPersistSettings`.

In order to achieve this, `mkPersist` will gain a new configuration field `mpsDirectEnvTypes :: [Name]` for environments that consumers want specialized. When this list contains `''PgRowEnv`, the generated `FromRow` instance for an entity `User` carries a `{-# SPECIALIZE instance FromRow PgRowEnv User #-}` pragma.

The pragma is placed inside the body of the instance declaration. When GHC sees it, it creates a monomorphic copy of `rowReader @PgRowEnv @User` and is then able to inline all the `FieldRunner` calls and eliminate dictionary indirection entirely. The per-row loop collapses to a straight sequence of concrete decoder calls with no polymorphism left at runtime.

Because the list lives in `MkPersistSettings`, code that uses persistent against several different backends from a single module (e.g. one module that needs both `PgRowEnv` and `SqliteRowEnv`) can set `mpsDirectEnvTypes = [''PgRowEnv, ''SqliteRowEnv]` and TH will emit specializations for both. Backends that don't appear in the list do not get specialized code generated, yet the instance remains polymorphic and continues to compile normally.

This matters most for the `FieldRunner` closures produced by `prepareField`. After specialization, GHC can see through the closure and inline the decoder body directly into the row-reading loop, which in turn enables further optimizations like unboxing intermediate results and eliminating redundant null checks across adjacent fields. Without specialization, the closure is opaque to the optimizer because its concrete type isn't known at the call site.

The encode path benefits similarly: `ToRow`'s `toRowBuilder` and each `FieldEncode` instance can be specialized per `Param` type as soon as TH generates `SPECIALIZE` pragmas for the encoding side.

Query execution
Each backend implements `HasDirectQuery` (naming subject to change; `Direct` just indicates that we're bypassing `PersistValue`) to send a query and yield one `Env backend` per result row. The conduit consumer runs `prepareField` on the first row (or the result metadata) to obtain `FieldRunner`s, then applies them to every subsequent row via `runField`. The type dispatch happens once and the per-row loop is a straight-line decode.

`HasDirectInsert` is the encoding counterpart, accepting pre-encoded parameters and sending them to the database. A backend instance for PostgreSQL, for example, would define `type Param PgBackend = PgParam`, where `PgParam` carries the OID, binary payload, and format code for each parameter.

User-facing API
In order to make migration as straightforward as possible, the `.Experimental` modules export the same function names as the originals, with additional constraints that indicate that the direct path should be used.

Esqueleto integration
A `SqlSelectDirect` class parallels esqueleto's `SqlSelect` with a CPS `RowReader`-based decoder.

This class is parameterized by `env` rather than `backend`: the mapping from backend to env happens at the call site via the `Env` type family. This means `SqlSelectDirect` instances are written per environment type, which is the right granularity: the row format depends on the env, not on which backend wrapper is in use.

Instances cover `Entity`, `Value`, `Maybe (Entity ...)`, and tuples. The `.Experimental.Direct` module exports `select` with the extra constraint: same name, same query DSL.

The `backend` type variable already carried by esqueleto resolves `Env backend` via the associated type family in `HasDirectQuery`, which then satisfies the `SqlSelectDirect a r (Env backend)` constraint.

Design: Encoding
The decode side has a working prototype. The encode side follows the same principles.
The problem
```mermaid
flowchart LR
  A["Haskell record"] -->|"toPersistFields"| B["[PersistValue]"]
  B -->|"encode per field"| C["wire format"]
```

Every field is boxed into `PersistValue` and then immediately unboxed. For bulk inserts of 10k rows × 10 columns, that's 100k unnecessary `PersistValue` allocations.

FieldEncode: one class, one method

`param` is a backend-specific encoded parameter type. Each backend decides what `param` looks like: a PostgreSQL backend might use a type carrying an OID, encoded bytes, and a format tag; an SQLite backend might use a sum type mirroring SQLite's type affinity (`SqliteInt !Int64 | SqliteText !Text | ...`).

The class is deliberately minimal: one class with one method, but it enables a wide range of backends to be supported without breaking changes.
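A sketch of that shape, with assumed names (`FieldEncode`, `encodeField`) and a toy `SqliteParam` standing in for a real backend parameter type:

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- The one class, one method.
class FieldEncode param a where
  encodeField :: a -> param

-- A toy SQLite-flavoured param type mirroring type affinity.
data SqliteParam
  = SqliteInt !Int
  | SqliteText !String
  | SqliteNull
  deriving (Eq, Show)

instance FieldEncode SqliteParam Int where
  encodeField = SqliteInt

instance FieldEncode SqliteParam String where
  encodeField = SqliteText

-- Maybe lifts any encoder, mapping Nothing to NULL.
instance FieldEncode SqliteParam a => FieldEncode SqliteParam (Maybe a) where
  encodeField = maybe SqliteNull encodeField
```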
Alternative: contravariant encoders (hasql-style)
It's worth considering the approach that hasql takes here, which uses contravariant functors to compose encoders. The `>$<` operator (`contramap` from `Contravariant`) lets you project a field out of a record and feed it into an encoder, and `<>` sequences them. The result is a single `Params User` value that knows how to encode an entire record in one pass.
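For illustration, here is a minimal self-contained version of that composition style, with a toy `Params` type standing in for hasql's encoder (`>$<` is `contramap` from `Data.Functor.Contravariant` in base):

```haskell
import Data.Functor.Contravariant (Contravariant (..), (>$<))

-- Toy encoder: a record-to-encoded-parameters function.
newtype Params a = Params { runParams :: a -> [String] }

instance Contravariant Params where
  contramap f (Params g) = Params (g . f)

instance Semigroup (Params a) where
  Params f <> Params g = Params (\a -> f a ++ g a)

-- A primitive encoder for a single parameter.
param :: Show a => Params a
param = Params (\a -> [show a])

data User = User { userName :: String, userAge :: Int }

-- Project each field into an encoder and sequence them.
userParams :: Params User
userParams = (userName >$< param) <> (userAge >$< param)
```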
Paramsonce and run it per row.My main concern around this is the learning curve:
ContravariantandDivisibleare less familiar thanApplicativeto most Haskell developers, and the corpus of good intuition around them is a bit lacking. For a library likepersistent, whose user base spans from beginners to experts, that friction matters.hasql's encoding API is one of the things that I've found to be something of a barrier to adoption withhasql, even though I'm relatively fluent with Haskell and appreciate the type safety it provides.That said, if we're already asking users to change imports and adopt new constraints, the marginal cost of learning contravariant composition might be acceptable? Especially since TH can generate the encoders automatically for
Entitytypes, meaning most users would only encounter the raw API when writing custom queries. The generated code would look something like:This is an area where we should probably prototype both approaches and see which one leads to clearer error messages and more natural composition in practice. The simple
FieldEncodeclass is easier to explain and implement first, but if we're going to make breaking changes at some point anyway, the contravariant approach might be the better long-term bet. It may also be the case that we can build a small DSL on top ofContravariantthat doesn't feel as foreign to users, which may give us the best of both worlds.ToRow: TH-generated, produces a builderTH generates:
Usage:
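A self-contained sketch of what the TH output and its usage might look like, reusing the toy `SqliteParam` encoder; the real builder would target the backend's actual `Param` type:

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, FlexibleContexts #-}

-- Toy FieldEncode sketch, reproduced to stay self-contained.
class FieldEncode param a where
  encodeField :: a -> param

data SqliteParam = SqliteInt !Int | SqliteText !String | SqliteNull
  deriving (Eq, Show)

instance FieldEncode SqliteParam Int where
  encodeField = SqliteInt
instance FieldEncode SqliteParam String where
  encodeField = SqliteText
instance FieldEncode SqliteParam a => FieldEncode SqliteParam (Maybe a) where
  encodeField = maybe SqliteNull encodeField

data User = User { userName :: String, userAge :: Maybe Int }

-- What TH might generate: one encodeField call per field, in column
-- order, with no intermediate PersistValue.
class ToRow param a where
  toRowBuilder :: a -> [param]

instance (FieldEncode param String, FieldEncode param (Maybe Int))
      => ToRow param User where
  toRowBuilder u = [encodeField (userName u), encodeField (userAge u)]
```

At a call site, `toRowBuilder (User "alice" (Just 3)) :: [SqliteParam]` yields the pre-encoded parameters that get handed straight to the backend.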
Typed query parameters
`FieldEncode` also gives us typed query parameters without routing through `[PersistValue]`. Each Haskell value goes directly to the backend's encoded format, skipping `toPersistValue` entirely.

Column-oriented encoding: UNNEST and = ANY
The direct encode path is designed to support column-oriented parameter encoding, which enables two important patterns:
Bulk inserts via UNNEST: instead of `INSERT INTO t VALUES (?,?,?), (?,?,?), ...` with a dynamic SQL string, use `INSERT INTO t (c1, c2, ...) SELECT * FROM UNNEST($1, $2, ...)` where each `$N` is an array containing all values for that column across all rows. The SQL template is fixed regardless of batch size, enabling prepared statement reuse.

IN-clause via = ANY: instead of `WHERE id IN (?,?,?,...)` with a variable number of parameters, use `WHERE id = ANY($1)` with one array parameter. Again, the SQL is fixed.
TH can generate a columnar encoder that transposes a
Vector recordinto per-column arrays and encodes each one directly:For 10k rows × 10 columns, this means that instead of 100k
For 10k rows × 10 columns, this means that instead of 100k `PersistValue` objects transposed into `PersistArray` lists, the direct path builds 10 binary column arrays from the record fields with no intermediate representation.

Allocation comparison (decode path)
For a 10-column entity, per row:
The `PersistValue` path allocates 10 `PersistValue` objects, 10 list cons cells, ~9 `Either` values from the applicative chain, 10 `Either` values from the field decodes, and 1 `Either` at the boundary. The direct path allocates 1 `Either` at the boundary (`runRowReaderCPS`) and 1 `Int` (column counter).

For 100k rows, that's roughly 4.9 million intermediate objects eliminated.
Migration strategy: `.Experimental` modules
Database.Esqueleto.Experimentalfor its newFROMsyntax), the direct codec API lives in.Experimentalmodules alongside the existing API. Users opt in by changing their imports. The existing modules are unchanged.persistent core
- `Database.Persist.Sql`: `selectList`, `get`, `rawSql`, etc. continue to use `[PersistValue]`.
- `Database.Persist.Sql.Experimental`: re-exports `Database.Persist.Sql`, plus direct-path variants with additional `FromRow`/`ToRow` constraints. Same names, same signatures, just extra constraints.

The experimental module provides versions of the standard operations that take the direct path when the constraints are satisfied.
Note on Standard Operations:
As of this RFC, only raw query functions are implemented (`rawQueryDirect` and `rawSqlDirect`). The higher-level operations listed above (`selectList`, `get`, `insertMany_`) are planned but not yet implemented; they would live in the experimental modules to give users typed, zero-allocation versions of core persistent operations.

Users switch by changing one import.
All existing code continues to work because the experimental signatures are strictly more constrained: they accept a subset of the original callers (those whose backend supports direct codecs). If a backend doesn't have `HasDirectQuery`/`FromRow` instances, the experimental variants simply won't type-check, and the user stays on the original import.

esqueleto
- `Database.Esqueleto.Experimental`: the existing API, unchanged.
- `Database.Esqueleto.Experimental.Direct`: re-exports `Database.Esqueleto.Experimental`, plus `selectDirect`, `selectOneDirect`, etc. with `SqlSelectDirect` constraints.

As long as we provide the same function name, same return type, and same query DSL, the only difference is the extra constraints ensuring the direct path is used. The one place where users will run into work on their end is converting their custom type instances to the new direct encoding/decoding APIs.
Backend packages
Each backend package that supports direct codecs exports its `env` type from the primary existing module, as well as its `FieldDecode`/`FieldEncode` instances. These don't need their own `.Experimental` modules: the instances are always available, and they are picked up via normal instance resolution when the user uses the experimental persistent/esqueleto modules.

Graduation path
Once the direct path is proven stable:
- The `.Experimental` signatures move into the main modules. The extra constraints are additive: they narrow the accepted backends but don't change behavior for backends that satisfy them.
- Backends that can't support direct codecs stay on the `[PersistValue]` path.
- `PersistValue`-based operations could be deprecated in favor of the direct path.

This mirrors esqueleto's migration of its `FROM` syntax from `.Experimental` to the recommended default.

Relationship to `SqlBackend`
A key design tension that I think requires some discussion: much of the persistent ecosystem (esqueleto's `rawSelectSource`, persistent's default `PersistQueryRead` implementation, user code using `SqlPersistT`) projects down to bare `SqlBackend`, erasing the concrete backend type. The `HasDirectQuery backend` constraint needs to reduce `Env backend` to a concrete type, but `SqlBackend` doesn't correspond to any particular backend, so `Env SqlBackend` has no meaningful definition for the way we want to use it in this proposal.

I've explored a few approaches, but I think only the first is particularly viable.
Approach 1: Backend-specific types
Users and libraries work with the concrete backend type (`WriteBackend PostgreSQLBackend`, `SqliteBackend`, etc.) instead of `SqlBackend`. The `HasDirectQuery` constraint resolves naturally.

For most application code, the concrete type appears only in the runner (`withPostgresqlPool`, `withSqliteConn`, etc.) and the `ReaderT backend m` type. Switching from `SqlPersistT` to a backend-specific `ReaderT` is a relatively straightforward find-and-replace change.

Approach 2: A direct-codec field on `SqlBackend`
SqlBackend, and there's a lot of it, we'd ideally bridge the gap by adding a dedicated field that carries the direct codec machinery from whatever backend created the connection.SqlBackendalready works this way for other operations:connInsertSql,connPrepare,connInsertManySql, etc. are all function-valued fields that the backend populates at connection setup time, closing over its own concrete types. The direct path would fit naturally into the same pattern:The naive idea I had was to use a rank-2 type so the caller passes a polymorphic
RowReaderand the backend instantiates it at its own concreteenv, but it doesn't seem viable in practice:Why the rank-2 approach fails
The argument
(forall env. FromRow env a => RowReader env a)is fine from the caller's perspective. the TH-generatedrowReaderclass method has exactly this type (polymorphic inenv, constrained byFromRow env a). The caller can provide it.The problem is on the consumption side. The backend's implementation needs to instantiate
envat its concrete type:When the backend writes
polyReader @PgRowEnv, GHC needs to discharge the constraintFromRow PgRowEnv a. Butais a rigid type variable from the outerforall a. The backend's lambda must work for alla, and GHC has no instanceFromRow PgRowEnv athat covers every possible type. The fact thatFieldDecode PgRowEnv Textetc. are "in scope" doesn't help; GHC can't perform instance resolution forFromRow PgRowEnv awithout knowing whatais. This produces:This is a fundamental tension: you can't erase both
env(via the closure inSqlBackend) anda(viaforall a) while still relying on typeclass instance resolution to connect them. The dictionary for something likeFromRow PgRowEnv Userexists at each specific call site, but nobody at the backend's definition site can work with it.Possible directions
Several alternative encodings were considered, but each has significant drawbacks:
- Parameterize `DirectCodecs` by `env`: if `DirectCodecs env` carries the env type, the backend can accept `RowReader env a` directly. But `SqlBackend` would need to hold `DirectCodecs env` for some `env`, meaning either `SqlBackend` becomes parameterized (a massive breaking change to the entire ecosystem) or the `env` is hidden behind an existential, at which point the caller can't construct a `RowReader` for an env it doesn't know.
- Compile the decoder at the call site: the call site knows both `a` and the backend, so it could resolve all constraints and produce a fully monomorphic decoding function to pass into `DirectCodecs`. But this requires the call site to know the concrete env type, which means it already has the concrete backend type, and if it has the concrete backend type, it doesn't need the `SqlBackend` bridge.
- Add a `FromRow` constraint to `dcQueryAndDecode`: this just pushes the problem around. `DirectCodecs` is stored in `SqlBackend`, where `a` isn't in scope, so there's nowhere to put the constraint.

Current recommendation
For now, Approach 2 remains an open problem. The direct codec path works cleanly with concrete backend types (Approach 1). Code that flows through `SqlBackend` continues to use the `PersistValue` path, which would remain fully supported.

If a viable encoding for `DirectCodecs` is found that bridges the `SqlBackend` gap without parameterizing it, this section will be updated. Potential avenues to explore might include:

- A reflection-style approach where the backend reifies its env's dictionaries into a runtime value that the caller threads through
- A `Dict`-based encoding where the caller provides `Dict (FromRow env a)` explicitly alongside the reader
- Restructuring `SqlBackend` itself to carry a type parameter (a long-term project that defeats most of the attempts at backwards compatibility outlined in this proposal)

In the meantime, the practical impact is limited: most application code already has access to the concrete backend type from the connection runner (`withPostgresqlPool`, `withSqliteConn`, etc.), and the direct path is fully available there.

Footnotes
[^1]: Fixed SQL templates are also a prerequisite for prepared statements, which parse and plan the SQL once and execute it many times. Variable-length SQL strings change with every batch size, so they can never be prepared: every call is a cold parse and plan. Prepared statements in turn enable protocol-level pipelining: sending multiple execute messages to the database without waiting for each response before sending the next (see PostgreSQL's pipeline mode). Pipeline mode doesn't require prepared statements, but unless prepared statements can be run with a consistent set of parameters, you can't use them effectively in pipeline mode, since you would have to create a huge number of them.