
RFC: direct row codecs #1617

@iand675

Description


Status

Draft. I have a proof-of-concept implementation for PostgreSQL that I'm getting into a publishable state and will share here soon. It already works, so this proposal is based on an actual artifact, not just hand-waving.

Summary

Add a backend-agnostic direct codec layer to persistent that bypasses PersistValue entirely. On the decode side, typed query results go from the database's wire format to Haskell records without intermediate representations. On the encode side, Haskell values go from record fields to the wire format without detouring through PersistValue.

The decode path uses continuation-passing style (CPS) throughout, so the success path allocates zero Either constructors and zero PersistValue wrappers. The encode path operates similarly, but in the opposite direction.

The existing PersistValue-based path is unchanged. All current code continues to compile and run without modification.

Motivation

Allocation overhead

Every row decoded from any persistent backend follows this path today:

flowchart LR
    A["wire format"] -->|"decode to PersistValue"| B["PersistValue per column"]
    B -->|"cons cell"| C["[PersistValue]"]
    C -->|"fromPersistValue per field"| D["Haskell record"]

For a 10-column row, this allocates:

  • 10 PersistValue heap objects (tagged union, 2+ words each)
  • 10 list cons cells (3 words each)
  • ~9 intermediate Either Text (a -> b -> ...) values from the applicative fromPersistValues chain
  • Then fromPersistValue pattern-matches each PersistValue again to extract the payload

That works out to roughly 70 words (~560 bytes) of pure overhead per row, all short-lived. For a 100k row result set, this produces ~56MB of intermediate garbage that exists only to be collected.
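The arithmetic behind that estimate can be sketched out directly. The word counts below are approximations (exact heap-object sizes vary by GHC version), but they land in the same ballpark:

```haskell
-- Rough per-row overhead for a 10-column row on a 64-bit GHC.
-- Object sizes are approximate and GHC-version dependent.
overheadWords :: Int
overheadWords =
      10 * 2   -- 10 PersistValue boxes (header + payload pointer)
    + 10 * 3   -- 10 list cons cells (header + head + tail)
    + 9 * 2    -- ~9 intermediate Either/partial-application values in the chain

overheadBytes :: Int
overheadBytes = overheadWords * 8  -- 8-byte words on 64-bit

main :: IO ()
main = print (overheadWords, overheadBytes)
```

That comes to 68 words (~544 bytes), consistent with the rough 70 words (~560 bytes) figure quoted here.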

Encoding has a symmetric problem: toPersistFields converts each record field into a PersistValue, then the backend immediately inspects it to produce the wire format. Every insert and update pays this cost.

Limited type vocabulary

PersistValue has a fixed set of constructors that cannot represent backend-specific types natively. That design dates to the early 2010s, when the most common backends were MySQL and PostgreSQL and the most common use cases were simple CRUD. Production-grade applications today often need richer types, frequently with database-specific representations. To support them, we currently have to shoehorn values into the existing constructors, losing type safety and precision. Examples from the PostgreSQL backend include:

  • JSON/JSONB: stored as PersistByteString, losing the distinction between JSON and raw bytes
  • UUID: stored as PersistLiteralEscaped with hex-encoded bytes, requiring a text-to-UUID conversion on each read
  • Intervals: stored as PersistRational, losing semantic meaning (PostgreSQL's interval has year/month/day/time components that Rational cannot faithfully represent)
  • Composite types: PostgreSQL composite values have no PersistValue representation at all
  • Inet/cidr: IP address types round-trip through PersistLiteralEscaped
  • Arrays of custom types: PersistList/PersistArray elements must themselves be PersistValue, so arrays of domain types require double conversion
  • Enums: backend enums are sent as text with no type safety at the Haskell boundary

Furthermore, heavy use of the custom type support makes it harder to improve the underlying library. For example, we could switch to the binary protocol that PostgreSQL supports, but every custom value would need to be converted to the binary protocol, or at least audited for compatibility.

This means that nearly any change to the underlying library constitutes a breaking change for all users of the library, which is unsustainable long-term.

With the proposed FieldDecode/FieldEncode classes, backend packages can provide instances for any Haskell type directly, and the type mapping is no longer constrained by PersistValue's fixed vocabulary. This gives us significant flexibility to improve the underlying library without introducing breaking changes for users.

Runtime type checks

fromPersistValue performs runtime pattern matching on the PersistValue constructor at every field — effectively a type check that the schema already guarantees at compile time. The direct path eliminates this redundancy: TH generates code that calls the correct decoder for each field's statically-known type.

Extensibility without breaking changes

When a database adds a new type (or a backend author wants to support an existing type more faithfully), persistent's current architecture requires either adding a constructor to PersistValue (a breaking change to every backend and every PersistField instance) or cramming the value into an existing constructor like PersistLiteralEscaped (lossy, with no type safety).

With FieldDecode/FieldEncode, a backend package can add support for any new type by publishing a new instance: without touching persistent core, without coordinating with other backends, and without breaking any downstream users. A PostgreSQL backend could add FieldDecode PgRowEnv HStore or FieldDecode PgRowEnv TSVector the day PostgreSQL ships them. Users who don't reference these types are completely unaffected.

Stable SQL for prepared statements and pipelining

Today, persistent generates bulk inserts by expanding INSERT INTO t VALUES (?,?,?), (?,?,?), ... with one bind parameter per value. For 10k rows of 10 columns, that's 100k bind parameters and a SQL string whose length changes with every batch — defeating prepared statement caching and forcing the database to re-plan the query each time.

Similarly, WHERE id IN (?,?,?,...) expands to a variable number of bind parameters.

The direct encode path opens the door to column-oriented encoding: a single fixed SQL template like INSERT ... SELECT * FROM UNNEST($1, $2, ...) where each parameter is an array of all values for one column, or WHERE id = ANY($1) with a single array parameter. The SQL string stays the same regardless of how many rows or values are involved, which is a prerequisite for proper prepared statement reuse.1

Design: Decoding

Core abstraction: RowReader + FieldDecode + FromRow

I propose introducing three new types/classes in persistent core, all backend-agnostic:

-- CPS column-cursor monad
newtype RowReader env a = RowReader
    { unRowReader
        :: forall r. env -> Counter
        -> (Text -> IO r)   -- on error
        -> (a -> IO r)      -- on success
        -> IO r
    }

-- Per-field direct decoding, split into prepare + run.
class FieldDecode env a where
    -- | Inspect column metadata once per result set, returning a
    --   specialized runner that can decode every row without
    --   re-checking types.
    prepareField
        :: env -> FieldNameDB -> Int
        -> (Text -> IO r) -> (FieldRunner env a -> IO r) -> IO r

-- | The product of 'prepareField'. Runs on each row with only the
--   row-varying data (row index, raw bytes, etc.); column type
--   dispatch has already happened.
newtype FieldRunner env a = FieldRunner
    { runField :: forall r. env -> (Text -> IO r) -> (a -> IO r) -> IO r }

-- Per-entity direct decoding. TH generates instances.
class FromRow env a where
    rowReader :: RowReader env a

env is an opaque, backend-specific type that carries whatever the backend needs to read a column: a result pointer and row index for SQL databases, a BSON document for MongoDB, a key-value list for Redis, etc.

The key insight behind the prepare/run split is that within a single result set, the column types are fixed: every row has the same OIDs (PostgreSQL), the same field types (MySQL), or the same BSON structure (MongoDB). The current design would check the column type on every row × every column. With prepareField, the type dispatch happens once when the first row (or the result metadata) is inspected, producing a FieldRunner that has already resolved which decoder to use. The per-row hot loop then just calls runField without branching on column types at all.

For a 10-column entity over 100k rows, this eliminates a million branches that all take the same path.

FieldDecode takes both a FieldNameDB (for document stores that look up fields by name) and an Int column index (for SQL backends that read by position). Each backend uses whichever access pattern is natural and ignores the other.

Why CPS?

A naive IO (Either Text a) return type allocates a Right constructor at every step in the applicative chain, even on the success path where the error case is never used. With CPS, the success continuation is passed directly and the Either is never constructed:

RowReader ff <*> RowReader fa = RowReader $ \env ctr onErr onOk ->
    ff env ctr onErr $ \f ->
        fa env ctr onErr $ \a ->
            onOk (f a)
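To make the mechanism concrete, here is a dependency-free miniature of the whole decode path, using a list of raw column strings as a stand-in env and a Read-based toy nextField. All names are illustrative simplifications of the proposed API, not the API itself:

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.IORef
import Text.Read (readMaybe)

type Counter = IORef Int

-- Miniature RowReader: env is just a list of raw column strings.
newtype RowReader env a = RowReader
    { unRowReader :: forall r. env -> Counter
                  -> (String -> IO r) -> (a -> IO r) -> IO r }

instance Functor (RowReader env) where
    fmap f (RowReader g) = RowReader $ \env ctr onErr onOk ->
        g env ctr onErr (onOk . f)

instance Applicative (RowReader env) where
    pure a = RowReader $ \_ _ _ onOk -> onOk a
    RowReader ff <*> RowReader fa = RowReader $ \env ctr onErr onOk ->
        ff env ctr onErr $ \f ->
            fa env ctr onErr $ \a ->
                onOk (f a)  -- no Either constructed on the success path

-- Toy nextField: bump the column counter, read and parse that column.
nextField :: Read a => RowReader [String] a
nextField = RowReader $ \cols ctr onErr onOk -> do
    i <- atomicModifyIORef' ctr (\n -> (n + 1, n))
    case drop i cols of
        (raw:_) -> maybe (onErr ("bad column " ++ show i)) onOk (readMaybe raw)
        []      -> onErr "too few columns"

data User = User String Int deriving (Eq, Show)

userReader :: RowReader [String] User
userReader = User <$> nextField <*> nextField

main :: IO ()
main = do
    ctr <- newIORef 0
    unRowReader userReader ["\"Alice\"", "30"] ctr fail print
```

The success path threads values straight through the continuations: decoding the row allocates the User and nothing else.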

Both prepareField and FieldRunner are CPS too, so the backend's decoder feeds its result directly into the continuation. Callers use runRowReaderCPS to supply their success and failure actions without materializing unnecessary values:

val <- runRowReaderCPS rowReader env ctr
    (throwIO . PersistMarshalError)  -- on error: throw directly
    pure                              -- on success: return value directly
yield val

The entire path from wire format to yield produces zero Either constructors and zero PersistValue wrappers.

TH generates FromRow alongside PersistEntity

For an entity User { userName :: Text, userAge :: Maybe Int }, mkPersist generates both the existing PersistEntity instance (unchanged) and a new FromRow instance:

instance (FieldDecode env Text, FieldDecode env (Maybe Int))
      => FromRow env User where
    rowReader = User
        <$> nextField (FieldNameDB "name")
        <*> nextField (FieldNameDB "age")

Under the hood, nextField calls prepareField on the first row and caches the resulting FieldRunner, then applies runField for each subsequent row. The instance is polymorphic over env — the FieldDecode constraints resolve when a concrete backend is chosen, and the entity definition itself remains backend-agnostic.

Backend implementations

Each backend provides its own env type and FieldDecode instances. The same TH-generated FromRow code works across all of them without modification.

PostgreSQL

I have a prototype implementation that passes the full persistent and esqueleto test suites. The env value wraps a PGresult handle, a row index, and a vector of column OIDs classified into a PgType ADT. FieldDecode instances inspect the PgType once during prepareField and return a FieldRunner that calls the appropriate postgresql-binary decoder directly, without re-checking the OID on each row. Instances cover Bool, Int16/Int32/Int64, Int, Double, Scientific, Rational, Text, ByteString, Day, TimeOfDay, UTCTime, and Maybe a. Backend-specific types like UUID, IPRange, DiffTime, Value (JSON), and composite types can be added as further instances without any changes to persistent core.

-- Sketch (simplified from the prototype):
instance FieldDecode PgRowEnv Text where
    prepareField env _ col onErr onOk = do
        pgType <- columnType env col
        case pgType of
            Scalar PgText    -> onOk (FieldRunner $ \env' onErr' onOk' -> readBytes env' col >>= decodeWith textDecoder onErr' onOk')
            Scalar PgVarchar -> onOk (FieldRunner $ \env' onErr' onOk' -> readBytes env' col >>= decodeWith textDecoder onErr' onOk')
            _                -> onErr ("type mismatch: expected text, got " <> show pgType)

The case pgType of branch runs once. On subsequent rows, only the FieldRunner executes, since it already knows which decoder to use.

SQLite

The environment wraps a Sqlite.Statement. SQLite's dynamic typing means the column type can technically vary per row, so prepareField here is lightweight, but it still validates that the column index is in range and captures the statement reference, keeping the per-row FieldRunner simple.

instance FieldDecode SqliteRowEnv Text where
    prepareField (SqliteRowEnv stmt) _ col onErr onOk =
        onOk $ FieldRunner $ \(SqliteRowEnv stmt') onErr' onOk' -> do
            ty <- Sqlite.columnType stmt' (colIdx col)
            case ty of
                Sqlite.NullColumn -> onErr' "unexpected NULL for Text"
                _                 -> Sqlite.columnText stmt' (colIdx col) >>= onOk'

MySQL

The environment wraps a vector of MySQLBase.Field metadata and a vector of Maybe ByteString row data. prepareField captures the field metadata once, and the returned FieldRunner uses it to call MySQL.convert on each row's data without re-reading the metadata.

instance FieldDecode MySQLRowEnv Text where
    prepareField env _ col onErr onOk =
        let field = mysqlFields env V.! col
        in onOk $ FieldRunner $ \env' onErr' onOk' ->
            case mysqlRow env' V.! col of
                Nothing -> onErr' "unexpected NULL for Text"
                Just bs -> case MySQL.convert field bs of
                    Just t  -> onOk' t
                    Nothing -> onErr' "MySQL: cannot convert to Text"

MongoDB

The environment wraps a BSON Document. FieldDecode looks up fields by name (ignoring the column index), which is why the class takes both parameters. For MongoDB, the prepare step is essentially a no-op since each document is self-describing, but the split still keeps the interface uniform.

instance FieldDecode MongoRowEnv Text where
    prepareField _ name _ onErr onOk =
        onOk $ FieldRunner $ \(MongoRowEnv doc) onErr' onOk' ->
            case DB.look (unFieldNameDB name) doc of
                Just (DB.String t) -> onOk' t
                Just DB.Null       -> onErr' "unexpected NULL for Text"
                Nothing            -> onErr' ("missing field: " <> unFieldNameDB name)
                _                  -> onErr' "expected String"

-- Distinguishes absent fields from null fields:
instance FieldDecode MongoRowEnv a => FieldDecode MongoRowEnv (Maybe a) where
    prepareField env name col onErr onOk =
        prepareField env name col onErr $ \inner ->
            onOk $ FieldRunner $ \(MongoRowEnv doc) onErr' onOk' ->
                case DB.look (unFieldNameDB name) doc of
                    Nothing      -> onOk' Nothing
                    Just DB.Null -> onOk' Nothing
                    _            -> runField inner (MongoRowEnv doc) onErr' (onOk' . Just)

Redis

The environment wraps binary-encoded key-value pairs. Like MongoDB, FieldDecode looks up by field name. The prepare step captures the encoded field name to avoid re-encoding it on every lookup.

instance FieldDecode RedisRowEnv Text where
    prepareField _ name _ onErr onOk =
        let key = encodeUtf8 (unFieldNameDB name)
        in onOk $ FieldRunner $ \(RedisRowEnv pairs) onErr' onOk' ->
            case V.find (\(k, _) -> k == key) pairs of
                Just (_, bs) -> case Binary.decode (L.fromStrict bs) of
                    BinPersistText t -> onOk' t
                    _                -> onErr' "expected Text"
                Nothing -> onErr' ("missing field: " <> unFieldNameDB name)

How FromRow unifies all backends

The same TH-generated instance resolves to different concrete code depending on the env type. The prepare/run split means that backends with uniform column types (PostgreSQL, MySQL) get the type dispatch out of the hot loop, while document stores (MongoDB, Redis) still benefit from the uniform interface even though their "prepare" step is lighter.

env          What prepareField inspects                 What FieldRunner reads from
PostgreSQL   column OID from PGresult metadata          binary row data via postgresql-binary decoders
SQLite       (lightweight: validates column index)      sqlite3_column_* C API per row
MySQL        MySQLBase.Field metadata                   Maybe ByteString row data + MySQL.convert
MongoDB      (no-op: documents are self-describing)     DB.look on BSON Document (by field name)
Redis        encodes field name to ByteString           Binary.decode from key-value pair

The entity definition contains no backend-specific code, and PersistValue appears nowhere in the decode path.

Specialization: eliminating dictionary overhead

The FromRow and FieldDecode instances are polymorphic over env, which means that in the general case GHC passes typeclass dictionaries at runtime: an indirect function call per field, per row. For a 10-column entity over 100k rows, that's a million indirect calls that could be direct invocations instead.

The standard fix is SPECIALIZE pragmas. Backend packages can emit specializations for their concrete env type, and we will extend TH to generate them automatically from the MkPersistSettings.

In order to achieve this, mkPersist gains a new configuration field mpsDirectEnvTypes :: [Name] listing the environment types consumers want specialized. When this list contains ''PgRowEnv, the generated FromRow instance for an entity User looks like:

instance (FieldDecode env Text, FieldDecode env (Maybe Int))
      => FromRow env User where
    rowReader = User <$> nextField "name" <*> nextField "age"
    {-# SPECIALIZE instance FromRow PgRowEnv User #-}

The pragma is placed inside the body of the instance declaration. When GHC sees it, it creates a monomorphic copy of rowReader @PgRowEnv @User, and is then able to inline all the FieldRunner calls and eliminate dictionary indirection entirely. The per-row loop collapses to a straight sequence of concrete decoder calls with no polymorphism left at runtime.

Because the list lives in MkPersistSettings, code that targets several backends from a single module (e.g. one module that needs both PgRowEnv and SqliteRowEnv) can set mpsDirectEnvTypes = [''PgRowEnv, ''SqliteRowEnv] and TH will emit specializations for both. Backends not in the list get no specialized code, but the instance remains polymorphic and continues to compile normally.

This matters most for the FieldRunner closures produced by prepareField. After specialization, GHC can see through the closure and inline the decoder body directly into the row-reading loop, which in turn enables further optimizations like unboxing intermediate results and eliminating redundant null checks across adjacent fields. Without specialization, the closure is opaque to the optimizer because its concrete type isn't known at the call site.

The encode path benefits similarly: ToRow's toRowBuilder and each FieldEncode instance can be specialized per Param type as soon as TH generates SPECIALIZE pragmas for the encoding side.

Query execution

class HasDirectQuery backend where
    type Env backend
    directQuerySource
        :: MonadIO m
        => backend -> Text -> [PersistValue]
        -> Acquire (ConduitM () (Env backend) m ())

class HasDirectInsert backend where
    type Param backend
    directInsert
        :: MonadIO m
        => backend -> Text -> SmallArray (Param backend)
        -> m ()

Each backend implements HasDirectQuery (naming subject to change; "Direct" simply signals that we're bypassing PersistValue) to send a query and yield one Env backend per result row. The conduit consumer runs prepareField on the first row (or the result metadata) to obtain FieldRunners, then applies them to every subsequent row via runField. The type dispatch happens once and the per-row loop is a straight-line decode.

HasDirectInsert is the encoding counterpart, accepting pre-encoded parameters and sending them to the database. A backend instance for PostgreSQL, for example, would define type Param PgBackend = PgParam where PgParam carries the OID, binary payload, and format code for each parameter.
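As a hedged sketch of what such a Param type could look like, here is a standalone PgParam carrying the three wire-level pieces libpq's PQexecParams needs per parameter. The name and field layout are illustrative, not the prototype's actual definition:

```haskell
import Data.Word (Word32)
import Data.Int (Int16)
import qualified Data.ByteString as BS

-- One pre-encoded bind parameter: type OID, encoded payload
-- (Nothing stands for SQL NULL), and format code.
data PgParam = PgParam
    { pgOid    :: !Word32                 -- PostgreSQL type OID
    , pgValue  :: !(Maybe BS.ByteString)  -- encoded bytes, Nothing = NULL
    , pgFormat :: !Int16                  -- 0 = text, 1 = binary
    }

-- Example: an int8 parameter in binary format (big-endian 8 bytes);
-- OID 20 is int8 in the default PostgreSQL catalog.
exampleParam :: PgParam
exampleParam = PgParam 20 (Just (BS.pack [0,0,0,0,0,0,0,42])) 1

main :: IO ()
main = print (pgOid exampleParam, fmap BS.length (pgValue exampleParam))
```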

User-facing API

In order to make migration as straightforward as possible, the .Experimental modules export the same function names as the originals, with additional constraints indicating that the direct path should be used:

-- Database.Persist.Sql.Experimental:
rawQuery :: (FromRow (Env backend) a, HasDirectQuery backend, ...)
    => Text -> [PersistValue] -> ReaderT backend m (Acquire (ConduitM () a m ()))

rawSql :: (FromRow (Env backend) a, HasDirectQuery backend, ...)
    => Text -> [PersistValue] -> ReaderT backend m [a]

Esqueleto integration

A SqlSelectDirect class parallels esqueleto's SqlSelect with a CPS RowReader-based decoder:

class SqlSelect a r => SqlSelectDirect a r env where
    sqlSelectDirectRow :: RowReader env r

This class is parameterized by env rather than backend — the mapping from backend to env happens at the call site via the Env type family. This means SqlSelectDirect instances are written per environment type, which is the right granularity: the row format depends on the env, not on which backend wrapper is in use.

Instances are provided for Entity, Value, Maybe (Entity), and tuples. The .Experimental.Direct module exports select with the extra constraint, keeping the same name and query DSL:

-- Database.Esqueleto.Experimental.Direct:
import Database.Esqueleto.Experimental.Direct

users <- select $ do
    p <- from $ table @Person
    where_ (p ^. PersonAge >=. val 18)
    return p
-- Identical syntax. The direct path is chosen by the import, not the function name.

The backend type variable already carried by esqueleto resolves Env backend via the associated type family in HasDirectQuery, which then satisfies the SqlSelectDirect a r (Env backend) constraint.

Design: Encoding

The decode side has a working prototype. The encode side follows the same principles.

The problem

flowchart LR
    A["Haskell record"] -->|"toPersistFields"| B["[PersistValue]"]
    B -->|"encode per field"| C["wire format"]

Every field is boxed into PersistValue and then immediately unboxed. For bulk inserts of 10k rows × 10 columns, that's 100k unnecessary PersistValue allocations.

FieldEncode: one class, one method

class FieldEncode param a where
    encodeField :: a -> param

param is a backend-specific encoded parameter type. Each backend decides what param looks like: a PostgreSQL backend might use a type carrying an OID, encoded bytes, and a format tag; an SQLite backend might use a sum type mirroring SQLite's type affinity (SqliteInt !Int64 | SqliteText !Text | ...).

The class is deliberately minimal: a single method. Yet it lets a wide range of backends be supported without breaking changes.
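A runnable sketch of the class with an SQLite-flavored param type, following the affinity-sum idea described above. SqliteParam and these instances are illustrative, not a real backend's definitions:

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
import Data.Int (Int64)

class FieldEncode param a where
    encodeField :: a -> param

-- Toy param type mirroring SQLite's type affinity.
data SqliteParam
    = SqliteInt  !Int64
    | SqliteText !String
    | SqliteNull
    deriving (Eq, Show)

instance FieldEncode SqliteParam Int64  where encodeField = SqliteInt
instance FieldEncode SqliteParam String where encodeField = SqliteText

-- Maybe lifts any instance, mapping Nothing to SQL NULL.
instance FieldEncode SqliteParam a => FieldEncode SqliteParam (Maybe a) where
    encodeField = maybe SqliteNull encodeField

main :: IO ()
main = print ([ encodeField (42 :: Int64)
              , encodeField "hi"
              , encodeField (Nothing :: Maybe Int64)
              ] :: [SqliteParam])
```

Each field goes straight from its Haskell type to the backend's param representation, with no PersistValue in between.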

Alternative: contravariant encoders (hasql-style)

It's worth considering the approach that hasql takes here, which uses contravariant functors to compose encoders:

-- hasql's style:
userParams :: Params User
userParams =
       (userName >$< param (nonNullable text))
    <> (userAge  >$< param (nullable int4))

The >$< operator (contramap from Contravariant) lets you project a field out of a record and feed it into an encoder, and <> sequences them. The result is a single Params User value that knows how to encode an entire record in one pass.

This approach is advantageous in several ways:

  1. The encoder is a first-class value that can be composed, stored, and reused; you can build an encoder for a composite type by combining encoders for its parts, and the types enforce that every field is accounted for.
  2. It separates the description of the encoding from the execution, which pairs well with the prepare/run split on the decode side: you could prepare a Params once and run it per row.
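A dependency-free miniature of this contravariant style, with a toy Params that renders fields to strings. The names are illustrative; hasql's real Params type is considerably richer:

```haskell
import Data.Functor.Contravariant (Contravariant (..), (>$<))

-- Toy encoder: a record goes in, rendered column values come out.
newtype Params a = Params { runParams :: a -> [String] }

instance Contravariant Params where
    contramap f (Params g) = Params (g . f)

-- (<>) sequences two encoders over the same record.
instance Semigroup (Params a) where
    Params f <> Params g = Params (\a -> f a ++ g a)

-- A single-column encoder for any showable value.
param :: Show b => Params b
param = Params (\b -> [show b])

data User = User { userName :: String, userAge :: Maybe Int }

-- (>$<) projects a field out of the record and feeds it to an encoder.
userParams :: Params User
userParams =
       (userName >$< param)
    <> (userAge  >$< param)

main :: IO ()
main = print (runParams userParams (User "Alice" (Just 30)))
```

The resulting userParams is a first-class value: it can be stored, reused, and combined into encoders for larger types.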

My main concern is the learning curve: Contravariant and Divisible are less familiar than Applicative to most Haskell developers, and the corpus of good intuition around them is thinner. For a library like persistent, whose user base spans beginners to experts, that friction matters. hasql's encoding API is one of the things I've found to be a barrier to its adoption, even though I'm relatively fluent in Haskell and appreciate the type safety it provides.

That said, if we're already asking users to change imports and adopt new constraints, the marginal cost of learning contravariant composition might be acceptable, especially since TH can generate the encoders automatically for Entity types, meaning most users would only encounter the raw API when writing custom queries. The generated code would look something like:

instance HasEncoder param User where
    encoder =
           (userName >$< fieldEncoder @Text)
        <> (userAge  >$< fieldEncoder @(Maybe Int))

This is an area where we should probably prototype both approaches and see which one leads to clearer error messages and more natural composition in practice. The simple FieldEncode class is easier to explain and implement first, but if we're going to make breaking changes at some point anyway, the contravariant approach might be the better long-term bet. We might also be able to build a small DSL on top of Contravariant that feels less foreign to users, giving us the best of both worlds.

ToRow: TH-generated, produces a builder

-- Writes encoded params into a SmallMutableArray, avoiding intermediate lists.
newtype ParamBuilder param = ParamBuilder (SmallMutableArray RealWorld param -> Int -> IO Int)

instance Monoid (ParamBuilder param)

writeParam :: FieldEncode param a => a -> ParamBuilder param
buildParams :: Int -> ParamBuilder param -> IO (SmallArray param)
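A runnable sketch of this builder, substituting IOArray (from the array package that ships with GHC) for SmallMutableArray and returning a list from buildParams purely for demonstration:

```haskell
import Data.Array.IO (IOArray, newArray_, writeArray, getElems)

-- Writes params into a mutable array at a running index; returns the
-- next free index, so builders compose left-to-right with (<>).
newtype ParamBuilder p = ParamBuilder (IOArray Int p -> Int -> IO Int)

instance Semigroup (ParamBuilder p) where
    ParamBuilder f <> ParamBuilder g =
        ParamBuilder $ \arr i -> f arr i >>= g arr

instance Monoid (ParamBuilder p) where
    mempty = ParamBuilder $ \_ i -> pure i

-- Write one encoded param and advance the index.
writeParam :: p -> ParamBuilder p
writeParam p = ParamBuilder $ \arr i -> writeArray arr i p >> pure (i + 1)

-- Allocate once, run the builder from index 0, read the result back.
buildParams :: Int -> ParamBuilder p -> IO [p]
buildParams n (ParamBuilder f) = do
    arr <- newArray_ (0, n - 1)
    _ <- f arr 0
    getElems arr

main :: IO ()
main = buildParams 2 (writeParam "name" <> writeParam "age") >>= print
```

No intermediate list is built during encoding: each writeParam lands directly in the preallocated array.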

TH generates:

instance (FieldEncode param Text, FieldEncode param (Maybe Int))
      => ToRow param User where
    toRowBuilder (User name age) = writeParam name <> writeParam age

Usage:

params <- buildParams 2 (toRowBuilder user)
-- params :: SmallArray param, ready for the backend

Typed query parameters

FieldEncode also gives us typed query parameters without routing through [PersistValue]:

rawQueryDirectTyped @(Entity User)
    "SELECT ?? FROM user WHERE age > $1 AND name LIKE $2"
    (writeParam (18 :: Int) <> writeParam ("A%" :: Text))

Each Haskell value goes directly to the backend's encoded format, skipping toPersistValue entirely.

Column-oriented encoding: UNNEST and = ANY

The direct encode path is designed to support column-oriented parameter encoding, which enables two important patterns:

Bulk inserts via UNNEST: instead of INSERT INTO t VALUES (?,?,?), (?,?,?), ... with a dynamic SQL string, use INSERT INTO t (c1, c2, ...) SELECT * FROM UNNEST($1, $2, ...) where each $N is an array containing all values for that column across all rows. The SQL template is fixed regardless of batch size, enabling prepared statement reuse.

IN-clause via = ANY: instead of WHERE id IN (?,?,?,...) with a variable number of parameters, use WHERE id = ANY($1) with one array parameter. Again, the SQL is fixed.

Both patterns require encoding a collection of Haskell values directly into the database's binary array format:

class FieldEncodeArray param a where
    encodeColumnArray :: Vector a -> param

TH can generate a columnar encoder that transposes a Vector record into per-column arrays and encodes each one directly:

class ToRowColumnar param a where
    toColumnarBuilder :: Vector a -> ParamBuilder param

instance (FieldEncodeArray param Text, FieldEncodeArray param (Maybe Int))
      => ToRowColumnar param User where
    toColumnarBuilder users =
           writeParam (encodeColumnArray (V.map userName users))
        <> writeParam (encodeColumnArray (V.map userAge users))

For 10k rows × 10 columns, this means that instead of 100k PersistValue objects transposed into PersistArray lists, the direct path builds 10 binary column arrays from the record fields with no intermediate representation.

Allocation comparison (decode path)

For a 10-column entity, per row:

                                  PersistValue path   Direct CPS path
PersistValue objects              10                  0
List cons cells                   10                  0
Either from applicative chain     ~9                  0
Either from field decode          10                  0 (CPS)
Either at boundary                0                   0 (runRowReaderCPS)
Boxed Int (column counter)        10                  0 (unboxed counter)
Column type dispatch              10 per row          10 once (prepare/run)
Total intermediate objects        ~49                 0

For 100k rows, that's roughly 4.9 million intermediate objects eliminated.

Migration strategy: .Experimental modules

Following the precedent set by esqueleto (which introduced Database.Esqueleto.Experimental for its new FROM syntax), the direct codec API lives in .Experimental modules alongside the existing API. Users opt in by changing their imports. The existing modules are unchanged.

persistent core

Module                              Contents
Database.Persist.Sql                Unchanged. selectList, get, rawSql, etc. continue to use [PersistValue].
Database.Persist.Sql.Experimental   New. Re-exports everything from Database.Persist.Sql, plus direct-path variants with additional FromRow/ToRow constraints. Same names, same signatures, just extra constraints.

The experimental module provides versions of the standard operations that take the direct path when the constraints are satisfied:

Note on Standard Operations:

As of this RFC, only the raw query functions are implemented (rawQueryDirect and rawSqlDirect). The higher-level operations (selectList, get, insertMany_) are planned but not yet implemented; they would live in the experimental modules to give users typed, zero-allocation versions of the core persistent operations.

Users switch by changing one import:

-- Before:
import Database.Persist.Sql

-- After (direct path, zero PersistValue):
import Database.Persist.Sql.Experimental

All existing code continues to work because the experimental signatures are strictly more constrained: they accept a subset of the original callers (those whose backend supports direct codecs). If a backend doesn't have HasDirectQuery/FromRow instances, etc., the experimental variants simply won't type-check, and the user stays on the original import.

esqueleto

Module                                   Contents
Database.Esqueleto.Experimental          Unchanged.
Database.Esqueleto.Experimental.Direct   New. Re-exports everything from Database.Esqueleto.Experimental, plus selectDirect, selectOneDirect, etc. with SqlSelectDirect constraints.
-- Database.Esqueleto.Experimental (unchanged):
select :: (SqlSelect a r, MonadIO m, SqlBackendCanRead backend)
    => SqlQuery a -> ReaderT backend m [r]

-- Database.Esqueleto.Experimental.Direct (new):
select :: ( SqlSelect a r, SqlSelectDirect a r (Env backend)
          , MonadIO m, SqlBackendCanRead backend, HasDirectQuery backend )
    => SqlQuery a -> ReaderT backend m [r]

As long as we provide the same function name, same return type, and same query DSL, the only difference is the extra constraints ensuring the direct path is used. The one place users will have to do real work is converting their custom type instances to the new direct encoding/decoding APIs.

Backend packages

Each backend package that supports direct codecs exports its env type from the primary existing module, as well as its FieldDecode/FieldEncode instances. These don't need their own .Experimental modules: the instances are always available, and they are picked up via normal instance resolution when the user uses the experimental persistent/esqueleto modules.

Graduation path

Once the direct path is proven stable:

  1. The .Experimental signatures move into the main modules. The extra constraints are additive: they narrow the accepted backends but don't change behavior for backends that satisfy them.
  2. Backends that don't support direct codecs continue to work via the [PersistValue] path.
  3. Eventually, PersistValue-based operations could be deprecated in favor of the direct path.

This mirrors esqueleto's migration of its FROM syntax from .Experimental to the recommended default.

Relationship to SqlBackend

A key design tension that I think requires some discussion: much of the persistent ecosystem (esqueleto's rawSelectSource, persistent's default PersistQueryRead implementation, user code using SqlPersistT) projects down to bare SqlBackend, erasing the concrete backend type. The HasDirectQuery backend constraint needs to reduce Env backend to a concrete type, but SqlBackend doesn't correspond to any particular backend, so Env SqlBackend has no meaningful definition for the purposes of this proposal.

I've explored a few approaches, but I think only the first is particularly viable.

Approach 1: Backend-specific types

Users and libraries work with the concrete backend type (WriteBackend PostgreSQLBackend, SqliteBackend, etc.) instead of SqlBackend. The HasDirectQuery constraint resolves naturally.

For most application code, the concrete type appears only in the runner (withPostgresqlPool, withSqliteConn, etc.) and the ReaderT backend m type. Switching from SqlPersistT to a backend-specific ReaderT is a relatively straightforward find-and-replace change.
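A runnable sketch of what that find-and-replace looks like in practice, using toy backend types (the real types live in persistent and its backend packages; everything below is illustrative):

```haskell
import Control.Monad.Trans.Reader (ReaderT, runReaderT, asks)

-- Hypothetical stand-ins: SqlBackend plays the erased type,
-- PostgreSQLBackend the concrete one.
data SqlBackend        = SqlBackend        { genericName :: String }
data PostgreSQLBackend = PostgreSQLBackend { pgName :: String }

type SqlPersistT m = ReaderT SqlBackend m

-- Before: code written against the erased backend type.
countUsersOld :: Monad m => SqlPersistT m Int
countUsersOld = do
  _ <- asks genericName
  pure 42  -- stand-in for a real query

-- After: the same action re-typed at the concrete backend, which is
-- where direct-codec constraints like HasDirectQuery could resolve.
countUsersNew :: Monad m => ReaderT PostgreSQLBackend m Int
countUsersNew = do
  _ <- asks pgName
  pure 42

main :: IO ()
main = do
  n1 <- runReaderT countUsersOld (SqlBackend "sql")
  n2 <- runReaderT countUsersNew (PostgreSQLBackend "pg")
  print (n1, n2)
```

The body of the action is unchanged; only the type annotation and the runner differ, which is why the migration is mostly mechanical.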

Approach 2: A direct-codec field on SqlBackend

For code that flows through SqlBackend, and there's a lot of it, we'd ideally bridge the gap by adding a dedicated field that carries the direct codec machinery from whatever backend created the connection. SqlBackend already works this way for other operations: connInsertSql, connPrepare, connInsertManySql, etc. are all function-valued fields that the backend populates at connection setup time, closing over its own concrete types. The direct path would fit naturally into the same pattern:

```haskell
data SqlBackend = SqlBackend
    { ...existing fields...
    , connDirectCodecs :: Maybe DirectCodecs
    }
```

The naive idea I had was to use a rank-2 type so the caller passes a polymorphic RowReader and the backend instantiates it at its own concrete env, but it doesn't seem viable in practice:

```haskell
-- ⚠ THIS DOES NOT TYPECHECK: see below
data DirectCodecs = DirectCodecs
    { dcQueryAndDecode
        :: forall a m. MonadIO m
        => (forall env. FromRow env a => RowReader env a)
        -> Text -> [PersistValue]
        -> Acquire (ConduitM () a m ())
    }
```

Why the rank-2 approach fails

The argument (forall env. FromRow env a => RowReader env a) is fine from the caller's perspective: the TH-generated rowReader class method has exactly this type (polymorphic in env, constrained by FromRow env a), so the caller can provide it.

The problem is on the consumption side. The backend's implementation needs to instantiate env at its concrete type:

```haskell
mkPgDirectCodecs :: PgConnection -> DirectCodecs
mkPgDirectCodecs conn = DirectCodecs
    { dcQueryAndDecode = \polyReader sql params -> do
        let reader = polyReader @PgRowEnv  -- needs FromRow PgRowEnv a
        src <- pgQuerySource conn sql params
        src .| decodeRowsConduit reader
    }
```

When the backend writes polyReader @PgRowEnv, GHC needs to discharge the constraint FromRow PgRowEnv a. But a is a rigid type variable from the outer forall a. The backend's lambda must work for all a, and GHC has no instance FromRow PgRowEnv a that covers every possible type. The fact that FieldDecode PgRowEnv Text etc. are "in scope" doesn't help; GHC can't perform instance resolution for FromRow PgRowEnv a without knowing what a is. This produces:

```
Could not deduce (FromRow PgRowEnv a) arising from a use of 'polyReader'
```

This is a fundamental tension: you can't erase both env (via the closure in SqlBackend) and a (via forall a) while still relying on typeclass instance resolution to connect them. The dictionary for something like FromRow PgRowEnv User exists at each specific call site, but nobody at the backend's definition site can work with it.

Possible directions

Several alternative encodings were considered, but each has significant drawbacks:

Parameterize DirectCodecs by env: If DirectCodecs env carries the env type, the backend can accept RowReader env a directly. But SqlBackend would need to hold DirectCodecs env for some env, meaning either SqlBackend becomes parameterized (a massive breaking change to the entire ecosystem) or the env is hidden behind an existential... at which point the caller can't construct a RowReader for an env it doesn't know.

Compile the decoder at the call site: The call site knows both a and the backend, so it could resolve all constraints and produce a fully monomorphic decoding function to pass into DirectCodecs. But this requires the call site to know the concrete env type, which means it already has the concrete backend type — and if it has the concrete backend type, it doesn't need the SqlBackend bridge.

Add a FromRow constraint to dcQueryAndDecode: This just pushes the problem around: DirectCodecs is stored in SqlBackend where a isn't in scope, so there's nowhere to put the constraint.

Current recommendation

For now, Approach 2 remains an open problem. The direct codec path works cleanly with concrete backend types (Approach 1). Code that flows through SqlBackend continues to use the PersistValue path, which would remain fully supported.

If a viable encoding for DirectCodecs is found that bridges the SqlBackend gap without parameterizing it, this section will be updated. Potential avenues to explore might include:

  • A reflection-style approach where the backend reifies its env's dictionaries into a runtime value that the caller threads through
  • A Dict-based encoding where the caller provides Dict (FromRow env a) explicitly alongside the reader
  • Restructuring SqlBackend itself to carry a type parameter (a long-term project that defeats most of the attempts at backwards compatibility outlined in this proposal)
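The Dict-based avenue relies on a standard trick: a call site that can resolve an instance reifies the dictionary as a first-class value and hands it to code that could not discharge the constraint on its own. A minimal, compilable illustration of just that mechanism (toy class and names, not the real FromRow; the open question above is who can supply the Dict when env is hidden behind SqlBackend):

```haskell
{-# LANGUAGE ConstraintKinds, GADTs #-}

-- Pattern-matching on Dict brings the captured constraint into scope.
data Dict c where
  Dict :: c => Dict c

class FromRowToy a where
  decodeToy :: String -> a

instance FromRowToy Int where
  decodeToy = length

-- This consumer knows nothing about a and performs no instance
-- resolution itself; the explicit dictionary does all the work.
runWithDict :: Dict (FromRowToy a) -> String -> a
runWithDict Dict = decodeToy

main :: IO ()
main = print (runWithDict (Dict :: Dict (FromRowToy Int)) "hello")
```

The constraints package provides this Dict type off the shelf; the sketch above inlines it to stay self-contained.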

In the meantime, the practical impact is limited: most application code already has access to the concrete backend type from the connection runner (withPostgresqlPool, withSqliteConn, etc.), and the direct path is fully available there.

Footnotes

  1. Fixed SQL templates are also a prerequisite for prepared statements, which parse and plan the SQL once and execute it many times. Variable-length SQL strings change with every batch size, so they can never be reused as prepared statements: every call is a cold parse and plan. Prepared statements in turn enable protocol-level pipelining: sending multiple execute messages to the database without waiting for each response before sending the next (see PostgreSQL's pipeline mode). Pipeline mode doesn't strictly require prepared statements, but without a fixed statement shape you would end up preparing a distinct statement for every batch size, which makes prepared statements ineffective in pipeline mode in practice.
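The batch-size problem in that footnote can be shown in a few lines. The naive multi-row INSERT below produces different SQL text for every batch size, while a fixed-shape alternative (sketched here in a PostgreSQL unnest style; table and column names are illustrative) is identical for every batch and so can be prepared once:

```haskell
import Data.List (intercalate)

-- SQL text depends on the batch size: n rows => n placeholders.
naiveInsert :: Int -> String
naiveInsert n =
  "INSERT INTO users (name) VALUES "
    ++ intercalate ", " [ "($" ++ show i ++ ")" | i <- [1 .. n] ]

-- Fixed template: one array parameter, same SQL for any batch size.
fixedInsert :: String
fixedInsert = "INSERT INTO users (name) SELECT * FROM unnest($1::text[])"

main :: IO ()
main = do
  putStrLn (naiveInsert 2)  -- distinct statement per batch size
  putStrLn (naiveInsert 3)  -- ...so each one is a cold parse and plan
  putStrLn fixedInsert      -- one preparable, pipeline-friendly statement
```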
