
RFC: Extend encryption / decryption to field level #7963

@walmsles


Is this related to an existing feature request or issue?

No response

Which Powertools for AWS Lambda (Python) utility does this relate to?

Other

Summary

The data_masking utility in Powertools encrypts and decrypts the entire data body, producing a single encrypted blob. There are times when you want to encrypt and decrypt specific fields within a payload rather than the whole thing.

I intend to raise an RFC for a TypeScript implementation and am also happy to take this on.

Use case

A real use case is protecting PII stored when Durable Function outputs are checkpointed. The existing mechanism there also serialises/deserialises the entire payload rather than specific fields, so all that is visible in the console is an encrypted data blob, with no structure or field pointers to assist debugging or investigating issues.

I would like encryption/decryption at the field level that preserves the payload structure. This helps when debugging durable function executions, because the payload and its structure remain visible. It also enables selective encryption of data fields in the payload.

Proposal

The approach I am putting forward is not to encrypt each field individually, which would be computationally expensive, but instead to replace each field with an encryption placeholder, for example "secret_field": { "__encrypted": "my.secret_field" }, and then include ALL the encrypted field values as a single blob at the end of the data payload, which is encrypted with a single encryption call. The following example shows the original payload, the encrypted payload, and the decrypted contents of the encrypted blob:

Original Payload:

{
  "customer": {
    "name": "John",
    "ssn": "123-45-6789",
    "creditCard": "4111-1111-1111-1111"
  }
}

After encrypt(data, ['customer.ssn', 'customer.creditCard']):

{
  "customer": {
    "name": "John",
    "ssn": { "__encrypted": "customer.ssn" },
    "creditCard": { "__encrypted": "customer.creditCard" }
  },
  "__powertools_encrypted_data": "AQICAHh8s0D5ZXJzaW9uIjogIjEuMCIsICJhbGdvcml0aG0iOiAiQUVTL0dDTS9Ob1BhZGRpbmciLCAiY2lwaGVydGV4dCI6ICJ...",
  "__powertools_encryption_context": {
    "purpose": "field-encryption"
  }
}

Once decrypted, the __powertools_encrypted_data field contains a JSON object that maps each encrypted field path to its original value:

{
  "customer.ssn": "123-45-6789",
  "customer.creditCard": "4111-1111-1111-1111"
}
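
To make the transformation above concrete, here is a minimal sketch of the encrypt side. It is not the Powertools implementation: the helper names `extract_fields` and `encrypt_fields` are hypothetical, dot-separated paths into nested dicts are assumed, and `encrypt_blob` is a caller-supplied callable so that the real provider call can be swapped in. The demo passes a hex encoder as a stand-in, which is NOT encryption.

```python
import copy
import json
from typing import Any, Callable

PLACEHOLDER_KEY = "__encrypted"
BLOB_KEY = "__powertools_encrypted_data"


def extract_fields(data: dict, paths: list[str]) -> tuple[dict, dict[str, Any]]:
    """Return a masked copy of data plus a {path: original value} mapping."""
    masked = copy.deepcopy(data)  # never mutate the caller's payload
    secrets: dict[str, Any] = {}
    for path in paths:
        *parents, leaf = path.split(".")
        node = masked
        for key in parents:
            node = node[key]  # raises KeyError if the path does not exist
        secrets[path] = node[leaf]
        node[leaf] = {PLACEHOLDER_KEY: path}  # placeholder points at the blob entry
    return masked, secrets


def encrypt_fields(
    data: dict, paths: list[str], encrypt_blob: Callable[[bytes], str]
) -> dict:
    """Mask the given paths and attach all secrets as one encrypted blob."""
    masked, secrets = extract_fields(data, paths)
    # One call for all fields, instead of one call per field
    masked[BLOB_KEY] = encrypt_blob(json.dumps(secrets).encode("utf-8"))
    return masked


payload = {
    "customer": {
        "name": "John",
        "ssn": "123-45-6789",
        "creditCard": "4111-1111-1111-1111",
    }
}

# Stand-in only: hex encoding is NOT encryption, it just keeps the demo runnable
result = encrypt_fields(
    payload,
    ["customer.ssn", "customer.creditCard"],
    encrypt_blob=lambda raw: raw.hex(),
)
```

Collecting the values first and encrypting once is what keeps the cost at a single encryption call regardless of how many fields are listed.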

Usage Examples

Basic Field Encryption

import os
from decimal import Decimal

import boto3

from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import AWSEncryptionSDKProvider

KMS_KEY_ARN = os.environ["KMS_KEY_ARN"]
# ORDERS_TABLE is a hypothetical environment variable naming the target table
table = boto3.resource("dynamodb").Table(os.environ["ORDERS_TABLE"])

provider = AWSEncryptionSDKProvider(keys=[KMS_KEY_ARN])
data_masker = DataMasking(provider=provider)

def lambda_handler(event, context):
    order_data = {
        "orderId": "12345",
        "customer": {
            "name": "John Doe",
            "email": "john@example.com",
            "ssn": "123-45-6789"
        },
        "payment": {
            "creditCard": "4111-1111-1111-1111",
            "amount": Decimal("99.99")  # boto3's DynamoDB resource rejects float
        }
    }
    
    # Encrypt sensitive fields only (`fields` is the parameter proposed in this RFC)
    encrypted = data_masker.encrypt(
        order_data,
        fields=["customer.ssn", "payment.creditCard"],
        tenant_id="acme-corp"
    )
    
    # Store encrypted payload (orderId and amount visible for queries)
    table.put_item(Item=encrypted)

Processing with Partial Visibility

def process_order(encrypted_order):
    # Can query/filter by non-encrypted fields without decryption
    if encrypted_order["payment"]["amount"] > 100:
        send_fraud_alert(encrypted_order["orderId"])
    
    # Only decrypt when actually needed
    decrypted = data_masker.decrypt(encrypted_order)
    charge_credit_card(decrypted["payment"]["creditCard"])

Multi-Tenant Data Isolation

def store_customer_data(tenant_id, customer_data):
    # Encryption context binds data to tenant
    encrypted = data_masker.encrypt(
        customer_data,
        fields=["ssn", "dob", "medicalRecords"],
        tenant_id=tenant_id,
        data_classification="pii"
    )
    
    return encrypted

def retrieve_customer_data(encrypted_data):
    # AWS Encryption SDK validates tenant_id automatically
    # Decrypt fails if context doesn't match
    decrypted = data_masker.decrypt(encrypted_data)
    return decrypted
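
The comments above say that decryption fails when the encryption context does not match. As a hedged sketch of what that check could look like: the AWS Encryption SDK binds the context to the ciphertext and returns it on decrypt, and a caller can then confirm that the returned context contains the expected pairs. The function and exception names below are hypothetical, not the Powertools API. A subset check is used rather than strict equality because the SDK can add its own entries (such as `aws-crypto-public-key`) to the context.

```python
class ContextMismatchError(Exception):
    """Hypothetical error raised when the stored context lacks an expected pair."""


def verify_encryption_context(stored: dict[str, str], expected: dict[str, str]) -> None:
    """Raise unless every expected key/value pair is present in the stored context."""
    for key, value in expected.items():
        if stored.get(key) != value:
            raise ContextMismatchError(
                f"encryption context mismatch for key {key!r}"
            )


# Context as it might come back from the decrypted message header (illustrative values)
stored_context = {
    "tenant_id": "acme-corp",
    "data_classification": "pii",
    "aws-crypto-public-key": "example-key-material",
}

# Passes silently: both expected pairs are present
verify_encryption_context(
    stored_context, {"tenant_id": "acme-corp", "data_classification": "pii"}
)

# A different tenant is rejected
try:
    verify_encryption_context(stored_context, {"tenant_id": "other-corp"})
    mismatch_detected = False
except ContextMismatchError:
    mismatch_detected = True
```

Whether the expected context is passed by the caller or read from `__powertools_encryption_context` is a design choice this RFC leaves open. Values read from the payload itself are only trustworthy after they have been compared with the context authenticated by the ciphertext.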

Durable Function Checkpointing

An integration class for Durable Function checkpointing is needed, and it should form part of what is implemented in the Powertools library.

from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import AWSEncryptionSDKProvider
from aws_durable_execution_sdk_python import DurableContext, durable_execution
from aws_durable_execution_sdk_python.config import StepConfig
import os
import json

# Initialize once at module level - cached across invocations
KMS_KEY_ARN = os.getenv("KMS_KEY_ARN")

class EncryptedFieldsSerDes:
    def __init__(self, encrypted_fields: list[str], kms_key_arn: str):
        provider = AWSEncryptionSDKProvider(keys=[kms_key_arn])
        self.data_masker = DataMasking(provider=provider)
        self.encrypted_fields = encrypted_fields
    
    def serialize(self, value: dict, context) -> str:
        encrypted = self.data_masker.encrypt(
            value,
            fields=self.encrypted_fields,
            workflow_id=context.operation_id
        )
        return json.dumps(encrypted)
    
    def deserialize(self, data: str, context) -> dict:
        encrypted = json.loads(data)
        return self.data_masker.decrypt(encrypted)

# Create SerDes instance at module level - reused across invocations
payment_serdes = EncryptedFieldsSerDes(
    encrypted_fields=["customer.creditCard", "customer.ssn"],
    kms_key_arn=KMS_KEY_ARN
)

@durable_execution
def handler(event: dict, context: DurableContext):
    # Reuse cached SerDes instance
    result = context.step(
        lambda _: process_payment(event),
        name="process_payment",
        config=StepConfig(serdes=payment_serdes)
    )
    return result

Debugging and Observability

# Console/logs show structure without exposing PII
logger.info("Processing order", extra={
    "encrypted_order": encrypted_order
})

# Output in logs (abbreviated):
# {
#   "orderId": "12345",
#   "customer": {
#     "name": "John Doe",
#     "ssn": {"__encrypted": "customer.ssn"}
#   },
#   "payment": {
#     "creditCard": {"__encrypted": "payment.creditCard"},
#     "amount": 99.99
#   },
#   "__powertools_encrypted_data": "AQICAHh8s0D5...",
#   "__powertools_encryption_context": {"tenant_id": "acme-corp"}
# }

# Structure visible for debugging
# Non-sensitive fields queryable
# Encrypted blob present but not readable without KMS access

Out of scope

Nothing considered

Potential challenges

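Encryption changes the payload: each encrypted field's value is replaced by a placeholder object, and the encrypted blob and encryption context are added as top-level keys. Some change to the payload is inherent to encryption.

This is a potential challenge for customers who work with the payload in its encrypted form, but it is handled transparently by the decrypt function, which restores the original payload.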
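
As an illustration of how decrypt could hide the structural change from callers, the sketch below reverses the placeholder substitution. The helper name `restore_fields` is hypothetical, `decrypt_blob` is a caller-supplied callable standing in for the real provider, and the demo uses hex decoding, which is NOT decryption.

```python
import copy
import json
from typing import Callable

PLACEHOLDER_KEY = "__encrypted"
BLOB_KEY = "__powertools_encrypted_data"
CONTEXT_KEY = "__powertools_encryption_context"


def restore_fields(masked: dict, decrypt_blob: Callable[[str], bytes]) -> dict:
    """Put the original values back and drop the envelope keys."""
    restored = copy.deepcopy(masked)  # leave the stored payload untouched
    blob = restored.pop(BLOB_KEY)
    restored.pop(CONTEXT_KEY, None)
    secrets = json.loads(decrypt_blob(blob))
    for path, value in secrets.items():
        *parents, leaf = path.split(".")
        node = restored
        for key in parents:
            node = node[key]
        # Refuse to overwrite anything that is not the placeholder we expect
        if node.get(leaf) != {PLACEHOLDER_KEY: path}:
            raise ValueError(f"no placeholder found at {path!r}")
        node[leaf] = value
    return restored


secrets = {"customer.ssn": "123-45-6789"}
encrypted_payload = {
    "customer": {"name": "John", "ssn": {"__encrypted": "customer.ssn"}},
    # Stand-in only: hex encoding is NOT encryption
    BLOB_KEY: json.dumps(secrets).encode("utf-8").hex(),
    CONTEXT_KEY: {"purpose": "field-encryption"},
}

decrypted_payload = restore_fields(encrypted_payload, decrypt_blob=bytes.fromhex)
```

After this step the caller sees the original payload shape, with no placeholder objects and no `__powertools_*` keys.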

Dependencies and Integrations

No response

Alternative solutions

Acknowledgment
