Under which category would you file this issue?
Airflow Core
Apache Airflow version
3.2.0
What happened and how to reproduce it?
Description
After upgrading from Airflow 3.1.7 → 3.2.0, we are consistently observing MySQL deadlocks between the scheduler and triggerer when processing deferrable tasks.
This did not occur in 3.1.7 under the same workload.
Environment
- Airflow version: 3.2.0
- Previous version (no issue): 3.1.7
- Executor: CeleryExecutor
- DB: MySQL
- Scheduler replicas: 3
- Triggerer: 1 instance
- Workload: heavy use of deferrable operators (sensors / async tasks)
Symptoms
- Scheduler/triggerer crashes or restarts due to DB deadlocks
- Deadlocks consistently involve the task_instance table
- System becomes unstable under load
Example deadlock pattern:
UPDATE task_instance
SET updated_at=..., trigger_id=NULL
WHERE task_instance.state != 'deferred'
AND task_instance.trigger_id IS NOT NULL
conflicting with:
UPDATE task_instance
SET state='scheduled', trigger_id=NULL,
next_method='__fail__', next_kwargs=...
WHERE task_instance.state = 'deferred'
AND task_instance.trigger_timeout < now()
Root Cause Analysis
Key observation
In Airflow 3.2.0, both scheduler and triggerer mutate task_instance rows for deferrable tasks:
Triggerer (set-based update)
- Performs a bulk UPDATE on deferred tasks that time out
- Updates:
- state
- trigger_id
- next_method
- next_kwargs
Scheduler (callback-driven updates)
- Processes executor callbacks via:
callback = session.get(Callback, callback_id)
callback.run(session=session)
- Inside callback.run():
- Loads TaskInstance
- Mutates:
- state
- trigger_id
- other fields
Result
Two independent writers:
- Triggerer → bulk UPDATE (set-based)
- Scheduler → row-by-row ORM UPDATE
Both target overlapping task_instance rows.
Why this causes deadlocks
- Both queries scan overlapping row sets (even if predicates are logically disjoint)
- Lock acquisition order differs:
- Triggerer: index scan order
- Scheduler: callback / primary key order
- With multiple scheduler replicas, contention increases significantly
Typical pattern:
Scheduler: locks row A → waits for row B
Triggerer: locks row B → waits for row A
→ DEADLOCK
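The pattern above can be reproduced with a small, database-free model. This is a minimal sketch, not Airflow code: `simulate_lock_acquisition` and the writer and row names are hypothetical. It assumes each writer takes row locks one at a time in a fixed order and releases them only when it has acquired all of them, which roughly mirrors row locks held until commit.

```python
def simulate_lock_acquisition(lock_orders: dict[str, list[str]]) -> str:
    """Step writers round-robin through their lock lists.

    lock_orders maps a writer name to the ordered list of rows it locks.
    Returns "completed" if every writer finishes, or "deadlock" if no
    unfinished writer can make progress.
    """
    owner: dict[str, str] = {}                   # row -> writer holding its lock
    position = {writer: 0 for writer in lock_orders}

    while True:
        progressed = False
        for writer, rows in lock_orders.items():
            index = position[writer]
            if index >= len(rows):
                continue                         # writer already committed
            row = rows[index]
            if owner.get(row, writer) != writer:
                continue                         # row locked by someone else: wait
            owner[row] = writer
            position[writer] += 1
            progressed = True
            if position[writer] == len(rows):
                # "Commit": release every lock this writer holds.
                for held in set(rows):
                    owner.pop(held, None)

        if all(position[w] >= len(lock_orders[w]) for w in lock_orders):
            return "completed"
        if not progressed:
            return "deadlock"


# Opposite acquisition order, as in the pattern above.
opposite = {"scheduler": ["A", "B"], "triggerer": ["B", "A"]}
# Same acquisition order for both writers.
consistent = {"scheduler": ["A", "B"], "triggerer": ["A", "B"]}

print(simulate_lock_acquisition(opposite))
print(simulate_lock_acquisition(consistent))
```

In this model the opposite-order case stalls with each writer holding one row and waiting for the other, while the same-order case only makes one writer wait briefly before both finish.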
What you think should happen instead?
- Avoid concurrent writes:
- Scheduler should not mutate task_instance fields owned by triggerer
- Enforce consistent ordering:
- Ensure both components lock rows in deterministic order
- Batch updates:
- Avoid large scans or uncontrolled ORM flushes
- Ownership separation:
- Triggerer handles deferred lifecycle exclusively
- Scheduler only consumes results
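As an illustration of the "consistent ordering" and "batch updates" suggestions, here is a minimal sketch of a helper that both writers could share. `ordered_batches` is a hypothetical name, not an existing Airflow function, and the row keys are made-up placeholders. The assumption is that each component collects the keys of the task_instance rows it intends to modify before issuing any UPDATE.

```python
from collections.abc import Iterable, Iterator


def ordered_batches(row_keys: Iterable[str], batch_size: int) -> Iterator[list[str]]:
    """Yield de-duplicated row keys in ascending order, in fixed-size batches.

    If every writer locks rows in this same order, no writer can hold a
    "later" row while waiting for an "earlier" one.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(set(row_keys))
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]


# Hypothetical keys gathered in arbitrary (callback or index-scan) order.
keys = ["ti-03", "ti-01", "ti-02", "ti-01"]
for batch in ordered_batches(keys, batch_size=2):
    # Each batch would be updated in its own short transaction.
    print(batch)
```

Keeping each batch in a short transaction of its own limits how many row locks are held at once. Whether this fits the actual scheduler and triggerer code paths would need to be confirmed against the 3.2.0 source.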
Operating System
Ubuntu 22.04.5 LTS
Deployment
None
Apache Airflow Provider(s)
No response
Versions of Apache Airflow Providers
No response
Official Helm Chart version
Not Applicable
Kubernetes Version
No response
Helm Chart configuration
No response
Docker Image customizations
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct