@melton-jason (Contributor) commented Jan 26, 2026

Replaces #5404
Fixes #4148, #7560
Addresses part of #5337

Checklist

  • Self-review the PR after opening it to make sure the changes look good and
    self-explanatory (or properly documented)
  • Add relevant issue to release milestone
  • Add pr to documentation list
  • Add automated tests

This branch should functionally be the equivalent of #7455, based on v7.11.3 instead of main.

This Pull Request addresses an issue with the prior autonumbering code, where database transactions could be broken by a LOCK TABLES statement issued inside a transaction:

Because LOCK TABLES implicitly commits any current transactions (see aforementioned MariaDB docs), wouldn't autonumbering implicitly commit the Django transaction?
If this is true, then the transaction state that Django works with in outer transaction.atomic() blocks would be broken/inconsistent: the transaction would already be committed as soon as the tables are locked for autonumbering.

See #6490 (comment)

Currently in main, this autonumbering behavior with the WorkBench can result in a functionally complete lockout of the database, preventing other connections from reading tables like Discipline and Collection until the WorkBench operation finishes.
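
As a minimal sketch of that failure mode (assuming a Django cursor against MariaDB; the table and column names below are only illustrative, not Specify's actual autonumbering code):

    # Minimal sketch of the failure mode described above -- not Specify code.
    from django.db import connection, transaction

    def broken_autonumbering():
        with transaction.atomic():
            with connection.cursor() as cursor:
                cursor.execute(
                    "INSERT INTO collectionobject (catalognumber) VALUES ('000000001')")
                # LOCK TABLES implicitly commits the transaction opened by atomic(),
                # so the INSERT above is already permanent at this point.
                cursor.execute("LOCK TABLES collectionobject WRITE")
                try:
                    pass  # scan for the current highest value, insert new records, ...
                finally:
                    cursor.execute("UNLOCK TABLES")
            # If anything below raises, atomic() issues a ROLLBACK, but there is
            # nothing left to roll back -- Django's idea of the transaction state no
            # longer matches what the database actually did.
            raise RuntimeError("too late to roll anything back")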

Below is a video of the issue taken in v7.11.3:

v7_11_3_wb_issue.mov

Below is with the changes in this branch:

v7__11_3_wb_fix.mov

(Developer-Focused) New Internal Autonumbering Design

Specify now uses transaction-independent User Advisory Locks to determine which session is currently using autonumbering for a particular table. If no other session currently holds the autonumbering lock for a table, the connection acquires it.
If a record attempts to autonumber a field that is already being autonumbered by another session (i.e. another session holds the autonumbering lock), Specify waits up to 10 seconds for the prior connection to release its lock. If the lock is not released by then, Specify aborts the operation with an error.

To maintain correct autonumbering when a long-running transaction is also autonumbering (such as during a long-running WorkBench operation), Specify uses Redis (for IPC) to store the current highest autonumbered value for each formatter.
When a session holds an autonumbering lock for a particular table, it checks this store for the highest autonumber and compares it to the highest value the session can see in the database, using the higher of the two to determine the new highest value.
Thus, even within transactions that use a high isolation level, Specify still internally commits the autonumbering values to a store that other sessions can read.

Roughly, this means that AutoNumbering acts at a similar isolation level to READ UNCOMMITTED within the application.
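
As a rough sketch of that comparison (the function below is illustrative only; the real logic lives in the formatters and the dispatcher classes shown next, and also handles prefixed formats such as FSH-#########):

    # Illustrative only: the "use the higher of the two sources" rule.
    # Autonumbered values are zero-padded strings (e.g. "000000041"), so for the
    # numeric portion a plain string comparison orders them like the integers they represent.
    def next_autonumber(db_highest: str | None, redis_highest: str | None) -> str:
        candidates = [value for value in (db_highest, redis_highest) if value is not None]
        if not candidates:
            return "000000001"  # hypothetical starting value for a fresh formatter
        highest = max(candidates)
        return str(int(highest) + 1).zfill(len(highest))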

The core of this design is implemented in the new LockDispatcher class:

    # Excerpt: Lock, LOCK_NAME_SEPARATOR, and settings are imported elsewhere in the module.
    class LockDispatcher:
        def __init__(self, lock_prefix: str | None = None, case_sensitive_names=False):
            db_name = getattr(settings, "DATABASE_NAME")
            # Every lock name is prefixed with the database name (and an optional
            # extra prefix) so locks are scoped per database.
            self.lock_prefix_parts: list[str] = [db_name]
            if lock_prefix is not None:
                self.lock_prefix_parts.append(lock_prefix)
            self.case_sensitive_names = case_sensitive_names
            self.locks: dict[str, Lock] = dict()
            self.in_context = False

        def close(self):
            self.release_all()

        def __enter__(self):
            self.in_context = True
            return self

        def __exit__(self, exc_type, exc_val, exc_tb):
            self.close()
            self.in_context = False

        def lock_name(self, *name_parts: str):
            final_name = LOCK_NAME_SEPARATOR.join(
                (*self.lock_prefix_parts, *name_parts))
            # Normalize the name unless names are explicitly case sensitive.
            return final_name if self.case_sensitive_names else final_name.lower()

        @contextmanager
        def lock_and_release(self, name: str, timeout: int = 5):
            try:
                yield self.acquire(name, timeout)
            finally:
                self.release(name)

        def create_lock(self, name: str, timeout: int = 5):
            lock_name = self.lock_name(name)
            return Lock(lock_name, timeout)

        def acquire(self, name: str, timeout: int = 5):
            # Waits up to `timeout` seconds for the advisory lock to become available.
            if self.locks.get(name) is not None:
                return
            lock = self.create_lock(name, timeout)
            self.locks[name] = lock
            return lock.acquire()

        def release_all(self):
            for lock_name in list(self.locks.keys()):
                self.release(lock_name)
            self.locks = dict()

        def release(self, name: str):
            lock = self.locks.pop(name, None)
            if lock is None:
                return
            lock.release()

This class can be used as a context manager and handles acquiring and releasing User Advisory Locks:

with LockDispatcher() as lock_dispatcher:
    # acquire a new lock
    lock_dispatcher.acquire("some_lock_name", timeout=10)
    # if another connection already holds a particular lock, Specify will wait timeout
    # seconds for the lock to be released before raising a TimeoutError
    lock_dispatcher.acquire("other_lock", timeout=20)
    # locks can be released explicitly...
    lock_dispatcher.release("some_lock_name")
    # or are automatically released when the context manager code block is exited
    do_more_while_locked()
# all acquired locks are released at this point, regardless of whether an error occurred within 
# the context block
do_something()

The AutonumberingLockDispatcher builds the IPC (Redis) integration required for AutoNumbering on top of the base LockDispatcher class, which handles the database locks.

    # Excerpt: RedisConnection and RedisString are imported elsewhere in the module.
    class AutonumberingLockDispatcher(LockDispatcher):
        def __init__(self):
            lock_prefix = "autonumbering"
            super().__init__(lock_prefix=lock_prefix, case_sensitive_names=False)
            # We use Redis for IPC, to maintain the current "highest" autonumbering
            # value for each table + field
            self.redis = RedisConnection(decode_responses=True)
            # Before the records are created within a transaction, they're stored
            # locally within this dictionary.
            # The whole dictionary can be committed to Redis via commit_highest.
            # The key hierarchy is generally:
            # table -> field -> collection = "highest value"
            self.highest_in_flight: MutableMapping[str, MutableMapping[str, MutableMapping[int, str]]] = defaultdict(
                lambda: defaultdict(lambda: defaultdict(str)))

        def __exit__(self, exc_type, exc_val, exc_tb):
            super().__exit__(exc_type, exc_val, exc_tb)

        def highest_stored_value(self, table_name: str, field_name: str, collection_id: int) -> str | None:
            key_name = self.lock_name(
                table_name, field_name, "highest", str(collection_id))
            highest = RedisString(self.redis).get(key_name)
            if isinstance(highest, bytes):
                return highest.decode()
            elif highest is None:
                return None
            return str(highest)

        def cache_highest(self, table_name: str, field_name: str, collection_id: int, value: str):
            self.highest_in_flight[table_name.lower()][field_name.lower()][collection_id] = value

        def commit_highest(self):
            # Push every locally cached "highest" value to Redis, then clear the cache.
            for table_name, tables in self.highest_in_flight.items():
                for field_name, fields in tables.items():
                    for collection, value in fields.items():
                        self.set_highest_value(
                            table_name, field_name, collection, value)
            self.highest_in_flight.clear()

        def set_highest_value(self, table_name: str, field_name: str, collection_id: int, value: str, time_to_live: int = 10):
            key_name = self.lock_name(
                table_name, field_name, "highest", str(collection_id))
            RedisString(self.redis).set(key_name, value,
                                        time_to_live, override_existing=True)
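
Putting the two classes together, a save path might use the dispatcher roughly as follows. This is a hedged sketch rather than the actual call site; scan_database_for_highest and format_next_value are hypothetical helpers standing in for the existing scan/format logic:

    # Illustrative integration sketch -- not the actual Specify call site.
    def autonumber_field(table_name: str, field_name: str, collection_id: int) -> str:
        with AutonumberingLockDispatcher() as dispatcher:
            # Serialize autonumbering for this table across all sessions; this waits
            # up to 10 seconds for another session to release the lock.
            dispatcher.acquire(table_name, timeout=10)

            # Highest value this session's transaction can see in the database...
            db_highest = scan_database_for_highest(table_name, field_name, collection_id)  # hypothetical
            # ...and the highest value any session has handed out, committed or not.
            redis_highest = dispatcher.highest_stored_value(table_name, field_name, collection_id)

            candidates = [value for value in (db_highest, redis_highest) if value]
            new_value = format_next_value(max(candidates) if candidates else None)  # hypothetical

            # Record the value locally, then publish it so concurrent sessions skip it.
            dispatcher.cache_highest(table_name, field_name, collection_id, new_value)
            dispatcher.commit_highest()
            return new_value
        # The advisory lock is released when the context manager exits, even on error.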

(User-Focused) New Autonumbering Behavior

Below is a graph that describes the new behavior:

    flowchart TD
    A[WorkBench 001] --> B[WorkBench 002]
    B l1@-- waiting for DataEntry--> C(.)
    B -- concurrent record saved --> D(DataEntry 003)
    C l2@-- DataEntry finished--> E(WorkBench 004)
    D --> E

    classDef wb stroke:#f00
    classDef de stroke:#00f
    classDef dashed stroke-dasharray: 5 5
    class A,B,C,E wb;
    class D de;
    class l1,l2 dashed

All autonumbering operations are processed serially. This means that if an ongoing WorkBench operation creates a record numbered 01, the next autonumbering value for any other session will be 02.
If the WorkBench were then to create another record, it would be numbered 03.
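
For illustration, here is a small self-contained simulation of that sequence; a plain dict stands in for the Redis store, and nothing here is actual Specify code:

    # Pure-Python simulation of the serial numbering described above.
    shared_store: dict[str, str] = {}   # stands in for the Redis "highest" keys
    committed_in_db: list[str] = []     # values other sessions can actually see in the database

    def next_value(db_view: list[str]) -> str:
        candidates = db_view + list(shared_store.values())
        highest = max(candidates) if candidates else "000"
        value = str(int(highest) + 1).zfill(3)
        shared_store["catalognumber"] = value  # published immediately, even if uncommitted
        return value

    wb_first = next_value(committed_in_db)   # WorkBench (long transaction, uncommitted) -> "001"
    de_value = next_value(committed_in_db)   # DataEntry save in another session         -> "002"
    committed_in_db.append(de_value)         # the DataEntry record commits right away
    wb_second = next_value(committed_in_db)  # WorkBench's next record                   -> "003"
    print(wb_first, de_value, wb_second)     # 001 002 003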

Testing instructions

Sample Database

To speed up setup for testing this Pull Request, I've created a testing database that contains all the resources required.
Feel free to use the database!

User - spadmin
Password - testuser

Collections:

  • Fish
    • In the Ichthyology Discipline
    • Catalog Number of the format FSH-#########
    • One CollectionObject WorkBench Data Set
  • Plants
    • In the Botany Discipline
    • Catalog Number of the format #########
    • CO -> text2 of the format AAA-NNN-#########
      • Where A can be any letter and N can be any number
    • Locality -> text1 of the format #########
    • Two Collection Object WorkBench Data Sets
    • One Locality Data Set
  • Vascular Plants
    • In the Botany Discipline
    • CO -> text2 of the format AAA-NNN-#########
      • Where A can be any letter and N can be any number
    • Locality -> text1 of the format #########
    • One Collection Object WorkBench Data Set

issue-6490_testing.sql.zip


General Concurrency with WorkBench

  • Start a WorkBench Upload or Validation operation on a sufficiently large Data Set (the operation needs to be in progress while the below steps of General Concurrency are completed)
  • Open Specify in a new tab, window, or browser
  • Open a DataEntry form for the same table as the base table of the WorkBench Data Set, where one or more fields have an autonumbering field format
  • Save the Data Entry record and ensure the record saves successfully
  • Reload the page and ensure Specify loads and still remains accessible

Concurrent Autonumbering with the WorkBench

  • Start a WorkBench Upload operation on a sufficiently large Data Set that contains two or more fields that will be autonumbered (the operation needs to be in progress while some of the below steps of Concurrent Autonumbering with the WorkBench are completed)
  • Open Specify in a new tab, window, or browser
  • Open a DataEntry form for the same table as the base table of the WorkBench Data Set, where one or more fields have an autonumbering field format
  • Save the Data Entry record and ensure the record saves successfully
  • Wait for the WorkBench Upload operation to complete
  • Ensure that the value for the autonumbered field from Data Entry is skipped in the WorkBench upload results.

Concurrent Autonumbering with Non Collection Scoping
Review the Table Scoping Hierarchy for an overview of table scopes

With the Sample Database, this means running the Locality Data Set in the Plants Collection while creating a new Locality in the Vascular Plants Collection

  • Start a WorkBench Upload operation on a sufficiently large Data Set whose base table is not scoped to Collection and that contains one or more fields that will be autonumbered (the operation needs to be in progress while some of the below steps of Concurrent Autonumbering with Non Collection Scoping are completed)
  • Open Specify in a new tab, window, or browser
  • Switch to a different Collection which is in the same scope as the records being uploaded
  • Open a DataEntry form for the same table as the base table of the WorkBench Data Set, where one or more fields have an autonumbering field format
  • Save the Data Entry record and ensure the record saves successfully
  • Wait for the WorkBench Upload operation to complete
  • Ensure that the value for the autonumbered field from Data Entry is skipped in the WorkBench upload results.

General

  • Please also do general testing focused around autonumbering, especially with concurrent WorkBench uploads!

See #6490, #5337
Replaces #5404
Fixes #4148, #7560

This branch is largely the application of #7455 on `v7.11.3`
@grantfitzsimmons grantfitzsimmons added this to the 7.11.4 milestone Jan 28, 2026
@melton-jason melton-jason marked this pull request as ready for review January 29, 2026 18:20
@melton-jason melton-jason requested review from a team January 29, 2026 18:20
@melton-jason (Contributor, Author) commented:
There was previously an issue with the Concurrent Autonumbering with Non Collection Scoping testing step where records scoped higher than Collection would not "share" their expected autonumbering scheme.
E.g., when using the Sample Database, autonumbering Locality -> text1 while a concurrent WorkBench Upload was also autonumbering Locality -> text1 records would insert a duplicate.

This issue has been fixed!
Concurrent autonumbering should now always respect the table's scope.
