Skip to content

MONGOID-5684 Ensure hash-typed fields return Hash#6141

Closed
jamis wants to merge 4 commits intomongodb:masterfrom
jamis:5684-hash-returned-as-document
Closed

MONGOID-5684 Ensure hash-typed fields return Hash#6141
jamis wants to merge 4 commits intomongodb:masterfrom
jamis:5684-hash-returned-as-document

Conversation

@jamis
Copy link
Copy Markdown
Contributor

@jamis jamis commented Apr 27, 2026

Fixes MONGOID-5684.

Problem

When a document with a field :name, type: Hash is saved and reloaded, the field value is a BSON::Document rather than a plain Hash:

class Foo
  include Mongoid::Document
  field :feature, type: Hash
end

foo = Foo.create!(feature: { 'x' => 1 })
foo = Foo.find(foo.id)
foo.feature.class  #=> BSON::Document  (should be Hash)

This happens because BSON::Document is a subclass of Hash, so Mongoid's type check passes — but Hash::ClassMethods had no demongoize method, so the call fell through to Object.demongoize, which returns its argument unchanged.

A secondary symptom: after modifying the field, field_was returned a plain Hash whose string keys weren't accessible via symbol — inconsistent with the BSON::Document returned by the getter.

Fix

lib/mongoid/extensions/hash.rb — Add Hash::ClassMethods#demongoize. When legacy_hash_fields is false, it converts BSON::DocumentHash recursively. Plain Hash inputs are returned as the same object (no allocation), so subsequent attribute reads are O(1).

lib/mongoid/attributes.rb — After demongoizing a BSON::Document to a plain Hash, write the plain Hash back into attributes. This ensures in-place mutations on the returned value are visible to dirty tracking. The write-back is scoped precisely to the BSON::DocumentHash conversion and does not affect other field types (Date, Time, BigDecimal, Set, etc.).

lib/mongoid/config.rb — Add the legacy_hash_fields option (default: true).

Feature flag

The fix is opt-in to avoid breaking applications that depend on the current behavior:

Mongoid.configure do |config|
  config.legacy_hash_fields = false  # opt in to correct behavior
end

The default (true) preserves the existing behavior unchanged.

Performance

Hash.demongoize is called on every attribute read. The key question is whether the conversion adds unacceptable overhead.

Scenario 1 — legacy_hash_fields: true (current behavior, BSON::Document returned as-is)

Input Throughput
Empty 7.67M i/s
Flat, 5 keys 7.60M i/s
Nested, 3 levels × 10 keys 7.57M i/s

O(1), no allocation — just a config check and an early return.

Scenario 2 — legacy_hash_fields: false, first read (BSON::Document input, conversion)

Input Throughput vs. legacy
Empty 5.10M i/s 1.5× slower
Flat, 5 keys 1.22M i/s 4.2× slower
Nested, 3 levels × 10 keys 6.87k i/s 742× slower

This is the one-time cost paid when a document is first loaded from the database. For the nested case, 111 Hash objects are allocated and 1,110 nodes are visited per call (145 µs on the benchmark machine). This cost is proportional to the size and depth of the stored value.

Scenario 3 — legacy_hash_fields: false, subsequent reads (plain Hash input)

After the first read, attributes holds the converted plain Hash. Subsequent reads return it as-is:

Input Throughput vs. legacy
Empty 27.2M i/s 3.5× faster
Flat, 5 keys 27.1M i/s 3.6× faster
Nested, 3 levels × 10 keys 27.2M i/s 3.6× faster

O(1), size-independent — two type checks and an early return.

Summary: with legacy_hash_fields: false, each Hash-typed attribute read pays a one-time conversion cost proportional to the hash's size. All subsequent reads of that field are faster than the current legacy path. Applications with large, deeply-nested Hash fields loaded frequently from the database should measure the first-read cost against their own document shapes before enabling the flag.

jamis added 4 commits April 24, 2026 17:06
…vior

Also fix process_raw_attribute to store the demongoized value back into
attributes when demongoize returns a new object (e.g. BSON::Document to
plain Hash). Without this, in-place mutations on the returned Hash would
not be reflected in dirty tracking because the returned object was
decoupled from attributes.
Copilot AI review requested due to automatic review settings April 27, 2026 20:24
@jamis jamis requested a review from a team as a code owner April 27, 2026 20:24
@jamis jamis requested a review from comandeo-mongo April 27, 2026 20:24
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an opt-in fix for MONGOID-5684 so that field ..., type: Hash returns a plain Hash (instead of BSON::Document) after reloading from the database, while preserving legacy behavior by default via a new configuration flag.

Changes:

  • Add Mongoid.config.legacy_hash_fields (default true) to preserve/disable legacy BSON::Document return behavior for Hash-typed fields.
  • Implement Hash.demongoize to optionally convert BSON::Document to a plain Hash (with recursive conversion).
  • Add specs covering legacy vs opt-in behavior, and add a benchmark script for Hash.demongoize.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
lib/mongoid/extensions/hash.rb Adds Hash.demongoize with feature-flagged conversion from BSON::Document to Hash.
lib/mongoid/attributes.rb Attempts to write back the converted Hash into attributes to support dirty tracking with in-place mutation.
lib/mongoid/config.rb Adds legacy_hash_fields configuration option (default true).
spec/mongoid/extensions/hash_spec.rb Adds unit specs for Hash.demongoize behavior under both config modes.
spec/mongoid/changeable_spec.rb Adds integration specs for reload behavior and dirty tracking semantics under both config modes.
perf/benchmark_hash_demongoize.rb Adds a benchmark script to compare legacy/first-read/subsequent-read performance.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/mongoid/attributes.rb
Comment on lines +104 to +108
# When demongoize converts a BSON::Document to a plain Hash (i.e.
# legacy_hash_fields is false), store the plain Hash back into
# attributes so that in-place mutations on the returned value are
# visible to dirty tracking.
attributes[name] = value if raw.is_a?(BSON::Document) && !value.is_a?(BSON::Document)
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

process_raw_attribute writes the demongoized value back into attributes whenever raw is a BSON::Document and the result is not. This is broader than “Hash field BSON::Document → Hash” and can break other field types that legitimately demongoize a BSON document into a non-hash value (e.g. Mongoid::Fields::Localized#demongoize accepts a Hash/BSON::Document of translations but returns a single translation value). In that case this line would overwrite the stored translations hash with a scalar value.

Please restrict the write-back to the intended case (Hash-typed fields only, when legacy_hash_fields is false, and when the demongoized value is a plain Hash).

Suggested change
# When demongoize converts a BSON::Document to a plain Hash (i.e.
# legacy_hash_fields is false), store the plain Hash back into
# attributes so that in-place mutations on the returned value are
# visible to dirty tracking.
attributes[name] = value if raw.is_a?(BSON::Document) && !value.is_a?(BSON::Document)
# When demongoize converts a BSON::Document to a plain Hash for a
# Hash field (i.e. legacy_hash_fields is false), store the plain Hash
# back into attributes so that in-place mutations on the returned
# value are visible to dirty tracking.
if field&.type == Hash && !Mongoid.legacy_hash_fields &&
raw.is_a?(BSON::Document) && value.instance_of?(Hash)
attributes[name] = value
end

Copilot uses AI. Check for mistakes.
Comment on lines +119 to +124
return object unless object.is_a?(BSON::Document) &&
!Mongoid.config.legacy_hash_fields

object.each_with_object({}) { |(k, v), h| h[k] = ::Hash.demongoize(v) }
end

Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hash.demongoize claims to “recursively” convert nested BSON::Document values, but the current implementation only recurses through hash values and will leave BSON::Document instances that are nested inside arrays unchanged (because arrays are returned as-is). Hash-typed fields can contain arrays of sub-documents, so this can still leak BSON::Document objects into the returned structure when legacy_hash_fields is false.

Consider extending the deep conversion to traverse arrays (and any other container types you want to support), and add a spec that covers a BSON::Document nested inside an array value.

Suggested change
return object unless object.is_a?(BSON::Document) &&
!Mongoid.config.legacy_hash_fields
object.each_with_object({}) { |(k, v), h| h[k] = ::Hash.demongoize(v) }
end
return object if Mongoid.config.legacy_hash_fields
return object unless object.is_a?(BSON::Document)
demongoize_hash_field_value(object)
end
private
def demongoize_hash_field_value(object)
case object
when BSON::Document
object.each_with_object({}) do |(k, v), h|
h[k] = demongoize_hash_field_value(v)
end
when Array
object.map { |value| demongoize_hash_field_value(value) }
else
object
end
end

Copilot uses AI. Check for mistakes.
Comment on lines +194 to +197
Mongoid.config.legacy_hash_fields = false
example.run
ensure
Mongoid.config.legacy_hash_fields = true
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This around block forces Mongoid.config.legacy_hash_fields back to true in the ensure clause, which can leak state if the pre-existing value wasn’t true (e.g. if another spec/context changed it). Please capture the original value before overriding it and restore that value in ensure to keep the spec isolated.

Suggested change
Mongoid.config.legacy_hash_fields = false
example.run
ensure
Mongoid.config.legacy_hash_fields = true
original_legacy_hash_fields = Mongoid.config.legacy_hash_fields
Mongoid.config.legacy_hash_fields = false
example.run
ensure
Mongoid.config.legacy_hash_fields = original_legacy_hash_fields

Copilot uses AI. Check for mistakes.
Comment on lines +271 to +274
Mongoid.config.legacy_hash_fields = false
example.run
ensure
Mongoid.config.legacy_hash_fields = true
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This around block restores Mongoid.config.legacy_hash_fields to true unconditionally. To avoid cross-test leakage if another example/context set a different value, please save the previous config value before setting it to false and restore that saved value in the ensure clause.

Suggested change
Mongoid.config.legacy_hash_fields = false
example.run
ensure
Mongoid.config.legacy_hash_fields = true
previous_legacy_hash_fields = Mongoid.config.legacy_hash_fields
Mongoid.config.legacy_hash_fields = false
example.run
ensure
Mongoid.config.legacy_hash_fields = previous_legacy_hash_fields

Copilot uses AI. Check for mistakes.
@jamis
Copy link
Copy Markdown
Contributor Author

jamis commented Apr 29, 2026

This implementation imposes an unacceptable overhead with the recursive copy. Closing this in favor of #6142, which addresses only the issue raised in the ticket.

@jamis jamis closed this Apr 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants