Skip to content

Fix a bug that Gem::YAMLSerializer.load ignores quotation#9552

Open
kou wants to merge 1 commit into
ruby:masterfrom
kou:yaml-quote
Open

Fix a bug that Gem::YAMLSerializer.load ignores quotation#9552
kou wants to merge 1 commit into
ruby:masterfrom
kou:yaml-quote

Conversation

@kou
Copy link
Copy Markdown
Member

@kou kou commented May 18, 2026

Fix #9551

What was the end-user or developer problem that led to this PR?

.gemspec is serialized to YAML (metadata.gz) in .gem. metadata.gz may use double quote for string value like:

requirements:
  - "system: arrow-glib>=25.0.0: amazon_linux: arrow-glib-devel"

Gem::YAMLSerializer.load ignores the quotation and parses it as a map:

{
  "requirements" => [
    {"\"system" => "arrow-glib>=25.0.0: amazon_linux: arrow-glib-devel\""},
  ]
}

What is your fix for the problem, implemented in this PR?

Recognize quotation and parse quoted value as a string value.

Make sure the following tasks are checked

Copilot AI review requested due to automatic review settings May 18, 2026 04:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes Gem::YAMLSerializer.load incorrectly treating quoted scalars containing : as mappings (notably within requirements entries in .gemspec YAML), so metadata in .gem files is deserialized correctly.

Changes:

  • Add handling in the YAMLSerializer parser intended to recognize single/double-quoted scalars and parse them as scalars instead of mappings.
  • Add a regression test ensuring quoted requirements entries load as an array of strings.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
lib/rubygems/yaml_serializer.rb Adjusts node-type detection to treat quoted lines as scalars to avoid : triggering mapping parsing.
test/rubygems/test_gem_safe_yaml.rb Adds a test case for quoted requirements values deserializing to strings.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 96 to 102
if stripped.start_with?("- ") || stripped == "-"
parse_sequence(indent, anchor)
elsif stripped.start_with?("\"") and stripped.end_with?("\"")
parse_plain_scalar(indent, anchor)
elsif stripped.start_with?("'") and stripped.end_with?("'")
parse_plain_scalar(indent, anchor)
elsif stripped =~ MAPPING_KEY_RE && !stripped.start_with?("!ruby/object:")
Comment thread lib/rubygems/yaml_serializer.rb Outdated
Comment on lines +98 to +100
elsif stripped.start_with?("\"") and stripped.end_with?("\"")
parse_plain_scalar(indent, anchor)
elsif stripped.start_with?("'") and stripped.end_with?("'")
Comment on lines +321 to +333
def test_requirement_quote
yaml = <<~YAML
requirements:
- "system: arrow-glib>=25.0.0: amazon_linux: arrow-glib-devel"
- 'system: arrow-glib>=25.0.0: fedora: libarrow-glib-devel'
YAML

assert_equal([
"system: arrow-glib>=25.0.0: amazon_linux: arrow-glib-devel",
"system: arrow-glib>=25.0.0: fedora: libarrow-glib-devel",
],
yaml_load(yaml)["requirements"])
end
@hsbt
Copy link
Copy Markdown
Member

hsbt commented May 18, 2026

Thanks, Should we cover "key": "value" and 'key': 'value' cases?

This change introduce key": "value when "key\": \"value" was presented.

@kou
Copy link
Copy Markdown
Member Author

kou commented May 18, 2026

Oh, I didn't notice the case. We may need to restructure the current parse_node approach to support both cases.

@kou
Copy link
Copy Markdown
Member Author

kou commented May 23, 2026

Should we cover "key": "value" and 'key': 'value' cases?

I checked the current implementation. We don't need to care about this case because:

  • "key": "value" case doesn't happen in the top-level. All top-level keys (!ruby/object:Gem::Specification's keys) are known and they never use "key": "value" form.
  • Non top-level mappings are parsed in parse_mapping not parse_node.

parse_mapping:

def parse_mapping(indent, anchor)
pairs = []
while @lines.any?
line = @lines[0]
stripped = line.lstrip
break unless line.size - stripped.size == indent &&
stripped =~ MAPPING_KEY_RE && !stripped.start_with?("!ruby/object:")
key = $1.strip
@lines.shift
val = strip_comment($2.to_s.strip)
key = decode_binary_tag(key) if key.start_with?("!binary ")
val_anchor, val = consume_value_anchor(val)
value = parse_mapping_value(val, indent)
value = register_anchor(val_anchor, value) if val_anchor
pairs << [Scalar.new(value: key), value]
end
register_anchor(anchor, Mapping.new(pairs: pairs))
end

parse_mapping doesn't use parse_node to parse a key. So the added conditions aren't used.

BTW, users can specify mapping only in metadata. I found some problems in metadata. See #9560 for details.

`"a: b"` must be processed as a string value (`a: b`) not a map
value (`{"a" => "b"}`).
@kou
Copy link
Copy Markdown
Member Author

kou commented May 23, 2026

I've added a comment for the note.

@ruby ruby deleted a comment from nphi05026-ship-it May 23, 2026
@kou kou requested a review from Copilot May 23, 2026 21:58
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment on lines +98 to 118
elsif stripped.start_with?("\"") && stripped.end_with?("\"")
# We don't need to care about the following case here:
# 1. "value with comment" # ...
# 2. "key": "value"
#
# 1. must not happen because YAMLSerializer doesn't emit any
# comment. YAMLSerializer parses only YAML that is generated
# by YAMLSerializer.
#
# 2. must not happen because #parse_node isn't used non
# top-level mapping. Non top-level mapping always uses
# #parse_mapping. Top-level mapping never use the '"key":
# "value"' form because all top-level keys
# ("!ruby/object:Gem::Specification"'s keys) are known and
# #emit_specification doesn't quote anything.
parse_plain_scalar(indent, anchor)
elsif stripped.start_with?("'") && stripped.end_with?("'")
# See also the above note for double quotation.
parse_plain_scalar(indent, anchor)
elsif stripped =~ MAPPING_KEY_RE && !stripped.start_with?("!ruby/object:")
parse_mapping(indent, anchor)
Comment on lines +107 to +112
# 2. must not happen because #parse_node isn't used non
# top-level mapping. Non top-level mapping always uses
# #parse_mapping. Top-level mapping never use the '"key":
# "value"' form because all top-level keys
# ("!ruby/object:Gem::Specification"'s keys) are known and
# #emit_specification doesn't quote anything.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Gem::YAMLSerializer.load doesn't recognize double quote

3 participants