Generate JSON schemas for plugins with schema definitions for nested configurations #6814

Merged
dlvenable merged 2 commits into opensearch-project:main from
dlvenable:json-schema-use-definitions
May 6, 2026

Conversation

@dlvenable
Member

Description

The current schema generation emits nested configurations inline as anonymous object schemas. Because the nested configurations are neither named nor referenced, it is difficult to auto-generate documentation from the schemas.

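This PR adds a new --use_definitions=true option to the schema generation. When it is set, each nested configuration is generated as a named schema definition under $defs and referenced with $ref, instead of being generated inline as an object.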

Here is how it generates currently for the date processor (some parts are removed to focus on the change):

./gradlew :data-prepper-plugin-schema-cli:run --args='--plugin_type=processor --plugin_names=date'
{
  "$schema" : "https://json-schema.org/draft/2020-12/schema",
  "type" : "object",
  "properties" : {
    "from_time_received" : {
       ...
    },
    "match" : {
      "description" : "This option cannot be defined at the same time as <code>from_time_received</code>. The date processor will use the first pattern that matches each event's timestamp field. You must provide at least one pattern unless you have <code>from_time_received</code>.",
      "type" : "array",
      "items" : {
        "type" : "object",
        "properties" : {
          "key" : {
            "type" : "string",
            "description" : "Represents the event key against which to match patterns. Required if <code>match</code> is configured."
          },
          "patterns" : {
            "description" : "A list of possible patterns that the timestamp value of the key can have. The patterns are based on a sequence of letters and symbols. The <code>patterns</code> support all the patterns listed in the Java DateTimeFormatter (https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html) reference. To match ISO 8601 formatted strings, use, <code>yyyy-MM-dd'T'HH:mm:ss.SSSXXX</code>. To match Apache Common Log Format, use <code>dd/MMM/yyyy:HH:mm:ss Z</code>. The timestamp value also supports <code>epoch_second</code>, <code>epoch_milli</code>, and <code>epoch_nano</code> values, which represent the timestamp as the number of seconds, milliseconds, and nanoseconds since the epoch. Epoch values always use the UTC time zone.",
            "examples" : [ {
              "description" : "Matches ISO-8601 formatted strings.",
              "example" : "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
            }, {
              "description" : "Matches Apache Common Log Format.",
              "example" : "dd/MMM/yyyy:HH:mm:ss Z"
            }, {
              "description" : "Matches against strings that represent seconds since Unix epoch time.",
              "example" : "epoch_second"
            } ],
            "type" : "array",
            "items" : {
              "type" : "string"
            }
          }
        }
      }
    },
    "destination" : {
      ...
    },
...

With this change, we get this output instead (again, some parts removed):

./gradlew :data-prepper-plugin-schema-cli:run --args='--plugin_type=processor --plugin_names=date --use_definitions=true'
{
  "$schema" : "https://json-schema.org/draft/2020-12/schema",
  "$defs" : {
    "DateMatch" : {
      "type" : "object",
      "properties" : {
        "key" : {
          "type" : "string",
          "description" : "Represents the event key against which to match patterns. Required if <code>match</code> is configured."
        },
        "patterns" : {
          "description" : "A list of possible patterns that the timestamp value of the key can have. The patterns are based on a sequence of letters and symbols. The <code>patterns</code> support all the patterns listed in the Java DateTimeFormatter (https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html) reference. To match ISO 8601 formatted strings, use, <code>yyyy-MM-dd'T'HH:mm:ss.SSSXXX</code>. To match Apache Common Log Format, use <code>dd/MMM/yyyy:HH:mm:ss Z</code>. The timestamp value also supports <code>epoch_second</code>, <code>epoch_milli</code>, and <code>epoch_nano</code> values, which represent the timestamp as the number of seconds, milliseconds, and nanoseconds since the epoch. Epoch values always use the UTC time zone.",
          "examples" : [ {
            "description" : "Matches ISO-8601 formatted strings.",
            "example" : "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
          }, {
            "description" : "Matches Apache Common Log Format.",
            "example" : "dd/MMM/yyyy:HH:mm:ss Z"
          }, {
            "description" : "Matches against strings that represent seconds since Unix epoch time.",
            "example" : "epoch_second"
          } ],
          "type" : "array",
          "items" : {
            "type" : "string"
          }
        }
      }
    }
  },
  "type" : "object",
  "properties" : {
    "from_time_received" : {
      ...
    },
    "match" : {
      "description" : "This option cannot be defined at the same time as <code>from_time_received</code>. The date processor will use the first pattern that matches each event's timestamp field. You must provide at least one pattern unless you have <code>from_time_received</code>.",
      "type" : "array",
      "items" : {
        "$ref" : "#/$defs/DateMatch"
      }
    },
    "destination" : {
      ...
    },
...
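
To illustrate why the referenced form is easier for documentation tooling to consume, here is a minimal sketch of resolving a local $ref against $defs. It is not part of this PR or of Data Prepper: the class SchemaRefSketch and its methods are hypothetical, the schema is modeled as plain nested maps rather than with a JSON library, and only local references of the form #/$defs/<Name> are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of Data Prepper.
class SchemaRefSketch {
    private static final String DEFS_PREFIX = "#/$defs/";

    // Returns the definition a local $ref points to, or the node itself when it has no $ref.
    @SuppressWarnings("unchecked")
    static Map<String, Object> resolve(Map<String, Object> root, Map<String, Object> node) {
        Object ref = node.get("$ref");
        if (!(ref instanceof String)) {
            return node;
        }
        String pointer = (String) ref;
        if (!pointer.startsWith(DEFS_PREFIX)) {
            throw new IllegalArgumentException("Only local $defs references are handled: " + pointer);
        }
        Map<String, Object> defs = (Map<String, Object>) root.get("$defs");
        Object target = defs == null ? null : defs.get(pointer.substring(DEFS_PREFIX.length()));
        if (target == null) {
            throw new IllegalArgumentException("Unknown definition: " + pointer);
        }
        return (Map<String, Object>) target;
    }

    // Lists, in sorted order, the property names of the items of an array-typed property.
    @SuppressWarnings("unchecked")
    static List<String> nestedPropertyNames(Map<String, Object> root, String propertyName) {
        Map<String, Object> properties = (Map<String, Object>) root.get("properties");
        Map<String, Object> property = (Map<String, Object>) properties.get(propertyName);
        Map<String, Object> items = resolve(root, (Map<String, Object>) property.get("items"));
        Map<String, Object> nested = (Map<String, Object>) items.get("properties");
        return new ArrayList<>(new TreeSet<>(nested.keySet()));
    }

    // A trimmed-down version of the date processor schema shown above.
    static Map<String, Object> exampleSchema() {
        return Map.of(
            "$defs", Map.of("DateMatch", Map.of(
                "type", "object",
                "properties", Map.of(
                    "key", Map.of("type", "string"),
                    "patterns", Map.of("type", "array", "items", Map.of("type", "string"))))),
            "type", "object",
            "properties", Map.of(
                "match", Map.of("type", "array", "items", Map.of("$ref", "#/$defs/DateMatch"))));
    }

    public static void main(String[] args) {
        System.out.println(nestedPropertyNames(exampleSchema(), "match"));
    }
}
```

With the definition named DateMatch, a documentation generator could emit one section per entry in $defs and link to it from every property that references it, which is not possible when the nested object is anonymous.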

Issues Resolved

N/A

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

dlvenable added 2 commits May 1, 2026 16:17
…es instead of using object.

Signed-off-by: David Venable <dlv@amazon.com>
Signed-off-by: David Venable <dlv@amazon.com>
package org.opensearch.dataprepper.schemas;

public class JsonSchemaConverterConfig {
    private final boolean useDefinitions;
Collaborator

This class might be redundant? If there's no plan to add more options, just passing boolean useDefinitions directly to convertIntoJsonSchema would be simpler.

Member Author

I didn't want to keep changing the method parameters as the options changed and this was no additional work.
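
The trade-off discussed in this thread can be sketched roughly as follows. This is only an illustration: the excerpt shows just the class name and the useDefinitions field, so the constructor, the getter, and the ConverterSketch class with its describeNestedHandling method are assumptions and not the code in this PR.

```java
// Hypothetical stand-in for JsonSchemaConverterConfig; constructor and getter are assumed.
final class JsonSchemaConverterConfigSketch {
    private final boolean useDefinitions;

    JsonSchemaConverterConfigSketch(boolean useDefinitions) {
        this.useDefinitions = useDefinitions;
    }

    boolean isUseDefinitions() {
        return useDefinitions;
    }
}

// Hypothetical stand-in for a converter that accepts the config object.
class ConverterSketch {
    // The parameter list stays the same even if the config later gains more options.
    static String describeNestedHandling(JsonSchemaConverterConfigSketch config) {
        return config.isUseDefinitions() ? "$ref into $defs" : "inline object";
    }

    public static void main(String[] args) {
        System.out.println(describeNestedHandling(new JsonSchemaConverterConfigSketch(true)));
        System.out.println(describeNestedHandling(new JsonSchemaConverterConfigSketch(false)));
    }
}
```

Passing a bare boolean would be shorter today, but each new option would then change the method signature and every call site, whereas a config object absorbs new fields without touching callers.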

@dlvenable dlvenable merged commit 10cc810 into opensearch-project:main May 6, 2026
69 of 72 checks passed
@dlvenable dlvenable deleted the json-schema-use-definitions branch May 6, 2026 20:24