This document provides an overview of OpenSearch syntax, including querying data, filtering results, and interacting with OpenSearch using SQL, Piped Processing Language (PPL), and DSL queries. The primary focus is on querying the comments index to retrieve and analyze public comments from different dockets.
OpenSearch supports SQL queries through the _plugins/_sql endpoint, which allows users to write queries in SQL syntax for easy data retrieval.
POST _plugins/_sql
{
"query": "SELECT * FROM comments LIMIT 50"
}POST _plugins/_sql
{
"query": "SELECT * FROM comments WHERE docketId = 'FDA-2024-XYZ' LIMIT 20"
}POST _plugins/_sql
{
"query": "SELECT docketId, COUNT(*) AS comment_count FROM comments GROUP BY docketId"
}PPL allows querying in a pipeline (|) format, making it intuitive for data filtering and aggregation.
POST _plugins/_ppl
{
"query": "source=comments | fields docketId, comment | head 50"
}POST _plugins/_ppl
{
"query": "source=comments | where comment like 'drug' | fields docketId, comment"
}OpenSearch also supports querying via JSON-based DSL, offering more flexibility for complex queries.
POST comments/_search
{
"query": {
"match": {
"comment": "drug"
}
}
}POST comments/_search
{
"size": 0,
"aggs": {
"docket_counts": {
"terms": {
"field": "docketId.keyword",
"size": 100
},
"aggs": {
"matching_comments": {
"filter": {
"match": {
"comment": "drug"
}
}
}
}
}
}
}To retrieve large datasets, OpenSearch allows pagination using LIMIT and OFFSET.
POST _plugins/_sql
{
"query": "SELECT * FROM comments LIMIT 50 OFFSET 100"
}For deep pagination, search_after is recommended instead of from and size to optimize performance.
POST comments/_search
{
"size": 50,
"query": {
"match_all": {}
},
"search_after": ["last_sort_value"]
}Indexes in OpenSearch act as containers for structured data storage.
PUT comments
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
}
}DELETE commentsGET _cat/indices?vDocuments in OpenSearch represent individual records within an index.
POST comments/_doc/
{
"comment": "This is an important regulation.",
"docketId": "EPA-2025-XYZ",
"author": "John Doe",
"modifyDate": "2025-02-04"
}POST comments/_update/1
{
"doc": {
"comment": "Updated comment text."
}
}DELETE comments/_doc/1POST comments/_delete_by_query
{
"query": {
"match": { "docketId": "FDA-2024-XYZ" }
}
}- Endpoint:
https://localhost:9200 - Authentication: Username:
admin, Password:$OPENSEARCH_INITIAL_PASSWORD - Example Query via
curl:
curl -XPOST https://localhost:9200/comments/_search -u admin:$OPENSEARCH_INITIAL_PASSWORD -H "Content-Type: application/json" -d '{"query":{"match_all":{}}}'- Endpoint:
https://<your-opensearch-endpoint> - Authentication: Uses AWS IAM Role-based authentication
- Example Query via Python (
opensearch-py):
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import boto3
host = "<your-opensearch-endpoint>"
region = "us-east-1"
credentials = boto3.Session().get_credentials()
auth = AWS4Auth(credentials.access_key, credentials.secret_key, region, "aoss", session_token=credentials.token)
client = OpenSearch(
hosts=[{'host': host, 'port': 443}],
http_auth=auth,
use_ssl=True,
verify_certs=True,
connection_class=RequestsHttpConnection,
)
response = client.search(index="comments", body={"query": {"match_all": {}}})
print(response)- OpenSearch supports SQL queries via
_plugins/_sql, PPL via_plugins/_ppl, and JSON-based DSL queries. - Pagination can be handled via
LIMIT,OFFSET, andsearch_afterfor deep pagination. - Index management includes creating, deleting, and listing indexes.
- Document management allows inserting, updating, and deleting records efficiently.
- Authentication differs for local (Docker) vs cloud (AWS OpenSearch Serverless).