Replace reindex API with bulk API and add multi-data-stream support #7

Open
AndersonQ wants to merge 1 commit into elastic:master from AndersonQ:copy-instead-of-reindex

AndersonQ (Member) commented on Mar 11, 2026

When using the test kit, some docs were failing because of how the reindex API works, which hid the real issue. I therefore switched it to the bulk API, as Elastic Agent does. I also made a few small improvements, listed in the proposed commit message.

Proposed commit message

Replace reindex API with bulk API and add multi-data-stream support

Switch from the server-side _reindex API to scan + _bulk with "create" actions, mimicking how Elastic Agent sends data. This gives per-document feedback: created, duplicate (409 version conflict), or other errors. Failed and duplicate documents are saved to separate NDJSON files.
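As a rough illustration of the per-document feedback described above, the sketch below builds a `_bulk` body of "create" actions and sorts the response items into created, duplicate, and failed. It is not the code from this PR: the function names (`build_bulk_body`, `classify_bulk_items`, `to_ndjson`) are hypothetical, and it assumes the standard `_bulk` response shape, where `items` come back in request order and each entry carries a `status` (201 for created, 409 for a version conflict).

```python
import json
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]


def build_bulk_body(docs: List[Doc], target: str) -> str:
    """Build an NDJSON _bulk body that uses only "create" actions."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"create": {"_index": target}}))
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def classify_bulk_items(
    docs: List[Doc], response: Dict[str, Any]
) -> Tuple[int, List[Doc], List[Dict[str, Any]]]:
    """Split a _bulk response into (created count, duplicates, failures).

    Response items are matched to the submitted documents by position.
    """
    created = 0
    duplicates: List[Doc] = []
    failures: List[Dict[str, Any]] = []
    for doc, item in zip(docs, response["items"]):
        result = item["create"]
        status = result["status"]
        if status == 201:
            created += 1
        elif status == 409:
            # Version conflict: a document with the same identity already exists.
            duplicates.append(doc)
        else:
            failures.append({"document": doc, "error": result.get("error")})
    return created, duplicates, failures


def to_ndjson(records: List[Dict[str, Any]]) -> str:
    """Serialize records as NDJSON, one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)
```

In a real run the body would be sent to the cluster (for example via the Python client's bulk call) and the duplicates and failures written to their own files with `to_ndjson`; the file names are left to the caller here.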

Accept a list of data streams instead of a single one; the program now iterates over each data stream, scoping index names per data stream (e.g. tsdb-<data_stream>, tsdb-overwritten-<data_stream>).
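The per-data-stream scoping could look roughly like the sketch below. Only the two name patterns (`tsdb-<data_stream>` and `tsdb-overwritten-<data_stream>`) come from the description; the helper name `scoped_index_names`, the dictionary keys, and the example data stream names are made up for illustration.

```python
from typing import Dict, List


def scoped_index_names(data_stream: str) -> Dict[str, str]:
    """Return the index names used for one data stream."""
    return {
        "tsdb": f"tsdb-{data_stream}",
        "overwritten": f"tsdb-overwritten-{data_stream}",
    }


def plan_runs(data_streams: List[str]) -> List[Dict[str, str]]:
    """Build one independent set of index names per data stream."""
    return [scoped_index_names(name) for name in data_streams]


if __name__ == "__main__":
    # Hypothetical data stream names, for illustration only.
    for plan in plan_runs(["metrics-example.a-default", "metrics-example.b-default"]):
        print(plan["tsdb"], plan["overwritten"])
```

Because every data stream gets its own pair of names, documents from one run cannot land in an index created for another.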

Also:
- Read ES credentials from ELASTIC_PACKAGE_ELASTICSEARCH_* env vars with fallback to hardcoded defaults; mask passwords in output
- Fall back to system/certifi CA bundle when elasticsearch_ca_path is not set
- Replace exit(0) with sys.exit(1) for error paths
- Clear time_series_fields between runs to avoid stale state
- Update README with new configuration, examples, and output
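A minimal sketch of the first item in the list above (environment variables with a fallback, plus password masking) is shown here. The description only gives the `ELASTIC_PACKAGE_ELASTICSEARCH_*` prefix, so the `HOST`, `USERNAME`, and `PASSWORD` suffixes, the default values, and the function names are all assumptions rather than the PR's actual code.

```python
import os
from typing import Dict, Mapping, Optional

ENV_PREFIX = "ELASTIC_PACKAGE_ELASTICSEARCH_"

# Placeholder defaults for a local test stack (assumed, not from the PR).
DEFAULTS: Dict[str, str] = {
    "host": "https://localhost:9200",
    "username": "elastic",
    "password": "changeme",
}


def load_es_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read each setting from the environment, falling back to DEFAULTS."""
    source = os.environ if env is None else env
    return {
        key: source.get(ENV_PREFIX + key.upper(), default)
        for key, default in DEFAULTS.items()
    }


def masked(settings: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy that is safe to print: the password is replaced."""
    shown = dict(settings)
    if shown.get("password"):
        shown["password"] = "********"
    return shown


if __name__ == "__main__":
    # Only the masked copy is ever printed.
    print(masked(load_es_settings()))
```

Passing the environment as an argument keeps the lookup easy to test without touching the real process environment.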