diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 370b18d..115af6a 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -61,8 +61,7 @@ jobs:
       - name: Build and Package on ${{ matrix.java }}
         run: |
           chmod 777 ./mvnw
-          ./mvnw -B clean package -DskipTests -DskipITs \
-            -Dspotless.skip=true \
+          ./mvnw -B clean verify -Dspotless.skip=true
 
   result:
     name: Build
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
index 4f45ba5..cfed833 100644
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -19,7 +19,7 @@ jobs:
           cache: maven
 
       - name: Build Release Package
-        run: mvn clean package -DskipTests -Prelease -B
+        run: ./mvnw clean verify -Prelease -B
 
       - name: Create GitHub Release
         uses: softprops/action-gh-release@v2
diff --git a/consilens-ai/README.md b/consilens-ai/README.md
index 110a788..91e956d 100644
--- a/consilens-ai/README.md
+++ b/consilens-ai/README.md
@@ -28,6 +28,43 @@ mvn clean install -pl consilens-ai -am
 
 ### Basic Usage
 
+Production-oriented CLI entrypoint:
+
+```bash
+consilens ai config "compare mysql users with postgresql users by id" \
+  --no-llm \
+  --source-type mysql \
+  --source-url jdbc:mysql://localhost:3306/mydb \
+  --source-table users \
+  --source-user-env MYSQL_USER \
+  --source-password-env MYSQL_PASSWORD \
+  --target-type postgresql \
+  --target-url jdbc:postgresql://localhost:5432/mydb \
+  --target-table users \
+  --target-user-env PG_USER \
+  --target-password-env PG_PASSWORD \
+  --keys id \
+  --fields name,email,status \
+  --output diff.yaml
+
+consilens ai explain -c diff.yaml
+consilens diff --dry-run -c diff.yaml
+consilens diff -c diff.yaml
+consilens ai diagnose --result diff-records.json --analyzer rulebased --output diagnose.md
+consilens ai providers
+consilens ai providers --format json
+consilens ai doctor --format json
+```
+
+The CLI path generates canonical Consilens YAML and validates it with the existing engine model. AI does not execute a real diff directly.
+`ai diagnose` reads row-level diff evidence from a `json` `diff-record` sink; stats-only result files are not enough for pattern analysis.
+The analyzer is loaded via SPI. Use `--analyzer <name>` or `CONSILENS_AI_ANALYZER`; the default is `rulebased`.
+Use `--output` to persist the diagnosis report; otherwise it is printed to stdout.
+Use `ai providers` to verify which analyzer and LLM backend plugins are visible on the runtime classpath; `--format json` is available for CI checks and scripts.
+Use `ai doctor` as a production preflight check for SPI discovery, selected analyzer/backend wiring and required API key configuration. It is offline by default; add `--online` only when the deployment environment should verify backend reachability.
+
+SDK/chat usage:
+
 ```java
 // Initialize components
 SessionContext session = SessionContext.builder()
@@ -71,7 +108,9 @@ consilens-ai/
 │   ├── consilens-ai-llm-api/
 │   └── consilens-ai-llm-plugins/
 │       ├── consilens-ai-llm-noop/
-│       └── consilens-ai-llm-ollama/
+│       ├── consilens-ai-llm-ollama/
+│       ├── consilens-ai-llm-openai/
+│       └── consilens-ai-llm-deepseek/
 └── consilens-ai-tool/          # Tool system
     ├── consilens-ai-tool-api/
     └── consilens-ai-tool-plugins/consilens-ai-tool-defaults/
@@ -82,6 +121,8 @@ consilens-ai/
 ### DiffTool
 Compares two database tables via JDBC and identifies all differences.
 
+This tool is intended for SDK/demo usage. Production CLI flows should generate a YAML config and execute through `DiffService` / `DefaultCompareRuntime`.
+
 **Example**: "Compare the orders table between production and staging"
 
 ### AnalyzeTool
@@ -124,6 +165,16 @@ Configure Ollama (local LLM):
 LLMBackend backend = new OllamaBackend("http://localhost:11434");
 ```
 
+Configure OpenAI:
+```java
+LLMBackend backend = new OpenAIBackend("https://api.openai.com/v1", "gpt-4.1-mini", System.getenv("OPENAI_API_KEY"));
+```
+
+Configure DeepSeek:
+```java
+LLMBackend backend = new DeepSeekBackend("https://api.deepseek.com", "deepseek-chat", System.getenv("DEEPSEEK_API_KEY"));
+```
+
 Or use no-op backend (fallback to rule-based):
 ```java
 LLMBackend backend = new NoopBackend();
diff --git a/consilens-ai/USAGE.md b/consilens-ai/USAGE.md
index c756691..c380c54 100644
--- a/consilens-ai/USAGE.md
+++ b/consilens-ai/USAGE.md
@@ -2,6 +2,54 @@
 
 ## Quick Start
 
+### Production CLI Flow
+
+```bash
+consilens ai config "compare users from mysql to postgresql by id" \
+  --no-llm \
+  --source-type mysql \
+  --source-url jdbc:mysql://localhost:3306/mydb \
+  --source-table users \
+  --source-user-env MYSQL_USER \
+  --source-password-env MYSQL_PASSWORD \
+  --target-type postgresql \
+  --target-url jdbc:postgresql://localhost:5432/mydb \
+  --target-table users \
+  --target-user-env PG_USER \
+  --target-password-env PG_PASSWORD \
+  --keys id \
+  --fields name,email,status \
+  --output diff.yaml
+
+consilens ai explain -c diff.yaml
+consilens diff --dry-run -c diff.yaml
+consilens diff -c diff.yaml
+consilens ai diagnose --result diff-records.json --analyzer rulebased --output diagnose.md
+consilens ai providers
+consilens ai providers --format json
+consilens ai doctor --format json
+```
+
+For cloud LLMs, set `--backend openai` with `OPENAI_API_KEY`, or `--backend deepseek` with `DEEPSEEK_API_KEY`. `CONSILENS_AI_BACKEND`, `CONSILENS_AI_MODEL`, `CONSILENS_AI_BASE_URL` and `CONSILENS_AI_TIMEOUT` can provide environment defaults. The AI command produces structured configuration; real diff execution still goes through the existing deterministic engine.
+
+`ai diagnose` requires diff evidence, not only summary statistics. Configure a `json` `diff-record` sink before running `consilens diff` (see the `result.sinks` example below).
+The analyzer is selected via SPI with `--analyzer <name>` or `CONSILENS_AI_ANALYZER`; default: `rulebased`.
+Use `--output` to write the report to a file; omit it to print to stdout.
+Use `ai providers` to verify discovered analyzer and LLM backend providers before enabling a production task. Add `--format json` for CI checks and scripts.
+Use `ai doctor` for production preflight checks. It verifies provider discovery, selected analyzer/backend creation and required API key configuration without network calls by default; add `--online` to verify backend reachability.
+
+```yaml
+result:
+  sinks:
+    - format: console
+      type: result
+    - format: json
+      type: diff-record
+      properties:
+        path: ./diff-records.json
+        pretty: true
+```
+
 ### Basic Conversation
 
 ```java
@@ -32,6 +80,22 @@ String response = engine.chat(
 System.out.println(response);
 ```
 
+Cloud backend examples:
+
+```java
+LLMBackend openai = new OpenAIBackend(
+    "https://api.openai.com/v1",
+    "gpt-4.1-mini",
+    System.getenv("OPENAI_API_KEY")
+);
+
+LLMBackend deepseek = new DeepSeekBackend(
+    "https://api.deepseek.com",
+    "deepseek-chat",
+    System.getenv("DEEPSEEK_API_KEY")
+);
+```
+
 ## Common Use Cases
 
 ### 1. Compare Two Database Tables
@@ -39,13 +103,15 @@ System.out.println(response);
 ```
 User: "Compare the 'users' table between production and staging"
 
-Response: The AI will:
+SDK/demo response: The AI will:
 1. Ask for connection details (URLs, credentials)
 2. Execute the diff using DiffTool
 3. Report the number of differences found
 4. Store the diff result for further analysis
 ```
 
+For production CLI usage, prefer `consilens ai config` followed by `consilens diff --dry-run` and `consilens diff`.
+
 **Tool Input Schema** (DiffTool):
 ```json
 {
@@ -78,6 +144,14 @@ Response: The AI will:
 4. Provide explanations and recommendations
 ```
 
+Production CLI:
+
+```bash
+consilens ai diagnose --result diff-records.json --analyzer rulebased --output diagnose.md
+```
+
+The input must be either a JSON array of diff records or an object containing a `differences` array. A stats-only `result` JSON file is rejected because it does not contain row-level evidence.
+
 ### 3. Generate Repair SQL
 
 ```
@@ -298,6 +372,8 @@ The system validates and sanitizes all inputs:
 The system is running in fallback mode. Either:
 - Start an Ollama server: `ollama serve`
 - Configure an OllamaBackend with correct URL
+- Set `OPENAI_API_KEY` and configure OpenAIBackend
+- Set `DEEPSEEK_API_KEY` and configure DeepSeekBackend
 - Or intentionally use NoopBackend for rule-based only
 
 ### "Unknown tool: consilens_diff"
diff --git a/consilens-ai/consilens-ai-analyzer/consilens-ai-analyzer-api/src/main/java/com/consilens/ai/spi/AIAnalyzerManager.java b/consilens-ai/consilens-ai-analyzer/consilens-ai-analyzer-api/src/main/java/com/consilens/ai/spi/AIAnalyzerManager.java
index d856d00..b2eef34 100644
--- a/consilens-ai/consilens-ai-analyzer/consilens-ai-analyzer-api/src/main/java/com/consilens/ai/spi/AIAnalyzerManager.java
+++ b/consilens-ai/consilens-ai-analyzer/consilens-ai-analyzer-api/src/main/java/com/consilens/ai/spi/AIAnalyzerManager.java
@@ -3,6 +3,7 @@
 import com.consilens.spi.PluginManager;
 
 import java.util.Map;
+import java.util.Set;
 
 /**
  * Manages {@link AIAnalyzer} instances loaded via SPI.
@@ -49,4 +50,11 @@ public AIAnalyzer create(String name) {
     public AIAnalyzer create(String name, Map config) {
         return pluginManager.create(name, config);
     }
+
+    /**
+     * Returns analyzer provider names discovered from the classpath.
+     */
+    public Set<String> supportedNames() {
+        return pluginManager.getSupportedKeys();
+    }
 }
diff --git a/consilens-ai/consilens-ai-core/pom.xml b/consilens-ai/consilens-ai-core/pom.xml
index b669635..379f2c9 100644
--- a/consilens-ai/consilens-ai-core/pom.xml
+++ b/consilens-ai/consilens-ai-core/pom.xml
@@ -34,5 +34,11 @@
             <groupId>com.fasterxml.jackson.core</groupId>
             <artifactId>jackson-databind</artifactId>
         </dependency>
+        <dependency>
+            <groupId>org.junit.jupiter</groupId>
+            <artifactId>junit-jupiter</artifactId>
+            <version>${junit.version}</version>
+            <scope>test</scope>
+        </dependency>
     </dependencies>
 </project>
diff --git a/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/AIConfigDraftValidator.java b/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/AIConfigDraftValidator.java
new file mode 100644
index 0000000..abfddc4
--- /dev/null
+++ b/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/AIConfigDraftValidator.java
@@ -0,0 +1,120 @@
+package com.consilens.ai.config;
+
+import com.consilens.ai.config.model.AIConfigDraft;
+import com.consilens.ai.config.model.AIConfigIssue;
+import com.consilens.ai.config.model.DatasetDraft;
+import com.consilens.ai.config.model.MappingDraft;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Validates the minimal structured draft accepted from AI or CLI hints.
+ */
+public class AIConfigDraftValidator {
+
+    public List<AIConfigIssue> validate(AIConfigDraft draft) {
+        List<AIConfigIssue> issues = new ArrayList<>();
+        if (draft == null) {
+            issues.add(error("", "AI_CONFIG_DRAFT_MISSING", "AI config draft is required"));
+            return issues;
+        }
+        validateDataset("source", draft.getSource(), issues);
+        validateDataset("target", draft.getTarget(), issues);
+        validateMapping(draft.getMapping(), issues);
+        return issues;
+    }
+
+    public boolean hasErrors(List<AIConfigIssue> issues) {
+        return issues != null && issues.stream()
+                .anyMatch(issue -> issue.getSeverity() == AIConfigIssue.Severity.ERROR);
+    }
+
+    private void validateDataset(String side, DatasetDraft dataset, List<AIConfigIssue> issues) {
+        if (dataset == null) {
+            issues.add(error(side, "AI_CONFIG_" + side.toUpperCase() + "_MISSING", side + " dataset is required"));
+            return;
+        }
+        require(dataset.getType(), side + ".type", "AI_CONFIG_DATASET_TYPE_MISSING",
+                side + " dataset type is required", issues);
+        require(dataset.getJdbcUrl(), side + ".jdbcUrl", "AI_CONFIG_JDBC_URL_MISSING",
+                side + " JDBC URL is required", issues);
+        if (!blank(dataset.getJdbcUrl()) && !dataset.getJdbcUrl().startsWith("jdbc:")) {
+            issues.add(error(side + ".jdbcUrl", "AI_CONFIG_JDBC_URL_INVALID",
+                    side + " JDBC URL must start with jdbc:"));
+        }
+        String resourceType = blank(dataset.getResourceType()) ? "table" : dataset.getResourceType().trim();
+        if (!"table".equalsIgnoreCase(resourceType) && !"sql".equalsIgnoreCase(resourceType)) {
+            issues.add(error(side + ".resourceType", "AI_CONFIG_UNSUPPORTED_RESOURCE_TYPE",
+                    side + " resourceType must be table or sql"));
+        }
+        if ("sql".equalsIgnoreCase(resourceType)) {
+            require(dataset.getQuery(), side + ".query", "AI_CONFIG_QUERY_MISSING",
+                    side + " query is required when resourceType=sql", issues);
+            validateSql(side + ".query", dataset.getQuery(), issues);
+        } else {
+            require(dataset.getResourceName(), side + ".resourceName", "AI_CONFIG_RESOURCE_NAME_MISSING",
+                    side + " table name is required", issues);
+        }
+        require(dataset.getUsernameEnv(), side + ".usernameEnv", "AI_CONFIG_USERNAME_ENV_MISSING",
+                side + " username env variable name is required", issues);
+        require(dataset.getPasswordEnv(), side + ".passwordEnv", "AI_CONFIG_PASSWORD_ENV_MISSING",
+                side + " password env variable name is required", issues);
+    }
+
+    private void validateMapping(MappingDraft mapping, List<AIConfigIssue> issues) {
+        if (mapping == null) {
+            issues.add(error("mapping", "AI_CONFIG_MAPPING_MISSING", "mapping is required"));
+            return;
+        }
+        if (mapping.getSourceKeys() == null || mapping.getSourceKeys().isEmpty()
+                || mapping.getTargetKeys() == null || mapping.getTargetKeys().isEmpty()) {
+            issues.add(error("mapping.keys", "AI_CONFIG_KEY_MISSING",
+                    "sourceKeys and targetKeys are required"));
+            return;
+        }
+        if (mapping.getSourceKeys().size() != mapping.getTargetKeys().size()) {
+            issues.add(error("mapping.keys", "AI_CONFIG_KEY_MAPPING_INVALID",
+                    "sourceKeys and targetKeys must have the same size"));
+        }
+        if (mapping.getSourceFields() != null && !mapping.getSourceFields().isEmpty()
+                && (mapping.getTargetFields() == null
+                || mapping.getSourceFields().size() != mapping.getTargetFields().size())) {
+            issues.add(error("mapping.fields", "AI_CONFIG_FIELD_MAPPING_INVALID",
+                    "sourceFields and targetFields must have the same size"));
+        }
+    }
+
+    private void validateSql(String path, String sql, List<AIConfigIssue> issues) {
+        if (blank(sql)) {
+            return;
+        }
+        String value = sql.trim();
+        if (!(value.regionMatches(true, 0, "select ", 0, 7)
+                || value.regionMatches(true, 0, "with ", 0, 5))) {
+            issues.add(error(path, "AI_CONFIG_QUERY_INVALID", "query must start with SELECT or WITH"));
+        }
+        if (value.contains(";") || value.contains("--") || value.contains("/*") || value.contains("*/")) {
+            issues.add(error(path, "AI_CONFIG_QUERY_UNSAFE", "query contains disallowed SQL fragments"));
+        }
+    }
+
+    private void require(String value, String path, String code, String message, List<AIConfigIssue> issues) {
+        if (blank(value)) {
+            issues.add(error(path, code, message));
+        }
+    }
+
+    private boolean blank(String value) {
+        return value == null || value.trim().isEmpty();
+    }
+
+    private AIConfigIssue error(String path, String code, String message) {
+        return AIConfigIssue.builder()
+                .severity(AIConfigIssue.Severity.ERROR)
+                .path(path)
+                .code(code)
+                .message(message)
+                .build();
+    }
+}
diff --git a/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/AIConfigDraft.java b/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/AIConfigDraft.java
new file mode 100644
index 0000000..b5c6be6
--- /dev/null
+++ b/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/AIConfigDraft.java
@@ -0,0 +1,31 @@
+package com.consilens.ai.config.model;
+
+import lombok.AllArgsConstructor;
+import lombok.Builder;
+import lombok.Data;
+import lombok.NoArgsConstructor;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Structured draft produced by AI or CLI hints before compiling to a Consilens config.
+ */
+@Data
+@Builder
+@NoArgsConstructor
+@AllArgsConstructor
+public class AIConfigDraft {
+
+    private DatasetDraft source;
+    private DatasetDraft target;
+    private MappingDraft mapping;
+    private StrategyDraft strategy;
+    private ResultDraft result;
+
+    @Builder.Default
+    private List assumptions = new ArrayList<>();
+
+    @Builder.Default
+    private List warnings = new ArrayList<>();
+}
diff --git a/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/AIConfigIssue.java b/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/AIConfigIssue.java
new file mode 100644
index 0000000..46ec5dd
--- /dev/null
+++ b/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/AIConfigIssue.java
@@ -0,0 +1,27 @@
+package com.consilens.ai.config.model;
+
+import lombok.AllArgsConstructor;
+import lombok.Builder;
+import lombok.Data;
+import lombok.NoArgsConstructor;
+
+/**
+ * Structured issue found while validating an AI-generated config draft.
+ */
+@Data
+@Builder
+@NoArgsConstructor
+@AllArgsConstructor
+public class AIConfigIssue {
+
+    private Severity severity;
+    private String path;
+    private String code;
+    private String message;
+
+    public enum Severity {
+        ERROR,
+        WARNING,
+        INFO
+    }
+}
diff --git a/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/DatasetDraft.java b/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/DatasetDraft.java
new file mode 100644
index 0000000..4cdb74e
--- /dev/null
+++ b/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/DatasetDraft.java
@@ -0,0 +1,25 @@
+package com.consilens.ai.config.model;
+
+import lombok.AllArgsConstructor;
+import lombok.Builder;
+import lombok.Data;
+import lombok.NoArgsConstructor;
+
+/**
+ * Source or target dataset draft for AI-assisted config generation.
+ */
+@Data
+@Builder
+@NoArgsConstructor
+@AllArgsConstructor
+public class DatasetDraft {
+
+    private String type;
+    private String name;
+    private String jdbcUrl;
+    private String usernameEnv;
+    private String passwordEnv;
+    private String resourceType;
+    private String resourceName;
+    private String query;
+}
diff --git a/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/MappingDraft.java b/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/MappingDraft.java
new file mode 100644
index 0000000..4bd1820
--- /dev/null
+++ b/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/MappingDraft.java
@@ -0,0 +1,36 @@
+package com.consilens.ai.config.model;
+
+import lombok.AllArgsConstructor;
+import lombok.Builder;
+import lombok.Data;
+import lombok.NoArgsConstructor;
+
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Key and field mapping draft.
+ */
+@Data
+@Builder
+@NoArgsConstructor
+@AllArgsConstructor
+public class MappingDraft {
+
+    @Builder.Default
+    private List<String> sourceKeys = new ArrayList<>();
+
+    @Builder.Default
+    private List<String> targetKeys = new ArrayList<>();
+
+    @Builder.Default
+    private List<String> sourceFields = new ArrayList<>();
+
+    @Builder.Default
+    private List<String> targetFields = new ArrayList<>();
+
+    @Builder.Default
+    private Map fieldMapping = new LinkedHashMap<>();
+}
diff --git a/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/ResultDraft.java b/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/ResultDraft.java
new file mode 100644
index 0000000..5a86273
--- /dev/null
+++ b/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/ResultDraft.java
@@ -0,0 +1,19 @@
+package com.consilens.ai.config.model;
+
+import lombok.AllArgsConstructor;
+import lombok.Builder;
+import lombok.Data;
+import lombok.NoArgsConstructor;
+
+/**
+ * Result output draft.
+ */
+@Data
+@Builder
+@NoArgsConstructor
+@AllArgsConstructor
+public class ResultDraft {
+
+    private String sinkFormat;
+    private String sinkType;
+}
diff --git a/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/StrategyDraft.java b/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/StrategyDraft.java
new file mode 100644
index 0000000..1555087
--- /dev/null
+++ b/consilens-ai/consilens-ai-core/src/main/java/com/consilens/ai/config/model/StrategyDraft.java
@@ -0,0 +1,23 @@
+package com.consilens.ai.config.model;
+
+import lombok.AllArgsConstructor;
+import lombok.Builder;
+import lombok.Data;
+import lombok.NoArgsConstructor;
+
+/**
+ * Strategy draft for AI-assisted config generation.
+ */
+@Data
+@Builder
+@NoArgsConstructor
+@AllArgsConstructor
+public class StrategyDraft {
+
+    private String mode;
+    private String algorithm;
+    private Integer bisectionFactor;
+    private Long bisectionThreshold;
+    private Integer batchSize;
+    private Long maxDifferences;
+}
diff --git a/consilens-ai/consilens-ai-core/src/test/java/com/consilens/ai/config/AIConfigDraftValidatorTest.java b/consilens-ai/consilens-ai-core/src/test/java/com/consilens/ai/config/AIConfigDraftValidatorTest.java
new file mode 100644
index 0000000..e81dee5
--- /dev/null
+++ b/consilens-ai/consilens-ai-core/src/test/java/com/consilens/ai/config/AIConfigDraftValidatorTest.java
@@ -0,0 +1,70 @@
+package com.consilens.ai.config;
+
+import com.consilens.ai.config.model.AIConfigDraft;
+import com.consilens.ai.config.model.AIConfigIssue;
+import com.consilens.ai.config.model.DatasetDraft;
+import com.consilens.ai.config.model.MappingDraft;
+import org.junit.jupiter.api.Test;
+
+import java.util.List;
+
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+class AIConfigDraftValidatorTest {
+
+    private final AIConfigDraftValidator validator = new AIConfigDraftValidator();
+
+    @Test
+    void shouldAcceptMinimalValidDraft() {
+        List<AIConfigIssue> issues = validator.validate(validDraft());
+
+        assertFalse(validator.hasErrors(issues));
+    }
+
+    @Test
+    void shouldRejectDraftWithoutKeys() {
+        AIConfigDraft draft = validDraft();
+        draft.setMapping(MappingDraft.builder().build());
+
+        List<AIConfigIssue> issues = validator.validate(draft);
+
+        assertTrue(validator.hasErrors(issues));
+        assertTrue(issues.stream().anyMatch(issue -> "AI_CONFIG_KEY_MISSING".equals(issue.getCode())));
+    }
+
+    @Test
+    void shouldRejectUnsafeSqlResource() {
+        AIConfigDraft draft = validDraft();
+        draft.getSource().setResourceType("sql");
+        draft.getSource().setResourceName(null);
+        draft.getSource().setQuery("select * from users; drop table users");
+
+        List<AIConfigIssue> issues = validator.validate(draft);
+
+        assertTrue(validator.hasErrors(issues));
+        assertTrue(issues.stream().anyMatch(issue -> "AI_CONFIG_QUERY_UNSAFE".equals(issue.getCode())));
+    }
+
+    private AIConfigDraft validDraft() {
+        return AIConfigDraft.builder()
+                .source(dataset("mysql", "jdbc:mysql://localhost:3306/source", "users"))
+                .target(dataset("postgresql", "jdbc:postgresql://localhost:5432/target", "users"))
+                .mapping(MappingDraft.builder()
+                        .sourceKeys(List.of("id"))
+                        .targetKeys(List.of("id"))
+                        .build())
+                .build();
+    }
+
+    private DatasetDraft dataset(String type, String url, String table) {
+        return DatasetDraft.builder()
+                .type(type)
+                .jdbcUrl(url)
+                .usernameEnv("DB_USER")
+                .passwordEnv("DB_PASSWORD")
+                .resourceType("table")
+                .resourceName(table)
+                .build();
+    }
+}
diff --git a/consilens-ai/consilens-ai-llm/consilens-ai-llm-api/pom.xml b/consilens-ai/consilens-ai-llm/consilens-ai-llm-api/pom.xml
index 49b013e..13cc609 100644
--- a/consilens-ai/consilens-ai-llm/consilens-ai-llm-api/pom.xml
+++ b/consilens-ai/consilens-ai-llm/consilens-ai-llm-api/pom.xml
@@ -22,5 +22,26 @@
             <groupId>com.fasterxml.jackson.core</groupId>
             <artifactId>jackson-databind</artifactId>
         </dependency>
+        <dependency>
+            <groupId>com.squareup.okhttp3</groupId>
+            <artifactId>okhttp</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>org.junit.jupiter</groupId>
+            <artifactId>junit-jupiter</artifactId>
+            <version>${junit.version}</version>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>com.squareup.okhttp3</groupId>
+            <artifactId>mockwebserver</artifactId>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>org.assertj</groupId>
+            <artifactId>assertj-core</artifactId>
+            <version>${assertj.version}</version>
+            <scope>test</scope>
+        </dependency>
     </dependencies>
 </project>
diff --git a/consilens-ai/consilens-ai-llm/consilens-ai-llm-api/src/main/java/com/consilens/ai/backend/AbstractOpenAICompatibleBackend.java b/consilens-ai/consilens-ai-llm/consilens-ai-llm-api/src/main/java/com/consilens/ai/backend/AbstractOpenAICompatibleBackend.java
new file mode 100644
index 0000000..e513aaf
--- /dev/null
+++ b/consilens-ai/consilens-ai-llm/consilens-ai-llm-api/src/main/java/com/consilens/ai/backend/AbstractOpenAICompatibleBackend.java
@@ -0,0 +1,325 @@
+package com.consilens.ai.backend;
+
+import com.consilens.ai.http.HttpLLMClient;
+import com.consilens.ai.model.BackendInfo;
+import com.consilens.ai.model.ChatMessage;
+import com.consilens.ai.model.FunctionDefinition;
+import com.consilens.ai.model.LLMResponse;
+import com.consilens.ai.spi.LLMBackend;
+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.node.ArrayNode;
+import com.fasterxml.jackson.databind.node.ObjectNode;
+import lombok.extern.slf4j.Slf4j;
+
+import java.io.IOException;
+import java.time.Duration;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Base backend for OpenAI-compatible chat completion APIs.
+ */ +@Slf4j +public abstract class AbstractOpenAICompatibleBackend implements LLMBackend { + + private static final String DEFAULT_CHAT_PATH = "/chat/completions"; + + private final String baseUrl; + private final String model; + private final String apiKey; + private final Double temperature; + private final Integer maxTokens; + private final HttpLLMClient httpClient; + private final ObjectMapper objectMapper; + + protected AbstractOpenAICompatibleBackend(String baseUrl, String model, String apiKey) { + this(baseUrl, model, apiKey, null, null, null); + } + + protected AbstractOpenAICompatibleBackend(String baseUrl, String model, String apiKey, + Duration timeout, Double temperature, Integer maxTokens) { + this.baseUrl = normalizeBaseUrl(baseUrl); + this.model = model; + this.apiKey = apiKey; + this.temperature = temperature; + this.maxTokens = maxTokens; + this.httpClient = timeout == null ? new HttpLLMClient() : new HttpLLMClient(timeout); + this.objectMapper = new ObjectMapper(); + } + + protected abstract String backendName(); + + protected String chatPath() { + return DEFAULT_CHAT_PATH; + } + + protected String completionPath() { + return DEFAULT_CHAT_PATH; + } + + protected String healthPath() { + return "/models"; + } + + protected Map headers() { + if (apiKey == null || apiKey.isBlank()) { + throw new IllegalStateException(backendName() + " api key is required"); + } + return Map.of("Authorization", "Bearer " + apiKey); + } + + protected abstract boolean supportsToolCalls(); + + protected abstract boolean supportsStreaming(); + + @Override + public LLMResponse chat(String systemPrompt, List messages, List functions) { + try { + ObjectNode requestBody = objectMapper.createObjectNode(); + requestBody.put("model", model); + requestBody.put("stream", false); + applyGenerationOptions(requestBody); + + ArrayNode messagesArray = objectMapper.createArrayNode(); + if (systemPrompt != null && !systemPrompt.isEmpty()) { + 
messagesArray.add(buildMessageNode(ChatMessage.Role.SYSTEM, systemPrompt, null, null, null)); + } + if (messages != null) { + for (ChatMessage msg : messages) { + messagesArray.add(buildMessageNode(msg.getRole(), msg.getContent(), msg.getToolCallId(), msg.getToolCalls(), msg.getName())); + } + } + requestBody.set("messages", messagesArray); + + if (supportsToolCalls() && functions != null && !functions.isEmpty()) { + ArrayNode toolsArray = objectMapper.createArrayNode(); + for (FunctionDefinition fd : functions) { + ObjectNode toolNode = objectMapper.createObjectNode(); + toolNode.put("type", "function"); + ObjectNode funcDef = objectMapper.createObjectNode(); + funcDef.put("name", fd.getName()); + if (fd.getDescription() != null) { + funcDef.put("description", fd.getDescription()); + } + if (fd.getParameters() != null) { + funcDef.set("parameters", fd.getParameters()); + } + toolNode.set("function", funcDef); + toolsArray.add(toolNode); + } + requestBody.set("tools", toolsArray); + requestBody.put("tool_choice", "auto"); + } + + JsonNode response = httpClient.post(baseUrl + chatPath(), requestBody, headers()); + return parseResponse(response); + } catch (Exception e) { + log.error("{} chat request failed: {}", backendName(), e.getMessage(), e); + return LLMResponse.builder() + .text("Error communicating with " + backendName() + ": " + e.getMessage()) + .finishReason("error") + .build(); + } + } + + @Override + public String complete(String prompt) { + try { + ObjectNode requestBody = objectMapper.createObjectNode(); + requestBody.put("model", model); + requestBody.put("stream", false); + applyGenerationOptions(requestBody); + + ArrayNode messagesArray = objectMapper.createArrayNode(); + messagesArray.add(buildMessageNode(ChatMessage.Role.USER, prompt, null, null, null)); + requestBody.set("messages", messagesArray); + + JsonNode response = httpClient.post(baseUrl + completionPath(), requestBody, headers()); + return extractText(response); + } catch (Exception e) { 
+ log.error("{} completion request failed: {}", backendName(), e.getMessage(), e); + return "Error: " + e.getMessage(); + } + } + + @Override + public boolean isAvailable() { + try { + return httpClient.isReachable(baseUrl + healthPath(), headers()); + } catch (Exception e) { + return false; + } + } + + @Override + public BackendInfo info() { + return BackendInfo.builder() + .name(backendName()) + .model(model) + .version("1.0") + .supportsFunctionCalling(supportsToolCalls()) + .supportsStreaming(supportsStreaming()) + .build(); + } + + protected LLMResponse parseResponse(JsonNode response) throws IOException { + JsonNode choice = response.path("choices").isArray() && response.path("choices").size() > 0 + ? response.path("choices").get(0) + : response; + JsonNode message = choice.path("message"); + String content = extractMessageContent(choice, message); + String finishReason = choice.path("finish_reason").asText(response.path("finish_reason").asText("stop")); + + List toolCalls = parseToolCalls(message.path("tool_calls")); + LLMResponse.Usage usage = parseUsage(response.path("usage")); + + return LLMResponse.builder() + .text(content) + .toolCalls(toolCalls) + .finishReason(finishReason) + .usage(usage) + .build(); + } + + protected String extractText(JsonNode response) { + JsonNode choice = response.path("choices").isArray() && response.path("choices").size() > 0 + ? response.path("choices").get(0) + : response; + JsonNode message = choice.path("message"); + String content = extractMessageContent(choice, message); + return content != null ? 
content : response.path("response").asText("No response from " + backendName()); + } + + protected String extractMessageContent(JsonNode choice, JsonNode message) { + if (message.isObject()) { + JsonNode contentNode = message.get("content"); + if (contentNode != null && !contentNode.isNull()) { + return contentNode.asText(null); + } + } + JsonNode contentNode = choice.get("content"); + return contentNode != null && !contentNode.isNull() ? contentNode.asText(null) : null; + } + + private void applyGenerationOptions(ObjectNode requestBody) { + if (temperature != null) { + requestBody.put("temperature", temperature); + } + if (maxTokens != null) { + requestBody.put("max_tokens", maxTokens); + } + } + + private String normalizeBaseUrl(String value) { + if (value == null || value.isBlank()) { + throw new IllegalArgumentException(backendName() + " baseUrl is required"); + } + return value.endsWith("/") ? value.substring(0, value.length() - 1) : value; + } + + private ObjectNode buildMessageNode(ChatMessage.Role role, + String content, + String toolCallId, + List toolCalls, + String name) { + ObjectNode msgNode = objectMapper.createObjectNode(); + msgNode.put("role", role.name().toLowerCase()); + if (content != null) { + msgNode.put("content", content); + } else { + msgNode.putNull("content"); + } + if (name != null && !name.isEmpty()) { + msgNode.put("name", name); + } + if (toolCallId != null && !toolCallId.isEmpty()) { + msgNode.put("tool_call_id", toolCallId); + } + if (toolCalls != null && !toolCalls.isEmpty()) { + ArrayNode toolCallsNode = objectMapper.createArrayNode(); + for (ChatMessage.ToolCall tc : toolCalls) { + ObjectNode tcNode = objectMapper.createObjectNode(); + tcNode.put("id", tc.getId()); + tcNode.put("type", "function"); + ObjectNode funcNode = objectMapper.createObjectNode(); + funcNode.put("name", tc.getName()); + funcNode.put("arguments", serializeArguments(tc.getArguments())); + tcNode.set("function", funcNode); + toolCallsNode.add(tcNode); + } + 
msgNode.set("tool_calls", toolCallsNode); + } + return msgNode; + } + + private String serializeArguments(Map<String, Object> arguments) { + try { + return objectMapper.writeValueAsString(arguments != null ? arguments : Collections.emptyMap()); + } catch (Exception e) { + return "{}"; + } + } + + private List<ChatMessage.ToolCall> parseToolCalls(JsonNode toolCallsNode) { + if (!toolCallsNode.isArray() || toolCallsNode.size() == 0) { + return Collections.emptyList(); + } + List<ChatMessage.ToolCall> toolCalls = new ArrayList<>(); + for (JsonNode tc : toolCallsNode) { + JsonNode func = tc.path("function"); + String id = tc.path("id").asText("tc_" + System.currentTimeMillis()); + String name = func.path("name").asText(null); + Map<String, Object> arguments = new HashMap<>(); + JsonNode argsNode = func.path("arguments"); + if (argsNode.isObject()) { + argsNode.fields().forEachRemaining(e -> arguments.put(e.getKey(), toJavaValue(e.getValue()))); + } else if (argsNode.isTextual()) { + try { + JsonNode parsed = objectMapper.readTree(argsNode.asText()); + parsed.fields().forEachRemaining(e -> arguments.put(e.getKey(), toJavaValue(e.getValue()))); + } catch (Exception ignored) { + } + } + toolCalls.add(ChatMessage.ToolCall.builder() + .id(id) + .name(name) + .arguments(arguments) + .build()); + } + return toolCalls; + } + + private LLMResponse.Usage parseUsage(JsonNode usageNode) { + if (usageNode == null || usageNode.isMissingNode() || usageNode.isNull()) { + return null; + } + return LLMResponse.Usage.builder() + .promptTokens(usageNode.path("prompt_tokens").asInt(0)) + .completionTokens(usageNode.path("completion_tokens").asInt(0)) + .totalTokens(usageNode.path("total_tokens").asInt(0)) + .build(); + } + + private Object toJavaValue(JsonNode value) { + if (value == null || value.isNull()) { + return null; + } + if (value.isBoolean()) { + return value.asBoolean(); + } + if (value.isIntegralNumber()) { + return value.asLong(); + } + if (value.isFloatingPointNumber()) { + return value.asDouble(); + } + if (value.isTextual()) { + return
value.asText(); + } + return objectMapper.convertValue(value, Object.class); + } +} diff --git a/consilens-ai/consilens-ai-llm/consilens-ai-llm-api/src/main/java/com/consilens/ai/http/HttpLLMClient.java b/consilens-ai/consilens-ai-llm/consilens-ai-llm-api/src/main/java/com/consilens/ai/http/HttpLLMClient.java new file mode 100644 index 0000000..2c67a2f --- /dev/null +++ b/consilens-ai/consilens-ai-llm/consilens-ai-llm-api/src/main/java/com/consilens/ai/http/HttpLLMClient.java @@ -0,0 +1,135 @@ +package com.consilens.ai.http; + +import com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.ObjectMapper; +import lombok.extern.slf4j.Slf4j; +import okhttp3.MediaType; +import okhttp3.OkHttpClient; +import okhttp3.Request; +import okhttp3.RequestBody; +import okhttp3.Response; + +import java.io.IOException; +import java.time.Duration; +import java.util.Map; + +/** + * Thin HTTP client wrapper over OkHttp for LLM backend communication. + */ +@Slf4j +public class HttpLLMClient { + + private static final MediaType JSON = MediaType.get("application/json; charset=utf-8"); + + private final OkHttpClient client; + private final ObjectMapper objectMapper; + + public HttpLLMClient() { + this(Duration.ofSeconds(10), Duration.ofSeconds(120), Duration.ofSeconds(30), null); + } + + public HttpLLMClient(Duration timeout) { + this(timeout != null ? timeout : Duration.ofSeconds(10), + timeout != null ? timeout : Duration.ofSeconds(120), + timeout != null ? 
timeout : Duration.ofSeconds(30), + timeout); + } + + private HttpLLMClient(Duration connectTimeout, Duration readTimeout, Duration writeTimeout, Duration callTimeout) { + OkHttpClient.Builder builder = new OkHttpClient.Builder() + .connectTimeout(connectTimeout) + .readTimeout(readTimeout) + .writeTimeout(writeTimeout); + if (callTimeout != null) { + builder.callTimeout(callTimeout); + } + this.client = builder.build(); + this.objectMapper = new ObjectMapper(); + } + + /** + * Sends a POST request with a JSON body and returns the parsed response. + * + * @param url the endpoint URL + * @param body the request body object (will be serialized to JSON) + * @return the parsed JSON response + * @throws IOException if the request fails + */ + public JsonNode post(String url, Object body) throws IOException { + return post(url, body, Map.of()); + } + + public JsonNode post(String url, Object body, Map<String, String> headers) throws IOException { + String jsonBody = objectMapper.writeValueAsString(body); + RequestBody requestBody = RequestBody.create(jsonBody, JSON); + Request.Builder requestBuilder = new Request.Builder().url(url).post(requestBody); + applyHeaders(requestBuilder, headers); + Request request = requestBuilder.build(); + try (Response response = client.newCall(request).execute()) { + if (!response.isSuccessful()) { + throw new IOException("HTTP " + response.code() + " from " + url + ": " + response.message()); + } + String responseBody = response.body() != null ? response.body().string() : "{}"; + return objectMapper.readTree(responseBody); + } + } + + /** + * Sends a GET request and returns the parsed JSON response.
+ * + * @param url the endpoint URL + * @return the parsed JSON response + * @throws IOException if the request fails + */ + public JsonNode get(String url) throws IOException { + return get(url, Map.of()); + } + + public JsonNode get(String url, Map<String, String> headers) throws IOException { + Request.Builder requestBuilder = new Request.Builder().url(url).get(); + applyHeaders(requestBuilder, headers); + Request request = requestBuilder.build(); + try (Response response = client.newCall(request).execute()) { + if (!response.isSuccessful()) { + throw new IOException("HTTP " + response.code() + " from " + url); + } + String responseBody = response.body() != null ? response.body().string() : "{}"; + return objectMapper.readTree(responseBody); + } + } + + /** + * Checks whether the given URL is reachable. + * + * @param url the URL to probe + * @return {@code true} if the server responds with a 2xx status + */ + public boolean isReachable(String url) { + return isReachable(url, Map.of()); + } + + public boolean isReachable(String url, Map<String, String> headers) { + try { + Request.Builder requestBuilder = new Request.Builder().url(url).get(); + applyHeaders(requestBuilder, headers); + Request request = requestBuilder.build(); + try (Response response = client.newCall(request).execute()) { + return response.isSuccessful(); + } + } catch (Exception e) { + log.debug("Reachability check failed for {}: {}", url, e.getMessage()); + return false; + } + } + + private void applyHeaders(Request.Builder requestBuilder, Map<String, String> headers) { + if (headers == null || headers.isEmpty()) { + return; + } + headers.forEach((key, value) -> { + if (key != null && !key.isBlank() && value != null && !value.isBlank()) { + requestBuilder.header(key, value); + } + }); + } +} diff --git a/consilens-ai/consilens-ai-llm/consilens-ai-llm-api/src/main/java/com/consilens/ai/spi/LLMBackendManager.java b/consilens-ai/consilens-ai-llm/consilens-ai-llm-api/src/main/java/com/consilens/ai/spi/LLMBackendManager.java index
1ca9848..2c4701a 100644 --- a/consilens-ai/consilens-ai-llm/consilens-ai-llm-api/src/main/java/com/consilens/ai/spi/LLMBackendManager.java +++ b/consilens-ai/consilens-ai-llm/consilens-ai-llm-api/src/main/java/com/consilens/ai/spi/LLMBackendManager.java @@ -3,6 +3,7 @@ import com.consilens.spi.PluginManager; import java.util.Map; +import java.util.Set; /** * Manages {@link LLMBackend} instances loaded via SPI. @@ -49,4 +50,11 @@ public LLMBackend create(String name) { public LLMBackend create(String name, Map<String, Object> config) { return pluginManager.create(name, config); } + + /** + * Returns LLM backend provider names discovered from the classpath. + */ + public Set<String> supportedNames() { + return pluginManager.getSupportedKeys(); + } } diff --git a/consilens-ai/consilens-ai-llm/consilens-ai-llm-api/src/test/java/com/consilens/ai/http/HttpLLMClientTest.java b/consilens-ai/consilens-ai-llm/consilens-ai-llm-api/src/test/java/com/consilens/ai/http/HttpLLMClientTest.java new file mode 100644 index 0000000..e5f2316 --- /dev/null +++ b/consilens-ai/consilens-ai-llm/consilens-ai-llm-api/src/test/java/com/consilens/ai/http/HttpLLMClientTest.java @@ -0,0 +1,55 @@ +package com.consilens.ai.http; + +import com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.ObjectMapper; +import okhttp3.mockwebserver.MockResponse; +import okhttp3.mockwebserver.MockWebServer; +import okhttp3.mockwebserver.RecordedRequest; +import org.junit.jupiter.api.Test; + +import java.util.Map; + +import static org.assertj.core.api.Assertions.assertThat; + +class HttpLLMClientTest { + + private final ObjectMapper objectMapper = new ObjectMapper(); + + @Test + void postSendsHeadersAndJsonBody() throws Exception { + try (MockWebServer server = new MockWebServer()) { + server.enqueue(new MockResponse() + .setResponseCode(200) + .setBody("{\"ok\":true}") + .addHeader("Content-Type", "application/json")); + server.start(); + + HttpLLMClient client = new HttpLLMClient(); + JsonNode body =
objectMapper.readTree("{\"model\":\"test\"}"); + JsonNode response = client.post(server.url("/v1/chat/completions").toString(), body, + Map.of("Authorization", "Bearer token")); + + RecordedRequest request = server.takeRequest(); + assertThat(request.getMethod()).isEqualTo("POST"); + assertThat(request.getHeader("Authorization")).isEqualTo("Bearer token"); + assertThat(request.getHeader("Content-Type")).contains("application/json"); + assertThat(objectMapper.readTree(request.getBody().readUtf8()).path("model").asText()).isEqualTo("test"); + assertThat(response.path("ok").asBoolean()).isTrue(); + } + } + + @Test + void isReachableUsesHeaders() throws Exception { + try (MockWebServer server = new MockWebServer()) { + server.enqueue(new MockResponse().setResponseCode(200).setBody("{}")); + server.start(); + + HttpLLMClient client = new HttpLLMClient(); + assertThat(client.isReachable(server.url("/models").toString(), Map.of("Authorization", "Bearer token"))).isTrue(); + + RecordedRequest request = server.takeRequest(); + assertThat(request.getMethod()).isEqualTo("GET"); + assertThat(request.getHeader("Authorization")).isEqualTo("Bearer token"); + } + } +} diff --git a/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-deepseek/pom.xml b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-deepseek/pom.xml new file mode 100644 index 0000000..7636ec0 --- /dev/null +++ b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-deepseek/pom.xml @@ -0,0 +1,39 @@ + + + 4.0.0 + + com.consilens + consilens-ai-llm-plugins + 0.1-SNAPSHOT + + + consilens-ai-llm-deepseek + consilens-ai-llm-deepseek + Consilens AI LLM - DeepSeek backend + + + + com.consilens + consilens-ai-llm-api + + + com.squareup.okhttp3 + mockwebserver + test + + + org.junit.jupiter + junit-jupiter + ${junit.version} + test + + + org.assertj + assertj-core + ${assertj.version} + test + + + diff --git 
a/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-deepseek/src/main/java/com/consilens/ai/backend/DeepSeekBackend.java b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-deepseek/src/main/java/com/consilens/ai/backend/DeepSeekBackend.java new file mode 100644 index 0000000..28ea4a6 --- /dev/null +++ b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-deepseek/src/main/java/com/consilens/ai/backend/DeepSeekBackend.java @@ -0,0 +1,55 @@ +package com.consilens.ai.backend; + +import java.time.Duration; + +/** + * DeepSeek chat completion backend. + */ +public class DeepSeekBackend extends AbstractOpenAICompatibleBackend { + + private static final String DEFAULT_BASE_URL = "https://api.deepseek.com"; + private static final String DEFAULT_MODEL = "deepseek-chat"; + + public DeepSeekBackend() { + this(DEFAULT_BASE_URL, DEFAULT_MODEL, System.getenv("DEEPSEEK_API_KEY")); + } + + public DeepSeekBackend(String baseUrl, String model, String apiKey) { + super(baseUrl, model, apiKey); + } + + public DeepSeekBackend(String baseUrl, String model, String apiKey, + Duration timeout, Double temperature, Integer maxTokens) { + super(baseUrl, model, apiKey, timeout, temperature, maxTokens); + } + + @Override + protected String backendName() { + return "deepseek"; + } + + @Override + protected String chatPath() { + return "/chat/completions"; + } + + @Override + protected String completionPath() { + return "/chat/completions"; + } + + @Override + protected String healthPath() { + return "/models"; + } + + @Override + protected boolean supportsToolCalls() { + return true; + } + + @Override + protected boolean supportsStreaming() { + return true; + } +} diff --git a/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-deepseek/src/main/java/com/consilens/ai/backend/DeepSeekBackendProvider.java 
b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-deepseek/src/main/java/com/consilens/ai/backend/DeepSeekBackendProvider.java new file mode 100644 index 0000000..899c489 --- /dev/null +++ b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-deepseek/src/main/java/com/consilens/ai/backend/DeepSeekBackendProvider.java @@ -0,0 +1,87 @@ +package com.consilens.ai.backend; + +import com.consilens.ai.spi.LLMBackend; +import com.consilens.ai.spi.LLMBackendProvider; + +import java.time.Duration; +import java.util.Map; + +/** + * SPI provider for the DeepSeek backend. + */ +public class DeepSeekBackendProvider implements LLMBackendProvider { + + private static final String KEY_BASE_URL = "baseUrl"; + private static final String KEY_MODEL = "model"; + private static final String KEY_API_KEY = "apiKey"; + private static final String KEY_TIMEOUT = "timeout"; + private static final String KEY_TEMPERATURE = "temperature"; + private static final String KEY_MAX_TOKENS = "maxTokens"; + + @Override + public String getName() { + return "deepseek"; + } + + @Override + public LLMBackend create() { + return new DeepSeekBackend(); + } + + @Override + public LLMBackend create(Map<String, Object> config) { + if (config == null || config.isEmpty()) { + return create(); + } + String baseUrl = valueOrDefault(config, KEY_BASE_URL, "https://api.deepseek.com"); + String model = valueOrDefault(config, KEY_MODEL, "deepseek-chat"); + String apiKey = valueOrEnv(config, KEY_API_KEY, "DEEPSEEK_API_KEY"); + return new DeepSeekBackend(baseUrl, model, apiKey, + duration(config.get(KEY_TIMEOUT)), + doubleValue(config.get(KEY_TEMPERATURE)), + integerValue(config.get(KEY_MAX_TOKENS))); + } + + private String valueOrDefault(Map<String, Object> config, String key, String defaultValue) { + Object value = config.get(key); + if (value == null) { + return defaultValue; + } + String text = String.valueOf(value); + return text.isBlank() ?
defaultValue : text; + } + + private String valueOrEnv(Map<String, Object> config, String key, String envName) { + Object value = config.get(key); + if (value == null) { + return System.getenv(envName); + } + String text = String.valueOf(value); + return text.isBlank() ? System.getenv(envName) : text; + } + + private Duration duration(Object value) { + if (value == null || String.valueOf(value).isBlank()) { + return null; + } + String text = String.valueOf(value).trim(); + if (text.endsWith("ms")) { + return Duration.ofMillis(Long.parseLong(text.substring(0, text.length() - 2))); + } + if (text.endsWith("s")) { + return Duration.ofSeconds(Long.parseLong(text.substring(0, text.length() - 1))); + } + if (text.startsWith("P")) { + return Duration.parse(text); + } + return Duration.ofSeconds(Long.parseLong(text)); + } + + private Double doubleValue(Object value) { + return value == null || String.valueOf(value).isBlank() ? null : Double.valueOf(String.valueOf(value)); + } + + private Integer integerValue(Object value) { + return value == null || String.valueOf(value).isBlank() ?
null : Integer.valueOf(String.valueOf(value)); + } +} diff --git a/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-deepseek/src/main/resources/META-INF/services/com.consilens.ai.spi.LLMBackendProvider b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-deepseek/src/main/resources/META-INF/services/com.consilens.ai.spi.LLMBackendProvider new file mode 100644 index 0000000..11ed1bd --- /dev/null +++ b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-deepseek/src/main/resources/META-INF/services/com.consilens.ai.spi.LLMBackendProvider @@ -0,0 +1 @@ +com.consilens.ai.backend.DeepSeekBackendProvider diff --git a/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-deepseek/src/test/java/com/consilens/ai/backend/DeepSeekBackendTest.java b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-deepseek/src/test/java/com/consilens/ai/backend/DeepSeekBackendTest.java new file mode 100644 index 0000000..40e42c9 --- /dev/null +++ b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-deepseek/src/test/java/com/consilens/ai/backend/DeepSeekBackendTest.java @@ -0,0 +1,66 @@ +package com.consilens.ai.backend; + +import com.consilens.ai.model.ChatMessage; +import com.consilens.ai.model.LLMResponse; +import okhttp3.mockwebserver.MockResponse; +import okhttp3.mockwebserver.MockWebServer; +import okhttp3.mockwebserver.RecordedRequest; +import org.junit.jupiter.api.Test; + +import java.time.Duration; +import java.util.List; +import java.util.Map; + +import static org.assertj.core.api.Assertions.assertThat; + +class DeepSeekBackendTest { + + @Test + void usesDeepSeekPaths() throws Exception { + try (MockWebServer server = new MockWebServer()) { + server.enqueue(new MockResponse() + .setResponseCode(200) + .setBody("{" + + "\"choices\":[{" + + "\"message\":{\"content\":\"ok\"}," + + "\"finish_reason\":\"stop\"" + + "}]" + + "}") + .addHeader("Content-Type", 
"application/json")); + server.enqueue(new MockResponse().setResponseCode(200).setBody("{}")); + server.start(); + + DeepSeekBackend backend = new DeepSeekBackend(server.url("").toString().replaceAll("/$", ""), + "deepseek-chat", "deepseek-key", Duration.ofSeconds(5), 0.3, 512); + + LLMResponse response = backend.chat(null, List.of(ChatMessage.user("hello")), List.of()); + assertThat(response.getText()).isEqualTo("ok"); + + RecordedRequest chatRequest = server.takeRequest(); + assertThat(chatRequest.getPath()).isEqualTo("/chat/completions"); + assertThat(chatRequest.getHeader("Authorization")).isEqualTo("Bearer deepseek-key"); + String chatBody = chatRequest.getBody().readUtf8(); + assertThat(chatBody).contains("\"temperature\":0.3"); + assertThat(chatBody).contains("\"max_tokens\":512"); + + assertThat(backend.isAvailable()).isTrue(); + RecordedRequest healthRequest = server.takeRequest(); + assertThat(healthRequest.getPath()).isEqualTo("/models"); + } + } + + @Test + void providerCreatesConfiguredBackend() { + DeepSeekBackendProvider provider = new DeepSeekBackendProvider(); + + assertThat(provider.getName()).isEqualTo("deepseek"); + assertThat(provider.create(Map.of( + "baseUrl", "http://localhost", + "model", "deepseek-reasoner", + "apiKey", "key", + "timeout", "5s", + "temperature", 0.1, + "maxTokens", 128 + )).info().getModel()).isEqualTo("deepseek-reasoner"); + } +} diff --git a/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-ollama/src/main/java/com/consilens/ai/backend/HttpLLMClient.java b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-ollama/src/main/java/com/consilens/ai/backend/HttpLLMClient.java index fd6b551..3ba54c2 100644 --- a/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-ollama/src/main/java/com/consilens/ai/backend/HttpLLMClient.java +++ 
b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-ollama/src/main/java/com/consilens/ai/backend/HttpLLMClient.java @@ -1,97 +1,8 @@ package com.consilens.ai.backend; -import com.fasterxml.jackson.databind.JsonNode; -import com.fasterxml.jackson.databind.ObjectMapper; -import lombok.extern.slf4j.Slf4j; -import okhttp3.MediaType; -import okhttp3.OkHttpClient; -import okhttp3.Request; -import okhttp3.RequestBody; -import okhttp3.Response; - -import java.io.IOException; -import java.time.Duration; - /** - * Thin HTTP client wrapper over OkHttp for LLM backend communication. + * Backward-compatible alias for the shared LLM HTTP client. */ -@Slf4j -public class HttpLLMClient { - - private static final MediaType JSON = MediaType.get("application/json; charset=utf-8"); - - private final OkHttpClient client; - private final ObjectMapper objectMapper; - - public HttpLLMClient() { - this.client = new OkHttpClient.Builder() - .connectTimeout(Duration.ofSeconds(10)) - .readTimeout(Duration.ofSeconds(120)) - .writeTimeout(Duration.ofSeconds(30)) - .build(); - this.objectMapper = new ObjectMapper(); - } - - /** - * Sends a POST request with a JSON body and returns the parsed response. - * - * @param url the endpoint URL - * @param body the request body object (will be serialized to JSON) - * @return the parsed JSON response - * @throws IOException if the request fails - */ - public JsonNode post(String url, Object body) throws IOException { - String jsonBody = objectMapper.writeValueAsString(body); - RequestBody requestBody = RequestBody.create(jsonBody, JSON); - Request request = new Request.Builder() - .url(url) - .post(requestBody) - .build(); - try (Response response = client.newCall(request).execute()) { - if (!response.isSuccessful()) { - throw new IOException("HTTP " + response.code() + " from " + url + ": " + response.message()); - } - String responseBody = response.body() != null ? 
response.body().string() : "{}"; - return objectMapper.readTree(responseBody); - } - } - - /** - * Sends a GET request and returns the parsed JSON response. - * - * @param url the endpoint URL - * @return the parsed JSON response - * @throws IOException if the request fails - */ - public JsonNode get(String url) throws IOException { - Request request = new Request.Builder() - .url(url) - .get() - .build(); - try (Response response = client.newCall(request).execute()) { - if (!response.isSuccessful()) { - throw new IOException("HTTP " + response.code() + " from " + url); - } - String responseBody = response.body() != null ? response.body().string() : "{}"; - return objectMapper.readTree(responseBody); - } - } - - /** - * Checks whether the given base URL is reachable. - * - * @param url the URL to probe - * @return {@code true} if the server responds with a 2xx status - */ - public boolean isReachable(String url) { - try { - Request request = new Request.Builder().url(url).get().build(); - try (Response response = client.newCall(request).execute()) { - return response.isSuccessful(); - } - } catch (Exception e) { - log.debug("Reachability check failed for {}: {}", url, e.getMessage()); - return false; - } - } +@Deprecated +public class HttpLLMClient extends com.consilens.ai.http.HttpLLMClient { } diff --git a/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-ollama/src/main/java/com/consilens/ai/backend/OllamaBackend.java b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-ollama/src/main/java/com/consilens/ai/backend/OllamaBackend.java index 373b2f8..4f601da 100644 --- a/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-ollama/src/main/java/com/consilens/ai/backend/OllamaBackend.java +++ b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-ollama/src/main/java/com/consilens/ai/backend/OllamaBackend.java @@ -4,6 +4,7 @@ import com.consilens.ai.model.ChatMessage; import 
com.consilens.ai.model.FunctionDefinition; import com.consilens.ai.model.LLMResponse; +import com.consilens.ai.http.HttpLLMClient; import com.consilens.ai.spi.LLMBackend; import com.fasterxml.jackson.databind.JsonNode; import com.fasterxml.jackson.databind.ObjectMapper; @@ -12,6 +13,7 @@ import lombok.extern.slf4j.Slf4j; import java.io.IOException; +import java.time.Duration; import java.util.ArrayList; import java.util.Collections; import java.util.HashMap; @@ -29,6 +31,8 @@ public class OllamaBackend implements LLMBackend { private final String baseUrl; private final String model; + private final Double temperature; + private final Integer maxTokens; private final HttpLLMClient httpClient; private final ObjectMapper objectMapper; @@ -37,9 +41,15 @@ public OllamaBackend() { } public OllamaBackend(String baseUrl, String model) { + this(baseUrl, model, null, null, null); + } + + public OllamaBackend(String baseUrl, String model, Duration timeout, Double temperature, Integer maxTokens) { this.baseUrl = baseUrl; this.model = model; - this.httpClient = new HttpLLMClient(); + this.temperature = temperature; + this.maxTokens = maxTokens; + this.httpClient = timeout == null ? new HttpLLMClient() : new HttpLLMClient(timeout); this.objectMapper = new ObjectMapper(); } @@ -49,6 +59,7 @@ public LLMResponse chat(String systemPrompt, List messages, List config) { } String baseUrl = config.containsKey(KEY_BASE_URL) ? String.valueOf(config.get(KEY_BASE_URL)) : "http://localhost:11434"; String model = config.containsKey(KEY_MODEL) ? 
String.valueOf(config.get(KEY_MODEL)) : "qwen2.5:7b"; - return new OllamaBackend(baseUrl, model); + return new OllamaBackend(baseUrl, model, + duration(config.get(KEY_TIMEOUT)), + doubleValue(config.get(KEY_TEMPERATURE)), + integerValue(config.get(KEY_MAX_TOKENS))); + } + + private Duration duration(Object value) { + if (value == null || String.valueOf(value).isBlank()) { + return null; + } + String text = String.valueOf(value).trim(); + if (text.endsWith("ms")) { + return Duration.ofMillis(Long.parseLong(text.substring(0, text.length() - 2))); + } + if (text.endsWith("s")) { + return Duration.ofSeconds(Long.parseLong(text.substring(0, text.length() - 1))); + } + if (text.startsWith("P")) { + return Duration.parse(text); + } + return Duration.ofSeconds(Long.parseLong(text)); + } + + private Double doubleValue(Object value) { + return value == null || String.valueOf(value).isBlank() ? null : Double.valueOf(String.valueOf(value)); + } + + private Integer integerValue(Object value) { + return value == null || String.valueOf(value).isBlank() ? 
null : Integer.valueOf(String.valueOf(value)); } } diff --git a/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-openai/pom.xml b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-openai/pom.xml new file mode 100644 index 0000000..8c420f4 --- /dev/null +++ b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-openai/pom.xml @@ -0,0 +1,39 @@ + + + 4.0.0 + + com.consilens + consilens-ai-llm-plugins + 0.1-SNAPSHOT + + + consilens-ai-llm-openai + consilens-ai-llm-openai + Consilens AI LLM - OpenAI backend + + + + com.consilens + consilens-ai-llm-api + + + com.squareup.okhttp3 + mockwebserver + test + + + org.junit.jupiter + junit-jupiter + ${junit.version} + test + + + org.assertj + assertj-core + ${assertj.version} + test + + + diff --git a/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-openai/src/main/java/com/consilens/ai/backend/OpenAIBackend.java b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-openai/src/main/java/com/consilens/ai/backend/OpenAIBackend.java new file mode 100644 index 0000000..f7be317 --- /dev/null +++ b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-openai/src/main/java/com/consilens/ai/backend/OpenAIBackend.java @@ -0,0 +1,40 @@ +package com.consilens.ai.backend; + +import java.time.Duration; + +/** + * OpenAI chat completion backend. 
+ */ +public class OpenAIBackend extends AbstractOpenAICompatibleBackend { + + private static final String DEFAULT_BASE_URL = "https://api.openai.com/v1"; + private static final String DEFAULT_MODEL = "gpt-4.1-mini"; + + public OpenAIBackend() { + this(DEFAULT_BASE_URL, DEFAULT_MODEL, System.getenv("OPENAI_API_KEY")); + } + + public OpenAIBackend(String baseUrl, String model, String apiKey) { + super(baseUrl, model, apiKey); + } + + public OpenAIBackend(String baseUrl, String model, String apiKey, + Duration timeout, Double temperature, Integer maxTokens) { + super(baseUrl, model, apiKey, timeout, temperature, maxTokens); + } + + @Override + protected String backendName() { + return "openai"; + } + + @Override + protected boolean supportsToolCalls() { + return true; + } + + @Override + protected boolean supportsStreaming() { + return true; + } +} diff --git a/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-openai/src/main/java/com/consilens/ai/backend/OpenAIBackendProvider.java b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-openai/src/main/java/com/consilens/ai/backend/OpenAIBackendProvider.java new file mode 100644 index 0000000..f962856 --- /dev/null +++ b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-openai/src/main/java/com/consilens/ai/backend/OpenAIBackendProvider.java @@ -0,0 +1,87 @@ +package com.consilens.ai.backend; + +import com.consilens.ai.spi.LLMBackend; +import com.consilens.ai.spi.LLMBackendProvider; + +import java.time.Duration; +import java.util.Map; + +/** + * SPI provider for the OpenAI backend. 
+ */ +public class OpenAIBackendProvider implements LLMBackendProvider { + + private static final String KEY_BASE_URL = "baseUrl"; + private static final String KEY_MODEL = "model"; + private static final String KEY_API_KEY = "apiKey"; + private static final String KEY_TIMEOUT = "timeout"; + private static final String KEY_TEMPERATURE = "temperature"; + private static final String KEY_MAX_TOKENS = "maxTokens"; + + @Override + public String getName() { + return "openai"; + } + + @Override + public LLMBackend create() { + return new OpenAIBackend(); + } + + @Override + public LLMBackend create(Map<String, Object> config) { + if (config == null || config.isEmpty()) { + return create(); + } + String baseUrl = valueOrDefault(config, KEY_BASE_URL, "https://api.openai.com/v1"); + String model = valueOrDefault(config, KEY_MODEL, "gpt-4.1-mini"); + String apiKey = valueOrEnv(config, KEY_API_KEY, "OPENAI_API_KEY"); + return new OpenAIBackend(baseUrl, model, apiKey, + duration(config.get(KEY_TIMEOUT)), + doubleValue(config.get(KEY_TEMPERATURE)), + integerValue(config.get(KEY_MAX_TOKENS))); + } + + private String valueOrDefault(Map<String, Object> config, String key, String defaultValue) { + Object value = config.get(key); + if (value == null) { + return defaultValue; + } + String text = String.valueOf(value); + return text.isBlank() ? defaultValue : text; + } + + private String valueOrEnv(Map<String, Object> config, String key, String envName) { + Object value = config.get(key); + if (value == null) { + return System.getenv(envName); + } + String text = String.valueOf(value); + return text.isBlank() ?
System.getenv(envName) : text; + } + + private Duration duration(Object value) { + if (value == null || String.valueOf(value).isBlank()) { + return null; + } + String text = String.valueOf(value).trim(); + if (text.endsWith("ms")) { + return Duration.ofMillis(Long.parseLong(text.substring(0, text.length() - 2))); + } + if (text.endsWith("s")) { + return Duration.ofSeconds(Long.parseLong(text.substring(0, text.length() - 1))); + } + if (text.startsWith("P")) { + return Duration.parse(text); + } + return Duration.ofSeconds(Long.parseLong(text)); + } + + private Double doubleValue(Object value) { + return value == null || String.valueOf(value).isBlank() ? null : Double.valueOf(String.valueOf(value)); + } + + private Integer integerValue(Object value) { + return value == null || String.valueOf(value).isBlank() ? null : Integer.valueOf(String.valueOf(value)); + } +} diff --git a/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-openai/src/main/resources/META-INF/services/com.consilens.ai.spi.LLMBackendProvider b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-openai/src/main/resources/META-INF/services/com.consilens.ai.spi.LLMBackendProvider new file mode 100644 index 0000000..8efc4b0 --- /dev/null +++ b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-openai/src/main/resources/META-INF/services/com.consilens.ai.spi.LLMBackendProvider @@ -0,0 +1 @@ +com.consilens.ai.backend.OpenAIBackendProvider diff --git a/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-openai/src/test/java/com/consilens/ai/backend/OpenAIBackendTest.java b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-openai/src/test/java/com/consilens/ai/backend/OpenAIBackendTest.java new file mode 100644 index 0000000..7620450 --- /dev/null +++ b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/consilens-ai-llm-openai/src/test/java/com/consilens/ai/backend/OpenAIBackendTest.java @@ -0,0 +1,109 
@@ +package com.consilens.ai.backend; + +import com.consilens.ai.model.ChatMessage; +import com.consilens.ai.model.FunctionDefinition; +import com.consilens.ai.model.LLMResponse; +import com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.ObjectMapper; +import okhttp3.mockwebserver.MockResponse; +import okhttp3.mockwebserver.MockWebServer; +import okhttp3.mockwebserver.RecordedRequest; +import org.junit.jupiter.api.Test; + +import java.time.Duration; +import java.util.List; +import java.util.Map; + +import static org.assertj.core.api.Assertions.assertThat; + +class OpenAIBackendTest { + + private final ObjectMapper objectMapper = new ObjectMapper(); + + @Test + void sendsChatCompletionRequestAndParsesToolCalls() throws Exception { + try (MockWebServer server = new MockWebServer()) { + server.enqueue(new MockResponse() + .setResponseCode(200) + .setBody("{" + + "\"choices\":[{" + + "\"message\":{" + + "\"content\":\"use tool\"," + + "\"tool_calls\":[{" + + "\"id\":\"call_1\"," + + "\"type\":\"function\"," + + "\"function\":{" + + "\"name\":\"consilens_config_generate\"," + + "\"arguments\":\"{\\\"source_table\\\":\\\"orders\\\",\\\"limit\\\":100}\"" + + "}" + + "}]" + + "}," + + "\"finish_reason\":\"tool_calls\"" + + "}]," + + "\"usage\":{" + + "\"prompt_tokens\":10," + + "\"completion_tokens\":5," + + "\"total_tokens\":15" + + "}" + + "}") + .addHeader("Content-Type", "application/json")); + server.start(); + + OpenAIBackend backend = new OpenAIBackend(server.url("/v1").toString().replaceAll("/$", ""), + "test-model", "test-key", Duration.ofSeconds(5), 0.2, 256); + FunctionDefinition function = FunctionDefinition.builder() + .name("consilens_config_generate") + .description("Generate config") + .parameters(objectMapper.readTree("{\"type\":\"object\"}")) + .build(); + + LLMResponse response = backend.chat("system", List.of(ChatMessage.user("generate config")), List.of(function)); + + RecordedRequest request = server.takeRequest(); + 
JsonNode requestBody = objectMapper.readTree(request.getBody().readUtf8()); + assertThat(request.getPath()).isEqualTo("/v1/chat/completions"); + assertThat(request.getHeader("Authorization")).isEqualTo("Bearer test-key"); + assertThat(requestBody.path("model").asText()).isEqualTo("test-model"); + assertThat(requestBody.path("temperature").asDouble()).isEqualTo(0.2); + assertThat(requestBody.path("max_tokens").asInt()).isEqualTo(256); + assertThat(requestBody.path("messages").get(0).path("role").asText()).isEqualTo("system"); + assertThat(requestBody.path("messages").get(1).path("content").asText()).isEqualTo("generate config"); + assertThat(requestBody.path("tools").get(0).path("function").path("name").asText()).isEqualTo("consilens_config_generate"); + assertThat(requestBody.path("tool_choice").asText()).isEqualTo("auto"); + + assertThat(response.getFinishReason()).isEqualTo("tool_calls"); + assertThat(response.getUsage().getTotalTokens()).isEqualTo(15); + assertThat(response.getToolCalls()).hasSize(1); + assertThat(response.getToolCalls().get(0).getName()).isEqualTo("consilens_config_generate"); + assertThat(response.getToolCalls().get(0).getArguments()) + .containsEntry("source_table", "orders") + .containsEntry("limit", 100L); + } + } + + @Test + void providerCreatesConfiguredBackend() { + OpenAIBackendProvider provider = new OpenAIBackendProvider(); + + assertThat(provider.getName()).isEqualTo("openai"); + assertThat(provider.create(Map.of( + "baseUrl", "http://localhost", + "model", "openai-model", + "apiKey", "key", + "timeout", "5s", + "temperature", 0.1, + "maxTokens", 128 + )).info().getModel()).isEqualTo("openai-model"); + } + + @Test + void missingApiKeyReturnsUnavailableAndErrorResponse() { + OpenAIBackend backend = new OpenAIBackend("http://localhost", "model", ""); + + assertThat(backend.isAvailable()).isFalse(); + LLMResponse response = backend.chat(null, List.of(ChatMessage.user("hello")), List.of()); + + 
assertThat(response.getFinishReason()).isEqualTo("error"); + assertThat(response.getText()).contains("api key is required"); + } +} diff --git a/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/pom.xml b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/pom.xml index 97b173c..c6a62e4 100644 --- a/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/pom.xml +++ b/consilens-ai/consilens-ai-llm/consilens-ai-llm-plugins/pom.xml @@ -17,5 +17,7 @@ consilens-ai-llm-noop consilens-ai-llm-ollama + consilens-ai-llm-openai + consilens-ai-llm-deepseek diff --git a/consilens-ai/consilens-ai-tool/consilens-ai-tool-plugins/consilens-ai-tool-defaults/src/main/java/com/consilens/ai/tool/ConfigGenerateTool.java b/consilens-ai/consilens-ai-tool/consilens-ai-tool-plugins/consilens-ai-tool-defaults/src/main/java/com/consilens/ai/tool/ConfigGenerateTool.java index 6270bc3..421c840 100644 --- a/consilens-ai/consilens-ai-tool/consilens-ai-tool-plugins/consilens-ai-tool-defaults/src/main/java/com/consilens/ai/tool/ConfigGenerateTool.java +++ b/consilens-ai/consilens-ai-tool/consilens-ai-tool-plugins/consilens-ai-tool-defaults/src/main/java/com/consilens/ai/tool/ConfigGenerateTool.java @@ -28,11 +28,15 @@ public JsonNode getInputSchema() { ObjectNode schema = OBJECT_MAPPER.createObjectNode(); schema.put("type", "object"); ObjectNode props = schema.putObject("properties"); + props.putObject("source_type").put("type", "string").put("description", "Source connector type"); props.putObject("source_url").put("type", "string").put("description", "Source JDBC URL"); props.putObject("source_username").put("type", "string").put("description", "Source DB username"); + props.putObject("source_password_env").put("type", "string").put("description", "Source password env variable name"); props.putObject("source_table").put("type", "string").put("description", "Source table name (schema.table)"); + props.putObject("target_type").put("type", "string").put("description", "Target connector 
type"); props.putObject("target_url").put("type", "string").put("description", "Target JDBC URL"); props.putObject("target_username").put("type", "string").put("description", "Target DB username"); + props.putObject("target_password_env").put("type", "string").put("description", "Target password env variable name"); props.putObject("target_table").put("type", "string").put("description", "Target table name (schema.table)"); props.putObject("primary_keys").put("type", "string").put("description", "Comma-separated primary key columns"); props.putObject("compare_columns").put("type", "string").put("description", "Comma-separated columns to compare (optional, defaults to all)"); @@ -50,19 +54,24 @@ public boolean isReadOnly() { public ToolResult execute(JsonNode input, ToolContext context) { String sourceUrl = input.path("source_url").asText(); String sourceUsername = input.path("source_username").asText(); + String sourceType = input.path("source_type").asText(inferType(sourceUrl)); + String sourcePasswordEnv = input.path("source_password_env").asText("SOURCE_PASSWORD"); String sourceTable = input.path("source_table").asText(); String targetUrl = input.path("target_url").asText(); String targetUsername = input.path("target_username").asText(); + String targetType = input.path("target_type").asText(inferType(targetUrl)); + String targetPasswordEnv = input.path("target_password_env").asText("TARGET_PASSWORD"); String targetTable = input.path("target_table").asText(); String primaryKeys = input.path("primary_keys").asText(); String compareColumns = input.path("compare_columns").asText(null); - if (sourceUrl.isEmpty() || targetUrl.isEmpty() || sourceTable.isEmpty() || primaryKeys.isEmpty()) { + if (sourceUrl.isEmpty() || sourceUsername.isEmpty() || targetUrl.isEmpty() || targetUsername.isEmpty() + || sourceTable.isEmpty() || targetTable.isEmpty() || primaryKeys.isEmpty()) { return ToolResult.failure("Required parameters: source_url, source_username, source_table, 
target_url, target_username, target_table, primary_keys"); } - String yaml = buildYaml(sourceUrl, sourceUsername, sourceTable, - targetUrl, targetUsername, targetTable, primaryKeys, compareColumns); + String yaml = buildYaml(sourceType, sourceUrl, sourceUsername, sourcePasswordEnv, sourceTable, + targetType, targetUrl, targetUsername, targetPasswordEnv, targetTable, primaryKeys, compareColumns); return ToolResult.success( "Generated configuration:\n\n```yaml\n" + yaml + "\n```", @@ -70,36 +79,90 @@ public ToolResult execute(JsonNode input, ToolContext context) { ); } - private String buildYaml(String sourceUrl, String sourceUsername, String sourceTable, - String targetUrl, String targetUsername, String targetTable, + private String buildYaml(String sourceType, String sourceUrl, String sourceUsername, String sourcePasswordEnv, + String sourceTable, String targetType, String targetUrl, String targetUsername, + String targetPasswordEnv, String targetTable, String primaryKeys, String compareColumns) { StringBuilder sb = new StringBuilder(); sb.append("# Consilens diff configuration\n"); sb.append("# Generated by Consilens AI\n\n"); sb.append("source:\n"); - sb.append(" url: \"").append(sourceUrl).append("\"\n"); - sb.append(" username: \"").append(sourceUsername).append("\"\n"); - sb.append(" password: \"${SOURCE_PASSWORD}\"\n\n"); + sb.append(" type: ").append(quote(sourceType)).append("\n"); + sb.append(" name: ").append(quote("source-" + sourceType)).append("\n"); + sb.append(" connection:\n"); + sb.append(" url: ").append(quote(sourceUrl)).append("\n"); + sb.append(" username: ").append(quote(sourceUsername)).append("\n"); + sb.append(" password: ").append(quote("${env." 
+ sourcePasswordEnv + "}")).append("\n"); + sb.append(" resource:\n"); + sb.append(" type: table\n"); + sb.append(" name: ").append(quote(sourceTable)).append("\n\n"); sb.append("target:\n"); - sb.append(" url: \"").append(targetUrl).append("\"\n"); - sb.append(" username: \"").append(targetUsername).append("\"\n"); - sb.append(" password: \"${TARGET_PASSWORD}\"\n\n"); - - sb.append("comparisons:\n"); - sb.append(" - source_table: \"").append(sourceTable).append("\"\n"); - sb.append(" target_table: \"").append(targetTable).append("\"\n"); - sb.append(" primary_keys:\n"); + sb.append(" type: ").append(quote(targetType)).append("\n"); + sb.append(" name: ").append(quote("target-" + targetType)).append("\n"); + sb.append(" connection:\n"); + sb.append(" url: ").append(quote(targetUrl)).append("\n"); + sb.append(" username: ").append(quote(targetUsername)).append("\n"); + sb.append(" password: ").append(quote("${env." + targetPasswordEnv + "}")).append("\n"); + sb.append(" resource:\n"); + sb.append(" type: table\n"); + sb.append(" name: ").append(quote(targetTable)).append("\n\n"); + + sb.append("comparison:\n"); + sb.append(" keys:\n"); + sb.append(" source:\n"); + for (String pk : primaryKeys.split(",")) { + sb.append(" - ").append(quote(pk.trim())).append("\n"); + } + sb.append(" target:\n"); for (String pk : primaryKeys.split(",")) { - sb.append(" - \"").append(pk.trim()).append("\"\n"); + sb.append(" - ").append(quote(pk.trim())).append("\n"); } if (compareColumns != null && !compareColumns.isEmpty()) { - sb.append(" compare_columns:\n"); + sb.append(" fields:\n"); + sb.append(" source:\n"); for (String col : compareColumns.split(",")) { - sb.append(" - \"").append(col.trim()).append("\"\n"); + sb.append(" - ").append(quote(col.trim())).append("\n"); + } + sb.append(" target:\n"); + for (String col : compareColumns.split(",")) { + sb.append(" - ").append(quote(col.trim())).append("\n"); } } + sb.append("\nstrategy:\n"); + sb.append(" mode: checksum\n"); + 
sb.append(" algorithm: xor\n"); + sb.append("\nresult:\n"); + sb.append(" sinks:\n"); + sb.append(" - format: console\n"); + sb.append(" type: result\n"); return sb.toString(); } + + private String inferType(String url) { + if (url == null) { + return ""; + } + String lower = url.toLowerCase(); + if (lower.startsWith("jdbc:postgresql:")) { + return "postgresql"; + } + if (lower.startsWith("jdbc:mysql:")) { + return "mysql"; + } + if (lower.startsWith("jdbc:oracle:")) { + return "oracle"; + } + if (lower.startsWith("jdbc:sqlserver:")) { + return "sqlserver"; + } + int start = lower.indexOf(':'); + int end = start < 0 ? -1 : lower.indexOf(':', start + 1); + return start >= 0 && end > start ? lower.substring(start + 1, end) : ""; + } + + private String quote(String value) { + return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\""; + } } diff --git a/consilens-ai/consilens-ai-tool/consilens-ai-tool-plugins/consilens-ai-tool-defaults/src/test/java/com/consilens/ai/tool/ConfigGenerateToolTest.java b/consilens-ai/consilens-ai-tool/consilens-ai-tool-plugins/consilens-ai-tool-defaults/src/test/java/com/consilens/ai/tool/ConfigGenerateToolTest.java index 697ff06..69cf796 100644 --- a/consilens-ai/consilens-ai-tool/consilens-ai-tool-plugins/consilens-ai-tool-defaults/src/test/java/com/consilens/ai/tool/ConfigGenerateToolTest.java +++ b/consilens-ai/consilens-ai-tool/consilens-ai-tool-plugins/consilens-ai-tool-defaults/src/test/java/com/consilens/ai/tool/ConfigGenerateToolTest.java @@ -39,8 +39,14 @@ void shouldGenerateYamlConfig() { assertThat(result.isSuccess()).isTrue(); assertThat(result.getContent()).contains("source:"); assertThat(result.getContent()).contains("target:"); - assertThat(result.getContent()).contains("comparisons:"); - assertThat(result.getContent()).contains("primary_keys:"); + assertThat(result.getContent()).contains("connection:"); + assertThat(result.getContent()).contains("resource:"); + 
assertThat(result.getContent()).contains("comparison:"); + assertThat(result.getContent()).contains("keys:"); + assertThat(result.getContent()).contains("strategy:"); + assertThat(result.getContent()).contains("result:"); + assertThat(result.getContent()).doesNotContain("comparisons:"); + assertThat(result.getContent()).doesNotContain("primary_keys:"); assertThat(result.getContent()).contains("id"); } diff --git a/consilens-cli/README.md b/consilens-cli/README.md index 69b305c..5aebd6f 100644 --- a/consilens-cli/README.md +++ b/consilens-cli/README.md @@ -65,6 +65,7 @@ consilens [options] Commands: config Configuration management diff Run a data comparison + ai AI-assisted configuration generation, explanation and diagnosis Options: -h, --help Show help information @@ -104,6 +105,20 @@ consilens [options] --verbose Print detailed configuration and progress information ``` +### `ai` + +``` +./bin/consilens-cli.sh ai config "compare orders" [options] +./bin/consilens-cli.sh ai explain -c my-config.yaml +./bin/consilens-cli.sh ai diff "compare orders" -o ai-diff.yaml [options] +./bin/consilens-cli.sh ai diagnose --result diff-records.json --analyzer rulebased --output diagnose.md +./bin/consilens-cli.sh ai providers +./bin/consilens-cli.sh ai providers --format json +./bin/consilens-cli.sh ai doctor --format json +``` + +`ai config` and `ai diff` only generate and validate a configuration; the actual comparison is still run explicitly with `diff -c `. `ai diagnose` requires an output file written by a `json` + `diff-record` sink. A result JSON that contains only summary statistics has no row-level evidence and cannot be diagnosed. The diagnosis analyzer is loaded via SPI and can be selected with `--analyzer` or `CONSILENS_AI_ANALYZER`; the default is `rulebased`. Use `--output` to write the diagnosis report to a file. `ai providers` shows which analyzer and LLM backend plugins were actually discovered on the runtime classpath, and supports `--format json` for CI and scripts. `ai doctor` is a production preflight check. By default it runs offline and checks provider discovery, analyzer/backend creation and cloud-backend API key configuration; add `--online` explicitly when real connectivity must be verified. + ## Configuration File Format ### Minimal Configuration Example diff --git a/consilens-cli/pom.xml b/consilens-cli/pom.xml index a178bbf..beb5925 100644 --- a/consilens-cli/pom.xml +++ b/consilens-cli/pom.xml @@ -14,7 +14,7 @@ consilens-cli jar - consilens CLI + consilens cli Command-line interface for data diff operations @@ -40,6 +40,48 @@ consilens-sink-all + + com.consilens + consilens-ai-core + ${project.version} + + + +
com.consilens + consilens-ai-analyzer-rulebased + ${project.version} + + + + com.consilens + consilens-ai-llm-api + ${project.version} + + + + com.consilens + consilens-ai-llm-noop + ${project.version} + + + + com.consilens + consilens-ai-llm-ollama + ${project.version} + + + + com.consilens + consilens-ai-llm-openai + ${project.version} + + + + com.consilens + consilens-ai-llm-deepseek + ${project.version} + + info.picocli diff --git a/consilens-cli/src/main/java/com/consilens/cli/ConsilensCliApplication.java b/consilens-cli/src/main/java/com/consilens/cli/ConsilensCliApplication.java index fa5edc1..e6c9050 100644 --- a/consilens-cli/src/main/java/com/consilens/cli/ConsilensCliApplication.java +++ b/consilens-cli/src/main/java/com/consilens/cli/ConsilensCliApplication.java @@ -1,5 +1,6 @@ package com.consilens.cli; +import com.consilens.cli.command.AiCommand; import com.consilens.cli.command.ConfigCommand; import com.consilens.cli.command.DiffCommand; @@ -15,6 +16,7 @@ *
<ul> *   <li>{@code diff} - Perform data comparison between databases</li> *   <li>{@code config} - Configuration management (generate, validate)</li> + *   <li>{@code ai} - AI-assisted configuration and explanation</li> * </ul>
*/ @Slf4j @@ -23,11 +25,12 @@ version = "consilens CLI version 0.1-SNAPSHOT", description = "Cross-Database Data Comparison Tool", mixinStandardHelpOptions = true, - subcommands = { - DiffCommand.class, - ConfigCommand.class - } -) + subcommands = { + DiffCommand.class, + ConfigCommand.class, + AiCommand.class + } + ) public class ConsilensCliApplication implements Runnable { public static void main(String[] args) { diff --git a/consilens-cli/src/main/java/com/consilens/cli/ai/AIBackendOptions.java b/consilens-cli/src/main/java/com/consilens/cli/ai/AIBackendOptions.java new file mode 100644 index 0000000..0161bc6 --- /dev/null +++ b/consilens-cli/src/main/java/com/consilens/cli/ai/AIBackendOptions.java @@ -0,0 +1,25 @@ +package com.consilens.cli.ai; + +import lombok.Builder; +import lombok.Value; + +/** + * Backend selection options shared by AI CLI commands. + */ +@Value +@Builder +public class AIBackendOptions { + + @Builder.Default + String backend = "noop"; + + String model; + String baseUrl; + String apiKey; + String timeout; + Double temperature; + Integer maxTokens; + + @Builder.Default + boolean noLlm = false; +} diff --git a/consilens-cli/src/main/java/com/consilens/cli/ai/AIConfigCompiler.java b/consilens-cli/src/main/java/com/consilens/cli/ai/AIConfigCompiler.java new file mode 100644 index 0000000..7d97416 --- /dev/null +++ b/consilens-cli/src/main/java/com/consilens/cli/ai/AIConfigCompiler.java @@ -0,0 +1,138 @@ +package com.consilens.cli.ai; + +import com.consilens.ai.config.model.AIConfigDraft; +import com.consilens.ai.config.model.DatasetDraft; +import com.consilens.ai.config.model.MappingDraft; +import com.consilens.ai.config.model.ResultDraft; +import com.consilens.ai.config.model.StrategyDraft; +import com.consilens.cli.model.CliConfiguration; +import com.consilens.cli.model.ComparisonConfig; +import com.consilens.cli.model.ConnectionConfig; +import com.consilens.cli.model.ListPairConfig; +import com.consilens.cli.model.StrategyConfig; +import 
com.consilens.sink.api.model.ResultConfig; +import com.consilens.sink.api.model.SinkConfig; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.databind.SerializationFeature; +import com.fasterxml.jackson.dataformat.yaml.YAMLFactory; + +import java.util.ArrayList; +import java.util.List; + +/** + * Compiles a validated AI draft into the canonical CLI configuration model. + */ +public class AIConfigCompiler { + + private final ObjectMapper yamlMapper; + + public AIConfigCompiler() { + this.yamlMapper = new ObjectMapper(new YAMLFactory()); + this.yamlMapper.setSerializationInclusion(com.fasterxml.jackson.annotation.JsonInclude.Include.NON_DEFAULT); + this.yamlMapper.configure(SerializationFeature.INDENT_OUTPUT, true); + } + + public CliConfiguration compile(AIConfigDraft draft) { + MappingDraft mapping = draft.getMapping(); + StrategyDraft strategy = draft.getStrategy() == null ? StrategyDraft.builder().build() : draft.getStrategy(); + + return CliConfiguration.builder() + .source(connection(draft.getSource(), "source")) + .target(connection(draft.getTarget(), "target")) + .comparison(ComparisonConfig.builder() + .keys(ListPairConfig.builder() + .source(copy(mapping.getSourceKeys())) + .target(copy(mapping.getTargetKeys())) + .build()) + .fields(fields(mapping)) + .build()) + .strategy(StrategyConfig.builder() + .mode(defaultValue(strategy.getMode(), "checksum")) + .algorithm(defaultValue(strategy.getAlgorithm(), "xor")) + .bisectionFactor(defaultValue(strategy.getBisectionFactor(), 4)) + .bisectionThreshold(strategy.getBisectionThreshold()) + .batchSize(defaultValue(strategy.getBatchSize(), 1000)) + .maxDifferences(defaultValue(strategy.getMaxDifferences(), 1_000_000L)) + .build()) + .result(result(draft.getResult())) + .build(); + } + + public String toYaml(CliConfiguration configuration) { + try { + return yamlMapper.writeValueAsString(configuration); + } catch (Exception e) { + throw new IllegalStateException("Failed to serialize AI 
generated configuration", e); + } + } + + private ConnectionConfig connection(DatasetDraft dataset, String side) { + String resourceType = defaultValue(dataset.getResourceType(), dataset.getQuery() == null ? "table" : "sql"); + ConnectionConfig.ResourceConfig resource = ConnectionConfig.ResourceConfig.builder() + .type(resourceType) + .name("table".equalsIgnoreCase(resourceType) ? dataset.getResourceName() : null) + .path("sql".equalsIgnoreCase(resourceType) ? dataset.getQuery() : null) + .build(); + return ConnectionConfig.builder() + .type(dataset.getType()) + .name(defaultValue(dataset.getName(), side + "-" + dataset.getType())) + .connection(ConnectionConfig.ConnectorConnectionProperties.builder() + .url(dataset.getJdbcUrl()) + .username(env(dataset.getUsernameEnv())) + .password(env(dataset.getPasswordEnv())) + .build()) + .resource(resource) + .build(); + } + + private ListPairConfig fields(MappingDraft mapping) { + if (mapping.getSourceFields() == null || mapping.getSourceFields().isEmpty() + || mapping.getTargetFields() == null || mapping.getTargetFields().isEmpty()) { + return null; + } + return ListPairConfig.builder() + .source(copy(mapping.getSourceFields())) + .target(copy(mapping.getTargetFields())) + .build(); + } + + private ResultConfig result(ResultDraft draft) { + SinkConfig summarySink = new SinkConfig(); + summarySink.setFormat(draft == null ? "console" : defaultValue(draft.getSinkFormat(), "console")); + summarySink.setType(draft == null ? 
"result" : defaultValue(draft.getSinkType(), "result")); + + if (!"result".equalsIgnoreCase(summarySink.getType())) { + summarySink.setFormat("console"); + summarySink.setType("result"); + } + + SinkConfig evidenceSink = new SinkConfig(); + evidenceSink.setFormat("json"); + evidenceSink.setType("diff-record"); + evidenceSink.setProperties("{\"path\":\"./diff-records.json\",\"pretty\":true}"); + + return ResultConfig.builder() + .sinks(List.of(summarySink, evidenceSink)) + .build(); + } + + private String env(String name) { + return "${env." + name + "}"; + } + + private List<String> copy(List<String> values) { + return values == null ? new ArrayList<>() : new ArrayList<>(values); + } + + private String defaultValue(String value, String fallback) { + return value == null || value.trim().isEmpty() ? fallback : value.trim(); + } + + private Integer defaultValue(Integer value, Integer fallback) { + return value == null ? fallback : value; + } + + private Long defaultValue(Long value, Long fallback) { + return value == null ? fallback : value; + } +} diff --git a/consilens-cli/src/main/java/com/consilens/cli/ai/AIConfigRequest.java b/consilens-cli/src/main/java/com/consilens/cli/ai/AIConfigRequest.java new file mode 100644 index 0000000..aefcafc --- /dev/null +++ b/consilens-cli/src/main/java/com/consilens/cli/ai/AIConfigRequest.java @@ -0,0 +1,41 @@ +package com.consilens.cli.ai; + +import lombok.Builder; +import lombok.Value; + +/** + * Request for generating a Consilens CLI configuration from AI and explicit hints.
+ */ +@Value +@Builder +public class AIConfigRequest { + + String goal; + String sourceType; + String sourceUrl; + String sourceName; + String sourceTable; + String sourceQuery; + String sourceUserEnv; + String sourcePasswordEnv; + String targetType; + String targetUrl; + String targetName; + String targetTable; + String targetQuery; + String targetUserEnv; + String targetPasswordEnv; + String keys; + String sourceKeys; + String targetKeys; + String fields; + String sourceFields; + String targetFields; + String strategyMode; + String algorithm; + Integer bisectionFactor; + Long bisectionThreshold; + Integer batchSize; + Long maxDifferences; + AIBackendOptions backendOptions; +} diff --git a/consilens-cli/src/main/java/com/consilens/cli/ai/AIConfigResult.java b/consilens-cli/src/main/java/com/consilens/cli/ai/AIConfigResult.java new file mode 100644 index 0000000..65b6403 --- /dev/null +++ b/consilens-cli/src/main/java/com/consilens/cli/ai/AIConfigResult.java @@ -0,0 +1,24 @@ +package com.consilens.cli.ai; + +import com.consilens.ai.config.model.AIConfigDraft; +import com.consilens.ai.config.model.AIConfigIssue; +import com.consilens.cli.model.CliConfiguration; +import lombok.Builder; +import lombok.Value; + +import java.util.List; + +/** + * Result of AI-assisted configuration generation. 
+ */ +@Value +@Builder +public class AIConfigResult { + + AIConfigDraft draft; + CliConfiguration configuration; + String yaml; + List<AIConfigIssue> issues; + boolean valid; + boolean dryRunPassed; +} diff --git a/consilens-cli/src/main/java/com/consilens/cli/ai/AIConfigService.java b/consilens-cli/src/main/java/com/consilens/cli/ai/AIConfigService.java new file mode 100644 index 0000000..5a8987d --- /dev/null +++ b/consilens-cli/src/main/java/com/consilens/cli/ai/AIConfigService.java @@ -0,0 +1,269 @@ +package com.consilens.cli.ai; + +import com.consilens.ai.config.AIConfigDraftValidator; +import com.consilens.ai.config.model.AIConfigDraft; +import com.consilens.ai.config.model.AIConfigIssue; +import com.consilens.ai.config.model.DatasetDraft; +import com.consilens.ai.config.model.MappingDraft; +import com.consilens.ai.config.model.ResultDraft; +import com.consilens.ai.config.model.StrategyDraft; +import com.consilens.ai.spi.LLMBackend; +import com.consilens.cli.model.CliConfiguration; +import com.fasterxml.jackson.databind.ObjectMapper; + +import java.util.Arrays; +import java.util.List; +import java.util.stream.Collectors; + +/** + * Generates a validated Consilens CLI configuration from explicit hints and optional LLM output.
+ */ +public class AIConfigService { + + private final AIConfigDraftValidator validator; + private final AIConfigCompiler compiler; + private final LLMBackendResolver backendResolver; + private final ObjectMapper objectMapper; + + public AIConfigService() { + this(new AIConfigDraftValidator(), new AIConfigCompiler(), new LLMBackendResolver(), new ObjectMapper()); + } + + AIConfigService(AIConfigDraftValidator validator, + AIConfigCompiler compiler, + LLMBackendResolver backendResolver, + ObjectMapper objectMapper) { + this.validator = validator; + this.compiler = compiler; + this.backendResolver = backendResolver; + this.objectMapper = objectMapper; + } + + public AIConfigResult generate(AIConfigRequest request) { + AIConfigDraft draft = buildDraftFromRequest(request); + List<AIConfigIssue> issues = validator.validate(draft); + + AIBackendOptions backendOptions = request.getBackendOptions(); + String backendName = backendResolver.resolveBackendName(backendOptions); + if (validator.hasErrors(issues) + && backendOptions != null + && !backendOptions.isNoLlm() + && !"noop".equalsIgnoreCase(backendName)) { + draft = mergeExplicitHints(request, generateDraftWithLlm(request)); + issues = validator.validate(draft); + } + + if (validator.hasErrors(issues)) { + return AIConfigResult.builder() + .draft(draft) + .issues(issues) + .valid(false) + .dryRunPassed(false) + .build(); + } + + CliConfiguration configuration = compiler.compile(draft); + try { + configuration.validate(); + } catch (Exception e) { + issues = List.of(AIConfigIssue.builder() + .severity(AIConfigIssue.Severity.ERROR) + .path("configuration") + .code("AI_CONFIG_COMPILED_CONFIG_INVALID") + .message(e.getMessage()) + .build()); + return AIConfigResult.builder() + .draft(draft) + .configuration(configuration) + .issues(issues) + .valid(false) + .dryRunPassed(false) + .build(); + } + + return AIConfigResult.builder() + .draft(draft) + .configuration(configuration) + .yaml(compiler.toYaml(configuration)) + .issues(issues) +
.valid(true) + .dryRunPassed(false) + .build(); + } + + private AIConfigDraft generateDraftWithLlm(AIConfigRequest request) { + LLMBackend backend = backendResolver.resolve(request.getBackendOptions()); + String response = backend.complete(buildPrompt(request)); + try { + return objectMapper.readValue(extractJson(response), AIConfigDraft.class); + } catch (Exception e) { + throw new IllegalArgumentException("AI backend did not return a valid AIConfigDraft JSON: " + e.getMessage(), e); + } + } + + private String buildPrompt(AIConfigRequest request) { + return "Generate a Consilens AIConfigDraft JSON only.\n" + + "Do not output markdown.\n" + + "Never include plaintext passwords. Use usernameEnv and passwordEnv.\n" + + "Required JSON shape:\n" + + "{\n" + + " \"source\": {\"type\":\"\",\"jdbcUrl\":\"\",\"usernameEnv\":\"\",\"passwordEnv\":\"\"," + + "\"resourceType\":\"table\",\"resourceName\":\"\"},\n" + + " \"target\": {\"type\":\"\",\"jdbcUrl\":\"\",\"usernameEnv\":\"\",\"passwordEnv\":\"\"," + + "\"resourceType\":\"table\",\"resourceName\":\"\"},\n" + + " \"mapping\": {\"sourceKeys\":[],\"targetKeys\":[],\"sourceFields\":[],\"targetFields\":[]},\n" + + " \"strategy\": {\"mode\":\"checksum\",\"algorithm\":\"xor\"},\n" + + " \"result\": {\"sinkFormat\":\"console\",\"sinkType\":\"result\"},\n" + + " \"assumptions\": [],\n" + + " \"warnings\": []\n" + + "}\n" + + "The compiler will also add a json diff-record evidence sink for diagnostics.\n" + + "User goal:\n" + + nullToEmpty(request.getGoal()); + } + + private String extractJson(String response) { + if (response == null) { + return ""; + } + String trimmed = response.trim(); + int start = trimmed.indexOf('{'); + int end = trimmed.lastIndexOf('}'); + if (start >= 0 && end >= start) { + return trimmed.substring(start, end + 1); + } + return trimmed; + } + + private AIConfigDraft mergeExplicitHints(AIConfigRequest request, AIConfigDraft draft) { + AIConfigDraft explicit = buildDraftFromRequest(request); + if (draft == 
null) { + return explicit; + } + draft.setSource(mergeDataset(explicit.getSource(), draft.getSource())); + draft.setTarget(mergeDataset(explicit.getTarget(), draft.getTarget())); + draft.setMapping(mergeMapping(explicit.getMapping(), draft.getMapping())); + draft.setStrategy(mergeStrategy(explicit.getStrategy(), draft.getStrategy())); + if (draft.getResult() == null) { + draft.setResult(explicit.getResult()); + } + return draft; + } + + private DatasetDraft mergeDataset(DatasetDraft explicit, DatasetDraft generated) { + if (generated == null) { + return explicit; + } + generated.setType(first(explicit.getType(), generated.getType())); + generated.setName(first(explicit.getName(), generated.getName())); + generated.setJdbcUrl(first(explicit.getJdbcUrl(), generated.getJdbcUrl())); + generated.setUsernameEnv(first(explicit.getUsernameEnv(), generated.getUsernameEnv())); + generated.setPasswordEnv(first(explicit.getPasswordEnv(), generated.getPasswordEnv())); + generated.setResourceType(first(explicit.getResourceType(), generated.getResourceType())); + generated.setResourceName(first(explicit.getResourceName(), generated.getResourceName())); + generated.setQuery(first(explicit.getQuery(), generated.getQuery())); + return generated; + } + + private MappingDraft mergeMapping(MappingDraft explicit, MappingDraft generated) { + if (generated == null) { + return explicit; + } + if (explicit.getSourceKeys() != null && !explicit.getSourceKeys().isEmpty()) { + generated.setSourceKeys(explicit.getSourceKeys()); + } + if (explicit.getTargetKeys() != null && !explicit.getTargetKeys().isEmpty()) { + generated.setTargetKeys(explicit.getTargetKeys()); + } + if (explicit.getSourceFields() != null && !explicit.getSourceFields().isEmpty()) { + generated.setSourceFields(explicit.getSourceFields()); + } + if (explicit.getTargetFields() != null && !explicit.getTargetFields().isEmpty()) { + generated.setTargetFields(explicit.getTargetFields()); + } + return generated; + } + + private 
StrategyDraft mergeStrategy(StrategyDraft explicit, StrategyDraft generated) { + if (generated == null) { + return explicit; + } + generated.setMode(first(explicit.getMode(), generated.getMode())); + generated.setAlgorithm(first(explicit.getAlgorithm(), generated.getAlgorithm())); + generated.setBisectionFactor(explicit.getBisectionFactor() != null + ? explicit.getBisectionFactor() : generated.getBisectionFactor()); + generated.setBisectionThreshold(explicit.getBisectionThreshold() != null + ? explicit.getBisectionThreshold() : generated.getBisectionThreshold()); + generated.setBatchSize(explicit.getBatchSize() != null ? explicit.getBatchSize() : generated.getBatchSize()); + generated.setMaxDifferences(explicit.getMaxDifferences() != null + ? explicit.getMaxDifferences() : generated.getMaxDifferences()); + return generated; + } + + private AIConfigDraft buildDraftFromRequest(AIConfigRequest request) { + List<String> sourceKeys = list(first(request.getSourceKeys(), request.getKeys())); + List<String> targetKeys = list(first(request.getTargetKeys(), request.getKeys())); + List<String> sourceFields = list(first(request.getSourceFields(), request.getFields())); + List<String> targetFields = list(first(request.getTargetFields(), request.getFields())); + + return AIConfigDraft.builder() + .source(DatasetDraft.builder() + .type(request.getSourceType()) + .name(request.getSourceName()) + .jdbcUrl(request.getSourceUrl()) + .usernameEnv(defaultEnv(request.getSourceUserEnv(), "SOURCE_USERNAME")) + .passwordEnv(defaultEnv(request.getSourcePasswordEnv(), "SOURCE_PASSWORD")) + .resourceType(request.getSourceQuery() == null ? 
"table" : "sql") + .resourceName(request.getSourceTable()) + .query(request.getSourceQuery()) + .build()) + .target(DatasetDraft.builder() + .type(request.getTargetType()) + .name(request.getTargetName()) + .jdbcUrl(request.getTargetUrl()) + .usernameEnv(defaultEnv(request.getTargetUserEnv(), "TARGET_USERNAME")) + .passwordEnv(defaultEnv(request.getTargetPasswordEnv(), "TARGET_PASSWORD")) + .resourceType(request.getTargetQuery() == null ? "table" : "sql") + .resourceName(request.getTargetTable()) + .query(request.getTargetQuery()) + .build()) + .mapping(MappingDraft.builder() + .sourceKeys(sourceKeys) + .targetKeys(targetKeys) + .sourceFields(sourceFields) + .targetFields(targetFields) + .build()) + .strategy(StrategyDraft.builder() + .mode(request.getStrategyMode()) + .algorithm(request.getAlgorithm()) + .bisectionFactor(request.getBisectionFactor()) + .bisectionThreshold(request.getBisectionThreshold()) + .batchSize(request.getBatchSize()) + .maxDifferences(request.getMaxDifferences()) + .build()) + .result(ResultDraft.builder().sinkFormat("console").sinkType("result").build()) + .build(); + } + + private String defaultEnv(String value, String fallback) { + return first(value, fallback); + } + + private List<String> list(String csv) { + if (csv == null || csv.trim().isEmpty()) { + return List.of(); + } + return Arrays.stream(csv.split(",")) + .map(String::trim) + .filter(value -> !value.isEmpty()) + .collect(Collectors.toList()); + } + + private String first(String explicit, String fallback) { + return explicit == null || explicit.trim().isEmpty() ? fallback : explicit.trim(); + } + + private String nullToEmpty(String value) { + return value == null ? 
"" : value; + } +} diff --git a/consilens-cli/src/main/java/com/consilens/cli/ai/AIDiagnoseService.java b/consilens-cli/src/main/java/com/consilens/cli/ai/AIDiagnoseService.java new file mode 100644 index 0000000..3e8cdf1 --- /dev/null +++ b/consilens-cli/src/main/java/com/consilens/cli/ai/AIDiagnoseService.java @@ -0,0 +1,170 @@ +package com.consilens.cli.ai; + +import com.consilens.ai.model.AnalysisResult; +import com.consilens.ai.model.PatternMatch; +import com.consilens.ai.spi.AIAnalyzer; +import com.consilens.core.diff.DiffOperation; +import com.consilens.core.diff.DiffResult; +import com.consilens.core.diff.DiffRow; +import com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.databind.node.ArrayNode; + +import java.io.File; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; + +/** + * Loads diff evidence and runs deterministic analyzer diagnostics. + */ +public class AIDiagnoseService { + + private final ObjectMapper objectMapper; + private final AIAnalyzer analyzer; + + public AIDiagnoseService(AIAnalyzer analyzer) { + this.objectMapper = new ObjectMapper(); + this.objectMapper.findAndRegisterModules(); + this.analyzer = analyzer; + } + + public String diagnose(String resultPath) throws IOException { + DiffResult result = loadResult(resultPath); + AnalysisResult analysis = analyzer.analyze(result); + return format(result, analysis); + } + + DiffResult loadResult(String resultPath) throws IOException { + JsonNode root = objectMapper.readTree(new File(resultPath)); + List<DiffRow> rows; + if (root.isArray()) { + rows = parseRows((ArrayNode) root); + } else if (root.has("differences") && root.get("differences").isArray()) { + rows = parseRows((ArrayNode) root.get("differences")); + } else { + throw new IOException("Diff result must contain a differences array or be a diff-record array"); + } + return DiffResult.of(rows, null, null); + } + + private List<DiffRow> 
parseRows(ArrayNode array) throws IOException { + List<DiffRow> rows = new ArrayList<>(); + for (JsonNode node : array) { + rows.add(parseRow(node)); + } + return rows; + } + + private DiffRow parseRow(JsonNode node) throws IOException { + if (node.has("operation") && node.has("primaryKey")) { + return parseRecordNode(node); + } + return objectMapper.treeToValue(node, DiffRow.class); + } + + private DiffRow parseRecordNode(JsonNode node) { + DiffOperation operation = operation(node.path("operation").asText()); + List<Object> primaryKey = values(node.get("primaryKey")); + List<Object> sourceValues = values(node.get("sourceValues")); + List<Object> targetValues = values(node.get("targetValues")); + List<String> columnNames1 = strings(node.get("columnNames1")); + List<String> columnNames2 = strings(node.get("columnNames2")); + if (columnNames1.isEmpty()) { + columnNames1 = strings(node.get("columns")); + } + if (columnNames2.isEmpty()) { + columnNames2 = columnNames1; + } + + switch (operation) { + case SOURCE_MISSING: + return DiffRow.added(primaryKey, targetValues, columnNames2); + case TARGET_MISSING: + return DiffRow.removed(primaryKey, sourceValues, columnNames1); + case MISMATCH: + return DiffRow.modified(primaryKey, sourceValues, targetValues, columnNames1, columnNames2); + default: + throw new IllegalArgumentException("Unsupported diff operation: " + operation); + } + } + + private DiffOperation operation(String value) { + if (value == null || value.isBlank()) { + throw new IllegalArgumentException("operation is required"); + } + try { + return DiffOperation.valueOf(value.trim()); + } catch (IllegalArgumentException ignored) { + return DiffOperation.fromCode(value.trim()); + } + } + + private List<Object> values(JsonNode node) { + if (node == null || node.isMissingNode() || node.isNull()) { + return List.of(); + } + if (node.isArray()) { + List<Object> values = new ArrayList<>(); + node.forEach(item -> values.add(toValue(item))); + return values; + } + return 
List.of(toValue(node)); + } + + private Object toValue(JsonNode 
node) { + if (node == null || node.isNull()) { + return null; + } + if (node.isNumber()) { + return node.numberValue(); + } + if (node.isBoolean()) { + return node.booleanValue(); + } + return node.asText(); + } + + private List<String> strings(JsonNode node) { + if (node == null || node.isMissingNode() || node.isNull()) { + return List.of(); + } + if (!node.isArray()) { + return List.of(node.asText()); + } + List<String> values = new ArrayList<>(); + node.forEach(item -> values.add(item.asText())); + return values; + } + + private String format(DiffResult result, AnalysisResult analysis) { + StringBuilder builder = new StringBuilder(); + builder.append("# AI Diagnose").append(System.lineSeparator()).append(System.lineSeparator()); + builder.append("Summary:").append(System.lineSeparator()) + .append(" ").append(analysis.getSummary()).append(System.lineSeparator()).append(System.lineSeparator()); + builder.append("Evidence:").append(System.lineSeparator()) + .append(" differences=").append(result.getDifferenceCount()).append(System.lineSeparator()) + .append(" confidence=").append(String.format("%.0f%%", analysis.getConfidence() * 100)) + .append(System.lineSeparator()).append(System.lineSeparator()); + builder.append("Patterns:").append(System.lineSeparator()); + if (analysis.getPatterns() == null || analysis.getPatterns().isEmpty()) { + builder.append(" - none").append(System.lineSeparator()); + } else { + for (PatternMatch match : analysis.getPatterns()) { + builder.append(" - ").append(match.getPatternName()) + .append(" confidence=").append(String.format("%.0f%%", match.getConfidence() * 100)) + .append(" affectedRows=").append(match.getAffectedRows()) + .append(System.lineSeparator()) + .append(" ").append(match.getDescription()).append(System.lineSeparator()); + } + } + builder.append(System.lineSeparator()).append("Repair Hints:").append(System.lineSeparator()); + if (analysis.getRepairHints() == null || analysis.getRepairHints().isEmpty()) { + builder.append(" - No 
deterministic repair hint available. Review diff samples manually.") + .append(System.lineSeparator()); + } else { + analysis.getRepairHints().forEach(hint -> builder.append(" - ").append(hint).append(System.lineSeparator())); + } + return builder.toString(); + } +} diff --git a/consilens-cli/src/main/java/com/consilens/cli/ai/AIExplainService.java b/consilens-cli/src/main/java/com/consilens/cli/ai/AIExplainService.java new file mode 100644 index 0000000..275875a --- /dev/null +++ b/consilens-cli/src/main/java/com/consilens/cli/ai/AIExplainService.java @@ -0,0 +1,83 @@ +package com.consilens.cli.ai; + +import com.consilens.cli.model.CliConfiguration; +import com.consilens.cli.model.ComparisonConfig; +import com.consilens.cli.model.ConnectionConfig; + +import java.util.ArrayList; +import java.util.List; + +/** + * Builds deterministic explanation text from the existing CLI configuration model. + */ +public class AIExplainService { + + public String explain(CliConfiguration config) { + List<String> risks = risks(config); + StringBuilder builder = new StringBuilder(); + builder.append("# AI Explain").append(System.lineSeparator()).append(System.lineSeparator()); + builder.append("Source:").append(System.lineSeparator()) + .append(" ").append(dataset(config.getSource())).append(System.lineSeparator()).append(System.lineSeparator()); + builder.append("Target:").append(System.lineSeparator()) + .append(" ").append(dataset(config.getTarget())).append(System.lineSeparator()).append(System.lineSeparator()); + ComparisonConfig comparison = config.getComparison(); + builder.append("Keys:").append(System.lineSeparator()) + .append(" source ").append(comparison.getKeys().getSource()) + .append(" -> target ").append(comparison.getKeys().getTarget()) + .append(System.lineSeparator()).append(System.lineSeparator()); + if (comparison.getFields() != null && !comparison.getFields().isBothEmpty()) { + builder.append("Fields:").append(System.lineSeparator()) + .append(" source 
").append(comparison.getFields().getSource()) + .append(" -> target ").append(comparison.getFields().getTarget()) + .append(System.lineSeparator()).append(System.lineSeparator()); + } + builder.append("Strategy:").append(System.lineSeparator()) + .append(" ").append(config.getStrategyMode()).append(" ") + .append(config.getAlgorithm()) + .append(System.lineSeparator()).append(System.lineSeparator()); + builder.append("Result:").append(System.lineSeparator()) + .append(" sinks=").append(config.getResult() == null ? 0 : config.getResult().getSinks().size()) + .append(System.lineSeparator()).append(System.lineSeparator()); + builder.append("Risks:").append(System.lineSeparator()); + if (risks.isEmpty()) { + builder.append(" - No obvious configuration risk found.").append(System.lineSeparator()); + } else { + risks.forEach(risk -> builder.append(" - ").append(risk).append(System.lineSeparator())); + } + builder.append(System.lineSeparator()); + builder.append("Recommendations:").append(System.lineSeparator()) + .append(" - Run `consilens diff --dry-run -c ` before executing a real diff.") + .append(System.lineSeparator()) + .append(" - Keep credentials in environment variables, not plaintext YAML.") + .append(System.lineSeparator()); + return builder.toString(); + } + + private String dataset(ConnectionConfig config) { + ConnectionConfig.ResourceConfig resource = config.getResource(); + String location = "sql".equalsIgnoreCase(resource.getType()) ? 
resource.getPath() : resource.getName(); + return config.getType() + " " + value(config.getName()) + " " + resource.getType() + ":" + location; + } + + private List<String> risks(CliConfiguration config) { + List<String> risks = new ArrayList<>(); + if (config.getComparison().getFields() == null || config.getComparison().getFields().isBothEmpty()) { + risks.add("No explicit comparison fields configured; engine behavior depends on connector metadata."); + } + if (config.getSource().getPassword() != null && !config.getSource().getPassword().startsWith("${env.")) { + risks.add("Source password is not an environment placeholder."); + } + if (config.getTarget().getPassword() != null && !config.getTarget().getPassword().startsWith("${env.")) { + risks.add("Target password is not an environment placeholder."); + } + if (config.getStrategy() != null && config.getStrategy().getMaxDifferences() != null + && config.getStrategy().getMaxDifferences() < 1000) { + risks.add("maxDifferences is low; large comparisons may stop before producing useful samples."); + } + return risks; + } + + private String value(String value) { + return value == null || value.isBlank() ? "(unnamed)" : value; + } +} diff --git a/consilens-cli/src/main/java/com/consilens/cli/ai/LLMBackendResolver.java b/consilens-cli/src/main/java/com/consilens/cli/ai/LLMBackendResolver.java new file mode 100644 index 0000000..8035215 --- /dev/null +++ b/consilens-cli/src/main/java/com/consilens/cli/ai/LLMBackendResolver.java @@ -0,0 +1,90 @@ +package com.consilens.cli.ai; + +import com.consilens.ai.spi.LLMBackend; +import com.consilens.ai.spi.LLMBackendManager; + +import java.util.LinkedHashMap; +import java.util.Map; +import java.util.function.Function; + +/** + * Resolves an LLM backend from CLI options and environment defaults. 
+ */ +public class LLMBackendResolver { + + private final Function<String, String> envProvider; + + public LLMBackendResolver() { + this(System::getenv); + } + + LLMBackendResolver(Function<String, String> envProvider) { + this.envProvider = envProvider; + } + + public LLMBackend resolve(AIBackendOptions options) { + AIBackendOptions effective = options == null ? AIBackendOptions.builder().build() : options; + String backend = resolveBackendName(effective); + Map<String, Object> config = new LinkedHashMap<>(); + put(config, "model", firstNonBlank(effective.getModel(), env("CONSILENS_AI_MODEL"), null)); + put(config, "baseUrl", firstNonBlank(effective.getBaseUrl(), env("CONSILENS_AI_BASE_URL"), backendDefaultBaseUrl(backend))); + put(config, "apiKey", firstNonBlank(effective.getApiKey(), apiKeyEnv(backend), null)); + put(config, "timeout", firstNonBlank(effective.getTimeout(), env("CONSILENS_AI_TIMEOUT"), null)); + put(config, "temperature", effective.getTemperature()); + put(config, "maxTokens", effective.getMaxTokens()); + try { + return LLMBackendManager.getInstance().create(backend, config); + } catch (RuntimeException e) { + throw new IllegalArgumentException("Unknown or unavailable AI backend: " + backend, e); + } + } + + public String resolveBackendName(AIBackendOptions options) { + AIBackendOptions effective = options == null ? 
AIBackendOptions.builder().backend(null).build() : options; + return firstNonBlank(effective.getBackend(), env("CONSILENS_AI_BACKEND"), "noop"); + } + + private String backendDefaultBaseUrl(String backend) { + if ("ollama".equalsIgnoreCase(backend)) { + return firstNonBlank(env("OLLAMA_BASE_URL"), "http://localhost:11434"); + } + return null; + } + + private String apiKeyEnv(String backend) { + if ("openai".equalsIgnoreCase(backend)) { + return env("OPENAI_API_KEY"); + } + if ("deepseek".equalsIgnoreCase(backend)) { + return env("DEEPSEEK_API_KEY"); + } + return null; + } + + private void put(Map<String, Object> config, String key, Object value) { + if (value != null && !String.valueOf(value).trim().isEmpty()) { + config.put(key, value); + } + } + + private String env(String name) { + return envProvider.apply(name); + } + + private String firstNonBlank(String first, String second) { + return firstNonBlank(first, second, null); + } + + private String firstNonBlank(String first, String second, String third) { + if (first != null && !first.trim().isEmpty()) { + return first.trim(); + } + if (second != null && !second.trim().isEmpty()) { + return second.trim(); + } + if (third != null && !third.trim().isEmpty()) { + return third.trim(); + } + return null; + } +} diff --git a/consilens-cli/src/main/java/com/consilens/cli/command/AIConfigCliOptions.java b/consilens-cli/src/main/java/com/consilens/cli/command/AIConfigCliOptions.java new file mode 100644 index 0000000..f3603bb --- /dev/null +++ b/consilens-cli/src/main/java/com/consilens/cli/command/AIConfigCliOptions.java @@ -0,0 +1,159 @@ +package com.consilens.cli.command; + +import com.consilens.cli.ai.AIBackendOptions; +import com.consilens.cli.ai.AIConfigRequest; +import picocli.CommandLine.Option; +import picocli.CommandLine.Parameters; + +/** + * Shared options for AI commands that generate a Consilens config draft. 
+ */ +class AIConfigCliOptions { + + @Parameters(index = "0", arity = "0..1", description = "Natural language diff goal") + String goal; + + @Option(names = "--backend", description = "AI backend: noop, ollama, openai, deepseek. Defaults to CONSILENS_AI_BACKEND or noop") + String backend; + + @Option(names = "--model", description = "AI model name") + String model; + + @Option(names = "--base-url", description = "AI backend base URL") + String baseUrl; + + @Option(names = "--api-key", description = "AI backend API key") + String apiKey; + + @Option(names = "--timeout", description = "AI backend timeout") + String timeout; + + @Option(names = "--temperature", description = "AI sampling temperature") + Double temperature; + + @Option(names = "--max-tokens", description = "AI max output tokens") + Integer maxTokens; + + @Option(names = "--no-llm", description = "Do not call an LLM; only use explicit CLI hints") + boolean noLlm; + + @Option(names = "--source-type", description = "Source connector type") + String sourceType; + + @Option(names = "--source-url", description = "Source JDBC URL") + String sourceUrl; + + @Option(names = "--source-name", description = "Source logical name") + String sourceName; + + @Option(names = "--source-table", description = "Source table name") + String sourceTable; + + @Option(names = "--source-query", description = "Source SELECT/WITH query") + String sourceQuery; + + @Option(names = "--source-user-env", defaultValue = "SOURCE_USERNAME", description = "Source username env var name") + String sourceUserEnv; + + @Option(names = "--source-password-env", defaultValue = "SOURCE_PASSWORD", description = "Source password env var name") + String sourcePasswordEnv; + + @Option(names = "--target-type", description = "Target connector type") + String targetType; + + @Option(names = "--target-url", description = "Target JDBC URL") + String targetUrl; + + @Option(names = "--target-name", description = "Target logical name") + String targetName; + + 
@Option(names = "--target-table", description = "Target table name") + String targetTable; + + @Option(names = "--target-query", description = "Target SELECT/WITH query") + String targetQuery; + + @Option(names = "--target-user-env", defaultValue = "TARGET_USERNAME", description = "Target username env var name") + String targetUserEnv; + + @Option(names = "--target-password-env", defaultValue = "TARGET_PASSWORD", description = "Target password env var name") + String targetPasswordEnv; + + @Option(names = "--keys", description = "Comma-separated key columns used on both sides") + String keys; + + @Option(names = "--source-keys", description = "Comma-separated source key columns") + String sourceKeys; + + @Option(names = "--target-keys", description = "Comma-separated target key columns") + String targetKeys; + + @Option(names = "--fields", description = "Comma-separated fields used on both sides") + String fields; + + @Option(names = "--source-fields", description = "Comma-separated source fields") + String sourceFields; + + @Option(names = "--target-fields", description = "Comma-separated target fields") + String targetFields; + + @Option(names = "--strategy-mode", defaultValue = "checksum", description = "Strategy mode: checksum or join") + String strategyMode; + + @Option(names = "--algorithm", defaultValue = "xor", description = "Checksum algorithm: xor or concat") + String algorithm; + + @Option(names = "--bisection-factor", description = "Bisection factor") + Integer bisectionFactor; + + @Option(names = "--bisection-threshold", description = "Bisection threshold") + Long bisectionThreshold; + + @Option(names = "--batch-size", description = "Batch size") + Integer batchSize; + + @Option(names = "--max-differences", description = "Maximum retained differences") + Long maxDifferences; + + AIConfigRequest toRequest() { + return AIConfigRequest.builder() + .goal(goal) + .sourceType(sourceType) + .sourceUrl(sourceUrl) + .sourceName(sourceName) + 
.sourceTable(sourceTable) + .sourceQuery(sourceQuery) + .sourceUserEnv(sourceUserEnv) + .sourcePasswordEnv(sourcePasswordEnv) + .targetType(targetType) + .targetUrl(targetUrl) + .targetName(targetName) + .targetTable(targetTable) + .targetQuery(targetQuery) + .targetUserEnv(targetUserEnv) + .targetPasswordEnv(targetPasswordEnv) + .keys(keys) + .sourceKeys(sourceKeys) + .targetKeys(targetKeys) + .fields(fields) + .sourceFields(sourceFields) + .targetFields(targetFields) + .strategyMode(strategyMode) + .algorithm(algorithm) + .bisectionFactor(bisectionFactor) + .bisectionThreshold(bisectionThreshold) + .batchSize(batchSize) + .maxDifferences(maxDifferences) + .backendOptions(AIBackendOptions.builder() + .backend(backend) + .model(model) + .baseUrl(baseUrl) + .apiKey(apiKey) + .timeout(timeout) + .temperature(temperature) + .maxTokens(maxTokens) + .noLlm(noLlm) + .build()) + .build(); + } +} diff --git a/consilens-cli/src/main/java/com/consilens/cli/command/AiCommand.java b/consilens-cli/src/main/java/com/consilens/cli/command/AiCommand.java new file mode 100644 index 0000000..81b05b1 --- /dev/null +++ b/consilens-cli/src/main/java/com/consilens/cli/command/AiCommand.java @@ -0,0 +1,27 @@ +package com.consilens.cli.command; + +import picocli.CommandLine.Command; + +/** + * AI-assisted Consilens commands. 
+ */ +@Command( + name = "ai", + description = "AI-assisted configuration, explanation and diagnosis commands", + mixinStandardHelpOptions = true, + subcommands = { + AiConfigCommand.class, + AiExplainCommand.class, + AiDiagnoseCommand.class, + AiDiffCommand.class, + AiProvidersCommand.class, + AiDoctorCommand.class + } +) +public class AiCommand implements Runnable { + + @Override + public void run() { + System.out.println("Use `consilens ai --help` to see available AI commands."); + } +} diff --git a/consilens-cli/src/main/java/com/consilens/cli/command/AiConfigCommand.java b/consilens-cli/src/main/java/com/consilens/cli/command/AiConfigCommand.java new file mode 100644 index 0000000..b4b4ef8 --- /dev/null +++ b/consilens-cli/src/main/java/com/consilens/cli/command/AiConfigCommand.java @@ -0,0 +1,106 @@ +package com.consilens.cli.command; + +import com.consilens.ai.config.model.AIConfigIssue; +import com.consilens.cli.ai.AIConfigResult; +import com.consilens.cli.ai.AIConfigService; +import com.consilens.cli.config.ConfigurationManager; +import com.consilens.cli.service.DiffService; +import lombok.extern.slf4j.Slf4j; +import picocli.CommandLine.Command; +import picocli.CommandLine.Mixin; +import picocli.CommandLine.Option; + +import java.io.ByteArrayInputStream; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.charset.StandardCharsets; +import java.util.concurrent.Callable; + +/** + * Generates a production-shaped Consilens YAML configuration with AI assistance. 
+ */ +@Slf4j +@Command( + name = "config", + description = "Generate a validated Consilens YAML configuration from a goal and explicit hints", + mixinStandardHelpOptions = true +) +public class AiConfigCommand implements Callable<Integer> { + + private final AIConfigService configService; + private final DiffService diffService; + private final ConfigurationManager configurationManager; + + @Mixin + private AIConfigCliOptions options = new AIConfigCliOptions(); + + @Option(names = {"-o", "--output"}, description = "Output YAML file") + private String output; + + @Option(names = "--dry-run", description = "Run DiffService dry-run after generating the config") + private boolean dryRun; + + public AiConfigCommand() { + this(new AIConfigService(), new DiffService(), new ConfigurationManager()); + } + + AiConfigCommand(AIConfigService configService, + DiffService diffService, + ConfigurationManager configurationManager) { + this.configService = configService; + this.diffService = diffService; + this.configurationManager = configurationManager; + } + + @Override + public Integer call() { + try { + AIConfigResult result = configService.generate(options.toRequest()); + if (!result.isValid()) { + printIssues(result); + System.err.println("No file was written."); + return 1; + } + + if (dryRun) { + diffService.performDryRun(loadGeneratedConfiguration(result.getYaml())); + } + + if (output == null || output.isBlank()) { + System.out.println(result.getYaml()); + } else { + Path outputPath = Path.of(output).toAbsolutePath().normalize(); + if (outputPath.getParent() != null) { + Files.createDirectories(outputPath.getParent()); + } + Files.writeString(outputPath, result.getYaml()); + System.out.println("[AI CONFIG] generated=" + outputPath); + } + System.out.println("[AI CONFIG] validation=passed"); + if (dryRun) { + System.out.println("[AI CONFIG] dryRun=passed"); + } + return 0; + } catch (Exception e) { + log.error("AI config generation failed", e); + System.err.println("[AI CONFIG 
ERROR] " + e.getMessage()); + return 1; + } + } + + private void printIssues(AIConfigResult result) { + System.err.println("[AI CONFIG ERROR] validation failed"); + if (result.getIssues() == null) { + return; + } + for (AIConfigIssue issue : result.getIssues()) { + System.err.printf("%s %s %s:%n %s%n", + issue.getSeverity(), issue.getCode(), issue.getPath(), issue.getMessage()); + } + } + + private com.consilens.cli.model.CliConfiguration loadGeneratedConfiguration(String yaml) throws Exception { + return configurationManager.loadConfiguration( + new ByteArrayInputStream(yaml.getBytes(StandardCharsets.UTF_8)), "yaml"); + } +} diff --git a/consilens-cli/src/main/java/com/consilens/cli/command/AiDiagnoseCommand.java b/consilens-cli/src/main/java/com/consilens/cli/command/AiDiagnoseCommand.java new file mode 100644 index 0000000..47fc477 --- /dev/null +++ b/consilens-cli/src/main/java/com/consilens/cli/command/AiDiagnoseCommand.java @@ -0,0 +1,78 @@ +package com.consilens.cli.command; + +import com.consilens.ai.spi.AIAnalyzerManager; +import com.consilens.cli.ai.AIDiagnoseService; +import lombok.extern.slf4j.Slf4j; +import picocli.CommandLine.Command; +import picocli.CommandLine.Option; + +import java.nio.file.Files; +import java.nio.file.Path; +import java.util.concurrent.Callable; +import java.util.function.Function; + +/** + * Diagnoses existing diff evidence with the SPI-selected analyzer (rule-based by default). + */ +@Slf4j +@Command( + name = "diagnose", + description = "Diagnose an existing diff result or diff-record JSON file", + mixinStandardHelpOptions = true +) +public class AiDiagnoseCommand implements Callable<Integer> { + + private static final String DEFAULT_ANALYZER = "rulebased"; + + private final Function<String, AIDiagnoseService> diagnoseServiceFactory; + + @Option(names = "--result", required = true, description = "Path to DiffResult JSON or diff-record JSON array") + private String resultPath; + + @Option(names = "--analyzer", description = "Analyzer provider name. 
Defaults to CONSILENS_AI_ANALYZER or rulebased") + private String analyzer; + + @Option(names = {"-o", "--output"}, description = "Write diagnosis report to a file instead of stdout") + private String output; + + public AiDiagnoseCommand() { + this(name -> new AIDiagnoseService(AIAnalyzerManager.getInstance().create(name))); + } + + AiDiagnoseCommand(Function<String, AIDiagnoseService> diagnoseServiceFactory) { + this.diagnoseServiceFactory = diagnoseServiceFactory; + } + + @Override + public Integer call() { + try { + String report = diagnoseServiceFactory.apply(resolveAnalyzer()).diagnose(resultPath); + if (output == null || output.isBlank()) { + System.out.print(report); + } else { + Path outputPath = Path.of(output).toAbsolutePath().normalize(); + if (outputPath.getParent() != null) { + Files.createDirectories(outputPath.getParent()); + } + Files.writeString(outputPath, report); + System.out.println("[AI DIAGNOSE] report=" + outputPath); + } + return 0; + } catch (Exception e) { + log.error("AI diagnose failed", e); + System.err.println("[AI DIAGNOSE ERROR] " + e.getMessage()); + return 1; + } + } + + private String resolveAnalyzer() { + if (analyzer != null && !analyzer.isBlank()) { + return analyzer.trim(); + } + String envAnalyzer = System.getenv("CONSILENS_AI_ANALYZER"); + if (envAnalyzer != null && !envAnalyzer.isBlank()) { + return envAnalyzer.trim(); + } + return DEFAULT_ANALYZER; + } +} diff --git a/consilens-cli/src/main/java/com/consilens/cli/command/AiDiffCommand.java b/consilens-cli/src/main/java/com/consilens/cli/command/AiDiffCommand.java new file mode 100644 index 0000000..15065f6 --- /dev/null +++ b/consilens-cli/src/main/java/com/consilens/cli/command/AiDiffCommand.java @@ -0,0 +1,86 @@ +package com.consilens.cli.command; + +import com.consilens.ai.config.model.AIConfigIssue; +import com.consilens.cli.ai.AIConfigResult; +import com.consilens.cli.ai.AIConfigService; +import lombok.extern.slf4j.Slf4j; +import picocli.CommandLine.Command; +import picocli.CommandLine.Mixin; 
+import picocli.CommandLine.Option; + +import java.nio.file.Files; +import java.nio.file.Path; +import java.util.concurrent.Callable; + +/** + * Safe AI diff entrypoint that generates a config but does not execute the real diff yet. + */ +@Slf4j +@Command( + name = "diff", + description = "Generate a validated diff config from AI input without executing the real diff", + mixinStandardHelpOptions = true +) +public class AiDiffCommand implements Callable<Integer> { + + private final AIConfigService configService; + + @Mixin + private AIConfigCliOptions options = new AIConfigCliOptions(); + + @Option(names = {"-o", "--output"}, required = true, description = "Output YAML file") + private String output; + + @Option(names = "--execute", description = "Execute diff after generation. Not supported yet.") + private boolean execute; + + public AiDiffCommand() { + this(new AIConfigService()); + } + + AiDiffCommand(AIConfigService configService) { + this.configService = configService; + } + + @Override + public Integer call() { + if (execute) { + System.err.println("`consilens ai diff --execute` is not supported yet. 
Generate YAML first, then run `consilens diff -c ` explicitly."); + return 2; + } + try { + AIConfigResult result = configService.generate(options.toRequest()); + if (!result.isValid()) { + printIssues(result); + System.err.println("No file was written."); + return 1; + } + Path outputPath = Path.of(output).toAbsolutePath().normalize(); + if (outputPath.getParent() != null) { + Files.createDirectories(outputPath.getParent()); + } + Files.writeString(outputPath, result.getYaml()); + System.out.println("[AI DIFF] generated=" + outputPath); + System.out.println("[AI DIFF] validation=passed"); + System.out.println("Next:"); + System.out.println(" consilens diff --dry-run -c " + outputPath); + System.out.println(" consilens diff -c " + outputPath); + return 0; + } catch (Exception e) { + log.error("AI diff config generation failed", e); + System.err.println("[AI DIFF ERROR] " + e.getMessage()); + return 1; + } + } + + private void printIssues(AIConfigResult result) { + System.err.println("[AI DIFF ERROR] validation failed"); + if (result.getIssues() == null) { + return; + } + for (AIConfigIssue issue : result.getIssues()) { + System.err.printf("%s %s %s:%n %s%n", + issue.getSeverity(), issue.getCode(), issue.getPath(), issue.getMessage()); + } + } +} diff --git a/consilens-cli/src/main/java/com/consilens/cli/command/AiDoctorCommand.java b/consilens-cli/src/main/java/com/consilens/cli/command/AiDoctorCommand.java new file mode 100644 index 0000000..f0b6069 --- /dev/null +++ b/consilens-cli/src/main/java/com/consilens/cli/command/AiDoctorCommand.java @@ -0,0 +1,315 @@ +package com.consilens.cli.command; + +import com.consilens.ai.model.BackendInfo; +import com.consilens.ai.spi.AIAnalyzer; +import com.consilens.ai.spi.AIAnalyzerManager; +import com.consilens.ai.spi.LLMBackend; +import com.consilens.ai.spi.LLMBackendManager; +import com.consilens.cli.ai.AIBackendOptions; +import com.consilens.cli.ai.LLMBackendResolver; +import com.fasterxml.jackson.databind.ObjectMapper; 
+import picocli.CommandLine.Command; +import picocli.CommandLine.Model.CommandSpec; +import picocli.CommandLine.Option; +import picocli.CommandLine.Spec; + +import java.io.PrintWriter; +import java.util.ArrayList; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.TreeSet; +import java.util.concurrent.Callable; +import java.util.function.Function; +import java.util.function.Supplier; + +/** + * Runs offline production-readiness checks for AI command wiring. + */ +@Command( + name = "doctor", + description = "Check AI providers, selected analyzer, backend wiring and required credentials", + mixinStandardHelpOptions = true +) +public class AiDoctorCommand implements Callable<Integer> { + + private static final String DEFAULT_ANALYZER = "rulebased"; + + private final Supplier<Set<String>> analyzerNames; + private final Supplier<Set<String>> backendNames; + private final Function<String, AIAnalyzer> analyzerFactory; + private final Function<AIBackendOptions, LLMBackend> backendFactory; + private final Function<AIBackendOptions, String> backendNameResolver; + private final Function<String, String> envProvider; + private final ObjectMapper jsonMapper; + + @Option(names = "--analyzer", description = "Analyzer provider name. Defaults to CONSILENS_AI_ANALYZER or rulebased") + private String analyzer; + + @Option(names = "--backend", description = "AI backend: noop, ollama, openai, deepseek.
 Defaults to CONSILENS_AI_BACKEND or noop") + private String backend; + + @Option(names = "--model", description = "AI model name") + private String model; + + @Option(names = "--base-url", description = "AI backend base URL") + private String baseUrl; + + @Option(names = "--api-key", description = "AI backend API key") + private String apiKey; + + @Option(names = "--timeout", description = "AI backend timeout") + private String timeout; + + @Option(names = "--temperature", description = "AI sampling temperature") + private Double temperature; + + @Option(names = "--max-tokens", description = "AI max output tokens") + private Integer maxTokens; + + @Option(names = "--online", description = "Also call backend availability check. Default checks are offline only.") + private boolean online; + + @Option(names = "--format", defaultValue = "text", description = "Output format: text or json (default: text)") + private String format; + + @Spec + private CommandSpec spec; + + public AiDoctorCommand() { + this( + () -> AIAnalyzerManager.getInstance().supportedNames(), + () -> LLMBackendManager.getInstance().supportedNames(), + name -> AIAnalyzerManager.getInstance().create(name), + AiDoctorCommand::resolveBackend, + AiDoctorCommand::resolveBackendName, + System::getenv, + new ObjectMapper() + ); + } + + AiDoctorCommand(Supplier<Set<String>> analyzerNames, + Supplier<Set<String>> backendNames, + Function<String, AIAnalyzer> analyzerFactory, + Function<AIBackendOptions, LLMBackend> backendFactory, + Function<AIBackendOptions, String> backendNameResolver, + Function<String, String> envProvider, + ObjectMapper jsonMapper) { + this.analyzerNames = analyzerNames; + this.backendNames = backendNames; + this.analyzerFactory = analyzerFactory; + this.backendFactory = backendFactory; + this.backendNameResolver = backendNameResolver; + this.envProvider = envProvider; + this.jsonMapper = jsonMapper; + } + + @Override + public Integer call() throws Exception { + DoctorReport report = runChecks(); + if ("json".equalsIgnoreCase(format)) { + printJson(report); + } else if ("text".equalsIgnoreCase(format))
{ + printText(report); + } else { + spec.commandLine().getErr().println("Unsupported format: " + format + ". Use text or json."); + spec.commandLine().getErr().flush(); + return 2; + } + return report.hasFailures() ? 1 : 0; + } + + private DoctorReport runChecks() { + DoctorReport report = new DoctorReport(); + List<String> analyzers = sorted(analyzerNames.get()); + List<String> backends = sorted(backendNames.get()); + report.add("analyzers", analyzers.isEmpty() ? "FAIL" : "PASS", + analyzers.isEmpty() ? "No AI analyzer providers discovered" : "Discovered: " + String.join(", ", analyzers)); + + String selectedAnalyzer = resolveAnalyzer(); + if (!analyzers.contains(selectedAnalyzer)) { + report.add("analyzer", "FAIL", "Analyzer not discovered: " + selectedAnalyzer); + } else { + checkAnalyzer(report, selectedAnalyzer); + } + + report.add("llmBackends", backends.isEmpty() ? "FAIL" : "PASS", + backends.isEmpty() ? "No LLM backend providers discovered" : "Discovered: " + String.join(", ", backends)); + + AIBackendOptions backendOptions = backendOptions(); + String selectedBackend = backendNameResolver.apply(backendOptions); + LLMBackend selected = null; + if (!backends.contains(selectedBackend)) { + report.add("llmBackend", "FAIL", "LLM backend not discovered: " + selectedBackend); + } else { + selected = checkBackend(report, selectedBackend, backendOptions); + } + + boolean credentialsOk = checkCredentials(report, selectedBackend); + checkOnlineAvailability(report, selectedBackend, selected, credentialsOk); + return report; + } + + private void checkAnalyzer(DoctorReport report, String selectedAnalyzer) { + try { + AIAnalyzer created = analyzerFactory.apply(selectedAnalyzer); + if (created == null) { + report.add("analyzer", "FAIL", "Analyzer factory returned null: " + selectedAnalyzer); + return; + } + report.add("analyzer", created.isAvailable() ?
"PASS" : "FAIL", + "Selected: " + created.getName()); + } catch (Exception e) { + report.add("analyzer", "FAIL", "Failed to create analyzer " + selectedAnalyzer + ": " + e.getMessage()); + } + } + + private LLMBackend checkBackend(DoctorReport report, String selectedBackend, AIBackendOptions backendOptions) { + try { + LLMBackend created = backendFactory.apply(backendOptions); + if (created == null) { + report.add("llmBackend", "FAIL", "Backend factory returned null: " + selectedBackend); + return null; + } + BackendInfo info = created.info(); + String modelText = info == null || info.getModel() == null ? "unknown" : info.getModel(); + report.add("llmBackend", "PASS", "Selected: " + selectedBackend + ", model=" + modelText); + return created; + } catch (Exception e) { + report.add("llmBackend", "FAIL", "Failed to create backend " + selectedBackend + ": " + e.getMessage()); + return null; + } + } + + private boolean checkCredentials(DoctorReport report, String selectedBackend) { + String envName = requiredApiKeyEnv(selectedBackend); + if (envName == null) { + if ("noop".equalsIgnoreCase(selectedBackend)) { + report.add("credentials", "WARN", "noop backend does not call an LLM"); + } else { + report.add("credentials", "PASS", "No API key required for backend " + selectedBackend); + } + return true; + } + if (hasText(apiKey) || hasText(envProvider.apply(envName))) { + report.add("credentials", "PASS", "API key configured via --api-key or " + envName); + return true; + } + report.add("credentials", "FAIL", envName + " or --api-key is required for backend " + selectedBackend); + return false; + } + + private void checkOnlineAvailability(DoctorReport report, String selectedBackend, LLMBackend backend, boolean credentialsOk) { + if (!online) { + report.add("onlineAvailability", "SKIP", "Use --online to check backend reachability"); + return; + } + if ("noop".equalsIgnoreCase(selectedBackend)) { + report.add("onlineAvailability", "SKIP", "noop backend has no online 
 endpoint"); + return; + } + if (backend == null || !credentialsOk) { + report.add("onlineAvailability", "SKIP", "Skipped because backend configuration is invalid"); + return; + } + boolean available = backend.isAvailable(); + report.add("onlineAvailability", available ? "PASS" : "FAIL", + available ? "Backend is reachable" : "Backend is not reachable"); + } + + private void printText(DoctorReport report) { + PrintWriter out = spec.commandLine().getOut(); + out.println("# AI Doctor"); + out.println(); + out.println("Checks:"); + for (Map<String, String> check : report.checks) { + out.println(" - " + check.get("name") + ": " + check.get("status") + " " + check.get("message")); + } + out.println(); + out.println("Status: " + report.status()); + out.flush(); + } + + private void printJson(DoctorReport report) throws Exception { + PrintWriter out = spec.commandLine().getOut(); + Map<String, Object> payload = new LinkedHashMap<>(); + payload.put("status", report.status()); + payload.put("checks", report.checks); + out.println(jsonMapper.writeValueAsString(payload)); + out.flush(); + } + + private AIBackendOptions backendOptions() { + return AIBackendOptions.builder() + .backend(backend) + .model(model) + .baseUrl(baseUrl) + .apiKey(apiKey) + .timeout(timeout) + .temperature(temperature) + .maxTokens(maxTokens) + .build(); + } + + private String resolveAnalyzer() { + if (hasText(analyzer)) { + return analyzer.trim(); + } + String envAnalyzer = envProvider.apply("CONSILENS_AI_ANALYZER"); + if (hasText(envAnalyzer)) { + return envAnalyzer.trim(); + } + return DEFAULT_ANALYZER; + } + + private String requiredApiKeyEnv(String selectedBackend) { + if ("openai".equalsIgnoreCase(selectedBackend)) { + return "OPENAI_API_KEY"; + } + if ("deepseek".equalsIgnoreCase(selectedBackend)) { + return "DEEPSEEK_API_KEY"; + } + return null; + } + + private List<String> sorted(Set<String> names) { + return names == null ?
List.of() : new ArrayList<>(new TreeSet<>(names)); + } + + private boolean hasText(String value) { + return value != null && !value.trim().isEmpty(); + } + + private static LLMBackend resolveBackend(AIBackendOptions options) { + return new LLMBackendResolver().resolve(options); + } + + private static String resolveBackendName(AIBackendOptions options) { + return new LLMBackendResolver().resolveBackendName(options); + } + + private static class DoctorReport { + + private final List<Map<String, String>> checks = new ArrayList<>(); + + private void add(String name, String status, String message) { + Map<String, String> check = new LinkedHashMap<>(); + check.put("name", name); + check.put("status", status); + check.put("message", message); + checks.add(check); + } + + private boolean hasFailures() { + return checks.stream().anyMatch(check -> "FAIL".equals(check.get("status"))); + } + + private String status() { + if (hasFailures()) { + return "FAIL"; + } + return checks.stream().anyMatch(check -> "WARN".equals(check.get("status"))) ? "WARN" : "PASS"; + } + } +} diff --git a/consilens-cli/src/main/java/com/consilens/cli/command/AiExplainCommand.java b/consilens-cli/src/main/java/com/consilens/cli/command/AiExplainCommand.java new file mode 100644 index 0000000..8e26dd9 --- /dev/null +++ b/consilens-cli/src/main/java/com/consilens/cli/command/AiExplainCommand.java @@ -0,0 +1,78 @@ +package com.consilens.cli.command; + +import com.consilens.cli.ai.AIExplainService; +import com.consilens.cli.config.ConfigurationManager; +import com.consilens.cli.model.CliConfiguration; +import com.consilens.cli.service.DiffService; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.dataformat.yaml.YAMLFactory; +import lombok.extern.slf4j.Slf4j; +import picocli.CommandLine.Command; +import picocli.CommandLine.Option; + +import java.io.File; +import java.util.concurrent.Callable; + +/** + * Explains a Consilens configuration using deterministic engine facts.
+ */ +@Slf4j +@Command( + name = "explain", + description = "Explain a Consilens YAML configuration and its execution risks", + mixinStandardHelpOptions = true +) +public class AiExplainCommand implements Callable<Integer> { + + private final ConfigurationManager configurationManager; + private final DiffService diffService; + private final AIExplainService explainService; + private final ObjectMapper rawConfigMapper; + + @Option(names = {"-c", "--config"}, required = true, description = "Configuration file path") + private String configFile; + + @Option(names = "--dry-run", description = "Run DiffService dry-run before printing the explanation") + private boolean dryRun; + + public AiExplainCommand() { + this(new ConfigurationManager(), new DiffService(), new AIExplainService(), new ObjectMapper(new YAMLFactory())); + } + + AiExplainCommand(ConfigurationManager configurationManager, + DiffService diffService, + AIExplainService explainService, + ObjectMapper rawConfigMapper) { + this.configurationManager = configurationManager; + this.diffService = diffService; + this.explainService = explainService; + this.rawConfigMapper = rawConfigMapper; + } + + @Override + public Integer call() { + try { + CliConfiguration config = dryRun ?
configurationManager.loadConfiguration(configFile, false) : loadRawConfig(); + if (dryRun) { + diffService.performDryRun(config); + } + System.out.print(explainService.explain(config)); + if (dryRun) { + System.out.println(); + System.out.println("Dry run:"); + System.out.println(" passed"); + } + return 0; + } catch (Exception e) { + log.error("AI explain failed", e); + System.err.println("[AI EXPLAIN ERROR] " + e.getMessage()); + return 1; + } + } + + private CliConfiguration loadRawConfig() throws Exception { + CliConfiguration config = rawConfigMapper.readValue(new File(configFile), CliConfiguration.class); + config.validate(); + return config; + } +} diff --git a/consilens-cli/src/main/java/com/consilens/cli/command/AiProvidersCommand.java b/consilens-cli/src/main/java/com/consilens/cli/command/AiProvidersCommand.java new file mode 100644 index 0000000..d8b375f --- /dev/null +++ b/consilens-cli/src/main/java/com/consilens/cli/command/AiProvidersCommand.java @@ -0,0 +1,102 @@ +package com.consilens.cli.command; + +import com.consilens.ai.spi.AIAnalyzerManager; +import com.consilens.ai.spi.LLMBackendManager; +import com.fasterxml.jackson.databind.ObjectMapper; +import picocli.CommandLine.Command; +import picocli.CommandLine.Model.CommandSpec; +import picocli.CommandLine.Option; +import picocli.CommandLine.Spec; + +import java.io.PrintWriter; +import java.util.ArrayList; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.TreeSet; +import java.util.concurrent.Callable; +import java.util.function.Supplier; + +/** + * Lists AI providers discovered through SPI. 
+ */ +@Command( + name = "providers", + description = "List discovered AI analyzer and LLM backend providers", + mixinStandardHelpOptions = true +) +public class AiProvidersCommand implements Callable<Integer> { + + private final Supplier<Set<String>> analyzerNames; + private final Supplier<Set<String>> backendNames; + private final ObjectMapper jsonMapper; + + @Option(names = "--format", defaultValue = "text", description = "Output format: text or json (default: text)") + private String format; + + @Spec + private CommandSpec spec; + + public AiProvidersCommand() { + this( + () -> AIAnalyzerManager.getInstance().supportedNames(), + () -> LLMBackendManager.getInstance().supportedNames(), + new ObjectMapper() + ); + } + + AiProvidersCommand(Supplier<Set<String>> analyzerNames, Supplier<Set<String>> backendNames) { + this(analyzerNames, backendNames, new ObjectMapper()); + } + + AiProvidersCommand(Supplier<Set<String>> analyzerNames, + Supplier<Set<String>> backendNames, + ObjectMapper jsonMapper) { + this.analyzerNames = analyzerNames; + this.backendNames = backendNames; + this.jsonMapper = jsonMapper; + } + + @Override + public Integer call() throws Exception { + List<String> analyzers = sorted(analyzerNames.get()); + List<String> backends = sorted(backendNames.get()); + if ("json".equalsIgnoreCase(format)) { + Map<String, List<String>> payload = new LinkedHashMap<>(); + payload.put("analyzers", analyzers); + payload.put("llmBackends", backends); + PrintWriter out = spec.commandLine().getOut(); + out.println(jsonMapper.writeValueAsString(payload)); + out.flush(); + return 0; + } + if (!"text".equalsIgnoreCase(format)) { + spec.commandLine().getErr().println("Unsupported format: " + format + ".
 Use text or json."); + spec.commandLine().getErr().flush(); + return 2; + } + + PrintWriter out = spec.commandLine().getOut(); + out.println("# AI Providers"); + out.println(); + printGroup(out, "Analyzers", analyzers); + out.println(); + printGroup(out, "LLM Backends", backends); + out.flush(); + return 0; + } + + private void printGroup(PrintWriter out, String title, List<String> names) { + out.println(title + ":"); + if (names.isEmpty()) { + out.println(" - none"); + return; + } + names.forEach(name -> out.println(" - " + name)); + } + + private List<String> sorted(Set<String> names) { + return names == null ? List.of() : new ArrayList<>(new TreeSet<>(names)); + } +} diff --git a/consilens-cli/src/main/java/com/consilens/cli/command/ConfigCommand.java b/consilens-cli/src/main/java/com/consilens/cli/command/ConfigCommand.java index 5988d63..90f8126 100644 --- a/consilens-cli/src/main/java/com/consilens/cli/command/ConfigCommand.java +++ b/consilens-cli/src/main/java/com/consilens/cli/command/ConfigCommand.java @@ -1,8 +1,5 @@ package com.consilens.cli.command; -import com.consilens.cli.config.ConfigurationManager; -import com.consilens.cli.model.CliConfiguration; - import lombok.extern.slf4j.Slf4j; import picocli.CommandLine.Command; diff --git a/consilens-cli/src/main/java/com/consilens/cli/command/ConfigValidateCommand.java b/consilens-cli/src/main/java/com/consilens/cli/command/ConfigValidateCommand.java index ba64c11..8adc8f9 100644 --- a/consilens-cli/src/main/java/com/consilens/cli/command/ConfigValidateCommand.java +++ b/consilens-cli/src/main/java/com/consilens/cli/command/ConfigValidateCommand.java @@ -4,6 +4,7 @@ import com.consilens.cli.model.CliConfiguration; import com.consilens.cli.service.ConnectorConfigMapper; import com.consilens.cli.service.ConnectorProbeService; +import com.consilens.cli.service.SensitiveValueMasker; import lombok.extern.slf4j.Slf4j; import picocli.CommandLine.Command; @@ -73,16 +74,16 @@ private void printVerboseDetails(ConfigurationManager
configurationManager, Stri if (config.getSource() != null) { System.out.println(" Source:"); System.out.println(" Type : " + nvl(config.getSource().getType())); - System.out.println(" URL : " + nvl(config.getSource().getUrl())); - System.out.println(" User : " + nvl(config.getSource().getUsername())); + System.out.println(" URL : " + SensitiveValueMasker.maskJdbcUrl(config.getSource().getUrl())); + System.out.println(" User : " + SensitiveValueMasker.maskUsername(config.getSource().getUsername())); } // Target connection if (config.getTarget() != null) { System.out.println(" Target:"); System.out.println(" Type : " + nvl(config.getTarget().getType())); - System.out.println(" URL : " + nvl(config.getTarget().getUrl())); - System.out.println(" User : " + nvl(config.getTarget().getUsername())); + System.out.println(" URL : " + SensitiveValueMasker.maskJdbcUrl(config.getTarget().getUrl())); + System.out.println(" User : " + SensitiveValueMasker.maskUsername(config.getTarget().getUsername())); } // Comparison @@ -161,7 +162,7 @@ private void testDatabaseConnections(ConfigurationManager configurationManager, // Test source connection try { - System.out.print(" Testing source connection (" + config.getSource().getUrl() + ") ... "); + System.out.print(" Testing source connection (" + SensitiveValueMasker.maskJdbcUrl(config.getSource().getUrl()) + ") ... "); new ConnectorProbeService().verifyAccessible( ConnectorConfigMapper.toConnectorConfig(config.getSource())); System.out.println("✓ OK"); @@ -176,7 +177,7 @@ private void testDatabaseConnections(ConfigurationManager configurationManager, // Test target connection try { - System.out.print(" Testing target connection (" + config.getTarget().getUrl() + ") ... "); + System.out.print(" Testing target connection (" + SensitiveValueMasker.maskJdbcUrl(config.getTarget().getUrl()) + ") ... 
"); new ConnectorProbeService().verifyAccessible( ConnectorConfigMapper.toConnectorConfig(config.getTarget())); System.out.println("✓ OK"); diff --git a/consilens-cli/src/main/java/com/consilens/cli/command/DiffCommand.java b/consilens-cli/src/main/java/com/consilens/cli/command/DiffCommand.java index 1acc5cb..92cff8d 100644 --- a/consilens-cli/src/main/java/com/consilens/cli/command/DiffCommand.java +++ b/consilens-cli/src/main/java/com/consilens/cli/command/DiffCommand.java @@ -4,6 +4,7 @@ import com.consilens.cli.model.CliConfiguration; import com.consilens.cli.model.CliDiffResult; import com.consilens.cli.service.DiffService; +import com.consilens.cli.service.SensitiveValueMasker; import lombok.extern.slf4j.Slf4j; import picocli.CommandLine.Command; @@ -41,9 +42,9 @@ public void run() { log.info("Starting diff operation with configuration:"); log.info(" Strategy: {}", config.getStrategyMode()); log.info(" Algorithm: {}", config.getAlgorithm()); - log.info(" Source: {}", config.getSource().getUrl()); + log.info(" Source: {}", SensitiveValueMasker.maskJdbcUrl(config.getSource().getUrl())); log.info(" Source Resource: {}", resourceDisplay(config.getSource())); - log.info(" Target: {}", config.getTarget().getUrl()); + log.info(" Target: {}", SensitiveValueMasker.maskJdbcUrl(config.getTarget().getUrl())); log.info(" Target Resource: {}", resourceDisplay(config.getTarget())); log.info(" Source Key Columns: {}", config.getComparison().getKeys().getSource()); log.info(" Target Key Columns: {}", config.getComparison().getKeys().getTarget()); diff --git a/consilens-cli/src/main/java/com/consilens/cli/model/CliConfiguration.java b/consilens-cli/src/main/java/com/consilens/cli/model/CliConfiguration.java index 168190e..5d9dc47 100644 --- a/consilens-cli/src/main/java/com/consilens/cli/model/CliConfiguration.java +++ b/consilens-cli/src/main/java/com/consilens/cli/model/CliConfiguration.java @@ -11,6 +11,7 @@ import com.consilens.core.validation.ValidationFramework; import 
com.consilens.sink.api.model.ResultConfig; import com.consilens.sink.api.model.SinkConfig; +import com.consilens.sink.table.TableColumnNames; import com.consilens.sink.table.TableSinkConfig; import com.fasterxml.jackson.annotation.JsonIgnore; import com.fasterxml.jackson.annotation.JsonInclude; @@ -312,6 +313,12 @@ private void validateTableSinkColumns(int sinkIndex, TableSinkConfig tableSinkCo if (tableSinkConfig.getColumns() == null || tableSinkConfig.getColumns().isEmpty()) { return; } + try { + TableColumnNames.validateUniqueSanitizedColumns(tableSinkConfig.getColumns(), + "result.sinks[" + sinkIndex + "]"); + } catch (IllegalArgumentException e) { + throw ValidationException.simple("CONFIGURATION_VALIDATION", e.getMessage()); + } Set<String> columnNames = new HashSet<>(); for (int i = 0; i < tableSinkConfig.getColumns().size(); i++) { String columnName = tableSinkConfig.getColumns().get(i).getName(); diff --git a/consilens-cli/src/main/java/com/consilens/cli/model/CliDiffResult.java b/consilens-cli/src/main/java/com/consilens/cli/model/CliDiffResult.java index f1063bd..66cad43 100644 --- a/consilens-cli/src/main/java/com/consilens/cli/model/CliDiffResult.java +++ b/consilens-cli/src/main/java/com/consilens/cli/model/CliDiffResult.java @@ -60,7 +60,7 @@ public class CliDiffResult { */ @Getter - private int sourceMissingCount; + private long sourceMissingCount; /** * Number of rows missing in target (TARGET_MISSING operation). @@ -69,7 +69,7 @@ public class CliDiffResult { */ @Getter - private int targetMissingCount; + private long targetMissingCount; /** * Number of rows with data mismatch (MISMATCH operation). @@ -78,22 +78,22 @@ public class CliDiffResult { */ @Getter - private int mismatchCount; + private long mismatchCount; /** * Total number of differences. */ - private int totalDifferences; + private long totalDifferences; /** * Number of rows processed in source table.
*/ - private int sourceRowCount; + private long sourceRowCount; /** * Number of rows processed in target table. */ - private int targetRowCount; + private long targetRowCount; /** * Additional metadata about the operation. @@ -148,7 +148,7 @@ public boolean hasDifferences() { * Get the percentage of rows that differ (relative to larger table). */ public double getDifferencePercentage() { - int maxRows = Math.max(sourceRowCount, targetRowCount); + long maxRows = Math.max(sourceRowCount, targetRowCount); if (maxRows == 0) { return 0.0; } diff --git a/consilens-cli/src/main/java/com/consilens/cli/model/ConnectionConfig.java b/consilens-cli/src/main/java/com/consilens/cli/model/ConnectionConfig.java index 967574c..ec35f56 100644 --- a/consilens-cli/src/main/java/com/consilens/cli/model/ConnectionConfig.java +++ b/consilens-cli/src/main/java/com/consilens/cli/model/ConnectionConfig.java @@ -90,27 +90,7 @@ public void validate(String fieldName) throws ValidationException { private boolean requiresJdbcValidation() { String effectiveUrl = getUrl(); - if (effectiveUrl != null && effectiveUrl.startsWith("jdbc:")) { - return true; - } - if (type == null) { - return false; - } - switch (type.trim().toLowerCase(Locale.ROOT)) { - case "mysql": - case "postgresql": - case "oracle": - case "sqlserver": - case "presto": - case "trino": - case "doris": - case "starrocks": - case "clickhouse": - case "tidb": - return true; - default: - return false; - } + return effectiveUrl != null && effectiveUrl.startsWith("jdbc:"); } @Data diff --git a/consilens-cli/src/main/java/com/consilens/cli/model/StrategyConfig.java b/consilens-cli/src/main/java/com/consilens/cli/model/StrategyConfig.java index 0c4c4ba..69bae96 100644 --- a/consilens-cli/src/main/java/com/consilens/cli/model/StrategyConfig.java +++ b/consilens-cli/src/main/java/com/consilens/cli/model/StrategyConfig.java @@ -49,6 +49,10 @@ public class StrategyConfig { @JsonProperty("localCompare") private LocalCompareConfig localCompare = 
LocalCompareConfig.builder().mode("full").build(); + @Builder.Default + @JsonProperty("maxDifferences") + private Long maxDifferences = 1_000_000L; + @JsonIgnore public ComparisonStrategy getModeEnum() { return ComparisonStrategy.fromString(mode); @@ -65,6 +69,7 @@ public void validate() throws ValidationException { .notEmpty(algorithm, "strategy.algorithm") .positive(batchSize, "strategy.batchSize") .positive(bisectionFactor, "strategy.bisectionFactor") + .positive(maxDifferences, "strategy.maxDifferences") .throwIfInvalid(); String normalizedMode = mode == null ? null : mode.trim().toLowerCase(); diff --git a/consilens-cli/src/main/java/com/consilens/cli/service/CompareRequestFactory.java b/consilens-cli/src/main/java/com/consilens/cli/service/CompareRequestFactory.java index f8bf096..1fe7ef2 100644 --- a/consilens-cli/src/main/java/com/consilens/cli/service/CompareRequestFactory.java +++ b/consilens-cli/src/main/java/com/consilens/cli/service/CompareRequestFactory.java @@ -342,6 +342,9 @@ private CompareExecutionOptions toExecutionOptions(CliConfiguration config) { if (config.getConcurrency() != null) { attributes.put("concurrencyConfig", config.getConcurrency()); } + if (config.getStrategy().getMaxDifferences() != null) { + attributes.put("maxDifferences", config.getStrategy().getMaxDifferences()); + } return CompareExecutionOptions.builder() .bisectionFactor(config.getStrategy().getBisectionFactor()) @@ -351,7 +354,8 @@ private CompareExecutionOptions toExecutionOptions(CliConfiguration config) { .localCompareMode(config.getStrategy().getLocalCompare() != null ? config.getStrategy().getLocalCompare().getMode() : null) - .validateUniqueKeys(false) + .validateUniqueKeys(true) + .maxDifferences(config.getStrategy().getMaxDifferences()) .attributes(attributes.isEmpty() ? 
null : attributes) .build(); } diff --git a/consilens-cli/src/main/java/com/consilens/cli/service/DiffService.java b/consilens-cli/src/main/java/com/consilens/cli/service/DiffService.java index 778c3c5..6a8cfcb 100644 --- a/consilens-cli/src/main/java/com/consilens/cli/service/DiffService.java +++ b/consilens-cli/src/main/java/com/consilens/cli/service/DiffService.java @@ -171,12 +171,12 @@ private CliDiffResult convertToCLIResult(DiffResult coreResult, String strategy, return CliDiffResult.builder() .strategy(strategy) - .sourceMissingCount((int) stats.getSourceMissingCount()) - .targetMissingCount((int) stats.getTargetMissingCount()) - .mismatchCount((int) stats.getMismatchCount()) - .totalDifferences((int) stats.getTotalDifferences()) - .sourceRowCount((int) stats.getSourceRowCount()) - .targetRowCount((int) stats.getTargetRowCount()) + .sourceMissingCount(stats.getSourceMissingCount()) + .targetMissingCount(stats.getTargetMissingCount()) + .mismatchCount(stats.getMismatchCount()) + .totalDifferences(stats.getTotalDifferences()) + .sourceRowCount(stats.getSourceRowCount()) + .targetRowCount(stats.getTargetRowCount()) .differences(convertDiffRows(coreResult.getDifferences())) .tableMetadata(tableMetadata) .infoTree(coreResult.getInfoTree() != null && coreResult.getInfoTree().isPresent() @@ -313,6 +313,9 @@ private CompareExecutionOptions toExecutionOptions(CliConfiguration config) { if (config.getConcurrency() != null) { attributes.put("concurrencyConfig", config.getConcurrency()); } + if (config.getStrategy().getMaxDifferences() != null) { + attributes.put("maxDifferences", config.getStrategy().getMaxDifferences()); + } return CompareExecutionOptions.builder() .bisectionFactor(config.getStrategy().getBisectionFactor()) @@ -322,7 +325,8 @@ private CompareExecutionOptions toExecutionOptions(CliConfiguration config) { .localCompareMode(config.getStrategy().getLocalCompare() != null ? 
config.getStrategy().getLocalCompare().getMode() : null) - .validateUniqueKeys(false) + .validateUniqueKeys(true) + .maxDifferences(config.getStrategy().getMaxDifferences()) .attributes(attributes.isEmpty() ? null : attributes) .build(); } @@ -482,8 +486,8 @@ public CliDiffResult performDryRun(CliConfiguration config) throws Exception { .targetMissingCount(0) .mismatchCount(0) .totalDifferences(0) - .sourceRowCount((int) sourceRowCount) - .targetRowCount((int) targetRowCount) + .sourceRowCount(sourceRowCount) + .targetRowCount(targetRowCount) .differences(new java.util.ArrayList<>()) .tableMetadata(tableMetadata) .build(); diff --git a/consilens-cli/src/main/java/com/consilens/cli/service/SensitiveValueMasker.java b/consilens-cli/src/main/java/com/consilens/cli/service/SensitiveValueMasker.java new file mode 100644 index 0000000..8eb2966 --- /dev/null +++ b/consilens-cli/src/main/java/com/consilens/cli/service/SensitiveValueMasker.java @@ -0,0 +1,104 @@ +package com.consilens.cli.service; + +import java.util.List; +import java.util.Locale; + +public final class SensitiveValueMasker { + + private static final List<String> SENSITIVE_QUERY_KEYS = List.of( + "password", "passwd", "pwd", "token", "secret", "user", "username", "key"); + + private SensitiveValueMasker() { + } + + public static String maskJdbcUrl(String url) { + if (url == null || url.isBlank()) { + return "(not set)"; + } + String masked = maskSemicolonProperties(maskUserInfo(url)); + int queryStart = masked.indexOf('?'); + if (queryStart < 0 || queryStart == masked.length() - 1) { + return masked; + } + String prefix = masked.substring(0, queryStart + 1); + String query = masked.substring(queryStart + 1); + String[] params = query.split("&", -1); + for (int i = 0; i < params.length; i++) { + int eq = params[i].indexOf('='); + if (eq <= 0) { + continue; + } + String key = params[i].substring(0, eq); + if (isSensitiveQueryKey(key)) { + params[i] = key + "=***"; + } + } + return prefix + String.join("&", params);
} + + public static String maskUsername(String username) { + if (username == null || username.isBlank()) { + return "(not set)"; + } + String trimmed = username.trim(); + if (trimmed.length() <= 2) { + return "***"; + } + return trimmed.charAt(0) + "***" + trimmed.charAt(trimmed.length() - 1); + } + + private static String maskUserInfo(String value) { + int scheme = value.indexOf("://"); + if (scheme < 0) { + return value; + } + int authorityStart = scheme + 3; + int slash = value.indexOf('/', authorityStart); + int query = value.indexOf('?', authorityStart); + int authorityEnd; + if (slash < 0) { + authorityEnd = query >= 0 ? query : value.length(); + } else if (query < 0) { + authorityEnd = slash; + } else { + authorityEnd = Math.min(slash, query); + } + int at = value.lastIndexOf('@', authorityEnd); + if (at < authorityStart) { + return value; + } + return value.substring(0, authorityStart) + "***@" + value.substring(at + 1); + } + + private static boolean isSensitiveQueryKey(String key) { + String normalized = key.trim().toLowerCase(Locale.ROOT); + String compact = normalized.replace("_", "").replace("-", "").replace(".", ""); + return SENSITIVE_QUERY_KEYS.stream().anyMatch(normalized::equals) + || compact.contains("password") + || compact.contains("passwd") + || compact.contains("token") + || compact.contains("secret") + || compact.equals("apikey") + || compact.endsWith("apikey") + || compact.endsWith("accesskey") + || compact.endsWith("privatekey"); + } + + private static String maskSemicolonProperties(String value) { + String[] parts = value.split(";", -1); + if (parts.length <= 1) { + return value; + } + for (int i = 1; i < parts.length; i++) { + int eq = parts[i].indexOf('='); + if (eq <= 0) { + continue; + } + String key = parts[i].substring(0, eq); + if (isSensitiveQueryKey(key)) { + parts[i] = key + "=***"; + } + } + return String.join(";", parts); + } +} diff --git a/consilens-cli/src/test/java/com/consilens/cli/ai/AIConfigCompilerTest.java 
b/consilens-cli/src/test/java/com/consilens/cli/ai/AIConfigCompilerTest.java new file mode 100644 index 0000000..b230bac --- /dev/null +++ b/consilens-cli/src/test/java/com/consilens/cli/ai/AIConfigCompilerTest.java @@ -0,0 +1,75 @@ +package com.consilens.cli.ai; + +import com.consilens.ai.config.model.AIConfigDraft; +import com.consilens.ai.config.model.DatasetDraft; +import com.consilens.ai.config.model.MappingDraft; +import com.consilens.ai.config.model.StrategyDraft; +import com.consilens.cli.model.CliConfiguration; +import org.junit.jupiter.api.Test; + +import java.util.List; + +import static org.junit.jupiter.api.Assertions.assertDoesNotThrow; +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertTrue; + +class AIConfigCompilerTest { + + private final AIConfigCompiler compiler = new AIConfigCompiler(); + + @Test + void shouldCompileDraftToCanonicalCliConfiguration() { + CliConfiguration config = compiler.compile(draft()); + + assertDoesNotThrow(config::validate); + assertEquals("mysql", config.getSource().getType()); + assertEquals("users", config.getSource().getResource().getName()); + assertEquals("${env.SOURCE_USER}", config.getSource().getUsername()); + assertEquals("xor", config.getStrategy().getAlgorithm()); + assertEquals(2, config.getResult().getSinks().size()); + assertEquals("result", config.getResult().getSinks().get(0).getType()); + assertEquals("diff-record", config.getResult().getSinks().get(1).getType()); + } + + @Test + void shouldSerializeCanonicalYamlShape() { + String yaml = compiler.toYaml(compiler.compile(draft())); + + assertTrue(yaml.contains("source:")); + assertTrue(yaml.contains("connection:")); + assertTrue(yaml.contains("resource:")); + assertTrue(yaml.contains("comparison:")); + assertTrue(yaml.contains("strategy:")); + assertTrue(yaml.contains("result:")); + assertTrue(yaml.contains("diff-record")); + assertTrue(yaml.contains("./diff-records.json")); + } + + private 
AIConfigDraft draft() { + return AIConfigDraft.builder() + .source(dataset("mysql", "source-mysql", "jdbc:mysql://localhost:3306/source", + "SOURCE_USER", "SOURCE_PASSWORD")) + .target(dataset("postgresql", "target-postgresql", "jdbc:postgresql://localhost:5432/target", + "TARGET_USER", "TARGET_PASSWORD")) + .mapping(MappingDraft.builder() + .sourceKeys(List.of("id")) + .targetKeys(List.of("id")) + .sourceFields(List.of("name", "email")) + .targetFields(List.of("name", "email")) + .build()) + .strategy(StrategyDraft.builder().mode("checksum").algorithm("xor").build()) + .build(); + } + + private DatasetDraft dataset(String type, String name, String url, String userEnv, String passwordEnv) { + return DatasetDraft.builder() + .type(type) + .name(name) + .jdbcUrl(url) + .usernameEnv(userEnv) + .passwordEnv(passwordEnv) + .resourceType("table") + .resourceName("users") + .build(); + } +} diff --git a/consilens-cli/src/test/java/com/consilens/cli/ai/AIConfigServiceTest.java b/consilens-cli/src/test/java/com/consilens/cli/ai/AIConfigServiceTest.java new file mode 100644 index 0000000..80760f0 --- /dev/null +++ b/consilens-cli/src/test/java/com/consilens/cli/ai/AIConfigServiceTest.java @@ -0,0 +1,133 @@ +package com.consilens.cli.ai; + +import com.consilens.ai.model.BackendInfo; +import com.consilens.ai.model.ChatMessage; +import com.consilens.ai.model.FunctionDefinition; +import com.consilens.ai.model.LLMResponse; +import com.consilens.ai.spi.LLMBackend; +import com.consilens.cli.model.CliConfiguration; +import com.fasterxml.jackson.databind.ObjectMapper; +import org.junit.jupiter.api.Test; + +import java.util.List; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertThrows; +import static org.junit.jupiter.api.Assertions.assertTrue; + +class AIConfigServiceTest { + + @Test + void shouldGenerateConfigFromStructuredLlmJsonWhenHintsAreMissing() { + String json = "{" + + 
"\"source\":{\"type\":\"mysql\",\"jdbcUrl\":\"jdbc:mysql://localhost:3306/source\"," + + "\"usernameEnv\":\"MYSQL_USER\",\"passwordEnv\":\"MYSQL_PASSWORD\"," + + "\"resourceType\":\"table\",\"resourceName\":\"users\"}," + + "\"target\":{\"type\":\"postgresql\",\"jdbcUrl\":\"jdbc:postgresql://localhost:5432/target\"," + + "\"usernameEnv\":\"PG_USER\",\"passwordEnv\":\"PG_PASSWORD\"," + + "\"resourceType\":\"table\",\"resourceName\":\"users\"}," + + "\"mapping\":{\"sourceKeys\":[\"id\"],\"targetKeys\":[\"id\"]," + + "\"sourceFields\":[\"name\"],\"targetFields\":[\"name\"]}," + + "\"strategy\":{\"mode\":\"checksum\",\"algorithm\":\"xor\"}," + + "\"result\":{\"sinkFormat\":\"console\",\"sinkType\":\"result\"}" + + "}"; + AIConfigService service = serviceReturning(json); + + AIConfigResult result = service.generate(AIConfigRequest.builder() + .goal("compare users") + .backendOptions(AIBackendOptions.builder().backend("openai").build()) + .build()); + + assertTrue(result.isValid()); + CliConfiguration config = result.getConfiguration(); + assertEquals("mysql", config.getSource().getType()); + assertEquals("postgresql", config.getTarget().getType()); + assertEquals("id", config.getComparison().getKeys().getSource().get(0)); + assertTrue(result.getYaml().contains("comparison:")); + } + + @Test + void shouldRejectInvalidLlmJson() { + AIConfigService service = serviceReturning("not json"); + + AIConfigRequest request = AIConfigRequest.builder() + .goal("compare users") + .backendOptions(AIBackendOptions.builder().backend("openai").build()) + .build(); + + assertThrows(IllegalArgumentException.class, () -> service.generate(request)); + } + + @Test + void shouldNotCallNoopLlmWhenBackendOptionIsMissing() { + AIConfigService service = new AIConfigService( + new com.consilens.ai.config.AIConfigDraftValidator(), + new AIConfigCompiler(), + new StaticBackendResolver(new StaticBackend("not json")), + new ObjectMapper()); + + AIConfigResult result = 
service.generate(AIConfigRequest.builder() + .goal("compare users") + .backendOptions(AIBackendOptions.builder().backend(null).build()) + .build()); + + assertEquals(false, result.isValid()); + assertTrue(result.getIssues().stream() + .anyMatch(issue -> "AI_CONFIG_DATASET_TYPE_MISSING".equals(issue.getCode()))); + } + + private AIConfigService serviceReturning(String response) { + return new AIConfigService( + new com.consilens.ai.config.AIConfigDraftValidator(), + new AIConfigCompiler(), + new StaticBackendResolver(new StaticBackend(response)), + new ObjectMapper()); + } + + private static class StaticBackendResolver extends LLMBackendResolver { + private final LLMBackend backend; + + StaticBackendResolver(LLMBackend backend) { + this.backend = backend; + } + + @Override + public LLMBackend resolve(AIBackendOptions options) { + return backend; + } + } + + private static class StaticBackend implements LLMBackend { + private final String response; + + StaticBackend(String response) { + this.response = response; + } + + @Override + public LLMResponse chat(String systemPrompt, List<ChatMessage> messages, List<FunctionDefinition> functions) { + return LLMResponse.builder().text(response).finishReason("stop").build(); + } + + @Override + public String complete(String prompt) { + return response; + } + + @Override + public boolean isAvailable() { + return true; + } + + @Override + public BackendInfo info() { + return BackendInfo.builder() + .name("static") + .model("test") + .version("test") + .supportsFunctionCalling(false) + .supportsStreaming(false) + .build(); + } + } +} diff --git a/consilens-cli/src/test/java/com/consilens/cli/ai/AIDiagnoseServiceTest.java b/consilens-cli/src/test/java/com/consilens/cli/ai/AIDiagnoseServiceTest.java new file mode 100644 index 0000000..bd1526e --- /dev/null +++ b/consilens-cli/src/test/java/com/consilens/cli/ai/AIDiagnoseServiceTest.java @@ -0,0 +1,70 @@ +package com.consilens.cli.ai; + +import com.consilens.ai.spi.AIAnalyzerManager; +import
org.junit.jupiter.api.Test; +import org.junit.jupiter.api.io.TempDir; + +import java.io.IOException; +import java.nio.file.Files; +import java.nio.file.Path; + +import static org.junit.jupiter.api.Assertions.assertThrows; +import static org.junit.jupiter.api.Assertions.assertTrue; + +class AIDiagnoseServiceTest { + + @TempDir + Path tempDir; + + private final AIDiagnoseService service = + new AIDiagnoseService(AIAnalyzerManager.getInstance().create("rulebased")); + + @Test + void shouldDiagnoseDiffRecordArray() throws Exception { + Path result = write("diff-records.json", + "[{" + + "\"operation\":\"mismatch\"," + + "\"primaryKey\":\"1\"," + + "\"sourceValues\":[\"Alice\"]," + + "\"targetValues\":[\"\"]," + + "\"columnNames1\":[\"name\"]," + + "\"columnNames2\":[\"name\"]" + + "}]"); + + String report = service.diagnose(result.toString()); + + assertTrue(report.contains("# AI Diagnose")); + assertTrue(report.contains("differences=1")); + assertTrue(report.contains("NULL_HANDLING")); + } + + @Test + void shouldDiagnoseObjectWithDifferencesArray() throws Exception { + Path result = write("diff-result.json", + "{\"differences\":[{" + + "\"operation\":\"MISMATCH\"," + + "\"primaryKey\":[1]," + + "\"sourceValues\":[\"abcdef\"]," + + "\"targetValues\":[\"abc\"]," + + "\"columnNames1\":[\"code\"]," + + "\"columnNames2\":[\"code\"]" + + "}]}"); + + String report = service.diagnose(result.toString()); + + assertTrue(report.contains("TRUNCATION")); + } + + @Test + void shouldRejectStatsOnlyJson() throws Exception { + Path result = write("stats-only.json", "{\"differenceCount\":1}"); + + assertThrows(IOException.class, () -> service.diagnose(result.toString())); + } + + private Path write(String fileName, String content) throws IOException { + Path path = tempDir.resolve(fileName); + Files.writeString(path, content); + return path; + } +} diff --git a/consilens-cli/src/test/java/com/consilens/cli/ai/LLMBackendResolverTest.java 
b/consilens-cli/src/test/java/com/consilens/cli/ai/LLMBackendResolverTest.java new file mode 100644 index 0000000..3c2747e --- /dev/null +++ b/consilens-cli/src/test/java/com/consilens/cli/ai/LLMBackendResolverTest.java @@ -0,0 +1,79 @@ +package com.consilens.cli.ai; + +import com.consilens.ai.spi.LLMBackend; +import org.junit.jupiter.api.Test; + +import java.util.Map; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertThrows; + +class LLMBackendResolverTest { + + private final LLMBackendResolver resolver = new LLMBackendResolver(); + + @Test + void shouldResolveNoopByDefault() { + LLMBackend backend = resolver.resolve(AIBackendOptions.builder().build()); + + assertEquals("noop", backend.info().getName()); + } + + @Test + void shouldResolveOpenAIWithExplicitOptions() { + LLMBackend backend = resolver.resolve(AIBackendOptions.builder() + .backend("openai") + .model("test-model") + .baseUrl("https://example.invalid/v1") + .apiKey("test-key") + .build()); + + assertEquals("openai", backend.info().getName()); + assertEquals("test-model", backend.info().getModel()); + } + + @Test + void shouldResolveDeepSeekWithExplicitOptions() { + LLMBackend backend = resolver.resolve(AIBackendOptions.builder() + .backend("deepseek") + .model("deepseek-test") + .baseUrl("https://example.invalid") + .apiKey("test-key") + .build()); + + assertEquals("deepseek", backend.info().getName()); + assertEquals("deepseek-test", backend.info().getModel()); + } + + @Test + void shouldResolveOllamaWithExplicitOptions() { + LLMBackend backend = resolver.resolve(AIBackendOptions.builder() + .backend("ollama") + .model("qwen-test") + .baseUrl("http://localhost:11434") + .timeout("5s") + .temperature(0.2) + .maxTokens(128) + .build()); + + assertEquals("ollama", backend.info().getName()); + assertEquals("qwen-test", backend.info().getModel()); + } + + @Test + void shouldRejectUnknownBackend() { + AIBackendOptions options = 
AIBackendOptions.builder().backend("missing-backend").build(); + + assertThrows(IllegalArgumentException.class, () -> resolver.resolve(options)); + } + + @Test + void shouldResolveBackendNameFromEnvironmentWhenOptionMissing() { + Map<String, String> env = Map.of("CONSILENS_AI_BACKEND", "openai"); + LLMBackendResolver envResolver = new LLMBackendResolver(env::get); + + String backend = envResolver.resolveBackendName(AIBackendOptions.builder().backend(null).build()); + + assertEquals("openai", backend); + } +} diff --git a/consilens-cli/src/test/java/com/consilens/cli/command/AiConfigCommandTest.java b/consilens-cli/src/test/java/com/consilens/cli/command/AiConfigCommandTest.java new file mode 100644 index 0000000..4e53ae2 --- /dev/null +++ b/consilens-cli/src/test/java/com/consilens/cli/command/AiConfigCommandTest.java @@ -0,0 +1,65 @@ +package com.consilens.cli.command; + +import com.consilens.cli.model.CliConfiguration; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.dataformat.yaml.YAMLFactory; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.io.TempDir; +import picocli.CommandLine; + +import java.nio.file.Files; +import java.nio.file.Path; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertFalse; +import static org.junit.jupiter.api.Assertions.assertTrue; + +class AiConfigCommandTest { + + @TempDir + Path tempDir; + + @Test + void shouldGenerateYamlFileFromExplicitHints() throws Exception { + Path output = tempDir.resolve("ai-config.yaml"); + + int exitCode = new CommandLine(new AiConfigCommand()).execute( + "--no-llm", + "--source-type", "mysql", + "--source-url", "jdbc:mysql://localhost:3306/source", + "--source-table", "users", + "--source-user-env", "MYSQL_USER", + "--source-password-env", "MYSQL_PASSWORD", + "--target-type", "postgresql", + "--target-url", "jdbc:postgresql://localhost:5432/target", + "--target-table", "users", + "--target-user-env",
"PG_USER", + "--target-password-env", "PG_PASSWORD", + "--keys", "id", + "--fields", "name,email", + "--output", output.toString()); + + assertEquals(0, exitCode); + assertTrue(Files.exists(output)); + String yaml = Files.readString(output); + assertTrue(yaml.contains("connection:")); + assertTrue(yaml.contains("diff-record")); + assertTrue(yaml.contains("./diff-records.json")); + assertFalse(yaml.contains("comparisons:")); + + CliConfiguration config = new ObjectMapper(new YAMLFactory()).readValue(output.toFile(), CliConfiguration.class); + assertEquals("mysql", config.getSource().getType()); + assertEquals("id", config.getComparison().getKeys().getSource().get(0)); + assertEquals(2, config.getResult().getSinks().size()); + } + + @Test + void shouldFailWhenRequiredHintsAreMissingWithoutWritingFile() { + Path output = tempDir.resolve("missing.yaml"); + + int exitCode = new CommandLine(new AiConfigCommand()).execute("--no-llm", "--output", output.toString()); + + assertEquals(1, exitCode); + assertFalse(Files.exists(output)); + } +} diff --git a/consilens-cli/src/test/java/com/consilens/cli/command/AiDiagnoseCommandTest.java b/consilens-cli/src/test/java/com/consilens/cli/command/AiDiagnoseCommandTest.java new file mode 100644 index 0000000..49bb1fd --- /dev/null +++ b/consilens-cli/src/test/java/com/consilens/cli/command/AiDiagnoseCommandTest.java @@ -0,0 +1,128 @@ +package com.consilens.cli.command; + +import com.consilens.ai.model.AnalysisResult; +import com.consilens.ai.spi.AIAnalyzer; +import com.consilens.cli.ai.AIDiagnoseService; +import com.consilens.core.diff.DiffResult; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.io.TempDir; +import picocli.CommandLine; + +import java.nio.file.Files; +import java.nio.file.Path; +import java.util.concurrent.atomic.AtomicReference; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertTrue; + +class AiDiagnoseCommandTest { + + @TempDir + 
Path tempDir; + + @Test + void shouldDiagnoseDiffRecordArray() throws Exception { + Path result = tempDir.resolve("diff-records.json"); + Files.writeString(result, + "[{" + + "\"operation\":\"MISMATCH\"," + + "\"primaryKey\":[1]," + + "\"sourceValues\":[\"Alice\"]," + + "\"targetValues\":[\"\"]," + + "\"columnNames1\":[\"name\"]," + + "\"columnNames2\":[\"name\"]" + + "}]"); + + int exitCode = new CommandLine(new AiDiagnoseCommand()).execute("--result", result.toString()); + + assertEquals(0, exitCode); + } + + @Test + void shouldFailWhenResultDoesNotContainDiffEvidence() throws Exception { + Path result = tempDir.resolve("stats-only.json"); + Files.writeString(result, "{\"differenceCount\":1}"); + + int exitCode = new CommandLine(new AiDiagnoseCommand()).execute("--result", result.toString()); + + assertEquals(1, exitCode); + } + + @Test + void shouldUseInjectedDiagnoseService() { + AtomicReference<String> analyzer = new AtomicReference<>(); + AtomicReference<String> resultPath = new AtomicReference<>(); + + int exitCode = new CommandLine(new AiDiagnoseCommand(name -> { + analyzer.set(name); + return new AIDiagnoseService(new TestAnalyzer()) { + @Override + public String diagnose(String path) { + resultPath.set(path); + return "diagnosed"; + } + }; + })).execute("--result", "diff.json", "--analyzer", "custom"); + + assertEquals(0, exitCode); + assertEquals("custom", analyzer.get()); + assertEquals("diff.json", resultPath.get()); + } + + @Test + void shouldUseRuleBasedAnalyzerByDefault() { + AtomicReference<String> analyzer = new AtomicReference<>(); + + int exitCode = new CommandLine(new AiDiagnoseCommand(name -> { + analyzer.set(name); + return new AIDiagnoseService(new TestAnalyzer()) { + @Override + public String diagnose(String path) { + return "diagnosed"; + } + }; + })).execute("--result", "diff.json"); + + assertEquals(0, exitCode); + assertEquals("rulebased", analyzer.get()); + } + + @Test + void shouldWriteDiagnosisReportToOutputFile() throws Exception { + Path output =
tempDir.resolve("reports/diagnose.md"); + + int exitCode = new CommandLine(new AiDiagnoseCommand(name -> new AIDiagnoseService(new TestAnalyzer()) { + @Override + public String diagnose(String path) { + return "# report"; + } + })).execute("--result", "diff.json", "--output", output.toString()); + + assertEquals(0, exitCode); + assertTrue(Files.exists(output)); + assertEquals("# report", Files.readString(output)); + } + + private static class TestAnalyzer implements AIAnalyzer { + + @Override + public AnalysisResult analyze(DiffResult diffResult) { + throw new UnsupportedOperationException(); + } + + @Override + public String explainResult(DiffResult diffResult) { + throw new UnsupportedOperationException(); + } + + @Override + public String getName() { + return "test"; + } + + @Override + public boolean isAvailable() { + return true; + } + } +} diff --git a/consilens-cli/src/test/java/com/consilens/cli/command/AiDiffCommandTest.java b/consilens-cli/src/test/java/com/consilens/cli/command/AiDiffCommandTest.java new file mode 100644 index 0000000..5fca2a1 --- /dev/null +++ b/consilens-cli/src/test/java/com/consilens/cli/command/AiDiffCommandTest.java @@ -0,0 +1,56 @@ +package com.consilens.cli.command; + +import com.consilens.cli.model.CliConfiguration; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.dataformat.yaml.YAMLFactory; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.io.TempDir; +import picocli.CommandLine; + +import java.nio.file.Files; +import java.nio.file.Path; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertFalse; +import static org.junit.jupiter.api.Assertions.assertTrue; + +class AiDiffCommandTest { + + @TempDir + Path tempDir; + + @Test + void shouldGenerateConfigWithoutExecutingDiff() throws Exception { + Path output = tempDir.resolve("ai-diff.yaml"); + + int exitCode = new CommandLine(new AiDiffCommand()).execute( + 
"--no-llm", + "--source-type", "mysql", + "--source-url", "jdbc:mysql://localhost:3306/source", + "--source-table", "orders", + "--source-user-env", "MYSQL_USER", + "--source-password-env", "MYSQL_PASSWORD", + "--target-type", "postgresql", + "--target-url", "jdbc:postgresql://localhost:5432/target", + "--target-table", "orders", + "--target-user-env", "PG_USER", + "--target-password-env", "PG_PASSWORD", + "--keys", "id", + "--output", output.toString()); + + assertEquals(0, exitCode); + assertTrue(Files.exists(output)); + CliConfiguration config = new ObjectMapper(new YAMLFactory()).readValue(output.toFile(), CliConfiguration.class); + assertEquals("orders", config.getSource().getResource().getName()); + } + + @Test + void shouldRejectExecuteFlag() { + Path output = tempDir.resolve("ai-diff.yaml"); + + int exitCode = new CommandLine(new AiDiffCommand()).execute("--execute", "--output", output.toString()); + + assertEquals(2, exitCode); + assertFalse(Files.exists(output)); + } +} diff --git a/consilens-cli/src/test/java/com/consilens/cli/command/AiDoctorCommandTest.java b/consilens-cli/src/test/java/com/consilens/cli/command/AiDoctorCommandTest.java new file mode 100644 index 0000000..7064aa0 --- /dev/null +++ b/consilens-cli/src/test/java/com/consilens/cli/command/AiDoctorCommandTest.java @@ -0,0 +1,224 @@ +package com.consilens.cli.command; + +import com.consilens.ai.model.AnalysisResult; +import com.consilens.ai.model.BackendInfo; +import com.consilens.ai.model.ChatMessage; +import com.consilens.ai.model.FunctionDefinition; +import com.consilens.ai.model.LLMResponse; +import com.consilens.ai.spi.AIAnalyzer; +import com.consilens.ai.spi.LLMBackend; +import com.consilens.cli.ai.AIBackendOptions; +import com.consilens.core.diff.DiffResult; +import org.junit.jupiter.api.Test; +import picocli.CommandLine; + +import java.io.ByteArrayOutputStream; +import java.io.OutputStreamWriter; +import java.io.PrintWriter; +import java.nio.charset.StandardCharsets; +import 
java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.atomic.AtomicBoolean; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertFalse; +import static org.junit.jupiter.api.Assertions.assertTrue; + +class AiDoctorCommandTest { + + @Test + void shouldPassOfflineDefaultChecks() { + ByteArrayOutputStream out = new ByteArrayOutputStream(); + CommandLine commandLine = commandLine(new AiDoctorCommand( + () -> Set.of("rulebased"), + () -> Set.of("noop", "openai"), + TestAnalyzer::new, + options -> new TestBackend("noop", "none", true), + options -> "noop", + name -> null, + new com.fasterxml.jackson.databind.ObjectMapper()), out, new ByteArrayOutputStream()); + + int exitCode = commandLine.execute(); + + assertEquals(0, exitCode); + String output = out.toString(StandardCharsets.UTF_8); + assertTrue(output.contains("# AI Doctor")); + assertTrue(output.contains("Status: WARN")); + assertTrue(output.contains("onlineAvailability: SKIP")); + } + + @Test + void shouldFailWhenCloudBackendHasNoApiKey() { + ByteArrayOutputStream out = new ByteArrayOutputStream(); + CommandLine commandLine = commandLine(new AiDoctorCommand( + () -> Set.of("rulebased"), + () -> Set.of("openai"), + TestAnalyzer::new, + options -> new TestBackend("openai", "gpt-test", true), + options -> "openai", + name -> null, + new com.fasterxml.jackson.databind.ObjectMapper()), out, new ByteArrayOutputStream()); + + int exitCode = commandLine.execute("--backend", "openai"); + + assertEquals(1, exitCode); + String output = out.toString(StandardCharsets.UTF_8); + assertTrue(output.contains("credentials: FAIL")); + assertTrue(output.contains("OPENAI_API_KEY")); + } + + @Test + void shouldPassCloudBackendWithApiKeyAndJsonOutput() { + ByteArrayOutputStream out = new ByteArrayOutputStream(); + CommandLine commandLine = commandLine(new AiDoctorCommand( + () -> Set.of("rulebased"), + () -> Set.of("deepseek"), + 
TestAnalyzer::new, + options -> new TestBackend("deepseek", "deepseek-chat", true), + options -> "deepseek", + Map.of("DEEPSEEK_API_KEY", "test-key")::get, + new com.fasterxml.jackson.databind.ObjectMapper()), out, new ByteArrayOutputStream()); + + int exitCode = commandLine.execute("--backend", "deepseek", "--format", "json"); + + assertEquals(0, exitCode); + assertTrue(out.toString(StandardCharsets.UTF_8).contains("\"status\":\"PASS\"")); + assertTrue(out.toString(StandardCharsets.UTF_8).contains("\"credentials\"")); + } + + @Test + void shouldRejectUnsupportedFormat() { + ByteArrayOutputStream err = new ByteArrayOutputStream(); + CommandLine commandLine = commandLine(new AiDoctorCommand( + () -> Set.of("rulebased"), + () -> Set.of("noop"), + TestAnalyzer::new, + options -> new TestBackend("noop", "none", true), + options -> "noop", + name -> null, + new com.fasterxml.jackson.databind.ObjectMapper()), new ByteArrayOutputStream(), err); + + int exitCode = commandLine.execute("--format", "xml"); + + assertEquals(2, exitCode); + assertTrue(err.toString(StandardCharsets.UTF_8).contains("Unsupported format: xml")); + } + + @Test + void shouldRunOnlineCheckOnlyWhenRequested() { + AtomicBoolean availabilityCalled = new AtomicBoolean(false); + ByteArrayOutputStream out = new ByteArrayOutputStream(); + CommandLine commandLine = commandLine(new AiDoctorCommand( + () -> Set.of("rulebased"), + () -> Set.of("ollama"), + TestAnalyzer::new, + options -> new TestBackend("ollama", "qwen-test", true, availabilityCalled), + options -> "ollama", + name -> null, + new com.fasterxml.jackson.databind.ObjectMapper()), out, new ByteArrayOutputStream()); + + int exitCode = commandLine.execute("--backend", "ollama", "--online"); + + assertEquals(0, exitCode); + assertTrue(availabilityCalled.get()); + assertTrue(out.toString(StandardCharsets.UTF_8).contains("onlineAvailability: PASS")); + } + + @Test + void shouldNotRunOnlineCheckByDefault() { + AtomicBoolean availabilityCalled = new 
AtomicBoolean(false); + CommandLine commandLine = commandLine(new AiDoctorCommand( + () -> Set.of("rulebased"), + () -> Set.of("ollama"), + TestAnalyzer::new, + options -> new TestBackend("ollama", "qwen-test", true, availabilityCalled), + options -> "ollama", + name -> null, + new com.fasterxml.jackson.databind.ObjectMapper()), new ByteArrayOutputStream(), new ByteArrayOutputStream()); + + int exitCode = commandLine.execute("--backend", "ollama"); + + assertEquals(0, exitCode); + assertFalse(availabilityCalled.get()); + } + + private CommandLine commandLine(AiDoctorCommand command, ByteArrayOutputStream out, ByteArrayOutputStream err) { + CommandLine commandLine = new CommandLine(command); + commandLine.setOut(new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8), true)); + commandLine.setErr(new PrintWriter(new OutputStreamWriter(err, StandardCharsets.UTF_8), true)); + return commandLine; + } + + private static class TestAnalyzer implements AIAnalyzer { + + private final String name; + + private TestAnalyzer(String name) { + this.name = name; + } + + @Override + public AnalysisResult analyze(DiffResult diffResult) { + throw new UnsupportedOperationException(); + } + + @Override + public String explainResult(DiffResult diffResult) { + throw new UnsupportedOperationException(); + } + + @Override + public String getName() { + return name; + } + + @Override + public boolean isAvailable() { + return true; + } + } + + private static class TestBackend implements LLMBackend { + + private final String name; + private final String model; + private final boolean available; + private final AtomicBoolean availabilityCalled; + + private TestBackend(String name, String model, boolean available) { + this(name, model, available, new AtomicBoolean(false)); + } + + private TestBackend(String name, String model, boolean available, AtomicBoolean availabilityCalled) { + this.name = name; + this.model = model; + this.available = available; + this.availabilityCalled = 
availabilityCalled; + } + + @Override + public LLMResponse chat(String systemPrompt, List<ChatMessage> messages, List<FunctionDefinition> functions) { + throw new UnsupportedOperationException(); + } + + @Override + public String complete(String prompt) { + throw new UnsupportedOperationException(); + } + + @Override + public boolean isAvailable() { + availabilityCalled.set(true); + return available; + } + + @Override + public BackendInfo info() { + return BackendInfo.builder() + .name(name) + .model(model) + .build(); + } + } +} diff --git a/consilens-cli/src/test/java/com/consilens/cli/command/AiExplainCommandTest.java b/consilens-cli/src/test/java/com/consilens/cli/command/AiExplainCommandTest.java new file mode 100644 index 0000000..a17e74d --- /dev/null +++ b/consilens-cli/src/test/java/com/consilens/cli/command/AiExplainCommandTest.java @@ -0,0 +1,55 @@ +package com.consilens.cli.command; + +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.io.TempDir; +import picocli.CommandLine; + +import java.nio.file.Files; +import java.nio.file.Path; + +import static org.junit.jupiter.api.Assertions.assertEquals; + +class AiExplainCommandTest { + + @TempDir + Path tempDir; + + @Test + void shouldExplainGeneratedConfigWithoutResolvingEnvPlaceholders() throws Exception { + Path config = tempDir.resolve("config.yaml"); + Files.writeString(config, + "source:\n" + + " type: mysql\n" + + " connection:\n" + + " url: jdbc:mysql://localhost:3306/source\n" + + " username: ${env.MYSQL_USER}\n" + + " password: ${env.MYSQL_PASSWORD}\n" + + " resource:\n" + + " type: table\n" + + " name: users\n" + + "target:\n" + + " type: postgresql\n" + + " connection:\n" + + " url: jdbc:postgresql://localhost:5432/target\n" + + " username: ${env.PG_USER}\n" + + " password: ${env.PG_PASSWORD}\n" + + " resource:\n" + + " type: table\n" + + " name: users\n" + + "comparison:\n" + + " keys:\n" + + " source: [id]\n" + + " target: [id]\n" + + "strategy:\n" + + " mode: checksum\n" + + " algorithm: xor\n" + + "result:\n" +
+ " sinks:\n" + + " - format: console\n" + + " type: result\n"); + + int exitCode = new CommandLine(new AiExplainCommand()).execute("-c", config.toString()); + + assertEquals(0, exitCode); + } +} diff --git a/consilens-cli/src/test/java/com/consilens/cli/command/AiProvidersCommandTest.java b/consilens-cli/src/test/java/com/consilens/cli/command/AiProvidersCommandTest.java new file mode 100644 index 0000000..1f5f74a --- /dev/null +++ b/consilens-cli/src/test/java/com/consilens/cli/command/AiProvidersCommandTest.java @@ -0,0 +1,73 @@ +package com.consilens.cli.command; + +import org.junit.jupiter.api.Test; +import picocli.CommandLine; + +import java.io.ByteArrayOutputStream; +import java.io.OutputStreamWriter; +import java.io.PrintWriter; +import java.nio.charset.StandardCharsets; +import java.util.Set; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertTrue; + +class AiProvidersCommandTest { + + @Test + void shouldListInjectedProviders() { + ByteArrayOutputStream out = new ByteArrayOutputStream(); + CommandLine commandLine = new CommandLine(new AiProvidersCommand( + () -> Set.of("rulebased"), + () -> Set.of("openai", "deepseek", "noop"))); + commandLine.setOut(new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8), true)); + + int exitCode = commandLine.execute(); + + assertEquals(0, exitCode); + String output = out.toString(StandardCharsets.UTF_8); + assertTrue(output.contains("# AI Providers")); + assertTrue(output.contains("Analyzers:")); + assertTrue(output.contains(" - rulebased")); + assertTrue(output.contains("LLM Backends:")); + assertTrue(output.contains(" - deepseek")); + assertTrue(output.contains(" - noop")); + assertTrue(output.contains(" - openai")); + } + + @Test + void shouldListProvidersAsJson() { + ByteArrayOutputStream out = new ByteArrayOutputStream(); + CommandLine commandLine = new CommandLine(new AiProvidersCommand( + () -> Set.of("rulebased"), + () -> 
Set.of("openai", "deepseek"))); + commandLine.setOut(new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8), true)); + + int exitCode = commandLine.execute("--format", "json"); + + assertEquals(0, exitCode); + assertEquals("{\"analyzers\":[\"rulebased\"],\"llmBackends\":[\"deepseek\",\"openai\"]}\n", + out.toString(StandardCharsets.UTF_8)); + } + + @Test + void shouldRejectUnsupportedFormat() { + ByteArrayOutputStream err = new ByteArrayOutputStream(); + CommandLine commandLine = new CommandLine(new AiProvidersCommand( + () -> Set.of("rulebased"), + () -> Set.of("openai"))); + commandLine.setErr(new PrintWriter(new OutputStreamWriter(err, StandardCharsets.UTF_8), true)); + + int exitCode = commandLine.execute("--format", "xml"); + + assertEquals(2, exitCode); + assertTrue(err.toString(StandardCharsets.UTF_8).contains("Unsupported format: xml")); + } + + @Test + void shouldExecuteWithDiscoveredProviders() { + int exitCode = new CommandLine(new AiProvidersCommand()).execute(); + + assertEquals(0, exitCode); + } +} diff --git a/consilens-cli/src/test/java/com/consilens/cli/config/ExampleConfigurationCompatibilityTest.java b/consilens-cli/src/test/java/com/consilens/cli/config/ExampleConfigurationCompatibilityTest.java index 783184c..7f57043 100644 --- a/consilens-cli/src/test/java/com/consilens/cli/config/ExampleConfigurationCompatibilityTest.java +++ b/consilens-cli/src/test/java/com/consilens/cli/config/ExampleConfigurationCompatibilityTest.java @@ -2,32 +2,26 @@ import com.consilens.cli.model.CliConfiguration; import org.junit.jupiter.params.ParameterizedTest; -import org.junit.jupiter.params.provider.ValueSource; +import org.junit.jupiter.params.provider.MethodSource; +import java.io.IOException; +import java.nio.file.Files; import java.util.HashMap; +import java.util.stream.Collectors; import java.util.Map; import java.nio.file.Path; import java.nio.file.Paths; +import java.util.List; +import java.util.stream.Stream; import static 
org.junit.jupiter.api.Assertions.assertNotNull; class ExampleConfigurationCompatibilityTest { - @ParameterizedTest - @ValueSource(strings = { - "examples/minimal-mysql-to-pg.yaml", - "examples/same-db-mysql-comparison.yaml", - "examples/large-table-mysql-to-starrocks.yaml", - "examples/performance-test-mysql-vs-postgres.yaml", - "examples/performance-test-mysql-vs-starrocks.yaml", - "examples/custom-sql-mysql-vs-postgres-checksum.yaml", - "examples/mysql-to-doris-partitioned-checksum.yaml", - "examples/detail-to-aggregate-custom-sql.yaml", - "examples/performance-test-mysql-vs-postgres.json" - }) - void shouldLoadExampleConfigurations(String relativePath) throws Exception { + @ParameterizedTest(name = "{0}") + @MethodSource("exampleConfigurationPaths") + void shouldLoadExampleConfigurations(Path configPath) throws Exception { ConfigurationManager configurationManager = new ConfigurationManager(testEnvironment()); - Path configPath = Paths.get("..", relativePath).toAbsolutePath().normalize(); CliConfiguration config = configurationManager.loadConfiguration(configPath.toString(), false); @@ -40,6 +34,24 @@ void shouldLoadExampleConfigurations(String relativePath) throws Exception { assertNotNull(config.getComparison().getKeys()); } + private static Stream exampleConfigurationPaths() throws IOException { + Path examplesDirectory = Paths.get("..", "examples").toAbsolutePath().normalize(); + List paths; + try (Stream stream = Files.list(examplesDirectory)) { + paths = stream + .filter(Files::isRegularFile) + .filter(path -> { + String fileName = path.getFileName().toString(); + return fileName.endsWith(".yaml") + || fileName.endsWith(".yml") + || fileName.endsWith(".json"); + }) + .sorted() + .collect(Collectors.toList()); + } + return paths.stream(); + } + private Map testEnvironment() { Map env = new HashMap<>(); env.put("MYSQL_USER", "test_user"); diff --git a/consilens-cli/src/test/java/com/consilens/cli/model/CliConfigurationTest.java 
b/consilens-cli/src/test/java/com/consilens/cli/model/CliConfigurationTest.java new file mode 100644 index 0000000..f3ad5fc --- /dev/null +++ b/consilens-cli/src/test/java/com/consilens/cli/model/CliConfigurationTest.java @@ -0,0 +1,100 @@ +package com.consilens.cli.model; + +import com.consilens.core.validation.ValidationException; +import com.consilens.sink.api.model.ResultConfig; +import com.consilens.sink.api.model.SinkConfig; +import org.junit.jupiter.api.Test; + +import java.util.List; + +import static org.junit.jupiter.api.Assertions.assertDoesNotThrow; +import static org.junit.jupiter.api.Assertions.assertThrows; + +class CliConfigurationTest { + + @Test + void shouldValidateFullConfigurationWithTableSink() { + CliConfiguration config = baseConfig(); + + assertDoesNotThrow(config::validate); + assertDoesNotThrow(config::validateDatabaseConnections); + } + + @Test + void shouldRejectUnsupportedStrategyConfiguration() { + CliConfiguration config = baseConfig(); + config.setStrategy(StrategyConfig.builder() + .mode("local") + .algorithm("concat") + .bisectionFactor(4) + .batchSize(1000) + .maxDifferences(1L) + .build()); + + assertThrows(ValidationException.class, config::validate); + } + + @Test + void shouldRejectTableSinkWithCollidingSanitizedColumns() { + CliConfiguration config = baseConfig(); + SinkConfig tableSink = config.getResult().getSinks().get(0); + tableSink.setProperties("{\"type\":\"mysql\",\"url\":\"jdbc:mysql://localhost:3306/test\",\"columns\":[" + + "{\"name\":\"a-b\",\"value\":\"${operation}\"}," + + "{\"name\":\"a_b\",\"value\":\"${operation}\"}]}"); + + assertThrows(ValidationException.class, config::validate); + } + + @Test + void shouldRejectTableSinkWithUnsupportedDatabaseType() { + CliConfiguration config = baseConfig(); + SinkConfig tableSink = config.getResult().getSinks().get(0); + tableSink.setProperties("{\"type\":\"oracle\",\"url\":\"jdbc:oracle:thin:@localhost:1521/xe\"}"); + + assertThrows(ValidationException.class, 
config::validate); + } + + private CliConfiguration baseConfig() { + return CliConfiguration.builder() + .source(ConnectionConfig.builder() + .type("mysql") + .connection(ConnectionConfig.ConnectorConnectionProperties.builder() + .url("jdbc:mysql://localhost:3306/source") + .username("root") + .password("secret") + .build()) + .resource(ConnectionConfig.ResourceConfig.builder().type("table").name("source_table").build()) + .build()) + .target(ConnectionConfig.builder() + .type("postgresql") + .connection(ConnectionConfig.ConnectorConnectionProperties.builder() + .url("jdbc:postgresql://localhost:5432/target") + .username("root") + .password("secret") + .build()) + .resource(ConnectionConfig.ResourceConfig.builder().type("table").name("target_table").build()) + .build()) + .comparison(ComparisonConfig.builder() + .keys(ListPairConfig.builder().source(List.of("id")).target(List.of("id")).build()) + .build()) + .strategy(StrategyConfig.builder() + .mode("checksum") + .algorithm("concat") + .bisectionFactor(4) + .batchSize(1000) + .maxDifferences(1_000L) + .build()) + .result(ResultConfig.builder() + .sinks(List.of(tableSink())) + .build()) + .build(); + } + + private SinkConfig tableSink() { + SinkConfig sinkConfig = new SinkConfig(); + sinkConfig.setFormat("table"); + sinkConfig.setType("result"); + sinkConfig.setProperties("{\"type\":\"mysql\",\"url\":\"jdbc:mysql://localhost:3306/test\"}"); + return sinkConfig; + } +} diff --git a/consilens-cli/src/test/java/com/consilens/cli/model/ComparisonConfigTest.java b/consilens-cli/src/test/java/com/consilens/cli/model/ComparisonConfigTest.java index fdc87c6..00e3f96 100644 --- a/consilens-cli/src/test/java/com/consilens/cli/model/ComparisonConfigTest.java +++ b/consilens-cli/src/test/java/com/consilens/cli/model/ComparisonConfigTest.java @@ -5,7 +5,9 @@ import java.util.List; +import static org.junit.jupiter.api.Assertions.assertDoesNotThrow; import static org.junit.jupiter.api.Assertions.assertThrows; +import static 
org.junit.jupiter.api.Assertions.assertTrue; class ComparisonConfigTest { @@ -37,4 +39,54 @@ void shouldRejectInvalidMappingExpressionChoices() { assertThrows(ValidationException.class, config::validate); } + + @Test + void shouldRejectMismatchedKeysFieldsFiltersAndDuplicateMappings() { + ComparisonConfig mismatchedKeys = ComparisonConfig.builder() + .keys(ListPairConfig.builder().source(List.of("id")).target(List.of("id", "tenant_id")).build()) + .build(); + assertThrows(ValidationException.class, mismatchedKeys::validate); + + ComparisonConfig mismatchedFields = ComparisonConfig.builder() + .keys(ListPairConfig.builder().source(List.of("id")).target(List.of("id")).build()) + .fields(ListPairConfig.builder().source(List.of("amount")).target(List.of()).build()) + .build(); + assertThrows(ValidationException.class, mismatchedFields::validate); + + ComparisonConfig mismatchedFilters = ComparisonConfig.builder() + .keys(ListPairConfig.builder().source(List.of("id")).target(List.of("id")).build()) + .filters(StringPairConfig.builder().source("id > 1").target(null).build()) + .build(); + assertThrows(ValidationException.class, mismatchedFilters::validate); + + ComparisonConfig duplicateMappings = ComparisonConfig.builder() + .keys(ListPairConfig.builder().source(List.of("id")).target(List.of("id")).build()) + .mappings(List.of( + CompareMappingConfig.builder() + .name("amount") + .source(FieldExpressionConfig.builder().column("amount").build()) + .target(FieldExpressionConfig.builder().column("amount").build()) + .build(), + CompareMappingConfig.builder() + .name("amount") + .source(FieldExpressionConfig.builder().column("discount").build()) + .target(FieldExpressionConfig.builder().column("discount").build()) + .build())) + .build(); + assertThrows(ValidationException.class, duplicateMappings::validate); + } + + @Test + void shouldAcceptWellFormedComparisonConfigWithExcludeAndExtraColumns() { + ComparisonConfig config = ComparisonConfig.builder() + 
.keys(ListPairConfig.builder().source(List.of("id")).target(List.of("id")).build()) + .fields(ListPairConfig.builder().source(List.of("amount")).target(List.of("amount")).build()) + .exclude(ListPairConfig.builder().source(List.of("debug")).target(List.of("debug")).build()) + .extraColumns(List.of("source_only_note")) + .filters(StringPairConfig.builder().source("id > 1").target("id > 1").build()) + .build(); + + assertDoesNotThrow(config::validate); + assertTrue(config.getExtraColumns().contains("source_only_note")); + } } diff --git a/consilens-cli/src/test/java/com/consilens/cli/model/ConnectionConfigTest.java b/consilens-cli/src/test/java/com/consilens/cli/model/ConnectionConfigTest.java index 343ac40..2f7d9ae 100644 --- a/consilens-cli/src/test/java/com/consilens/cli/model/ConnectionConfigTest.java +++ b/consilens-cli/src/test/java/com/consilens/cli/model/ConnectionConfigTest.java @@ -30,6 +30,25 @@ void shouldAllowNonJdbcConnectorValidationWhenConnectionMapIsPresent() { assertDoesNotThrow(() -> config.validate("source")); } + @Test + void shouldNotRequireJdbcValidationBasedOnConnectorType() { + ConnectionConfig.ConnectorConnectionProperties properties = ConnectionConfig.ConnectorConnectionProperties.builder() + .build(); + properties.addProperty("host", "localhost"); + properties.addProperty("database", "orders"); + + ConnectionConfig config = ConnectionConfig.builder() + .type("mysql") + .connection(properties) + .resource(ConnectionConfig.ResourceConfig.builder() + .type("table") + .name("orders") + .build()) + .build(); + + assertDoesNotThrow(() -> config.validate("source")); + } + @Test void shouldExposeConnectionPropertiesFromNestedConnectionBlock() { ConnectionConfig.ConnectorConnectionProperties properties = ConnectionConfig.ConnectorConnectionProperties.builder() @@ -73,4 +92,92 @@ void shouldRejectPathForTableResource() { assertThrows(Exception.class, () -> config.validate("source")); } + + @Test + void shouldValidateSqlResourceShapeAndTrustedSql() { 
+ ConnectionConfig validSelect = ConnectionConfig.builder() + .type("mysql") + .connection(ConnectionConfig.ConnectorConnectionProperties.builder() + .url("jdbc:mysql://localhost:3306/orders") + .username("root") + .password("secret") + .build()) + .resource(ConnectionConfig.ResourceConfig.builder() + .type("sql") + .path("WITH base AS (SELECT id FROM orders) SELECT id FROM base") + .build()) + .build(); + + assertDoesNotThrow(() -> validSelect.validate("source")); + + ConnectionConfig missingPath = ConnectionConfig.builder() + .type("mysql") + .connection(ConnectionConfig.ConnectorConnectionProperties.builder() + .url("jdbc:mysql://localhost:3306/orders") + .username("root") + .password("secret") + .build()) + .resource(ConnectionConfig.ResourceConfig.builder() + .type("sql") + .build()) + .build(); + + assertThrows(Exception.class, () -> missingPath.validate("source")); + + ConnectionConfig unsafeSql = ConnectionConfig.builder() + .type("mysql") + .connection(ConnectionConfig.ConnectorConnectionProperties.builder() + .url("jdbc:mysql://localhost:3306/orders") + .username("root") + .password("secret") + .build()) + .resource(ConnectionConfig.ResourceConfig.builder() + .type("sql") + .path("SELECT id FROM orders; DROP TABLE orders") + .build()) + .build(); + + assertThrows(Exception.class, () -> unsafeSql.validate("source")); + } + + @Test + void shouldRequireJdbcUrlAndUsernameOnlyWhenJdbcUrlIsPresent() { + ConnectionConfig jdbcWithoutUsername = ConnectionConfig.builder() + .type("custom") + .connection(ConnectionConfig.ConnectorConnectionProperties.builder() + .url("jdbc:custom://localhost/db") + .build()) + .resource(ConnectionConfig.ResourceConfig.builder() + .type("table") + .name("orders") + .build()) + .build(); + + assertThrows(Exception.class, () -> jdbcWithoutUsername.validate("source")); + + ConnectionConfig nonJdbcConnection = ConnectionConfig.builder() + .type("custom") + .connection(ConnectionConfig.ConnectorConnectionProperties.builder().build()) + 
.resource(ConnectionConfig.ResourceConfig.builder() + .type("table") + .name("orders") + .build()) + .build(); + nonJdbcConnection.getConnection().addProperty("endpoint", "localhost:9000"); + + assertDoesNotThrow(() -> nonJdbcConnection.validate("source")); + } + + @Test + void shouldRejectMissingConnectionForJdbcResource() { + ConnectionConfig config = ConnectionConfig.builder() + .type("mysql") + .resource(ConnectionConfig.ResourceConfig.builder() + .type("table") + .name("orders") + .build()) + .build(); + + assertThrows(Exception.class, () -> config.validate("source")); + } } diff --git a/consilens-cli/src/test/java/com/consilens/cli/model/StrategyConfigTest.java b/consilens-cli/src/test/java/com/consilens/cli/model/StrategyConfigTest.java new file mode 100644 index 0000000..c1b00ad --- /dev/null +++ b/consilens-cli/src/test/java/com/consilens/cli/model/StrategyConfigTest.java @@ -0,0 +1,73 @@ +package com.consilens.cli.model; + +import com.consilens.core.validation.ValidationException; +import org.junit.jupiter.api.Test; + +import static org.junit.jupiter.api.Assertions.assertDoesNotThrow; +import static org.junit.jupiter.api.Assertions.assertThrows; + +class StrategyConfigTest { + + @Test + void shouldAcceptSupportedModesAlgorithmsAndLocalCompareModes() { + assertDoesNotThrow(() -> StrategyConfig.builder() + .mode("checksum") + .algorithm("concat") + .bisectionFactor(2) + .batchSize(1) + .maxDifferences(1L) + .localCompare(LocalCompareConfig.builder().mode("full").build()) + .build() + .validate()); + + assertDoesNotThrow(() -> StrategyConfig.builder() + .mode("join") + .algorithm("xor") + .bisectionFactor(2) + .batchSize(1) + .maxDifferences(1L) + .localCompare(LocalCompareConfig.builder().mode("row-hash").build()) + .build() + .validate()); + } + + @Test + void shouldRejectUnsupportedModeAlgorithmLocalCompareAndLimits() { + assertThrows(ValidationException.class, () -> StrategyConfig.builder() + .mode("local") + .algorithm("concat") + .bisectionFactor(2) + 
.batchSize(1) + .maxDifferences(1L) + .build() + .validate()); + + assertThrows(ValidationException.class, () -> StrategyConfig.builder() + .mode("checksum") + .algorithm("md5") + .bisectionFactor(2) + .batchSize(1) + .maxDifferences(1L) + .build() + .validate()); + + assertThrows(ValidationException.class, () -> StrategyConfig.builder() + .mode("checksum") + .algorithm("concat") + .bisectionFactor(2) + .batchSize(1) + .maxDifferences(1L) + .localCompare(LocalCompareConfig.builder().mode("sample").build()) + .build() + .validate()); + + assertThrows(ValidationException.class, () -> StrategyConfig.builder() + .mode("checksum") + .algorithm("concat") + .bisectionFactor(2) + .batchSize(1) + .maxDifferences(0L) + .build() + .validate()); + } +} diff --git a/consilens-cli/src/test/java/com/consilens/cli/service/CompareRequestFactoryTest.java b/consilens-cli/src/test/java/com/consilens/cli/service/CompareRequestFactoryTest.java index 2edcac9..8e8b103 100644 --- a/consilens-cli/src/test/java/com/consilens/cli/service/CompareRequestFactoryTest.java +++ b/consilens-cli/src/test/java/com/consilens/cli/service/CompareRequestFactoryTest.java @@ -9,6 +9,7 @@ import com.consilens.cli.model.LocalCompareConfig; import com.consilens.cli.model.StrategyConfig; import com.consilens.cli.model.StringPairConfig; +import com.consilens.connector.api.planner.ComparePlanTypes; import com.consilens.connector.api.planner.CompareRequest; import org.junit.jupiter.api.Test; @@ -60,6 +61,8 @@ void shouldCompileMappingsToSqlResources() { request.getTarget().getResource().getPath()); assertEquals(null, request.getSourceFilter()); assertEquals(null, request.getTargetFilter()); + assertEquals(Boolean.TRUE, request.getExecutionOptions().getValidateUniqueKeys()); + assertEquals(1_000_000L, request.getExecutionOptions().getMaxDifferences()); } @Test @@ -96,6 +99,125 @@ void shouldTreatCustomSqlTablesAsSqlResources() { assertEquals("id > 10", request.getSourceFilter().getExpression()); } + @Test + void 
shouldMapChecksumStrategyToPreferredPlansAndExecutionOptions() { + CliConfiguration config = baseConfig(); + config.setStrategy(StrategyConfig.builder() + .mode("checksum") + .algorithm("xor") + .bisectionFactor(8) + .bisectionThreshold(12_000L) + .batchSize(2000) + .enableProfiling(true) + .maxDifferences(123L) + .localCompare(LocalCompareConfig.builder().mode("row-hash").build()) + .build()); + config.setComparison(ComparisonConfig.builder() + .keys(ListPairConfig.builder().source(List.of("id")).target(List.of("order_id")).build()) + .build()); + + CompareRequest request = factory.create(config); + + assertEquals(List.of( + ComparePlanTypes.PUSHDOWN_CHECKSUM, + ComparePlanTypes.KEY_HASH, + ComparePlanTypes.STREAMING_MERGE), + request.getStrategyPreference().getPreferredPlans()); + assertEquals(Boolean.TRUE, request.getStrategyPreference().getAllowFallback()); + assertEquals("xor", request.getExecutionOptions().getChecksumAlgorithm()); + assertEquals(8, request.getExecutionOptions().getBisectionFactor()); + assertEquals(12_000L, request.getExecutionOptions().getBisectionThreshold()); + assertEquals(Boolean.TRUE, request.getExecutionOptions().getEnableProfiling()); + assertEquals("row-hash", request.getExecutionOptions().getLocalCompareMode()); + assertEquals(Boolean.TRUE, request.getExecutionOptions().getValidateUniqueKeys()); + assertEquals(123L, request.getExecutionOptions().getMaxDifferences()); + assertEquals(123L, request.getExecutionOptions().getAttributes().get("maxDifferences")); + } + + @Test + void shouldMapJoinStrategyToRequiredServerJoinPlan() { + CliConfiguration config = baseConfig(); + config.setStrategy(StrategyConfig.builder() + .mode("join") + .algorithm("concat") + .bisectionFactor(4) + .batchSize(1000) + .maxDifferences(1000L) + .build()); + config.setComparison(ComparisonConfig.builder() + .keys(ListPairConfig.builder().source(List.of("id")).target(List.of("id")).build()) + .build()); + + CompareRequest request = factory.create(config); + + 
assertEquals(List.of(ComparePlanTypes.SERVER_JOIN), request.getStrategyPreference().getPreferredPlans()); + assertEquals(Boolean.FALSE, request.getStrategyPreference().getAllowFallback()); + } + + @Test + void shouldUseBatchSizeWhenBisectionThresholdIsNotConfigured() { + CliConfiguration config = baseConfig(); + config.setStrategy(StrategyConfig.builder() + .mode("checksum") + .algorithm("concat") + .bisectionFactor(4) + .batchSize(321) + .maxDifferences(1000L) + .build()); + config.setComparison(ComparisonConfig.builder() + .keys(ListPairConfig.builder().source(List.of("id")).target(List.of("id")).build()) + .build()); + + CompareRequest request = factory.create(config); + + assertEquals(3210L, request.getExecutionOptions().getBisectionThreshold()); + } + + @Test + void shouldCompileMappingKeysLiteralsAndDisabledCompareFields() { + CliConfiguration config = baseConfig(); + config.setComparison(ComparisonConfig.builder() + .keys(ListPairConfig.builder().source(List.of("order_id")).target(List.of("id")).build()) + .mappings(List.of( + CompareMappingConfig.builder() + .name("logical_id") + .key(true) + .source(FieldExpressionConfig.builder().column("order_id").build()) + .target(FieldExpressionConfig.builder().column("id").build()) + .build(), + CompareMappingConfig.builder() + .name("source_name") + .source(FieldExpressionConfig.builder().column("name").build()) + .target(FieldExpressionConfig.builder().column("full_name").build()) + .build(), + CompareMappingConfig.builder() + .name("constant_flag") + .source(FieldExpressionConfig.builder().literal(true).build()) + .target(FieldExpressionConfig.builder().literal(false).build()) + .build(), + CompareMappingConfig.builder() + .name("debug_only") + .compare(false) + .source(FieldExpressionConfig.builder().column("debug_source").build()) + .target(FieldExpressionConfig.builder().column("debug_target").build()) + .build())) + .exclude(ListPairConfig.builder() + .source(List.of("constant_flag")) + 
.target(List.of("constant_flag")) + .build()) + .build()); + + CompareRequest request = factory.create(config); + + assertEquals(List.of("logical_id"), request.getSourceKeySpec().getFields()); + assertEquals(List.of("source_name"), request.getSourceComparisons().getFields()); + assertEquals(List.of("constant_flag"), request.getSourceComparisons().getExclude()); + assertEquals("SELECT order_id AS logical_id, name AS source_name, debug_source AS debug_only FROM source_table", + request.getSource().getResource().getPath()); + assertEquals("SELECT id AS logical_id, full_name AS source_name, debug_target AS debug_only FROM target_table", + request.getTarget().getResource().getPath()); + } + private CliConfiguration baseConfig() { return CliConfiguration.builder() .source(ConnectionConfig.builder() @@ -121,6 +243,7 @@ private CliConfiguration baseConfig() { .algorithm("xor") .bisectionFactor(4) .bisectionThreshold(1000L) + .maxDifferences(1_000_000L) .localCompare(LocalCompareConfig.builder().mode("full").build()) .build()) .build(); diff --git a/consilens-cli/src/test/java/com/consilens/cli/service/SensitiveValueMaskerTest.java b/consilens-cli/src/test/java/com/consilens/cli/service/SensitiveValueMaskerTest.java new file mode 100644 index 0000000..a965fc3 --- /dev/null +++ b/consilens-cli/src/test/java/com/consilens/cli/service/SensitiveValueMaskerTest.java @@ -0,0 +1,31 @@ +package com.consilens.cli.service; + +import org.junit.jupiter.api.Test; + +import static org.junit.jupiter.api.Assertions.assertEquals; + +class SensitiveValueMaskerTest { + + @Test + void shouldMaskCredentialsInJdbcUrl() { + String masked = SensitiveValueMasker.maskJdbcUrl( + "jdbc:mysql://user:secret@localhost:3306/app?password=secret&useSSL=false&access_token=abc"); + + assertEquals("jdbc:mysql://***@localhost:3306/app?password=***&useSSL=false&access_token=***", masked); + } + + @Test + void shouldMaskSemicolonSeparatedJdbcProperties() { + String masked = SensitiveValueMasker.maskJdbcUrl( + 
"jdbc:sqlserver://localhost:1433;databaseName=app;user=sa;password=secret;encrypt=true"); + + assertEquals("jdbc:sqlserver://localhost:1433;databaseName=app;user=***;password=***;encrypt=true", masked); + } + + @Test + void shouldMaskUsername() { + assertEquals("a***e", SensitiveValueMasker.maskUsername("alice")); + assertEquals("***", SensitiveValueMasker.maskUsername("ab")); + assertEquals("(not set)", SensitiveValueMasker.maskUsername(null)); + } +} diff --git a/consilens-connector/consilens-connector-api/src/main/java/com/consilens/connector/api/SqlQueryGenerator.java b/consilens-connector/consilens-connector-api/src/main/java/com/consilens/connector/api/SqlQueryGenerator.java index d1d6e83..3e73ade 100644 --- a/consilens-connector/consilens-connector-api/src/main/java/com/consilens/connector/api/SqlQueryGenerator.java +++ b/consilens-connector/consilens-connector-api/src/main/java/com/consilens/connector/api/SqlQueryGenerator.java @@ -167,6 +167,13 @@ default String getChecksumSQLFromSql(String fromSql, throw new UnsupportedOperationException("getChecksumSQLFromSql is not implemented"); } + /** + * Returns whether the generator can build checksum SQL for the given algorithm. + */ + default boolean supportsChecksumAlgorithm(ChecksumAlgorithm checksumAlgorithm) { + return checksumAlgorithm == null || !checksumAlgorithm.isXor(); + } + /** * Generate SQL for calculating checksum/hash of table data (backward compatibility). * Uses CONCAT algorithm by default. 
diff --git a/consilens-connector/consilens-connector-api/src/main/java/com/consilens/connector/api/model/DerivedCompareColumns.java b/consilens-connector/consilens-connector-api/src/main/java/com/consilens/connector/api/model/DerivedCompareColumns.java new file mode 100644 index 0000000..5115a55 --- /dev/null +++ b/consilens-connector/consilens-connector-api/src/main/java/com/consilens/connector/api/model/DerivedCompareColumns.java @@ -0,0 +1,21 @@ +package com.consilens.connector.api.model; + +import java.util.Locale; + +public final class DerivedCompareColumns { + + private DerivedCompareColumns() { + } + + public static boolean isDerived(String columnName) { + String normalized = columnName == null ? "" : columnName.trim().toLowerCase(Locale.ROOT); + return "checksum".equals(normalized) + || "row_checksum".equals(normalized) + || "record_checksum".equals(normalized) + || "row_hash".equals(normalized) + || "record_hash".equals(normalized) + || "row_md5".equals(normalized) + || "consilens_checksum".equals(normalized) + || "consilens_row_hash".equals(normalized); + } +} diff --git a/consilens-connector/consilens-connector-api/src/main/java/com/consilens/connector/api/planner/CompareExecutionOptions.java b/consilens-connector/consilens-connector-api/src/main/java/com/consilens/connector/api/planner/CompareExecutionOptions.java index 923c524..bfb464e 100644 --- a/consilens-connector/consilens-connector-api/src/main/java/com/consilens/connector/api/planner/CompareExecutionOptions.java +++ b/consilens-connector/consilens-connector-api/src/main/java/com/consilens/connector/api/planner/CompareExecutionOptions.java @@ -25,5 +25,7 @@ public class CompareExecutionOptions { private Boolean validateUniqueKeys; + private Long maxDifferences; + private Map attributes; } diff --git a/consilens-connector/consilens-connector-plugins/consilens-connector-base/src/main/java/com/consilens/conncetor/base/jdbc/JdbcDatasetHandle.java 
b/consilens-connector/consilens-connector-plugins/consilens-connector-base/src/main/java/com/consilens/conncetor/base/jdbc/JdbcDatasetHandle.java index 8b9b081..b6a57c3 100644 --- a/consilens-connector/consilens-connector-plugins/consilens-connector-base/src/main/java/com/consilens/conncetor/base/jdbc/JdbcDatasetHandle.java +++ b/consilens-connector/consilens-connector-plugins/consilens-connector-base/src/main/java/com/consilens/conncetor/base/jdbc/JdbcDatasetHandle.java @@ -24,6 +24,7 @@ import com.consilens.connector.api.model.ComparisonSpec; import com.consilens.connector.api.model.ConnectorNativeType; import com.consilens.connector.api.model.DataType; +import com.consilens.connector.api.model.DerivedCompareColumns; import com.consilens.connector.api.model.FieldDescriptor; import com.consilens.connector.api.model.ResourceLocator; import com.consilens.connector.api.model.SchemaDescriptor; @@ -233,7 +234,6 @@ private DatasetMetadata createMetadata(String connectorName, ConnectorCapability.SCHEMA_DISCOVERY, ConnectorCapability.FILTER_PUSHDOWN, ConnectorCapability.PROJECTION_PUSHDOWN, - ConnectorCapability.SERVER_SIDE_JOIN, ConnectorCapability.SERVER_SIDE_HASH, ConnectorCapability.ORDERED_SCAN, ConnectorCapability.STREAM_SCAN @@ -660,7 +660,10 @@ private List comparisonColumns(ComparisonSpec comparisons, Set keys = new LinkedHashSet<>(keyColumns); List result = new ArrayList<>(); for (FieldDescriptor field : schemaDescriptor.getFields()) { - if (field.getName() != null && !keys.contains(field.getName()) && !excluded.contains(field.getName())) { + if (field.getName() != null + && !keys.contains(field.getName()) + && !excluded.contains(field.getName()) + && !DerivedCompareColumns.isDerived(field.getName())) { result.add(field.getName()); } } @@ -825,7 +828,7 @@ private Map discoverDorisPartitionAttributes(ResourceLocator res try { Properties properties = buildConnectionProperties(connection, new Properties()); try (Connection jdbcConnection = 
DriverManager.getConnection(requireJdbcUrl(), properties); - PreparedStatement statement = jdbcConnection.prepareStatement("SHOW CREATE TABLE " + tablePath.getFullPath()); + PreparedStatement statement = jdbcConnection.prepareStatement("SHOW CREATE TABLE " + quotedTablePath(tablePath)); ResultSet resultSet = statement.executeQuery()) { if (!resultSet.next()) { return attributes; @@ -855,11 +858,15 @@ private Map discoverDorisPartitionAttributes(ResourceLocator res } } } catch (Exception e) { - log.debug("Failed to discover Doris partition metadata for {}", tablePath.getFullPath(), e); + log.warn("Failed to discover Doris partition metadata for {}", tablePath.getFullPath(), e); } return attributes; } + private String quotedTablePath(TablePath tablePath) { + return dialect.getCapabilityProvider().quote(tablePath.getPathComponents()); + } + private String readString(ResultSet resultSet, String columnLabel, int fallbackIndex) throws SQLException { try { return resultSet.getString(columnLabel); diff --git a/consilens-connector/consilens-connector-plugins/consilens-connector-base/src/test/java/com/consilens/conncetor/base/jdbc/JdbcDatasetHandleTest.java b/consilens-connector/consilens-connector-plugins/consilens-connector-base/src/test/java/com/consilens/conncetor/base/jdbc/JdbcDatasetHandleTest.java index 31d1ce7..e3f57a7 100644 --- a/consilens-connector/consilens-connector-plugins/consilens-connector-base/src/test/java/com/consilens/conncetor/base/jdbc/JdbcDatasetHandleTest.java +++ b/consilens-connector/consilens-connector-plugins/consilens-connector-base/src/test/java/com/consilens/conncetor/base/jdbc/JdbcDatasetHandleTest.java @@ -8,12 +8,12 @@ import org.junit.jupiter.api.Test; import java.lang.reflect.Proxy; -import java.util.EnumSet; import java.util.LinkedHashMap; import java.util.Map; import java.util.Properties; import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertFalse; import static 
org.junit.jupiter.api.Assertions.assertTrue; import static org.junit.jupiter.api.Assertions.assertNull; @@ -66,9 +66,10 @@ void shouldExposeRelationalSupportWithoutLeakingExecutionInputsToMetadata() { readOptions); assertTrue(handle.getSupport(RelationalDatasetSupport.class).isPresent()); + assertTrue(handle.getMetadata().getCapabilities().supports(ConnectorCapability.SERVER_SIDE_HASH)); + assertFalse(handle.getMetadata().getCapabilities().supports(ConnectorCapability.SERVER_SIDE_JOIN)); assertNull(handle.getMetadata().getAttributes().get("readOptions")); assertNull(handle.getMetadata().getAttributes().get("connection")); - assertTrue(handle.getMetadata().getCapabilities().supports(ConnectorCapability.SERVER_SIDE_JOIN)); } @Test @@ -82,6 +83,8 @@ void shouldExposeSqlResourceTypeInMetadata() { ReadOptions.builder().build()); assertEquals("sql", handle.getMetadata().getAttributes().get("resourceType")); + assertTrue(handle.getMetadata().getCapabilities().supports(ConnectorCapability.SERVER_SIDE_HASH)); + assertFalse(handle.getMetadata().getCapabilities().supports(ConnectorCapability.SERVER_SIDE_JOIN)); assertEquals("SELECT id FROM orders", handle.getMetadata().getLogicalName()); } diff --git a/consilens-connector/consilens-connector-plugins/consilens-connector-mysql/src/main/java/com/consilens/connector/mysql/MySQLSqlQueryGenerator.java b/consilens-connector/consilens-connector-plugins/consilens-connector-mysql/src/main/java/com/consilens/connector/mysql/MySQLSqlQueryGenerator.java index 8e73251..5b6eb91 100644 --- a/consilens-connector/consilens-connector-plugins/consilens-connector-mysql/src/main/java/com/consilens/connector/mysql/MySQLSqlQueryGenerator.java +++ b/consilens-connector/consilens-connector-plugins/consilens-connector-mysql/src/main/java/com/consilens/connector/mysql/MySQLSqlQueryGenerator.java @@ -61,6 +61,13 @@ public String getChecksumSQL(String schemaName, String tableName, } } + @Override + public boolean 
supportsChecksumAlgorithm(ChecksumAlgorithm checksumAlgorithm) { + return checksumAlgorithm == null + || checksumAlgorithm == ChecksumAlgorithm.CONCAT + || checksumAlgorithm.isXor(); + } + @Override protected String buildNullSafeNotEquals(String left, String right) { return "NOT (" + left + " <=> " + right + ")"; diff --git a/consilens-connector/consilens-connector-plugins/consilens-connector-oracle/src/main/java/com/consilens/connector/oracle/OracleSqlQueryGenerator.java b/consilens-connector/consilens-connector-plugins/consilens-connector-oracle/src/main/java/com/consilens/connector/oracle/OracleSqlQueryGenerator.java index 6a8783f..7a5f59f 100644 --- a/consilens-connector/consilens-connector-plugins/consilens-connector-oracle/src/main/java/com/consilens/connector/oracle/OracleSqlQueryGenerator.java +++ b/consilens-connector/consilens-connector-plugins/consilens-connector-oracle/src/main/java/com/consilens/connector/oracle/OracleSqlQueryGenerator.java @@ -51,6 +51,13 @@ public String getChecksumSQL(String schemaName, String tableName, } } + @Override + public boolean supportsChecksumAlgorithm(ChecksumAlgorithm checksumAlgorithm) { + return checksumAlgorithm == null + || checksumAlgorithm == ChecksumAlgorithm.CONCAT + || checksumAlgorithm.isXor(); + } + /** * Generate checksum SQL using traditional CONCAT method (backward compatible) */ diff --git a/consilens-connector/consilens-connector-plugins/consilens-connector-postgresql/src/main/java/com/consilens/connector/postgresql/PostgreSQLSqlQueryGenerator.java b/consilens-connector/consilens-connector-plugins/consilens-connector-postgresql/src/main/java/com/consilens/connector/postgresql/PostgreSQLSqlQueryGenerator.java index 6fd8604..2dc27df 100644 --- a/consilens-connector/consilens-connector-plugins/consilens-connector-postgresql/src/main/java/com/consilens/connector/postgresql/PostgreSQLSqlQueryGenerator.java +++ 
b/consilens-connector/consilens-connector-plugins/consilens-connector-postgresql/src/main/java/com/consilens/connector/postgresql/PostgreSQLSqlQueryGenerator.java @@ -64,6 +64,13 @@ public String getChecksumSQL(String schemaName, String tableName, } } + @Override + public boolean supportsChecksumAlgorithm(ChecksumAlgorithm checksumAlgorithm) { + return checksumAlgorithm == null + || checksumAlgorithm == ChecksumAlgorithm.CONCAT + || checksumAlgorithm.isXor(); + } + /** * Generate checksum SQL using traditional CONCAT method (backward compatible) */ diff --git a/consilens-connector/consilens-connector-plugins/consilens-connector-starrocks/src/main/java/com/consilens/connector/starrocks/StarRocksSqlQueryGenerator.java b/consilens-connector/consilens-connector-plugins/consilens-connector-starrocks/src/main/java/com/consilens/connector/starrocks/StarRocksSqlQueryGenerator.java index 594e971..f7dd372 100644 --- a/consilens-connector/consilens-connector-plugins/consilens-connector-starrocks/src/main/java/com/consilens/connector/starrocks/StarRocksSqlQueryGenerator.java +++ b/consilens-connector/consilens-connector-plugins/consilens-connector-starrocks/src/main/java/com/consilens/connector/starrocks/StarRocksSqlQueryGenerator.java @@ -1,5 +1,6 @@ package com.consilens.connector.starrocks; +import com.consilens.common.enums.ChecksumAlgorithm; import com.consilens.connector.api.CapabilityProvider; import com.consilens.connector.api.DataTypeHandler; import com.consilens.conncetor.base.BaseSqlQueryGenerator; @@ -51,6 +52,25 @@ public String getLimitClause(long offset, long limit) { @Override public String getChecksumSQL(String schemaName, String tableName, + List keyColumns, + List columns, + Map columnDataTypes, + String whereClause, + ChecksumAlgorithm checksumAlgorithm) { + if (checksumAlgorithm != null && checksumAlgorithm.isXor()) { + return getChecksumSQLWithXor(schemaName, tableName, keyColumns, columns, columnDataTypes, whereClause); + } + return 
getChecksumSQLWithConcat(schemaName, tableName, keyColumns, columns, columnDataTypes, whereClause); + } + + @Override + public boolean supportsChecksumAlgorithm(ChecksumAlgorithm checksumAlgorithm) { + return checksumAlgorithm == null + || checksumAlgorithm == ChecksumAlgorithm.CONCAT + || checksumAlgorithm.isXor(); + } + + private String getChecksumSQLWithConcat(String schemaName, String tableName, List keyColumns, List columns, Map columnDataTypes, @@ -101,6 +121,65 @@ public String getChecksumSQL(String schemaName, String tableName, return sql.toString(); } + private String getChecksumSQLWithXor(String schemaName, String tableName, + List keyColumns, + List columns, + Map columnDataTypes, + String whereClause) { + StringBuilder sql = new StringBuilder(); + sql.append("SELECT COUNT(*) as row_count, "); + + if (columns.isEmpty()) { + sql.append("'0' as checksum "); + } else { + sql.append("CASE WHEN COUNT(*) = 0 THEN '0' ELSE CONCAT("); + for (int i = 1; i <= 16; i++) { + if (i > 1) { + sql.append(", "); + } + sql.append(buildXorHexDigitExpression(i)); + } + sql.append(") END as checksum "); + sql.append("FROM (SELECT UPPER(SUBSTRING(MD5(CONCAT_WS('|', "); + for (int i = 0; i < columns.size(); i++) { + if (i > 0) { + sql.append(", "); + } + String col = columns.get(i); + DataType dataType = columnDataTypes.get(col); + sql.append(dataTypeHandler.normalizeColumn(col, dataType)); + } + sql.append(")), 1, 16)) as row_hash FROM "); + sql.append(buildRelationRef(schemaName, tableName)); + if (whereClause != null && !whereClause.trim().isEmpty()) { + sql.append(" WHERE ").append(whereClause); + } + sql.append(") AS data"); + } + + return sql.toString(); + } + + private String buildXorHexDigitExpression(int position) { + String nibbleValue = "CAST(CONV(SUBSTRING(row_hash, " + position + ", 1), 16, 10) AS BIGINT)"; + StringBuilder sql = new StringBuilder("UPPER(CONV("); + int[] masks = {1, 2, 4, 8}; + for (int i = 0; i < masks.length; i++) { + if (i > 0) { + sql.append(" + 
"); + } + int mask = masks[i]; + sql.append("MOD(SUM(MOD(FLOOR(") + .append(nibbleValue) + .append(" / ") + .append(mask) + .append("), 2)), 2) * ") + .append(mask); + } + sql.append(", 10, 16))"); + return sql.toString(); + } + @Override public String getRowHashSQL(String schemaName, String tableName, List primaryKeys, diff --git a/consilens-connector/consilens-connector-plugins/consilens-connector-starrocks/src/test/java/com/consilens/connector/starrocks/StarRocksSqlQueryGeneratorTest.java b/consilens-connector/consilens-connector-plugins/consilens-connector-starrocks/src/test/java/com/consilens/connector/starrocks/StarRocksSqlQueryGeneratorTest.java index 30ed99b..2c18d05 100644 --- a/consilens-connector/consilens-connector-plugins/consilens-connector-starrocks/src/test/java/com/consilens/connector/starrocks/StarRocksSqlQueryGeneratorTest.java +++ b/consilens-connector/consilens-connector-plugins/consilens-connector-starrocks/src/test/java/com/consilens/connector/starrocks/StarRocksSqlQueryGeneratorTest.java @@ -1,5 +1,6 @@ package com.consilens.connector.starrocks; +import com.consilens.common.enums.ChecksumAlgorithm; import com.consilens.connector.api.model.DataType; import org.junit.jupiter.api.BeforeEach; import org.junit.jupiter.api.Test; @@ -68,6 +69,28 @@ void testGetChecksumSQL() { assertTrue(sql.contains("ORDER BY")); } + @Test + void testSupportsChecksumAlgorithm() { + assertTrue(generator.supportsChecksumAlgorithm(ChecksumAlgorithm.CONCAT)); + assertTrue(generator.supportsChecksumAlgorithm(ChecksumAlgorithm.XOR)); + } + + @Test + void testGetChecksumSQLWithXor() { + Map types = new HashMap<>(); + types.put("id", DataType.INTEGER); + types.put("name", DataType.VARCHAR); + + String sql = generator.getChecksumSQL("test_db", "test_table", + Arrays.asList("id"), Arrays.asList("id", "name"), types, "id > 10", ChecksumAlgorithm.XOR); + + assertTrue(sql.contains("CASE WHEN COUNT(*) = 0 THEN '0'")); + 
assertTrue(sql.contains("UPPER(SUBSTRING(MD5(CONCAT_WS('|',")); + assertTrue(sql.contains("CONV(SUBSTRING(row_hash, 1, 1), 16, 10)")); + assertTrue(sql.contains("CONCAT(")); + assertTrue(sql.contains("WHERE id > 10")); + } + @Test void testGetFullOuterJoinSQL() { String sql = generator.getFullOuterJoinSQL("table1", "table2", diff --git a/consilens-core/pom.xml b/consilens-core/pom.xml index 1a28aef..a16ccf1 100644 --- a/consilens-core/pom.xml +++ b/consilens-core/pom.xml @@ -34,6 +34,20 @@ consilens-connector-api + + com.consilens + consilens-connector-mysql + ${project.version} + test + + + + com.consilens + consilens-connector-postgresql + ${project.version} + test + + com.fasterxml.jackson.core diff --git a/consilens-core/src/main/java/com/consilens/core/algorithm/ChecksumDiffer.java b/consilens-core/src/main/java/com/consilens/core/algorithm/ChecksumDiffer.java index 1d2f05d..a5e967f 100644 --- a/consilens-core/src/main/java/com/consilens/core/algorithm/ChecksumDiffer.java +++ b/consilens-core/src/main/java/com/consilens/core/algorithm/ChecksumDiffer.java @@ -432,21 +432,19 @@ private CompletableFuture getChecksumWithCache(TableSegment segm /** * Create bounded segment from checksum result. * - * CRITICAL: We need to set both minKey and maxKey to satisfy isBounded() check, - * but the original whereClause will ensure the correct upper bound (using <=). - * The maxKey will generate "< maxKey" in buildWhereClause(), but the whereClause - * will override it with the correct "<= maxKey" condition. + * CRITICAL: Initial bounds come from actual observed boundary rows, so the root segment + * must include the upper bound. Child segments use exclusive upper bounds to avoid overlap, + * except the last child of each split, which inherits this inclusive upper bound.
*/ private TableSegment createBoundedSegment(TableSegment original, ChecksumResult bounds) { if (bounds == null || bounds.getMinKey() == null || bounds.getMaxKey() == null) { return original; } - // Set both minKey and maxKey to satisfy isBounded() requirement - // The original whereClause will be preserved and will provide the correct upper bound return original.toBuilder() .minKey(Optional.of(bounds.getMinKey())) .maxKey(Optional.of(bounds.getMaxKey())) + .upperBoundInclusive(true) .build(); } @@ -1125,6 +1123,7 @@ private void storeDifferences(InfoTreeRecorder infoTreeRecorder, List d if (differences == null || differences.isEmpty()) { return; } + ensureDiffLimit(differences.size()); log.debug("Streaming {} differences for segment: {}", differences.size(), segmentId); infoTreeRecorder.addMetric(segmentId, "differences", differences.size()); diff --git a/consilens-core/src/main/java/com/consilens/core/algorithm/TableDiffer.java b/consilens-core/src/main/java/com/consilens/core/algorithm/TableDiffer.java index e4f7b91..dab11ae 100644 --- a/consilens-core/src/main/java/com/consilens/core/algorithm/TableDiffer.java +++ b/consilens-core/src/main/java/com/consilens/core/algorithm/TableDiffer.java @@ -63,6 +63,17 @@ protected List getCollectedDifferences() { return new ArrayList<>(); } + protected void ensureDiffLimit(long nextDifferenceCount) { + if (diffSink instanceof InMemoryDiffSink) { + long current = ((InMemoryDiffSink) diffSink).size(); + long maxDifferences = config.getMaxDifferences(); + if (current + nextDifferenceCount > maxDifferences) { + throw new IllegalStateException("Diff result exceeds maxDifferences=" + maxDifferences + + ". Increase strategy.maxDifferences or narrow the comparison scope."); + } + } + } + /** * Main entry point for table diffing. 
*/ @@ -262,11 +273,13 @@ public static class DifferConfig { private final ChecksumAlgorithm checksumAlgorithm; private final LocalCompareMode localCompareMode; private final ConcurrencyConfig concurrencyConfig; + private final long maxDifferences; public DifferConfig(int bisectionFactor, long bisectionThreshold, boolean enableProfiling, ChecksumAlgorithm checksumAlgorithm, LocalCompareMode localCompareMode, - ConcurrencyConfig concurrencyConfig) { + ConcurrencyConfig concurrencyConfig, + long maxDifferences) { this.bisectionFactor = bisectionFactor; this.bisectionThreshold = bisectionThreshold; this.enableProfiling = enableProfiling; @@ -274,6 +287,15 @@ public DifferConfig(int bisectionFactor, long bisectionThreshold, this.checksumAlgorithm = checksumAlgorithm != null ? checksumAlgorithm : ChecksumAlgorithm.CONCAT; this.localCompareMode = localCompareMode != null ? localCompareMode : LocalCompareMode.FULL; this.concurrencyConfig = concurrencyConfig != null ? concurrencyConfig : ConcurrencyConfig.defaultConfig(); + this.maxDifferences = maxDifferences > 0 ? 
maxDifferences : 1_000_000L; + } + + public DifferConfig(int bisectionFactor, long bisectionThreshold, + boolean enableProfiling, ChecksumAlgorithm checksumAlgorithm, + LocalCompareMode localCompareMode, + ConcurrencyConfig concurrencyConfig) { + this(bisectionFactor, bisectionThreshold, enableProfiling, + checksumAlgorithm, localCompareMode, concurrencyConfig, 1_000_000L); } public DifferConfig(int bisectionFactor, long bisectionThreshold, diff --git a/consilens-core/src/main/java/com/consilens/core/compare/CompareExecutionSettings.java b/consilens-core/src/main/java/com/consilens/core/compare/CompareExecutionSettings.java index cdbe52e..31c2eb5 100644 --- a/consilens-core/src/main/java/com/consilens/core/compare/CompareExecutionSettings.java +++ b/consilens-core/src/main/java/com/consilens/core/compare/CompareExecutionSettings.java @@ -17,6 +17,7 @@ public class CompareExecutionSettings { private static final int DEFAULT_BISECTION_FACTOR = 32; private static final long DEFAULT_BISECTION_THRESHOLD = 16384L; + private static final long DEFAULT_MAX_DIFFERENCES = 1_000_000L; private final int bisectionFactor; private final long bisectionThreshold; @@ -25,6 +26,8 @@ public class CompareExecutionSettings { private final LocalCompareMode localCompareMode; private final ConcurrencyConfig concurrencyConfig; private final boolean validateUniqueKeys; + @Builder.Default + private final long maxDifferences = DEFAULT_MAX_DIFFERENCES; public static CompareExecutionSettings fromRequest(CompareRequest request) { CompareExecutionOptions executionOptions = request != null ? request.getExecutionOptions() : null; @@ -45,7 +48,8 @@ public static CompareExecutionSettings fromRequest(CompareRequest request) { ? 
LocalCompareMode.fromString(executionOptions.getLocalCompareMode()) : LocalCompareMode.FULL) .concurrencyConfig(resolveConcurrencyConfig(attributes)) - .validateUniqueKeys(executionOptions != null && Boolean.TRUE.equals(executionOptions.getValidateUniqueKeys())) + .validateUniqueKeys(executionOptions == null || !Boolean.FALSE.equals(executionOptions.getValidateUniqueKeys())) + .maxDifferences(resolveMaxDifferences(executionOptions, attributes)) .build(); } @@ -56,7 +60,24 @@ public TableDiffer.DifferConfig toDifferConfig() { enableProfiling, checksumAlgorithm, localCompareMode, - concurrencyConfig); + concurrencyConfig, + maxDifferences); + } + + public CompareExecutionSettings withChecksumAlgorithm(ChecksumAlgorithm checksumAlgorithm) { + if (this.checksumAlgorithm == checksumAlgorithm) { + return this; + } + return CompareExecutionSettings.builder() + .bisectionFactor(bisectionFactor) + .bisectionThreshold(bisectionThreshold) + .enableProfiling(enableProfiling) + .checksumAlgorithm(checksumAlgorithm) + .localCompareMode(localCompareMode) + .concurrencyConfig(concurrencyConfig) + .validateUniqueKeys(validateUniqueKeys) + .maxDifferences(maxDifferences) + .build(); } private static ConcurrencyConfig resolveConcurrencyConfig(Map attributes) { @@ -68,4 +89,21 @@ private static ConcurrencyConfig resolveConcurrencyConfig(Map at } return ConcurrencyConfig.defaultConfig(); } + + private static long resolveMaxDifferences(CompareExecutionOptions executionOptions, Map attributes) { + if (executionOptions != null && executionOptions.getMaxDifferences() != null) { + return Math.max(1L, executionOptions.getMaxDifferences()); + } + if (attributes == null) { + return DEFAULT_MAX_DIFFERENCES; + } + Object value = attributes.get("maxDifferences"); + if (value instanceof Number) { + return Math.max(1L, ((Number) value).longValue()); + } + if (value instanceof String && !((String) value).trim().isEmpty()) { + return Math.max(1L, Long.parseLong(((String) value).trim())); + } + return 
DEFAULT_MAX_DIFFERENCES; + } } diff --git a/consilens-core/src/main/java/com/consilens/core/compare/DefaultComparePlanner.java b/consilens-core/src/main/java/com/consilens/core/compare/DefaultComparePlanner.java index aac8c7e..7de512d 100644 --- a/consilens-core/src/main/java/com/consilens/core/compare/DefaultComparePlanner.java +++ b/consilens-core/src/main/java/com/consilens/core/compare/DefaultComparePlanner.java @@ -4,6 +4,7 @@ import com.consilens.connector.api.capability.ConnectorCapability; import com.consilens.connector.api.dataset.DatasetHandle; import com.consilens.connector.api.dataset.DatasetMetadata; +import com.consilens.connector.api.dataset.RelationalDatasetSupport; import com.consilens.connector.api.planner.ComparePlanTypes; import com.consilens.connector.api.planner.CompareRequest; import com.consilens.connector.api.planner.CompareStrategyPreference; @@ -24,12 +25,14 @@ public ComparePlan plan(CompareRequest request, DatasetHandle source, DatasetHan CapabilitySet targetCapabilities = getCapabilities(target); List availablePlans = new ArrayList<>(); - if (sourceCapabilities.supports(ConnectorCapability.SERVER_SIDE_HASH) + if (supportsRelationalExecution(source, target) + && sourceCapabilities.supports(ConnectorCapability.SERVER_SIDE_HASH) && targetCapabilities.supports(ConnectorCapability.SERVER_SIDE_HASH)) { availablePlans.add(new PushdownChecksumPlan(executionSettings)); } if (!hasSqlResource(source, target) + && supportsRelationalExecution(source, target) && sameExecutionDomain(source, target) && sourceCapabilities.supports(ConnectorCapability.SERVER_SIDE_JOIN) && targetCapabilities.supports(ConnectorCapability.SERVER_SIDE_JOIN)) { @@ -76,6 +79,12 @@ private boolean supportsStreaming(DatasetHandle datasetHandle) { return datasetHandle != null && datasetHandle.getRecordScanner().isPresent(); } + private boolean supportsRelationalExecution(DatasetHandle source, DatasetHandle target) { + return source != null && target != null + && 
source.getSupport(RelationalDatasetSupport.class).isPresent() + && target.getSupport(RelationalDatasetSupport.class).isPresent(); + } + private boolean hasSqlResource(DatasetHandle source, DatasetHandle target) { return isSqlResource(source) || isSqlResource(target); } diff --git a/consilens-core/src/main/java/com/consilens/core/compare/executor/ChecksumPlanExecutor.java b/consilens-core/src/main/java/com/consilens/core/compare/executor/ChecksumPlanExecutor.java index 923378e..05c48a9 100644 --- a/consilens-core/src/main/java/com/consilens/core/compare/executor/ChecksumPlanExecutor.java +++ b/consilens-core/src/main/java/com/consilens/core/compare/executor/ChecksumPlanExecutor.java @@ -1,5 +1,8 @@ package com.consilens.core.compare.executor; +import com.consilens.common.enums.ChecksumAlgorithm; +import com.consilens.connector.api.SqlQueryGenerator; +import com.consilens.connector.api.dataset.RelationalDatasetSupport; import com.consilens.connector.api.planner.CompareRequest; import com.consilens.connector.api.planner.CompareSegment; import com.consilens.core.algorithm.ChecksumDiffer; @@ -9,7 +12,11 @@ import com.consilens.core.compare.plan.PushdownChecksumPlan; import com.consilens.core.compare.relational.RelationalCompareSegmentAdapter; import com.consilens.core.diff.DiffResult; +import lombok.extern.slf4j.Slf4j; +import java.util.Optional; + +@Slf4j public class ChecksumPlanExecutor implements PlanExecutor { @Override @@ -19,11 +26,12 @@ public boolean supports(ComparePlan plan) { @Override public DiffResult execute(ComparePlan plan, CompareRequest request, CompareSegment source, CompareSegment target) throws Exception { - CompareExecutionSettings executionSettings = resolveExecutionSettings(plan, request); + CompareExecutionSettings executionSettings = resolveEffectiveExecutionSettings( + resolveExecutionSettings(plan, request), source, target); try (RelationalCompareSegmentAdapter.PreparedTableSegment sourceSegment = - 
RelationalCompareSegmentAdapter.toTableSegment(source, executionSettings); + RelationalCompareSegmentAdapter.toTableSegment(source, executionSettings); RelationalCompareSegmentAdapter.PreparedTableSegment targetSegment = - RelationalCompareSegmentAdapter.toTableSegment(target, executionSettings); + RelationalCompareSegmentAdapter.toTableSegment(target, executionSettings); ChecksumDiffer differ = new ChecksumDiffer(executionSettings.toDifferConfig())) { return differ.diffTables(sourceSegment.getTableSegment(), targetSegment.getTableSegment()).get(); } @@ -35,4 +43,42 @@ private CompareExecutionSettings resolveExecutionSettings(ComparePlan plan, Comp } return CompareExecutionSettings.fromRequest(request); } + + static CompareExecutionSettings resolveEffectiveExecutionSettings(CompareExecutionSettings executionSettings, + CompareSegment source, + CompareSegment target) { + if (executionSettings == null || executionSettings.getChecksumAlgorithm() == null + || !executionSettings.getChecksumAlgorithm().isXor()) { + return executionSettings; + } + + Optional sourceSupport = resolveRelationalSupport(source); + Optional targetSupport = resolveRelationalSupport(target); + if (sourceSupport.isEmpty() || targetSupport.isEmpty()) { + return executionSettings; + } + + SqlQueryGenerator sourceGenerator = sourceSupport.get().getDialect().getSqlQueryGenerator(); + SqlQueryGenerator targetGenerator = targetSupport.get().getDialect().getSqlQueryGenerator(); + boolean sourceSupportsRequested = sourceGenerator.supportsChecksumAlgorithm(executionSettings.getChecksumAlgorithm()); + boolean targetSupportsRequested = targetGenerator.supportsChecksumAlgorithm(executionSettings.getChecksumAlgorithm()); + if (sourceSupportsRequested && targetSupportsRequested) { + return executionSettings; + } + + log.warn("Checksum algorithm {} is not supported by both sides (source: {} via {}, target: {} via {}). 
" + + "Downgrading to CONCAT for a consistent pushdown checksum comparison.", + executionSettings.getChecksumAlgorithm(), + sourceSupport.get().getName(), + sourceGenerator.getClass().getSimpleName(), + targetSupport.get().getName(), + targetGenerator.getClass().getSimpleName()); + return executionSettings.withChecksumAlgorithm(ChecksumAlgorithm.CONCAT); + } + + private static Optional resolveRelationalSupport(CompareSegment segment) { + return segment != null && segment.getDataset() != null + ? segment.getDataset().getSupport(RelationalDatasetSupport.class) + : Optional.empty(); + } } diff --git a/consilens-core/src/main/java/com/consilens/core/compare/executor/ConnectorRecordDiffer.java b/consilens-core/src/main/java/com/consilens/core/compare/executor/ConnectorRecordDiffer.java index bccd962..b513468 100644 --- a/consilens-core/src/main/java/com/consilens/core/compare/executor/ConnectorRecordDiffer.java +++ b/consilens-core/src/main/java/com/consilens/core/compare/executor/ConnectorRecordDiffer.java @@ -5,6 +5,7 @@ import com.consilens.connector.api.dataset.DatasetHandle; import com.consilens.connector.api.model.ComparisonSpec; import com.consilens.connector.api.model.DataType; +import com.consilens.connector.api.model.DerivedCompareColumns; import com.consilens.connector.api.model.FieldDescriptor; import com.consilens.connector.api.model.KeySpec; import com.consilens.connector.api.model.ResourceLocator; @@ -61,13 +62,13 @@ DiffResult diff(CompareSegment source, CompareSegment target, CompareExecutionSe while (sourceCursor.hasGroup() || targetCursor.hasGroup()) { if (!sourceCursor.hasGroup()) { for (RecordRow row : targetCursor.currentRows()) { - differences.add(DiffRow.added(targetCursor.currentKey().displayParts(), row.values, targetData.columns)); + addDifference(differences, DiffRow.added(targetCursor.currentKey().displayParts(), row.values, targetData.columns), settings); sourceMissingCount++; } targetCursor.advanceGroup(); } else if 
(!targetCursor.hasGroup()) { for (RecordRow row : sourceCursor.currentRows()) { - differences.add(DiffRow.removed(sourceCursor.currentKey().displayParts(), row.values, sourceData.columns)); + addDifference(differences, DiffRow.removed(sourceCursor.currentKey().displayParts(), row.values, sourceData.columns), settings); targetMissingCount++; } sourceCursor.advanceGroup(); @@ -75,13 +76,13 @@ DiffResult diff(CompareSegment source, CompareSegment target, CompareExecutionSe int keyComparison = sourceCursor.currentKey().compareTo(targetCursor.currentKey()); if (keyComparison < 0) { for (RecordRow row : sourceCursor.currentRows()) { - differences.add(DiffRow.removed(sourceCursor.currentKey().displayParts(), row.values, sourceData.columns)); + addDifference(differences, DiffRow.removed(sourceCursor.currentKey().displayParts(), row.values, sourceData.columns), settings); targetMissingCount++; } sourceCursor.advanceGroup(); } else if (keyComparison > 0) { for (RecordRow row : targetCursor.currentRows()) { - differences.add(DiffRow.added(targetCursor.currentKey().displayParts(), row.values, targetData.columns)); + addDifference(differences, DiffRow.added(targetCursor.currentKey().displayParts(), row.values, targetData.columns), settings); sourceMissingCount++; } targetCursor.advanceGroup(); @@ -97,25 +98,25 @@ DiffResult diff(CompareSegment source, CompareSegment target, CompareExecutionSe if (sourceRows.size() != 1 || targetRows.size() != 1) { for (RecordRow row : sourceRows) { - differences.add(DiffRow.removed(key.displayParts(), row.values, sourceData.columns)); + addDifference(differences, DiffRow.removed(key.displayParts(), row.values, sourceData.columns), settings); targetMissingCount++; } for (RecordRow row : targetRows) { - differences.add(DiffRow.added(key.displayParts(), row.values, targetData.columns)); + addDifference(differences, DiffRow.added(key.displayParts(), row.values, targetData.columns), settings); sourceMissingCount++; } } else { RecordRow sourceRow = 
sourceRows.get(0); RecordRow targetRow = targetRows.get(0); if (!sourceRow.normalizedValues.equals(targetRow.normalizedValues)) { - differences.add(DiffRow.modified( + addDifference(differences, DiffRow.modified( key.displayParts(), sourceRow.values, targetRow.values, sourceData.columns, targetData.columns, changedColumns(sourceData.columns, sourceRow.normalizedValues, targetRow.normalizedValues), - changedColumns(targetData.columns, sourceRow.normalizedValues, targetRow.normalizedValues))); + changedColumns(targetData.columns, sourceRow.normalizedValues, targetRow.normalizedValues)), settings); mismatchCount++; } } @@ -167,6 +168,15 @@ DiffResult diff(CompareSegment source, CompareSegment target, CompareExecutionSe .build(); } + private void addDifference(List differences, DiffRow diffRow, CompareExecutionSettings settings) { + long maxDifferences = settings != null ? settings.getMaxDifferences() : 1_000_000L; + if (differences.size() >= maxDifferences) { + throw new ConnectorException("Diff result exceeds maxDifferences=" + maxDifferences + + ". Increase strategy.maxDifferences or narrow the comparison scope."); + } + differences.add(diffRow); + } + DiffResult empty(CompareSegment source, CompareSegment target, long sourceRows, long targetRows) { return DiffResult.builder() .differences(List.of()) @@ -251,7 +261,10 @@ private List columns(KeySpec keySpec, ComparisonSpec comparisons, Schema ? 
new LinkedHashSet<>(keySpec.getFields()) : Set.of(); for (FieldDescriptor field : schema.getFields()) { - if (field.getName() != null && !keys.contains(field.getName()) && !excluded.contains(field.getName())) { + if (field.getName() != null + && !keys.contains(field.getName()) + && !excluded.contains(field.getName()) + && !DerivedCompareColumns.isDerived(field.getName())) { result.add(field.getName()); } } diff --git a/consilens-core/src/main/java/com/consilens/core/compare/relational/RelationalCompareSegmentAdapter.java b/consilens-core/src/main/java/com/consilens/core/compare/relational/RelationalCompareSegmentAdapter.java index bdc8a27..8ac770a 100644 --- a/consilens-core/src/main/java/com/consilens/core/compare/relational/RelationalCompareSegmentAdapter.java +++ b/consilens-core/src/main/java/com/consilens/core/compare/relational/RelationalCompareSegmentAdapter.java @@ -3,6 +3,7 @@ import com.consilens.connector.api.ConnectorException; import com.consilens.connector.api.dataset.RelationalDatasetSupport; import com.consilens.connector.api.model.ComparisonSpec; +import com.consilens.connector.api.model.DerivedCompareColumns; import com.consilens.connector.api.model.FieldDescriptor; import com.consilens.connector.api.model.KeySpec; import com.consilens.connector.api.model.PoolConfiguration; @@ -174,7 +175,10 @@ private static List resolveComparisonColumns(ComparisonSpec comparisons, Set keySet = new LinkedHashSet<>(keyColumns); return schema.getFields().stream() .map(FieldDescriptor::getName) - .filter(name -> name != null && !keySet.contains(name) && !excluded.contains(name)) + .filter(name -> name != null + && !keySet.contains(name) + && !excluded.contains(name) + && !DerivedCompareColumns.isDerived(name)) .collect(Collectors.toCollection(ArrayList::new)); } diff --git a/consilens-core/src/main/java/com/consilens/core/database/adpter/AbstractDatabaseAdapter.java b/consilens-core/src/main/java/com/consilens/core/database/adpter/AbstractDatabaseAdapter.java index 
5e2f13f..b5b77d0 100644 --- a/consilens-core/src/main/java/com/consilens/core/database/adpter/AbstractDatabaseAdapter.java +++ b/consilens-core/src/main/java/com/consilens/core/database/adpter/AbstractDatabaseAdapter.java @@ -570,16 +570,9 @@ private List getMinMaxKey(TableSegment segment, boolean getMin) { return null; } - String whereClause = segment.buildWhereClause(); - String sql = segment.hasRelationSource() - ? dialect.getSqlQueryGenerator().getMinMaxKeySQLFromSql( - segment.getRelationFromSql(), segment.getKeyColumns(), getMin, whereClause) - : dialect.getSqlQueryGenerator().getMinMaxKeySQL( - segment.getTablePath().getSchema().orElse(null), - segment.getTablePath().getTableName(), - segment.getKeyColumns(), - getMin, - whereClause); + String sql = segment.getKeyColumns().size() > 1 + ? buildCompositeBoundaryKeyQuery(segment, getMin) + : buildSingleColumnBoundaryKeyQuery(segment, getMin); try { List results = query(sql, Object[].class); @@ -609,6 +602,61 @@ private List getMinMaxKey(TableSegment segment, boolean getMin) { return null; } + private String buildSingleColumnBoundaryKeyQuery(TableSegment segment, boolean getMin) { + String whereClause = segment.buildWhereClause(); + return segment.hasRelationSource() + ? 
dialect.getSqlQueryGenerator().getMinMaxKeySQLFromSql( + segment.getRelationFromSql(), segment.getKeyColumns(), getMin, whereClause) + : dialect.getSqlQueryGenerator().getMinMaxKeySQL( + segment.getTablePath().getSchema().orElse(null), + segment.getTablePath().getTableName(), + segment.getKeyColumns(), + getMin, + whereClause); + } + + private String buildCompositeBoundaryKeyQuery(TableSegment segment, boolean getMin) { + StringBuilder sql = new StringBuilder(); + List keyColumns = segment.getKeyColumns(); + String whereClause = segment.buildWhereClause(); + + sql.append("SELECT "); + sql.append(String.join(", ", keyColumns.stream().map(this::quoteColumn).toArray(String[]::new))); + sql.append(" FROM "); + sql.append(resolveRelationRef(segment)); + + if (whereClause != null && !whereClause.trim().isEmpty()) { + sql.append(" WHERE ").append(whereClause); + } + + String direction = getMin ? "ASC" : "DESC"; + sql.append(" ORDER BY "); + sql.append(String.join(", ", + keyColumns.stream() + .map(column -> quoteColumn(column) + " " + direction) + .toArray(String[]::new))); + sql.append(" ").append(dialect.getSqlQueryGenerator().getLimitClause(1)); + return sql.toString(); + } + + private String resolveRelationRef(TableSegment segment) { + if (segment.hasRelationSource()) { + return "(" + stripTrailingSemicolon(segment.getRelationFromSql()) + ") consilens_sql_source"; + } + return String.join(".", + segment.getTablePath().getPathComponents().stream() + .map(this::quoteColumn) + .toArray(String[]::new)); + } + + private String stripTrailingSemicolon(String sql) { + String normalized = sql != null ? 
sql.trim() : ""; + while (normalized.endsWith(";")) { + normalized = normalized.substring(0, normalized.length() - 1).trim(); + } + return normalized; + } + @Override public List querySegment(TableSegment segment) { try { diff --git a/consilens-core/src/main/java/com/consilens/core/diff/InMemoryDiffSink.java b/consilens-core/src/main/java/com/consilens/core/diff/InMemoryDiffSink.java index f399463..a03c9a8 100644 --- a/consilens-core/src/main/java/com/consilens/core/diff/InMemoryDiffSink.java +++ b/consilens-core/src/main/java/com/consilens/core/diff/InMemoryDiffSink.java @@ -21,4 +21,8 @@ public void onDiffRow(DiffRow diffRow) { public List getDifferences() { return new ArrayList<>(differences); } + + public int size() { + return differences.size(); + } } diff --git a/consilens-core/src/main/java/com/consilens/core/segment/TableSegment.java b/consilens-core/src/main/java/com/consilens/core/segment/TableSegment.java index a7b29d4..0a45e9a 100644 --- a/consilens-core/src/main/java/com/consilens/core/segment/TableSegment.java +++ b/consilens-core/src/main/java/com/consilens/core/segment/TableSegment.java @@ -77,6 +77,9 @@ public class TableSegment { @Builder.Default private int segmentIndex = 0; + @Builder.Default + private boolean upperBoundInclusive = false; + private boolean caseSensitive; // Runtime data @@ -244,6 +247,7 @@ public List segmentByCheckpoints(List> checkpoints) { TableSegment segment = toBuilder() .minKey(Optional.of(previousPoint)) .maxKey(Optional.of(checkpoint)) + .upperBoundInclusive(false) .build(); segments.add(segment); previousPoint = checkpoint; @@ -253,6 +257,7 @@ public List segmentByCheckpoints(List> checkpoints) { TableSegment finalSegment = toBuilder() .minKey(Optional.of(previousPoint)) .maxKey(maxKey) + .upperBoundInclusive(upperBoundInclusive) .build(); segments.add(finalSegment); @@ -357,7 +362,10 @@ public String buildWhereClause() { if (whereBuilder.length() > 0) { whereBuilder.append(" AND "); } - 
whereBuilder.append(String.format("%s < %s", column, formatValue(maxValue))); + whereBuilder.append(String.format("%s %s %s", + column, + upperBoundInclusive ? "<=" : "<", + formatValue(maxValue))); } } } else { @@ -411,9 +419,11 @@ public String buildWhereClause() { keyColumns.get(j), formatValue(maxValues.get(j)))); } - // Add comparison for current key (always use <) - upperBound.append(String.format("%s < %s", - keyColumns.get(i), formatValue(maxValues.get(i)))); + // Add comparison for current key. + upperBound.append(String.format("%s %s %s", + keyColumns.get(i), + (upperBoundInclusive && i == keyColumns.size() - 1) ? "<=" : "<", + formatValue(maxValues.get(i)))); if (i > 0) { upperBound.append(")"); diff --git a/consilens-core/src/test/java/com/consilens/core/compare/CompareExecutionSettingsTest.java b/consilens-core/src/test/java/com/consilens/core/compare/CompareExecutionSettingsTest.java new file mode 100644 index 0000000..587b4d9 --- /dev/null +++ b/consilens-core/src/test/java/com/consilens/core/compare/CompareExecutionSettingsTest.java @@ -0,0 +1,87 @@ +package com.consilens.core.compare; + +import com.consilens.common.enums.ChecksumAlgorithm; +import com.consilens.common.enums.LocalCompareMode; +import com.consilens.connector.api.planner.CompareExecutionOptions; +import com.consilens.connector.api.planner.CompareRequest; +import com.consilens.core.thread.ConcurrencyConfig; +import org.junit.jupiter.api.Test; + +import java.util.Map; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertFalse; +import static org.junit.jupiter.api.Assertions.assertNotNull; +import static org.junit.jupiter.api.Assertions.assertSame; +import static org.junit.jupiter.api.Assertions.assertTrue; + +class CompareExecutionSettingsTest { + + @Test + void shouldUseProductionDefaultsWhenRequestHasNoExecutionOptions() { + CompareExecutionSettings settings = CompareExecutionSettings.fromRequest(null); + + 
assertEquals(32, settings.getBisectionFactor()); + assertEquals(16_384L, settings.getBisectionThreshold()); + assertFalse(settings.isEnableProfiling()); + assertEquals(ChecksumAlgorithm.CONCAT, settings.getChecksumAlgorithm()); + assertEquals(LocalCompareMode.FULL, settings.getLocalCompareMode()); + assertTrue(settings.isValidateUniqueKeys()); + assertEquals(1_000_000L, settings.getMaxDifferences()); + assertNotNull(settings.getConcurrencyConfig()); + } + + @Test + void shouldResolveExecutionOptionsAndCustomConcurrencyConfig() { + ConcurrencyConfig concurrencyConfig = new ConcurrencyConfig( + new ConcurrencyConfig.PoolConfig(2, 4, 16, 30L, "test-io-"), + new ConcurrencyConfig.PoolConfig(1, 2, 16, 30L, "test-cpu-")); + CompareRequest request = CompareRequest.builder() + .executionOptions(CompareExecutionOptions.builder() + .bisectionFactor(8) + .bisectionThreshold(512L) + .enableProfiling(true) + .checksumAlgorithm("xor") + .localCompareMode("row-hash") + .validateUniqueKeys(false) + .maxDifferences(25L) + .attributes(Map.of("concurrencyConfig", concurrencyConfig)) + .build()) + .build(); + + CompareExecutionSettings settings = CompareExecutionSettings.fromRequest(request); + + assertEquals(8, settings.getBisectionFactor()); + assertEquals(512L, settings.getBisectionThreshold()); + assertTrue(settings.isEnableProfiling()); + assertEquals(ChecksumAlgorithm.XOR, settings.getChecksumAlgorithm()); + assertEquals(LocalCompareMode.ROW_HASH, settings.getLocalCompareMode()); + assertFalse(settings.isValidateUniqueKeys()); + assertEquals(25L, settings.getMaxDifferences()); + assertSame(concurrencyConfig, settings.getConcurrencyConfig()); + } + + @Test + void shouldResolveMaxDifferencesFromAttributesAndClampInvalidLimits() { + CompareRequest stringAttributeRequest = CompareRequest.builder() + .executionOptions(CompareExecutionOptions.builder() + .attributes(Map.of("maxDifferences", " 42 ")) + .build()) + .build(); + CompareRequest numberAttributeRequest = 
CompareRequest.builder() + .executionOptions(CompareExecutionOptions.builder() + .attributes(Map.of("maxDifferences", 7)) + .build()) + .build(); + CompareRequest explicitInvalidRequest = CompareRequest.builder() + .executionOptions(CompareExecutionOptions.builder() + .maxDifferences(0L) + .attributes(Map.of("maxDifferences", 42L)) + .build()) + .build(); + + assertEquals(42L, CompareExecutionSettings.fromRequest(stringAttributeRequest).getMaxDifferences()); + assertEquals(7L, CompareExecutionSettings.fromRequest(numberAttributeRequest).getMaxDifferences()); + assertEquals(1L, CompareExecutionSettings.fromRequest(explicitInvalidRequest).getMaxDifferences()); + } +} diff --git a/consilens-core/src/test/java/com/consilens/core/compare/DefaultComparePlannerTest.java b/consilens-core/src/test/java/com/consilens/core/compare/DefaultComparePlannerTest.java index 3eb7b84..893b551 100644 --- a/consilens-core/src/test/java/com/consilens/core/compare/DefaultComparePlannerTest.java +++ b/consilens-core/src/test/java/com/consilens/core/compare/DefaultComparePlannerTest.java @@ -8,10 +8,13 @@ import com.consilens.connector.api.dataset.HashProvider; import com.consilens.connector.api.dataset.KeyLookupProvider; import com.consilens.connector.api.dataset.RecordScanner; +import com.consilens.connector.api.dataset.RelationalDatasetSupport; import com.consilens.connector.api.dataset.SnapshotProvider; import com.consilens.connector.api.dataset.SplitPlanner; +import com.consilens.connector.api.DatabaseDialect; import com.consilens.connector.api.model.ResourceLocator; import com.consilens.connector.api.model.SchemaDescriptor; +import com.consilens.connector.api.model.TablePath; import com.consilens.connector.api.planner.CompareExecutionOptions; import com.consilens.connector.api.planner.ComparePlanTypes; import com.consilens.connector.api.planner.CompareRequest; @@ -21,6 +24,8 @@ import java.util.EnumSet; import java.util.Map; import java.util.Optional; +import java.sql.Connection; 
+import java.sql.SQLException; import static org.junit.jupiter.api.Assertions.assertEquals; import static org.junit.jupiter.api.Assertions.assertThrows; @@ -42,10 +47,10 @@ void shouldRespectPreferredJoinPlanWhenAvailable() { .build()) .build(); - DatasetHandle source = dataset("shared", capabilities( + DatasetHandle source = relationalDataset("shared", capabilities( ConnectorCapability.SERVER_SIDE_JOIN, ConnectorCapability.SERVER_SIDE_HASH)); - DatasetHandle target = dataset("shared", capabilities( + DatasetHandle target = relationalDataset("shared", capabilities( ConnectorCapability.SERVER_SIDE_JOIN, ConnectorCapability.SERVER_SIDE_HASH)); @@ -63,8 +68,8 @@ void shouldFallbackToChecksumWhenPreferredJoinUnavailableAndFallbackAllowed() { .build()) .build(); - DatasetHandle source = dataset("left", capabilities(ConnectorCapability.SERVER_SIDE_HASH)); - DatasetHandle target = dataset("right", capabilities(ConnectorCapability.SERVER_SIDE_HASH)); + DatasetHandle source = relationalDataset("left", capabilities(ConnectorCapability.SERVER_SIDE_HASH)); + DatasetHandle target = relationalDataset("right", capabilities(ConnectorCapability.SERVER_SIDE_HASH)); ComparePlan plan = planner.plan(request, source, target); @@ -80,8 +85,8 @@ void shouldFailWhenPreferredPlanUnavailableAndFallbackDisabled() { .build()) .build(); - DatasetHandle source = dataset("left", capabilities(ConnectorCapability.SERVER_SIDE_HASH)); - DatasetHandle target = dataset("right", capabilities(ConnectorCapability.SERVER_SIDE_HASH)); + DatasetHandle source = relationalDataset("left", capabilities(ConnectorCapability.SERVER_SIDE_HASH)); + DatasetHandle target = relationalDataset("right", capabilities(ConnectorCapability.SERVER_SIDE_HASH)); assertThrows(IllegalStateException.class, () -> planner.plan(request, source, target)); } @@ -94,23 +99,23 @@ void shouldIgnoreAdvertisedKeyHashCapabilitiesWithoutProviders() { DatasetHandle source = dataset("left", capabilities( ConnectorCapability.SERVER_SIDE_HASH, 
ConnectorCapability.KEY_LOOKUP, - ConnectorCapability.STREAM_SCAN)); + ConnectorCapability.STREAM_SCAN), true); DatasetHandle target = dataset("right", capabilities( ConnectorCapability.SERVER_SIDE_HASH, ConnectorCapability.KEY_LOOKUP, - ConnectorCapability.STREAM_SCAN)); + ConnectorCapability.STREAM_SCAN), true); ComparePlan plan = planner.plan(request, source, target); - assertEquals(ComparePlanTypes.PUSHDOWN_CHECKSUM, plan.getPlanType()); + assertEquals(ComparePlanTypes.STREAMING_MERGE, plan.getPlanType()); } @Test - void shouldSelectChecksumForNonRelationalDatasetsWithHashCapability() { + void shouldSelectChecksumForRelationalDatasetsWithHashCapability() { CompareRequest request = CompareRequest.builder().build(); - DatasetHandle source = dataset("left", capabilities(ConnectorCapability.SERVER_SIDE_HASH)); - DatasetHandle target = dataset("right", capabilities(ConnectorCapability.SERVER_SIDE_HASH)); + DatasetHandle source = relationalDataset("left", capabilities(ConnectorCapability.SERVER_SIDE_HASH)); + DatasetHandle target = relationalDataset("right", capabilities(ConnectorCapability.SERVER_SIDE_HASH)); ComparePlan plan = planner.plan(request, source, target); @@ -121,10 +126,10 @@ void shouldSelectChecksumForNonRelationalDatasetsWithHashCapability() { void shouldPreferChecksumForSqlResources() { CompareRequest request = CompareRequest.builder().build(); - DatasetHandle source = dataset("shared", capabilities( + DatasetHandle source = relationalDataset("shared", capabilities( ConnectorCapability.SERVER_SIDE_JOIN, ConnectorCapability.SERVER_SIDE_HASH), true, Map.of("resourceType", "sql")); - DatasetHandle target = dataset("shared", capabilities( + DatasetHandle target = relationalDataset("shared", capabilities( ConnectorCapability.SERVER_SIDE_JOIN, ConnectorCapability.SERVER_SIDE_HASH), true, Map.of("resourceType", "table")); @@ -133,10 +138,77 @@ void shouldPreferChecksumForSqlResources() { assertEquals(ComparePlanTypes.PUSHDOWN_CHECKSUM, plan.getPlanType()); 
} + @Test + void shouldSelectStreamingWhenRelationalHashIsUnavailable() { + CompareRequest request = CompareRequest.builder() + .strategyPreference(CompareStrategyPreference.builder() + .preferredPlans(java.util.List.of(ComparePlanTypes.PUSHDOWN_CHECKSUM, ComparePlanTypes.STREAMING_MERGE)) + .allowFallback(true) + .build()) + .build(); + + DatasetHandle source = relationalDataset("left", capabilities(ConnectorCapability.STREAM_SCAN), true, Map.of()); + DatasetHandle target = relationalDataset("right", capabilities(ConnectorCapability.STREAM_SCAN), true, Map.of()); + + ComparePlan plan = planner.plan(request, source, target); + + assertEquals(ComparePlanTypes.STREAMING_MERGE, plan.getPlanType()); + } + + @Test + void shouldFailWhenChecksumIsRequiredButHashCapabilityIsUnavailable() { + CompareRequest request = CompareRequest.builder() + .strategyPreference(CompareStrategyPreference.builder() + .preferredPlans(java.util.List.of("checksum")) + .allowFallback(false) + .build()) + .build(); + + DatasetHandle source = relationalDataset("left", capabilities(ConnectorCapability.STREAM_SCAN), true, Map.of()); + DatasetHandle target = relationalDataset("right", capabilities(ConnectorCapability.STREAM_SCAN), true, Map.of()); + + assertThrows(IllegalStateException.class, () -> planner.plan(request, source, target)); + } + + @Test + void shouldResolvePlanAliasesCaseInsensitively() { + CompareRequest request = CompareRequest.builder() + .strategyPreference(CompareStrategyPreference.builder() + .preferredPlans(java.util.List.of(" JOIN ")) + .allowFallback(false) + .build()) + .build(); + + DatasetHandle source = relationalDataset("shared", capabilities( + ConnectorCapability.SERVER_SIDE_JOIN, + ConnectorCapability.SERVER_SIDE_HASH)); + DatasetHandle target = relationalDataset("shared", capabilities( + ConnectorCapability.SERVER_SIDE_JOIN, + ConnectorCapability.SERVER_SIDE_HASH)); + + ComparePlan plan = planner.plan(request, source, target); + + 
assertEquals(ComparePlanTypes.SERVER_JOIN, plan.getPlanType()); + } + + @Test + void shouldFailWhenNoCompatiblePlanExists() { + CompareRequest request = CompareRequest.builder().build(); + + DatasetHandle source = dataset("left", capabilities()); + DatasetHandle target = dataset("right", capabilities()); + + assertThrows(IllegalStateException.class, () -> planner.plan(request, source, target)); + } + private DatasetHandle dataset(String executionDomainId, CapabilitySet capabilities) { return dataset(executionDomainId, capabilities, false); } + private DatasetHandle relationalDataset(String executionDomainId, CapabilitySet capabilities) { + return relationalDataset(executionDomainId, capabilities, false, Map.of()); + } + private DatasetHandle dataset(String executionDomainId, CapabilitySet capabilities, boolean withScanner) { return dataset(executionDomainId, capabilities, withScanner, Map.of()); } @@ -154,11 +226,27 @@ private DatasetHandle dataset(String executionDomainId, return new StubDatasetHandle(metadata, withScanner); } + private DatasetHandle relationalDataset(String executionDomainId, + CapabilitySet capabilities, + boolean withScanner, + Map attributes) { + DatasetMetadata metadata = DatasetMetadata.builder() + .logicalName("orders") + .executionDomainId(executionDomainId) + .capabilities(capabilities) + .attributes(attributes) + .build(); + return new StubRelationalDatasetHandle(metadata, withScanner); + } + private CapabilitySet capabilities(ConnectorCapability... 
capabilities) { + if (capabilities == null || capabilities.length == 0) { + return CapabilitySet.empty(); + } return new CapabilitySet(EnumSet.copyOf(java.util.List.of(capabilities))); } - private static final class StubDatasetHandle implements DatasetHandle { + private static class StubDatasetHandle implements DatasetHandle { private final DatasetMetadata metadata; private final boolean withScanner; @@ -234,4 +322,46 @@ public Optional getFilterPushdownProvider() { public void close() { } } + + private static final class StubRelationalDatasetHandle extends StubDatasetHandle implements RelationalDatasetSupport { + + private StubRelationalDatasetHandle(DatasetMetadata metadata, boolean withScanner) { + super(metadata, withScanner); + } + + @Override + public String getName() { + return "orders"; + } + + @Override + public String getConnectorType() { + return "mysql"; + } + + @Override + public String getJdbcUrl() { + return "jdbc:mysql://localhost:3306/test"; + } + + @Override + public String getUsername() { + return "root"; + } + + @Override + public DatabaseDialect getDialect() { + return null; + } + + @Override + public TablePath getTablePath() { + return TablePath.of("orders"); + } + + @Override + public Connection getConnection() throws SQLException { + return null; + } + } } diff --git a/consilens-core/src/test/java/com/consilens/core/compare/DefaultCompareRuntimeTest.java b/consilens-core/src/test/java/com/consilens/core/compare/DefaultCompareRuntimeTest.java index 24789c8..efa52df 100644 --- a/consilens-core/src/test/java/com/consilens/core/compare/DefaultCompareRuntimeTest.java +++ b/consilens-core/src/test/java/com/consilens/core/compare/DefaultCompareRuntimeTest.java @@ -10,8 +10,10 @@ import com.consilens.connector.api.dataset.RecordScanner; import com.consilens.connector.api.dataset.SnapshotProvider; import com.consilens.connector.api.dataset.SplitPlanner; +import com.consilens.connector.api.model.ComparisonSpec; import 
com.consilens.connector.api.model.FieldDescriptor; import com.consilens.connector.api.model.KeySpec; +import com.consilens.connector.api.model.PredicateSpec; import com.consilens.connector.api.model.ResourceLocator; import com.consilens.connector.api.model.SchemaDescriptor; import com.consilens.connector.api.normalization.NormalizationSpec; @@ -29,9 +31,11 @@ import java.util.List; import java.util.Map; import java.util.Optional; +import java.util.stream.Collectors; import static org.junit.jupiter.api.Assertions.assertEquals; import static org.junit.jupiter.api.Assertions.assertFalse; +import static org.junit.jupiter.api.Assertions.assertNotNull; import static org.junit.jupiter.api.Assertions.assertSame; import static org.mockito.Mockito.mock; @@ -70,6 +74,67 @@ void shouldAttachNormalizationToReadOptionsInsteadOfConnection() throws Exceptio assertEquals("target", capturedTarget.getReadOptions().getOptions().get("normalizationSide")); } + @Test + void shouldBuildCompareSegmentsFromRequest() throws Exception { + CapturingConnectorRegistry registry = new CapturingConnectorRegistry( + new StubConnectorAdapter(), + new StubConnectorAdapter()); + CapturingExecutor executor = new CapturingExecutor(); + DefaultCompareRuntime runtime = new DefaultCompareRuntime( + registry, + (request, source, target) -> new PushdownChecksumPlan(CompareExecutionSettings.fromRequest(request)), + List.of(executor)); + KeySpec sourceKey = KeySpec.builder().fields(List.of("id")).build(); + KeySpec targetKey = KeySpec.builder().fields(List.of("order_id")).build(); + ComparisonSpec sourceComparisons = ComparisonSpec.builder() + .fields(List.of("amount", "dt")) + .exclude(List.of("debug_source")) + .build(); + ComparisonSpec targetComparisons = ComparisonSpec.builder() + .fields(List.of("actual_amount", "dt")) + .exclude(List.of("debug_target")) + .build(); + PredicateSpec sourceFilter = PredicateSpec.builder() + .type("sql") + .expression("dt = '2026-05-07'") + .build(); + PredicateSpec 
targetFilter = PredicateSpec.builder() + .type("sql") + .expression("dt = '2026-05-07'") + .build(); + CompareRequest request = CompareRequest.builder() + .source(connectorConfig("source-orders")) + .target(connectorConfig("target-orders")) + .sourceKeySpec(sourceKey) + .targetKeySpec(targetKey) + .sourceComparisons(sourceComparisons) + .targetComparisons(targetComparisons) + .sourceFilter(sourceFilter) + .targetFilter(targetFilter) + .build(); + + runtime.execute(request); + + CompareSegment sourceSegment = executor.lastSourceSegment; + CompareSegment targetSegment = executor.lastTargetSegment; + assertNotNull(sourceSegment); + assertNotNull(targetSegment); + assertEquals("source", sourceSegment.getSide()); + assertEquals("target", targetSegment.getSide()); + assertSame(sourceKey, sourceSegment.getKeySpec()); + assertSame(targetKey, targetSegment.getKeySpec()); + assertSame(sourceComparisons, sourceSegment.getComparisons()); + assertSame(targetComparisons, targetSegment.getComparisons()); + assertSame(sourceFilter, sourceSegment.getFilter()); + assertSame(targetFilter, targetSegment.getFilter()); + assertEquals("source-orders", sourceSegment.getResource().getName()); + assertEquals("target-orders", targetSegment.getResource().getName()); + assertEquals(List.of("id", "amount", "actual_amount", "dt", "order_cnt"), + sourceSegment.getSchema().getFields().stream().map(FieldDescriptor::getName).collect(Collectors.toList())); + assertEquals(List.of("id", "amount", "actual_amount", "dt", "order_cnt"), + targetSegment.getSchema().getFields().stream().map(FieldDescriptor::getName).collect(Collectors.toList())); + } + private ConnectorConfig connectorConfig(String tableName) { Map connection = new LinkedHashMap<>(); connection.put("url", "jdbc:mysql://localhost:3306/test"); diff --git a/consilens-core/src/test/java/com/consilens/core/compare/executor/ChecksumPlanExecutorTest.java 
b/consilens-core/src/test/java/com/consilens/core/compare/executor/ChecksumPlanExecutorTest.java new file mode 100644 index 0000000..8d7add8 --- /dev/null +++ b/consilens-core/src/test/java/com/consilens/core/compare/executor/ChecksumPlanExecutorTest.java @@ -0,0 +1,66 @@ +package com.consilens.core.compare.executor; + +import com.consilens.common.enums.ChecksumAlgorithm; +import com.consilens.connector.api.DatabaseDialect; +import com.consilens.connector.api.SqlQueryGenerator; +import com.consilens.connector.api.dataset.DatasetHandle; +import com.consilens.connector.api.dataset.RelationalDatasetSupport; +import com.consilens.connector.api.planner.CompareExecutionOptions; +import com.consilens.connector.api.planner.CompareRequest; +import com.consilens.connector.api.planner.CompareSegment; +import com.consilens.core.compare.CompareExecutionSettings; +import org.junit.jupiter.api.Test; + +import java.util.Optional; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.when; + +class ChecksumPlanExecutorTest { + + @Test + void shouldDowngradeXorWhenOneSideDoesNotSupportIt() { + CompareExecutionSettings executionSettings = CompareExecutionSettings.fromRequest(CompareRequest.builder() + .executionOptions(CompareExecutionOptions.builder().checksumAlgorithm("xor").build()) + .build()); + + CompareExecutionSettings effectiveSettings = ChecksumPlanExecutor.resolveEffectiveExecutionSettings( + executionSettings, + segmentWithChecksumSupport("mysql-source", true), + segmentWithChecksumSupport("starrocks-target", false)); + + assertEquals(ChecksumAlgorithm.CONCAT, effectiveSettings.getChecksumAlgorithm()); + } + + @Test + void shouldKeepXorWhenBothSidesSupportIt() { + CompareExecutionSettings executionSettings = CompareExecutionSettings.fromRequest(CompareRequest.builder() + .executionOptions(CompareExecutionOptions.builder().checksumAlgorithm("xor").build()) + .build()); + + 
CompareExecutionSettings effectiveSettings = ChecksumPlanExecutor.resolveEffectiveExecutionSettings( + executionSettings, + segmentWithChecksumSupport("mysql-source", true), + segmentWithChecksumSupport("postgres-target", true)); + + assertEquals(ChecksumAlgorithm.XOR, effectiveSettings.getChecksumAlgorithm()); + } + + private CompareSegment segmentWithChecksumSupport(String supportName, boolean supportsXor) { + SqlQueryGenerator generator = mock(SqlQueryGenerator.class); + when(generator.supportsChecksumAlgorithm(ChecksumAlgorithm.XOR)).thenReturn(supportsXor); + + DatabaseDialect dialect = mock(DatabaseDialect.class); + when(dialect.getSqlQueryGenerator()).thenReturn(generator); + + RelationalDatasetSupport support = mock(RelationalDatasetSupport.class); + when(support.getName()).thenReturn(supportName); + when(support.getDialect()).thenReturn(dialect); + + DatasetHandle dataset = mock(DatasetHandle.class); + when(dataset.getSupport(RelationalDatasetSupport.class)).thenReturn(Optional.of(support)); + + return CompareSegment.builder().dataset(dataset).build(); + } +} diff --git a/consilens-core/src/test/java/com/consilens/core/compare/executor/ConnectorRecordDifferTest.java b/consilens-core/src/test/java/com/consilens/core/compare/executor/ConnectorRecordDifferTest.java index 51c95a0..1b00baf 100644 --- a/consilens-core/src/test/java/com/consilens/core/compare/executor/ConnectorRecordDifferTest.java +++ b/consilens-core/src/test/java/com/consilens/core/compare/executor/ConnectorRecordDifferTest.java @@ -30,6 +30,7 @@ import java.util.Optional; import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertThrows; class ConnectorRecordDifferTest { @@ -80,23 +81,152 @@ void shouldExcludeColumnsInStreamingMode() { assertEquals(0L, result.getStatistics().getTotalDifferences()); } + @Test + void shouldIgnoreDerivedHashColumnsWhenFieldsAreNotExplicit() { + SchemaDescriptor schema = schema(List.of("id", "value", 
"row_hash")); + CompareSegment source = segment("source_orders", schema, null, + List.of(record(Map.of("id", "1", "value", "A", "row_hash", "mysql-hash")))); + CompareSegment target = segment("target_orders", schema, null, + List.of(record(Map.of("id", "1", "value", "A", "row_hash", "postgres-hash")))); + + DiffResult result = new ConnectorRecordDiffer().diff( + source, + target, + CompareExecutionSettings.builder() + .validateUniqueKeys(true) + .build()); + + assertEquals(0L, result.getStatistics().getTotalDifferences()); + assertEquals(0, result.getDifferences().size()); + } + + @Test + void shouldIgnoreKnownDerivedChecksumColumnsWhenFieldsAreNotExplicit() { + SchemaDescriptor schema = schema(List.of( + "id", + "value", + "checksum", + "row_checksum", + "record_checksum", + "record_hash", + "row_md5", + "consilens_checksum", + "consilens_row_hash")); + CompareSegment source = segment("source_orders", schema, null, + List.of(record(Map.of( + "id", "1", + "value", "A", + "checksum", "source-checksum", + "row_checksum", "source-row-checksum", + "record_checksum", "source-record-checksum", + "record_hash", "source-record-hash", + "row_md5", "source-md5", + "consilens_checksum", "source-consilens-checksum", + "consilens_row_hash", "source-consilens-row-hash")))); + CompareSegment target = segment("target_orders", schema, null, + List.of(record(Map.of( + "id", "1", + "value", "A", + "checksum", "target-checksum", + "row_checksum", "target-row-checksum", + "record_checksum", "target-record-checksum", + "record_hash", "target-record-hash", + "row_md5", "target-md5", + "consilens_checksum", "target-consilens-checksum", + "consilens_row_hash", "target-consilens-row-hash")))); + + DiffResult result = new ConnectorRecordDiffer().diff( + source, + target, + CompareExecutionSettings.builder() + .validateUniqueKeys(true) + .build()); + + assertEquals(0L, result.getStatistics().getTotalDifferences()); + assertEquals(0, result.getDifferences().size()); + } + + @Test + void 
shouldCompareDerivedHashColumnsWhenFieldsAreExplicit() { + SchemaDescriptor schema = schema(List.of("id", "value", "row_hash")); + CompareSegment source = segment("source_orders", schema, + ComparisonSpec.builder().fields(List.of("row_hash")).build(), + List.of(record(Map.of("id", "1", "value", "A", "row_hash", "mysql-hash")))); + CompareSegment target = segment("target_orders", schema, + ComparisonSpec.builder().fields(List.of("row_hash")).build(), + List.of(record(Map.of("id", "1", "value", "A", "row_hash", "postgres-hash")))); + + DiffResult result = new ConnectorRecordDiffer().diff( + source, + target, + CompareExecutionSettings.builder() + .validateUniqueKeys(true) + .build()); + + assertEquals(1L, result.getStatistics().getMismatchCount()); + assertEquals(1L, result.getStatistics().getTotalDifferences()); + assertEquals(List.of("row_hash"), result.getDifferences().get(0).getChangedColumns1()); + assertEquals(List.of("row_hash"), result.getDifferences().get(0).getChangedColumns2()); + } + + @Test + void shouldAllowDuplicateKeysWhenValidationIsDisabled() { + CompareSegment source = segment("source_orders", List.of(record("1", "A"), record("1", "B"))); + CompareSegment target = segment("target_orders", List.of(record("1", "A"))); + + DiffResult result = new ConnectorRecordDiffer().diff( + source, + target, + CompareExecutionSettings.builder() + .validateUniqueKeys(false) + .build()); + + assertEquals(3L, result.getStatistics().getTotalDifferences()); + } + + @Test + void shouldFailOnDuplicateKeysWhenValidationIsEnabled() { + CompareSegment source = segment("source_orders", List.of(record("1", "A"), record("1", "B"))); + CompareSegment target = segment("target_orders", List.of(record("1", "A"))); + + assertThrows(com.consilens.connector.api.ConnectorException.class, () -> new ConnectorRecordDiffer().diff( + source, + target, + CompareExecutionSettings.builder() + .validateUniqueKeys(true) + .build())); + } + + @Test + void 
shouldFailWhenDiffCountExceedsConfiguredLimit() { + CompareSegment source = segment("source_orders", List.of(record("1", "A"), record("2", "B"))); + CompareSegment target = segment("target_orders", List.of()); + + assertThrows(com.consilens.connector.api.ConnectorException.class, () -> new ConnectorRecordDiffer().diff( + source, + target, + CompareExecutionSettings.builder() + .validateUniqueKeys(true) + .maxDifferences(1L) + .build())); + } + private CompareSegment segment(String tableName, List records) { return segment(tableName, ComparisonSpec.builder().fields(List.of("value")).build(), records); } private CompareSegment segment(String tableName, ComparisonSpec comparisons, List records) { + return segment(tableName, schema(List.of("id", "value")), comparisons, records); + } + + private CompareSegment segment(String tableName, + SchemaDescriptor schema, + ComparisonSpec comparisons, + List records) { ResourceLocator resource = ResourceLocator.builder() .type("table") .name(tableName) .build(); - SchemaDescriptor schema = SchemaDescriptor.builder() - .fields(List.of( - FieldDescriptor.builder().name("id").canonicalType("varchar").build(), - FieldDescriptor.builder().name("value").canonicalType("varchar").build())) - .fieldMap(Map.of( - "id", FieldDescriptor.builder().name("id").canonicalType("varchar").build(), - "value", FieldDescriptor.builder().name("value").canonicalType("varchar").build())) - .build(); return CompareSegment.builder() .dataset(new TestDatasetHandle(resource, schema, records)) .resource(resource) @@ -113,6 +243,28 @@ private CanonicalRecord record(String id, String value) { return new TestRecord(RecordKey.builder().parts(List.of(id)).build(), values); } + private CanonicalRecord record(Map rawValues) { + Map values = new LinkedHashMap<>(); + for (Map.Entry entry : rawValues.entrySet()) { + values.put(entry.getKey(), CanonicalValue.builder().type("varchar").value(entry.getValue()).build()); + } + return new 
TestRecord(RecordKey.builder().parts(List.of(rawValues.get("id"))).build(), values);
+    }
+
+    private SchemaDescriptor schema(List<String> columns) {
+        List<FieldDescriptor> fields = new ArrayList<>();
+        Map<String, FieldDescriptor> fieldMap = new LinkedHashMap<>();
+        for (String column : columns) {
+            FieldDescriptor field = FieldDescriptor.builder().name(column).canonicalType("varchar").build();
+            fields.add(field);
+            fieldMap.put(column, field);
+        }
+        return SchemaDescriptor.builder()
+                .fields(fields)
+                .fieldMap(fieldMap)
+                .build();
+    }
+
     private static final class TestDatasetHandle implements DatasetHandle {

         private final ResourceLocator resource;
diff --git a/consilens-core/src/test/java/com/consilens/core/compare/relational/RelationalCompareSegmentAdapterTest.java b/consilens-core/src/test/java/com/consilens/core/compare/relational/RelationalCompareSegmentAdapterTest.java
index 2ac595c..9519762 100644
--- a/consilens-core/src/test/java/com/consilens/core/compare/relational/RelationalCompareSegmentAdapterTest.java
+++ b/consilens-core/src/test/java/com/consilens/core/compare/relational/RelationalCompareSegmentAdapterTest.java
@@ -17,7 +17,10 @@
 import com.consilens.connector.api.model.ResourceLocator;
 import com.consilens.connector.api.model.SchemaDescriptor;
 import com.consilens.connector.api.model.TablePath;
+import com.consilens.connector.api.model.PredicateSpec;
 import com.consilens.connector.api.planner.CompareSegment;
+import com.consilens.connector.api.planner.KeyRangeSplit;
+import com.consilens.connector.api.planner.OffsetLimitSplit;
 import com.consilens.core.compare.CompareExecutionSettings;
 import com.consilens.core.segment.TableSegment;
 import org.junit.jupiter.api.Test;
@@ -28,6 +31,7 @@
 import java.util.List;
 import java.util.Map;
 import java.util.Optional;
+import java.util.stream.Collectors;

 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertFalse;
@@ -86,6 +90,131 @@ void shouldExcludeColumnsFromAutomaticMatching() {
         assertEquals(List.of(), prepared.getTableSegment().getExtraColumns());
     }

+    @Test
+    void shouldIgnoreDerivedHashColumnsFromAutomaticMatching() {
+        ResourceLocator resource = ResourceLocator.builder()
+                .type("sql")
+                .name("orders_sql")
+                .path("SELECT id, name, row_hash FROM orders")
+                .build();
+        StubRelationalDataset dataset = new StubRelationalDataset(resource);
+        SchemaDescriptor schema = SchemaDescriptor.builder()
+                .fields(List.of(
+                        FieldDescriptor.builder().name("id").canonicalType("bigint").build(),
+                        FieldDescriptor.builder().name("name").canonicalType("VARCHAR").build(),
+                        FieldDescriptor.builder().name("row_hash").canonicalType("VARCHAR").build()))
+                .fieldMap(Map.of(
+                        "id", FieldDescriptor.builder().name("id").canonicalType("bigint").build(),
+                        "name", FieldDescriptor.builder().name("name").canonicalType("VARCHAR").build(),
+                        "row_hash", FieldDescriptor.builder().name("row_hash").canonicalType("VARCHAR").build()))
+                .build();
+        CompareSegment segment = CompareSegment.builder()
+                .dataset(dataset)
+                .resource(resource)
+                .keySpec(KeySpec.builder().fields(List.of("id")).build())
+                .schema(schema)
+                .build();
+
+        RelationalCompareSegmentAdapter.PreparedTableSegment prepared =
+                RelationalCompareSegmentAdapter.toTableSegment(segment, CompareExecutionSettings.fromRequest(null));
+
+        assertEquals(List.of("name"), prepared.getTableSegment().getExtraColumns());
+    }
+
+    @Test
+    void shouldHonorExplicitComparisonFieldsIncludingDerivedHashColumns() {
+        ResourceLocator resource = ResourceLocator.builder()
+                .type("sql")
+                .name("orders_sql")
+                .path("SELECT id, name, row_hash FROM orders")
+                .build();
+        StubRelationalDataset dataset = new StubRelationalDataset(resource);
+        SchemaDescriptor schema = schema("id", "name", "row_hash");
+        CompareSegment segment = CompareSegment.builder()
+                .dataset(dataset)
+                .resource(resource)
+                .keySpec(KeySpec.builder().fields(List.of("id")).build())
+                .comparisons(ComparisonSpec.builder().fields(List.of("row_hash")).build())
+                .schema(schema)
+                .build();
+
+        RelationalCompareSegmentAdapter.PreparedTableSegment prepared =
+                RelationalCompareSegmentAdapter.toTableSegment(segment, CompareExecutionSettings.fromRequest(null));
+
+        assertEquals(List.of("row_hash"), prepared.getTableSegment().getExtraColumns());
+    }
+
+    @Test
+    void shouldRejectExplicitComparisonFieldsOverlappingKeys() {
+        ResourceLocator resource = ResourceLocator.builder()
+                .type("table")
+                .name("orders")
+                .build();
+        StubRelationalDataset dataset = new StubRelationalDataset(resource);
+        CompareSegment segment = CompareSegment.builder()
+                .dataset(dataset)
+                .resource(resource)
+                .keySpec(KeySpec.builder().fields(List.of("id")).build())
+                .comparisons(ComparisonSpec.builder().fields(List.of("id", "name")).build())
+                .schema(dataset.getSchema())
+                .build();
+
+        assertThrows(ConnectorException.class,
+                () -> RelationalCompareSegmentAdapter.toTableSegment(segment, CompareExecutionSettings.fromRequest(null)));
+    }
+
+    @Test
+    void shouldApplyFilterAndSupportedSplitsToTableSegment() {
+        ResourceLocator resource = ResourceLocator.builder()
+                .type("table")
+                .name("orders")
+                .build();
+        StubRelationalDataset dataset = new StubRelationalDataset(resource);
+        CompareSegment keyRangeSegment = CompareSegment.builder()
+                .dataset(dataset)
+                .resource(resource)
+                .keySpec(KeySpec.builder().fields(List.of("id")).build())
+                .filter(PredicateSpec.builder().expression("id >= 10").build())
+                .schema(dataset.getSchema())
+                .split(KeyRangeSplit.builder()
+                        .startKey(List.of(10L))
+                        .endKey(List.of(20L))
+                        .build())
+                .build();
+        CompareSegment offsetSegment = keyRangeSegment.toBuilder()
+                .split(OffsetLimitSplit.builder()
+                        .offset(100L)
+                        .limit(50L)
+                        .build())
+                .build();
+
+        TableSegment keyRange = RelationalCompareSegmentAdapter
+                .toTableSegment(keyRangeSegment, CompareExecutionSettings.fromRequest(null))
+                .getTableSegment();
+        TableSegment offset = RelationalCompareSegmentAdapter
+                .toTableSegment(offsetSegment, CompareExecutionSettings.fromRequest(null))
+                .getTableSegment();
+
+        assertEquals(Optional.of("id >= 10 AND id < 20 AND ((id >= 10))"), keyRange.getWhereClause());
+        assertEquals(Optional.of(List.of(10L)), keyRange.getMinKey());
+        assertEquals(Optional.of(List.of(20L)), keyRange.getMaxKey());
+        assertTrue(offset.getLimitOffset().isPresent());
+        assertEquals(50L, offset.getLimitOffset().get().getLimit());
+        assertEquals(100L, offset.getLimitOffset().get().getOffset());
+    }
+
+    private SchemaDescriptor schema(String... names) {
+        Map<String, FieldDescriptor> fieldMap = new LinkedHashMap<>();
+        List<FieldDescriptor> fields = java.util.Arrays.stream(names)
+                .map(name -> FieldDescriptor.builder().name(name).canonicalType("VARCHAR").build())
+                .peek(field -> fieldMap.put(field.getName(), field))
+                .collect(Collectors.toList());
+        return SchemaDescriptor.builder()
+                .fields(fields)
+                .fieldMap(fieldMap)
+                .build();
+    }
+
     private static final class StubRelationalDataset implements DatasetHandle, RelationalDatasetSupport {

         private final ResourceLocator resource;
@@ -181,7 +310,10 @@ public DatabaseDialect getDialect() {

         @Override
         public TablePath getTablePath() {
-            throw new ConnectorException("SQL resource does not expose a physical TablePath");
+            if ("sql".equalsIgnoreCase(resource.getType())) {
+                throw new ConnectorException("SQL resource does not expose a physical TablePath");
+            }
+            return TablePath.of(resource.getName());
         }

         @Override
diff --git a/consilens-core/src/test/java/com/consilens/core/database/adpter/AbstractDatabaseAdapterTest.java b/consilens-core/src/test/java/com/consilens/core/database/adpter/AbstractDatabaseAdapterTest.java
new file mode 100644
index 0000000..e4ca5f0
--- /dev/null
+++ b/consilens-core/src/test/java/com/consilens/core/database/adpter/AbstractDatabaseAdapterTest.java
@@ -0,0 +1,120 @@
+package com.consilens.core.database.adpter;
+
+import com.consilens.common.enums.ChecksumAlgorithm;
+import com.consilens.connector.api.CapabilityProvider;
+import com.consilens.connector.api.DatabaseDialect;
+import com.consilens.connector.api.SqlQueryGenerator;
+import com.consilens.connector.api.model.TablePath;
+import com.consilens.connector.api.model.TableSchema;
+import com.consilens.core.database.connection.ConnectionPool;
+import com.consilens.core.segment.TableSegment;
+import org.junit.jupiter.api.Test;
+
+import java.sql.Connection;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+import static org.mockito.ArgumentMatchers.anyLong;
+import static org.mockito.ArgumentMatchers.anyString;
+import static org.mockito.ArgumentMatchers.eq;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.when;
+
+class AbstractDatabaseAdapterTest {
+
+    @Test
+    void shouldUseOrderedBoundaryQueryForCompositeKeysFromSqlRelation() {
+        SqlQueryGenerator queryGenerator = mock(SqlQueryGenerator.class);
+        CapabilityProvider capabilityProvider = mock(CapabilityProvider.class);
+        DatabaseDialect dialect = mock(DatabaseDialect.class);
+        ConnectionPool connectionPool = mock(ConnectionPool.class);
+
+        when(dialect.getSqlQueryGenerator()).thenReturn(queryGenerator);
+        when(dialect.getCapabilityProvider()).thenReturn(capabilityProvider);
+        when(connectionPool.getConnectorType()).thenReturn("mysql");
+        when(queryGenerator.getCountSQLFromSql(anyString(), eq("biz_date >= '2026-05-01'")))
+                .thenReturn("COUNT_SQL");
+        when(queryGenerator.getLimitClause(anyLong())).thenReturn("LIMIT 1");
+        when(capabilityProvider.quote(anyString())).thenAnswer(invocation -> "`" + invocation.getArgument(0) + "`");
+
+        CapturingAdapter adapter = new CapturingAdapter(connectionPool, dialect);
+        TableSegment segment = TableSegment.builder()
+                .tablePath(TablePath.of("agg_sql"))
+                .relationSource(new TableSegment.RelationSource(
+                        "SELECT biz_date, status FROM daily_order_summary",
+                        "agg_sql"))
+                .keyColumns(List.of("biz_date", "status"))
+                .whereClause(Optional.of("biz_date >= '2026-05-01'"))
+                .build();
+
+        TableSegment.ChecksumResult result = adapter.countAndBounds(segment);
+
+        assertEquals(List.of("2026-05-01", "active"), result.getMinKey());
+        assertEquals(List.of("2026-05-01", "pending"), result.getMaxKey());
+        assertTrue(adapter.executedSql.stream().anyMatch(sql ->
+                sql.contains("FROM (SELECT biz_date, status FROM daily_order_summary) consilens_sql_source")
+                        && sql.contains("ORDER BY `biz_date` ASC, `status` ASC LIMIT 1")));
+        assertTrue(adapter.executedSql.stream().anyMatch(sql ->
+                sql.contains("FROM (SELECT biz_date, status FROM daily_order_summary) consilens_sql_source")
+                        && sql.contains("ORDER BY `biz_date` DESC, `status` DESC LIMIT 1")));
+    }
+
+    private static final class CapturingAdapter extends AbstractDatabaseAdapter {
+
+        private final List<String> executedSql = new ArrayList<>();
+
+        private CapturingAdapter(ConnectionPool connectionPool, DatabaseDialect dialect) {
+            super("capturing", connectionPool, dialect, ChecksumAlgorithm.CONCAT);
+        }
+
+        @Override
+        public TableSchema getTableSchema(List tablePath) {
+            throw new UnsupportedOperationException("Schema lookup not needed for this test");
+        }
+
+        @Override
+        public long count(TableSegment segment) {
+            return 4L;
+        }
+
+        @Override
+        public <T> List<T> query(String sql, Class<T> resultType) {
+            executedSql.add(sql);
+            @SuppressWarnings("unchecked")
+            List<T> result = (List<T>) java.util.Collections.singletonList(
+                    sql.contains("ASC")
+                            ? new Object[]{"2026-05-01", "active"}
+                            : new Object[]{"2026-05-01", "pending"});
+            return result;
+        }
+
+        @Override
+        public <T> List<T> query(String sql, RowMapper<T> rowMapper) {
+            executedSql.add(sql);
+            @SuppressWarnings("unchecked")
+            List<T> result = (List<T>) List.of(Map.of("row_count", 4L));
+            return result;
+        }
+
+        @Override
+        public <T> List<T> query(String sql, RowMapper<T> rowMapper, Object... parameters) {
+            return query(sql, rowMapper);
+        }
+
+        @Override
+        public <T> List<T> query(String sql, Class<T> resultType, Object... parameters) {
+            return query(sql, resultType);
+        }
+
+        @Override
+        public Connection getConnection() throws SQLException {
+            throw new SQLException("Not used in this test");
+        }
+    }
+}
diff --git a/consilens-core/src/test/java/com/consilens/core/segment/TableSegmentTest.java b/consilens-core/src/test/java/com/consilens/core/segment/TableSegmentTest.java
index 2315675..726e3b6 100644
--- a/consilens-core/src/test/java/com/consilens/core/segment/TableSegmentTest.java
+++ b/consilens-core/src/test/java/com/consilens/core/segment/TableSegmentTest.java
@@ -183,6 +183,43 @@ public void testBuildWhereClause() {
         assertTrue(whereClause.contains("status = 'active'"));
     }

+    @Test
+    public void testBuildWhereClauseCanIncludeUpperBound() {
+        TableSegment segment = TableSegment.builder()
+                .tablePath(TablePath.of("test_table"))
+                .keyColumns(Arrays.asList("biz_date", "status"))
+                .minKey(Optional.of(Arrays.asList("2026-05-01", "active")))
+                .maxKey(Optional.of(Arrays.asList("2026-05-01", "pending")))
+                .upperBoundInclusive(true)
+                .build();
+
+        String whereClause = segment.buildWhereClause();
+
+        assertTrue(whereClause.contains("(biz_date > '2026-05-01' OR (biz_date = '2026-05-01' AND status >= 'active'))"));
+        assertTrue(whereClause.contains("(biz_date < '2026-05-01' OR (biz_date = '2026-05-01' AND status <= 'pending'))"));
+    }
+
+    @Test
+    public void testFinalSegmentKeepsInclusiveUpperBoundOnlyOnTailSegment() {
+        TableSegment segment = TableSegment.builder()
+                .tablePath(TablePath.of("test_table"))
+                .keyColumns(Arrays.asList("id"))
+                .minKey(Optional.of(Arrays.asList(0)))
+                .maxKey(Optional.of(Arrays.asList(100)))
+                .upperBoundInclusive(true)
+                .build();
+
+        List<TableSegment> segments = segment.segmentByCheckpoints(Arrays.asList(
+                Arrays.asList(25),
+                Arrays.asList(50),
+                Arrays.asList(75)));
+
+        assertFalse(segments.get(0).isUpperBoundInclusive());
+        assertFalse(segments.get(1).isUpperBoundInclusive());
+        assertFalse(segments.get(2).isUpperBoundInclusive());
+        assertTrue(segments.get(3).isUpperBoundInclusive());
+    }
+
     @Test
     public void testBuildWhereClauseRejectsUnsafeCustomClause() {
         TableSegment segment = TableSegment.builder()
diff --git a/consilens-performance/README.md b/consilens-performance/README.md
new file mode 100644
index 0000000..12be667
--- /dev/null
+++ b/consilens-performance/README.md
@@ -0,0 +1,51 @@
+# consilens-performance
+
+`consilens-performance` provides lightweight utilities for measuring Consilens
+execution behavior in development and release validation.
+
+The module is intended to be used as a library from tests, smoke checks, or
+internal benchmark harnesses. It does not require a live database for synthetic
+workloads.
+
+## Core APIs
+
+- `PerformanceTestRunner`: runs warmup and measured iterations, then returns a
+  `PerformanceTestResult`.
+- `PerformanceCollector`: records latency, throughput, errors, data volume, JVM
+  resource usage, and thread pool statistics.
+- `AdaptiveThreadPoolExecutor`: bounded adaptive executor with task metrics.
+- `PerformanceReportGenerator`: writes Markdown and HTML reports.
+
+## Minimal Example
+
+```java
+PerformanceTestConfig config = PerformanceTestConfig.builder()
+    .testName("smoke")
+    .warmupIterations(1)
+    .testIterations(10)
+    .concurrencyLevel(2)
+    .monitoringIntervalMs(100)
+    .build();
+
+PerformanceTestRunner runner = new PerformanceTestRunner();
+try {
+    PerformanceTestResult result = runner.runTest(config, () ->
+        PerformanceTestRunner.TestResult.success(1_000, 8_192, 0));
+
+    new PerformanceReportGenerator()
+        .generateMarkdownReport(result, "target/performance/smoke.md");
+} finally {
+    runner.shutdown();
+}
+```
+
+## Validation
+
+Run the module tests:
+
+```bash
+./mvnw -pl consilens-performance -am test
+```
+
+The tests cover configuration validation, smoke execution, failure collection,
+executor shutdown behavior, and report generation.
diff --git a/consilens-performance/pom.xml b/consilens-performance/pom.xml
index bbf15e2..0f0f676 100644
--- a/consilens-performance/pom.xml
+++ b/consilens-performance/pom.xml
@@ -25,7 +25,6 @@ org.slf4j slf4j-api - provided
@@ -52,6 +51,13 @@ micrometer-observation provided + + + org.junit.jupiter + junit-jupiter + ${junit.version} + test +
@@ -64,6 +70,11 @@ 11 + + + org.apache.maven.plugins + maven-surefire-plugin +
diff --git a/consilens-performance/src/main/java/com/consilens/performance/AdaptiveThreadPoolExecutor.java b/consilens-performance/src/main/java/com/consilens/performance/AdaptiveThreadPoolExecutor.java
index a45b249..e7e1c2e 100644
--- a/consilens-performance/src/main/java/com/consilens/performance/AdaptiveThreadPoolExecutor.java
+++ b/consilens-performance/src/main/java/com/consilens/performance/AdaptiveThreadPoolExecutor.java
@@ -14,6 +14,7 @@
 public class AdaptiveThreadPoolExecutor {

     private final ThreadPoolExecutor executor;
+    private final ScheduledExecutorService monitorExecutor;
     private final AtomicInteger activeTasks = new AtomicInteger(0);
     private final AtomicInteger submittedTasks = new AtomicInteger(0);
     private final AtomicInteger completedTasks = new AtomicInteger(0);
@@ -31,11 +32,13 @@ public class AdaptiveThreadPoolExecutor {
     private volatile long lastOptimizationCheck;

     public AdaptiveThreadPoolExecutor(int minPoolSize, int maxPoolSize, long keepAliveTime, TimeUnit timeUnit) {
+        validateConfiguration(minPoolSize, maxPoolSize, keepAliveTime, timeUnit);
         this.minPoolSize = minPoolSize;
         this.maxPoolSize = maxPoolSize;
         this.keepAliveTime = keepAliveTime;
         this.timeUnit = timeUnit;
         this.workQueue = new LinkedBlockingQueue<>();
+        this.monitorExecutor = createMonitorExecutor();
         this.executor = createThreadPool();

         this.lastAdjustmentTime = System.currentTimeMillis();
@@ -61,7 +64,7 @@ public AdaptiveThreadPoolExecutor() {
      */
     private ThreadPoolExecutor createThreadPool() {
         ThreadFactory threadFactory = new AdaptiveThreadFactory();
-        RejectedExecutionHandler rejectedHandler = new AdaptiveRejectedExecutionHandler();
+        RejectedExecutionHandler rejectedHandler = new AdaptiveRejectedExecutionHandler(rejectedTasks);

         ThreadPoolExecutor executor = new ThreadPoolExecutor(
                 minPoolSize,
@@ -79,10 +82,42 @@ private ThreadPoolExecutor createThreadPool() {
         return executor;
     }

+    private ScheduledExecutorService createMonitorExecutor() {
+        return Executors.newSingleThreadScheduledExecutor(r -> {
+            Thread t = new Thread(r, "ThreadPoolMonitor");
+            t.setDaemon(true);
+            return t;
+        });
+    }
+
+    private void validateConfiguration(int minPoolSize, int maxPoolSize, long keepAliveTime, TimeUnit timeUnit) {
+        if (minPoolSize <= 0) {
+            throw new IllegalArgumentException("minPoolSize must be positive");
+        }
+        if (maxPoolSize < minPoolSize) {
+            throw new IllegalArgumentException("maxPoolSize must be greater than or equal to minPoolSize");
+        }
+        if (keepAliveTime < 0) {
+            throw new IllegalArgumentException("keepAliveTime cannot be negative");
+        }
+        if (timeUnit == null) {
+            throw new IllegalArgumentException("timeUnit cannot be null");
+        }
+    }
+
     /**
      * Submit task with performance tracking.
      */
     public <T> CompletableFuture<T> submit(Callable<T> task) {
+        if (task == null) {
+            throw new IllegalArgumentException("task cannot be null");
+        }
+        if (executor.isShutdown()) {
+            rejectedTasks.incrementAndGet();
+            CompletableFuture<T> rejected = new CompletableFuture<>();
+            rejected.completeExceptionally(new RejectedExecutionException("Executor has been shut down"));
+            return rejected;
+        }
         submittedTasks.incrementAndGet();
         activeTasks.incrementAndGet();

@@ -125,6 +160,12 @@ public CompletableFuture submit(Runnable task) {
      * Execute task with timeout.
      */
     public <T> CompletableFuture<T> submitWithTimeout(Callable<T> task, long timeout, TimeUnit unit) {
+        if (timeout <= 0) {
+            throw new IllegalArgumentException("timeout must be positive");
+        }
+        if (unit == null) {
+            throw new IllegalArgumentException("unit cannot be null");
+        }
         CompletableFuture<T> future = submit(task);
         return future.orTimeout(timeout, unit);
     }
@@ -176,13 +217,7 @@ private void considerPoolSizeAdjustment() {
      * Start performance monitoring.
      */
     private void startPerformanceMonitoring() {
-        ScheduledExecutorService monitor = Executors.newSingleThreadScheduledExecutor(r -> {
-            Thread t = new Thread(r, "ThreadPoolMonitor");
-            t.setDaemon(true);
-            return t;
-        });
-
-        monitor.scheduleAtFixedRate(this::logPerformanceMetrics, 1, 1, TimeUnit.MINUTES);
+        monitorExecutor.scheduleAtFixedRate(this::logPerformanceMetrics, 1, 1, TimeUnit.MINUTES);
     }

     /**
@@ -221,15 +256,16 @@ private void checkPerformanceHealth(int poolSize, int activeThreads, int queueSi

         // High average execution time
         if (avgExecutionMs > 1000) { // More than 1 second
-            log.warn("High average execution time ({:.2f}ms). Check for blocking operations.", avgExecutionMs);
+            log.warn("High average execution time ({}ms). Check for blocking operations.",
+                    String.format("%.2f", avgExecutionMs));
         }

         // High rejection rate
         long rejected = rejectedTasks.get();
         long submitted = submittedTasks.get();
         if (submitted > 100 && (double) rejected / submitted > 0.01) { // More than 1% rejection rate
-            log.warn("High rejection rate ({}/{} = {:.2f}%). Consider increasing max pool size.",
-                    rejected, submitted, (double) rejected / submitted * 100);
+            log.warn("High rejection rate ({}/{} = {}%). Consider increasing max pool size.",
+                    rejected, submitted, String.format("%.2f", (double) rejected / submitted * 100));
         }
     }

@@ -301,6 +337,7 @@ public String getPerformanceRecommendations() {
      */
     public void shutdown() {
         log.info("Shutting down adaptive thread pool executor");
+        monitorExecutor.shutdownNow();
         executor.shutdown();
         try {
             if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
@@ -313,6 +350,10 @@ public String getPerformanceRecommendations() {
         }
     }

+    public boolean isShutdown() {
+        return executor.isShutdown() && monitorExecutor.isShutdown();
+    }
+
     /**
      * Get the underlying executor.
      */
@@ -339,11 +380,18 @@ public Thread newThread(Runnable r) {
      * Custom rejected execution handler.
      */
     private static class AdaptiveRejectedExecutionHandler implements RejectedExecutionHandler {
-        private final AtomicInteger rejectedCount = new AtomicInteger(0);
+        private final AtomicInteger rejectedCount;
+
+        private AdaptiveRejectedExecutionHandler(AtomicInteger rejectedCount) {
+            this.rejectedCount = rejectedCount;
+        }

         @Override
         public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
             int count = rejectedCount.incrementAndGet();
+            if (executor.isShutdown()) {
+                throw new RejectedExecutionException("Task rejected because executor is shut down");
+            }
             log.warn("Task rejected (count: {}). Pool size: {}, Queue size: {}, Active: {}",
                     count, executor.getPoolSize(), executor.getQueue().size(), executor.getActiveCount());
@@ -406,4 +454,4 @@ public String getSummary() {
             );
         }
     }
-}
\ No newline at end of file
+}
diff --git a/consilens-performance/src/main/java/com/consilens/performance/MemoryMonitor.java b/consilens-performance/src/main/java/com/consilens/performance/MemoryMonitor.java
index 30c5fea..c44914a 100644
--- a/consilens-performance/src/main/java/com/consilens/performance/MemoryMonitor.java
+++ b/consilens-performance/src/main/java/com/consilens/performance/MemoryMonitor.java
@@ -37,6 +37,9 @@ public class MemoryMonitor {
     private volatile boolean memoryPressure;

     public MemoryMonitor(double memoryThreshold, boolean enableGCSuggestion) {
+        if (memoryThreshold <= 0.0 || memoryThreshold > 1.0) {
+            throw new IllegalArgumentException("Memory threshold must be in (0, 1]");
+        }
         this.runtime = Runtime.getRuntime();
         this.memoryMXBean = ManagementFactory.getMemoryMXBean();
         this.memoryThreshold = memoryThreshold;
@@ -99,7 +102,8 @@ private void updateMemoryStats() {
         MemoryUsage heapUsage = memoryMXBean.getHeapMemoryUsage();
         currentMemoryUsed = heapUsage.getUsed();
         maxMemoryUsed = Math.max(maxMemoryUsed, currentMemoryUsed);
-        memoryUtilization = (double) currentMemoryUsed / heapUsage.getMax();
+        long heapMax = heapUsage.getMax() > 0 ? heapUsage.getMax() : runtime.maxMemory();
+        memoryUtilization = heapMax > 0 ? (double) currentMemoryUsed / heapMax : 0.0;
         memoryPressure = memoryUtilization > memoryThreshold;
     }

@@ -161,10 +165,12 @@ private void logMemoryStats() {
         MemoryUsage nonHeapUsage = memoryMXBean.getNonHeapMemoryUsage();

         if (log.isDebugEnabled()) {
-            log.debug("Memory Stats - Heap: {}/{}MB ({:.1f}%), Non-Heap: {}/{}MB",
+            long heapMax = heapUsage.getMax() > 0 ? heapUsage.getMax() : runtime.maxMemory();
+            double heapPercent = heapMax > 0 ? (double) heapUsage.getUsed() / heapMax * 100 : 0.0;
+            log.debug("Memory Stats - Heap: {}/{}MB ({}%), Non-Heap: {}/{}MB",
                     heapUsage.getUsed() / 1024 / 1024,
-                    heapUsage.getMax() / 1024 / 1024,
-                    (double) heapUsage.getUsed() / heapUsage.getMax() * 100,
+                    heapMax / 1024 / 1024,
+                    String.format("%.1f", heapPercent),
                     nonHeapUsage.getUsed() / 1024 / 1024,
                     nonHeapUsage.getMax() / 1024 / 1024);
         }
@@ -229,6 +235,7 @@ public MemoryInfo getMemoryInfo() {
      * Get memory recommendations.
      */
     public String getMemoryRecommendations() {
+        updateMemoryStats();
         StringBuilder recommendations = new StringBuilder();

         if (memoryUtilization > 0.9) {
@@ -259,6 +266,7 @@ public String getMemoryRecommendations() {
      * Check if JVM needs more memory.
      */
     public boolean needsMoreMemory() {
+        updateMemoryStats();
         return memoryUtilization > 0.85 || (gcCount.get() > 10 && getAverageGCTime() > 100);
     }

@@ -326,4 +334,4 @@ public String getSummary() {
             );
         }
     }
-}
\ No newline at end of file
+}
diff --git a/consilens-performance/src/main/java/com/consilens/performance/PerformanceCollector.java b/consilens-performance/src/main/java/com/consilens/performance/PerformanceCollector.java
index c21130a..128ea9d 100644
--- a/consilens-performance/src/main/java/com/consilens/performance/PerformanceCollector.java
+++ b/consilens-performance/src/main/java/com/consilens/performance/PerformanceCollector.java
@@ -44,6 +44,7 @@ public class PerformanceCollector {
     // Test metadata
     private String testName;
     private final Map<String, Object> testParameters = new ConcurrentHashMap<>();
+    private long monitoringIntervalMs = 100L;

     public PerformanceCollector() {
         this.resourceMonitor = new ResourceMonitor();
@@ -71,6 +72,16 @@ public void addTestParameter(String key, Object value) {
         this.testParameters.put(key, value);
     }

+    /**
+     * Set resource monitoring interval.
+     */
+    public void setMonitoringIntervalMs(long monitoringIntervalMs) {
+        if (monitoringIntervalMs <= 0) {
+            throw new IllegalArgumentException("Monitoring interval must be positive");
+        }
+        this.monitoringIntervalMs = monitoringIntervalMs;
+    }
+
     /**
      * Start collecting performance metrics.
      */
@@ -91,8 +102,7 @@ public void startCollection() {
         failedQueries.set(0);
         queryLatencies.clear();

-        // Start resource monitoring (sample every 100ms)
-        resourceMonitor.startMonitoring(100);
+        resourceMonitor.startMonitoring(monitoringIntervalMs);

         log.info("Performance metrics collection started");
     }
@@ -162,6 +172,9 @@ public void recordQuery(long durationMs, boolean success) {
      * Collect all metrics and build PerformanceMetrics object.
      */
     public PerformanceMetrics collectMetrics() {
+        if (startTime == null) {
+            throw new IllegalStateException("Performance collection has not been started");
+        }
         if (endTime == null) {
             endTime = Instant.now();
         }
@@ -291,6 +304,8 @@ public void reset() {
         failedQueries.set(0);
         queryLatencies.clear();
         testParameters.clear();
+        startTime = null;
+        endTime = null;
     }

     /**
diff --git a/consilens-performance/src/main/java/com/consilens/performance/PerformanceReportGenerator.java b/consilens-performance/src/main/java/com/consilens/performance/PerformanceReportGenerator.java
index 19a063b..fb7de7b 100644
--- a/consilens-performance/src/main/java/com/consilens/performance/PerformanceReportGenerator.java
+++ b/consilens-performance/src/main/java/com/consilens/performance/PerformanceReportGenerator.java
@@ -24,6 +24,7 @@ public void generateMarkdownReport(PerformanceTestResult result, String outputPa
         try {
             String markdown = buildMarkdownReport(result);
             Path path = Paths.get(outputPath);
+            createParentDirectories(path);
             Files.writeString(path, markdown);
             log.info("Markdown report generated: {}", outputPath);
         } catch (IOException e) {
@@ -39,6 +40,7 @@ public void generateHtmlReport(PerformanceTestResult result, String outputPath)
         try {
             String html = buildHtmlReport(result);
             Path path = Paths.get(outputPath);
+            createParentDirectories(path);
             Files.writeString(path, html);
             log.info("HTML report generated: {}", outputPath);
         } catch (IOException e) {
@@ -51,6 +53,7 @@ public void generateHtmlReport(PerformanceTestResult result, String outputPath)
      * Build Markdown report content.
      */
     private String buildMarkdownReport(PerformanceTestResult result) {
+        validateReportInput(result);
         StringBuilder md = new StringBuilder();
         PerformanceMetrics metrics = result.getMetrics();

@@ -205,6 +208,25 @@ private String buildMarkdownReport(PerformanceTestResult result) {
         return md.toString();
     }

+    private void createParentDirectories(Path path) throws IOException {
+        Path parent = path.toAbsolutePath().getParent();
+        if (parent != null) {
+            Files.createDirectories(parent);
+        }
+    }
+
+    private void validateReportInput(PerformanceTestResult result) {
+        if (result == null) {
+            throw new IllegalArgumentException("Performance test result cannot be null");
+        }
+        if (result.getConfig() == null) {
+            throw new IllegalArgumentException("Performance test result config cannot be null");
+        }
+        if (result.isSuccess() && result.getMetrics() == null) {
+            throw new IllegalArgumentException("Successful performance test result must include metrics");
+        }
+    }
+
     /**
      * Generate performance analysis.
      */
@@ -323,6 +345,7 @@ private String generateRecommendations(PerformanceMetrics metrics) {
      * Build HTML report content.
      */
     private String buildHtmlReport(PerformanceTestResult result) {
+        validateReportInput(result);
         StringBuilder html = new StringBuilder();
         PerformanceMetrics metrics = result.getMetrics();

diff --git a/consilens-performance/src/main/java/com/consilens/performance/PerformanceTestConfig.java b/consilens-performance/src/main/java/com/consilens/performance/PerformanceTestConfig.java
index 2f80b6a..b74be1a 100644
--- a/consilens-performance/src/main/java/com/consilens/performance/PerformanceTestConfig.java
+++ b/consilens-performance/src/main/java/com/consilens/performance/PerformanceTestConfig.java
@@ -124,6 +124,10 @@ public void validate() {
             throw new IllegalArgumentException("Test name cannot be null or empty");
         }

+        if (loadPattern == null) {
+            throw new IllegalArgumentException("Load pattern cannot be null");
+        }
+
         if (warmupIterations < 0) {
             throw new IllegalArgumentException("Warmup iterations cannot be negative");
         }
@@ -140,6 +144,14 @@ public void validate() {
             throw new IllegalArgumentException("Monitoring interval must be positive");
         }

+        if (testDuration != null && (testDuration.isZero() || testDuration.isNegative())) {
+            throw new IllegalArgumentException("Test duration must be positive");
+        }
+
+        if (testParameters == null) {
+            throw new IllegalArgumentException("Test parameters cannot be null");
+        }
+
         if (databaseConfig != null) {
             validateDatabaseConfig(databaseConfig);
         }
@@ -164,5 +176,31 @@ private void validateDatabaseConfig(DatabaseConfig config) {
         if (config.getQueryTimeoutSeconds() <= 0) {
             throw new IllegalArgumentException("Query timeout must be positive");
         }
+
+        validateTableConfig("sourceTable", config.getSourceTable());
+        validateTableConfig("targetTable", config.getTargetTable());
+    }
+
+    /**
+     * Validate optional table configuration.
+     */
+    private void validateTableConfig(String name, TableConfig config) {
+        if (config == null) {
+            return;
+        }
+        if (config.getTableName() == null || config.getTableName().trim().isEmpty()) {
+            throw new IllegalArgumentException(name + ".tableName cannot be null or empty");
+        }
+        if (config.getKeyColumns() == null || config.getKeyColumns().length == 0) {
+            throw new IllegalArgumentException(name + ".keyColumns cannot be empty");
+        }
+        for (String keyColumn : config.getKeyColumns()) {
+            if (keyColumn == null || keyColumn.trim().isEmpty()) {
+                throw new IllegalArgumentException(name + ".keyColumns cannot contain blank values");
+            }
+        }
+        if (config.getRowLimit() < 0) {
+            throw new IllegalArgumentException(name + ".rowLimit cannot be negative");
+        }
     }
 }
diff --git a/consilens-performance/src/main/java/com/consilens/performance/PerformanceTestRunner.java b/consilens-performance/src/main/java/com/consilens/performance/PerformanceTestRunner.java
index 08205d6..4c24233 100644
--- a/consilens-performance/src/main/java/com/consilens/performance/PerformanceTestRunner.java
+++ b/consilens-performance/src/main/java/com/consilens/performance/PerformanceTestRunner.java
@@ -7,6 +7,7 @@
 import java.util.List;
 import java.util.concurrent.Callable;
 import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.Semaphore;
 import java.util.concurrent.TimeUnit;

 /**
@@ -40,6 +41,7 @@ public PerformanceTestResult runTest(PerformanceTestConfig config, Callable results = executeTest(config, testLogic);
-
-        collector.stopCollection();
+        List<TestResult> results;
+        boolean collecting = false;
+        try {
+            collector.startCollection();
+            collecting = true;
+            results = executeTest(config, testLogic);
+        } finally {
+            if (collecting) {
+                collector.stopCollection();
+            }
+        }

         // Collect metrics
         PerformanceMetrics metrics = collector.collectMetrics();
@@ -64,10 +72,11 @@ public PerformanceTestResult runTest(PerformanceTestConfig config, Callable executeStep(int totalIterations, int maxConcurrency, Ca
         List<TestResult> results = new ArrayList<>();

         // Step interval: 25%, 50%, 75%, 100% of max concurrency
-        int[] concurrencySteps = {maxConcurrency / 4, maxConcurrency / 2, maxConcurrency * 3 / 4, maxConcurrency};
+        int[] concurrencySteps = {
+                Math.max(1, maxConcurrency / 4),
+                Math.max(1, maxConcurrency / 2),
+                Math.max(1, maxConcurrency * 3 / 4),
+                maxConcurrency
+        };
         int iterationsPerStep = totalIterations / concurrencySteps.length;

         for (int concurrency : concurrencySteps) {
@@ -299,7 +313,8 @@ private List<TestResult> executeSequential(int iterations, Callable<TestResult>
                 results.add(result);

             } catch (Exception e) {
-                log.error("Test iteration {} failed", i, e);
+                log.warn("Test iteration {} failed: {}", i, e.getMessage());
+                log.debug("Test iteration failure details", e);
                 collector.recordError();
             }
         }
@@ -314,11 +329,19 @@ private List<TestResult> executeConcurrent(int iterations, int concurrency, Call
             throws InterruptedException {
         List<TestResult> results = new ArrayList<>();

+        if (iterations <= 0) {
+            return results;
+        }
+        int effectiveConcurrency = Math.max(1, concurrency);
+        Semaphore permits = new Semaphore(effectiveConcurrency);
         CountDownLatch latch = new CountDownLatch(iterations);

         for (int i = 0; i < iterations; i++) {
             threadPool.submit(() -> {
+                boolean permitAcquired = false;
                 try {
+                    permits.acquire();
+                    permitAcquired = true;
                     long startTime = System.currentTimeMillis();
                     TestResult result = testLogic.call();
                     long duration = System.currentTimeMillis() - startTime;
@@ -337,11 +360,15 @@ private List<TestResult> executeConcurrent(int iterations, int concurrency, Call
                     return result;

                 } catch (Exception e) {
-                    log.error("Concurrent test iteration failed", e);
+                    log.warn("Concurrent test iteration failed: {}", e.getMessage());
+                    log.debug("Concurrent test iteration failure details", e);
                     collector.recordError();
                     return null;

                 } finally {
+                    if (permitAcquired) {
+                        permits.release();
+                    }
                     latch.countDown();
                 }
             });
@@ -350,7 +377,8 @@ private List<TestResult> executeConcurrent(int iterations, int concurrency, Call
         // Wait for all tasks to complete
         boolean completed = latch.await(10, TimeUnit.MINUTES);
         if (!completed) {
-            log.warn("Test did not complete within timeout");
+            collector.recordError();
+            throw new IllegalStateException("Performance test did not complete within timeout");
         }

         return results;
@@ -368,11 +396,30 @@ public void shutdown() {
      * Test result holder.
      */
     @Data
+    @lombok.Builder
+    @lombok.NoArgsConstructor
+    @lombok.AllArgsConstructor
     public static class TestResult {
         private long rowsProcessed;
         private long bytesProcessed;
         private long differencesFound;
         private boolean success;
         private String errorMessage;
+
+        public static TestResult success(long rowsProcessed, long bytesProcessed, long differencesFound) {
+            return TestResult.builder()
+                    .rowsProcessed(rowsProcessed)
+                    .bytesProcessed(bytesProcessed)
+                    .differencesFound(differencesFound)
+                    .success(true)
+                    .build();
+        }
+
+        public static TestResult failure(String errorMessage) {
+            return TestResult.builder()
+                    .success(false)
+                    .errorMessage(errorMessage)
+                    .build();
+        }
     }
 }
diff --git a/consilens-performance/src/main/java/com/consilens/performance/ResourceMonitor.java b/consilens-performance/src/main/java/com/consilens/performance/ResourceMonitor.java
index 5a95412..d26c2e5 100644
--- a/consilens-performance/src/main/java/com/consilens/performance/ResourceMonitor.java
+++ b/consilens-performance/src/main/java/com/consilens/performance/ResourceMonitor.java
@@ -6,6 +6,7 @@
 import java.io.InputStreamReader;
 import java.lang.management.*;
 import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledFuture;
 import java.util.concurrent.ScheduledExecutorService;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicLong;
@@ -24,6 +25,7 @@ public class ResourceMonitor {

     private final ScheduledExecutorService scheduler;
     private volatile boolean monitoring = false;
+    private volatile ScheduledFuture monitoringTask;

     // CPU metrics
     private final AtomicLong cpuSampleCount = new AtomicLong(0);
@@ -68,6 +70,9 @@ public ResourceMonitor() {
      * Start monitoring resources at the specified interval.
      */
     public void startMonitoring(long intervalMs) {
+        if (intervalMs <= 0) {
+            throw new IllegalArgumentException("Monitoring interval must be positive");
+        }
         if (monitoring) {
             log.warn("Resource monitoring is already running");
             return;
@@ -77,9 +82,11 @@ public void startMonitoring(long intervalMs) {
         resetMetrics();
         captureInitialGcMetrics();

-        scheduler.scheduleAtFixedRate(() -> {
+        monitoringTask = scheduler.scheduleAtFixedRate(() -> {
             try {
-                collectMetrics();
+                if (monitoring) {
+                    collectMetrics();
+                }
             } catch (Exception e) {
                 log.error("Error collecting resource metrics", e);
             }
@@ -93,6 +100,11 @@ public void startMonitoring(long intervalMs) {
      */
     public void stopMonitoring() {
         monitoring = false;
+        ScheduledFuture task = monitoringTask;
+        if (task != null) {
+            task.cancel(false);
+            monitoringTask = null;
+        }
         log.info("Resource monitoring stopped");
     }

diff --git a/consilens-performance/src/test/java/com/consilens/performance/AdaptiveThreadPoolExecutorTest.java b/consilens-performance/src/test/java/com/consilens/performance/AdaptiveThreadPoolExecutorTest.java
new file mode 100644
index 0000000..1d10367
--- /dev/null
+++ b/consilens-performance/src/test/java/com/consilens/performance/AdaptiveThreadPoolExecutorTest.java
@@ -0,0 +1,46 @@
+package com.consilens.performance;
+
+import org.junit.jupiter.api.Test;
+
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+class AdaptiveThreadPoolExecutorTest {
+
+    @Test
+    void shouldValidatePoolConfiguration() {
+        assertThrows(IllegalArgumentException.class,
+                () -> new AdaptiveThreadPoolExecutor(0, 1, 1, TimeUnit.SECONDS));
+        assertThrows(IllegalArgumentException.class,
+                () 
-> new AdaptiveThreadPoolExecutor(2, 1, 1, TimeUnit.SECONDS)); + assertThrows(IllegalArgumentException.class, + () -> new AdaptiveThreadPoolExecutor(1, 1, -1, TimeUnit.SECONDS)); + assertThrows(IllegalArgumentException.class, + () -> new AdaptiveThreadPoolExecutor(1, 1, 1, null)); + } + + @Test + void shouldSubmitTasksCollectStatsAndRejectAfterShutdown() throws Exception { + AdaptiveThreadPoolExecutor executor = new AdaptiveThreadPoolExecutor(1, 2, 1, TimeUnit.SECONDS); + try { + CompletableFuture future = executor.submit(() -> "ok"); + + assertEquals("ok", future.get(5, TimeUnit.SECONDS)); + AdaptiveThreadPoolExecutor.ThreadPoolStats stats = executor.getStats(); + assertEquals(1, stats.getSubmittedTasks()); + assertEquals(1, stats.getCompletedTasks()); + } finally { + executor.shutdown(); + } + + assertTrue(executor.isShutdown()); + CompletableFuture rejected = executor.submit(() -> "late"); + assertThrows(ExecutionException.class, rejected::get); + assertEquals(1, executor.getStats().getRejectedTasks()); + } +} diff --git a/consilens-performance/src/test/java/com/consilens/performance/PerformanceReportGeneratorTest.java b/consilens-performance/src/test/java/com/consilens/performance/PerformanceReportGeneratorTest.java new file mode 100644 index 0000000..da3d123 --- /dev/null +++ b/consilens-performance/src/test/java/com/consilens/performance/PerformanceReportGeneratorTest.java @@ -0,0 +1,55 @@ +package com.consilens.performance; + +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.io.TempDir; + +import java.nio.file.Files; +import java.nio.file.Path; +import java.time.Instant; + +import static org.junit.jupiter.api.Assertions.assertTrue; + +class PerformanceReportGeneratorTest { + + @TempDir + Path tempDir; + + @Test + void shouldCreateParentDirectoriesAndWriteReports() throws Exception { + PerformanceTestConfig config = PerformanceTestConfig.builder() + .testName("report-smoke") + .warmupIterations(0) + .testIterations(1) + .build(); + 
PerformanceMetrics metrics = PerformanceMetrics.builder() + .testName("report-smoke") + .startTime(Instant.now()) + .endTime(Instant.now()) + .totalDurationMs(100) + .totalRowsProcessed(1000) + .totalBytesProcessed(1024 * 1024) + .differencesFound(2) + .build(); + metrics.getLatency().addLatency(10); + metrics.getLatency().addLatency(20); + metrics.getLatency().calculatePercentiles(); + PerformanceTestResult result = PerformanceTestResult.builder() + .config(config) + .metrics(metrics) + .success(true) + .build(); + PerformanceReportGenerator generator = new PerformanceReportGenerator(); + Path markdown = tempDir.resolve("nested/reports/performance.md"); + Path html = tempDir.resolve("nested/reports/performance.html"); + + generator.generateMarkdownReport(result, markdown.toString()); + generator.generateHtmlReport(result, html.toString()); + + String markdownContent = Files.readString(markdown); + String htmlContent = Files.readString(html); + assertTrue(markdownContent.contains("report-smoke")); + assertTrue(markdownContent.contains("1,000")); + assertTrue(htmlContent.contains("report-smoke")); + assertTrue(htmlContent.contains(" PerformanceTestConfig.builder() + .testName(" ") + .build() + .validate()); + assertThrows(IllegalArgumentException.class, () -> PerformanceTestConfig.builder() + .testName("bad") + .loadPattern(null) + .build() + .validate()); + assertThrows(IllegalArgumentException.class, () -> PerformanceTestConfig.builder() + .testName("bad") + .testIterations(0) + .build() + .validate()); + assertThrows(IllegalArgumentException.class, () -> PerformanceTestConfig.builder() + .testName("bad") + .concurrencyLevel(0) + .build() + .validate()); + assertThrows(IllegalArgumentException.class, () -> PerformanceTestConfig.builder() + .testName("bad") + .monitoringIntervalMs(0) + .build() + .validate()); + assertThrows(IllegalArgumentException.class, () -> PerformanceTestConfig.builder() + .testName("bad") + .testDuration(Duration.ZERO) + .build() + 
.validate()); + assertThrows(IllegalArgumentException.class, () -> PerformanceTestConfig.builder() + .testName("bad") + .testParameters(null) + .build() + .validate()); + } + + @Test + void shouldValidateDatabaseAndTableConfigWhenPresent() { + PerformanceTestConfig valid = PerformanceTestConfig.builder() + .testName("database") + .databaseConfig(PerformanceTestConfig.DatabaseConfig.builder() + .jdbcUrl("jdbc:h2:mem:perf") + .username("sa") + .sourceTable(PerformanceTestConfig.TableConfig.builder() + .tableName("source_orders") + .keyColumns(new String[]{"id"}) + .rowLimit(100) + .build()) + .targetTable(PerformanceTestConfig.TableConfig.builder() + .tableName("target_orders") + .keyColumns(new String[]{"id"}) + .build()) + .build()) + .testParameters(new HashMap<>()) + .build(); + + assertDoesNotThrow(valid::validate); + + assertThrows(IllegalArgumentException.class, () -> PerformanceTestConfig.builder() + .testName("database") + .databaseConfig(PerformanceTestConfig.DatabaseConfig.builder() + .jdbcUrl("jdbc:h2:mem:perf") + .username("sa") + .sourceTable(PerformanceTestConfig.TableConfig.builder() + .tableName("orders") + .keyColumns(new String[]{" "}) + .build()) + .build()) + .build() + .validate()); + + assertThrows(IllegalArgumentException.class, () -> PerformanceTestConfig.builder() + .testName("database") + .databaseConfig(PerformanceTestConfig.DatabaseConfig.builder() + .jdbcUrl("jdbc:h2:mem:perf") + .username("sa") + .sourceTable(PerformanceTestConfig.TableConfig.builder() + .tableName("orders") + .keyColumns(new String[]{"id"}) + .rowLimit(-1) + .build()) + .build()) + .build() + .validate()); + } +} diff --git a/consilens-performance/src/test/java/com/consilens/performance/PerformanceTestRunnerTest.java b/consilens-performance/src/test/java/com/consilens/performance/PerformanceTestRunnerTest.java new file mode 100644 index 0000000..7afb251 --- /dev/null +++ b/consilens-performance/src/test/java/com/consilens/performance/PerformanceTestRunnerTest.java @@ 
-0,0 +1,70 @@ +package com.consilens.performance; + +import org.junit.jupiter.api.Test; + +import java.util.concurrent.atomic.AtomicInteger; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertFalse; +import static org.junit.jupiter.api.Assertions.assertNotNull; +import static org.junit.jupiter.api.Assertions.assertTrue; + +class PerformanceTestRunnerTest { + + @Test + void shouldRunSmokeTestAndCollectMetrics() { + PerformanceTestRunner runner = new PerformanceTestRunner(); + try { + AtomicInteger calls = new AtomicInteger(); + PerformanceTestConfig config = PerformanceTestConfig.builder() + .testName("runner-smoke") + .warmupIterations(1) + .testIterations(5) + .concurrencyLevel(2) + .monitoringIntervalMs(10) + .build(); + + PerformanceTestResult result = runner.runTest(config, () -> { + calls.incrementAndGet(); + return PerformanceTestRunner.TestResult.success(10, 2048, 1); + }); + + assertTrue(result.isSuccess(), result.getErrorMessage()); + assertNotNull(result.getMetrics()); + assertEquals(6, calls.get()); + assertEquals(5, result.getTestResults().size()); + assertEquals(50, result.getMetrics().getTotalRowsProcessed()); + assertEquals(10_240, result.getMetrics().getTotalBytesProcessed()); + assertEquals(5, result.getMetrics().getDifferencesFound()); + assertEquals(0, result.getMetrics().getErrorCount()); + assertEquals(5, result.getMetrics().getConcurrency().getTotalTasksSubmitted()); + } finally { + runner.shutdown(); + } + } + + @Test + void shouldReturnFailureWhenIterationsFail() { + PerformanceTestRunner runner = new PerformanceTestRunner(); + try { + PerformanceTestConfig config = PerformanceTestConfig.builder() + .testName("runner-failure") + .warmupIterations(0) + .testIterations(3) + .concurrencyLevel(1) + .monitoringIntervalMs(10) + .build(); + + PerformanceTestResult result = runner.runTest(config, () -> { + throw new IllegalStateException("boom"); + }); + + 
assertFalse(result.isSuccess()); + assertNotNull(result.getMetrics()); + assertEquals(3, result.getMetrics().getErrorCount()); + assertTrue(result.getErrorMessage().contains("3")); + } finally { + runner.shutdown(); + } + } +} diff --git a/consilens-sink/consilens-sink-api/src/main/java/com/consilens/sink/api/model/ResultConfig.java b/consilens-sink/consilens-sink-api/src/main/java/com/consilens/sink/api/model/ResultConfig.java index 1a7587b..b493e30 100644 --- a/consilens-sink/consilens-sink-api/src/main/java/com/consilens/sink/api/model/ResultConfig.java +++ b/consilens-sink/consilens-sink-api/src/main/java/com/consilens/sink/api/model/ResultConfig.java @@ -25,5 +25,5 @@ public class ResultConfig { private List sinks = new ArrayList<>(); @Builder.Default - private boolean failOnSinkError = false; + private boolean failOnSinkError = true; } diff --git a/consilens-sink/consilens-sink-plugins/consilens-sink-json/pom.xml b/consilens-sink/consilens-sink-plugins/consilens-sink-json/pom.xml index be7a2ec..d9529cd 100644 --- a/consilens-sink/consilens-sink-plugins/consilens-sink-json/pom.xml +++ b/consilens-sink/consilens-sink-plugins/consilens-sink-json/pom.xml @@ -37,5 +37,12 @@ org.apache.logging.log4j log4j-api + + + org.junit.jupiter + junit-jupiter + ${junit.version} + test + diff --git a/consilens-sink/consilens-sink-plugins/consilens-sink-json/src/main/java/com/consilens/sink/json/JsonDiffRecordSink.java b/consilens-sink/consilens-sink-plugins/consilens-sink-json/src/main/java/com/consilens/sink/json/JsonDiffRecordSink.java index c207ae7..3e7eafb 100644 --- a/consilens-sink/consilens-sink-plugins/consilens-sink-json/src/main/java/com/consilens/sink/json/JsonDiffRecordSink.java +++ b/consilens-sink/consilens-sink-plugins/consilens-sink-json/src/main/java/com/consilens/sink/json/JsonDiffRecordSink.java @@ -27,7 +27,7 @@ * *

      * <p>Three output modes depending on {@code columns} and {@code mergeDefaults}:
      * <ul>
    - *   <li>Default mode ({@code columns} empty): serializes {@link DiffRow} directly.</li>
    + *   <li>Default mode ({@code columns} empty): writes stable JSON objects with primitive/list fields.</li>
      *   <li>Full custom mode ({@code columns} non-empty, {@code mergeDefaults=false}): only the configured columns as a JSON object.</li>
      *   <li>Merge mode ({@code columns} non-empty, {@code mergeDefaults=true}): default fields with value overrides,
      *       plus extra columns appended after defaults.</li>
    @@ -38,7 +38,8 @@ public class JsonDiffRecordSink implements Sink {
     
         /** Default field names output in merge mode. */
         private static final List DEFAULT_FIELDS = Arrays.asList(
    -            "operation", "primaryKey", "sourceValues", "targetValues", "changedColumns1", "changedColumns2");
    +            "operation", "primaryKey", "sourceValues", "targetValues",
    +            "columnNames1", "columnNames2", "changedColumns1", "changedColumns2");
     
         private final List buffer = new ArrayList<>();
         private ObjectMapper objectMapper;
    @@ -75,8 +76,18 @@ public void onDiffRecords(List rows, DiffContext context) {
                         buffer.add(record);
                     }
                 } else {
    -            buffer.addAll(rows);
    +            for (DiffRow row : rows) {
    +                buffer.add(buildDefaultRecord(row));
    +            }
    +        }
    +    }
    +
    +    private LinkedHashMap buildDefaultRecord(DiffRow row) {
    +        LinkedHashMap record = new LinkedHashMap<>();
    +        for (String field : DEFAULT_FIELDS) {
    +            record.put(field, defaultValue(field, row));
             }
    +        return record;
         }
     
         private LinkedHashMap buildMergeRecord(DiffRow row, DiffContext context,
    @@ -107,6 +118,8 @@ private Object defaultValue(String field, DiffRow row) {
                 case "primaryKey": return row.getPrimaryKeyString();
                 case "sourceValues": return row.getAllSourceValues();
                 case "targetValues": return row.getAllTargetValues();
    +            case "columnNames1": return row.getColumnNames1();
    +            case "columnNames2": return row.getColumnNames2();
                 case "changedColumns1": return row.getChangedColumns1();
                 case "changedColumns2": return row.getChangedColumns2();
                 default: return null;
    diff --git a/consilens-sink/consilens-sink-plugins/consilens-sink-json/src/main/java/com/consilens/sink/json/JsonSinkConfig.java b/consilens-sink/consilens-sink-plugins/consilens-sink-json/src/main/java/com/consilens/sink/json/JsonSinkConfig.java
    index ce5ae5f..836a3a0 100644
    --- a/consilens-sink/consilens-sink-plugins/consilens-sink-json/src/main/java/com/consilens/sink/json/JsonSinkConfig.java
    +++ b/consilens-sink/consilens-sink-plugins/consilens-sink-json/src/main/java/com/consilens/sink/json/JsonSinkConfig.java
    @@ -11,7 +11,7 @@
      * JSON format sink configuration.
      *
      * <p>If {@code columns} is non-empty, each record is output as a JSON object with custom columns;
    - * otherwise serializes DiffRow / DiffResult directly (default behavior).
    + * otherwise writes stable JSON fields for each diff row.
      *
      * <pre>
      * result:
    diff --git a/consilens-sink/consilens-sink-plugins/consilens-sink-json/src/test/java/com/consilens/sink/json/JsonDiffRecordSinkTest.java b/consilens-sink/consilens-sink-plugins/consilens-sink-json/src/test/java/com/consilens/sink/json/JsonDiffRecordSinkTest.java
    new file mode 100644
    index 0000000..5b54926
    --- /dev/null
    +++ b/consilens-sink/consilens-sink-plugins/consilens-sink-json/src/test/java/com/consilens/sink/json/JsonDiffRecordSinkTest.java
    @@ -0,0 +1,134 @@
    +package com.consilens.sink.json;
    +
    +import com.consilens.core.diff.DiffRow;
    +import com.consilens.core.lifecycle.DiffContext;
    +import com.consilens.sink.api.model.ColumnMapping;
    +import com.consilens.sink.api.model.SinkConfig;
    +import com.fasterxml.jackson.databind.JsonNode;
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import org.junit.jupiter.api.Test;
    +import org.junit.jupiter.api.io.TempDir;
    +
    +import java.nio.file.Files;
    +import java.nio.file.Path;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +
    +import static org.junit.jupiter.api.Assertions.assertEquals;
    +import static org.junit.jupiter.api.Assertions.assertFalse;
    +import static org.junit.jupiter.api.Assertions.assertTrue;
    +
    +class JsonDiffRecordSinkTest {
    +
    +    private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
    +
    +    @TempDir
    +    Path tempDir;
    +
    +    @Test
    +    void shouldWriteStableDiffRecordFieldsByDefault() throws Exception {
    +        Path output = tempDir.resolve("diff-records.json");
    +        write(output, Map.of("pretty", true), DiffRow.modified(
    +                List.of(1),
    +                List.of("Alice", "100.00"),
    +                List.of("Alicia", "101.00"),
    +                List.of("name", "amount"),
    +                List.of("name", "amount"),
    +                List.of("name", "amount"),
    +                List.of("name", "amount")));
    +
    +        JsonNode row = firstRow(output);
    +        assertEquals("mismatch", row.get("operation").asText());
    +        assertEquals("1", row.get("primaryKey").asText());
    +        assertTrue(row.get("sourceValues").isArray());
    +        assertEquals("Alice", row.get("sourceValues").get(0).asText());
    +        assertEquals("100.00", row.get("sourceValues").get(1).asText());
    +        assertEquals("Alicia", row.get("targetValues").get(0).asText());
    +        assertEquals("name", row.get("columnNames1").get(0).asText());
    +        assertEquals("amount", row.get("columnNames2").get(1).asText());
    +        assertEquals("name", row.get("changedColumns1").get(0).asText());
    +        assertEquals("amount", row.get("changedColumns2").get(1).asText());
    +    }
    +
    +    @Test
    +    void shouldWriteOnlyConfiguredColumnsInCustomMode() throws Exception {
    +        Path output = tempDir.resolve("custom-diff-records.json");
    +        Map properties = new LinkedHashMap<>();
    +        properties.put("columns", List.of(
    +                new ColumnMapping("taskId", "${taskId}", null, null),
    +                new ColumnMapping("operation", "${operation}", null, null),
    +                new ColumnMapping("srcName", "${src.name}", null, null),
    +                new ColumnMapping("tgtName", "${tgt.name}", null, null),
    +                new ColumnMapping("fallback", "${src.missing}", "n/a", null),
    +                new ColumnMapping("env", "production", null, null)));
    +
    +        write(output, properties, DiffRow.modified(
    +                List.of(1),
    +                List.of("Alice"),
    +                List.of("Alicia"),
    +                List.of("name"),
    +                List.of("name")));
    +
    +        JsonNode row = firstRow(output);
    +        assertEquals(6, row.size());
    +        assertEquals("task-1", row.get("taskId").asText());
    +        assertEquals("mismatch", row.get("operation").asText());
    +        assertEquals("Alice", row.get("srcName").asText());
    +        assertEquals("Alicia", row.get("tgtName").asText());
    +        assertEquals("n/a", row.get("fallback").asText());
    +        assertEquals("production", row.get("env").asText());
    +        assertFalse(row.has("primaryKey"));
    +    }
    +
    +    @Test
    +    void shouldMergeDefaultFieldsWithConfiguredOverrides() throws Exception {
    +        Path output = tempDir.resolve("merged-diff-records.json");
    +        Map properties = new LinkedHashMap<>();
    +        properties.put("mergeDefaults", true);
    +        properties.put("columns", List.of(
    +                new ColumnMapping("primaryKey", "pk-${primaryKey}", null, null),
    +                new ColumnMapping("taskId", "${taskId}", null, null)));
    +
    +        write(output, properties, DiffRow.modified(
    +                List.of(1),
    +                List.of("Alice"),
    +                List.of("Alicia"),
    +                List.of("name"),
    +                List.of("name"),
    +                List.of("name"),
    +                List.of("name")));
    +
    +        JsonNode row = firstRow(output);
    +        assertEquals("mismatch", row.get("operation").asText());
    +        assertEquals("pk-1", row.get("primaryKey").asText());
    +        assertEquals("Alice", row.get("sourceValues").get(0).asText());
    +        assertEquals("Alicia", row.get("targetValues").get(0).asText());
    +        assertEquals("name", row.get("changedColumns1").get(0).asText());
    +        assertEquals("task-1", row.get("taskId").asText());
    +    }
    +
    +    private void write(Path output, Map properties, DiffRow row) throws Exception {
    +        Map jsonProperties = new LinkedHashMap<>(properties);
    +        jsonProperties.put("path", output.toString());
    +
    +        SinkConfig config = new SinkConfig();
    +        config.setFormat("json");
    +        config.setType("diff-record");
    +        config.setProperties(OBJECT_MAPPER.writeValueAsString(jsonProperties));
    +
    +        DiffContext context = DiffContext.builder().taskId("task-1").build();
    +        JsonDiffRecordSink sink = new JsonDiffRecordSink();
    +        sink.open(config, context);
    +        sink.onDiffRecords(List.of(row), context);
    +        sink.close();
    +    }
    +
    +    private JsonNode firstRow(Path output) throws Exception {
    +        assertTrue(Files.exists(output));
    +        JsonNode root = OBJECT_MAPPER.readTree(Files.readString(output));
    +        assertTrue(root.isArray());
    +        assertEquals(1, root.size());
    +        return root.get(0);
    +    }
    +}
    diff --git a/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/main/java/com/consilens/sink/table/TableColumnNames.java b/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/main/java/com/consilens/sink/table/TableColumnNames.java
    new file mode 100644
    index 0000000..c84d113
    --- /dev/null
    +++ b/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/main/java/com/consilens/sink/table/TableColumnNames.java
    @@ -0,0 +1,52 @@
    +package com.consilens.sink.table;
    +
    +import com.consilens.sink.api.model.ColumnMapping;
    +import com.consilens.connector.api.write.OutputColumnSpec;
    +
    +import java.util.HashSet;
    +import java.util.List;
    +import java.util.Set;
    +
    +public final class TableColumnNames {
    +
    +    private TableColumnNames() {
    +    }
    +
    +    public static String sanitize(String column) {
    +        return column.replaceAll("[^a-zA-Z0-9_]", "_");
    +    }
    +
    +    public static void validateUniqueSanitizedColumns(List columns, String context) {
    +        if (columns == null || columns.isEmpty()) {
    +            return;
    +        }
    +        Set sanitizedNames = new HashSet<>();
    +        for (ColumnMapping column : columns) {
    +            if (column == null || column.getName() == null) {
    +                continue;
    +            }
    +            String sanitized = sanitize(column.getName()).toLowerCase();
    +            if (!sanitizedNames.add(sanitized)) {
    +                throw new IllegalArgumentException(context + " contains duplicate column name after sanitization: "
    +                        + column.getName() + " -> " + sanitize(column.getName()));
    +            }
    +        }
    +    }
    +
    +    public static void validateUniqueOutputColumns(List columns, String context) {
    +        if (columns == null || columns.isEmpty()) {
    +            return;
    +        }
    +        Set names = new HashSet<>();
    +        for (OutputColumnSpec column : columns) {
    +            if (column == null || column.getColumnName() == null) {
    +                continue;
    +            }
    +            String normalized = column.getColumnName().toLowerCase();
    +            if (!names.add(normalized)) {
    +                throw new IllegalArgumentException(context + " contains duplicate output column: "
    +                        + column.getColumnName());
    +            }
    +        }
    +    }
    +}
    diff --git a/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/main/java/com/consilens/sink/table/TableDiffRecordSink.java b/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/main/java/com/consilens/sink/table/TableDiffRecordSink.java
    index fd0346a..566a723 100644
    --- a/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/main/java/com/consilens/sink/table/TableDiffRecordSink.java
    +++ b/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/main/java/com/consilens/sink/table/TableDiffRecordSink.java
    @@ -48,27 +48,29 @@ public class TableDiffRecordSink implements Sink {
         @Override
         public void open(SinkConfig config, DiffContext context) throws Exception {
             sinkConfig = parseConfig(config.getProperties());
    -        dataSource = createDataSource(sinkConfig);
    -        tableName = sinkConfig.resolveTableName();
             batchSize = sinkConfig.getBatchSize();
             if (batchSize <= 0) {
                 throw new IllegalArgumentException("sink.batchSize must be greater than 0");
             }
    +        TableColumnNames.validateUniqueSanitizedColumns(sinkConfig.getColumns(), "TableDiffRecordSink columns");
    +        dataSource = createDataSource(sinkConfig);
    +        tableName = sinkConfig.resolveTableName();
     
             sourceColumns = context.getSourceColumnNames() != null ? context.getSourceColumnNames() : new ArrayList<>();
             targetColumns = context.getTargetColumnNames() != null ? context.getTargetColumnNames() : new ArrayList<>();
             sourceOutputColumns = new LinkedHashMap<>();
             targetOutputColumns = new LinkedHashMap<>();
             for (String column : sourceColumns) {
    -            sourceOutputColumns.put(sanitize(column) + "_1", column);
    +            sourceOutputColumns.put(TableColumnNames.sanitize(column) + "_1", column);
             }
             for (String column : targetColumns) {
    -            targetOutputColumns.put(sanitize(column) + "_2", column);
    +            targetOutputColumns.put(TableColumnNames.sanitize(column) + "_2", column);
             }
     
             DatabaseDialect dialect = resolveDialect(sinkConfig);
             writeCompiler = dialect.getTableWriteCompiler();
             outputColumns = buildOutputColumns(dialect);
    +        TableColumnNames.validateUniqueOutputColumns(outputColumns, "TableDiffRecordSink");
             writePlan = writeCompiler.compile(new TableWriteCompileRequest(
                     tableName,
                     sinkConfig.isCreateTable(),
    @@ -186,7 +188,7 @@ private List buildOutputColumns(DatabaseDialect dialect) {
                 List columns = new ArrayList<>();
                 for (ColumnMapping mapping : sinkConfig.getColumns()) {
                     columns.add(new OutputColumnSpec(
    -                        sanitize(mapping.getName()),
    +                        TableColumnNames.sanitize(mapping.getName()),
                             resolveSystemType(dialect, mapping.getColumnType(), Types.TEXT()),
                             true,
                             mapping.getColumnType()
    @@ -201,10 +203,10 @@ private List buildOutputColumns(DatabaseDialect dialect) {
             columns.add(new OutputColumnSpec("nl_dq_diff_columns1", Types.JSON(), true, null));
             columns.add(new OutputColumnSpec("nl_dq_diff_columns2", Types.JSON(), true, null));
             for (String column : sourceColumns) {
    -            columns.add(new OutputColumnSpec(sanitize(column) + "_1", Types.TEXT(), true, null));
    +            columns.add(new OutputColumnSpec(TableColumnNames.sanitize(column) + "_1", Types.TEXT(), true, null));
             }
             for (String column : targetColumns) {
    -            columns.add(new OutputColumnSpec(sanitize(column) + "_2", Types.TEXT(), true, null));
    +            columns.add(new OutputColumnSpec(TableColumnNames.sanitize(column) + "_2", Types.TEXT(), true, null));
             }
             return List.copyOf(columns);
         }
    @@ -219,7 +221,7 @@ private TypeDescriptor resolveSystemType(DatabaseDialect dialect, String declare
     
         private ColumnMapping findColumnMapping(String outputColumnName) {
             for (ColumnMapping mapping : sinkConfig.getColumns()) {
    -            if (sanitize(mapping.getName()).equals(outputColumnName)) {
    +            if (TableColumnNames.sanitize(mapping.getName()).equals(outputColumnName)) {
                     return mapping;
                 }
             }
    @@ -230,7 +232,7 @@ private Map buildOverrideMap() {
             Map map = new LinkedHashMap<>();
             if (sinkConfig.getColumns() != null) {
                 for (ColumnMapping column : sinkConfig.getColumns()) {
    -                map.put(sanitize(column.getName()), column);
    +                map.put(TableColumnNames.sanitize(column.getName()), column);
                 }
             }
             return map;
    @@ -247,10 +249,6 @@ private void rollbackQuietly(Connection connection) {
             }
         }
     
    -    private String sanitize(String column) {
    -        return column.replaceAll("[^a-zA-Z0-9_]", "_");
    -    }
    -
         private String toJsonArray(List list) {
             if (list == null || list.isEmpty()) {
                 return "[]";
    diff --git a/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/main/java/com/consilens/sink/table/TableResultSink.java b/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/main/java/com/consilens/sink/table/TableResultSink.java
    index 343d13e..70f5654 100644
    --- a/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/main/java/com/consilens/sink/table/TableResultSink.java
    +++ b/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/main/java/com/consilens/sink/table/TableResultSink.java
    @@ -41,12 +41,14 @@ public class TableResultSink implements Sink {
         @Override
         public void open(SinkConfig config, DiffContext context) throws Exception {
             sinkConfig = parseConfig(config.getProperties());
    +        TableColumnNames.validateUniqueSanitizedColumns(sinkConfig.getColumns(), "TableResultSink columns");
             dataSource = createDataSource(sinkConfig);
             tableName = sinkConfig.resolveTableName();
     
             DatabaseDialect dialect = DatabaseDialects.require(sinkConfig.resolveDatabaseType());
             writeCompiler = dialect.getTableWriteCompiler();
             outputColumns = buildOutputColumns(dialect);
    +        TableColumnNames.validateUniqueOutputColumns(outputColumns, "TableResultSink");
             writePlan = writeCompiler.compile(new TableWriteCompileRequest(
                     tableName,
                     sinkConfig.isCreateTable(),
    @@ -182,7 +184,7 @@ private List buildOutputColumns(DatabaseDialect dialect) {
                 List columns = new ArrayList<>();
                 for (ColumnMapping mapping : sinkConfig.getColumns()) {
                     columns.add(new OutputColumnSpec(
    -                        sanitize(mapping.getName()),
    +                        TableColumnNames.sanitize(mapping.getName()),
                             resolveSystemType(dialect, mapping.getColumnType(), Types.TEXT()),
                             true,
                             mapping.getColumnType()
    @@ -214,17 +216,13 @@ private TypeDescriptor resolveSystemType(DatabaseDialect dialect, String declare
     
         private ColumnMapping findColumnMapping(String outputColumnName) {
             for (ColumnMapping mapping : sinkConfig.getColumns()) {
    -            if (sanitize(mapping.getName()).equals(outputColumnName)) {
    +            if (TableColumnNames.sanitize(mapping.getName()).equals(outputColumnName)) {
                     return mapping;
                 }
             }
             return null;
         }
     
    -    private String sanitize(String column) {
    -        return column.replaceAll("[^a-zA-Z0-9_]", "_");
    -    }
    -
         private TableSinkConfig parseConfig(String properties) throws Exception {
             if (properties == null || properties.isBlank()) {
                 throw new IllegalArgumentException("TableResultSink requires properties configuration (url, username, password)");
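This change replaces the private `sanitize` method with `TableColumnNames.sanitize` and adds collision checks. As a result, two configured columns such as `a-b` and `a_b` are rejected at `open()` with an `IllegalArgumentException`, which is what the new tests assert. Both names become `a_b` under the `[^a-zA-Z0-9_]` → `_` rule. `TableColumnNames` is not part of this diff, so the following is only a hypothetical standalone sketch of such a check. The class `SanitizedColumnCheck` and its method names are illustrative and are not the project's API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a sanitized-name collision check; not the real TableColumnNames.
public final class SanitizedColumnCheck {

    private SanitizedColumnCheck() {
    }

    // Same rule as the removed private sanitize(): any character outside
    // [a-zA-Z0-9_] is replaced with an underscore.
    public static String sanitize(String column) {
        return column.replaceAll("[^a-zA-Z0-9_]", "_");
    }

    // Throws IllegalArgumentException when two raw names sanitize to the same column name.
    public static void validateUniqueSanitized(List<String> rawNames, String label) {
        Map<String, String> firstRawBySanitized = new HashMap<>();
        for (String raw : rawNames) {
            String sanitized = sanitize(raw);
            String previous = firstRawBySanitized.putIfAbsent(sanitized, raw);
            if (previous != null) {
                throw new IllegalArgumentException(String.format(
                        "%s: columns '%s' and '%s' both sanitize to '%s'",
                        label, previous, raw, sanitized));
            }
        }
    }

    public static void main(String[] args) {
        // Distinct after sanitization: no exception.
        validateUniqueSanitized(List.of("id", "user-name", "status"), "demo");
        try {
            validateUniqueSanitized(List.of("a-b", "a_b"), "TableResultSink columns");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A map keyed by the sanitized name finds the collision in one pass, and the error message can name both offending raw columns. This diff does not show whether the real helper also compares names case-insensitively. That would matter for MySQL, where column identifiers are not case-sensitive.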
    diff --git a/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/test/java/com/consilens/sink/table/TableDiffRecordSinkTest.java b/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/test/java/com/consilens/sink/table/TableDiffRecordSinkTest.java
    index be2ce32..52d2946 100644
    --- a/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/test/java/com/consilens/sink/table/TableDiffRecordSinkTest.java
    +++ b/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/test/java/com/consilens/sink/table/TableDiffRecordSinkTest.java
    @@ -41,6 +41,30 @@ void shouldRejectNonPositiveBatchSize() {
             assertThrows(IllegalArgumentException.class, () -> sink.open(sinkConfig, context));
         }
     
    +    @Test
    +    void shouldRejectCustomColumnsThatCollideAfterSanitization() {
    +        SinkConfig sinkConfig = new SinkConfig();
    +        sinkConfig.setFormat("table");
    +        sinkConfig.setType("diff-record");
    +        sinkConfig.setProperties("{"
    +                + "\"type\":\"mysql\","
    +                + "\"url\":\"jdbc:h2:mem:sanitized_collision;MODE=MySQL;DB_CLOSE_DELAY=-1\","
    +                + "\"username\":\"sa\","
    +                + "\"password\":\"\","
    +                + "\"driver\":\"org.h2.Driver\","
    +                + "\"tableName\":\"diff_record_collision\","
    +                + "\"columns\":["
    +                + "{\"name\":\"a-b\",\"value\":\"1\"},"
    +                + "{\"name\":\"a_b\",\"value\":\"2\"}"
    +                + "]"
    +                + "}");
    +
    +        TableDiffRecordSink sink = new TableDiffRecordSink();
    +        DiffContext context = DiffContext.builder().taskId("task-collision").build();
    +
    +        assertThrows(IllegalArgumentException.class, () -> sink.open(sinkConfig, context));
    +    }
    +
         @Test
         void shouldRollbackEntireTransactionWhenBatchInsertFails() throws Exception {
             String url = "jdbc:h2:mem:diff_record_rollback;MODE=MySQL;DB_CLOSE_DELAY=-1";
    diff --git a/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/test/java/com/consilens/sink/table/TableResultSinkTest.java b/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/test/java/com/consilens/sink/table/TableResultSinkTest.java
    index 374e042..d50624f 100644
    --- a/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/test/java/com/consilens/sink/table/TableResultSinkTest.java
    +++ b/consilens-sink/consilens-sink-plugins/consilens-sink-table/src/test/java/com/consilens/sink/table/TableResultSinkTest.java
    @@ -12,10 +12,35 @@
     import java.util.List;
     
     import static org.junit.jupiter.api.Assertions.assertEquals;
    +import static org.junit.jupiter.api.Assertions.assertThrows;
     import static org.junit.jupiter.api.Assertions.assertTrue;
     
     class TableResultSinkTest {
     
    +    @Test
    +    void shouldRejectCustomColumnsThatCollideAfterSanitization() {
    +        SinkConfig sinkConfig = new SinkConfig();
    +        sinkConfig.setFormat("table");
    +        sinkConfig.setType("result");
    +        sinkConfig.setProperties("{"
    +                + "\"type\":\"mysql\","
    +                + "\"url\":\"jdbc:h2:mem:result_sanitized_collision;MODE=MySQL;DB_CLOSE_DELAY=-1\","
    +                + "\"username\":\"sa\","
    +                + "\"password\":\"\","
    +                + "\"driver\":\"org.h2.Driver\","
    +                + "\"tableName\":\"result_collision\","
    +                + "\"columns\":["
    +                + "{\"name\":\"a-b\",\"value\":\"1\"},"
    +                + "{\"name\":\"a_b\",\"value\":\"2\"}"
    +                + "]"
    +                + "}");
    +
    +        TableResultSink sink = new TableResultSink();
    +        DiffContext context = DiffContext.builder().taskId("task-collision").build();
    +
    +        assertThrows(IllegalArgumentException.class, () -> sink.open(sinkConfig, context));
    +    }
    +
         @Test
         void shouldWriteErrorRowInCustomColumnMode() throws Exception {
             String url = "jdbc:h2:mem:table_result_sink;MODE=MySQL;DB_CLOSE_DELAY=-1";
    diff --git a/examples/custom-sql-mysql-vs-postgres-checksum.yaml b/examples/custom-sql-mysql-vs-postgres-checksum.yaml
    index 518fa87..1839e09 100644
    --- a/examples/custom-sql-mysql-vs-postgres-checksum.yaml
    +++ b/examples/custom-sql-mysql-vs-postgres-checksum.yaml
    @@ -6,7 +6,7 @@ source:
       type: mysql
       name: source-mysql-custom-sql
       connection:
    -    url: jdbc:mysql://localhost:3306/test
    +    url: jdbc:mysql://localhost:3306/consilens_demo
         username: ${env.MYSQL_USER}
         password: ${env.MYSQL_PASSWORD}
       resource:
    @@ -19,7 +19,7 @@ source:
             amount,
             status,
             updated_at
    -      FROM performance_test_table
    +      FROM consilens_performance_demo_table
     
     target:
       type: postgresql
    @@ -38,7 +38,7 @@ target:
             amount,
             status,
             updated_at
    -      FROM performance_test_table
    +      FROM consilens_performance_demo_table
     
     comparison:
       keys:
    diff --git a/examples/detail-to-aggregate-custom-sql.yaml b/examples/detail-to-aggregate-custom-sql.yaml
    index ccbf958..448242d 100644
    --- a/examples/detail-to-aggregate-custom-sql.yaml
    +++ b/examples/detail-to-aggregate-custom-sql.yaml
    @@ -5,7 +5,7 @@ source:
       type: mysql
       name: source-mysql-detail-aggregate
       connection:
    -    url: jdbc:mysql://localhost:3306/test
    +    url: jdbc:mysql://localhost:3306/consilens_demo
         username: ${env.MYSQL_USER}
         password: ${env.MYSQL_PASSWORD}
       resource:
    @@ -16,7 +16,7 @@ source:
             COUNT(*) AS order_count,
             SUM(amount) AS total_amount,
             MAX(updated_at) AS updated_at
    -      FROM performance_test_table
    +      FROM consilens_performance_demo_table
           WHERE deleted = 0
           GROUP BY DATE(created_at), status
     
    diff --git a/examples/mysql-to-doris-partitioned-checksum.yaml b/examples/mysql-to-doris-partitioned-checksum.yaml
    index 9ed71ae..a670ea3 100644
    --- a/examples/mysql-to-doris-partitioned-checksum.yaml
    +++ b/examples/mysql-to-doris-partitioned-checksum.yaml
    @@ -5,23 +5,23 @@ source:
       type: mysql
       name: source-mysql-to-doris
       connection:
    -    url: jdbc:mysql://localhost:3306/test
    +    url: jdbc:mysql://localhost:3306/consilens_demo
         username: ${env.MYSQL_USER}
         password: ${env.MYSQL_PASSWORD}
       resource:
         type: table
    -    name: performance_test_table
    +    name: consilens_performance_demo_table
     
     target:
       type: doris
       name: target-doris-partitioned
       connection:
    -    url: jdbc:mysql://localhost:9030/test
    +    url: jdbc:mysql://localhost:9030/consilens_demo
         username: ${env.DORIS_USER}
         password: ${env.DORIS_PASSWORD}
       resource:
         type: table
    -    name: performance_test_table
    +    name: consilens_performance_demo_table
     
     comparison:
       keys:
    diff --git a/examples/performance-test-mysql-vs-postgres-exclude.yaml b/examples/performance-test-mysql-vs-postgres-exclude.yaml
    index bc0cda6..edb0d96 100644
    --- a/examples/performance-test-mysql-vs-postgres-exclude.yaml
    +++ b/examples/performance-test-mysql-vs-postgres-exclude.yaml
    @@ -1,5 +1,5 @@
     # Performance test config - MySQL vs PostgreSQL cross-database comparison
    -# Table: performance_test_table
    +# Table: consilens_performance_demo_table
     # Rows: 2,000,000
     # Columns: 50 (all columns)
     
    @@ -7,12 +7,12 @@ source:
       type: "mysql"
       name: source-mysql
       connection:
    -    url: jdbc:mysql://localhost:3306/test
    +    url: jdbc:mysql://localhost:3306/consilens_demo
         username: ${env.MYSQL_USER}
         password: ${env.MYSQL_PASSWORD}
       resource:
         type: table
    -    name: performance_test_table
    +    name: consilens_performance_demo_table
     
     target:
       type: postgresql
    @@ -23,7 +23,7 @@ target:
         password: ${env.PG_PASSWORD}
       resource:
         type: table
    -    name: performance_test_table
    +    name: consilens_performance_demo_table
     
     comparison:
       keys:
    @@ -159,7 +159,7 @@ result:
           type: result
           properties:
             type: mysql
    -        url: jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
    +        url: jdbc:mysql://localhost:3306/diff_results?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
             username: ${env.MYSQL_USER}
             password: ${env.MYSQL_PASSWORD}
             driver: com.mysql.cj.jdbc.Driver
    @@ -196,7 +196,7 @@ result:
           type: result
           properties:
             type: mysql
    -        url: jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
    +        url: jdbc:mysql://localhost:3306/diff_results?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
             username: ${env.MYSQL_USER}
             password: ${env.MYSQL_PASSWORD}
             driver: com.mysql.cj.jdbc.Driver
    @@ -234,13 +234,13 @@ result:
                 defaultValue: ""
               - name: database_name
                 columnType: varchar(128)
    -            defaultValue: "test"
    +            defaultValue: "consilens_demo"
               - name: table_name
                 columnType: varchar(128)
    -            defaultValue: "test"
    +            defaultValue: "consilens_performance_demo_table"
               - name: column_name
                 columnType: varchar(128)
    -            defaultValue: "test"
    +            defaultValue: "record_id"
               - name: actual_value
                 value: ${totalDifferences}
                 columnType: decimal(38,4)
    @@ -278,7 +278,7 @@ result:
           type: diff-record
           properties:
             type: mysql
    -        url: jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
    +        url: jdbc:mysql://localhost:3306/diff_results?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
             username: ${env.MYSQL_USER}
             password: ${env.MYSQL_PASSWORD}
             driver: com.mysql.cj.jdbc.Driver
    diff --git a/examples/performance-test-mysql-vs-postgres-output-postgres.yaml b/examples/performance-test-mysql-vs-postgres-output-postgres.yaml
    index b079ad5..9f4847b 100644
    --- a/examples/performance-test-mysql-vs-postgres-output-postgres.yaml
    +++ b/examples/performance-test-mysql-vs-postgres-output-postgres.yaml
    @@ -1,5 +1,5 @@
     # Performance test config - MySQL vs PostgreSQL cross-database comparison
    -# Table: performance_test_table
    +# Table: consilens_performance_demo_table
     # Rows: 2,000,000
     # Columns: 50 (all columns)
     
    @@ -7,12 +7,12 @@ source:
       type: "mysql"
       name: source-mysql
       connection:
    -    url: jdbc:mysql://localhost:3306/test
    +    url: jdbc:mysql://localhost:3306/consilens_demo
         username: ${env.MYSQL_USER}
         password: ${env.MYSQL_PASSWORD}
       resource:
         type: table
    -    name: performance_test_table
    +    name: consilens_performance_demo_table
     
     target:
       type: postgresql
    @@ -23,7 +23,7 @@ target:
         password: ${env.PG_PASSWORD}
       resource:
         type: table
    -    name: performance_test_table
    +    name: consilens_performance_demo_table
     
     comparison:
       keys:
    @@ -229,13 +229,13 @@ result:
                 defaultValue: ""
               - name: database_name
                 columnType: varchar(128)
    -            defaultValue: "test"
    +            defaultValue: "consilens_demo"
               - name: table_name
                 columnType: varchar(128)
    -            defaultValue: "test"
    +            defaultValue: "consilens_performance_demo_table"
               - name: column_name
                 columnType: varchar(128)
    -            defaultValue: "test"
    +            defaultValue: "record_id"
               - name: actual_value
                 value: ${totalDifferences}
                 columnType: decimal(38,4)
    diff --git a/examples/performance-test-mysql-vs-postgres.json b/examples/performance-test-mysql-vs-postgres.json
    index d75f532..c29bb31 100644
    --- a/examples/performance-test-mysql-vs-postgres.json
    +++ b/examples/performance-test-mysql-vs-postgres.json
    @@ -3,13 +3,13 @@
         "type": "mysql",
         "name": "source-mysql",
         "connection": {
    -      "url": "jdbc:mysql://localhost:3306/test",
    +      "url": "jdbc:mysql://localhost:3306/consilens_demo",
           "username": "${env.MYSQL_USER}",
           "password": "${env.MYSQL_PASSWORD}"
         },
         "resource": {
           "type": "table",
    -      "name": "performance_test_table"
    +      "name": "consilens_performance_demo_table"
         }
       },
       "target": {
    @@ -22,7 +22,7 @@
         },
         "resource": {
           "type": "table",
    -      "name": "performance_test_table"
    +      "name": "consilens_performance_demo_table"
         }
       },
       "comparison": {
    @@ -158,7 +158,7 @@
             "type": "result",
             "properties": {
               "type": "mysql",
    -          "url": "jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true",
    +          "url": "jdbc:mysql://localhost:3306/diff_results?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true",
               "username": "${env.MYSQL_USER}",
               "password": "${env.MYSQL_PASSWORD}",
               "driver": "com.mysql.cj.jdbc.Driver",
    @@ -213,7 +213,7 @@
             "type": "result",
             "properties": {
               "type": "mysql",
    -          "url": "jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true",
    +          "url": "jdbc:mysql://localhost:3306/diff_results?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true",
               "username": "${env.MYSQL_USER}",
               "password": "${env.MYSQL_PASSWORD}",
               "driver": "com.mysql.cj.jdbc.Driver",
    @@ -268,17 +268,17 @@
                 {
                   "name": "database_name",
                   "columnType": "varchar(128)",
    -              "defaultValue": "test"
    +              "defaultValue": "consilens_demo"
                 },
                 {
                   "name": "table_name",
                   "columnType": "varchar(128)",
    -              "defaultValue": "test"
    +              "defaultValue": "consilens_performance_demo_table"
                 },
                 {
                   "name": "column_name",
                   "columnType": "varchar(128)",
    -              "defaultValue": "test"
    +              "defaultValue": "record_id"
                 },
                 {
                   "name": "actual_value",
    @@ -343,7 +343,7 @@
             "type": "diff-record",
             "properties": {
               "type": "mysql",
    -          "url": "jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true",
    +          "url": "jdbc:mysql://localhost:3306/diff_results?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true",
               "username": "${env.MYSQL_USER}",
               "password": "${env.MYSQL_PASSWORD}",
               "driver": "com.mysql.cj.jdbc.Driver",
    diff --git a/examples/performance-test-mysql-vs-postgres.yaml b/examples/performance-test-mysql-vs-postgres.yaml
    index 2904241..349e4fe 100644
    --- a/examples/performance-test-mysql-vs-postgres.yaml
    +++ b/examples/performance-test-mysql-vs-postgres.yaml
    @@ -1,5 +1,5 @@
     # Performance test config - MySQL vs PostgreSQL cross-database comparison
    -# Table: performance_test_table
    +# Table: consilens_performance_demo_table
     # Rows: 2,000,000
     # Columns: 50 (all columns)
     
    @@ -7,12 +7,12 @@ source:
       type: "mysql"
       name: source-mysql
       connection:
    -    url: jdbc:mysql://localhost:3306/test
    +    url: jdbc:mysql://localhost:3306/consilens_demo
         username: ${env.MYSQL_USER}
         password: ${env.MYSQL_PASSWORD}
       resource:
         type: table
    -    name: performance_test_table
    +    name: consilens_performance_demo_table
     
     target:
       type: postgresql
    @@ -23,7 +23,7 @@ target:
         password: ${env.PG_PASSWORD}
       resource:
         type: table
    -    name: performance_test_table
    +    name: consilens_performance_demo_table
     
     comparison:
       keys:
    @@ -154,7 +154,7 @@ result:
           type: result
           properties:
             type: mysql
    -        url: jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
    +        url: jdbc:mysql://localhost:3306/diff_results?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
             username: ${env.MYSQL_USER}
             password: ${env.MYSQL_PASSWORD}
             driver: com.mysql.cj.jdbc.Driver
    @@ -191,7 +191,7 @@ result:
           type: result
           properties:
             type: mysql
    -        url: jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
    +        url: jdbc:mysql://localhost:3306/diff_results?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
             username: ${env.MYSQL_USER}
             password: ${env.MYSQL_PASSWORD}
             driver: com.mysql.cj.jdbc.Driver
    @@ -229,13 +229,13 @@ result:
                 defaultValue: ""
               - name: database_name
                 columnType: varchar(128)
    -            defaultValue: "test"
    +            defaultValue: "consilens_demo"
               - name: table_name
                 columnType: varchar(128)
    -            defaultValue: "test"
    +            defaultValue: "consilens_performance_demo_table"
               - name: column_name
                 columnType: varchar(128)
    -            defaultValue: "test"
    +            defaultValue: "record_id"
               - name: actual_value
                 value: ${totalDifferences}
                 columnType: decimal(38,4)
    @@ -273,7 +273,7 @@ result:
           type: diff-record
           properties:
             type: mysql
    -        url: jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
    +        url: jdbc:mysql://localhost:3306/diff_results?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
             username: ${env.MYSQL_USER}
             password: ${env.MYSQL_PASSWORD}
             driver: com.mysql.cj.jdbc.Driver
    diff --git a/examples/performance-test-mysql-vs-starrocks.yaml b/examples/performance-test-mysql-vs-starrocks.yaml
    index 73a00c9..13a0459 100644
    --- a/examples/performance-test-mysql-vs-starrocks.yaml
    +++ b/examples/performance-test-mysql-vs-starrocks.yaml
    @@ -2,23 +2,23 @@ source:
       type: mysql
       name: source-mysql
       connection:
    -    url: jdbc:mysql://localhost:3306/test
    +    url: jdbc:mysql://localhost:3306/consilens_demo
         username: ${env.MYSQL_USER}
         password: ${env.MYSQL_PASSWORD}
       resource:
         type: table
    -    name: performance_test_table
    +    name: consilens_performance_demo_table
     
     target:
       type: starrocks
       name: target-starrocks
       connection:
    -    url: jdbc:mysql://localhost:9030/test
    +    url: jdbc:mysql://localhost:9030/consilens_demo
         username: ${env.STARROCKS_USER}
         password: ${env.STARROCKS_PASSWORD}
       resource:
         type: table
    -    name: performance_test_table
    +    name: consilens_performance_demo_table
     
     comparison:
       keys:
    diff --git a/examples/seed-sql/README.md b/examples/seed-sql/README.md
    new file mode 100644
    index 0000000..b76955a
    --- /dev/null
    +++ b/examples/seed-sql/README.md
    @@ -0,0 +1,146 @@
    +# Consilens Demo Data Scripts
    +
    +This directory contains repeatable seed scripts for the current Consilens example configurations.
    +
    +## Files
    +
    +- `load-mysql-consilens-demo-data.sql`
    +  - Creates `consilens_demo.consilens_performance_demo_table`, `mydb.users`, `mydb.orders`, `mydb.orders_backup`, and `production.fact_orders`.
    +  - Creates `diff_results` for table sink examples.
    +  - Inserts 10000 rows into every source table.
    +- `load-postgresql-consilens-demo-data.sql`
    +  - Creates `public.consilens_performance_demo_table`, `public.daily_order_summary`, and `public.users`.
    +  - Inserts 10000 rows into `consilens_performance_demo_table` and `users`.
    +  - Builds `daily_order_summary` from the same aggregate logic used by `examples/detail-to-aggregate-custom-sql.yaml`.
    +- `load-doris-consilens-demo-data.sql`
    +  - Creates `consilens_demo.consilens_performance_demo_table` for `examples/mysql-to-doris-partitioned-checksum.yaml`.
    +  - Inserts 10000 rows with the partition and comparison columns used by the Doris example.
    +- `load-starrocks-consilens-demo-data.sql`
    +  - Creates `consilens_demo.consilens_performance_demo_table` and `analytics.fact_orders`.
    +  - Inserts 10000 rows into every StarRocks target table.
    +
    +## Covered Examples
    +
    +- `examples/minimal-mysql-to-pg.yaml`
    +- `examples/same-db-mysql-comparison.yaml`
    +- `examples/performance-test-mysql-vs-postgres.yaml`
    +- `examples/performance-test-mysql-vs-postgres-exclude.yaml`
    +- `examples/performance-test-mysql-vs-postgres-output-postgres.yaml`
    +- `examples/performance-test-mysql-vs-postgres.json`
    +- `examples/custom-sql-mysql-vs-postgres-checksum.yaml`
    +- `examples/detail-to-aggregate-custom-sql.yaml`
    +- `examples/mysql-to-doris-partitioned-checksum.yaml`
    +- `examples/performance-test-mysql-vs-starrocks.yaml`
    +- `examples/large-table-mysql-to-starrocks.yaml`
    +
    +The scripts also add the `dt` and `deleted` columns and the result database setup required by the partition, filter, and sink scenarios in the examples. The row count is intentionally fixed at 10000 for fast, repeatable local smoke runs. The comments in the performance examples still describe larger, production-scale workloads.
    +
    +## Run
    +
    +MySQL:
    +
    +```bash
    +mysql -h localhost -P 3306 -u "$MYSQL_USER" -p < examples/seed-sql/load-mysql-consilens-demo-data.sql
    +```
    +
    +PostgreSQL for examples that point to the `postgres` database:
    +
    +```bash
    +psql "postgresql://$PG_USER:$PG_PASSWORD@localhost:5432/postgres" \
    +  -f examples/seed-sql/load-postgresql-consilens-demo-data.sql
    +```
    +
    +PostgreSQL for `examples/minimal-mysql-to-pg.yaml`, which points to `mydb`:
    +
    +```bash
    +createdb -h localhost -U "$PG_USER" mydb
    +psql "postgresql://$PG_USER:$PG_PASSWORD@localhost:5432/mydb" \
    +  -f examples/seed-sql/load-postgresql-consilens-demo-data.sql
    +```
    +
    +StarRocks:
    +
    +```bash
    +mysql -h localhost -P 9030 -u "$STARROCKS_USER" -p < examples/seed-sql/load-starrocks-consilens-demo-data.sql
    +```
    +
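    +The script creates its tables with `"replication_num" = "1"`, which suits a single-backend local cluster. If your StarRocks deployment needs a different replica count, change that property in the script before running it.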
    +
    +Doris:
    +
    +```bash
    +mysql -h localhost -P 9030 -u "$DORIS_USER" -p < examples/seed-sql/load-doris-consilens-demo-data.sql
    +```
    +
    +## Time Zone
    +
    +For zero-difference cross-database checks that include timestamp fields, keep the MySQL, PostgreSQL, Doris, and StarRocks comparison sessions on the same logical time zone. The seed scripts use Asia/Shanghai (UTC+8) values. If the database servers use different defaults, configure the JDBC session time zone or Consilens normalization before comparing timestamp columns.
    +
    +## Expected Baseline
    +
    +The default data is a zero-difference baseline for the matching example pairs:
    +
    +- MySQL `consilens_demo.consilens_performance_demo_table` vs PostgreSQL `public.consilens_performance_demo_table`
    +- MySQL `consilens_demo.consilens_performance_demo_table` vs Doris `consilens_demo.consilens_performance_demo_table`
    +- MySQL `consilens_demo.consilens_performance_demo_table` vs StarRocks `consilens_demo.consilens_performance_demo_table` for the integer fields in `performance-test-mysql-vs-starrocks.yaml`
    +- MySQL `production.fact_orders` vs StarRocks `analytics.fact_orders`
    +- MySQL `mydb.orders` vs `mydb.orders_backup`
    +- MySQL `mydb.users` vs PostgreSQL `public.users`
    +
    +Each script ends with verification queries. Every main table should report `actual_rows = 10000` and `expected_rows = 10000`.
    +
    +## Optional Difference Cases
    +
    +Run these only after loading the baseline data.
    +
    +Create a value mismatch in MySQL vs PostgreSQL performance data:
    +
    +```sql
    +UPDATE public.consilens_performance_demo_table
    +SET amount = amount + 1
    +WHERE record_id = 'REC0000001000';
    +```
    +
    +Create a target-only row in PostgreSQL:
    +
    +```sql
    +INSERT INTO public.users (id, name, email, phone, status, created_at)
    +VALUES (10001, 'user_10001_extra', 'user_10001_extra@example.com', '+861380010001', 'active', '2026-01-01 00:00:00');
    +```
    +
    +Create a MySQL same-database mismatch:
    +
    +```sql
    +UPDATE mydb.orders_backup
    +SET status = 'manual_diff'
    +WHERE order_id = 1000;
    +```
    +
    +Create a StarRocks target mismatch:
    +
    +```sql
    +INSERT INTO analytics.fact_orders (
    +  order_id,
    +  customer_id,
    +  product_id,
    +  quantity,
    +  unit_price,
    +  total_amount,
    +  order_date,
    +  status,
    +  created_at,
    +  updated_at
    +)
    +VALUES (
    +  10001,
    +  100001,
    +  200001,
    +  1,
    +  9.0000,
    +  9.0000,
    +  '2026-05-01',
    +  'target_extra',
    +  '2026-05-01 00:00:00',
    +  '2026-05-01 00:10:00'
    +);
    +```
    diff --git a/examples/seed-sql/load-doris-consilens-demo-data.sql b/examples/seed-sql/load-doris-consilens-demo-data.sql
    new file mode 100644
    index 0000000..646775a
    --- /dev/null
    +++ b/examples/seed-sql/load-doris-consilens-demo-data.sql
    @@ -0,0 +1,89 @@
    +-- Doris seed data for Consilens examples.
    +-- Target row count: 10000 rows for examples/mysql-to-doris-partitioned-checksum.yaml
    +-- Run with: mysql -h localhost -P 9030 -u "$DORIS_USER" -p < examples/seed-sql/load-doris-consilens-demo-data.sql
    +
    +CREATE DATABASE IF NOT EXISTS consilens_demo;
    +
    +USE consilens_demo;
    +
    +DROP TABLE IF EXISTS consilens_digits;
    +DROP TABLE IF EXISTS consilens_seq;
    +
    +CREATE TABLE consilens_digits (
    +  d INT NOT NULL
    +)
    +ENGINE=OLAP
    +DUPLICATE KEY(d)
    +DISTRIBUTED BY HASH(d) BUCKETS 1
    +PROPERTIES ("replication_num" = "1");
    +
    +INSERT INTO consilens_digits VALUES
    +  (0), (1), (2), (3), (4), (5), (6), (7), (8), (9);
    +
    +CREATE TABLE consilens_seq (
    +  n INT NOT NULL
    +)
    +ENGINE=OLAP
    +DUPLICATE KEY(n)
    +DISTRIBUTED BY HASH(n) BUCKETS 10
    +PROPERTIES ("replication_num" = "1");
    +
    +INSERT INTO consilens_seq
    +SELECT ones.d + tens.d * 10 + hundreds.d * 100 + thousands.d * 1000 + 1 AS n
    +FROM consilens_digits ones
    +CROSS JOIN consilens_digits tens
    +CROSS JOIN consilens_digits hundreds
    +CROSS JOIN consilens_digits thousands
    +WHERE ones.d + tens.d * 10 + hundreds.d * 100 + thousands.d * 1000 < 10000;
    +
    +DROP TABLE IF EXISTS consilens_performance_demo_table;
    +
    +CREATE TABLE consilens_performance_demo_table (
    +  record_id VARCHAR(16) NOT NULL,
    +  col_int INT,
    +  col_decimal DECIMAL(18,4),
    +  amount DECIMAL(18,4),
    +  status VARCHAR(20),
    +  updated_at DATETIME,
    +  dt DATE NOT NULL
    +)
    +ENGINE=OLAP
    +DUPLICATE KEY(record_id)
    +PARTITION BY RANGE(dt) (
    +  PARTITION p20260501 VALUES [('2026-05-01'), ('2026-05-02')),
    +  PARTITION pmax VALUES [('2026-05-02'), ('2030-01-01'))
    +)
    +DISTRIBUTED BY HASH(record_id) BUCKETS 10
    +PROPERTIES ("replication_num" = "1");
    +
    +INSERT INTO consilens_performance_demo_table (
    +  record_id,
    +  col_int,
    +  col_decimal,
    +  amount,
    +  status,
    +  updated_at,
    +  dt
    +)
    +SELECT
    +  CONCAT('REC', LPAD(CAST(n AS STRING), 10, '0')) AS record_id,
    +  n * 10 AS col_int,
    +  CAST(ROUND(MOD(n, 100000) / 100 + 0.1234, 4) AS DECIMAL(18,4)) AS col_decimal,
    +  CAST(ROUND(10 + MOD(n, 5000) / 10, 4) AS DECIMAL(18,4)) AS amount,
    +  CASE MOD(n, 4)
    +    WHEN 0 THEN 'active'
    +    WHEN 1 THEN 'inactive'
    +    WHEN 2 THEN 'pending'
    +    ELSE 'blocked'
    +  END AS status,
    +  DATE_ADD(CAST('2026-05-01 00:05:00' AS DATETIME), INTERVAL MOD(n, 10000) SECOND) AS updated_at,
    +  CAST('2026-05-01' AS DATE) AS dt
    +FROM consilens_seq;
    +
    +SELECT 'doris.consilens_demo.consilens_performance_demo_table' AS check_name,
    +       COUNT(*) AS actual_rows,
    +       10000 AS expected_rows,
    +       MIN(record_id) AS min_record_id,
    +       MAX(record_id) AS max_record_id,
    +       ROUND(SUM(amount), 4) AS amount_sum
    +FROM consilens_demo.consilens_performance_demo_table;
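The Doris script ends with a verification query that reports the row count, the `record_id` range and `SUM(amount)`. As a cross-check, the sketch below recomputes those expected values in Java from the seed formulas in the script. It assumes that `MOD(n, 5000) / 10` is evaluated without integer truncation. The class `DorisSeedExpectations` is hypothetical and is not part of the repository.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper: recomputes what the Doris verification query should report.
public final class DorisSeedExpectations {

    public static final int ROWS = 10000;

    // Mirrors CONCAT('REC', LPAD(CAST(n AS STRING), 10, '0')).
    public static String recordId(int n) {
        return String.format("REC%010d", n);
    }

    // Mirrors CAST(ROUND(10 + MOD(n, 5000) / 10, 4) AS DECIMAL(18,4)),
    // assuming the division is not truncated to an integer.
    public static BigDecimal amount(int n) {
        return BigDecimal.valueOf(10)
                .add(BigDecimal.valueOf(n % 5000).divide(BigDecimal.TEN))
                .setScale(4, RoundingMode.HALF_UP);
    }

    // Sum over the sequence n = 1..10000 produced by consilens_seq.
    public static BigDecimal amountSum() {
        BigDecimal sum = BigDecimal.ZERO.setScale(4);
        for (int n = 1; n <= ROWS; n++) {
            sum = sum.add(amount(n));
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("actual_rows   = " + ROWS);
        System.out.println("min_record_id = " + recordId(1));
        System.out.println("max_record_id = " + recordId(ROWS));
        System.out.println("amount_sum    = " + amountSum().toPlainString());
    }
}
```

Under that assumption, the expected values are 10000 rows, record IDs from `REC0000000001` to `REC0000010000`, and an amount sum of 2599500.0000. Across the 10000 rows, `MOD(n, 5000)` runs through 1..4999 twice, plus two zeros. Because the IDs are zero-padded to a fixed width, the lexicographic `MIN`/`MAX` on the VARCHAR column matches numeric order.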
    diff --git a/examples/seed-sql/load-mysql-consilens-demo-data.sql b/examples/seed-sql/load-mysql-consilens-demo-data.sql
    new file mode 100644
    index 0000000..3e5b462
    --- /dev/null
    +++ b/examples/seed-sql/load-mysql-consilens-demo-data.sql
    @@ -0,0 +1,435 @@
    +-- MySQL 5.7-compatible seed data for Consilens examples.
    +-- Target row count: 10000 rows in each source table used by the MySQL examples.
    +
    +SET NAMES utf8mb4;
    +SET time_zone = '+08:00';
    +
    +CREATE DATABASE IF NOT EXISTS consilens_demo
    +    DEFAULT CHARACTER SET utf8mb4
    +    COLLATE utf8mb4_unicode_ci;
    +
    +CREATE DATABASE IF NOT EXISTS mydb
    +    DEFAULT CHARACTER SET utf8mb4
    +    COLLATE utf8mb4_unicode_ci;
    +
    +CREATE DATABASE IF NOT EXISTS production
    +    DEFAULT CHARACTER SET utf8mb4
    +    COLLATE utf8mb4_unicode_ci;
    +
    +CREATE DATABASE IF NOT EXISTS diff_results
    +    DEFAULT CHARACTER SET utf8mb4
    +    COLLATE utf8mb4_unicode_ci;
    +
    +USE consilens_demo;
    +
    +-- =========================================================
    +-- sequence helper tables
    +-- MySQL 5.7 cannot reopen a TEMPORARY table more than once in the same query
    +-- (as the self-joins below require), so regular MEMORY tables are used instead
    +-- =========================================================
    +
    +DROP TABLE IF EXISTS consilens_digits;
    +
    +CREATE TABLE consilens_digits (
    +                                  d TINYINT NOT NULL PRIMARY KEY
    +) ENGINE=MEMORY;
    +
    +INSERT INTO consilens_digits (d)
    +VALUES
    +    (0), (1), (2), (3), (4),
    +    (5), (6), (7), (8), (9);
    +
    +DROP TABLE IF EXISTS consilens_seq;
    +
    +CREATE TABLE consilens_seq (
    +                               n INT NOT NULL PRIMARY KEY
    +) ENGINE=MEMORY;
    +
    +INSERT INTO consilens_seq (n)
    +SELECT
    +    ones.d
    +        + tens.d * 10
    +        + hundreds.d * 100
    +        + thousands.d * 1000
    +        + 1 AS n
    +FROM consilens_digits ones
    +         CROSS JOIN consilens_digits tens
    +         CROSS JOIN consilens_digits hundreds
    +         CROSS JOIN consilens_digits thousands
    +WHERE
    +    ones.d
    +        + tens.d * 10
    +        + hundreds.d * 100
    +        + thousands.d * 1000 < 10000
    +ORDER BY n;
    +
    +-- =========================================================
    +-- consilens_performance_demo_table
    +-- =========================================================
    +
    +DROP TABLE IF EXISTS consilens_demo.consilens_performance_demo_table;
    +
    +CREATE TABLE consilens_demo.consilens_performance_demo_table (
    +                                               record_id VARCHAR(16) NOT NULL,
    +                                               col_tinyint TINYINT,
    +                                               col_smallint SMALLINT,
    +                                               col_mediumint MEDIUMINT,
    +                                               col_int INT,
    +                                               col_bigint BIGINT,
    +                                               col_unsigned_int INT UNSIGNED,
    +                                               col_float FLOAT,
    +                                               col_double DOUBLE,
    +                                               col_decimal DECIMAL(18,4),
    +                                               col_numeric NUMERIC(18,4),
    +                                               col_char CHAR(10),
    +                                               col_varchar_50 VARCHAR(50),
    +                                               col_varchar_100 VARCHAR(100),
    +                                               col_varchar_255 VARCHAR(255),
    +                                               col_text TEXT,
    +                                               col_mediumtext MEDIUMTEXT,
    +                                               col_binary BINARY(8),
    +                                               col_varbinary VARBINARY(16),
    +                                               col_blob BLOB,
    +                                               col_date DATE,
    +                                               col_datetime DATETIME,
    +                                               col_timestamp TIMESTAMP NULL,
    +                                               col_time TIME,
    +                                               col_boolean BOOLEAN,
    +                                               col_tinyint_bool TINYINT(1),
    +                                               col_enum ENUM('new', 'processing', 'done', 'failed'),
    +                                               col_set SET('a', 'b', 'c'),
    +                                               col_json JSON,
    +                                               user_name VARCHAR(64),
    +                                               email VARCHAR(128),
    +                                               phone VARCHAR(32),
    +                                               address VARCHAR(255),
    +                                               city VARCHAR(64),
    +                                               country VARCHAR(64),
    +                                               postal_code VARCHAR(20),
    +                                               amount DECIMAL(18,4),
    +                                               balance DECIMAL(18,4),
    +                                               credit_limit DECIMAL(18,4),
    +                                               status VARCHAR(20),
    +                                               category VARCHAR(30),
    +                                               priority SMALLINT,
    +                                               score DOUBLE,
    +                                               created_at DATETIME,
    +                                               updated_at TIMESTAMP NULL,
    +                                               deleted TINYINT NOT NULL DEFAULT 0,
    +                                               dt DATE NOT NULL,
    +                                               KEY idx_performance_record_id (record_id),
    +                                               KEY idx_performance_dt (dt)
    +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ROW_FORMAT=DYNAMIC
    +    PARTITION BY RANGE COLUMNS(dt) (
    +        PARTITION p20260501 VALUES LESS THAN ('2026-05-02'),
    +        PARTITION pmax VALUES LESS THAN (MAXVALUE)
    +        );
    +
    +INSERT INTO consilens_demo.consilens_performance_demo_table (
    +    record_id,
    +    col_tinyint,
    +    col_smallint,
    +    col_mediumint,
    +    col_int,
    +    col_bigint,
    +    col_unsigned_int,
    +    col_float,
    +    col_double,
    +    col_decimal,
    +    col_numeric,
    +    col_char,
    +    col_varchar_50,
    +    col_varchar_100,
    +    col_varchar_255,
    +    col_text,
    +    col_mediumtext,
    +    col_binary,
    +    col_varbinary,
    +    col_blob,
    +    col_date,
    +    col_datetime,
    +    col_timestamp,
    +    col_time,
    +    col_boolean,
    +    col_tinyint_bool,
    +    col_enum,
    +    col_set,
    +    col_json,
    +    user_name,
    +    email,
    +    phone,
    +    address,
    +    city,
    +    country,
    +    postal_code,
    +    amount,
    +    balance,
    +    credit_limit,
    +    status,
    +    category,
    +    priority,
    +    score,
    +    created_at,
    +    updated_at,
    +    deleted,
    +    dt
    +)
    +SELECT
    +    CONCAT('REC', LPAD(n, 10, '0')) AS record_id,
    +    MOD(n, 100) AS col_tinyint,
    +    MOD(n, 30000) AS col_smallint,
    +    n * 3 AS col_mediumint,
    +    n * 10 AS col_int,
    +    n * 1000003 AS col_bigint,
    +    2147483648 + n AS col_unsigned_int,
    +    CAST(MOD(n, 1000) + 0.125 AS FLOAT) AS col_float,
    +    CAST(MOD(n, 100000) + 0.25 AS DOUBLE) AS col_double,
    +    CAST(ROUND(MOD(n, 100000) / 100 + 0.1234, 4) AS DECIMAL(18,4)) AS col_decimal,
    +    CAST(ROUND(MOD(n, 100000) / 50 + 0.5678, 4) AS DECIMAL(18,4)) AS col_numeric,
    +    CONCAT('C', LPAD(n, 9, '0')) AS col_char,
    +    CONCAT('v50_', LPAD(n, 5, '0')) AS col_varchar_50,
    +    CONCAT('v100_', LPAD(n, 5, '0'), '_stable') AS col_varchar_100,
    +    CONCAT('v255_', LPAD(n, 5, '0'), '_stable_payload') AS col_varchar_255,
    +    CONCAT('text-', LPAD(n, 5, '0')) AS col_text,
    +    CONCAT('mediumtext-', LPAD(n, 5, '0'), '-', REPEAT('x', MOD(n, 32))) AS col_mediumtext,
    +    UNHEX(LPAD(HEX(n), 16, '0')) AS col_binary,
    +    UNHEX(LPAD(HEX(n * 17), 32, '0')) AS col_varbinary,
    +    UNHEX(LPAD(HEX(n * 31), 32, '0')) AS col_blob,
    +    DATE_ADD('2026-05-01', INTERVAL MOD(n, 7) DAY) AS col_date,
    +    DATE_ADD('2026-05-01 00:00:00', INTERVAL MOD(n, 10000) SECOND) AS col_datetime,
    +    DATE_ADD('2026-05-01 00:00:00', INTERVAL MOD(n, 10000) SECOND) AS col_timestamp,
    +    SEC_TO_TIME(MOD(n, 86400)) AS col_time,
    +    MOD(n, 2) = 0 AS col_boolean,
    +    MOD(n, 2) AS col_tinyint_bool,
    +    CASE MOD(n, 4)
    +        WHEN 0 THEN 'new'
    +        WHEN 1 THEN 'processing'
    +        WHEN 2 THEN 'done'
    +        ELSE 'failed'
    +        END AS col_enum,
    +    CASE MOD(n, 3)
    +        WHEN 0 THEN 'a,b'
    +        WHEN 1 THEN 'b'
    +        ELSE 'c'
    +        END AS col_set,
    +    JSON_OBJECT('value', CONCAT('json_', LPAD(n, 5, '0'))) AS col_json,
    +    CONCAT('user_', LPAD(n, 5, '0')) AS user_name,
    +    CONCAT('user_', LPAD(n, 5, '0'), '@example.com') AS email,
    +    CONCAT('+861380', LPAD(n, 6, '0')) AS phone,
    +    CONCAT('No.', n, ' Consilens Road') AS address,
    +    CASE MOD(n, 4)
    +        WHEN 0 THEN 'Shanghai'
    +        WHEN 1 THEN 'Beijing'
    +        WHEN 2 THEN 'Shenzhen'
    +        ELSE 'Hangzhou'
    +        END AS city,
    +    'CN' AS country,
    +    LPAD(MOD(n, 1000000), 6, '0') AS postal_code,
    +    CAST(ROUND(10 + MOD(n, 5000) / 10, 4) AS DECIMAL(18,4)) AS amount,
    +    CAST(ROUND(1000 + MOD(n, 8000) / 10, 4) AS DECIMAL(18,4)) AS balance,
    +    CAST(ROUND(5000 + MOD(n, 3000) / 10, 4) AS DECIMAL(18,4)) AS credit_limit,
    +    CASE MOD(n, 4)
    +        WHEN 0 THEN 'active'
    +        WHEN 1 THEN 'inactive'
    +        WHEN 2 THEN 'pending'
    +        ELSE 'blocked'
    +        END AS status,
    +    CASE MOD(n, 5)
    +        WHEN 0 THEN 'retail'
    +        WHEN 1 THEN 'finance'
    +        WHEN 2 THEN 'logistics'
    +        WHEN 3 THEN 'manufacturing'
    +        ELSE 'public'
    +        END AS category,
    +    MOD(n, 5) + 1 AS priority,
    +    CAST(MOD(n, 100) + 0.5 AS DOUBLE) AS score,
    +    DATE_ADD('2026-05-01 00:00:00', INTERVAL MOD(n, 10000) SECOND) AS created_at,
    +    DATE_ADD('2026-05-01 00:05:00', INTERVAL MOD(n, 10000) SECOND) AS updated_at,
    +    IF(MOD(n, 10) = 0, 1, 0) AS deleted,
    +    DATE('2026-05-01') AS dt
    +FROM consilens_seq;
    +
    +-- =========================================================
    +-- mydb.users
    +-- =========================================================
    +
    +DROP TABLE IF EXISTS mydb.users;
    +
    +CREATE TABLE mydb.users (
    +                            id INT NOT NULL PRIMARY KEY,
    +                            name VARCHAR(100),
    +                            email VARCHAR(128) NOT NULL,
    +                            phone VARCHAR(32),
    +                            status VARCHAR(20),
    +                            created_at DATETIME,
    +                            KEY idx_users_email (email),
    +                            KEY idx_users_created_at (created_at)
    +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
    +
    +INSERT INTO mydb.users (
    +    id,
    +    name,
    +    email,
    +    phone,
    +    status,
    +    created_at
    +)
    +SELECT
    +    n AS id,
    +    CONCAT('user_', LPAD(n, 5, '0')) AS name,
    +    CONCAT('user_', LPAD(n, 5, '0'), '@example.com') AS email,
    +    CONCAT('+861380', LPAD(n, 6, '0')) AS phone,
    +    CASE MOD(n, 3)
    +        WHEN 0 THEN 'active'
    +        WHEN 1 THEN 'inactive'
    +        ELSE 'pending'
    +        END AS status,
    +    DATE_ADD('2026-01-01 00:00:00', INTERVAL MOD(n, 10000) SECOND) AS created_at
    +FROM consilens_seq;
    +
    +-- =========================================================
    +-- mydb.orders
    +-- =========================================================
    +
    +DROP TABLE IF EXISTS mydb.orders_backup;
    +DROP TABLE IF EXISTS mydb.orders;
    +
    +CREATE TABLE mydb.orders (
    +                             order_id BIGINT NOT NULL PRIMARY KEY,
    +                             customer_id INT NOT NULL,
    +                             amount DECIMAL(18,4) NOT NULL,
    +                             status VARCHAR(20) NOT NULL,
    +                             created_at DATETIME NOT NULL,
    +                             KEY idx_orders_created_at (created_at)
    +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
    +
    +INSERT INTO mydb.orders (
    +    order_id,
    +    customer_id,
    +    amount,
    +    status,
    +    created_at
    +)
    +SELECT
    +    n AS order_id,
    +    100000 + MOD(n, 500) AS customer_id,
    +    CAST(ROUND(20 + MOD(n, 10000) / 20, 4) AS DECIMAL(18,4)) AS amount,
    +    CASE MOD(n, 4)
    +        WHEN 0 THEN 'paid'
    +        WHEN 1 THEN 'created'
    +        WHEN 2 THEN 'shipped'
    +        ELSE 'closed'
    +        END AS status,
    +    DATE_ADD('2025-01-01 00:00:00', INTERVAL MOD(n, 365) DAY) AS created_at
    +FROM consilens_seq;
    +
    +CREATE TABLE mydb.orders_backup LIKE mydb.orders;
    +
    +INSERT INTO mydb.orders_backup
    +SELECT * FROM mydb.orders;
    +
    +-- =========================================================
    +-- production.fact_orders
    +-- =========================================================
    +
    +DROP TABLE IF EXISTS production.fact_orders;
    +
    +CREATE TABLE production.fact_orders (
    +                                        order_id BIGINT NOT NULL PRIMARY KEY,
    +                                        customer_id INT NOT NULL,
    +                                        product_id INT NOT NULL,
    +                                        quantity INT NOT NULL,
    +                                        unit_price DECIMAL(18,4) NOT NULL,
    +                                        total_amount DECIMAL(18,4) NOT NULL,
    +                                        order_date DATE NOT NULL,
    +                                        status VARCHAR(20) NOT NULL,
    +                                        created_at DATETIME NOT NULL,
    +                                        updated_at DATETIME NOT NULL,
    +                                        KEY idx_fact_orders_order_date (order_date)
    +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
    +
    +INSERT INTO production.fact_orders (
    +    order_id,
    +    customer_id,
    +    product_id,
    +    quantity,
    +    unit_price,
    +    total_amount,
    +    order_date,
    +    status,
    +    created_at,
    +    updated_at
    +)
    +SELECT
    +    order_id,
    +    customer_id,
    +    product_id,
    +    quantity,
    +    unit_price,
    +    CAST(quantity * unit_price AS DECIMAL(18,4)) AS total_amount,
    +    order_date,
    +    status,
    +    created_at,
    +    updated_at
    +FROM (
    +         SELECT
    +             n AS order_id,
    +             100000 + MOD(n, 500) AS customer_id,
    +             200000 + MOD(n, 1000) AS product_id,
    +             1 + MOD(n, 10) AS quantity,
    +             CAST(ROUND(5 + MOD(n, 2000) / 10, 4) AS DECIMAL(18,4)) AS unit_price,
    +             DATE_ADD('2026-05-01', INTERVAL MOD(n, 30) DAY) AS order_date,
    +             CASE MOD(n, 4)
    +                 WHEN 0 THEN 'paid'
    +                 WHEN 1 THEN 'created'
    +                 WHEN 2 THEN 'shipped'
    +                 ELSE 'closed'
    +                 END AS status,
    +             DATE_ADD('2026-05-01 00:00:00', INTERVAL MOD(n, 10000) SECOND) AS created_at,
    +             DATE_ADD('2026-05-01 00:10:00', INTERVAL MOD(n, 10000) SECOND) AS updated_at
    +         FROM consilens_seq
    +     ) s;
    +
    +-- =========================================================
    +-- validation queries
    +-- =========================================================
    +
    +SELECT 'mysql.consilens_demo.consilens_performance_demo_table' AS check_name,
    +       COUNT(*) AS actual_rows,
    +       10000 AS expected_rows,
    +       MIN(record_id) AS min_record_id,
    +       MAX(record_id) AS max_record_id,
    +       ROUND(SUM(amount), 4) AS amount_sum
    +FROM consilens_demo.consilens_performance_demo_table;
    +
    +SELECT 'mysql.mydb.users' AS check_name,
    +       COUNT(*) AS actual_rows,
    +       10000 AS expected_rows,
    +       MIN(id) AS min_id,
    +       MAX(id) AS max_id
    +FROM mydb.users;
    +
    +SELECT 'mysql.mydb.orders' AS check_name,
    +       COUNT(*) AS actual_rows,
    +       10000 AS expected_rows,
    +       ROUND(SUM(amount), 4) AS amount_sum
    +FROM mydb.orders;
    +
    +SELECT 'mysql.mydb.orders_backup' AS check_name,
    +       COUNT(*) AS actual_rows,
    +       10000 AS expected_rows,
    +       ROUND(SUM(amount), 4) AS amount_sum
    +FROM mydb.orders_backup;
    +
    +SELECT 'mysql.production.fact_orders' AS check_name,
    +       COUNT(*) AS actual_rows,
    +       10000 AS expected_rows,
    +       ROUND(SUM(total_amount), 4) AS total_amount_sum
    +FROM production.fact_orders;
    +
    +-- =========================================================
    +-- cleanup helper tables
    +-- =========================================================
    +
    +DROP TABLE IF EXISTS consilens_digits;
    +DROP TABLE IF EXISTS consilens_seq;
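The MySQL validation queries state an expected row count but not an expected `amount_sum` or `total_amount_sum`. A minimal Java sketch can recompute those sums from the same formulas used in the INSERT statements, including the digits cross-join that builds `consilens_seq`. The class `ExpectedMysqlAggregates` is hypothetical and not part of Consilens. It assumes MySQL's default `div_precision_increment` of 4, under which divisions such as `MOD(n, 5000) / 10` are exact, so the surrounding `ROUND(..., 4)` does not change any value.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Consilens: recomputes the sums printed by the
// MySQL validation queries from the formulas used in the seed INSERT statements.
public class ExpectedMysqlAggregates {

    // Mirrors consilens_seq: ones + tens*10 + hundreds*100 + thousands*1000 + 1 over
    // four cross-joined copies of consilens_digits, i.e. every n in 1..10000 once.
    static List<Integer> sequence() {
        List<Integer> seq = new ArrayList<>();
        for (int thousands = 0; thousands < 10; thousands++) {
            for (int hundreds = 0; hundreds < 10; hundreds++) {
                for (int tens = 0; tens < 10; tens++) {
                    for (int ones = 0; ones < 10; ones++) {
                        seq.add(ones + tens * 10 + hundreds * 100 + thousands * 1000 + 1);
                    }
                }
            }
        }
        return seq;
    }

    // consilens_performance_demo_table.amount = 10 + MOD(n, 5000) / 10
    static BigDecimal performanceAmountSum() {
        BigDecimal sum = BigDecimal.ZERO;
        for (int n : sequence()) {
            sum = sum.add(BigDecimal.TEN.add(BigDecimal.valueOf(n % 5000).movePointLeft(1)));
        }
        return sum.setScale(4, RoundingMode.UNNECESSARY);
    }

    // mydb.orders.amount = 20 + MOD(n, 10000) / 20 (orders_backup is a straight copy)
    static BigDecimal ordersAmountSum() {
        BigDecimal sum = BigDecimal.ZERO;
        for (int n : sequence()) {
            BigDecimal part = BigDecimal.valueOf(n % 10000).divide(BigDecimal.valueOf(20));
            sum = sum.add(BigDecimal.valueOf(20).add(part));
        }
        return sum.setScale(4, RoundingMode.UNNECESSARY);
    }

    // production.fact_orders.total_amount = (1 + MOD(n, 10)) * (5 + MOD(n, 2000) / 10)
    static BigDecimal factOrdersTotalAmountSum() {
        BigDecimal sum = BigDecimal.ZERO;
        for (int n : sequence()) {
            BigDecimal unitPrice = BigDecimal.valueOf(5).add(BigDecimal.valueOf(n % 2000).movePointLeft(1));
            sum = sum.add(unitPrice.multiply(BigDecimal.valueOf(1 + n % 10)));
        }
        return sum.setScale(4, RoundingMode.UNNECESSARY);
    }

    public static void main(String[] args) {
        System.out.println("rows per table:  " + sequence().size());
        System.out.println("performance sum: " + performanceAmountSum());
        System.out.println("orders sum:      " + ordersAmountSum());
        System.out.println("fact_orders sum: " + factOrdersTotalAmountSum());
    }
}
```

Under these assumptions the sketch prints 2599500.0000 for the performance table, 2699750.0000 for `mydb.orders` (and therefore for `mydb.orders_backup`, which is filled with `INSERT ... SELECT *`), and 5780500.0000 for `production.fact_orders`.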
    diff --git a/examples/seed-sql/load-postgresql-consilens-demo-data.sql b/examples/seed-sql/load-postgresql-consilens-demo-data.sql
    new file mode 100644
    index 0000000..1dc0d92
    --- /dev/null
    +++ b/examples/seed-sql/load-postgresql-consilens-demo-data.sql
    @@ -0,0 +1,531 @@
    +-- PostgreSQL seed data for Consilens examples.
    +-- PostgreSQL 12+ compatible
    +-- Target row count: 10000 rows
    +
    +SET TIME ZONE 'Asia/Shanghai';
    +
    +-- =========================================================
    +-- cleanup
    +-- =========================================================
    +
    +DROP TABLE IF EXISTS public.daily_order_summary;
    +DROP TABLE IF EXISTS public.consilens_performance_demo_table;
    +DROP TABLE IF EXISTS public.users;
    +
    +-- =========================================================
    +-- consilens_performance_demo_table
    +-- =========================================================
    +
    +CREATE TABLE public.consilens_performance_demo_table (
    +                                                record_id VARCHAR(16) NOT NULL,
    +
    +                                                col_tinyint SMALLINT,
    +                                                col_smallint SMALLINT,
    +                                                col_mediumint INTEGER,
    +                                                col_int INTEGER,
    +                                                col_bigint BIGINT,
    +                                                col_unsigned_int BIGINT,
    +
    +                                                col_float REAL,
    +                                                col_double DOUBLE PRECISION,
    +
    +                                                col_decimal NUMERIC(18,4),
    +                                                col_numeric NUMERIC(18,4),
    +
    +                                                col_char CHAR(10),
    +                                                col_varchar_50 VARCHAR(50),
    +                                                col_varchar_100 VARCHAR(100),
    +                                                col_varchar_255 VARCHAR(255),
    +
    +                                                col_text TEXT,
    +                                                col_mediumtext TEXT,
    +
    +                                                col_binary BYTEA,
    +                                                col_varbinary BYTEA,
    +                                                col_blob BYTEA,
    +
    +                                                col_date DATE,
    +                                                col_datetime TIMESTAMP,
    +                                                col_timestamp TIMESTAMP,
    +                                                col_time TIME,
    +
    +                                                col_boolean BOOLEAN,
    +                                                col_tinyint_bool SMALLINT,
    +
    +                                                col_enum TEXT,
    +                                                col_set TEXT,
    +
    +                                                col_json JSONB,
    +
    +                                                user_name VARCHAR(64),
    +                                                email VARCHAR(128),
    +                                                phone VARCHAR(32),
    +                                                address VARCHAR(255),
    +                                                city VARCHAR(64),
    +                                                country VARCHAR(64),
    +                                                postal_code VARCHAR(20),
    +
    +                                                amount NUMERIC(18,4),
    +                                                balance NUMERIC(18,4),
    +                                                credit_limit NUMERIC(18,4),
    +
    +                                                status VARCHAR(20),
    +                                                category VARCHAR(30),
    +
    +                                                priority SMALLINT,
    +                                                score DOUBLE PRECISION,
    +
    +                                                created_at TIMESTAMP,
    +                                                updated_at TIMESTAMP,
    +
    +                                                deleted SMALLINT NOT NULL DEFAULT 0,
    +
    +                                                dt DATE NOT NULL
    +);
    +
    +CREATE INDEX idx_consilens_perf_demo_record_id
    +    ON public.consilens_performance_demo_table(record_id);
    +
    +CREATE INDEX idx_consilens_perf_demo_dt
    +    ON public.consilens_performance_demo_table(dt);
    +
    +-- =========================================================
    +-- insert consilens_performance_demo_table
    +-- =========================================================
    +
    +INSERT INTO public.consilens_performance_demo_table (
    +    record_id,
    +    col_tinyint,
    +    col_smallint,
    +    col_mediumint,
    +    col_int,
    +    col_bigint,
    +    col_unsigned_int,
    +    col_float,
    +    col_double,
    +    col_decimal,
    +    col_numeric,
    +    col_char,
    +    col_varchar_50,
    +    col_varchar_100,
    +    col_varchar_255,
    +    col_text,
    +    col_mediumtext,
    +    col_binary,
    +    col_varbinary,
    +    col_blob,
    +    col_date,
    +    col_datetime,
    +    col_timestamp,
    +    col_time,
    +    col_boolean,
    +    col_tinyint_bool,
    +    col_enum,
    +    col_set,
    +    col_json,
    +    user_name,
    +    email,
    +    phone,
    +    address,
    +    city,
    +    country,
    +    postal_code,
    +    amount,
    +    balance,
    +    credit_limit,
    +    status,
    +    category,
    +    priority,
    +    score,
    +    created_at,
    +    updated_at,
    +    deleted,
    +    dt
    +)
    +SELECT
    +    'REC' || LPAD(n::TEXT, 10, '0') AS record_id,
    +
    +    (n % 100)::SMALLINT AS col_tinyint,
    +
    +    (n % 30000)::SMALLINT AS col_smallint,
    +
    +    (n * 3)::INTEGER AS col_mediumint,
    +
    +    (n * 10)::INTEGER AS col_int,
    +
    +    (n::BIGINT * 1000003)
    +        AS col_bigint,
    +
    +    (
    +        CAST(2147483648 AS BIGINT)
    +            + n::BIGINT
    +    ) AS col_unsigned_int,
    +
    +    CAST(
    +            (n % 1000) + 0.125
    +        AS REAL
    +    ) AS col_float,
    +
    +    CAST(
    +            (n % 100000) + 0.25
    +        AS DOUBLE PRECISION
    +    ) AS col_double,
    +
    +    CAST(
    +            ROUND(
    +                    (n % 100000)::NUMERIC / 100 + 0.1234,
    +                    4
    +            )
    +        AS NUMERIC(18,4)
    +    ) AS col_decimal,
    +
    +    CAST(
    +            ROUND(
    +                    (n % 100000)::NUMERIC / 50 + 0.5678,
    +                    4
    +            )
    +        AS NUMERIC(18,4)
    +    ) AS col_numeric,
    +
    +    'C' || LPAD(n::TEXT, 9, '0')
    +                                    AS col_char,
    +
    +    'v50_' || LPAD(n::TEXT, 5, '0')
    +                                    AS col_varchar_50,
    +
    +    'v100_' || LPAD(n::TEXT, 5, '0') || '_stable'
    +                                    AS col_varchar_100,
    +
    +    'v255_' || LPAD(n::TEXT, 5, '0') || '_stable_payload'
    +                                    AS col_varchar_255,
    +
    +    'text-' || LPAD(n::TEXT, 5, '0')
    +                                    AS col_text,
    +
    +    'mediumtext-'
    +        || LPAD(n::TEXT, 5, '0')
    +        || '-'
    +        || REPEAT('x', n % 32)
    +                                    AS col_mediumtext,
    +
    +    DECODE(
    +            LPAD(
    +                    TO_HEX(n::BIGINT),
    +                    16,
    +                    '0'
    +            ),
    +            'hex'
    +    ) AS col_binary,
    +
    +    DECODE(
    +            LPAD(
    +                    TO_HEX(n::BIGINT * 17),
    +                    32,
    +                    '0'
    +            ),
    +            'hex'
    +    ) AS col_varbinary,
    +
    +    DECODE(
    +            LPAD(
    +                    TO_HEX(n::BIGINT * 31),
    +                    32,
    +                    '0'
    +            ),
    +            'hex'
    +    ) AS col_blob,
    +
    +    DATE '2026-05-01'
    +        + (n % 7)
    +                                    AS col_date,
    +
    +    TIMESTAMP '2026-05-01 00:00:00'
    +        + ((n % 10000) * INTERVAL '1 second')
    +        AS col_datetime,
    +
    +    TIMESTAMP '2026-05-01 00:00:00'
    +        + ((n % 10000) * INTERVAL '1 second')
    +        AS col_timestamp,
    +
    +    TIME '00:00:00'
    +        + ((n % 86400) * INTERVAL '1 second')
    +        AS col_time,
    +
    +    (n % 2 = 0)
    +        AS col_boolean,
    +
    +    (n % 2)::SMALLINT
    +        AS col_tinyint_bool,
    +
    +    CASE n % 4
    +        WHEN 0 THEN 'new'
    +        WHEN 1 THEN 'processing'
    +        WHEN 2 THEN 'done'
    +        ELSE 'failed'
    +    END AS col_enum,
    +
    +    CASE n % 3
    +        WHEN 0 THEN 'a,b'
    +        WHEN 1 THEN 'b'
    +        ELSE 'c'
    +    END AS col_set,
    +
    +    JSONB_BUILD_OBJECT(
    +        'value', 'json_' || LPAD(n::TEXT, 5, '0')
    +    ) AS col_json,
    +
    +    'user_' || LPAD(n::TEXT, 5, '0')
    +        AS user_name,
    +
    +    'user_' || LPAD(n::TEXT, 5, '0') || '@example.com'
    +        AS email,
    +
    +    '+861380' || LPAD(n::TEXT, 6, '0')
    +        AS phone,
    +
    +    'No.' || n || ' Consilens Road'
    +        AS address,
    +
    +    CASE n % 4
    +        WHEN 0 THEN 'Shanghai'
    +        WHEN 1 THEN 'Beijing'
    +        WHEN 2 THEN 'Shenzhen'
    +        ELSE 'Hangzhou'
    +    END AS city,
    +
    +    'CN'
    +        AS country,
    +
    +    LPAD(
    +        (n % 1000000)::TEXT,
    +        6,
    +        '0'
    +    ) AS postal_code,
    +
    +    CAST(
    +        ROUND(
    +            10 + (n % 5000)::NUMERIC / 10,
    +            4
    +        )
    +        AS NUMERIC(18,4)
    +    ) AS amount,
    +
    +    CAST(
    +        ROUND(
    +            1000 + (n % 8000)::NUMERIC / 10,
    +            4
    +        )
    +        AS NUMERIC(18,4)
    +    ) AS balance,
    +
    +    CAST(
    +        ROUND(
    +            5000 + (n % 3000)::NUMERIC / 10,
    +            4
    +        )
    +        AS NUMERIC(18,4)
    +    ) AS credit_limit,
    +
    +    CASE n % 4
    +        WHEN 0 THEN 'active'
    +        WHEN 1 THEN 'inactive'
    +        WHEN 2 THEN 'pending'
    +        ELSE 'blocked'
    +    END AS status,
    +
    +    CASE n % 5
    +        WHEN 0 THEN 'retail'
    +        WHEN 1 THEN 'finance'
    +        WHEN 2 THEN 'logistics'
    +        WHEN 3 THEN 'manufacturing'
    +        ELSE 'public'
    +    END AS category,
    +
    +    (n % 5 + 1)::SMALLINT
    +        AS priority,
    +
    +    CAST(
    +        (n % 100) + 0.5
    +        AS DOUBLE PRECISION
    +    ) AS score,
    +
    +    TIMESTAMP '2026-05-01 00:00:00'
    +        + ((n % 10000) * INTERVAL '1 second')
    +        AS created_at,
    +
    +    TIMESTAMP '2026-05-01 00:05:00'
    +        + ((n % 10000) * INTERVAL '1 second')
    +        AS updated_at,
    +
    +    CASE
    +        WHEN n % 10 = 0 THEN 1
    +        ELSE 0
    +    END AS deleted,
    +
    +    DATE '2026-05-01'
    +        AS dt
    +
    +FROM GENERATE_SERIES(1, 10000) AS seq(n);
    +
    +-- =========================================================
    +-- daily_order_summary
    +-- =========================================================
    +
    +CREATE TABLE public.daily_order_summary (
    +                                            biz_date DATE NOT NULL,
    +                                            status VARCHAR(20) NOT NULL,
    +                                            order_count BIGINT NOT NULL,
    +                                            total_amount NUMERIC(38,4) NOT NULL,
    +                                            updated_at TIMESTAMP,
    +                                            PRIMARY KEY (biz_date, status)
    +);
    +
    +INSERT INTO public.daily_order_summary (
    +    biz_date,
    +    status,
    +    order_count,
    +    total_amount,
    +    updated_at
    +)
    +SELECT
    +    created_at::DATE
    +        AS biz_date,
    +
    +    status,
    +
    +    COUNT(*)
    +        AS order_count,
    +
    +    CAST(
    +        SUM(amount)
    +        AS NUMERIC(38,4)
    +    ) AS total_amount,
    +
    +    MAX(updated_at)
    +        AS updated_at
    +
    +FROM public.consilens_performance_demo_table
    +
    +WHERE deleted = 0
    +
    +GROUP BY
    +    created_at::DATE,
    +    status;
    +
    +-- =========================================================
    +-- users
    +-- =========================================================
    +
    +CREATE TABLE public.users (
    +                              id INTEGER NOT NULL PRIMARY KEY,
    +                              name VARCHAR(100),
    +                              email VARCHAR(128) NOT NULL,
    +                              phone VARCHAR(32),
    +                              status VARCHAR(20),
    +                              created_at TIMESTAMP
    +);
    +
    +CREATE INDEX idx_users_email
    +    ON public.users(email);
    +
    +CREATE INDEX idx_users_created_at
    +    ON public.users(created_at);
    +
    +INSERT INTO public.users (
    +    id,
    +    name,
    +    email,
    +    phone,
    +    status,
    +    created_at
    +)
    +SELECT
    +    n AS id,
    +
    +    'user_' || LPAD(n::TEXT, 5, '0')
    +        AS name,
    +
    +    'user_' || LPAD(n::TEXT, 5, '0') || '@example.com'
    +        AS email,
    +
    +    '+861380' || LPAD(n::TEXT, 6, '0')
    +        AS phone,
    +
    +    CASE n % 3
    +        WHEN 0 THEN 'active'
    +        WHEN 1 THEN 'inactive'
    +        ELSE 'pending'
    +    END AS status,
    +
    +    TIMESTAMP '2026-01-01 00:00:00'
    +        + ((n % 10000) * INTERVAL '1 second')
    +        AS created_at
    +
    +FROM GENERATE_SERIES(1, 10000) AS seq(n);
    +
    +-- =========================================================
    +-- analyze
    +-- =========================================================
    +
    +ANALYZE public.consilens_performance_demo_table;
    +ANALYZE public.daily_order_summary;
    +ANALYZE public.users;
    +
    +-- =========================================================
    +-- validation queries
    +-- =========================================================
    +
    +SELECT
    +    'postgres.public.consilens_performance_demo_table'
    +        AS check_name,
    +
    +    COUNT(*)
    +        AS actual_rows,
    +
    +    10000
    +        AS expected_rows,
    +
    +    MIN(record_id)
    +        AS min_record_id,
    +
    +    MAX(record_id)
    +        AS max_record_id,
    +
    +    ROUND(
    +            SUM(amount)::NUMERIC,
    +            4
    +    ) AS amount_sum
    +
    +FROM public.consilens_performance_demo_table;
    +
    +SELECT
    +    'postgres.public.daily_order_summary'
    +        AS check_name,
    +
    +    COUNT(*)
    +        AS actual_groups,
    +
    +    ROUND(
    +            SUM(total_amount)::NUMERIC,
    +            4
    +    ) AS total_amount_sum
    +
    +FROM public.daily_order_summary;
    +
    +SELECT
    +    'postgres.public.users'
    +        AS check_name,
    +
    +    COUNT(*)
    +        AS actual_rows,
    +
    +    10000
    +        AS expected_rows,
    +
    +    MIN(id)
    +        AS min_id,
    +
    +    MAX(id)
    +        AS max_id
    +
    +FROM public.users;
    diff --git a/examples/seed-sql/load-starrocks-consilens-demo-data.sql b/examples/seed-sql/load-starrocks-consilens-demo-data.sql
    new file mode 100644
    index 0000000..732b7c7
    --- /dev/null
    +++ b/examples/seed-sql/load-starrocks-consilens-demo-data.sql
    @@ -0,0 +1,303 @@
    +-- StarRocks seed data for Consilens examples.
    +-- Target row count: 10000 rows in each StarRocks target table used by the examples.
    +-- Run with: mysql -h localhost -P 9030 -u "$STARROCKS_USER" -p < examples/seed-sql/load-starrocks-consilens-demo-data.sql
    +
    +CREATE DATABASE IF NOT EXISTS consilens_demo;
    +CREATE DATABASE IF NOT EXISTS analytics;
    +
    +USE consilens_demo;
    +
    +DROP TABLE IF EXISTS consilens_seq;
    +DROP TABLE IF EXISTS consilens_digits;
    +
    +CREATE TABLE consilens_digits (
    +  d INT NOT NULL
    +)
    +ENGINE=OLAP
    +DUPLICATE KEY(d)
    +DISTRIBUTED BY HASH(d) BUCKETS 1
    +PROPERTIES ("replication_num" = "1");
    +
    +INSERT INTO consilens_digits VALUES
    +  (0), (1), (2), (3), (4), (5), (6), (7), (8), (9);
    +
    +CREATE TABLE consilens_seq (
    +  n INT NOT NULL
    +)
    +ENGINE=OLAP
    +DUPLICATE KEY(n)
    +DISTRIBUTED BY HASH(n) BUCKETS 10
    +PROPERTIES ("replication_num" = "1");
    +
    +INSERT INTO consilens_seq
    +SELECT ones.d + tens.d * 10 + hundreds.d * 100 + thousands.d * 1000 + 1 AS n
    +FROM consilens_digits ones
    +CROSS JOIN consilens_digits tens
    +CROSS JOIN consilens_digits hundreds
    +CROSS JOIN consilens_digits thousands
    +WHERE ones.d + tens.d * 10 + hundreds.d * 100 + thousands.d * 1000 < 10000;
    +
    +DROP TABLE IF EXISTS consilens_performance_demo_table;
    +
    +CREATE TABLE consilens_performance_demo_table (
    +  record_id VARCHAR(16) NOT NULL,
    +  col_tinyint TINYINT,
    +  col_smallint SMALLINT,
    +  col_mediumint INT,
    +  col_int INT,
    +  col_bigint BIGINT,
    +  col_unsigned_int BIGINT,
    +  col_float FLOAT,
    +  col_double DOUBLE,
    +  col_decimal DECIMAL(18,4),
    +  col_numeric DECIMAL(18,4),
    +  col_char CHAR(10),
    +  col_varchar_50 VARCHAR(50),
    +  col_varchar_100 VARCHAR(100),
    +  col_varchar_255 VARCHAR(255),
    +  col_text STRING,
    +  col_mediumtext STRING,
    +  col_binary STRING,
    +  col_varbinary STRING,
    +  col_blob STRING,
    +  col_date DATE,
    +  col_datetime DATETIME,
    +  col_timestamp DATETIME,
    +  col_time VARCHAR(8),
    +  col_boolean BOOLEAN,
    +  col_tinyint_bool TINYINT,
    +  col_enum VARCHAR(20),
    +  col_set VARCHAR(20),
    +  col_json JSON,
    +  user_name VARCHAR(64),
    +  email VARCHAR(128),
    +  phone VARCHAR(32),
    +  address VARCHAR(255),
    +  city VARCHAR(64),
    +  country VARCHAR(64),
    +  postal_code VARCHAR(20),
    +  amount DECIMAL(18,4),
    +  balance DECIMAL(18,4),
    +  credit_limit DECIMAL(18,4),
    +  status VARCHAR(20),
    +  category VARCHAR(30),
    +  priority SMALLINT,
    +  score DOUBLE,
    +  created_at DATETIME,
    +  updated_at DATETIME,
    +  deleted TINYINT NOT NULL,
    +  dt DATE NOT NULL
    +)
    +ENGINE=OLAP
    +DUPLICATE KEY(record_id)
    +PARTITION BY RANGE(dt) (
    +  PARTITION p20260501 VALUES [('2026-05-01'), ('2026-05-02')),
    +  PARTITION pmax VALUES [('2026-05-02'), ('2030-01-01'))
    +)
    +DISTRIBUTED BY HASH(record_id) BUCKETS 10
    +PROPERTIES ("replication_num" = "1");
    +
    +INSERT INTO consilens_performance_demo_table (
    +  record_id,
    +  col_tinyint,
    +  col_smallint,
    +  col_mediumint,
    +  col_int,
    +  col_bigint,
    +  col_unsigned_int,
    +  col_float,
    +  col_double,
    +  col_decimal,
    +  col_numeric,
    +  col_char,
    +  col_varchar_50,
    +  col_varchar_100,
    +  col_varchar_255,
    +  col_text,
    +  col_mediumtext,
    +  col_binary,
    +  col_varbinary,
    +  col_blob,
    +  col_date,
    +  col_datetime,
    +  col_timestamp,
    +  col_time,
    +  col_boolean,
    +  col_tinyint_bool,
    +  col_enum,
    +  col_set,
    +  col_json,
    +  user_name,
    +  email,
    +  phone,
    +  address,
    +  city,
    +  country,
    +  postal_code,
    +  amount,
    +  balance,
    +  credit_limit,
    +  status,
    +  category,
    +  priority,
    +  score,
    +  created_at,
    +  updated_at,
    +  deleted,
    +  dt
    +)
    +SELECT
    +  CONCAT('REC', LPAD(CAST(n AS STRING), 10, '0')) AS record_id,
    +  MOD(n, 100) AS col_tinyint,
    +  MOD(n, 30000) AS col_smallint,
    +  n * 3 AS col_mediumint,
    +  n * 10 AS col_int,
    +  n * 1000003 AS col_bigint,
    +  2147483648 + n AS col_unsigned_int,
    +  CAST(MOD(n, 1000) + 0.125 AS FLOAT) AS col_float,
    +  CAST(MOD(n, 100000) + 0.25 AS DOUBLE) AS col_double,
    +  CAST(ROUND(MOD(n, 100000) / 100 + 0.1234, 4) AS DECIMAL(18,4)) AS col_decimal,
    +  CAST(ROUND(MOD(n, 100000) / 50 + 0.5678, 4) AS DECIMAL(18,4)) AS col_numeric,
    +  CONCAT('C', LPAD(CAST(n AS STRING), 9, '0')) AS col_char,
    +  CONCAT('v50_', LPAD(CAST(n AS STRING), 5, '0')) AS col_varchar_50,
    +  CONCAT('v100_', LPAD(CAST(n AS STRING), 5, '0'), '_stable') AS col_varchar_100,
    +  CONCAT('v255_', LPAD(CAST(n AS STRING), 5, '0'), '_stable_payload') AS col_varchar_255,
    +  CONCAT('text-', LPAD(CAST(n AS STRING), 5, '0')) AS col_text,
    +  CONCAT('mediumtext-', LPAD(CAST(n AS STRING), 5, '0'), '-', REPEAT('x', MOD(n, 32))) AS col_mediumtext,
    +  LPAD(HEX(n), 16, '0') AS col_binary,
    +  LPAD(HEX(n * 17), 32, '0') AS col_varbinary,
    +  LPAD(HEX(n * 31), 32, '0') AS col_blob,
    +  CAST('2026-05-01' AS DATE) AS col_date,
    +  CAST('2026-05-01 00:00:00' AS DATETIME) AS col_datetime,
    +  CAST('2026-05-01 00:00:00' AS DATETIME) AS col_timestamp,
    +  '00:00:00' AS col_time,
    +  MOD(n, 2) = 0 AS col_boolean,
    +  MOD(n, 2) AS col_tinyint_bool,
    +  CASE MOD(n, 4)
    +    WHEN 0 THEN 'new'
    +    WHEN 1 THEN 'processing'
    +    WHEN 2 THEN 'done'
    +    ELSE 'failed'
    +  END AS col_enum,
    +  CASE MOD(n, 3)
    +    WHEN 0 THEN 'a,b'
    +    WHEN 1 THEN 'b'
    +    ELSE 'c'
    +  END AS col_set,
    +  PARSE_JSON(CONCAT('"json_', LPAD(CAST(n AS STRING), 5, '0'), '"')) AS col_json,
    +  CONCAT('user_', LPAD(CAST(n AS STRING), 5, '0')) AS user_name,
    +  CONCAT('user_', LPAD(CAST(n AS STRING), 5, '0'), '@example.com') AS email,
    +  CONCAT('+861380', LPAD(CAST(n AS STRING), 6, '0')) AS phone,
    +  CONCAT('No.', CAST(n AS STRING), ' Consilens Road') AS address,
    +  CASE MOD(n, 4)
    +    WHEN 0 THEN 'Shanghai'
    +    WHEN 1 THEN 'Beijing'
    +    WHEN 2 THEN 'Shenzhen'
    +    ELSE 'Hangzhou'
    +  END AS city,
    +  'CN' AS country,
    +  LPAD(CAST(MOD(n, 1000000) AS STRING), 6, '0') AS postal_code,
    +  CAST(ROUND(10 + MOD(n, 5000) / 10, 4) AS DECIMAL(18,4)) AS amount,
    +  CAST(ROUND(1000 + MOD(n, 8000) / 10, 4) AS DECIMAL(18,4)) AS balance,
    +  CAST(ROUND(5000 + MOD(n, 3000) / 10, 4) AS DECIMAL(18,4)) AS credit_limit,
    +  CASE MOD(n, 4)
    +    WHEN 0 THEN 'active'
    +    WHEN 1 THEN 'inactive'
    +    WHEN 2 THEN 'pending'
    +    ELSE 'blocked'
    +  END AS status,
    +  CASE MOD(n, 5)
    +    WHEN 0 THEN 'retail'
    +    WHEN 1 THEN 'finance'
    +    WHEN 2 THEN 'logistics'
    +    WHEN 3 THEN 'manufacturing'
    +    ELSE 'public'
    +  END AS category,
    +  MOD(n, 5) + 1 AS priority,
    +  CAST(MOD(n, 100) + 0.5 AS DOUBLE) AS score,
    +  CAST('2026-05-01 00:00:00' AS DATETIME) AS created_at,
    +  CAST('2026-05-01 00:05:00' AS DATETIME) AS updated_at,
    +  IF(MOD(n, 10) = 0, 1, 0) AS deleted,
    +  CAST('2026-05-01' AS DATE) AS dt
    +FROM consilens_seq;
    +
    +USE analytics;
    +
    +DROP TABLE IF EXISTS fact_orders;
    +
    +CREATE TABLE fact_orders (
    +  order_id BIGINT NOT NULL,
    +  customer_id INT NOT NULL,
    +  product_id INT NOT NULL,
    +  quantity INT NOT NULL,
    +  unit_price DECIMAL(18,4) NOT NULL,
    +  total_amount DECIMAL(18,4) NOT NULL,
    +  order_date DATE NOT NULL,
    +  status VARCHAR(20) NOT NULL,
    +  created_at DATETIME NOT NULL,
    +  updated_at DATETIME NOT NULL
    +)
    +ENGINE=OLAP
    +DUPLICATE KEY(order_id)
    +PARTITION BY RANGE(order_date) (
    +  PARTITION p202605 VALUES [('2026-05-01'), ('2026-06-01')),
    +  PARTITION pmax VALUES [('2026-06-01'), ('2030-01-01'))
    +)
    +DISTRIBUTED BY HASH(order_id) BUCKETS 10
    +PROPERTIES ("replication_num" = "1");
    +
    +INSERT INTO fact_orders (
    +  order_id,
    +  customer_id,
    +  product_id,
    +  quantity,
    +  unit_price,
    +  total_amount,
    +  order_date,
    +  status,
    +  created_at,
    +  updated_at
    +)
    +SELECT
    +  order_id,
    +  customer_id,
    +  product_id,
    +  quantity,
    +  unit_price,
    +  CAST(quantity * unit_price AS DECIMAL(18,4)) AS total_amount,
    +  order_date,
    +  status,
    +  created_at,
    +  updated_at
    +FROM (
    +  SELECT
    +    n AS order_id,
    +    100000 + MOD(n, 500) AS customer_id,
    +    200000 + MOD(n, 1000) AS product_id,
    +    1 + MOD(n, 10) AS quantity,
    +    CAST(ROUND(5 + MOD(n, 2000) / 10, 4) AS DECIMAL(18,4)) AS unit_price,
    +    DATE_ADD(CAST('2026-05-01' AS DATE), INTERVAL MOD(n, 30) DAY) AS order_date,
    +    CASE MOD(n, 4)
    +      WHEN 0 THEN 'paid'
    +      WHEN 1 THEN 'created'
    +      WHEN 2 THEN 'shipped'
    +      ELSE 'closed'
    +    END AS status,
    +    DATE_ADD(CAST('2026-05-01 00:00:00' AS DATETIME), INTERVAL MOD(n, 10000) SECOND) AS created_at,
    +    DATE_ADD(CAST('2026-05-01 00:10:00' AS DATETIME), INTERVAL MOD(n, 10000) SECOND) AS updated_at
    +  FROM consilens_demo.consilens_seq
    +) s;
    +
    +SELECT 'starrocks.consilens_demo.consilens_performance_demo_table' AS check_name,
    +       COUNT(*) AS actual_rows,
    +       10000 AS expected_rows,
    +       MIN(record_id) AS min_record_id,
    +       MAX(record_id) AS max_record_id,
    +       ROUND(SUM(amount), 4) AS amount_sum
    +FROM consilens_demo.consilens_performance_demo_table;
    +
    +SELECT 'starrocks.analytics.fact_orders' AS check_name,
    +       COUNT(*) AS actual_rows,
    +       10000 AS expected_rows,
    +       ROUND(SUM(total_amount), 4) AS total_amount_sum
    +FROM analytics.fact_orders;
    diff --git a/pom.xml b/pom.xml
    index f61ee13..3916817 100644
    --- a/pom.xml
    +++ b/pom.xml
    @@ -278,6 +278,12 @@
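                         <version>${okhttp.version}</version>
                     </dependency>
     
    +            <dependency>
    +                <groupId>com.squareup.okhttp3</groupId>
    +                <artifactId>mockwebserver</artifactId>
    +                <version>${okhttp.version}</version>
    +            </dependency>
    +
                     <dependency>
                         <groupId>com.consilens</groupId>
                         <artifactId>consilens-sink-api</artifactId>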
    diff --git "a/website/1\357\274\211\346\225\260\346\215\256\346\257\224\345\257\271\347\232\204\345\267\245\347\250\213\345\256\236\350\267\265\357\274\232\344\273\216\350\204\232\346\234\254\345\210\260 Consilens.md" "b/website/1\357\274\211\346\225\260\346\215\256\346\257\224\345\257\271\347\232\204\345\267\245\347\250\213\345\256\236\350\267\265\357\274\232\344\273\216\350\204\232\346\234\254\345\210\260 Consilens.md"
    index d1ed60a..4bc0a6c 100644
    --- "a/website/1\357\274\211\346\225\260\346\215\256\346\257\224\345\257\271\347\232\204\345\267\245\347\250\213\345\256\236\350\267\265\357\274\232\344\273\216\350\204\232\346\234\254\345\210\260 Consilens.md"	
    +++ "b/website/1\357\274\211\346\225\260\346\215\256\346\257\224\345\257\271\347\232\204\345\267\245\347\250\213\345\256\236\350\267\265\357\274\232\344\273\216\350\204\232\346\234\254\345\210\260 Consilens.md"	
    @@ -3,7 +3,7 @@
     >
     >
     >Github:
    ->https://github.com/NoeticLens/consilens
    +>https://github.com/datavane/consilens
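     >Follows, Stars, Forks, and contributions are all welcome.
     
     ## What is Consilens
    @@ -39,6 +39,17 @@ Consilens recommends the `checksum` path by default, rather than pulling all the detail
     6. When the range is small enough, full rows are fetched by default for an exact comparison; for wide tables with sparse differences, `row-hash` filtering can also be enabled explicitly;
     7. Finally, the results are written to the configured sinks.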
     
    +```mermaid
    +flowchart LR
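    +    A["Read configuration"] --> B["Connect to source / target"]
    +    B --> C["Fetch boundaries and statistics"]
    +    C --> D["Segmented checksums over key ranges"]
    +    D -->|match| E["Skip"]
    +    D -->|mismatch| F["Keep narrowing recursively"]
    +    F --> G["Exact diff on a small range"]
    +    G --> H["Write to sinks"]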
    +```
    +
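     The advantage of this path is that most consistent data is ruled out quickly, and the truly expensive detailed comparison happens only in small, anomalous ranges.
     
     ## Core features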
    diff --git "a/website/2\357\274\211Consilens \346\212\200\346\234\257\346\236\266\346\236\204\344\270\216\345\256\236\347\216\260\345\216\237\347\220\206\357\274\232\344\273\216\345\210\206\346\256\265\346\240\241\351\252\214\345\210\260\346\217\222\344\273\266\346\211\251\345\261\225.md" "b/website/2\357\274\211Consilens \346\212\200\346\234\257\346\236\266\346\236\204\344\270\216\345\256\236\347\216\260\345\216\237\347\220\206\357\274\232\344\273\216\345\210\206\346\256\265\346\240\241\351\252\214\345\210\260\346\217\222\344\273\266\346\211\251\345\261\225.md"
    index 271d934..2f3e87e 100644
    --- "a/website/2\357\274\211Consilens \346\212\200\346\234\257\346\236\266\346\236\204\344\270\216\345\256\236\347\216\260\345\216\237\347\220\206\357\274\232\344\273\216\345\210\206\346\256\265\346\240\241\351\252\214\345\210\260\346\217\222\344\273\266\346\211\251\345\261\225.md"	
    +++ "b/website/2\357\274\211Consilens \346\212\200\346\234\257\346\236\266\346\236\204\344\270\216\345\256\236\347\216\260\345\216\237\347\220\206\357\274\232\344\273\216\345\210\206\346\256\265\346\240\241\351\252\214\345\210\260\346\217\222\344\273\266\346\211\251\345\261\225.md"	
    @@ -1,301 +1,404 @@
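    -# Consilens Technical Architecture and Implementation: From Segmented Checksums to Plugin Extensions
    +>Introduction:
    +>This article examines the Consilens checksum mode from an architectural-design perspective. For large-scale consistency checks that span databases and data centers, it explains why simply pulling everything and comparing it does not work. Instead, Consilens computes digests inside the database, recursively slices the primary-key space, and compares exact rows locally. The article covers the core technical decisions: execution-plan selection, segmentation strategy, field normalization, checksum evaluation, recursive convergence, and resource control. Together, these give readers an overall picture of how Consilens validates large tables.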
    +>
    +>
    +>Github:
    +>https://github.com/datavane/consilens
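    +>Follows, Stars, Forks, and contributions are all welcome.
     
    -If you see Consilens merely as a "command-line tool that runs a diff once", it is easy to underestimate where its real complexity lies.
    +# Engineering Trade-offs in Cross-Source Consistency Checks for Large Tables
     
    -What is genuinely hard about cross-source consistency checking is not the CLI, nor any particular hash function, but making three things hold at once:
    +With small data volumes, consistency checking is simple:
     
    -1. **Comparable across sources**: data types, time functions, boolean representations, and NULL semantics from different databases must converge to one consistent representation;
    -2. **Runnable on large tables**: the problem cannot be solved by dragging the entire table to the client;
    -3. **Consumable results**: the output has to say more than "consistent or not"; it must be able to flow into files, the console, or database tables.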
    +```sql
    +SELECT * FROM source_table;
    +SELECT * FROM target_table;
    +```
     
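    -The Consilens architecture is essentially split along these three concerns.
    +Then compare them row by row on the application side.
     
    -## How the modules are split decides whether they get tangled later
    +But once data reaches tens or hundreds of millions of rows, or the source and target sit in different databases, data centers, or network paths, the problem changes.
     
    -The core module boundaries in the current repository are roughly:
    +At that point the real bottleneck is usually not the diff algorithm itself, but:
     
    -| Module | Main responsibilities |
    -| --- | --- |
    -| `consilens-cli` | Parses configuration, creates connectors and dataset handles (DatasetHandle), assembles the execution chain, selects the comparison strategy |
    -| `consilens-core` | Comparison planning (DefaultComparePlanner), `ChecksumDiffer` / `JoinDiffer`, recursive segmentation, diff model, thread pools |
    -| `consilens-connector` | Connector abstractions (ConnectorProvider / DatasetHandle / CapabilitySet), database dialects, SQL generation, type normalization, metadata access |
    -| `consilens-spi` | Runtime plugin loading; discovers and wires up database dialects |
    -| `consilens-sink` | Output SPI, built-in console/json/csv/table sinks, lifecycle bridging |
    -| `consilens-dist` | Distribution assembly; packages `bin/`, `conf/`, `libs/`, `plugins/` into a runnable artifact |
    -| `consilens-common` | Base models and utilities shared across modules |
    +```text
    +Data transfer cost
    +Database query load
    +Application-side memory usage
    +Failure recovery for long-running jobs
    +Type differences across databases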
    +```
    +
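    +So the core goal of the Consilens checksum mode is not to "write a faster row-by-row comparator", but to:
    +
    +> Move as little raw data as possible, without giving up the ability to pinpoint row-level differences in the end.
    +
    +This principle is the starting point of the whole design.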
    +
    +---
    +
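    +## 1. Overall strategy: filter with digests inside the database first, then inspect suspicious ranges closely
    +
    +The Consilens checksum execution path can be abstracted into four stages: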
    +
    +
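    +This is not a simple checksum check.
    +
    +It is essentially a **recursive convergence algorithm**:
    +
    +```text
    +Large range, digests match   ->  skip it
    +Large range, digests differ  ->  split it further
    +Small range, still different ->  fetch rows for an exact diff
    +```
    +
    +```mermaid
    +flowchart LR
    +    A["Large segment"] --> B["count + checksum"]
    +    B -->|match| C["Skip"]
    +    B -->|mismatch| D["Split further"]
    +    D --> E["Exact diff on a small range"]
    +```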
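The recursion above can be sketched in a few lines of Java. This is a hypothetical, in-memory illustration, not Consilens code. `ConvergenceSketch`, `LOCAL_THRESHOLD`, and the `TreeMap` "tables" are assumptions. The count + XOR summary is computed locally rather than by pushed-down SQL, and the sketch always bisects instead of starting with a multi-way split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not Consilens code: each "table" is an in-memory TreeMap (key -> row),
// and the database-side summary is simulated locally as count + XOR of row hashes.
class ConvergenceSketch {
    static final int LOCAL_THRESHOLD = 4; // assumed cut-off: below this many rows, diff exactly

    // count + order-independent checksum for the half-open key range [lo, hi)
    static long[] summarize(TreeMap<Integer, String> table, int lo, int hi) {
        long count = 0;
        long checksum = 0;
        for (var e : table.subMap(lo, hi).entrySet()) {
            count++;
            checksum ^= (e.getKey() + "=" + e.getValue()).hashCode();
        }
        return new long[] {count, checksum};
    }

    static void compare(TreeMap<Integer, String> src, TreeMap<Integer, String> tgt,
                        int lo, int hi, List<Integer> diffKeys) {
        long[] s = summarize(src, lo, hi);
        long[] t = summarize(tgt, lo, hi);
        if (s[0] == t[0] && s[1] == t[1]) {
            return; // summaries match: skip the whole range without comparing rows
        }
        if (Math.max(s[0], t[0]) < LOCAL_THRESHOLD || hi - lo <= 1) {
            // small enough: "fetch" the rows from both sides and compare them exactly
            TreeSet<Integer> keys = new TreeSet<>(src.subMap(lo, hi).keySet());
            keys.addAll(tgt.subMap(lo, hi).keySet());
            for (int k : keys) {
                if (!Objects.equals(src.get(k), tgt.get(k))) diffKeys.add(k);
            }
            return;
        }
        int mid = lo + (hi - lo) / 2; // still large and still different: bisect and recurse
        compare(src, tgt, lo, mid, diffKeys);
        compare(src, tgt, mid, hi, diffKeys);
    }

    static List<Integer> demo() {
        TreeMap<Integer, String> src = new TreeMap<>();
        for (int k = 0; k < 100; k++) src.put(k, "v" + k);
        TreeMap<Integer, String> tgt = new TreeMap<>(src);
        tgt.put(42, "changed"); // field difference
        tgt.remove(77);         // row missing in target
        tgt.put(100, "extra");  // row only in target
        List<Integer> diffKeys = new ArrayList<>();
        compare(src, tgt, 0, 128, diffKeys);
        return diffKeys;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [42, 77, 100]
    }
}
```

Every range whose count and checksum agree is skipped without its rows being touched. Only the narrow ranges around keys 42, 77, and 100 reach the exact comparison.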
     
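    -The value of this split is that **the algorithm, dialect, and output tracks are independent of each other.**
    +The key to this design:
     
    -In other words:
    +> The checksum is not the final answer; it is a filter.
     
    -- you can swap connector plugins without touching the algorithm;
    -- you can add output formats without touching `ChecksumDiffer`;
    -- you can change the algorithm's convergence logic without redesigning the CLI configuration structure.
    +The result must still come down to row-level differences; the whole table is simply never hauled back up front.
     
    -This suits long-term evolution better than cramming every capability into one "big service class".
    +---
     
    -## The main execution path is short, but every hop has to be solid
    +## 2. Core design 1: choose the execution plan by capability, not by database type
     
    -From the CLI to the final result, the main path is not complicated:
    +Consilens does not simply hard-code its execution plans as:
     
     ```text
    -Read configuration
    -  ->
    -Create source / target DatasetHandle
    -  ->
    -DefaultComparePlanner builds a comparison plan from the CapabilitySet
    -  ->
    -PlanExecutor runs the comparison plan
    -  ->
    -Run the comparison and produce a DiffResult
    -  ->
    -Route to console / json / csv / table through SinkManager
    +MySQL      -> checksum
    +PostgreSQL -> checksum
    +Oracle     -> local diff
     ```
     
    -There are two points here that are easy to overlook but actually matter a lot.
    +Instead, `DefaultComparePlanner` chooses the strategy based on what the data sources are capable of.
     
    -First, **what gets compared is not a raw JDBC connection but a wrapped `DatasetHandle`**.  
    -That way the algorithm layer does not need to know how connection pools are built, how SQL is assembled, or how schemas are handled; what it receives is a ready-to-compare "table segment".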
     
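    -Second, **output is not hard-coded into the algorithm as file writes**.  
    -The algorithm only produces a unified diff model and an execution tree; the sink subsystem takes over the actual output.
    +This decision matters a great deal.
     
    -Once these two boundaries hold, later extensions become much cheaper.
    +Consilens is built for *cross-source* comparison, not for a single database.
     
    -## Why `checksum` is the main path
    +So the planner should not ask:
     
    -Because it best fits the real constraints of large tables and cross-source scenarios.
    +```text
    +Are you MySQL?
    +```
    +
    +It should ask instead:
    +
    +```text
    +Can you compute the hash aggregate on the server?
    +Can you perform the join on the server?
    +Can you push down filter conditions?
    +```
    +
    +```mermaid
    +flowchart TD
    +    A["ComparePlanner"] --> B{"Can push down checksum / hash aggregation?"}
    +    B -->|yes| C["Prefer pushdown checksum"]
    +    B -->|no| D{"Can join on the server?"}
    +    D -->|yes| E["Choose a join plan"]
    +    D -->|no| F["Fall back to a local / streaming path"]
    +```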
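Below is a minimal Java sketch of a capability-driven planner. The enum values and the `choose` method are hypothetical names, not the actual `DefaultComparePlanner` or `CapabilitySet` API. The `sameJdbcUrl` flag reflects the rule that a `join` plan requires both datasets to come from the same JDBC URL.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: capability and plan names are illustrative, not the Consilens API.
class CapabilityPlannerSketch {
    enum Capability { HASH_AGGREGATE_PUSHDOWN, SERVER_SIDE_JOIN, FILTER_PUSHDOWN }

    enum Plan { PUSHDOWN_CHECKSUM, JOIN, LOCAL_STREAMING }

    // The planner asks "what can you do?", never "which database are you?".
    static Plan choose(Set<Capability> source, Set<Capability> target, boolean sameJdbcUrl) {
        if (source.contains(Capability.HASH_AGGREGATE_PUSHDOWN)
                && target.contains(Capability.HASH_AGGREGATE_PUSHDOWN)) {
            return Plan.PUSHDOWN_CHECKSUM; // both sides can compute count + checksum in place
        }
        if (sameJdbcUrl
                && source.contains(Capability.SERVER_SIDE_JOIN)
                && target.contains(Capability.SERVER_SIDE_JOIN)) {
            return Plan.JOIN; // a server-side join only makes sense within one database
        }
        return Plan.LOCAL_STREAMING; // fall back to pulling rows and comparing locally
    }

    public static void main(String[] args) {
        Set<Capability> full = EnumSet.allOf(Capability.class);
        Set<Capability> joinOnly = EnumSet.of(Capability.SERVER_SIDE_JOIN);
        System.out.println(choose(full, full, false));        // PUSHDOWN_CHECKSUM
        System.out.println(choose(joinOnly, joinOnly, true)); // JOIN
        System.out.println(choose(joinOnly, full, false));    // LOCAL_STREAMING
    }
}
```

Because the planner only tests capabilities, a new connector that declares the pushdown capability gets the pushdown path without any change to `choose`.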
    +
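    +This makes the whole architecture easier to extend to new data sources.
    +
    +Adding a connector does not mean changing the planner. The connector declares its own capabilities and implements the matching SQL generation.
     
    -### Phase 1: Boundary probing
    +---
     
    -Before running any checksum, Consilens does not naively hash-aggregate the entire table first.  
    -It first fetches the basic boundaries of both tables:
    +## 3. Core design 2: slice the primary-key space instead of paging by offset
    +
    +The most obvious approach to comparing large tables is pagination: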
     
     ```sql
    -SELECT COUNT(*) FROM orders;
    -SELECT MIN(order_id) FROM orders;
    -SELECT MAX(order_id) FROM orders;
    +LIMIT 10000 OFFSET 0;
    +LIMIT 10000 OFFSET 10000;
    +LIMIT 10000 OFFSET 20000;
     ```
     
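    -The purpose of this step is not to "take a quick look" but to decide the execution strategy that follows:
    +But offset pagination causes many problems on large tables:
     
    -- if the total row count is already below the threshold, go straight to local exact comparison;
    -- if the data is clearly large, start segmented convergence.
    +```text
    +Gets slower the deeper you page
    +Unstable while data is changing
    +Execution cost varies widely across databases
    +Unsuitable for recursive convergence
    +```
     
    -The reason is simple:  
    -**a full-table checksum sounds direct, but for very large tables it is not necessarily the cheapest first step.**
    +Consilens instead uses segments based on primary-key ranges:
     
    -### Phase 2: A multi-way first round instead of pure bisection from the start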
    +```text
    +[minKey, maxKey)
    +```
     
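    -Consilens's first-round split is closer to "multi-way segmentation" than to classic pure bisection.
    +For example:
     
    -Based on `bisectionFactor`, it first splits the larger side's table into several range segments, then creates mirror ranges on the other side.  
    -This has two benefits: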
    +```text
    +[1000, 2000)
    +[2000, 3000)
    +[3000, 4000)
    +```
     
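    -1. the first round can run several segments in parallel;
    -2. most consistent ranges can exit at the first level.
    +There are three technical reasons behind this.
     
    -This fits production data distributions better than pure bisection, which "cuts the whole table in half, then in half again".  
    -Real-world differences are usually not spread evenly across a table; they cluster in a few batches, time windows, or key ranges.
    +First, range predicates make better use of indexes.
     
    -### Phase 3: Sub-segment convergence: when to keep splitting multi-way and when to switch to bisection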
    +```sql
    +WHERE id >= 1000 AND id < 2000
    +```
     
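    -Each sub-segment first computes its own `row_count + checksum`.  
    -The decision logic after that is roughly:
    +Second, half-open intervals naturally avoid overlaps and gaps.
     
    -- checksums match: skip;
    -- the segment is already below `bisectionThreshold`: compare locally and exactly;
    -- the sub-segment is still large enough: keep splitting multi-way;
    -- otherwise: switch to true bisection.
    +Third, recursive splitting keeps the same boundary semantics at every level.
     
    -In the current implementation, these two thresholds matter most: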
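The half-open segment idea can be sketched as follows. `KeyRangeSplitter` and `Segment` are hypothetical names, not the real `TableSegment` class. The sketch assumes integer keys, so the exclusive upper bound would be `MAX(key) + 1`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: Segment and split() are illustrative, not the Consilens TableSegment API.
class KeyRangeSplitter {
    // A half-open key range [lo, hi): lower bound inclusive, upper bound exclusive.
    record Segment(long lo, long hi) {
        String predicate(String keyColumn) {
            return keyColumn + " >= " + lo + " AND " + keyColumn + " < " + hi;
        }
    }

    // Splits [minKey, maxKeyExclusive) into at most `parts` contiguous, non-overlapping segments.
    static List<Segment> split(long minKey, long maxKeyExclusive, int parts) {
        if (parts < 1 || maxKeyExclusive <= minKey) {
            throw new IllegalArgumentException("need parts >= 1 and a non-empty range");
        }
        long width = maxKeyExclusive - minKey; // assumes width * parts does not overflow a long
        List<Segment> segments = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            long lo = minKey + width * i / parts;
            long hi = minKey + width * (i + 1) / parts;
            if (lo < hi) segments.add(new Segment(lo, hi)); // drop empty slices when parts > width
        }
        return segments;
    }

    public static void main(String[] args) {
        for (Segment s : split(1000, 4000, 3)) {
            System.out.println(s.predicate("id"));
        }
    }
}
```

`split(1000, 4000, 3)` prints `id >= 1000 AND id < 2000`, `id >= 2000 AND id < 3000`, and `id >= 3000 AND id < 4000`. Each segment's upper bound is the next segment's lower bound, so no key is counted twice or skipped. Any segment can be passed back to `split` for the next level of recursion.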
     
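    -- **Enter local comparison**: `totalRows < bisectionThreshold`
    -- **Switch from multi-way splitting to bisection**: `maxRows <= bisectionThreshold * bisectionFactor`
    +So `TableSegment` is not an ordinary pagination object; it is the basic unit of execution in the recursive checksum algorithm.
     
    -The essence of these rules is to let "large and clean" ranges exit early, so that only the "genuinely dirty hot spots" recurse deeper.
    +---
     
    -## Why `xor` is the more sensible default
    +## 4. Core design 3: checksums must be computed after normalization
     
    -On the surface, `concat` and `xor` are two checksum algorithms; in substance they are two resource-consumption models.
    +Cross-database consistency checks cannot hash raw database values directly.
     
    -`concat` is more intuitive, but usually requires:
    +The same business value may have different physical or textual representations in different databases.
     
    -- sorting the row hashes;
    -- aggregating them into one long string with `GROUP_CONCAT` or `STRING_AGG`;
    -- hashing that string once more.
    +For example:
     
    -That means heavy pressure on sorting, memory, and temporary tables.
    +| Type | Possible differences |
    +| --- | --- |
    +| CHAR / VARCHAR | Trailing spaces, character set, collation |
    +| DECIMAL | Precision, scale, trailing zeros |
    +| TIMESTAMP | Time zone, format, millisecond precision |
    +| JSON | Key order, storage format |
    +| BLOB | Binary display form |
    +
    +So when Consilens generates checksum SQL, it normalizes each column through `DataTypeHandler.normalizeColumn()`.
    +
    +The overall logic is: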
    +
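The normalize-then-compare idea can be illustrated in Java (17 or later). This is a hypothetical sketch. `NormalizeSketch.normalize` stands in for the SQL expressions that `DataTypeHandler.normalizeColumn()` generates. The canonical forms chosen here are assumptions, not documented Consilens rules: trailing spaces stripped, trailing zeros dropped, timestamps in UTC at millisecond precision, booleans as `1`/`0`, and binary values as lowercase hex. JSON key ordering is left out for brevity.

```java
import java.math.BigDecimal;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HexFormat;

// Hypothetical sketch (Java 17+): normalize() stands in for DataTypeHandler.normalizeColumn();
// the canonical forms Consilens actually uses may differ.
class NormalizeSketch {
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    static String normalize(Object value) {
        if (value == null) return null;                                   // NULL stays distinguishable
        if (value instanceof String s) return s.stripTrailing();          // CHAR padding: "abc   " -> "abc"
        if (value instanceof BigDecimal d) return d.stripTrailingZeros().toPlainString(); // 10.50 -> 10.5
        if (value instanceof OffsetDateTime t) {
            return t.withOffsetSameInstant(ZoneOffset.UTC).format(TS);    // one time zone, truncated to millis
        }
        if (value instanceof Boolean b) return b ? "1" : "0";
        if (value instanceof byte[] bytes) return HexFormat.of().formatHex(bytes); // BLOB -> lowercase hex
        return value.toString();
    }

    public static void main(String[] args) {
        // The same business values, as two different databases might return them.
        Object[] mysqlSide = {"A001      ", new BigDecimal("10.50"),
                OffsetDateTime.parse("2026-05-01T08:00:00+08:00"), true};
        Object[] pgSide = {"A001", new BigDecimal("10.5000"),
                OffsetDateTime.parse("2026-05-01T00:00:00Z"), Boolean.TRUE};
        for (int i = 0; i < mysqlSide.length; i++) {
            System.out.println(normalize(mysqlSide[i]) + " | " + normalize(pgSide[i]));
        }
    }
}
```

Each pair of raw values differs as Java objects; for instance, `new BigDecimal("10.50").equals(new BigDecimal("10.5000"))` is `false`. Both sides still normalize to the same string, so any hash computed afterwards also matches.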
    +
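    +This means Consilens compares "normalized semantic values", not "the database's underlying bytes".
     
    -`xor`, by contrast, is an engineering compromise aimed at large tables:
    +This is the precondition for checksums to hold across databases.
     
    -- compute the row hashes first;
    -- then XOR-aggregate them;
    -- the result does not depend on row order;
    -- so no `ORDER BY` is needed.
    +Otherwise, the checksum would misreport a large number of formatting differences as data differences.
     
    -This is not to say `xor` is mathematically more "advanced"; it simply fits the basic goal of large-table validation better:  
    -**avoid, as far as possible, heavy operations that exist only to make the result deterministic.**
    +---
     
    -## What `row-hash` really optimizes is the "exact comparison" phase
    +## 5. Core design 4: the count + checksum double check
     
    -In many systems the bottleneck is not the earlier checksum but the last step: "pull the small segment out and compare it row by row".
    +Whether a segment can be skipped does not depend on the checksum alone.
     
    -The idea behind Consilens's `row-hash` local comparison mode is to filter first with a lighter layer of row fingerprints.  
    -Instead of querying full rows right away, it first queries:
    +Consilens uses:
     
     ```text
    -primary key + row_hash
    +source.count == target.count
    +AND
    +source.checksum == target.checksum
     ```
     
    -The `row_hash` here is not an arbitrarily concatenated string.  
    -The current dialect interface sets explicit requirements for row-hash SQL:
    +Only when both conditions hold is the segment considered consistent.
     
    -- column values are normalized first;
    -- columns are separated by ASCII 31;
    -- `NULL` is represented by the sentinel ASCII 1 instead of an empty string;
    -- MD5 is then computed over the canonical representation of the whole row.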
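Here is a small Java sketch of the count + checksum rule. It assumes an XOR-aggregated checksum (the `xor` algorithm) and uses `String.hashCode()` as a stand-in for the database-side row hash. `CountChecksumSketch` is a hypothetical name. The sketch shows a concrete case where the checksum alone is fooled: under XOR, identical hashes cancel out, so duplicated rows leave the checksum unchanged, while the count exposes the difference.

```java
import java.util.List;

// Hypothetical sketch: String.hashCode() stands in for the SQL-side row hash.
class CountChecksumSketch {
    record Summary(long count, long checksum) {}

    // XOR aggregation is order-independent, but identical hashes cancel out (h ^ h == 0).
    static Summary summarize(List<String> rows) {
        long checksum = 0;
        for (String row : rows) {
            checksum ^= row.hashCode();
        }
        return new Summary(rows.size(), checksum);
    }

    // A segment counts as consistent only when both signals agree.
    static boolean consistent(Summary source, Summary target) {
        return source.count() == target.count() && source.checksum() == target.checksum();
    }

    public static void main(String[] args) {
        Summary source = summarize(List.of("1|alice", "2|bob"));
        Summary target = summarize(List.of("1|alice", "2|bob", "2|bob", "2|bob"));
        System.out.println(source.checksum() == target.checksum()); // true: the two extra copies cancel
        System.out.println(consistent(source, target));             // false: the counts differ
    }
}
```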
     
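    -Each of these details has a reason. ASCII 31 is the separator because an ordinary delimiter (such as `|`) may appear in business data and make the concatenation ambiguous; ASCII 31 almost never appears in normal text, so it makes a better canonical row boundary. NULL uses the ASCII 1 sentinel instead of collapsing to an empty string because an empty string and NULL usually mean different things in business terms; flattening them loses information and makes the check inaccurate. As for "fetch fingerprints first, full rows later": in production, most rows in a small segment are usually fine, so filtering by fingerprint first and querying full rows only for the mismatched keys noticeably reduces data transfer in the exact phase.
    +The check looks simple, but it is critical.
     
    -So `row-hash` does not replace the checksum; it optimizes the "exact pinpointing" step that follows it.
    +A checksum is a digest, and digests always carry some probability of collision.
     
    -## The hard part of cross-source validation is dialects and normalization
    +The count cannot rule out every collision, but it adds a second, independent signal that catches many obviously inconsistent cases.
     
    -On the surface, cross-source validation looks like writing "multi-database versions of the same SQL".  
    -In reality it is much more complex, because database differences are not one layer but several stacked together:
    +This is a very typical engineering pattern:
     
    -- checksum SQL is written differently;
    -- time-formatting functions differ;
    -- booleans are converted to strings differently;
    -- numeric precision is expressed differently;
    -- metadata queries and schema handling differ as well.
    +> Rather than demanding that any single signal be absolutely reliable, combine several cheap signals to make the overall judgment more reliable.
     
    -So `DatabaseDialect` is not designed as one catch-all "god class"; it is further split into several components with clear responsibilities:
    +---
     
    -- `SqlQueryGenerator`
    -- `MetadataQueryGenerator`
    -- `DataTypeHandler`
    -- `TransactionManager`
    -- `CapabilityProvider`
    -- `ConnectionPoolOptimizer`
    +## 6. Core design 5: multi-way splits for large segments, bisection for small ones, a local exact check at the end
     
    -The core value of this design: **database differences are spread out instead of piled together.**
    +Recursive splitting is neither always bisection nor always a fixed fan-out.
     
    -Take the most direct example.  
    -To "normalize a time column to a string", MySQL would typically use `DATE_FORMAT()`, while PostgreSQL needs `TO_CHAR()`;  
    -to "normalize booleans to 1/0", the two expressions are also completely different.
    +The current Consilens strategy can be understood as:
     
    -If this logic were hard-coded in the algorithm, `ChecksumDiffer` would soon become a monster class full of database special cases.  
    -Pushing these differences down into the dialect layer is what keeps the algorithm relatively stable.
    +```text
    +Large range:  multi-way split to converge faster
    +Medium range: keep narrowing the problem area
    +Small range:  stop recursing and compare locally and exactly
    +```
     
    -## Why the plugin mechanism matters
    +The point here is not "the finer the split, the better".
     
    -Data-source support is not hard-coded at compile time; Consilens uses the JDK's built-in `ServiceLoader`.
    +Splitting has its own costs:
     
    -The runtime flow is roughly:
    +```text
    +More SQL queries
    +More asynchronous tasks
    +More database connections in use
    +More scheduling overhead
    +```
     
    -1. The CLI identifies the connector type from `source.type` / `target.type` or from the JDBC URL;
    -2. `DialectFactory` loads `DatabaseDialectProvider` through the plugin runtime;
    -3. the provider creates the corresponding `DatabaseDialect`;
    -4. `DatasetHandle` wraps the connection pool and dialect and exposes unified dataset access and capability queries to the layers above.
    +So Consilens has to balance two things:
     
    -Each plugin jar carries its own SPI registration file:
    +```text
    +Keep splitting, to fetch fewer detail rows
    +Compare locally earlier, to cut recursion overhead
    +```
    +
    +That is why parameters such as `bisectionThreshold`, `bisectionFactor`, `maxDepth`, and `activeSegmentBudget` exist.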
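The decision made at each segment can be sketched as a small Java function. The two threshold rules mirror those stated for the current implementation. Local comparison happens when `totalRows < bisectionThreshold`. Multi-way splitting switches to bisection when `maxRows <= bisectionThreshold * bisectionFactor`. The rest is assumed for illustration: both row counts are treated as the larger side's count, and `maxDepth` acts as a hard stop. `SplitDecisionSketch` is a hypothetical name, and the example values are not Consilens defaults.

```java
// Hypothetical sketch of the per-segment decision. "rows" as the larger side's count and the
// maxDepth hard stop are assumptions; the real implementation may define them differently.
class SplitDecisionSketch {
    enum Action { SKIP, LOCAL_DIFF, BISECT, MULTI_WAY_SPLIT }

    static Action decide(boolean countAndChecksumMatch, long sourceRows, long targetRows,
                         long bisectionThreshold, int bisectionFactor, int depth, int maxDepth) {
        if (countAndChecksumMatch) return Action.SKIP;
        long rows = Math.max(sourceRows, targetRows);
        if (rows < bisectionThreshold || depth >= maxDepth) return Action.LOCAL_DIFF;
        if (rows <= bisectionThreshold * bisectionFactor) return Action.BISECT;
        return Action.MULTI_WAY_SPLIT; // still large: split into bisectionFactor parts
    }

    public static void main(String[] args) {
        long threshold = 1_000; // illustrative values only
        int factor = 32;
        System.out.println(decide(false, 500, 480, threshold, factor, 3, 10));           // LOCAL_DIFF
        System.out.println(decide(false, 20_000, 19_990, threshold, factor, 2, 10));     // BISECT
        System.out.println(decide(false, 1_000_000, 999_999, threshold, factor, 0, 10)); // MULTI_WAY_SPLIT
    }
}
```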
    +
    +---
    +
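    +## 7. Core design 6: two paths for the final comparison, full and row-hash
    +
    +A checksum can only tell us that a range contains differences.
    +
    +What users ultimately need to know is:
     
     ```text
    -META-INF/services/com.consilens.connector.api.DatabaseDialectProvider
    +Which rows are missing?
    +Which rows are extra?
    +Which rows have mismatched fields?
    +Exactly which fields differ?
     ```
     
    -The direct benefit is:  
    -**Adding support for a new data source does not require changing the main CLI flow; you only add a connector module and package it into `plugins/`. For non-JDBC sources (such as ES, MongoDB, or HDFS), implement `ConnectorProvider` + `DatasetHandle` and declare the corresponding capabilities.**
    +So the process must end with a local exact comparison.
     
    -## Why the output path is factored out on its own
    +There are currently two modes.
     
    -The algorithm cares about "what the differences are", while the engineering workflow cares about "where the differences go".
    +### 1. full mode
     
    -Consilens makes output a separate sink subsystem, with these built in:
    +Fetch the full rows inside the small segment directly:
     
    -- `console`
    -- `json`
    -- `csv`
    -- `table`
    +```text
    +source full rows
    +target full rows
    +local diff
    +```
     
    -It also supports two kinds of data:
    +Pros:
     
    -- `diff-record`
    -- `result`
    +```text
    +Simple to implement
    +Intuitive results
    +Easy to troubleshoot
    +```
    +
    +Cons:
    +
    +```text
    +Large transfer volume for wide tables
    +Fetches a lot of identical data when the difference rate is low
    +```
     
    -One implementation detail worth noting:  
    -`DefaultDiffLifecycle` lives in `consilens-sink-api` and is dedicated to bridging lifecycle events to `SinkManager`. This placement is deliberate: it **avoids a circular dependency between `consilens-core` and the sink modules**.
    +### 2. row-hash mode
    +
    +Fetch only the primary keys and row-level hashes first: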
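Both phases of row-hash mode can be sketched in Java (17+). The sketch follows the row-hash rules the dialect interface specifies: normalized values are joined with ASCII 31, `NULL` is written as the ASCII 1 sentinel, and MD5 is computed over the result. In Consilens the hash comes from dialect-generated SQL. `RowHashSketch`, `rowHash`, and `suspiciousKeys` are hypothetical names for a local illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch (Java 17+); in Consilens the row hash is computed by dialect-generated SQL.
class RowHashSketch {
    private static final char SEPARATOR = '\u001f';       // ASCII 31 between columns
    private static final String NULL_SENTINEL = "\u0001"; // ASCII 1 stands for NULL, never ""

    // MD5 over the canonical row: normalized values joined by ASCII 31, NULL as ASCII 1.
    static String rowHash(List<String> normalizedColumns) {
        StringBuilder canonical = new StringBuilder();
        for (int i = 0; i < normalizedColumns.size(); i++) {
            if (i > 0) canonical.append(SEPARATOR);
            String v = normalizedColumns.get(i);
            canonical.append(v == null ? NULL_SENTINEL : v);
        }
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return HexFormat.of().formatHex(md5.digest(canonical.toString().getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("every Java platform must provide MD5", e);
        }
    }

    // Phase 1: compare key -> row_hash maps; only the returned keys need a full-row re-fetch (phase 2).
    static SortedSet<String> suspiciousKeys(Map<String, String> source, Map<String, String> target) {
        SortedSet<String> keys = new TreeSet<>(source.keySet());
        keys.addAll(target.keySet());
        keys.removeIf(k -> Objects.equals(source.get(k), target.get(k))); // hashes agree: no re-fetch
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(rowHash(Arrays.asList("a", null)).equals(rowHash(List.of("a", ""))));
        Map<String, String> src = Map.of("1", rowHash(List.of("alice")), "2", rowHash(List.of("bob")));
        Map<String, String> tgt = Map.of("1", rowHash(List.of("alice")), "2", rowHash(List.of("bobby")),
                "3", rowHash(List.of("carol")));
        System.out.println(suspiciousKeys(src, tgt)); // [2, 3]
    }
}
```

`main` first prints `false`, because NULL and the empty string hash differently. It then prints `[2, 3]`: key 1 matches and is never re-fetched, key 2 has a changed value, and key 3 exists only in the target.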
    +
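    +Best suited to:
    +
    +```text
    +Wide tables
    +Very few differing rows
    +Network-cost-sensitive environments
    +```
     
    -This means the algorithm layer only needs to emit events at the right moments:
    +This is a classic two-phase optimization:
     
    -- `onDiffStart`
    -- `onSegmentComplete`
    -- `onDifferencesFound`
    -- `onDiffComplete`
    -- `onDiffError`
    +```text
    +Use lightweight row digests to locate suspicious keys
    +Then re-fetch full data only for the suspicious keys
    +```
     
    -Whether these events end up printed to the console, written to a JSON file, or stored in a database table is of no concern to the algorithm.
    +```mermaid
    +flowchart LR
    +    A["checksum flags a mismatched segment"] --> B{"Final comparison mode"}
    +    B -->|full| C["Fetch full rows directly"]
    +    B -->|row-hash| D["Fetch keys + row hashes first"]
    +    D --> E["Filter out suspicious keys"]
    +    E --> F["Re-fetch full rows for suspicious keys"]
    +    C --> G["Local exact diff"]
    +    F --> G
    +```
     
    -That is why Consilens can offer rich output capabilities while its main flow stays light.
    +---
     
    -## Concurrency model: separate IO and CPU so neither kind of work drags down the other
    +## 8. Core design 7: bound the recursive task tree with a concurrency budget
     
    -Consistency checking is inherently a mixed workload:
    +Left uncontrolled, a recursive algorithm can easily turn a "data comparison" problem into a "task explosion".
     
    -- part of the work (database access, checksum queries, data fetching) is clearly IO-bound;
    -- the other part (local diff computation, result assembly, object conversion) is more CPU-bound.
    +For example, if a large segment is split into 32 sub-segments and each of those is split again, the number of tasks grows very quickly.
     
    -If both kinds of tasks share one thread pool, slow queries can easily stall local computation, and conversely CPU-heavy tasks can starve database queries.
    +So Consilens introduces constraints such as `activeSegmentBudget`.
     
    -Consilens configures separate IO and CPU thread pools.  
    -That guarantees at least one thing: a slow database cannot directly block every execution path on the application side.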
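A `java.util.concurrent.Semaphore` is one simple way to express such a budget. The sketch below is hypothetical: `SegmentBudgetSketch` is not a Consilens class, and `Thread.sleep` stands in for a checksum query. It submits 100 segments to a 16-thread pool, but no more than `activeSegmentBudget` of them run at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a Semaphore plays the role of activeSegmentBudget; names are illustrative.
class SegmentBudgetSketch {
    private final Semaphore budget;
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    SegmentBudgetSketch(int activeSegmentBudget) {
        this.budget = new Semaphore(activeSegmentBudget);
    }

    // Runs one segment's work only while a budget permit is held.
    void runSegment(Runnable work) throws InterruptedException {
        budget.acquire(); // waits while activeSegmentBudget segments are already in flight
        try {
            peak.accumulateAndGet(active.incrementAndGet(), Math::max);
            work.run();
        } finally {
            active.decrementAndGet();
            budget.release();
        }
    }

    // Submits `segments` tasks to a larger thread pool and reports the peak number of active segments.
    static int demo(int activeSegmentBudget, int segments) {
        SegmentBudgetSketch sketch = new SegmentBudgetSketch(activeSegmentBudget);
        ExecutorService pool = Executors.newFixedThreadPool(16);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < segments; i++) {
                futures.add(pool.submit(() -> {
                    sketch.runSegment(() -> sleepQuietly(5)); // stand-in for a checksum query
                    return null;
                }));
            }
            for (Future<?> f : futures) f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return sketch.peak.get();
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("peak active segments: " + demo(4, 100)); // never more than 4
    }
}
```

A recursive version needs one extra rule: a parent segment should release its permit before waiting for its children. Otherwise, a budget filled with waiting parents can deadlock.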
     
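    -## Why `infoTree` is valuable
    +This design shows that Consilens is not chasing the algorithmically optimal split alone; it also accounts for the real runtime environment:
     
    -Many validation tools only tell you "how many differences there are".  
    -That is not enough for real troubleshooting.
    +```text
    +Database connections are limited
    +Thread-pool capacity is limited
    +Query queues are limited
    +The system must degrade in a controlled way
    +```
     
    -Besides diff details and statistics, the Consilens `DiffResult` contains an execution tree, `infoTree`.  
    -It records not only the results but also the process:
    +A validation system that is truly production-ready must be able to control its own resource boundaries.
     
    -- the current segment range;
    -- the row counts on both sides;
    -- the split type;
    -- whether the checksums matched;
    -- whether local comparison was entered;
    -- the execution time.
    +---
     
    -For large-table validation this tree is very useful.  
    -Often you need to investigate not only "where the data is wrong" but also "why this run was so slow".
    +## 9. Overall technical decision map
     
    -## Boundaries fixed explicitly in the code are an architectural choice too
    +The Consilens checksum mode can be summarized in the following technical decision map:
     
    -Some limits are not "to be dealt with later"; they are behaviors the current implementation specifies explicitly: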
     
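    -- `join` checks whether both datasets come from **the same JDBC URL** and fails immediately otherwise;
    -- `strategy.mode=local` is currently rejected during configuration validation;
    -- the `diff` service also keeps a not-implemented guard for `LOCAL`;
    -- `config validate --test-connection` currently only prints "not yet implemented".
    +This map matters more than the checksum algorithm on its own.
     
    -These limits may look imperfect, but they guarantee at least one thing:  
    -**the system never silently carries invalid assumptions into the execution phase.**
    +It captures the engineering judgments of the whole system:
     
    -For a data-validation system, exposing its boundaries clearly is often more reliable than "appearing to support everything".
    +```text
    +When to push down?
    +When to skip?
    +When to keep splitting?
    +When to stop recursing?
    +When to fetch detail rows?
    +When to throttle concurrency?
    +```
     
    -## All this architecture really tries to protect is one thing
    +```mermaid
    +flowchart TD
    +    A["Compare request arrives"] --> B["Choose execution plan by capability"]
    +    B --> C["Generate segments and push down checksum"]
    +    C --> D{"count + checksum match?"}
    +    D -->|yes| E["Skip this segment"]
    +    D -->|no| F{"Keep splitting?"}
    +    F -->|yes| G["Split recursively within the concurrency budget"]
    +    F -->|no| H["Run full / row-hash exact check"]
    +    G --> D
    +    H --> I["Emit key-level / field-level differences"]
    +```
     
    -From the outside, Consilens looks like it does "cross-source diff".  
    -From the inside, what it really wants to protect is a more important principle:
    +Together, these decisions form the design core of the Consilens checksum mode.
     
    -**The algorithm narrows down the problem, the dialect guarantees comparability, and the output takes over the results.**
     
    -Only if these three stay untangled can the system keep growing:
     
    -- extending data sources does not require touching the algorithm;
    -- changing outputs does not require changing dialects;
    -- tuning the checksum convergence strategy does not require tearing down the whole CLI.
    +## Join us
    +Consilens aims to be the best open-source project for cross-source data consistency comparison and to help more users solve the problems they run into when validating data consistency. We sincerely welcome more contributors to join the community, grow with us, and build a better community together.
     
    -That is also why the Consilens implementation looks less like an all-in-one framework and more like a set of modules with clear boundaries.  
    -When building this kind of foundational tool, restraint often matters more than piling on features.
    +**Project:**
    +https://github.com/datavane/consilens
    +**Issues and suggestions:**
    +https://github.com/datavane/consilens/issues
    +**Contribute code:**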
    +https://github.com/datavane/consilens/pull
    diff --git "a/website/\351\205\215\347\275\256\350\256\262\350\247\243/00-\351\205\215\347\275\256\350\256\262\350\247\243\346\200\273\350\247\210.md" "b/website/\351\205\215\347\275\256\350\256\262\350\247\243/00-\351\205\215\347\275\256\350\256\262\350\247\243\346\200\273\350\247\210.md"
    new file mode 100644
    index 0000000..bdf1215
    --- /dev/null
    +++ "b/website/\351\205\215\347\275\256\350\256\262\350\247\243/00-\351\205\215\347\275\256\350\256\262\350\247\243\346\200\273\350\247\210.md"
    @@ -0,0 +1,55 @@
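    +# Consilens Configuration in Practice: Series Overview
    +
    +Many tools document their parameters one by one. Such documentation is complete, but for the people who have to put it into production it usually falls one step short: **how should I configure the data validation task in front of me? Why configure it that way? And where do I start troubleshooting when something goes wrong?**
    +
    +This series is written to answer those questions.
    +
    +I will put Consilens's configuration capabilities back into real data-consistency scenarios. The series starts with getting a first two-table check to run. It then covers aligning fields across databases, normalizing types, and persisting results for auditing, and ends with performance tuning in production and scenario templates. You do not need to memorize every field first; just follow the scenarios and you will gradually build your own judgment about configuration.
    +
    +## Suggested reading order
    +
    +```mermaid
    +flowchart TD
    +    A["01 Get it running: build the overall mental model"] --> B["02 comparison: define business consistency"]
    +    B --> C["03 strategy / normalization / concurrency: make jobs run reliably"]
    +    C --> D["04 result: feed results into the audit loop"]
    +    D --> E["05 Scenario templates: configure straight from a scenario"]
    +    E --> F["cheatsheet: quick reference for fields and limits"]
    +```
    +
    +1. **01-先跑通,再理解:从一份配置看懂 Consilens.md** (Run it first, then understand it: learning Consilens from one configuration)  
    +   Build the complete mental model first: where the data on each side comes from, what to compare, how to compare it, and where the results go.
    +
    +2. **02-真正决定准不准的是 comparison.md** (What really decides accuracy is comparison)  
    +   Keys, compared fields, filters, field mappings, and troubleshooting context all live in this layer. Get it right and you will see far fewer false positives.
    +
    +3. **03-让任务跑得稳:strategy、normalization 与 concurrency.md** (Making jobs run reliably: strategy, normalization, and concurrency)  
    +   How to choose between checksum and join, how to remove the noise caused by cross-database type differences, and how to tune large-table jobs step by step.
    +
    +4. **04-结果不是终点:result 与审计闭环.md** (Results are not the end: result and the audit loop)  
    +   Which scenarios the console, JSON, CSV, result tables, and diff-detail tables each suit, and how to bring reconciliation results into the governance process.
    +
    +5. **05-七个常见场景,直接拿去改.md** (Seven common scenarios, ready to adapt)  
    +   Turns the most common usages into templates. Once you have read the articles before it, this one becomes your everyday starting point for configuration.
    +
    +6. **06-配置速查.md** (Configuration cheat sheet)  
    +   A one-page quick reference, useful for quickly confirming fields and limits once you understand the overall approach.
    +
    +## Who this series is for
    +
    +- People new to Consilens who want to get a reconciliation job running quickly;
    +- People working on data migration, data-warehouse synchronization, or lakehouse construction who need to verify that both sides are consistent;
    +- People who can already write configurations but keep running into problems with field differences, time precision, booleans, result persistence, or performance;
    +- People who want to connect Consilens to production governance, auditing, and alerting pipelines.
    +
    +## Consilens configuration in one sentence
    +
    +A good Consilens configuration essentially answers five questions:
    +
    +1. Which two datasets am I comparing?
    +2. Which records count as the same record in business terms?
    +3. Which fields really determine consistency?
    +4. Which approach completes the comparison more reliably and faster?
    +5. Who needs to see the diff results, where should they go, and how are they tracked?
    +
    +Every parameter in the rest of the series revolves around these five questions.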
    diff --git "a/website/\351\205\215\347\275\256\350\256\262\350\247\243/01-\345\205\210\350\267\221\351\200\232\357\274\214\345\206\215\347\220\206\350\247\243\357\274\232\344\273\216\344\270\200\344\273\275\351\205\215\347\275\256\347\234\213\346\207\202 Consilens.md" "b/website/\351\205\215\347\275\256\350\256\262\350\247\243/01-\345\205\210\350\267\221\351\200\232\357\274\214\345\206\215\347\220\206\350\247\243\357\274\232\344\273\216\344\270\200\344\273\275\351\205\215\347\275\256\347\234\213\346\207\202 Consilens.md"
    new file mode 100644
    index 0000000..f3c596e
    --- /dev/null
    +++ "b/website/\351\205\215\347\275\256\350\256\262\350\247\243/01-\345\205\210\350\267\221\351\200\232\357\274\214\345\206\215\347\220\206\350\247\243\357\274\232\344\273\216\344\270\200\344\273\275\351\205\215\347\275\256\347\234\213\346\207\202 Consilens.md"	
    @@ -0,0 +1,339 @@
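    +# Consilens Configuration Series | Basic Configuration Explained
    +
    +> Introduction:
    +> This article takes a "run it first, then take it apart" approach to the basic structure of a Consilens configuration. Starting from a minimal working configuration, it explains what each of the four core blocks (`source / target`, `comparison`, `strategy`, and `result`) is responsible for. It also shows how to turn a real reconciliation requirement into a configuration that runs, can be troubleshot, and can keep growing.
    +>
    +> Github:
    +> https://github.com/datavane/consilens
    +> Follows, Stars, Forks, and contributions are all welcome.
    +
    +
    +
    +When learning a configuration-driven tool, the worst thing you can do is dive straight into field details, and Consilens is no exception. You could certainly read through `source.type`, `comparison.keys`, and `strategy.mode` one field at a time, but it is easy to finish that tour and still not know: **how do I turn a real reconciliation requirement into a configuration?**
    +
    +A better approach is to start with a minimal but complete configuration and then take it apart.
    +
    +## The most common case: table-to-table reconciliation
    +
    +Suppose we want to check whether an order source table in MySQL matches an order detail table in PostgreSQL. A first version can look like this: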
    +
    +```yaml
    +source:
    +  type: mysql
    +  name: order-source
    +  connection:
    +    url: jdbc:mysql://localhost:3306/ods
    +    username: root
    +    password: 123456
    +  resource:
    +    type: table
    +    name: ods_order
    +
    +target:
    +  type: postgresql
    +  name: order-target
    +  connection:
    +    url: jdbc:postgresql://localhost:5432/dwd
    +    username: postgres
    +    password: 123456
    +  resource:
    +    type: table
    +    name: dwd_order
    +
    +comparison:
    +  keys:
    +    source:
    +      - order_id
    +    target:
    +      - order_id
    +  fields:
    +    source:
    +      - buyer_id
    +      - amount
    +      - order_status
    +      - updated_at
    +    target:
    +      - buyer_id
    +      - amount
    +      - order_status
    +      - updated_at
    +
    +strategy:
    +  mode: checksum
    +  algorithm: xor
    +
    +result:
    +  sinks:
    +    - format: console
    +      type: result
    +```
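Before running a config like this, it can help to check its shape against the rules this series states. Those rules are: `source`, `target`, and `comparison` are present; `comparison.keys` is required and lists the same number of columns on both sides; and `fields` and `mappings` are not used together. The following is a minimal, hypothetical Python sketch. `check_config` is not part of Consilens. It assumes the YAML has already been parsed into a plain dict, and it checks structure only.

```python
from typing import Any


def check_config(cfg: dict[str, Any]) -> list[str]:
    """Return structural problems found in a parsed Consilens-style config dict.

    Hypothetical helper: it only encodes rules stated in this article series,
    not Consilens's actual validation logic.
    """
    problems: list[str] = []

    # source, target and comparison must exist; strategy and result are not
    # checked because this sketch makes no assumption about their defaults.
    for block in ("source", "target", "comparison"):
        if block not in cfg:
            problems.append(f"missing block: {block}")

    comparison = cfg.get("comparison") or {}
    keys = comparison.get("keys")
    if not keys:
        problems.append("comparison.keys is required")
    elif len(keys.get("source", [])) != len(keys.get("target", [])):
        problems.append("comparison.keys must list the same number of columns on both sides")

    # fields and mappings are alternative paths: pick one.
    if "fields" in comparison and "mappings" in comparison:
        problems.append("use either comparison.fields or comparison.mappings, not both")

    return problems


# The minimal config above, as the dict a YAML parser would produce (abridged).
minimal = {
    "source": {"type": "mysql", "resource": {"type": "table", "name": "ods_order"}},
    "target": {"type": "postgresql", "resource": {"type": "table", "name": "dwd_order"}},
    "comparison": {
        "keys": {"source": ["order_id"], "target": ["order_id"]},
        "fields": {
            "source": ["buyer_id", "amount", "order_status", "updated_at"],
            "target": ["buyer_id", "amount", "order_status", "updated_at"],
        },
    },
    "strategy": {"mode": "checksum", "algorithm": "xor"},
}

print(check_config(minimal))  # []
```

A check like this catches mistakes in shape only. It cannot tell whether the keys you chose actually identify a business record.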
    +
    +This config is not complicated, but it already captures how Consilens works.
    +
    +```mermaid
    +flowchart 
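    +    A["source / target<br/>define both datasets"] --> B["comparison<br/>define keys and comparison rules"]
    +    B --> C["strategy<br/>choose the execution path"]
    +    C --> D["result<br/>decide where results go"]
    +```
    +
    +- `source` and `target`: which two datasets I am comparing;
    +- `comparison`: how records on both sides are aligned, and which fields must match;
    +- `strategy`: how the comparison is executed;
    +- `result`: where the results are written.
    +
    +Get these four blocks straight first, and no advanced feature you meet later will confuse you.
    +
    +## source / target: define the datasets to compare
    +
    +When writing their first config, many people think of `source` and `target` as "the left database" and "the right database". That is not wrong, but it is not precise enough.
    +
    +In Consilens, `source` and `target` are better understood as two **datasets to be compared**. A dataset can be a table, or a business view assembled with SQL.
    +
    +The most common case is a table:
    +
    +```yaml
    +source:
    +  type: mysql
    +  connection:
    +    url: jdbc:mysql://localhost:3306/ods?useSSL=false&serverTimezone=UTC
    +    username: root
    +    password: 123456
    +  resource:
    +    type: table
    +    name: ods_order
    +```
    +
    +The part that really deserves attention here is `resource`.
    +
    +When `resource.type: table`, `resource.name` is the table name. Connection details go under `connection`, where the JDBC URL, username, and password are the usual trio. Any other properties the connector needs can also go under `connection`, for example PostgreSQL's `currentSchema` or an application name.
    +
    +```yaml
    +source:
    +  type: postgresql
    +  connection:
    +    url: jdbc:postgresql://localhost:5432/bh
    +    username: postgres
    +    password: 123456
    +    currentSchema: public
    +    applicationName: consilens
    +```
    +
    +These extra properties are passed to the connector. In other words, Consilens does not try to abstract away every database detail; it leaves you the control you need.
    +
    +## When a table doesn't fit, treat SQL as a business view
    +
    +In real projects, the two sides rarely look exactly alike.
    +
    +For example, the source calls a column `status` while the target calls it `order_status`. The source may need to filter out soft-deleted rows while the target filters out archived ones. Or both sides may need a projection before they can be compared. In such cases, don't rush to cram all the logic into `comparison`. The more natural approach is to define the dataset directly as SQL.
    +
    +```yaml
    +source:
    +  type: mysql
    +  connection:
    +    url: jdbc:mysql://localhost:3306/ods
    +    username: root
    +    password: 123456
    +  resource:
    +    type: sql
    +    path: |
    +      SELECT
    +        order_id,
    +        buyer_id,
    +        amount,
    +        status AS order_status,
    +        updated_at
    +      FROM ods_order
    +      WHERE deleted = 0
    +
    +target:
    +  type: postgresql
    +  connection:
    +    url: jdbc:postgresql://localhost:5432/dwd
    +    username: postgres
    +    password: 123456
    +  resource:
    +    type: sql
    +    path: |
    +      SELECT
    +        order_id,
    +        buyer_id,
    +        total_amount AS amount,
    +        order_status,
    +        updated_at
    +      FROM dwd_order
    +      WHERE is_deleted = false
    +```
    +
    +Note that `resource.path` holds the SQL text itself, not a file path. The current version requires it to start with `SELECT` or `WITH`, with no trailing semicolon and no SQL comments.
    +
    +My advice is simple:
    +
    +```mermaid
    +flowchart TD
    +    A["Define a dataset"] --> B{"Plain table-to-table comparison?"}
    +    B -->|Yes| C["Use a table resource"]
    +    B -->|No, needs filtering / projection / expressions| D["Use an sql resource"]
    +    C --> E["Put business semantics in comparison"]
    +    D --> F["Shape the business definition in SQL first"]
    +```
    +
    +- If you are comparing physical tables and only need light field alignment, use a table resource;
    +- If you are comparing a business definition, especially one that needs filtering, projection, renaming, or computed expressions, use an SQL resource.
    +
    +This choice directly affects how clean the rest of your config will be.
    +
    +## name: not used in comparison, but helps with troubleshooting
    +
    +`source.name` and `target.name` are optional. Many examples omit them for brevity.
    +
    +But for any job that will run long term, I recommend setting them.
    +
    +```yaml
    +source:
    +  name: order-source
    +
    +target:
    +  name: order-target
    +```
    +
    +They do not change the comparison result, but they appear in logs, job identifiers, and result tracking. When troubleshooting in production, a clear name often matters more than you would expect.
    +
    +A config is read not only by machines, but also by whoever takes it over after you.
    +
    +## comparison: where business semantics begin
    +
    +`source` and `target` answer "where does the data come from". `comparison` answers "what counts as the same".
    +
    +The most basic form is:
    +
    +```yaml
    +comparison:
    +  keys:
    +    source:
    +      - order_id
    +    target:
    +      - order_id
    +  fields:
    +    source:
    +      - col_int
    +      - col_decimal
    +      - amount
    +      - status
    +      - updated_at
    +    target:
    +      - col_int
    +      - col_decimal
    +      - amount
    +      - status
    +      - updated_at
    +```
    +
    +Two concepts need to be kept apart here.
    +
    +`keys` are the business keys. Consilens first uses them to decide which record on one side should be compared with which record on the other. They are required, and both sides must list the same number of key columns, matched one-to-one.
    +
    +`fields` are the consistency fields. Once a record has been matched on both sides, these are the fields checked one by one for equality.
    +
    +If both sides have very similar schemas, you can also omit `fields`:
    +
    +```yaml
    +comparison:
    +  keys:
    +    source:
    +      - id
    +    target:
    +      - id
    +```
    +
    +In that case Consilens compares all non-key columns by default. This works well for a first full check. It is especially useful right after migrating same-schema tables, when you want a quick overview of all the differences.
    +
    +## strategy: reliability first, speed second
    +
    +The most commonly used strategy in Consilens today is `checksum`.
    +
    +```yaml
    +strategy:
    +  mode: checksum
    +  algorithm: xor
    +```
    +
    +For cross-database, large-table, or complex-pipeline scenarios, `checksum` is the safer default. It does not simply pull all data from both sides back and brute-force compare it. Instead, it uses checksums, segmentation, and local comparison to narrow down the differing ranges step by step.
    +
    +The other strategy is `join`:
    +
    +```yaml
    +strategy:
    +  mode: join
    +  algorithm: concat
    +```
    +
    +`join` fits cases where both sides are in the same domain or instance and the database can perform the join itself. It can be very fast, but only under narrower, well-defined conditions. If you are not sure the two sides can be joined naturally on the database side, do not make it your default.
    +
    +In one line:
    +
    +> For cross-database and production jobs, prefer checksum. Consider join only when both sides are clearly in the same domain and can be joined.
    +
    +## result: don't let differences stop at the console
    +
    +For an initial trial run, console output is enough.
    +
    +```yaml
    +result:
    +  sinks:
    +    - format: console
    +      type: result
    +```
    +
    +In production, though, the value of a reconciliation run is more than "it finished". You usually also need to answer several questions. How many differences are there? Which type is most common? Can the details be stored in a database? Can they feed alerts, tickets, or a governance dashboard?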
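    +
    +So `result` supports multiple sinks:
    +
    +```yaml
    +result:
    +  failOnSinkError: true
    +  sinks:
    +    - format: console
    +      type: result
    +
    +    - format: json
    +      type: diff-record
    +      properties:
    +        path: ./output/diff-${taskId}.json
    +        pretty: true
    +
    +    - format: table
    +      type: diff-record
    +      properties:
    +        type: postgresql
    +        url: jdbc:postgresql://localhost:5432/audit
    +        username: postgres
    +        password: 123456
    +        tableName: diff_result_detail
    +        createTable: true
    +        batchSize: 1000
    +```
    +
    +This is the key step from "trying out a tool" to "closing the governance loop".
    +
    +## Remember this mental model first
    +
    +A Consilens config can be read along this line:
    +
    +```mermaid
    +flowchart
    +    A["source / target<br/>where the data comes from"] --> B["comparison<br/>what to compare, how to align"]
    +    B --> C["strategy<br/>how it runs"]
    +    C --> D["result<br/>where results go"]
    +    B -. extends .-> E["normalization"]
    +    C -. extends .-> F["concurrency"]
    +    A -. read details .-> G["readOptions"]
    +```
    +
    +Every advanced setting covered later adds capability on top of this main line:
    +
    +- `normalization`: resolves type-semantics mismatches across databases;
    +- `concurrency`: handles concurrency and throughput for large jobs;
    +- `readOptions`: gives fine-grained control over the data-reading path.
    +
    +Once you have this main line down, the individual fields stop looking like a pile of scattered parameters. They read as an engineering language built around data consistency.
    diff --git "a/website/\351\205\215\347\275\256\350\256\262\350\247\243/02-\347\234\237\346\255\243\345\206\263\345\256\232\345\207\206\344\270\215\345\207\206\347\232\204\346\230\257 comparison.md" "b/website/\351\205\215\347\275\256\350\256\262\350\247\243/02-\347\234\237\346\255\243\345\206\263\345\256\232\345\207\206\344\270\215\345\207\206\347\232\204\346\230\257 comparison.md"
    new file mode 100644
    index 0000000..7b2a98d
    --- /dev/null
    +++ "b/website/\351\205\215\347\275\256\350\256\262\350\247\243/02-\347\234\237\346\255\243\345\206\263\345\256\232\345\207\206\344\270\215\345\207\206\347\232\204\346\230\257 comparison.md"
    @@ -0,0 +1,317 @@
    +# 02 | comparison Is What Really Decides Accuracy
    +
    +> Overview:
    +> This article focuses on `comparison`, the Consilens configuration block that most directly decides whether a comparison is accurate. It explains the role that `keys`, `fields`, `exclude`, `filters`, `mappings`, and `extraColumns` each play in defining business consistency. It also shows how to state key definitions, field semantics, filter boundaries, and alignment rules clearly, so that false positives do not leak into the rest of the execution pipeline.
    +>
    +> GitHub:
    +> https://github.com/datavane/consilens
    +> Follow, Star, and Fork the project; contributions are welcome.
    +
    +When a reconciliation job produces inaccurate results, the algorithm is rarely to blame. Far more often, "what to compare" was never stated clearly.
    +
    +Two tables may look like the same business data, yet their key definitions can differ, their column names can differ, and their filter boundaries can differ. Their timestamp and audit columns may also never match by design. If `comparison` does not spell these things out, even the most advanced strategy will only produce false positives faster.
    +
    +So this article covers just one thing: how to turn business consistency into comparison rules that Consilens can execute.
    +
    +## keys: first find the same business record
    +
    +Consilens does not blindly compare whole rows from both sides. It first uses the keys to find "the same business record", and only then checks whether the fields match.
    +
    +That is why `comparison.keys` is required.
    +
    +```yaml
    +comparison:
    +  keys:
    +    source:
    +      - order_id
    +    target:
    +      - order_id
    +```
    +
    +If the business key is composite, list all of its columns:
    +
    +```yaml
    +comparison:
    +  keys:
    +    source:
    +      - order_id
    +      - item_id
    +    target:
    +      - order_id
    +      - item_id
    +```
    +
    +There is a simple but very important test here:
    +
    +> keys do not have to be the physical primary key in the database, but they must reliably identify a single record in business terms.
    +
    +If the keys are wrong, you will typically see two kinds of problems:
    +
    +- Records that are actually the same fail to match and show up as missing on the source or target side;
    +- Records that are actually distinct get merged by mistake, distorting the difference results.
    +
    +So the first step in writing a Consilens config is not filling in connection strings. It is finding out what the business key of this data actually is.
    +
    +## fields: only put the columns you care about into the consistency check
    +
    +`fields` determines which columns are checked for equality once a record has been matched on both sides.
    +
    +```yaml
    +comparison:
    +  keys:
    +    source:
    +      - order_id
    +    target:
    +      - order_id
    +  fields:
    +    source:
    +      - buyer_id
    +      - amount
    +      - order_status
    +      - pay_time
    +    target:
    +      - buyer_id
    +      - amount
    +      - order_status
    +      - pay_time
    +```
    +
    +This configuration fits most production scenarios. Production tables usually carry many auxiliary columns: creation time, update time, sync batch, write time, source system, extension fields. They are useful for troubleshooting, but they should not necessarily decide whether the business data is consistent.
    +
    +My rule of thumb:
    +
    +- For strict-consistency domains such as finance, orders, and inventory, list `fields` explicitly;
    +- For a first migration check or a full-column scan of same-schema tables, you can omit `fields` at first to see the full picture;
    +- Once the job becomes a stable recurring task, narrow `fields` down to the columns you truly care about.
    +
    +## Omitting fields: all non-key columns are compared by default
    +
    +If both sides have similar schemas, you can specify only the keys:
    +
    +```yaml
    +comparison:
    +  keys:
    +    source:
    +      - id
    +    target:
    +      - id
    +```
    +
    +In this case Consilens compares every non-key column.
    +
    +This is not laziness; it is a practical interim strategy. Suppose you have just migrated a batch of tables and do not yet know where the differences lie. A full-column comparison surfaces every problem first. Once the types of differences are clear, exclude the columns that should not take part in the consistency check.
    +
    +## exclude: remove noisy columns
    +
    +Real systems always have some columns that are not suitable for comparison. The most common ones are:
    +
    +- `created_at`, `updated_at`;
    +- `sync_time`, `batch_id`;
    +- CDC offsets and write version numbers;
    +- Audit columns regenerated on the target side.
    +
    +For these, use `exclude`:
    +
    +```yaml
    +comparison:
    +  keys:
    +    source:
    +      - id
    +    target:
    +      - id
    +  exclude:
    +    source:
    +      - created_at
    +      - updated_at
    +      - sync_time
    +    target:
    +      - created_at
    +      - updated_at
    +      - sync_time
    +```
    +
    +`exclude` is especially useful when `fields` is omitted: let Consilens compare all non-key columns, then explicitly exclude the known noisy ones.
    +
    +But do not use `exclude` to hide problems long term. If a column always produces differences, there are two possibilities. Either it genuinely does not belong to the consistency definition, or your sync pipeline or normalization rules are wrong. Use `exclude` in the first case; in the second, fix the config or the pipeline.
    +
    +## filters: compare a business slice
    +
    +Not every check needs to scan an entire table. Many reconciliation jobs only care about a specific day, tenant, or business status.
    +
    +```yaml
    +comparison:
    +  keys:
    +    source:
    +      - order_id
    +    target:
    +      - order_id
    +  fields:
    +    source:
    +      - col_int
    +      - col_decimal
    +      - amount
    +      - status
    +      - updated_at
    +    target:
    +      - col_int
    +      - col_decimal
    +      - amount
    +      - status
    +      - updated_at
    +  filters:
    +    source: "dt = '2026-05-05' AND tenant_id = 1001"
    +    target: "dt = '2026-05-05' AND tenant_id = 1001"
    +```
    +
    +The most common mistake here is adding a filter to only one side. Another is writing conditions that look similar on both sides but describe different business boundaries.
    +
    +My advice is to treat `filters` as part of the "reconciliation contract":
    +
    +- Either set them on both sides;
    +- Or set them on neither side;
    +- If you set them, make sure they describe the same business slice.
    +
    +What hurts reconciliation most is not slowness. It is comparing two sides that were never the same batch of data in the first place.
    +
    +## mappings: when columns differ, project them onto the same logical fields
    +
    +In cross-system reconciliation, the hardest part is often not that the data differs but that it is expressed differently.
    +
    +The source calls it `user_id`, the target calls it `customer_id`. The source amount is `amount`, the target's is `total_amount`. The source date must be taken as `DATE(created_at)`, the target's as `DATE(created_time)`.
    +
    +This is where `mappings` comes in.
    +
    +```yaml
    +comparison:
    +  keys:
    +    source:
    +      - order_id
    +    target:
    +      - id
    +  mappings:
    +    - name: order_id
    +      source:
    +        column: order_id
    +      target:
    +        column: id
    +      key: true
    +
    +    - name: buyer_id
    +      source:
    +        column: user_id
    +      target:
    +        column: customer_id
    +
    +    - name: order_amount
    +      source:
    +        column: amount
    +      target:
    +        column: total_amount
    +
    +    - name: biz_date
    +      source:
    +        expression: "DATE(created_at)"
    +      target:
    +        expression: "DATE(created_time)"
    +```
    +
    +You can think of `mappings` as a "logical field model" layer:
    +
    +- `name` is the logical field name, which is also easier to read in the results;
    +- `column` reads a column directly;
    +- `expression` performs a lightweight projection with an expression;
    +- `literal` supplies a constant;
    +- `key: true` marks the logical field as a key;
    +- `compare: false` keeps the logical field out of the set of compared fields. In the current version, do not rely on it to automatically carry context into the diff results.
    +
    +Two boundaries are worth remembering.
    +
    +First, even when you use `mappings`, `comparison.keys` is still required. `keys` defines how the raw data is aligned; `mappings[*].key` names the logical keys after mapping.
    +
    +Second, do not use `fields` and `mappings` together. One specifies the compared columns directly; the other first projects logical fields and then compares those. Pick one path and the config stays much clearer.
    +
    +## SQL resource or mappings?
    +
    +If you are already using `resource.type: sql`, most renaming and expression work can be done directly in the SQL.
    +
    +```yaml
    +resource:
    +  type: sql
    +  path: |
    +    SELECT
    +      order_id,
    +      user_id AS buyer_id,
    +      DATE(created_at) AS biz_date,
    +      amount
    +    FROM ods_order
    +    WHERE deleted = 0
    +```
    +
    +In that case, layering complex `mappings` on top only makes troubleshooting harder.
    +
    +In practice I recommend:
    +
    +- Table resource plus light field alignment: use `mappings`;
    +- SQL resource plus business-view shaping: do it in SQL first;
    +- Complex cleansing, aggregation, or filtering: do not cram it into `mappings.expression`; put it in an SQL resource.
    +
    +`mappings.expression` is meant for expressions, not for carrying an entire complex query.
    +
    +## extraColumns: bring troubleshooting context along
    +
    +Some columns take no part in the consistency check, yet you really want to see them when a difference occurs.
    +
    +Examples are `tenant_id`, `updated_at`, `biz_date`, and `source_system`. They help you locate problems, but they should not necessarily be compared.
    +
    +```yaml
    +comparison:
    +  keys:
    +    source:
    +      - order_id
    +    target:
    +      - order_id
    +  fields:
    +    source:
    +      - amount
    +      - order_status
    +    target:
    +      - amount
    +      - order_status
    +  extraColumns:
    +    - tenant_id
    +    - updated_at
    +```
    +
    +The role of `extraColumns` is clear:
    +
    +> It serves troubleshooting and result context, not the consistency check.
    +
    +Columns that actually decide equality still belong in `fields` or `mappings`.
    +
    +One boundary of the current version: `extraColumns` works best alongside `fields`. If you are on the `mappings` path, do not assume `extraColumns` will be compiled into the logical comparison columns automatically.
    +
    +## A decision order
    +
    +When writing `comparison`, ask yourself the following in order:
    +
    +```mermaid
    +flowchart TD
    +    A["Confirm the business key"] --> B["Decide whether to list fields explicitly"]
    +    B --> C["Exclude noisy columns (exclude)"]
    +    C --> D["Limit the business slice (filters)"]
    +    D --> E{"Do field definitions differ between sides?"}
    +    E -->|Yes| F["Align with mappings or an SQL resource"]
    +    E -->|No| G["Keep direct field comparison"]
    +    F --> H["Add extraColumns / troubleshooting context"]
    +    G --> H
    +```
    +
    +1. Which fields reliably identify one business record? Put them in `keys`.
    +2. Do I want a full-column comparison, or only the core business fields? This decides whether to write `fields`.
    +3. Are there noisy columns that naturally should not be compared? Put them in `exclude`.
    +4. Am I only comparing one business slice this time? Put it in `filters`.
    +5. Do the column names or expressions differ between the two sides? Consider `mappings` or an SQL resource.
    +6. What extra context do I need when a difference occurs? Put it in `extraColumns`.
    +
    +With this order clear, `comparison` stops being a pile of parameters and becomes a readable statement of business consistency.
    diff --git "a/website/\351\205\215\347\275\256\350\256\262\350\247\243/03-\350\256\251\344\273\273\345\212\241\350\267\221\345\276\227\347\250\263\357\274\232strategy\343\200\201normalization \344\270\216 concurrency.md" "b/website/\351\205\215\347\275\256\350\256\262\350\247\243/03-\350\256\251\344\273\273\345\212\241\350\267\221\345\276\227\347\250\263\357\274\232strategy\343\200\201normalization \344\270\216 concurrency.md"
    new file mode 100644
    index 0000000..d12fea7
    --- /dev/null
    +++ "b/website/\351\205\215\347\275\256\350\256\262\350\247\243/03-\350\256\251\344\273\273\345\212\241\350\267\221\345\276\227\347\250\263\357\274\232strategy\343\200\201normalization \344\270\216 concurrency.md"
    @@ -0,0 +1,312 @@
    +# 03 | Keeping Jobs Stable: strategy, normalization, and concurrency
    +
    +> Overview:
    +> This article looks at the configuration that makes Consilens jobs stable, from a production-operations perspective. It explains how `strategy` determines the comparison path, how `normalization` removes cross-database type and format differences, and how `concurrency` controls CPU and I/O concurrency budgets. It also offers a tuning and troubleshooting order grounded in real scenarios, to help you move from "it runs" to "it runs reliably".
    +>
    +> GitHub:
    +> https://github.com/datavane/consilens
    +> Follow, Star, and Fork the project; contributions are welcome.
    +
    +Writing `comparison` clearly solves "comparing accurately".
    +
    +In production, accuracy alone is not enough. Large tables must be processable, cross-database comparisons should produce as few false positives as possible, and when a job is slow you need to know where to tune. That is where three configuration blocks come in: `strategy`, `normalization`, and `concurrency`.
    +
    +They answer three questions:
    +
    +- Which execution strategy should be used to compare?
    +- How are type differences between databases unified?
    +- How should large jobs make good use of concurrency?
    +
    +## strategy: pick the right path first
    +
    +The main strategies Consilens currently exposes are `checksum` and `join`.
    +
    +Many people ask: which one is faster?
    +
    +I suggest asking a different question first: which one fits your data environment better?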
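The sections below answer that question with concrete preconditions for `join`:
- both sides are in one execution domain;
- neither side is an SQL resource;
- the connector is known to support server-side joins;
- the join would not overload the database.

Anything else falls back to `checksum`. As a rough aid, those rules can be written down as a decision function. This is a hypothetical Python sketch. `Environment` and `choose_strategy` are illustrative names, not Consilens APIs, and each input is a judgment you make about your own setup.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    """Hypothetical description of a job's environment (not a Consilens type)."""

    same_domain: bool                      # both sides in one execution domain (same DB/instance)
    uses_sql_resource: bool                # either side uses resource.type: sql
    connector_supports_server_join: bool   # known to support server-side joins
    join_load_acceptable: bool             # a DB-side join won't noticeably stress the business DB


def choose_strategy(env: Environment) -> str:
    """Return 'join' only when every precondition holds; otherwise 'checksum'."""
    if (
        env.same_domain
        and not env.uses_sql_resource
        and env.connector_supports_server_join
        and env.join_load_acceptable
    ):
        return "join"
    return "checksum"


cross_db = Environment(
    same_domain=False,
    uses_sql_resource=False,
    connector_supports_server_join=True,
    join_load_acceptable=True,
)
print(choose_strategy(cross_db))  # checksum
```

The function defaults to `checksum` whenever any condition is unknown or false, which matches the advice to use `join` only when you can show it fits.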
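    +
    +## checksum: the default for cross-database and large tables
    +
    +Most of the time, you should start with `checksum`.
    +
    +```yaml
    +strategy:
    +  mode: checksum
    +  algorithm: xor
    +  bisectionFactor: 4
    +  bisectionThreshold: 20000
    +  batchSize: 2000
    +  enableProfiling: false
    +  localCompare:
    +    mode: full
    +```
    +
    +The idea behind `checksum` works in three steps:
    +
    +1. Use a checksum to quickly decide whether a data segment is consistent.
    +2. If it is not, keep splitting it.
    +3. Finally, run a finer-grained comparison only on the small segments that have problems.
    +
    +```mermaid
    +flowchart LR
    +    A["Large data range"] --> B["Per-segment checksum"]
    +    B -->|Match| C["Skip"]
    +    B -->|Mismatch| D["Split further"]
    +    D --> E["Local comparison of small segments"]
    +```
    +
    +This suits cross-database scenarios well. The worst thing you can do across databases is pull both full datasets back and compare them row by row: it is expensive, the pipeline is long, and the risk is high.
    +
    +A first pass at the main parameters:
    +
    +| Parameter | What it controls | Suggested starting value |
    +| --- | --- | --- |
    +| `algorithm` | Checksum algorithm | `xor` for most scenarios |
    +| `bisectionFactor` | How many parts a mismatching segment is split into each time | Commonly `4` |
    +| `bisectionThreshold` | How small a segment must be before local comparison | Start from `20000` |
    +| `batchSize` | Rows read per batch | Start from `1000` or `2000` |
    +| `enableProfiling` | Whether to enable profiling logs | Enable only when troubleshooting |
    +| `localCompare.mode` | How the final segments are compared locally | The default `full` is safer |
    +
    +If `bisectionThreshold` is not set, it defaults to `batchSize * 10`. The reasoning is simple: the larger the batch, the larger the final small-segment threshold can be.
    +
    +## localCompare: thorough or lightweight final-segment comparison
    +
    +`checksum` narrows the range first, but eventually it has to handle the small segments that really do contain differences.
    +
    +`localCompare.mode` currently supports:
    +
    +- `full`
    +- `row-hash`
    +
    +```yaml
    +strategy:
    +  mode: checksum
    +  algorithm: xor
    +  localCompare:
    +    mode: row-hash
    +```
    +
    +If you are just going to production, I recommend starting with the default `full`. It is more intuitive and more reliable.
    +
    +Consider `row-hash` only after two conditions hold. First, you have a clear picture of job size, column count, and difference distribution. Second, the cost of final-segment comparison is noticeably high.
    +
    +Don't start tuning aggressively. The worst outcome for a production job is not knowing why it got faster, or what risk that speed introduced.
    +
    +## join: not the default, but a fast lane
    +
    +`join` has one clear precondition: both sides must be in the same execution domain, so that the database can perform the join itself.
    +
    +```yaml
    +strategy:
    +  mode: join
    +  algorithm: concat
    +```
    +
    +Take two tables in the same database, the same instance, or the same kind of database. The database is very good at joins, so `join` is very direct here.
    +
    +But in the following cases, do not reach for `join` first:
    +
    +- The two sides are in different databases or domains;
    +- One of the sides is an SQL resource;
    +- You are not sure whether the connector supports server-side joins;
    +- A database-side join would put noticeable pressure on the business database.
    +
    +So my rule is:
    +
    +> Use join only when you can clearly show it fits; otherwise checksum is the safer starting point.
    +
    +## normalization: the noise-reduction layer for cross-database reconciliation
    +
    +The most annoying problem in cross-database reconciliation is often not wrong data. It is that the two sides express the same data differently.
    +
    +Common examples:
    +
    +- Booleans: MySQL uses `tinyint(1)`, PostgreSQL uses `boolean`.
    +- Time precision: one database stores milliseconds, another only seconds.
    +- Amounts: one side keeps two decimal places, the other four.
    +- Empty values: empty strings and NULL may mean different things in different systems.
    +
    +Left unhandled, these produce large numbers of false positives.
    +
    +`normalization` exists for exactly this.
    +
    +```yaml
    +normalization:
    +  global:
    +    decimal:
    +      precision: 2
    +      rounding: true
    +    timestamp:
    +      format: "yyyy-MM-dd HH:mm:ss"
    +      timezone: "UTC"
    +
    +  source:
    +    boolean:
    +      trueValue: "1"
    +      falseValue: "0"
    +      nullValue: ""
    +
    +  target:
    +    boolean:
    +      trueValue: "true"
    +      falseValue: "false"
    +      nullValue: ""
    +```
    +
    +How to read it:
    +
    +- `global`: applies to both sides;
    +- `source`: applies to the source side only;
    +- `target`: applies to the target side only.
    +
    +Side-specific settings take precedence over global ones.
    +
    +## Numbers: unify precision before judging consistency
    +
    +Amounts, tax, exchange rates, and discounts are the fields most prone to precision noise.
    +
    +```yaml
    +normalization:
    +  global:
    +    decimal:
    +      precision: 4
    +      rounding: true
    +```
    +
    +Base the precision on the business. Financial reconciliation may need precision to the cent or finer; an operations report may only need two decimal places. Don't let a default precision make business decisions for you.
    +
    +## Time: the easiest to overlook, and the easiest source of false positives
    +
    +Problems with time fields usually come from three places: time zone, format, and precision.
    +
    +```yaml
    +normalization:
    +  global:
    +    timestamp:
    +      format: "yyyy-MM-dd HH:mm:ss"
    +      timezone: "UTC"
    +      comparisonMode: "TRUNCATE_TO_SECOND"
    +```
    +
    +`comparisonMode` currently supports:
    +
    +- `EXACT`
    +- `DATE_ONLY`
    +- `TRUNCATE_TO_SECOND`
    +- `TRUNCATE_TO_DAY`
    +
    +For order status, payment flows, or inventory changes, be careful with time precision. If the business definition is daily, `DATE_ONLY` or `TRUNCATE_TO_DAY` may match its real semantics better.
    +
    +## Booleans: especially common across MySQL and PostgreSQL
    +
    +```yaml
    +normalization:
    +  source:
    +    boolean:
    +      trueValue: "1"
    +      falseValue: "0"
    +      nullValue: ""
    +  target:
    +    boolean:
    +      trueValue: "true"
    +      falseValue: "false"
    +      nullValue: ""
    +```
    +
    +This kind of config looks minor but is very practical. Many "the business data is clearly the same but it keeps reporting mismatch" problems turn out to be about booleans, null values, and time precision.
    +
    +## Binary and strings: unify how values are represented
    +
    +Binary fields can specify an encoding:
    +
    +```yaml
    +normalization:
    +  global:
    +    binary:
    +      encoding: "hex"
    +      uppercase: true
    +```
    +
    +Binary encoding currently supports `hex` and `base64`.
    +
    +For strings, the most common normalization today is unifying null semantics:
    +
    +```yaml
    +normalization:
    +  global:
    +    string:
    +      nullValue: ""
    +```
    +
    +Don't underestimate NULL versus empty string. In cross-system synchronization they are a major source of false positives.
    +
    +## concurrency: tuning is not maxing out thread counts
    +
    +`concurrency` is worth configuring carefully only once a job involves large tables, cross-database comparison, and long-term operation.
    +
    +```yaml
    +concurrency:
    +  io:
    +    core: 8
    +    max: 32
    +    queueSize: 10000
    +    keepAliveSeconds: 60
    +    threadNamePrefix: consilens-io-
    +  cpu:
    +    core: 4
    +    max: 8
    +    queueSize: 10000
    +    keepAliveSeconds: 60
    +    threadNamePrefix: consilens-cpu-
    +```
    +
    +Think of it as two thread pools:
    +
    +- `io`: database reads, network waits, and other I/O-bound work;
    +- `cpu`: hashing, comparison, computation, and other CPU-bound work.
    +
    +Don't crank up concurrency right away when tuning. A more reliable approach is:
    +
    +1. Get the job running with conservative settings first;
    +2. Watch database load, network throughput, and JVM metrics;
    +3. If database waits are significant, gradually tune `io`;
    +4. If computation is the bottleneck, then consider tuning `cpu`;
    +5. Change only one category of parameters at a time, so you know which one made the difference.
    +
    +More concurrency is not always better. Reconciliation jobs usually connect to production, analytical, and audit databases. With too many threads, the first thing to fail may not be Consilens but the database you just saturated.
    +
    +## readOptions: control read details when needed
    +
    +For large tables and slow links, you can add `readOptions` to a data source.
    +
    +```yaml
    +source:
    +  type: mysql
    +  connection:
    +    url: jdbc:mysql://localhost:3306/ods
    +    username: root
    +    password: 123456
    +  resource:
    +    type: table
    +    name: ods_order
    +  readOptions:
    +    fetchSize: 2000
    +```
    +
    +On the current JDBC path, `fetchSize` is the most commonly used parameter. It does not change the comparison logic, but it affects how rows are fetched in batches during reads.
    +
    +## A practical troubleshooting order
    +
    +When a job does not run reliably, check in this order:
    +
    +```mermaid
    +flowchart TD
    +    A["Job is unstable"] --> B{"Identify the problem type first"}
    +    B -->|Many false positives| C["Check comparison / normalization first"]
    +    B -->|Slow, many differing segments| D["Check strategy segmentation and batch parameters"]
    +    B -->|High database load| E["First narrow filters / batchSize / read scope"]
    +    B -->|Significant network waits| F["Gradually tune concurrency.io"]
    +    B -->|CPU-bound computation| G["Gradually tune concurrency.cpu"]
    +    B -->|Bottleneck unclear| H["Temporarily turn on enableProfiling"]
    +```
    +
    +1. Are the differences false positives? Check `comparison` and `normalization` first.
    +2. Many differing segments and a slow job? Look at the segmentation and batch parameters in `strategy`.
    +3. High database load? Don't add concurrency yet; look at filters, batchSize, and the read scope first.
    +4. Significant network waits? Gradually tune `concurrency.io`.
    +5. Hashing and comparison computation is the bottleneck? Then look at `concurrency.cpu`.
    +6. Don't know where the time goes? Temporarily turn on `enableProfiling` and tune based on evidence.
    +
    +What matters most in production tuning is not the parameter table but the order: identify the problem type first, then turn the matching knob.
    diff --git "a/website/\351\205\215\347\275\256\350\256\262\350\247\243/04-\347\273\223\346\236\234\344\270\215\346\230\257\347\273\210\347\202\271\357\274\232result \344\270\216\345\256\241\350\256\241\351\227\255\347\216\257.md" "b/website/\351\205\215\347\275\256\350\256\262\350\247\243/04-\347\273\223\346\236\234\344\270\215\346\230\257\347\273\210\347\202\271\357\274\232result \344\270\216\345\256\241\350\256\241\351\227\255\347\216\257.md"
    new file mode 100644
    index 0000000..528b93b
    --- /dev/null
    +++ "b/website/\351\205\215\347\275\256\350\256\262\350\247\243/04-\347\273\223\346\236\234\344\270\215\346\230\257\347\273\210\347\202\271\357\274\232result \344\270\216\345\256\241\350\256\241\351\227\255\347\216\257.md"
    @@ -0,0 +1,422 @@
    +# 04 | Results Are Not the End: result and the Audit Loop
    +
    +> Overview:
    +> This article covers Consilens's `result` configuration. It explains why the outcome of a data-validation job should not stop at a console summary. Instead, it should be persisted to JSON, result tables, and diff-detail tables, and then wired into audit loops such as alerts, tickets, and governance dashboards. The article focuses on the scope of each sink, the default output structures, and the path from trial runs to persisting results in production.
    +>
    +> GitHub:
    +> https://github.com/datavane/consilens
    +> Follow, Star, and Fork the project; contributions are welcome.
    +
    +When people run a data reconciliation for the first time, what they care about most is the last line in the console: are there any differences?
    +
    +That matters, of course. But if you want to use Consilens in production, knowing only that "there are differences" is far from enough.
    +
    +You also need answers to several questions:
    +
    +- What type of difference is it? Is the record missing on the source or on the target?
    +- Which fields disagree?
    +- Which table, tenant, or batch does this run belong to?
    +- Can the results be stored in a database?
    +- Can they feed alerts, tickets, or a governance dashboard?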
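The first few questions come down to classification. Once records are aligned by key, every difference falls into one of the three kinds this article later lists as `${operation}` values: `source_missing`, `target_missing`, and `mismatch`.

The following Python sketch is a simplified, in-memory illustration of that classification. It is not how Consilens computes differences; its checksum strategy narrows ranges first. `classify_diffs` is a hypothetical name. The sketch assumes that `source_missing` means a key exists only on the target side.

```python
from typing import Any

Row = dict[str, Any]


def classify_diffs(
    source: dict[tuple, Row],
    target: dict[tuple, Row],
    fields: list[str],
) -> list[dict[str, Any]]:
    """Classify differences between two datasets keyed by business-key tuples.

    Simplified sketch: both sides are held in memory as {key_tuple: row}.
    """
    records: list[dict[str, Any]] = []
    for key in sorted(source.keys() | target.keys()):
        if key not in source:
            # Assumption: the key exists only on the target, so the source is missing it.
            records.append({"operation": "source_missing", "primaryKey": key})
        elif key not in target:
            records.append({"operation": "target_missing", "primaryKey": key})
        else:
            # Same business record on both sides: compare only the consistency fields.
            changed = [f for f in fields if source[key].get(f) != target[key].get(f)]
            if changed:
                records.append(
                    {"operation": "mismatch", "primaryKey": key, "changedColumns": changed}
                )
    return records


src = {(1,): {"amount": 10, "status": "PAID"}, (2,): {"amount": 5, "status": "NEW"}}
tgt = {(1,): {"amount": 10, "status": "NEW"}, (3,): {"amount": 7, "status": "PAID"}}
for record in classify_diffs(src, tgt, ["amount", "status"]):
    print(record)
```

Key `(1,)` becomes a `mismatch` on `status`, `(2,)` is `target_missing`, and `(3,)` is `source_missing`. Records with no field differences produce no output, which is why a sink only receives actual differences.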
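    +
    +This is what `result` is for.
    +
    +It is not just an output setting. It is the entry point that connects a reconciliation job to the data-governance loop.
    +
    +## Start with a combined output
    +
    +For trial runs, a console summary is enough:
    +
    +```yaml
    +result:
    +  sinks:
    +    - format: console
    +      type: result
    +```
    +
    +In production, though, it is more common to output a summary, details, and an audit table at the same time:
    +
    +```yaml
    +result:
    +  failOnSinkError: true
    +  sinks:
    +    - format: console
    +      type: result
    +
    +    - format: json
    +      type: diff-record
    +      properties:
    +        path: ./output/diff-${taskId}.json
    +        pretty: true
    +
    +    - format: table
    +      type: diff-record
    +      properties:
    +        type: postgresql
    +        url: jdbc:postgresql://localhost:5432/audit
    +        username: postgres
    +        password: 123456
    +        tableName: diff_result_detail
    +        createTable: true
    +        batchSize: 1000
    +```
    +
    +The idea behind this config is clear:
    +
    +- The console shows the job summary, which helps with on-call work and debugging;
    +- JSON keeps a copy of the details for ad-hoc investigation and downstream consumption;
    +- The table sink persists differences to an audit database for downstream governance systems.
    +
    +```mermaid
    +flowchart LR
    +    A["One diff job"] --> B["console result<br/>on-call summary"]
    +    A --> C["json diff-record<br/>investigation trail"]
    +    A --> D["table diff-record<br/>audit persistence"]
    +    D --> E["Alerts / tickets / governance dashboard"]
    +```
    +
    +To disable a sink temporarily without deleting its configuration, add:
    +
    +```yaml
    +enabled: false
    +```
    +
    +## format and type: one says where, the other says what
    +
    +The easiest things to confuse in `result.sinks` are `format` and `type`.
    +
    +`format` is the output medium; `type` is the level of result content.
    +
    +| format | type | Typical use |
    +| --- | --- | --- |
    +| `console` | `result` | Console summary |
    +| `console` | `diff-record` | View some diff details in the console |
    +| `json` | `result` | Keep a record of the result summary |
    +| `json` | `diff-record` | Diff-detail file |
    +| `csv` | `result` | Export a human-readable summary |
    +| `csv` | `diff-record` | Export details for manual analysis or import into other systems |
    +| `table` | `result` | Persist the job summary to a database |
    +| `table` | `diff-record` | Persist diff details for auditing |
    +
    +One line to remember:
    +
    +> format decides "where to write"; type decides "which level of results to write".
    +
    +## failOnSinkError: does a failed output fail the job?
    +
    +By default, `failOnSinkError` is `true`: if any sink fails to write, the job fails.
    +
    +```yaml
    +result:
    +  failOnSinkError: false
    +```
    +
    +Only setting it to `false` gives you the "log a warning but let the main comparison continue" mode. This suits development and debugging, or cases where some auxiliary outputs are not critical.
    +
    +But suppose you have already wired the result table into an alerting platform, audit reports, or governance tickets. Then I recommend keeping strict mode on and stating it explicitly:
    +
    +```yaml
    +result:
    +  failOnSinkError: true
    +```
    +
    +That way, a problem with any critical sink fails the job early. In production, silently losing results is more dangerous than a failed job.
    +
    +## console: best for trial runs and on-call
    +
    +Summary output:
    +
    +```yaml
    +result:
    +  sinks:
    +    - format: console
    +      type: result
    +      properties:
    +        showStatistics: true
    +```
    +
    +Detail output:
    +
    +```yaml
    +result:
    +  sinks:
    +    - format: console
    +      type: diff-record
    +      properties:
    +        maxRows: 50
    +```
    +
    +`maxRows` is handy for console debugging: you can see what the differences look like without flooding the terminal.
    +
    +Console output is for a quick look; don't treat it as a production record.
    +
    +## json: good for records and downstream consumption
    +
    +Result summary:
    +
    +```yaml
    +result:
    +  sinks:
    +    - format: json
    +      type: result
    +      properties:
    +        path: ./output/result-${taskId}.json
    +        pretty: true
    +```
    +
    +Diff details:
    +
    +```yaml
    +result:
    +  sinks:
    +    - format: json
    +      type: diff-record
    +      properties:
    +        path: ./output/diff-${taskId}.json
    +        pretty: true
    +```
    +
    +To add business information on top of the default diff fields, use `mergeDefaults`:
    +
    +```yaml
    +result:
    +  sinks:
    +    - format: json
    +      type: diff-record
    +      properties:
    +        path: ./output/diff-${taskId}.json
    +        pretty: true
    +        mergeDefaults: true
    +        columns:
    +          - name: biz_date
    +            value: ${src.biz_date}
    +          - name: tenant_id
    +            value: ${src.tenant_id}
    +```
    +
    +The default fields of `diff-record` are:
    +
    +- `operation`
    +- `primaryKey`
    +- `sourceValues`
    +- `targetValues`
    +- `changedColumns1`
    +- `changedColumns2`
    +
    +For `type: result`, think of it this way: without `columns`, the default statistics are written; with `columns`, output switches to fully custom mode.
    +
    +## csv: readable by people, easy to hand to other systems
    +
    +Diff-detail CSV:
    +
    +```yaml
    +result:
    +  sinks:
    +    - format: csv
    +      type: diff-record
    +      properties:
    +        path: ./output/diff-${taskId}.csv
    +        delimiter: ","
    +        includeHeader: true
    +```
    +
    +Result-summary CSV:
    +
    +```yaml
    +result:
    +  sinks:
    +    - format: csv
    +      type: result
    +      properties:
    +        path: ./output/result-${taskId}.csv
    +        includeHeader: true
    +```
    +
    +The advantage of CSV is its low barrier. Operations, QA, and data analysts can all open it, and other systems can easily import it.
    +
    +For long-running production jobs, CSV works better as an auxiliary output; the primary path should still be a table.
    +
    +## table: the workhorse of production auditing
    +
    +The table sink is the output most worth designing carefully.
    +
    +```yaml
    +result:
    +  sinks:
    +    - format: table
    +      type: diff-record
    +      properties:
    +        type: postgresql
    +        url: jdbc:postgresql://localhost:5432/audit
    +        username: postgres
    +        password: 123456
    +        tableName: diff_result_detail
    +        createTable: true
    +        dropIfExists: false
    +        batchSize: 1000
    +```
    +
    +Two rules to remember:
    +
    +1. `properties.type` is required;
    +2. The table sink currently supports `mysql` and `postgresql` as write targets.
    +
    +Common parameters:
    +
    +| Parameter | Purpose |
    +| --- | --- |
    +| `type` | Target database type |
    +| `url` / `username` / `password` | Write connection |
    +| `driver` | Optional; overrides the default driver |
    +| `maxPoolSize` | Connection pool size |
    +| `tableName` | Table name |
    +| `prefix` | Table-name prefix when `tableName` is not set |
    +| `suffixTimestamp` | Whether to append a timestamp suffix automatically |
    +| `createTable` | Whether to create the table automatically |
    +| `dropIfExists` | Whether to drop the table before creating it |
    +| `defaultColumnLength` | Default length of string columns |
    +| `batchSize` | Batch write size |
    +
    +During development and PoC you can let Consilens create tables automatically. In production, I recommend managing the table schema through your database change-management process. That way the result table becomes part of the governance system rather than a throwaway by-product.
    +
    +## Default table structure: useful for getting started quickly
    +
    +Without custom columns, the default summary table for `type: result` contains these core columns:
    +
    +- `nl_dq_execution_id`
    +- `src_table`
    +- `tgt_table`
    +- `diff_count`
    +- `src_missing`
    +- `tgt_missing`
    +- `mismatch_count`
    +- `run_status`
    +- `completed_at`
    +
    +The default wide table for `type: diff-record` contains:
    +
    +- `nl_dq_execution_id`
    +- `nl_dq_diff_type`
    +- `nl_dq_diff_columns1`
    +- `nl_dq_diff_columns2`
    +- Source-side columns with a `*_1` suffix
    +- Target-side columns with a `*_2` suffix
    +
    +This default structure is great for getting started quickly. Especially when you first adopt Consilens, don't rush to over-design the result model. Persist differences with the default wide table first, and customize once your governance process is stable.
    +
    +## columns: shape results into your audit model
    +
    +When integrating with an existing data-quality platform, audit tables, or alerting system, you usually need custom output columns.
    +
    +```yaml
    +result:
    +  sinks:
    +    - format: table
    +      type: diff-record
    +      properties:
    +        type: postgresql
    +        url: jdbc:postgresql://localhost:5432/audit
    +        username: postgres
    +        password: 123456
    +        tableName: diff_result_detail
    +        createTable: true
    +        batchSize: 1000
    +        columns:
    +          - name: task_id
    +            value: ${taskId}
    +            columnType: VARCHAR(64)
    +
    +          - name: diff_type
    +            value: ${operation}
    +            columnType: VARCHAR(32)
    +
    +          - name: pk_value
    +            value: ${primaryKey}
    +            columnType: TEXT
    +
    +          - name: src_amount
    +            value: ${src.amount}
    +            defaultValue: "0"
    +            columnType: NUMERIC(18,2)
    +
    +          - name: tgt_amount
    +            value: ${tgt.amount}
    +            defaultValue: "0"
    +            columnType: NUMERIC(18,2)
    +
    +          - name: changed_columns
    +            value: ${changedColumns}
    +            columnType: JSONB
    +
    +          - name: written_at
    +            value: ${timestamp}
    +            columnType: TIMESTAMP
    +```
    +
    +Each column has its own role:
    +
    +- `name`: output column name;
    +- `value`: value template;
    +- `defaultValue`: fallback when the value is empty;
    +- `columnType`: used by the table sink for table creation and type conversion.
    +
    +`columnType` is not just there to make the generated table look tidy. It also affects type conversion on insert. For amounts, timestamps, and JSON, declare it explicitly.
    +
    +## Common placeholders
    +
    +Commonly used for `diff-record`:
    +
    +| Placeholder | Meaning |
    +| --- | --- |
    +| `${taskId}` | Current task ID |
    +| `${operation}` | Difference type, e.g. `mismatch`, `source_missing`, `target_missing` |
    +| `${primaryKey}` | Key value as a string |
    +| `${changedColumns}` | Changed columns as a JSON array |
    +| `${changedColumns1}` | Source-side changed columns as a JSON array |
    +| `${changedColumns2}` | Target-side changed columns as a JSON array |
    +| `${src.col}` | Value of a source column |
    +| `${tgt.col}` | Value of a target column |
    +| `${sourceTable}` | Source dataset name |
    +| `${targetTable}` | Target dataset name |
    +| `${strategy}` | Current strategy |
    +| `${algorithm}` | Current algorithm |
    +| `${timestamp}` | Current time |
    +
    +Commonly used for `result`:
    +
    +| Placeholder | Meaning |
    +| --- | --- |
    +| `${status}` | Overall result, `EQUAL` or `DIFF` |
    +| `${totalDifferences}` | Total number of differences |
    +| `${sourceMissingCount}` | Number of records missing on the source side |
    +| `${targetMissingCount}` | Number of records missing on the target side |
    +| `${mismatchCount}` | Number of mismatches |
    +| `${sourceRowCount}` | Source row count |
    +| `${targetRowCount}` | Target row count |
    +| `${statistics_json}` | Statistics summary as a JSON string |
    +
    +You can also write a constant:
    +
    +```yaml
    +value: production
    +```
    +
    +Or mix constants with placeholders:
    +
    +```yaml
    +value: task_${taskId}_${operation}
    +```
    +
    +## A recommended production result pipeline
    +
    +If you are designing Consilens's result output from scratch, follow this order:
    +
    +```mermaid
    +flowchart TD
    +    A["Development<br/>console result + console diff-record"] --> B["Integration<br/>add json diff-record"]
    +    B --> C["PoC<br/>table diff-record, default wide table"]
    +    C --> D["Production<br/>columns aligned with the audit model"]
    +    D --> E["Governance<br/>alerts / tickets / dashboards / replay"]
    +```
    +
    +1. Development: `console result` + `console diff-record`, to see results quickly;
    +2. Integration: add `json diff-record`, to keep samples;
    +3. PoC: use `table diff-record` with the default wide table, to persist details first;
    +4. Production: design an audit table and use `columns` to output your company-wide model;
    +5. Governance: connect the result table to alerts, tickets, dashboards, and replay pipelines.
    +
    +At this point, Consilens is no longer just a reconciliation tool but a stable node in your data-quality system.
    diff --git "a/website/\351\205\215\347\275\256\350\256\262\350\247\243/05-\344\270\203\344\270\252\345\270\270\350\247\201\345\234\272\346\231\257\357\274\214\347\233\264\346\216\245\346\213\277\345\216\273\346\224\271.md" "b/website/\351\205\215\347\275\256\350\256\262\350\247\243/05-\344\270\203\344\270\252\345\270\270\350\247\201\345\234\272\346\231\257\357\274\214\347\233\264\346\216\245\346\213\277\345\216\273\346\224\271.md"
    new file mode 100644
    index 0000000..04fcfc3
    --- /dev/null
    +++ "b/website/\351\205\215\347\275\256\350\256\262\350\247\243/05-\344\270\203\344\270\252\345\270\270\350\247\201\345\234\272\346\231\257\357\274\214\347\233\264\346\216\245\346\213\277\345\216\273\346\224\271.md"
    @@ -0,0 +1,399 @@
    +# 05 | Seven Common Scenarios, Ready to Adapt
    +
    +> Overview:
    +> This article turns the most common Consilens configuration scenarios from real projects into templates you can adapt directly. They include full checks of same-schema tables, ignoring audit columns, aligning mapped fields, comparing a business slice, rolling checks over time windows, and persisting results to a database. The point is not to copy templates mechanically. It is to help you quickly find the starting point closest to your business, then adapt the connection details, key definition, compared fields, and result pipeline.
    +>
    +> GitHub:
    +> https://github.com/datavane/consilens
    +> Follow, Star, and Fork the project; contributions are welcome.
    +
    +The previous articles covered the reasoning. This one is more direct: common scenarios packaged as templates.
    +
    +Templates are not meant to be copied blindly; they give you a reliable starting point. To put one into practice, change at least three things: connection details, business keys, and compared fields.
    +
    +## Scenario 1: full check of two same-schema tables
    +
    +For migration acceptance, verifying a same-schema sync pipeline, or an initial survey.
    +
    +```yaml
    +source:
    +  type: mysql
    +  connection:
    +    url: jdbc:mysql://localhost:3306/ods
    +    username: root
    +    password: 123456
    +  resource:
    +    type: table
    +    name: user_info
    +
    +target:
    +  type: postgresql
    +  connection:
    +    url: jdbc:postgresql://localhost:5432/dwd
    +    username: postgres
    +    password: 123456
    +  resource:
    +    type: table
    +    name: user_info
    +
    +comparison:
    +  keys:
    +    source:
    +      - id
    +    target:
    +      - id
    +
    +strategy:
    +  mode: checksum
    +  algorithm: xor
    +
    +result:
    +  sinks:
    +    - format: console
    +      type: result
    +```
    +
    +`fields` is deliberately omitted here, so all non-key columns are compared. That is a good fit for a first survey.
    +
    +## Scenario 2: same-schema tables, ignoring audit columns
    +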
    +For sync pipelines that regenerate write times, batch numbers, or update times.
    +
    +```yaml
    +comparison:
    +  keys:
    +    source:
    +      - id
    +    target:
    +      - id
    +  exclude:
    +    source:
    +      - created_at
    +      - updated_at
    +      - sync_time
    +    target:
    +      - created_at
    +      - updated_at
    +      - sync_time
    +```
    +
    +The key here is not to exclude as much as possible. Exclude only the columns that genuinely should not decide consistency in business terms.
    +
    +## Scenario 3: different column names, same business meaning
    +
    +For ODS-to-DWD, business-database-to-analytics, or legacy-to-new-system migrations.
    +
    +```yaml
    +comparison:
    +  keys:
    +    source:
    +      - order_id
    +    target:
    +      - id
    +  mappings:
    +    - name: order_id
    +      source:
    +        column: order_id
    +      target:
    +        column: id
    +      key: true
    +
    +    - name: amount
    +      source:
    +        column: pay_amount
    +      target:
    +        column: total_amount
    +
    +    - name: status
    +      source:
    +        column: order_status
    +      target:
    +        column: status
    +```
    +
    +Here `mappings` projects both sides onto the same set of logical fields. The results show `order_id`, `amount`, and `status`, which are easier to understand than the raw column names from each side.
    +
    +## Scenario 4: shape the data first, then compare
    +
    +For cases where the two sides need filtering, projection, or expressions rather than simple renaming.
    +
    +```yaml
    +source:
    +  type: mysql
    +  connection:
    +    url: jdbc:mysql://localhost:3306/ods
    +    username: root
    +    password: 123456
    +  resource:
    +    type: sql
    +    path: |
    +      SELECT
    +        order_id,
    +        buyer_id,
    +        DATE(pay_time) AS biz_date,
    +        amount
    +      FROM ods_order
    +      WHERE pay_status = 'SUCCESS'
    +
    +target:
    +  type: postgresql
    +  connection:
    +    url: jdbc:postgresql://localhost:5432/dwd
    +    username: postgres
    +    password: 123456
    +  resource:
    +    type: sql
    +    path: |
    +      SELECT
    +        order_id,
    +        buyer_id,
    +        biz_date,
    +        paid_amount AS amount
    +      FROM dwd_paid_order
    +
    +comparison:
    +  keys:
    +    source:
    +      - order_id
    +    target:
    +      - order_id
    +  fields:
    +    source:
    +      - buyer_id
    +      - biz_date
    +      - amount
    +    target:
    +      - buyer_id
    +      - biz_date
    +      - amount
    +
    +strategy:
    +  mode: checksum
    +  algorithm: xor
    +```
    +
    +Once you choose SQL resources, try to settle the field definitions in the SQL. That is easier to troubleshoot than jumping between several configuration blocks.
    +
    +## Scenario 5: compare only one business slice
    +
    +For checks by day, by tenant, or by business status.
    +
    +```yaml
    +comparison:
    +  keys:
    +    source:
    +      - order_id
    +    target:
    +      - order_id
    +  fields:
    +    source:
    +      - buyer_id
    +      - amount
    +      - status
    +    target:
    +      - buyer_id
    +      - amount
    +      - status
    +  filters:
    +    source: "dt = '2026-05-05' AND tenant_id = 1001"
    +    target: "dt = '2026-05-05' AND tenant_id = 1001"
    +```
    +
    +Note: the filter conditions on both sides must describe the same business boundary. Column names may differ, but the business definition must not.
    +
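One way to keep both sides on the same boundary is to describe the slice once and render it for each side. Each logical name is translated to that side's column name. The Python sketch below is hypothetical and is not a Consilens feature. It produces the two `filters` strings from one slice definition, so the boundary cannot drift between sides. Values are quoted naively, so it is only suitable for trusted input such as values generated by your own scheduler.

```python
def build_filters(
    slice_spec: dict[str, object],
    source_columns: dict[str, str],
    target_columns: dict[str, str],
) -> dict[str, str]:
    """Render one logical slice into per-side SQL filter strings (hypothetical helper)."""

    def literal(value: object) -> str:
        # Numbers stay bare; everything else is single-quoted with embedded quotes doubled.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return str(value)
        return "'" + str(value).replace("'", "''") + "'"

    def render(columns: dict[str, str]) -> str:
        # A logical name without an explicit mapping is used as the column name as-is.
        return " AND ".join(
            f"{columns.get(name, name)} = {literal(value)}"
            for name, value in slice_spec.items()
        )

    return {"source": render(source_columns), "target": render(target_columns)}


# Same slice, but the target names its tenant column differently.
filters = build_filters(
    {"dt": "2026-05-05", "tenant_id": 1001},
    source_columns={},
    target_columns={"tenant_id": "tenant"},
)
print(filters["source"])  # dt = '2026-05-05' AND tenant_id = 1001
print(filters["target"])  # dt = '2026-05-05' AND tenant = 1001
```

The column names differ, but both strings come from the same slice definition, which is exactly the guarantee the note above asks for.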
    +## Scenario 6: rolling checks of recent changes, batch by batch
    +
    +For incremental sync pipelines, order-status checks, and continuous inspection where an external scheduler currently controls the time window.
    +
    +```yaml
    +comparison:
    +  keys:
    +    source:
    +      - order_id
    +    target:
    +      - order_id
    +  fields:
    +    source:
    +      - buyer_id
    +      - amount
    +      - order_status
    +      - updated_at
    +    target:
    +      - buyer_id
    +      - amount
    +      - order_status
    +      - updated_at
    +  filters:
    +    source: "updated_at >= '2026-05-10 09:00:00' AND updated_at < '2026-05-10 09:30:00'"
    +    target: "updated_at >= '2026-05-10 09:00:00' AND updated_at < '2026-05-10 09:30:00'"
    +```
    +
    +The external scheduler should rewrite this time window for each batch. Continuous checking is implemented as "scheduler + filters".
    +
    +## Scenario 7: persist diff details to an audit table
    +
    +For production environments connected to alerts, governance platforms, tickets, and audit reports.
    +
    +```yaml
    +result:
    +  failOnSinkError: true
    +  sinks:
    +    - format: console
    +      type: result
    +
    +    - format: table
    +      type: diff-record
    +      properties:
    +        type: postgresql
    +        url: jdbc:postgresql://localhost:5432/audit
    +        username: postgres
    +        password: 123456
    +        tableName: diff_result_detail
    +        createTable: true
    +        batchSize: 1000
    +```
    +
    +For a PoC, you can start with the default table structure. In production, customize the output with `columns` to match your company's audit model.
    +
    +## A complete production template
    +
    +Below is an order-validation config that is closer to production. It includes field alignment, type normalization, and result persistence.
    +
    +```yaml
    +source:
    +  type: mysql
    +  name: order-source
    +  connection:
    +    url: jdbc:mysql://localhost:3306/ods?useSSL=false&serverTimezone=UTC
    +    username: root
    +    password: 123456
    +  resource:
    +    type: table
    +    name: ods_order
    +  readOptions:
    +    fetchSize: 2000
    +
    +target:
    +  type: postgresql
    +  name: order-target
    +  connection:
    +    url: jdbc:postgresql://localhost:5432/dwd?currentSchema=public
    +    username: postgres
    +    password: 123456
    +  resource:
    +    type: table
    +    name: dwd_order
    +
    +comparison:
    +  keys:
    +    source:
    +      - order_id
    +    target:
    +      - id
    +  mappings:
    +    - name: order_id
    +      source:
    +        column: order_id
    +      target:
    +        column: id
    +      key: true
    +    - name: buyer_id
    +      source:
    +        column: buyer_id
    +      target:
    +        column: buyer_id
    +    - name: amount
    +      source:
    +        column: amount
    +      target:
    +        column: total_amount
    +    - name: order_status
    +      source:
    +        column: status
    +      target:
    +        column: order_status
    +    - name: updated_at
    +      source:
    +        column: updated_at
    +      target:
    +        column: updated_at
    +  filters:
    +    source: "tenant_id = 1001"
    +    target: "tenant_id = 1001"
    +
    +normalization:
    +  global:
    +    decimal:
    +      precision: 2
    +      rounding: true
    +    timestamp:
    +      format: "yyyy-MM-dd HH:mm:ss"
    +      timezone: "UTC"
    +      comparisonMode: "TRUNCATE_TO_SECOND"
    +
    +strategy:
    +  mode: checksum
    +  algorithm: xor
    +  bisectionFactor: 4
    +  bisectionThreshold: 20000
    +  batchSize: 2000
    +  localCompare:
    +    mode: full
    +
    +result:
    +  failOnSinkError: true
    +  sinks:
    +    - format: console
    +      type: result
    +    - format: table
    +      type: diff-record
    +      properties:
    +        type: postgresql
    +        url: jdbc:postgresql://localhost:5432/audit
    +        username: postgres
    +        password: 123456
    +        tableName: diff_result_detail
    +        createTable: true
    +        batchSize: 1000
    +        columns:
    +          - name: task_id
    +            value: ${taskId}
    +            columnType: VARCHAR(64)
    +          - name: diff_type
    +            value: ${operation}
    +            columnType: VARCHAR(32)
    +          - name: pk_value
    +            value: ${primaryKey}
    +            columnType: TEXT
    +          - name: changed_columns
    +            value: ${changedColumns}
    +            columnType: JSONB
    +          - name: written_at
    +            value: ${timestamp}
    +            columnType: TIMESTAMP
    +```
    +
    +If you also need rolling-window inspection, have your scheduler rewrite `comparison.filters` for each batch.
    +
    +This template is not necessarily ready for production as-is. It does show the structure a mature job should have: clear datasets, clear comparison rules, a clear strategy, and a clear result destination.
    +
    +## Before using a template, change these five things
    +
    +For any template, check these first:
    +
    +```mermaid
    +flowchart TD
    +    A["Get a template"] --> B["Update source / target connections and resources"]
    +    B --> C["Confirm comparison.keys is the business key"]
    +    C --> D["Confirm fields / mappings reflect the real consistency definition"]
    +    D --> E["Confirm filters describe the same boundary on both sides"]
    +    E --> F["Confirm result fits your troubleshooting / audit pipeline"]
    +```
    +
    +1. Are the `source` / `target` connections and resources correct?
    +2. Is `comparison.keys` really the business key?
    +3. Do `fields` or `mappings` express the real consistency definition?
    +4. Do `filters` describe the same boundary on both sides?
    +5. Does `result` fit your troubleshooting, audit, and governance pipelines?
    +
    +A template's value is not that it thinks for you, but that it saves you detours.
    diff --git "a/website/\351\205\215\347\275\256\350\256\262\350\247\243/06-\351\205\215\347\275\256\351\200\237\346\237\245.md" "b/website/\351\205\215\347\275\256\350\256\262\350\247\243/06-\351\205\215\347\275\256\351\200\237\346\237\245.md"
    new file mode 100644
    index 0000000..2fac2b7
    --- /dev/null
    +++ "b/website/\351\205\215\347\275\256\350\256\262\350\247\243/06-\351\205\215\347\275\256\351\200\237\346\237\245.md"
    @@ -0,0 +1,179 @@
    +# Consilens Configuration Cheat Sheet
    +
    +This cheat sheet is meant for use after you understand the overall configuration approach. If you are learning for the first time, read the article series first rather than starting from the field tables.
    +
    +## The main line of a config
    +
    +```text
    +source / target  →  comparison           →  strategy       →  result
    +data source         what counts as same     how it runs       where results go
    +```
    +
    +```mermaid
    +flowchart LR
    +    A["source / target"] --> B["comparison"]
    +    B --> C["strategy"]
    +    C --> D["result"]
    +```
    +
    +Extensions:
    +
    +- `normalization`: cross-database type normalization;
    +- `concurrency`: concurrency tuning;
    +- `readOptions`: read-parameter control.
    +
    +## source / target
    +
    +| Field | Description |
    +| --- | --- |
    +| `type` | Data source type, e.g. `mysql`, `postgresql`, `oracle`, `doris`, `starrocks` |
    +| `name` | Data source name within the job; recommended for long-running jobs |
    +| `connection.url` | JDBC URL |
    +| `connection.username` | Username |
    +| `connection.password` | Password |
    +| `resource.type` | `table` or `sql` |
    +| `resource.name` | Table name, used when `type: table` |
    +| `resource.path` | SQL text, used when `type: sql` |
    +| `readOptions.fetchSize` | JDBC fetch size, useful for tuning large tables |
    +
    +Tip: table resources suit direct comparison of physical tables; SQL resources suit shaping the business definition before comparing.
    +
    +## comparison
    +
    +| Field | Description |
    +| --- | --- |
    +| `keys.source` / `keys.target` | Required. Business keys on each side; the counts must match one-to-one |
    +| `fields.source` / `fields.target` | Fields to compare; if omitted, all non-key columns are compared |
    +| `exclude.source` / `exclude.target` | Noisy columns excluded from comparison |
    +| `filters.source` / `filters.target` | Filter condition for each side; set both together |
    +| `mappings` | Use when column names or expressions differ but the business meaning is the same |
    +| `extraColumns` | Not compared, but carried into diff results as context; best used together with `fields` |
    +
    +Tip: choose either `fields` or `mappings`. If an SQL resource has already shaped the fields, don't overuse `mappings`.
    +
    +## mappings
    +
    +| Field | Description |
    +| --- | --- |
    +| `name` | Logical field name |
    +| `source.column` / `target.column` | Read a column directly |
    +| `source.expression` / `target.expression` | Expression projection |
    +| `source.literal` / 
`target.literal` | 常量 | +| `key: true` | 映射后的逻辑主键字段 | +| `compare: false` | 不进入比较字段集合;当前不要把它当成“自动随差异结果透出上下文”的能力 | + +注意:即使用了 `mappings`,`comparison.keys` 仍然必须配置。 + +## strategy + +| 字段 | 说明 | 建议 | +| --- | --- | --- | +| `mode` | `checksum` 或 `join` | 跨库默认 `checksum` | +| `algorithm` | 校验算法 | 常用 `xor` | +| `bisectionFactor` | 差异段拆分因子 | 常用 `4` | +| `bisectionThreshold` | 小段阈值 | 可从 `20000` 起步 | +| `batchSize` | 单批读取大小 | 可从 `1000` 或 `2000` 起步 | +| `enableProfiling` | 剖析日志 | 排障时打开 | +| `localCompare.mode` | `full` 或 `row-hash` | 默认 `full` 更稳 | + +经验:能明确证明同域可 Join,再用 `join`;否则从 `checksum` 开始。 + +## normalization + +| 类型 | 常见用途 | +| --- | --- | +| `decimal` | 金额、税额、汇率精度统一 | +| `timestamp` | 时间格式、时区、精度统一 | +| `boolean` | `1/0`、`true/false` 等布尔语义统一 | +| `binary` | `hex`、`base64` 编码统一 | +| `string` | NULL 和空串语义统一 | + +常见时间比较模式: + +- `EXACT` +- `DATE_ONLY` +- `TRUNCATE_TO_SECOND` +- `TRUNCATE_TO_DAY` + +经验:大量 mismatch 先看布尔、时间、金额、NULL,而不是急着怀疑数据链路。 + +## 滚动窗口校验 + +当前 CLI 配置模型**没有** `realtime` 顶层节点。 + +如果你要做持续校验,正确做法是: + +1. 外部调度系统计算本轮时间窗; +2. 把窗口写进 `comparison.filters.source` / `comparison.filters.target`; +3. 
由外部系统记录 checkpoint。 + +经验:安全延迟和窗口重叠仍然重要,只是它们现在属于调度层,而不是当前 CLI 的内建配置项。 + +## result + +| format | type | 用途 | +| --- | --- | --- | +| `console` | `result` | 控制台摘要 | +| `console` | `diff-record` | 控制台差异明细 | +| `json` | `result` | 摘要文件 | +| `json` | `diff-record` | 明细文件 | +| `csv` | `result` | 摘要 CSV | +| `csv` | `diff-record` | 明细 CSV | +| `table` | `result` | 摘要入库 | +| `table` | `diff-record` | 明细入库 | + +表 sink 当前写入目标支持:`mysql`、`postgresql`。 + +`failOnSinkError`: + +- `false`:sink 失败时记录告警,主流程继续; +- `true`:sink 失败则任务失败,适合生产审计链路。 + +## columns 常用占位符 + +差异明细: + +| 占位符 | 含义 | +| --- | --- | +| `${taskId}` | 当前任务 ID | +| `${operation}` | 差异类型 | +| `${primaryKey}` | 主键值 | +| `${changedColumns}` | 变更列 JSON | +| `${src.col}` | 源端列值 | +| `${tgt.col}` | 目标端列值 | +| `${timestamp}` | 当前时间 | + +最终结果: + +| 占位符 | 含义 | +| --- | --- | +| `${status}` | `EQUAL` 或 `DIFF` | +| `${totalDifferences}` | 总差异数 | +| `${sourceMissingCount}` | 源端缺失数 | +| `${targetMissingCount}` | 目标端缺失数 | +| `${mismatchCount}` | 不一致数 | +| `${sourceRowCount}` | 源端行数 | +| `${targetRowCount}` | 目标端行数 | +| `${statistics_json}` | 统计摘要 JSON | + +## 排查顺序 + +1. 结果差异很多:先看 `keys`、`fields`、`filters` 是否口径一致。 +2. 大量字段误报:看 `normalization`,尤其是时间、金额、布尔、NULL。 +3. 任务慢:看 `filters`、`strategy.batchSize`、`bisectionThreshold`。 +4. 数据库压力高:先收敛比较范围,再考虑并发。 +5. 结果没落下来:看 `result.sinks`、连接配置、`failOnSinkError`。 +6. 滚动窗口任务漏边界:先看调度层怎么算时间窗,再看 `comparison.filters` 是否和业务窗口一致。 + +```mermaid +flowchart TD + A["出现问题"] --> B{"问题类型"} + B -->|差异太多| C["先查 keys / fields / filters"] + B -->|误报多| D["查 normalization"] + B -->|任务慢| E["查 filters / batchSize / bisectionThreshold"] + B -->|数据库压力高| F["先收敛比较范围,再看并发"] + B -->|结果没落下| G["查 result.sinks / 连接 / failOnSinkError"] + B -->|窗口漏边界| H["查调度时间窗和 comparison.filters"] +``` + +最后一句:配置不是为了把字段填满,而是把业务边界讲清楚。
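The rolling-window section of this cheat sheet leaves window calculation, safety delay, overlap and checkpointing to the scheduler. Below is a minimal sketch of that scheduler-side step. It rests on several assumptions. `next_window` is a hypothetical helper, not part of Consilens. The window is half-open `[start, end)` on a single `updated_at` column with the same name on both sides. The checkpoint is simply the end of the previous window.

```python
from datetime import datetime, timedelta
from typing import Optional

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def next_window(checkpoint: datetime, now: datetime,
                safety_delay: timedelta, overlap: timedelta,
                column: str = "updated_at") -> Optional[tuple[dict[str, str], datetime]]:
    """Compute the next rolling window and the matching comparison.filters.

    Hypothetical scheduler-side helper, not part of Consilens:
    - the window starts `overlap` before the last checkpoint, so rows near the
      previous boundary are checked again;
    - the window ends `safety_delay` before `now`, leaving room for late writes;
    - the returned window end is the new checkpoint for the scheduler to store.
    Returns None when nothing past the checkpoint is old enough to check yet.
    """
    start = checkpoint - overlap
    end = now - safety_delay
    if end <= checkpoint:
        return None
    # Half-open [start, end) window, identical on both sides.
    predicate = (f"{column} >= '{start.strftime(TS_FORMAT)}' "
                 f"AND {column} < '{end.strftime(TS_FORMAT)}'")
    return {"source": predicate, "target": predicate}, end


if __name__ == "__main__":
    window = next_window(
        checkpoint=datetime(2026, 5, 10, 9, 0, 0),
        now=datetime(2026, 5, 10, 9, 35, 0),
        safety_delay=timedelta(minutes=5),
        overlap=timedelta(minutes=2),
    )
    if window is not None:
        filters, new_checkpoint = window
        # updated_at >= '2026-05-10 08:58:00' AND updated_at < '2026-05-10 09:30:00'
        print(filters["source"])
        print(new_checkpoint)  # 2026-05-10 09:30:00
```

The overlap re-checks rows near the previous boundary. Some rows get compared twice, but window edges are less likely to leave gaps. A scheduler would typically store `new_checkpoint` only after the diff run for that window has succeeded.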