From 666dc0e6bf4ba3afc7de2ea3d8395f825a33566c Mon Sep 17 00:00:00 2001 From: Kai Huang Date: Thu, 7 May 2026 21:30:31 -0700 Subject: [PATCH 1/3] Default empty array() return type to ARRAY MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit PPL's `array()` no-arg form delegates to Calcite's `SqlLibraryOperators.ARRAY` return-type inference, which returns ARRAY for an empty operand list and ARRAY when all operands are typeless nulls. Both markers are fine for the v2 engine — `ArrayImplementor.internalCast` only consumes the element type when there are elements to cast, so an empty result Object list flows straight through to ExprCollectionValue regardless of declared element type. The analytics-engine route is stricter. When isthmus walks a RexCall like `mvjoin(array(), '-')`, it reaches its first operand's type and feeds it to `io.substrait.isthmus.TypeConverter.toSubstrait`, which throws `UnsupportedOperationException: Unable to convert the type UNKNOWN`. Substrait has no on-wire encoding for NULL/UNKNOWN element types, so the planner can't serialize the call at all. Two PPL ITs hit this directly: * `CalciteArrayFunctionIT.testMvjoinWithEmptyArray` * `CalciteArrayFunctionIT.testMvdedupWithEmptyArray` Substituting VARCHAR when the inferred element type is NULL or UNKNOWN gives the call a substrait-serializable type without affecting any value computation: the result list is empty either way. # Test plan * Unit tests: `:core:test --tests "*ArrayFunction*"` — passes locally (no existing tests asserted on the empty-array element type). * IT: `CalciteArrayFunctionIT` force-routed through the analytics-engine path via opensearch-project/OpenSearch#21554's plugin set — testMvjoinWithEmptyArray and testMvdedupWithEmptyArray now pass (were UNKNOWN type errors); pass-rate moved 26/60 → 28/60. Companion to opensearch-project/OpenSearch#21554. Signed-off-by: Kai Huang --- .../CollectionUDF/ArrayFunctionImpl.java | 22 +++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImpl.java b/core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImpl.java index 9a77a0d5a7c..c436ead741f 100644 --- a/core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImpl.java +++ b/core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImpl.java @@ -50,6 +50,17 @@ public SqlReturnTypeInference getReturnTypeInference() { RelDataType originalType = SqlLibraryOperators.ARRAY.getReturnTypeInference().inferReturnType(sqlOperatorBinding); RelDataType innerType = originalType.getComponentType(); + // For empty `array()` Calcite infers element type as NULL, which downstream + // serializers (notably the analytics-engine route's substrait converter) + // reject with "Unable to convert the type UNKNOWN". Default to VARCHAR — the + // result is empty either way, so the chosen scalar element type doesn't + // affect any value computation, but it gives the call a substrait-serializable + // type. Existing v2-engine tests (which feed Object lists straight through to + // ExprCollectionValue) are unaffected because the empty list contains no + // elements that need to be cast. + if (innerType == null || isUnknownLikeType(innerType.getSqlTypeName())) { + innerType = typeFactory.createSqlType(SqlTypeName.VARCHAR); + } return createArrayType( typeFactory, typeFactory.createTypeWithNullability(innerType, true), true); } catch (Exception e) { @@ -63,6 +74,17 @@ public UDFOperandMetadata getOperandMetadata() { return null; } + /** + * Calcite's {@link SqlLibraryOperators#ARRAY} infers a {@code NULL}-element array for an empty + * call list and an {@code UNKNOWN}-element array when type inference can't pick one (e.g. all + * operands are typeless nulls). Either of those bubbles up to the analytics-engine route's + * substrait converter as "Unable to convert the type UNKNOWN" — substrait has no encoding for + * either marker. Treat both as needing a concrete fallback. + */ + private static boolean isUnknownLikeType(SqlTypeName sqlTypeName) { + return sqlTypeName == SqlTypeName.NULL || sqlTypeName == SqlTypeName.UNKNOWN; + } + public static class ArrayImplementor implements NotNullImplementor { @Override public Expression implement( From aa82704d93e8d3dda944002fd01881ce1251299b Mon Sep 17 00:00:00 2001 From: Kai Huang Date: Fri, 8 May 2026 11:04:16 -0700 Subject: [PATCH 2/3] =?UTF-8?q?Add=20unit=20tests=20for=20empty/UNKNOWN=20?= =?UTF-8?q?ARRAY=20=E2=86=92=20VARCHAR=20fallback?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cover the four shapes that exercise the return-type inference path introduced in 666dc0e6b: * array() — 0 operands, fallback fires → ARRAY * array(NULL) — typeless-null operand, fallback fires → ARRAY * array(1) — INTEGER operand, fallback does NOT fire → ARRAY * array('a', 'b') — VARCHAR operands, fallback does NOT fire → ARRAY The third case is the regression guard requested by review — confirms concrete element types pass through unchanged and the fallback is scoped strictly to the {@code NULL}/{@code UNKNOWN} markers. The harness uses Calcite's {@link ExplicitOperatorBinding} bound to {@link SqlLibraryOperators#ARRAY} so the inference's internal {@code SqlLibraryOperators.ARRAY.getReturnTypeInference().inferReturnType(...)} call resolves the same operator the production code delegates to — mocking {@code SqlOperatorBinding} directly hits NPEs deep inside Calcite's least-restrictive-type computation. Addresses review feedback on #5421. Signed-off-by: Kai Huang --- .../CollectionUDF/ArrayFunctionImplTest.java | 81 +++++++++++++++++++ 1 file changed, 81 insertions(+) diff --git a/core/src/test/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImplTest.java b/core/src/test/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImplTest.java index 6dbc1901fa7..600a802615a 100644 --- a/core/src/test/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImplTest.java +++ b/core/src/test/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImplTest.java @@ -14,6 +14,12 @@ import java.util.Collections; import java.util.List; import java.util.stream.Collectors; +import org.apache.calcite.rel.type.RelDataType; +import org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.calcite.rel.type.RelDataTypeSystem; +import org.apache.calcite.sql.ExplicitOperatorBinding; +import org.apache.calcite.sql.fun.SqlLibraryOperators; +import org.apache.calcite.sql.type.SqlTypeFactoryImpl; import org.apache.calcite.sql.type.SqlTypeName; import org.junit.jupiter.api.Test; @@ -302,4 +308,79 @@ public void testArrayWithCharTypePreservesNulls() { assertNull(list.get(1), "Null should be preserved during CHAR type conversion"); assertEquals("y", list.get(2)); } + + // ==================== RETURN-TYPE INFERENCE TESTS ==================== + // These tests cover the return-type fallback the analytics-engine route depends on: + // when Calcite can't infer a concrete element type (no operands, or all-null operands), + // we substitute VARCHAR so the call's return type is substrait-serializable. Without the + // fallback Calcite emits ARRAY / ARRAY, which fails substrait conversion + // with "Unable to convert the type UNKNOWN" downstream. + + /** array() — empty operand list — returns ARRAY. */ + @Test + public void testReturnTypeForEmptyCallIsVarcharArray() { + RelDataType returnType = inferReturnType(); + assertEquals(SqlTypeName.ARRAY, returnType.getSqlTypeName()); + RelDataType element = returnType.getComponentType(); + assertNotNull(element); + assertEquals(SqlTypeName.VARCHAR, element.getSqlTypeName()); + assertTrue(element.isNullable(), "Element type should be nullable per existing semantics"); + } + + /** array(NULL) — single typeless-null operand — also falls back to ARRAY. */ + @Test + public void testReturnTypeForAllNullOperandsIsVarcharArray() { + RelDataTypeFactory typeFactory = newTypeFactory(); + RelDataType nullType = typeFactory.createSqlType(SqlTypeName.NULL); + RelDataType returnType = inferReturnType(nullType); + assertEquals(SqlTypeName.ARRAY, returnType.getSqlTypeName()); + RelDataType element = returnType.getComponentType(); + assertNotNull(element); + assertEquals(SqlTypeName.VARCHAR, element.getSqlTypeName()); + } + + /** array(1) — INTEGER operand — preserves the inferred element type (no fallback). */ + @Test + public void testReturnTypeForIntegerOperandPreservesType() { + RelDataTypeFactory typeFactory = newTypeFactory(); + RelDataType intType = typeFactory.createSqlType(SqlTypeName.INTEGER); + RelDataType returnType = inferReturnType(intType); + assertEquals(SqlTypeName.ARRAY, returnType.getSqlTypeName()); + RelDataType element = returnType.getComponentType(); + assertNotNull(element); + assertEquals( + SqlTypeName.INTEGER, + element.getSqlTypeName(), + "Concrete element types must not be affected by the VARCHAR fallback"); + } + + /** array('a', 'b') — VARCHAR operands — already VARCHAR, fallback path doesn't fire. */ + @Test + public void testReturnTypeForVarcharOperandPreservesType() { + RelDataTypeFactory typeFactory = newTypeFactory(); + RelDataType varcharType = typeFactory.createSqlType(SqlTypeName.VARCHAR); + RelDataType returnType = inferReturnType(varcharType, varcharType); + assertEquals(SqlTypeName.ARRAY, returnType.getSqlTypeName()); + assertEquals(SqlTypeName.VARCHAR, returnType.getComponentType().getSqlTypeName()); + } + + /** + * Helper — invokes {@code new ArrayFunctionImpl().getReturnTypeInference().inferReturnType(...)} + * via Calcite's {@link ExplicitOperatorBinding}, which is the public test harness for exercising + * a return-type inference against a specific operand-type list. We bind it to {@link + * SqlLibraryOperators#ARRAY} so the inference's internal call to {@code + * SqlLibraryOperators.ARRAY.getReturnTypeInference().inferReturnType(...)} resolves the same + * operator the lambda delegates to. + */ + private static RelDataType inferReturnType(RelDataType... operandTypes) { + RelDataTypeFactory typeFactory = newTypeFactory(); + ExplicitOperatorBinding binding = + new ExplicitOperatorBinding( + typeFactory, SqlLibraryOperators.ARRAY, Arrays.asList(operandTypes)); + return new ArrayFunctionImpl().getReturnTypeInference().inferReturnType(binding); + } + + private static RelDataTypeFactory newTypeFactory() { + return new SqlTypeFactoryImpl(RelDataTypeSystem.DEFAULT); + } } From 22d0cce676078babd5e57f25264d35a24b159e62 Mon Sep 17 00:00:00 2001 From: Kai Huang Date: Fri, 8 May 2026 11:06:07 -0700 Subject: [PATCH 3/3] Trim inline comment per review feedback The "why" lives in the PR description; the inline comment now points there instead of duplicating it. Addresses dai-chen's review note on Signed-off-by: Kai Huang #5421. --- .../function/CollectionUDF/ArrayFunctionImpl.java | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImpl.java b/core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImpl.java index c436ead741f..318f32a41be 100644 --- a/core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImpl.java +++ b/core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImpl.java @@ -50,14 +50,7 @@ public SqlReturnTypeInference getReturnTypeInference() { RelDataType originalType = SqlLibraryOperators.ARRAY.getReturnTypeInference().inferReturnType(sqlOperatorBinding); RelDataType innerType = originalType.getComponentType(); - // For empty `array()` Calcite infers element type as NULL, which downstream - // serializers (notably the analytics-engine route's substrait converter) - // reject with "Unable to convert the type UNKNOWN". Default to VARCHAR — the - // result is empty either way, so the chosen scalar element type doesn't - // affect any value computation, but it gives the call a substrait-serializable - // type. Existing v2-engine tests (which feed Object lists straight through to - // ExprCollectionValue) are unaffected because the empty list contains no - // elements that need to be cast. + // Default empty/unknown element type to VARCHAR — see PR description for why. if (innerType == null || isUnknownLikeType(innerType.getSqlTypeName())) { innerType = typeFactory.createSqlType(SqlTypeName.VARCHAR); }