[fix](fe) Reject non-null defaults for complex columns #63528

Open
mrhhsg wants to merge 3 commits into apache:master from mrhhsg:fix/reject-complex-type-default

@mrhhsg
Member

mrhhsg commented May 22, 2026

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Complex columns could be created or added with non-null string defaults such as ARRAY DEFAULT '[]'. The default was stored as a literal string rather than a typed complex value, which made CREATE TABLE, ALTER TABLE ADD COLUMN, and partial-update default behavior inconsistent. This PR rejects non-null defaults for ARRAY, MAP, STRUCT, JSON, and VARIANT columns during column definition validation, while still allowing an omitted default or an explicit DEFAULT NULL. It also updates existing regression DDLs and expected outputs that previously relied on empty-array defaults.
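
For readers who want the rule in executable form, here is a minimal sketch in Python. It is an illustration only, not the Doris FE code (which is Java and lives in ColumnDefinition). The names `COMPLEX_TYPES`, `NO_DEFAULT`, `base_type`, and `validate_default`, as well as the error text, are hypothetical.

```python
# Hypothetical sketch of the validation rule; not the actual Doris implementation.
COMPLEX_TYPES = {"ARRAY", "MAP", "STRUCT", "JSON", "VARIANT"}

# Sentinel meaning "the DDL had no DEFAULT clause at all".
NO_DEFAULT = object()


def base_type(type_name: str) -> str:
    """Return the outer type name, e.g. 'ARRAY<INT>' -> 'ARRAY'."""
    return type_name.split("<", 1)[0].split("(", 1)[0].strip().upper()


def validate_default(type_name: str, default=NO_DEFAULT) -> None:
    """Raise ValueError when a complex column declares a non-null default."""
    outer = base_type(type_name)
    if outer not in COMPLEX_TYPES:
        return  # scalar columns are outside the scope of this rule
    if default is NO_DEFAULT or default is None:
        return  # an omitted default and an explicit DEFAULT NULL stay legal
    # The message below is made up for the sketch.
    raise ValueError(f"{outer} column cannot have a non-null default: {default!r}")
```

Under these assumptions, `validate_default("ARRAY<INT>", "[]")` raises, while `validate_default("ARRAY<INT>")` and `validate_default("ARRAY<INT>", None)` pass.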

Release note

Reject non-null default literals for complex type columns.

Check List (For Author)

  • Test:
    • Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.trees.plans.commands.info.ColumnDefinitionTest
    • Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-46001.groovy --run -d datatype_p0/complex_types -s test_complex_default_value
    • Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-46001.groovy --run -d query_p0/expression -s test_default_expr
    • Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-46001.groovy --run -d schema_change_p0 -s test_alter_table_column
    • Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-46001.groovy --run -d unique_with_mow_p0/partial_update -s test_primary_key_partial_update_default_value
    • Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-46001.groovy --run -d unique_with_mow_p0/partial_update -s test_primary_key_partial_update_complex_type
    • Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-46001.groovy --run -d unique_with_mow_c_p0/partial_update -s test_primary_key_partial_update_complex_type
    • Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-46001.groovy --run -d query_p0/sql_functions/table_function -s explode
    • Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-46001.groovy --run -d schema_change_p0 -s test_dup_schema_key_add
    • Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-46001.groovy --run -d schema_change_p0 -s test_unique_schema_key_change_add
    • Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-46001.groovy --run -d schema_change_p0 -s test_agg_schema_key_add
    • Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-46001.groovy --run -d schema_change_p0 -s test_modify_struct
    • Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-46001.groovy --run -d mysql_fulltext_array_contains -s load
    • Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-46001.groovy --run -d nereids_function_p0 -s load
  • Behavior changed: Yes. Non-null default literals for complex columns are rejected.
  • Does this need documentation: No

@mrhhsg
Member Author

mrhhsg commented May 22, 2026

/review

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Contributor

@github-actions github-actions Bot left a comment

No blocking issues found.

Checkpoint conclusions:

  • Goal and tests: The PR rejects non-null defaults for ARRAY, MAP, STRUCT, JSON, and VARIANT columns while preserving omitted defaults and explicit DEFAULT NULL. The new unit test and regression case cover create-table failures, allowed null/no-default cases, and ALTER ADD COLUMN rejection; existing expected outputs were updated for the new NULL default behavior.
  • Scope and clarity: The main validation change is small and focused in ColumnDefinition, with broad test DDL updates required by the behavior change.
  • Concurrency and lifecycle: Not applicable; this is DDL validation and test data update logic with no new shared mutable state, locks, or lifecycle-sensitive objects.
  • Configuration and compatibility: No new configuration items. Existing persisted tables/defaults are not migrated here; the change affects new DDL validation only.
  • Parallel paths: Create table and schema-change ADD COLUMN paths both flow through ColumnDefinition validation. Legacy ColumnDef translation also delegates into ColumnDefinition, so the old and Nereids paths appear covered.
  • Data correctness: Rejecting string literals for complex defaults avoids storing untyped complex defaults and aligns DEFAULT() / partial-update behavior with nullable complex columns.
  • Test coverage: Coverage is reasonable for the changed behavior. I did not run the test suite locally in this review runner; I reviewed the author-listed test coverage and checked the patch for whitespace issues with git diff --check.
  • Observability: Not applicable; this is user-facing analysis validation with explicit error messages.
  • User focus: No additional user-provided review focus was present.

@mrhhsg
Member Author

mrhhsg commented May 22, 2026

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 31359 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a0a934b5e3bdeb73a0b1015026abea3c6902fc57, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17766	4071	4018	4018
q2	q3	10831	1365	806	806
q4	4682	477	348	348
q5	7583	2349	2126	2126
q6	250	186	144	144
q7	960	761	646	646
q8	9487	1802	1552	1552
q9	5152	4947	4945	4945
q10	6356	2082	1780	1780
q11	432	270	244	244
q12	627	429	295	295
q13	18118	3431	2778	2778
q14	261	255	248	248
q15	q16	817	774	706	706
q17	1010	951	996	951
q18	7041	5631	5439	5439
q19	1296	1335	1078	1078
q20	650	486	288	288
q21	6314	2968	2625	2625
q22	459	521	342	342
Total cold run time: 100092 ms
Total hot run time: 31359 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4821	4756	4816	4756
q2	q3	4906	5248	4632	4632
q4	2155	2224	1399	1399
q5	5014	4541	4621	4541
q6	254	180	128	128
q7	1917	1721	1550	1550
q8	2518	2159	2160	2159
q9	7634	7235	7293	7235
q10	4485	4480	3970	3970
q11	522	406	347	347
q12	714	734	514	514
q13	3097	3383	2805	2805
q14	284	281	260	260
q15	q16	678	698	613	613
q17	1280	1263	1256	1256
q18	7295	6813	6850	6813
q19	1134	1112	1129	1112
q20	2214	2190	1926	1926
q21	5420	4654	4540	4540
q22	530	474	402	402
Total cold run time: 56872 ms
Total hot run time: 50958 ms
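
The report does not label its columns. Judging from the numbers alone, each row appears to hold one cold run, two hot runs, and the faster of the two hot runs, and the totals appear to be column sums. The sketch below checks that reading against the Round 1 rows; the column interpretation and the helper name `parse_rows` are assumptions, not something the benchmark scripts document here.

```python
# Round 1 rows copied from the report above. Column meanings are inferred
# from the numbers, not documented in the report: cold run, two hot runs,
# and the faster of the two hot runs.
ROUND_1 = """\
q1 17766 4071 4018 4018
q2 q3 10831 1365 806 806
q4 4682 477 348 348
q5 7583 2349 2126 2126
q6 250 186 144 144
q7 960 761 646 646
q8 9487 1802 1552 1552
q9 5152 4947 4945 4945
q10 6356 2082 1780 1780
q11 432 270 244 244
q12 627 429 295 295
q13 18118 3431 2778 2778
q14 261 255 248 248
q15 q16 817 774 706 706
q17 1010 951 996 951
q18 7041 5631 5439 5439
q19 1296 1335 1078 1078
q20 650 486 288 288
q21 6314 2968 2625 2625
q22 459 521 342 342
"""


def parse_rows(report):
    """Return (cold, hot1, hot2, best_hot) tuples, ignoring the query labels."""
    rows = []
    for line in report.splitlines():
        numbers = [int(field) for field in line.split() if field.isdigit()]
        if len(numbers) == 4:
            rows.append(tuple(numbers))
    return rows


rows = parse_rows(ROUND_1)
cold_total = sum(row[0] for row in rows)  # sum of the first timing column
hot_total = sum(row[3] for row in rows)   # sum of the last timing column
print(cold_total, hot_total)
```

With this reading, the two sums agree with the reported Round 1 totals of 100092 ms (cold) and 31359 ms (hot).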

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169370 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a0a934b5e3bdeb73a0b1015026abea3c6902fc57, data reload: false

query5	4334	665	513	513
query6	336	221	203	203
query7	4250	564	305	305
query8	326	234	215	215
query9	8837	3967	3970	3967
query10	463	338	295	295
query11	5598	2402	2236	2236
query12	182	126	123	123
query13	1262	645	461	461
query14	6018	5394	5023	5023
query14_1	4356	4359	4315	4315
query15	212	200	182	182
query16	1016	450	447	447
query17	1131	719	588	588
query18	2614	489	354	354
query19	213	197	158	158
query20	134	126	126	126
query21	208	143	124	124
query22	13765	13620	13350	13350
query23	17288	16409	15938	15938
query23_1	16170	16079	16051	16051
query24	7407	1748	1293	1293
query24_1	1308	1302	1267	1267
query25	557	471	416	416
query26	1313	320	173	173
query27	2678	579	349	349
query28	4428	1984	1929	1929
query29	1030	652	530	530
query30	306	240	202	202
query31	1138	1090	959	959
query32	98	79	79	79
query33	549	359	305	305
query34	1176	1166	671	671
query35	773	788	702	702
query36	1362	1319	1221	1221
query37	159	109	91	91
query38	3218	3121	3100	3100
query39	933	912	885	885
query39_1	873	871	876	871
query40	244	152	132	132
query41	72	70	68	68
query42	116	112	111	111
query43	327	334	297	297
query44	
query45	217	208	203	203
query46	1114	1187	738	738
query47	2280	2348	2113	2113
query48	409	417	293	293
query49	648	517	407	407
query50	1037	358	257	257
query51	4298	4266	4157	4157
query52	108	109	95	95
query53	262	281	207	207
query54	335	290	266	266
query55	98	92	100	92
query56	322	316	320	316
query57	1396	1389	1278	1278
query58	308	284	280	280
query59	1563	1638	1424	1424
query60	335	336	324	324
query61	182	177	190	177
query62	690	627	563	563
query63	246	201	203	201
query64	2430	839	645	645
query65	
query66	1691	477	354	354
query67	30192	30064	29947	29947
query68	
query69	461	337	293	293
query70	1044	945	1009	945
query71	311	280	271	271
query72	3086	2774	2443	2443
query73	828	767	414	414
query74	5094	4890	4744	4744
query75	2658	2590	2275	2275
query76	2325	1146	817	817
query77	402	403	339	339
query78	12172	12206	11661	11661
query79	1489	1043	760	760
query80	905	555	455	455
query81	489	277	240	240
query82	1353	172	123	123
query83	359	285	251	251
query84	261	147	109	109
query85	939	557	469	469
query86	430	335	323	323
query87	3451	3352	3199	3199
query88	3582	2668	2669	2668
query89	443	384	335	335
query90	1886	186	180	180
query91	183	167	155	155
query92	78	88	80	80
query93	1474	1613	865	865
query94	625	335	303	303
query95	653	465	353	353
query96	1057	857	352	352
query97	2740	2735	2622	2622
query98	239	229	240	229
query99	1172	1145	1039	1039
Total cold run time: 254207 ms
Total hot run time: 169370 ms

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Complex array columns no longer use non-null empty array defaults. Some Cloud P0 expected outputs still assumed omitted array columns were filled with empty arrays, and the array-function expected output still assumed the nullable boolean-array load table kept missing values. Refresh the affected stream-load, HTTP-stream, and array-function expected outputs to match the new DEFAULT NULL/no-default behavior and explicitly loaded boolean-array data.
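
As a rough illustration of why the expected outputs changed, the sketch below models a load that omits an array column. The helper `fill_omitted` and the column names `k1` and `tags` are hypothetical and are not taken from the affected test suites.

```python
# Hypothetical model of a load that omits some columns: each omitted
# column takes its declared default. Column names are made up.
def fill_omitted(row, column_defaults):
    """Return a full row, using the column default where the load gave no value."""
    return {name: row.get(name, default) for name, default in column_defaults.items()}


loaded = {"k1": 1}  # the load supplies only the key column

# Old expectation: the array column was declared with an empty-array default.
old_row = fill_omitted(loaded, {"k1": None, "tags": []})

# New expectation: the array column has DEFAULT NULL or no default.
new_row = fill_omitted(loaded, {"k1": None, "tags": None})

print(old_row)
print(new_row)
```

The only difference between the two rows is the omitted column, which is the kind of change the refreshed .out files would need to reflect.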

### Release note

None

### Check List (For Author)

- Test:
    - Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-46001.groovy --run -d load_p0/stream_load -s test_stream_load_properties
    - Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-46001.groovy --run -d load_p0/http_stream -s test_http_stream_properties
    - Regression test: ./run-regression-test.sh --conf output/local-regression/regression-conf-46001.groovy --run -d nereids_function_p0/scalar_function -s nereids_scalar_fn_Array1
- Behavior changed: No
- Does this need documentation: No
@mrhhsg
Member Author

mrhhsg commented May 22, 2026

run cloud_p0

@github-actions
Contributor

Possible file(s) that should be tracked in LFS detected: 🚨

The following file exceeds the file size limit of 1048576 bytes, as set in the .yml configuration files:

  • regression-test/data/nereids_function_p0/scalar_function/Array1.out

Consider using git-lfs to manage large files.

github-actions Bot added the lfs-detected! label May 22, 2026
@mrhhsg
Member Author

mrhhsg commented May 22, 2026

run compile

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 66.67% (4/6) 🎉
Increment coverage report
Complete coverage report

Labels

lfs-detected! Warning Label for use when LFS is detected in the commits of a Pull Request

2 participants