Conversation

@zstack-robot-2
Collaborator

Resolves: ZSTAC-79460

Change-Id: I6b77656f6668796677626b6c6b7361696e6a6274

sync from gitlab !8973

bustezero and others added 30 commits September 4, 2025 17:20
fix apply userdata when vm attach none default nic

Resolves: ZSTAC-75776

Change-Id: I6868776c9fbf656008d74d039ea7eb3979dde737
Change-Id: I7a7670696e696f706e697864656f77636f746270
<fix>[compute]: fix nicManager setNicDriverType

See merge request zstackio/zstack!8337
<fix>[root]: bump version to 5.4.0

See merge request zstackio/zstack!8359
Change-Id: I776370656a6b707279747a666b7362767a6d6a67
Resolves: ZSTAC-74198

Change-Id: I637a74198f696e77766377626f78647269707470

<fix>[sdk]: support pushing user data to Keycloak

Resolves: ZSTAC-74189

Change-Id: I7375741896676741897574646678757a7a7a7578

<fix>[iam2]: multe region adapter

1、APICreateAccountMsg support set `type`
2、APICreateIAM2VirtualIDMsg support set `type`
3、APICreateIAM2VirtualIDMsg.attributes support set `uuid`
4、APICreateIAM2OrganizationMsg.attributes support set `uuid`
5、APICreateIAM2VirtualIDGroupMsg.attributes support set `uuid`
6、APICreateIAM2ProjectMsg.attributes support set `uuid`

Resolves: ZSTAC-74189

Change-Id: I62726d7a736b67706b6176726d75636568677068

<fix>[sdk]: sdk add morph flag

Resolves: ZSTAC-74189

Change-Id: I656b6d6c6c72687a6669616864736361637a7475

<fix>[sdk]: sdk add morph flag 2

Resolves: ZSTAC-74189

Change-Id: I717a756d6c6f75726d74716d6c707063666d786e

<fix>[iam2]: CreateIAM2ProjectFromTemplate add params `linkAccountUuid`

Resolves: ZSTAC-74189

Change-Id: I6370787a6865776c616e7665796364666e716b71

<fix>[iam2]: CreateIAM2Project add params `linkAccountUuid`

Resolves: ZSTAC-74189

Change-Id: I6a717564676d63647379797177746a796f646e61

<fix>[iam2]: SetIAM2ProjectLoginExpired add params `loginExpiredAttributeUuid`

Resolves: ZSTAC-74189

Change-Id: I756a6967636463777775617a636c79617a676774
Resolves: ZSTAC-77221

Change-Id: I687377786d6f7a73636e61687a776a6b686b6d7a
<fix>[compute]: allow vm state change from unknown to nostate

See merge request zstackio/zstack!8349
<fix>[ceph]: correct backup param

See merge request zstackio/zstack!8363
<fix>[vm]: add default vm gpu alarm and event

See merge request zstackio/zstack!8305
Resolves: ZSTAC-75806

Change-Id: I616963747a756966766c6d6f7a6375796c626875

Signed-off-by: zhangjianjun <jianjun.zhang@zstack.io>
<fix>[sso]: migrate sso config storage from SystemTagVO to AttributesVO

See merge request zstackio/zstack!8346
Resolves: ZSTAC-75666

Change-Id: I6d766a70776676646862777475656e6f6162656f

Signed-off-by: zhangjianjun <jianjun.zhang@zstack.io>
Resolves: ZSTAC-77707

Change-Id: I71637966757761786c6b6467636c626171726f6a
<fix>[VirtualRouter]: clean pf rule on backup VirtualRouterVmVO

See merge request zstackio/zstack!8370
Resolves: ZSTAC-64577

Change-Id: I6676737167747a68767479776572626872686e6f
<fix>[dependencies]: update quartz to version 2.4.0

See merge request zstackio/zstack!8351
Resolves: ZSTAC-75481

Change-Id: I71727a7163766c72687677667a6d6b75726b6e76

Signed-off-by: zhangjianjun <jianjun.zhang@zstack.io>
Resolves: ZSTAC-77662

Change-Id: I6868727a64716d6661777064646f627779727764
<fix>[ovn]: config ovn ovs by ResourceConfig

See merge request zstackio/zstack!8389
<fix>[nic]: Cleaning dirty Data

See merge request zstackio/zstack!8391
<fix>[xinfini]: support set volume qos

See merge request zstackio/zstack!8374
- sso login support saml 2.0

DBImpact

Resolves: ZSTAC-76199

Change-Id: I656d6e726463787970636666756b746263766a79
Resolves: ZSV-9789

Change-Id: I74656b756a74716e6a697a6875766c766e73766e
Add new GuestOS in guestOsCategory and guestOsCharacter

Resolves: ZSTAC-77340

Change-Id: I6868776c69929b27a5b34e1bb544e419e4724e10
<fix>[securityGroup]: strip space tab newLine in rule

See merge request zstackio/zstack!8368
<improve>[conf]: Add new GuestOS

See merge request zstackio/zstack!8402
Resolves: ZSTAC-78037

Change-Id: I71696c6579646b6f6774717a766c6d6365656c6e
<fix>[storage]: extrenal ps save imageCache size with actualsize

See merge request zstackio/zstack!8398
<feature>[sdk]: support saml2.0

See merge request zstackio/zstack!8392
AlanJager and others added 28 commits November 29, 2025 13:22
Resolves: ZSTAC-80119

Change-Id: I73726b787377706261676f6874726f6e61696f62
<fix>[sdk]: Update sdk

See merge request zstackio/zstack!8779
Resolves: ZSTAC-79936

Change-Id: I65657166766872677463796e796d6f6f78746964
DBImpact

Resolves: ZSTAC-79981

Change-Id: I706a65776377706f73766f62786e667472677a6c
Resolves: ZSTAC-79981

Change-Id: I766568646477627a766c6570617871786a776675
<fix>[conf]: Add isolated field to gpu and gpu spec

See merge request zstackio/zstack!8796
Resolves: ZSTAC-80203

Change-Id: I73736d736c7775677273766572787462756d7a71
DBImpact

Resolves: ZSTAC-80334

Change-Id: I6d7470667a6b666a617575707474626a626a7a68
<fix>[sdk]: Fix sdk and upgrade gpuVendor field

See merge request zstackio/zstack!8793
<fix>[conf]: Fix model services whose cpu arch missed in db

See merge request zstackio/zstack!8806
Resolves: ZSTAC-78057

Change-Id: I67617a667567756a7578706a6c78646e6267756d
Resolves: ZSTAC-80381

Change-Id: I7673656d7769756d706b736467797868666f7a69
<fix>[sdk]: Update sdk and add new fields

See merge request zstackio/zstack!8800
DBImpact

Resolves: ZSTAC-80203

Change-Id: I6473736b7a716b77796b6b796467716e7062627a
<fix>[conf]: Fix missing records in ModelServiceGpuVendorVO

See merge request zstackio/zstack!8818
1. add snapshot placement capability to external storage
2. ext ps controller cannot use extension point, move it to exp ps.
3. retain the ask meaning of AskSnapshotCapabilityMsg.

Resolves: ZSTAC-79582 ZSTAC-79481

Change-Id: I7a7074707079616b696f7375757577666c79756b
<fix>[storage]: use specify msg get volume path

See merge request zstackio/zstack!8810
<fix>[compute]: Export public method for static ip case

See merge request zstackio/zstack!8824
DBImpact

Resolves: ZSTAC-79761

Change-Id: I7662666569656f6c736d736667747679726a666c
<feature>[conf]: Upgrade GpuDeviceVO add gpuStatus

See merge request zstackio/zstack!8701
- add column supportMetrics for ModelServiceInstanceGroupVO

DBImpact

Resolves: ZSTAC-80375

Change-Id: I6862697569686b7a6b6e787673696e71736f6169
<fix>[conf]: update ModelServiceInstanceGroupVO schema

See merge request zstackio/zstack!8827
Fix status from lowercase to uppercase, to avoid
Enum.valueOf failure

DBImpact

Resolves: ZSTAC-80451

Change-Id: I67646d73796f7076626d6a616664746c76757979
Resolves: ZSTAC-80172

Change-Id: I7774626b796a6b626a626b78656d6f69656a7274
<fix>[conf]: Fix gpu work status upgrade sql

See merge request zstackio/zstack!8833
<fix>[sdk]: Update sdk

See merge request zstackio/zstack!8835
<fix>[ceph]: change default password

See merge request zstackio/zstack!8837
APIImpact

Resolves: ZSTAC-12345

Change-Id: I736a6c736b7267716d6e66706c7963796379756c
@coderabbitai

coderabbitai bot commented Dec 30, 2025

Walkthrough

This change introduces a plugin abstraction and an SSO/OAuth2 framework, refactors and extends control flow across host/network/VM allocation and migration, adds several kinds of extension points and factory managers, strengthens NIC/IP allocation and disk/primary-storage selection, adds host power/IPMI executors, adds a number of global configs and system tags, and includes a large batch of database migrations and new tables.

Changes

Cohort / File(s) Summary
Git Hooks & Tools
./.gitconfig/hooks/commit-msg, ./.gitconfig/hooks/prepare-commit-msg, build/bump_version.py, build/deploydb.sh
Python 2/3 I/O compatibility and unified UTF-8 handling; new JIRA pattern; new bulk version-update tool; DB deployment script adds support for a greatdb branch.
Meta & Docs
README*.md, CONTRIBUTORS, .gitignore, VERSION, PJNUM
Documentation links and content updated; new ignore rules; version bumped to 5.x; PJNUM file added.
Abstraction/Plugin foundation
abstraction/src/main/java/org/zstack/abstraction/*, .../crypto/CryptoClientDriver.java, .../sns/*, .../sso/*
Adds plugin interfaces, validators, exceptions, OptionType, endpoint data, a default OAuth2 provider implementation, RequestData, and more.
Allocator stack enhancements
compute/.../allocator/*
Supports multi-cluster designation; host/L2/L3 filtering rewritten with new extension points; pre-checks and parallel filtering; reserved-capacity interface signatures gain extra parameters; OS-consistency filtering; primary-storage filtering now spec-based; the sorting chain can skip reservation.
Host management
compute/.../host/*
Adds an IPMI power executor and interface; the interceptor adds validation for several APIs; a periodic task refreshes power status; network service type management; the tracker can skip ping and pass the last error; connection/reconnection flows and event hooks extended; new system/global configs.
VM core & factories
compute/.../vm/VmFactoryManager*.java, .../VmInstanceManagerImpl.java, .../InitializeResourceConfigExtensionPoint.java
Adds a VM factory manager and implementations; the creation flow is refactored into a shared chained flow; a resource-config initialization extension point is added.
VM networking & NIC/IP
compute/.../vm/VmAllocateNic*Flow.java, .../VmAllocateNicIpFlow.java, .../VmPostReleaseNicFlow.java, .../VmReturnReleaseNicFlow.java, .../VmNic*.*, CustomNicOperator.java, MacOperator.java, StaticIpOperator.java, VmSystemTags.java
Introduces a standalone IP-allocation flow; NIC creation/driver selection/type detection refactored; custom NIC and static IP operators; system tags greatly expanded with sensitive-output handling; the release flow is made asynchronous.
VM primary storage/volumes/ISO/images
compute/.../vm/VmAllocatePrimaryStorage*Flow.java, .../VmAllocateVolumeFlow.java, .../ApplianceVmAllocatePrimaryStorageFlow.java, .../VmInstantiate*Flow.java, .../VmAttach*Flow.java, .../VmDownloadIsoFlow.java, .../VmAssignDeviceIdToAttachingVolumeFlow.java
Primary-storage selection moves from a single required choice to a candidate set; data-volume/ISO flows gain safety and context passing; connectivity checks and rollback paths improved; allocation combination and sorting refactored.
VM migrate/start/stop extensions
compute/.../vm/VmMigrate*Flow.java, .../VmStartOnHypervisorFlow.java, .../VmStopOnHypervisorFlow.java, VmInstanceExtensionPointEmitter.java, VmTracer.java
Pre/post-migration extension points become async callbacks; downTime passing added; start/stop messages carry debug/state records; the NoState state is accepted.
VM spec/topology/hardware
CpuTopology.java, VmHardwareSystemTags.java, VmHardwareExtensionPoint.java, VmCpuVendor.java, VmGlobalConfig.java, VmGlobalProperty.java (新增)
Adds CPU topology calculation and validation; hardware-related system tags and extension points; global config items expanded (audio, max vCPU, HA/unique name, etc.).
Cluster & Zone
compute/.../cluster/*, compute/.../zone/*, .../arch/*
Cluster OS update flow refactored with single-host support; architecture resource-config initialization/validation/auto-fill; zone default-flag updates moved to a serial queue.
Quota & utilities
compute/.../vm/quota/*, VmQuotaOperator.java, VmQuotaUtil.java, VmInstanceUtils.java, NewVmInstanceMsgBuilder.java, VmAfterInstantiateVolumeInAttachingVolumeFlow.java
Adds quota definitions for running CPU/memory/instance count/data volumes/capacity; creation-parameter and spec derivation and validation enhanced; the extension-point emitter becomes asynchronous.
Device & address management
compute/.../vm/devices/VmInstanceDeviceManagerImpl.java
Adds per-resource address deletion and restoring existing device addresses from archive; validation enhanced.
Removals/adjustments
VmCdRomMemorySnapshotGroupExtension.java (removed); several constructors/initializations converted to final fields
Removes the CD-ROM memory snapshot extension; several classes switch to eager initialization, trimming constructor logic.
DB migration scripts
conf/db/**
A large number of new tables/views/stored procedures and column changes, FK/index adjustments, and data migration/cleanup; covers SSO/OAuth2, networking, hosts, NVMe, model services/templates/containers/observability, licensing and billing, ZDFS, and more.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User as User
  participant Client as OAuth2ProviderDriver impl
  participant IdP as Identity Provider
  participant App as ZStack SSO Service

  User->>Client: Request login (OIDC/OAuth2)
  Client->>IdP: buildRequestTokenData(authorization_code)
  IdP-->>Client: access_token / id_token
  Client->>IdP: buildRequestUserInfoData(headers)
  IdP-->>Client: userinfo JSON
  Client->>App: parseUserInfo() -> principal map
  App-->>User: session established
  note over Client,IdP: default implementation can be overridden; supports refresh/introspect
sequenceDiagram
  autonumber
  participant API as API Layer
  participant VM as VmAllocateNicIpFlow
  participant Net as L3/IPAM Service
  participant DB as Database
  participant Bus as CloudBus

  API->>VM: Run(flow, spec with NICs)
  VM->>DB: Load NICs & static IP tags
  VM->>Net: AllocateIpMsg (per L3, v4/v6)
  Net-->>VM: AllocateIpReply (UsedIp)
  VM->>DB: Persist UsedIpVO & update VmNicVO
  VM->>Bus: Progress report
  VM-->>API: next() or fail(error)
  note over VM,Net: supports static IP/IPv6 and rollback (ReturnIp)
sequenceDiagram
  autonumber
  participant User as Operator
  participant Manager as HostManagerImpl
  participant Exec as HostIpmiPowerExecutor
  participant DB as Database
  participant Host as Physical Host

  User->>Manager: APIGetHostPowerStatus / PowerOn/Off/Reset
  Manager->>Exec: getPowerStatus()/powerOn()/...
  Exec->>Host: ipmitool command
  Host-->>Exec: stdout/return code
  Exec->>DB: updateIpmiPowerStatus
  Exec-->>Manager: Completion(success/error)
  Manager-->>User: APIEvent with result

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~180 minutes

Poem

In the fields a code-wind blows, ears curved to listen~
I stitch plugins and the cloud sea into a brand-new map.
NICs dance, IPs settle home like a river of stars;
Hosts tap the power switch, humming in the moonlight.
Migrating VMs step along the sequence diagrams,
Click, the database opens a new road.
Ah, a rabbit picks up the pen and writes: 5.0, hop!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title follows the [scope]: format, stays within 72 characters, and clearly describes the main change (switching VolumeSpec fields from int to long).
Description check ✅ Passed The PR description contains the issue number ZSTAC-79460 and a Change-Id, is marked as synced from GitLab, is relevant to the purpose of the change, and satisfies the relaxed description check.
✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 20

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (10)
compute/src/main/java/org/zstack/compute/vm/VmReleaseResourceFlow.java (1)

23-30: Field initialization order causes a NullPointerException

Line 30 calls pluginRgty.getExtensionList(...) in an instance-field initializer, but Spring's @Autowired injection has not completed at that point, so pluginRgty is still null and a runtime NPE follows.

Java field initialization order:

  1. Instance field initializers run
  2. The constructor runs
  3. Spring dependency injection (@Autowired) completes during/after construction
🔎 Suggested fix

Option 1: use @PostConstruct (recommended)

+import javax.annotation.PostConstruct;
+
 @Configurable(preConstruction = true, autowire = Autowire.BY_TYPE)
 public class VmReleaseResourceFlow implements Flow {
     private static final CLogger logger = Utils.getLogger(VmReleaseResourceFlow.class);
     
     @Autowired
     private PluginRegistry pluginRgty;
     
-    private final List<VmReleaseResourceExtensionPoint> extensions = pluginRgty.getExtensionList(VmReleaseResourceExtensionPoint.class);
+    private List<VmReleaseResourceExtensionPoint> extensions;
+    
+    @PostConstruct
+    private void init() {
+        extensions = pluginRgty.getExtensionList(VmReleaseResourceExtensionPoint.class);
+    }

Option 2: use constructor injection

 @Configurable(preConstruction = true, autowire = Autowire.BY_TYPE)
 public class VmReleaseResourceFlow implements Flow {
     private static final CLogger logger = Utils.getLogger(VmReleaseResourceFlow.class);
     
-    @Autowired
-    private PluginRegistry pluginRgty;
+    private final PluginRegistry pluginRgty;
+    private final List<VmReleaseResourceExtensionPoint> extensions;
     
-    private final List<VmReleaseResourceExtensionPoint> extensions = pluginRgty.getExtensionList(VmReleaseResourceExtensionPoint.class);
+    @Autowired
+    public VmReleaseResourceFlow(PluginRegistry pluginRgty) {
+        this.pluginRgty = pluginRgty;
+        this.extensions = pluginRgty.getExtensionList(VmReleaseResourceExtensionPoint.class);
+    }
compute/src/main/java/org/zstack/compute/allocator/HostCapacityReserveManagerImpl.java (2)

180-205: Critical flaw: finder.hypervisorType is not set

In getReservedHostCapacityByZones, the query filters by hypervisorType (lines 188-190), but finder.hypervisorType is never set after the ReservedCapacityFinder is created. This is inconsistent with getReservedHostCapacityByClusters, which sets the field on line 226.

Even though the query already filters the host list by hypervisorType, the finder's internal findReservedCapacityByHypervisorType() may rely on this field for additional logic, so leaving it unset can lead to inconsistent behavior.

🔎 Suggested fix
     ReservedCapacityFinder finder = new ReservedCapacityFinder();
     finder.hostUuids = huuids;
+    finder.hypervisorType = hypervisorType;
     Collection<ReservedHostCapacity> col = finder.find().values();

237-241: Potential runtime exception

Line 240 calls .iterator().next() without checking whether the collection is empty. If hostUuids is an empty list, finder.find().values() returns an empty collection and .iterator().next() throws a NoSuchElementException.

Add an empty-check, or make sure callers guarantee hostUuids is non-empty.

🔎 Suggested defensive fix
 @Override
 public ReservedHostCapacity getReservedHostCapacityByHosts(List<String> hostUuids) {
+    if (hostUuids == null || hostUuids.isEmpty()) {
+        ReservedHostCapacity ret = new ReservedHostCapacity();
+        ret.setReservedCpuCapacity(0);
+        ret.setReservedMemoryCapacity(0);
+        return ret;
+    }
     ReservedCapacityFinder finder = new ReservedCapacityFinder();
     finder.hostUuids = hostUuids;
     return finder.find().values().iterator().next();
 }
conf/db/upgrade/V4.4.0__schema.sql (3)

9-9: Consider using IF EXISTS to keep the script idempotent

The DROP INDEX statement has no IF EXISTS clause, so re-running the upgrade script fails. Suggested change:

🔎 Suggested fix
-DROP INDEX idxLicenseHistoryVOHash ON LicenseHistoryVO;
+DROP INDEX IF EXISTS idxLicenseHistoryVOHash ON LicenseHistoryVO;

Alternatively, use a stored procedure that checks whether the index exists before dropping it.

Based on learnings, ZStack database upgrade scripts need information_schema checks to ensure idempotency.
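
A minimal sketch of that stored-procedure approach, assuming the zstack schema and the index named above:

DELIMITER $$
CREATE PROCEDURE dropIndexIfExists()
BEGIN
    -- Drop the index only while information_schema still reports it.
    IF EXISTS (SELECT 1 FROM information_schema.STATISTICS
               WHERE TABLE_SCHEMA = 'zstack'
                 AND TABLE_NAME = 'LicenseHistoryVO'
                 AND INDEX_NAME = 'idxLicenseHistoryVOHash') THEN
        DROP INDEX idxLicenseHistoryVOHash ON LicenseHistoryVO;
    END IF;
END $$
DELIMITER ;

CALL dropIndexIfExists();
DROP PROCEDURE IF EXISTS dropIndexIfExists;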


121-122: Foreign-key drops lack idempotency checks

The DROP FOREIGN KEY statements on lines 121, 124, and 127 have no existence checks, so re-running the upgrade script fails. Because MySQL has no DROP FOREIGN KEY IF EXISTS syntax, use a stored procedure that checks information_schema for the constraint before dropping it.

🔎 Suggested idempotency pattern
DELIMITER $$
CREATE PROCEDURE dropForeignKeyIfExists(
    IN tableName VARCHAR(64),
    IN constraintName VARCHAR(64)
)
BEGIN
    DECLARE constraintExists INT DEFAULT 0;
    
    SELECT COUNT(*) INTO constraintExists 
    FROM information_schema.TABLE_CONSTRAINTS 
    WHERE CONSTRAINT_SCHEMA = 'zstack' 
        AND TABLE_NAME = tableName 
        AND CONSTRAINT_NAME = constraintName 
        AND CONSTRAINT_TYPE = 'FOREIGN KEY';
    
    IF constraintExists > 0 THEN
        SET @sql = CONCAT('ALTER TABLE `zstack`.`', tableName, '` DROP FOREIGN KEY `', constraintName, '`');
        PREPARE stmt FROM @sql;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
    END IF;
END $$
DELIMITER ;

CALL dropForeignKeyIfExists('HostNumaNodeVO', 'HostNumaNodeVO_HostEO_uuid_fk');
CALL dropForeignKeyIfExists('VmInstanceNumaNodeVO', 'VmInstanceNumaNodeVO_VmInstanceEO_uuid_fk');
CALL dropForeignKeyIfExists('HostAllocatedCpuVO', 'HostAllocatedCpuVO_HostEO_uuid_fk');

DROP PROCEDURE IF EXISTS dropForeignKeyIfExists;

Based on learnings, ZStack database upgrade scripts need information_schema checks for table and constraint existence so they can be safely re-run.

Also applies to: 124-125, 127-128


130-140: Timestamp columns lack DEFAULT values, violating the coding guidelines

The lastOpDate and createDate columns should get DEFAULT CURRENT_TIMESTAMP and DEFAULT CURRENT_TIMESTAMP NOT NULL respectively. Without defaults, newly inserted rows end up with NULL in both fields, and the definition does not follow the coding guidelines (use DEFAULT CURRENT_TIMESTAMP rather than DEFAULT '0000-00-00 00:00:00'). In addition, createDate should be declared NOT NULL to protect data integrity.
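
A minimal sketch of the column definitions this implies, following the same pattern the review suggests for SSORedirectTemplateVO further below (the owning table is not shown in this excerpt):

    `lastOpDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    `createDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,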

compute/src/main/java/org/zstack/compute/vm/VmMigrationCheckL2NetworkOnHostFlow.java (1)

39-50: A database query inside a loop violates performance best practices.

Inside the for loop, dbf.findByUuid() is called for every L3 network to fetch its L2NetworkVO, creating an N+1 query problem that hurts performance.

Per the coding guidelines: do not nest queries inside loops; avoid the performance cost of nested queries.

Batch-load all required L2NetworkVOs outside the loop, then look them up from a Map inside it.

🔎 Suggested optimization
     List<CheckL2NetworkOnHostMsg> cmsgs = new ArrayList<CheckL2NetworkOnHostMsg>();
+    // Batch load all L2NetworkVOs
+    List<String> l2Uuids = VmNicSpec.getL3NetworkInventoryOfSpec(spec.getL3Networks())
+            .stream().map(L3NetworkInventory::getL2NetworkUuid).distinct().collect(Collectors.toList());
+    Map<String, L2NetworkVO> l2NetworkMap = Q.New(L2NetworkVO.class)
+            .in(L2NetworkVO_.uuid, l2Uuids)
+            .list()
+            .stream()
+            .collect(Collectors.toMap(L2NetworkVO::getUuid, Function.identity()));
+    
     for (L3NetworkInventory l3 : VmNicSpec.getL3NetworkInventoryOfSpec(spec.getL3Networks())) {
-        CheckL2NetworkOnHostMsg msg = new CheckL2NetworkOnHostMsg();
-        L2NetworkVO l2NetworkVO = dbf.findByUuid(l3.getL2NetworkUuid(), L2NetworkVO.class);
+        L2NetworkVO l2NetworkVO = l2NetworkMap.get(l3.getL2NetworkUuid());
+        if (l2NetworkVO == null) {
+            continue;
+        }
         VSwitchType switchType = VSwitchType.valueOf(l2NetworkVO.getvSwitchType());
         if (!switchType.isAttachToCluster()) {
             continue;
         }
+        CheckL2NetworkOnHostMsg msg = new CheckL2NetworkOnHostMsg();
         msg.setL2NetworkUuid(l3.getL2NetworkUuid());

The java.util.stream.Collectors and java.util.function.Function imports need to be added.

compute/src/main/java/org/zstack/compute/vm/VmAllocateHostForMigrateVmFlow.java (1)

44-78: setAllowNoL3Networks(true) is set twice

Line 46 sets msg.setAllowNoL3Networks(true) inside the MigrateVmMessage branch, but line 78 unconditionally sets the same value again, making the call on line 46 redundant.

If every migration scenario needs allowNoL3Networks=true, drop the duplicate call on line 46; if only MigrateVmMessage needs it, line 78 is probably the bug.

🔎 Suggested removal of the duplicate call
         if (spec.getMessage() instanceof MigrateVmMessage) {
             destHostUuid = ((MigrateVmMessage)spec.getMessage()).getHostUuid();
-            msg.setAllowNoL3Networks(true);
         }
compute/src/main/java/org/zstack/compute/vm/VmHostNameHelper.java (1)

21-37: The two getHostName methods use inconsistent IP-version checks

getHostName(VmInstanceInventory) checks nic.getIpVersion() == IPv6Constants.IPv4 on line 27, while getHostName(VmInstanceVO) uses NetworkUtils.isIpv4Address(nic.getIp()) on line 45. The two checks have different semantics:

  • the former relies on the NIC's ipVersion attribute
  • the latter decides by parsing the IP address string

Unify the check used by both methods for consistency; see the sketch after this comment.

Also applies to: 39-55
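
A minimal sketch of one way to unify the two overloads; the helper name is hypothetical, and it simply standardizes on the string-based check already used on line 45:

// Hypothetical helper: both getHostName overloads decide the IP version the same way.
private static boolean isIpv4(String nicIp) {
    return NetworkUtils.isIpv4Address(nicIp);
}
// In getHostName(VmInstanceInventory), replace the ipVersion comparison with isIpv4(nic.getIp());
// getHostName(VmInstanceVO) can call the same helper instead of NetworkUtils directly.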

compute/src/main/java/org/zstack/compute/vm/InstantiateVmFromNewCreatedStruct.java (1)

127-145: fromMessage(CreateVmInstanceMsg) is missing the sshKeyPairUuids initialization

In the fromMessage(CreateVmInstanceMsg msg) method (lines 148-165), CreateVmInstanceMsg has a getSshKeyPairUuids() method, but its value is never copied into the struct. Add this before returning:

         struct.setDisableL3Networks(msg.getDisableL3Networks());
         struct.setDiskAOs(msg.getDiskAOs());
+        struct.setSshKeyPairUuids(msg.getSshKeyPairUuids());
         return struct;

Note: InstantiateNewCreatedVmInstanceMsg does not have this field, so the first fromMessage method needs no change.

🟠 Major comments (47)
conf/db/upgrade/V4.7.0.2__schema.sql-10-10 (1)

10-10: Adding a NOT NULL column should use a stored procedure to handle historical data

Per the coding guidelines, a NOT NULL constraint requires a stored procedure (or similar) to handle historical data. The action column here has DEFAULT 'ACCEPT', so MySQL fills existing rows with the default automatically, but an explicit stored procedure is still recommended to keep the upgrade safe and controllable.
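
A minimal sketch of that pattern; the table name and varchar length are placeholders, since only the action column is visible in this excerpt:

DELIMITER $$
CREATE PROCEDURE addActionColumnIfNotExists()
BEGIN
    IF NOT EXISTS (SELECT 1 FROM information_schema.COLUMNS
                   WHERE TABLE_SCHEMA = 'zstack'
                     AND TABLE_NAME = 'YourRuleVO'   -- placeholder table name
                     AND COLUMN_NAME = 'action') THEN
        -- Add as nullable first, backfill historical rows explicitly, then tighten the constraint.
        ALTER TABLE `zstack`.`YourRuleVO` ADD COLUMN `action` varchar(32) DEFAULT NULL;
        UPDATE `zstack`.`YourRuleVO` SET `action` = 'ACCEPT' WHERE `action` IS NULL;
        ALTER TABLE `zstack`.`YourRuleVO` MODIFY COLUMN `action` varchar(32) NOT NULL DEFAULT 'ACCEPT';
    END IF;
END $$
DELIMITER ;

CALL addActionColumnIfNotExists();
DROP PROCEDURE IF EXISTS addActionColumnIfNotExists;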

conf/db/upgrade/V4.6.31__schema.sql-65-66 (1)

65-66: Chinese text in code, and line 66 has a wrong WHERE condition

Per the coding guidelines, code should not contain Chinese. Also, the WHERE clause on line 66 checks subject = '' while the column being updated is recoverySubject, which is very likely a copy-paste error.

🔎 Fix the WHERE condition
-UPDATE SNSTextTemplateVO SET recoverySubject = replace('报警器 %{ALARM_NAME} %{TITLE_ALARM_RESOURCE_NAME} %{ALARM_CURRENT_STATUS}','%','$') WHERE recoverySubject IS NULL or subject = '';
+UPDATE SNSTextTemplateVO SET recoverySubject = replace('报警器 %{ALARM_NAME} %{TITLE_ALARM_RESOURCE_NAME} %{ALARM_CURRENT_STATUS}','%','$') WHERE recoverySubject IS NULL or recoverySubject = '';

Regarding the Chinese text: per the coding guidelines it should be replaced with English. If this is user-visible template content, consider keeping it in i18n resource files instead. Based on coding guidelines, code must not contain Chinese; error messages, comments, and so on should be written in correct, typo-free English.

conf/db/upgrade/V5.3.6__schema.sql-29-41 (1)

29-41: SQL comments contain Chinese, violating the coding guidelines.

Same issues as the previous INSERT statement: the Chinese comments need to be replaced with English, and an idempotency check is missing.

🔎 Suggested fix
 INSERT INTO SystemTagVO
 (`uuid`, `resourceUuid`, `resourceType`, `inherent`, `type`, `tag`, `createDate`, `lastOpDate`)
 SELECT
-    REPLACE(UUID(), '-', ''),                  -- 生成不含连字符的uuid
-    uuid,                                       -- 使用DatasetVO的uuid作为resourceUuid
-    'DatasetVO',                               -- resourceType
-    1,                                         -- inherent
-    'System',                                  -- type
-    'dataset::datatype::Text',                 -- tag
-    CURRENT_TIMESTAMP(),                       -- createDate
-    CURRENT_TIMESTAMP()                        -- lastOpDate
+    REPLACE(UUID(), '-', ''),                  -- generate uuid without hyphens
+    uuid,                                       -- use DatasetVO uuid as resourceUuid
+    'DatasetVO',                               -- resourceType
+    1,                                         -- inherent
+    'System',                                  -- type
+    'dataset::datatype::Text',                 -- tag
+    CURRENT_TIMESTAMP(),                       -- createDate
+    CURRENT_TIMESTAMP()                        -- lastOpDate
 FROM DatasetVO
-WHERE `system` = true;
+WHERE `system` = true
+AND NOT EXISTS (
+    SELECT 1 FROM SystemTagVO st
+    WHERE st.resourceUuid = DatasetVO.uuid
+    AND st.resourceType = 'DatasetVO'
+    AND st.tag = 'dataset::datatype::Text'
+);
conf/db/upgrade/V5.3.6__schema.sql-15-27 (1)

15-27: SQL comments contain Chinese, violating the coding guidelines.

Per the coding guidelines, code must not contain Chinese, including comments. Replace all Chinese comments with English.

This INSERT statement also lacks an idempotency check. If the upgrade script runs twice, duplicate SystemTagVO records are inserted; add an existence check so the script can be safely re-run.

🔎 Suggested fix
 INSERT INTO SystemTagVO
 (`uuid`, `resourceUuid`, `resourceType`, `inherent`, `type`, `tag`, `createDate`, `lastOpDate`)
 SELECT
-    REPLACE(UUID(), '-', ''),                  -- 生成不含连字符的uuid
-    uuid,                                       -- 使用DatasetVO的uuid作为resourceUuid
-    'DatasetVO',                               -- resourceType
-    1,                                         -- inherent
-    'System',                                  -- type
-    'dataset::usage::scenarios::ModelEval',    -- tag
-    CURRENT_TIMESTAMP(),                       -- createDate
-    CURRENT_TIMESTAMP()                        -- lastOpDate
+    REPLACE(UUID(), '-', ''),                  -- generate uuid without hyphens
+    uuid,                                       -- use DatasetVO uuid as resourceUuid
+    'DatasetVO',                               -- resourceType
+    1,                                         -- inherent
+    'System',                                  -- type
+    'dataset::usage::scenarios::ModelEval',    -- tag
+    CURRENT_TIMESTAMP(),                       -- createDate
+    CURRENT_TIMESTAMP()                        -- lastOpDate
 FROM DatasetVO
-WHERE `system` = true;
+WHERE `system` = true
+AND NOT EXISTS (
+    SELECT 1 FROM SystemTagVO st
+    WHERE st.resourceUuid = DatasetVO.uuid
+    AND st.resourceType = 'DatasetVO'
+    AND st.tag = 'dataset::usage::scenarios::ModelEval'
+);
compute/src/main/java/org/zstack/compute/allocator/AttachedVolumePrimaryStorageAllocatorFlow.java-40-42 (1)

40-42: Make the error message accurate and avoid the duplicate call

The current implementation has these problems:

  1. Inaccurate error message: "cannot find root volume" only fits the case where the root volume is null; it is misleading when the root volume exists but is not Ready.
  2. Duplicate call: vm.getRootVolume() is called twice on line 40; cache the result.
  3. Type safety: comparing the volume status as a string works, but comparing enums directly is safer.
🔎 Suggested improvement
-        if (vm.getRootVolume() == null || !VolumeStatus.Ready.toString().equals(vm.getRootVolume().getStatus())) {
-            fail(Platform.operr("cannot find root volume of vm[uuid:%s]", vm.getUuid()));
+        VolumeInventory rootVolume = vm.getRootVolume();
+        if (rootVolume == null) {
+            fail(Platform.operr("cannot find root volume of vm[uuid:%s]", vm.getUuid()));
+        }
+        if (!VolumeStatus.Ready.toString().equals(rootVolume.getStatus())) {
+            fail(Platform.operr("root volume[uuid:%s] of vm[uuid:%s] is not ready, current status: %s", 
+                rootVolume.getUuid(), vm.getUuid(), rootVolume.getStatus()));
         }
compute/src/main/java/org/zstack/compute/vm/VmCpuVendor.java-4-5 (1)

4-5: Enum constants should use UPPER_SNAKE_CASE.

Per Java naming conventions and the coding guidelines, enum constants should be upper-case with underscores between words (UPPER_SNAKE_CASE), not CamelCase. None is also an unclear constant name; a more descriptive name such as UNKNOWN or UNSPECIFIED is preferable.

🔎 Suggested naming fix
 public enum VmCpuVendor {
-    None,
-    AuthenticAMD;
+    UNKNOWN,
+    AUTHENTIC_AMD;
 }

Per the coding guidelines: constant names must be all upper-case, use underscores to separate words, and be expressive rather than vague.

compute/src/main/java/org/zstack/compute/host/HostPowerExecutor.java-7-16 (1)

7-16: Javadoc is required.

Per the coding guidelines, interface methods must carry valid Javadoc. The interface and its methods currently have no documentation; add:

  • the overall purpose and responsibility of the interface
  • what each method does
  • parameter descriptions (especially the exact meaning of force and returnEarly)
  • return-value descriptions (e.g., what getPowerStatus returns)
  • possible exceptions

As required by the coding guidelines.

🔎 Suggested Javadoc example
 /**
- * @Author : jingwang
- * @create 2023/4/13 10:01 AM
+ * Host power management executor interface.
+ * Provides operations to control host power state including power on/off/reset and status query.
+ *
+ * @author jingwang
+ * @since 2023/4/13
  */
 public interface HostPowerExecutor {
+    /**
+     * Powers off the specified host.
+     *
+     * @param host the host to power off
+     * @param force whether to force power off without graceful shutdown
+     * @param completion callback invoked when operation completes
+     * @param returnEarly whether to return immediately without waiting for operation completion
+     */
     void powerOff(HostVO host, Boolean force, Completion completion, boolean returnEarly);
+    
+    /**
+     * Powers on the specified host.
+     *
+     * @param host the host to power on
+     * @param completion callback invoked when operation completes
+     */
     void powerOn(HostVO host, Completion completion);
+    
+    /**
+     * Resets (reboots) the specified host.
+     *
+     * @param host the host to reset
+     * @param completion callback invoked when operation completes
+     * @param returnEarly whether to return immediately without waiting for operation completion
+     */
     void powerReset(HostVO host, Completion completion, boolean returnEarly);
+    
+    /**
+     * Queries the current power status of the specified host.
+     *
+     * @param host the host to query
+     * @return the current power status
+     */
     HostPowerStatus getPowerStatus(HostVO host);
 }
conf/db/upgrade/V4.5.0__schema.sql-131-132 (1)

131-132: Timestamp columns lack NOT NULL and default values

The timestamp column definitions of the SSORedirectTemplateVO table are incomplete.

🔎 Suggested fix
-    `lastOpDate` timestamp ON UPDATE CURRENT_TIMESTAMP,
-    `createDate` timestamp,
+    `lastOpDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
+    `createDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
conf/db/upgrade/V4.4.46__schema.sql-2-25 (1)

2-25: Major issue: the script has no idempotency protection

This upgrade script is not idempotent. If it runs again for any reason (for example a retry after a failed upgrade), it tries to insert and update the same records a second time, causing duplicated data or failures.

Per ZStack best practices and prior learnings, database upgrade scripts must be safely re-runnable, verified through INFORMATION_SCHEMA checks.

🔎 Suggested fix

Add an execution-state check at the start of the stored procedure:

DELIMITER $$
CREATE PROCEDURE splitHybridLicenseRecords()
BEGIN
+    -- Check if migration has already been performed
+    IF EXISTS (SELECT 1 FROM `zstack`.`LicenseHistoryVO` 
+               WHERE `licenseType` = 'AddOn' AND `prodInfo` = 'hybrid') THEN
+        SELECT 'Migration already completed, skipping...' AS message;
+    ELSE
    IF EXISTS ( SELECT 1
                   FROM INFORMATION_SCHEMA.COLUMNS
                   WHERE table_name = 'LicenseHistoryVO'
                         AND table_schema = 'zstack'
                         AND column_name = 'capacity') THEN
        insert into `zstack`.`LicenseHistoryVO` (`uuid`, `cpuNum`, `hostNum`, `vmNum`, `expiredDate`, `issuedDate`, `uploadDate`, `licenseType`, `userName`, `prodInfo`, `createDate`, `lastOpDate`, `hash`, `source`, `managementNodeUuid`, `mergedTo`, `capacity`)
            select `uuid`, `cpuNum`, `hostNum`, `vmNum`, `expiredDate`, `issuedDate`, `uploadDate`, 'AddOn' as `licenseType`, `userName`, 'hybrid' as `prodInfo`, `createDate`, `lastOpDate`, `hash`, `source`, `managementNodeUuid`, `mergedTo`, `capacity`
            from `zstack`.`LicenseHistoryVO`
            where `licenseType`='Hybrid';
    ELSE
        insert into `zstack`.`LicenseHistoryVO` (`uuid`, `cpuNum`, `hostNum`, `vmNum`, `expiredDate`, `issuedDate`, `uploadDate`, `licenseType`, `userName`, `prodInfo`, `createDate`, `lastOpDate`, `hash`, `source`, `managementNodeUuid`, `mergedTo`)
            select `uuid`, `cpuNum`, `hostNum`, `vmNum`, `expiredDate`, `issuedDate`, `uploadDate`, 'AddOn' as `licenseType`, `userName`, 'hybrid' as `prodInfo`, `createDate`, `lastOpDate`, `hash`, `source`, `managementNodeUuid`, `mergedTo`
            from `zstack`.`LicenseHistoryVO`
            where `licenseType`='Hybrid';
    END IF;
    update `zstack`.`LicenseHistoryVO` set `licenseType`='Paid' where `licenseType`='Hybrid';
+    END IF;
    SELECT CURTIME();
END $$
DELIMITER ;

Based on learnings: ZStack database upgrade scripts need information_schema checks to ensure idempotency.

conf/db/upgrade/V5.3.46__schema.sql-4-33 (1)

4-33: Make the stored procedure idempotent to avoid data corruption on re-execution.

The stored procedure updates the resourceUuid and resourceType columns of AuditsVO directly, with no idempotency check. If the upgrade script runs again, records that were already migrated (whose resourceUuid is already the projectUuid) may be processed again and corrupted.

Per project experience, database upgrade scripts should use information_schema checks or another mechanism to guarantee they can be safely re-run.

🔎 Suggested idempotency improvement
 UPDATE AuditsVO a
     JOIN IAM2ProjectAccountRefVO i
 ON a.resourceUuid = i.accountUuid
     SET a.resourceUuid = i.projectUuid,
         a.resourceType = CASE
         WHEN a.resourceType = 'AccountVO' THEN 'IAM2ProjectVO'
         ELSE a.resourceType
 END
-WHERE a.apiName = 'org.zstack.header.identity.APIUpdateQuotaMsg';
+WHERE a.apiName = 'org.zstack.header.identity.APIUpdateQuotaMsg'
+  AND a.resourceType = 'AccountVO';

Adding the AND a.resourceType = 'AccountVO' condition ensures that only records not yet migrated are processed.

Based on project experience (learnings), database upgrade scripts need idempotency guarantees.

build/deploydb.sh-51-54 (1)

51-54: The else branch uses GRANT syntax removed in MySQL 8.0

The GRANT ... identified by syntax on lines 51-54 was removed in MySQL 8.0, so this script fails on MySQL 8.0+. The if branch on lines 30-42 already uses the correct pattern: create the user with CREATE USER IF NOT EXISTS ... IDENTIFIED BY, then GRANT the privileges. Update the else branch the same way for consistency.
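
A minimal sketch of that pattern; the user, host, and password here are placeholders rather than the script's actual values:

CREATE USER IF NOT EXISTS 'zstack'@'%' IDENTIFIED BY 'your-password';
GRANT ALL PRIVILEGES ON *.* TO 'zstack'@'%';
FLUSH PRIVILEGES;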

conf/db/upgrade/V4.6.11__schema.sql-1-1 (1)

1-1: Historical NULL values must be handled with a stored procedure.

When changing vSwitchType to NOT NULL, a stored procedure should first find and fix NULL values in existing data instead of relying on the default alone. Per the coding guidelines, adding a NOT NULL constraint requires a stored procedure (or similar) to handle historical data.

Use a stored procedure to update existing NULL values to 'LinuxBridge' first, then add the NOT NULL constraint.

🔎 Suggested fix
+DELIMITER $$
+CREATE PROCEDURE updateL2NetworkVSwitchType()
+    BEGIN
+        UPDATE `zstack`.`L2NetworkEO` SET `vSwitchType` = 'LinuxBridge' WHERE `vSwitchType` IS NULL;
+    END $$
+DELIMITER ;
+
+CALL updateL2NetworkVSwitchType();
+DROP PROCEDURE IF EXISTS updateL2NetworkVSwitchType;
+
 ALTER TABLE `zstack`.`L2NetworkEO` MODIFY `vSwitchType` varchar(32) NOT NULL DEFAULT 'LinuxBridge' AFTER `type`;
conf/db/upgrade/V4.5.1.1__schema.sql-1-2 (1)

1-2: Make the upgrade script idempotent.

An ALTER TABLE ADD COLUMN statement fails when the column already exists. Per ZStack upgrade-script best practice, check information_schema for the column so the script can be safely re-run. Wrap both ADD COLUMN operations in a stored procedure that checks information_schema.COLUMNS for the column before executing.

The varchar(128) limit on encryptSubjectDN also needs verification to ensure it is long enough for the X.509 Subject Distinguished Names the application actually uses. If the length cannot be guaranteed, switch to text per the coding guidelines.
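
A minimal sketch of the wrapping pattern; the table name and column definition passed to the call are placeholders, the real ones come from V4.5.1.1:

DELIMITER $$
CREATE PROCEDURE addColumnIfNotExists(IN tblName VARCHAR(64), IN colName VARCHAR(64), IN colDef VARCHAR(255))
BEGIN
    IF NOT EXISTS (SELECT 1 FROM information_schema.COLUMNS
                   WHERE TABLE_SCHEMA = 'zstack'
                     AND TABLE_NAME = tblName
                     AND COLUMN_NAME = colName) THEN
        SET @sql = CONCAT('ALTER TABLE `zstack`.`', tblName, '` ADD COLUMN `', colName, '` ', colDef);
        PREPARE stmt FROM @sql;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
    END IF;
END $$
DELIMITER ;

-- Placeholder arguments; substitute the real table and both columns from the script.
CALL addColumnIfNotExists('SomeSSOConfigVO', 'encryptSubjectDN', 'varchar(128) DEFAULT NULL');
DROP PROCEDURE IF EXISTS addColumnIfNotExists;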

compute/src/main/java/org/zstack/compute/host/PostHostExtensionPointForNuma.java-100-103 (1)

100-103: The same trace-logging performance issue

Same as lines 81-84: the trace logging here also runs O(m×n) times inside the predicate, with the same performance risk.

compute/src/main/java/org/zstack/compute/host/PostHostExtensionPointForNuma.java-81-84 (1)

81-84: Trace logging inside the predicate may hurt performance

Logging trace messages inside the noneMatch predicate lambda leads to O(m×n) log calls (m old nodes × n new nodes); even with trace disabled, the string formatting still runs.

Suggestions:

  • move the detailed comparison logging into the final forEach and log only the nodes that will actually be removed
  • or guard it: if (logger.isTraceEnabled()) { ... }
🔎 Optimized logging
                            oldHostNumaNodes.stream()
                                    .filter(oldNumaNode -> newHostNumaNodes.stream()
                                            .noneMatch(newNumaNode -> {
-                                                logger.trace(String.format("oldNumaNode.getNodeCPUs(): %s, newNumaNode.getNodeCPUs(): %s, equals: %s", oldNumaNode.getNodeCPUs(), newNumaNode.getNodeCPUs(), newNumaNode.getNodeCPUs().equals(oldNumaNode.getNodeCPUs())));
-                                                logger.trace(String.format("oldNumaNode.getNodeDistance(): %s, newNumaNode.getNodeDistance(): %s, equals: %s", oldNumaNode.getNodeDistance(), newNumaNode.getNodeDistance(), newNumaNode.getNodeDistance().equals(oldNumaNode.getNodeDistance())));
-                                                logger.trace(String.format("oldNumaNode.getNodeMemSize(): %s, newNumaNode.getNodeMemSize(): %s, equals: %s", oldNumaNode.getNodeMemSize(), newNumaNode.getNodeMemSize(), Objects.equals(newNumaNode.getNodeMemSize(), oldNumaNode.getNodeMemSize())));
-                                                logger.trace(String.format("oldNumaNode.getNodeID(): %s, newNumaNode.getNodeID(): %s, equals: %s", oldNumaNode.getNodeID(), newNumaNode.getNodeID(), Objects.equals(newNumaNode.getNodeID(), oldNumaNode.getNodeID())));
-
                                                return newNumaNode.getNodeCPUs().equals(oldNumaNode.getNodeCPUs())
                                                        && newNumaNode.getNodeDistance().equals(oldNumaNode.getNodeDistance())
                                                        && Objects.equals(newNumaNode.getNodeMemSize(), oldNumaNode.getNodeMemSize())
                                                        && Objects.equals(newNumaNode.getNodeID(), oldNumaNode.getNodeID());
                                            }))
                                    .forEach(oldNumaNode -> {
-                                        logger.trace("remove old numa node: " + JSONObjectUtil.toJsonString(oldNumaNode));
+                                        if (logger.isTraceEnabled()) {
+                                            logger.trace("remove old numa node: " + JSONObjectUtil.toJsonString(oldNumaNode));
+                                        }
                                        dbf.removeByPrimaryKey(oldNumaNode.getId(), HostNumaNodeVO.class);
                                    });
conf/db/upgrade/V4.7.0__schema.sql-233-237 (1)

233-237: addHaStrategyConditionOnVmHaLevel idempotency issue

Like addHostHaStatus, this procedure inserts into HaStrategyConditionVO without checking whether a hostStorageState record already exists, so a re-run can fail or create duplicate data.

conf/db/upgrade/V4.7.0__schema.sql-272-273 (1)

272-273: The stored procedure has no idempotency protection

The addHostHaStatus procedure runs its INSERT directly without checking whether the record already exists. If the upgrade script runs again, the uuid unique constraint makes it fail.

Based on retrieved learnings, ZStack upgrade scripts should use information_schema checks so they can be safely re-run.

🔎 Suggested fix
-            insert into HostHaStateVO(`uuid`, `state`,`lastOpDate`, `createDate`)values (hostUuid, 'None', current_time_stamp, current_time_stamp);
+            INSERT IGNORE INTO HostHaStateVO(`uuid`, `state`,`lastOpDate`, `createDate`)values (hostUuid, 'None', current_time_stamp, current_time_stamp);

Or check at the start of the procedure whether the table already contains data.

compute/src/main/java/org/zstack/compute/vm/devices/VmInstanceDeviceManagerImpl.java-239-260 (1)

239-260: Missing feature-flag check.

The new revertExistingDeviceAddressFromArchive is similar to the existing revertDeviceAddressFromArchive (lines 217-237) but omits the feature-flag check. The existing method checks deviceAddressRecordingDisabled() on line 218 and returns an empty list when the feature is disabled; the new method does not.

Device address records may therefore still be created while device-address recording is disabled, breaking the feature-flag contract.

🔎 Suggested feature-flag check
 @Override
 public List<VmInstanceDeviceAddressVO> revertExistingDeviceAddressFromArchive(String vmInstanceUuid, String archiveForResourceUuid) {
+    if (deviceAddressRecordingDisabled()) {
+        return Collections.emptyList();
+    }
+
     VmInstanceDeviceAddressGroupVO group = Q.New(VmInstanceDeviceAddressGroupVO.class)
             .eq(VmInstanceDeviceAddressGroupVO_.resourceUuid, archiveForResourceUuid)
             .find();
compute/src/main/java/org/zstack/compute/vm/devices/VmInstanceDeviceManagerImpl.java-133-144 (1)

133-144: Missing validation: device existence is not checked.

The new deleteVmDeviceAddress(String resourceUuid) deletes device address records directly without verifying that a device exists for the given resourceUuid. That is inconsistent with the two-argument overload (lines 115-131), which performs full parameter validation via checkParams.

When the resourceUuid does not exist, the SQL DELETE silently succeeds while deleting nothing, which can hide caller bugs. Suggestions:

  • if silently deleting a non-existent device is acceptable, document that behavior explicitly in a comment
  • if validation is required, call vmDeviceExists(resourceUuid) and return an error when the device is missing
🔎 Suggested validation logic
 @Override
 public ErrorCode deleteVmDeviceAddress(String resourceUuid) {
     if (resourceUuid == null) {
         return operr("missing parameter, resourceUuid is requested");
     }
+
+    if (!vmDeviceExists(resourceUuid) && !vmExists(resourceUuid)) {
+        return operr("cannot find vm device or vm with uuid: %s", resourceUuid);
+    }

     SQL.New(VmInstanceDeviceAddressVO.class)
             .eq(VmInstanceDeviceAddressVO_.resourceUuid, resourceUuid)
             .delete();

     return null;
 }

Committable suggestion skipped: line range outside the PR's diff.

compute/src/main/java/org/zstack/header/vm/SshKeyPairAssociateExtensionPoint.java-7-13 (1)

7-13: Javadoc for the interface methods is required.

Per the coding guidelines, interface methods must carry valid Javadoc. Add detailed documentation for the following three methods:

  • associateSshKeyPair: describe the association behavior, the parameters, and when an ErrorCode is returned
  • fetchAssociatedSshKeyPairs: describe the lookup logic and the return value (in particular whether null or an empty list is possible)
  • cloneSshKeyPairsToVm: describe the clone behavior and the parameters

Per the coding guidelines: interface methods should not carry redundant modifiers (such as public) and must have valid Javadoc.

🔎 Suggested Javadoc template
 public interface SshKeyPairAssociateExtensionPoint {
+    /**
+     * Associate SSH key pairs with a VM instance.
+     *
+     * @param vmUuid the UUID of the VM instance
+     * @param keyPairUuids the list of SSH key pair UUIDs to associate
+     * @return ErrorCode if association fails, null if successful
+     */
     ErrorCode associateSshKeyPair(String vmUuid, List<String> keyPairUuids);
 
+    /**
+     * Fetch all associated SSH key pairs for a VM instance.
+     *
+     * @param vmUuid the UUID of the VM instance
+     * @return list of SSH key pair UUIDs, never null (returns empty list if none found)
+     */
     List<String> fetchAssociatedSshKeyPairs(String vmUuid);
 
+    /**
+     * Clone SSH key pair associations from one VM to another.
+     *
+     * @param originVmUuid the UUID of the source VM
+     * @param destVmUuid the UUID of the destination VM
+     */
     void cloneSshKeyPairsToVm(String originVmUuid, String destVmUuid);
 }
compute/src/main/java/org/zstack/compute/vm/VmQuotaUtil.java-153-159 (1)

153-159: getRequiredMemory lacks null handling

Like getRequiredCpu, when the VM does not exist findValue() returns null, and this method passes it straight through, which can cause an NPE in callers.

Handle null the same way as getRequiredCpu.

🔎 Suggested fix
     @Transactional(readOnly = true)
     public Long getRequiredMemory(String vmInstanceUuid) {
-        return Q.New(VmInstanceVO.class)
+        Long memorySize = Q.New(VmInstanceVO.class)
                 .select(VmInstanceVO_.memorySize)
                 .eq(VmInstanceVO_.uuid, vmInstanceUuid)
                 .findValue();
+        return memorySize == null ? 0L : memorySize;
     }
compute/src/main/java/org/zstack/compute/vm/VmQuotaUtil.java-143-151 (1)

143-151: getRequiredCpu has a potential NPE

When no VM exists for vmInstanceUuid, findValue() returns null and the subsequent Integer.toUnsignedLong(cpuNum) throws a NullPointerException.

Add a null check, or document that callers must guarantee the VM exists.

🔎 Suggested fix
     @Transactional(readOnly = true)
     public Long getRequiredCpu(String vmInstanceUuid) {
         Integer cpuNum = Q.New(VmInstanceVO.class)
                 .select(VmInstanceVO_.cpuNum)
                 .eq(VmInstanceVO_.uuid, vmInstanceUuid)
                 .findValue();
 
+        if (cpuNum == null) {
+            return 0L;
+        }
         return Integer.toUnsignedLong(cpuNum);
     }
compute/src/main/java/org/zstack/compute/host/HostTrackImpl.java-38-40 (1)

38-40: The thread-safety improvement is good, but the static field has no configuration entry point.

Changing trackersHashMap to a ConcurrentHashMap improves thread safety and is a necessary change.

However, the static field alwaysStartRightNow is initialized to false and nothing appears to ever set it. If it controls test behavior (judging by its use on line 262), consider:

  1. adding a package-visible or test-visible setter
  2. or controlling the flag via a system property / global config
  3. adding a comment explaining the field's purpose and when it is set
🔎 Suggested configuration method
     private static boolean alwaysStartRightNow = false;
+
+    // Visible for testing
+    static void setAlwaysStartRightNow(boolean value) {
+        alwaysStartRightNow = value;
+    }
compute/src/main/java/org/zstack/compute/host/HostTrackImpl.java-399-415 (1)

399-415: Event handlers are never cleaned up.

onHostPingSkip() registers two event handlers (HOST_PING_SKIP and HOST_PING_CANCEL_SKIP), but stop() (lines 418-420) contains no matching de-registration.

Under ZStack component lifecycle management, leaving these handlers in place when the component stops can cause:

  1. memory leaks (the handlers keep references to the component)
  2. duplicate handler registration when the component restarts
  3. a stopped component that still reacts to events

Add handler de-registration to stop().

🔎 Suggested fix

If EventFacade supports de-registration, add the cleanup logic:

+    private EventCallback pingSkipCallback;
+    private EventCallback pingCancelSkipCallback;
+
     private void onHostPingSkip() {
-        evtf.on(HostCanonicalEvents.HOST_PING_SKIP, new EventCallback() {
+        pingSkipCallback = new EventCallback() {
             @Override
             protected void run(Map tokens, Object data) {
                 HostCanonicalEvents.HostPingSkipData d = (HostCanonicalEvents.HostPingSkipData) data;
                 Long deadline = System.currentTimeMillis() / 1000 + d.getSkipTimeInSec();
                 skippedPingHostDeadline.put(d.getHostUuid(), deadline);
             }
-        });
-        evtf.on(HostCanonicalEvents.HOST_PING_CANCEL_SKIP, new EventCallback() {
+        };
+        evtf.on(HostCanonicalEvents.HOST_PING_SKIP, pingSkipCallback);
+
+        pingCancelSkipCallback = new EventCallback() {
             @Override
             protected void run(Map tokens, Object data) {
                 HostCanonicalEvents.HostPingSkipData d = (HostCanonicalEvents.HostPingSkipData) data;
                 skippedPingHostDeadline.invalidate(d.getHostUuid());
             }
-        });
+        };
+        evtf.on(HostCanonicalEvents.HOST_PING_CANCEL_SKIP, pingCancelSkipCallback);
     }

     @Override
     public boolean stop() {
+        // Clean up event handlers
+        if (pingSkipCallback != null) {
+            evtf.off(HostCanonicalEvents.HOST_PING_SKIP, pingSkipCallback);
+        }
+        if (pingCancelSkipCallback != null) {
+            evtf.off(HostCanonicalEvents.HOST_PING_CANCEL_SKIP, pingCancelSkipCallback);
+        }
         return true;
     }

Note: if EventFacade has no off method or uses a different cleanup mechanism, adapt the implementation accordingly.

Committable suggestion skipped: line range outside the PR's diff.

compute/src/main/java/org/zstack/compute/host/HostTrackImpl.java-226-238 (1)

226-238: The default HostReconnectTaskFactory.createTaskWithLastConnectError implementation discards the error context.

Verification confirms the issue. The default interface implementation explicitly ignores the errorCode parameter and simply returns createTask(uuid, completion). KVMHostReconnectTaskFactory overrides it but only handles one error type (HOST_PASSWORD_HAS_BEEN_CHANGED); all other errors still fall through to the parent method, and the error information is lost.

Suggestions:

  1. handle the error context in every HostReconnectTaskFactory implementation so the information is not lost
  2. log error types that are not specifically handled inside createTaskWithLastConnectError to aid diagnosis (see the sketch below)
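
A minimal sketch of suggestion 2; the signature, the completion type, and the logger are assumptions based on the description above, not the real interface definition:

// Hypothetical default-method shape; 'logger' is assumed to be a static CLogger on the interface.
default HostReconnectTask createTaskWithLastConnectError(String uuid, ErrorCode errorCode,
                                                         NoErrorCompletion completion) {
    if (errorCode != null) {
        // Record error types no factory handles specifically, instead of silently dropping them.
        logger.debug(String.format("host[uuid:%s] reconnects with unhandled last connect error: %s",
                uuid, errorCode.getDetails()));
    }
    return createTask(uuid, completion);
}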
compute/src/main/java/org/zstack/compute/allocator/HostCapacityReserveManagerImpl.java-72-110 (1)

72-110: Confusing use of the hypervisorType field

The implementation of findReservedCapacityByHypervisorType() is confusing by design:

  1. Lines 73-76: check whether finder.hypervisorType is in the extension-point map and return early if it is not
  2. Lines 85-91: query the hosts from the database and group them by their actual hypervisorType (read from the database)
  3. Lines 97-109: iterate over all grouped hypervisor types and invoke the matching extension points

The problem: finder.hypervisorType appears to be used only for the early-return check on lines 73-76 and for logging (lines 94-95, 103-105), while the actual processing iterates over every hypervisor type found in the database rather than only the one specified by finder.hypervisorType.

As a result:

  • if the caller already filtered by hypervisorType in the query (as getReservedHostCapacityByZones and getReservedHostCapacityByClusters do), the finder.hypervisorType field is redundant
  • if the caller did not filter (as getReservedHostCapacityByHosts), capacity for several hypervisor types may be returned

Clarify what the hypervisorType field is for: filtering, or only logging?

compute/src/main/java/org/zstack/compute/host/HostIpmiPowerExecutor.java-184-209 (1)

184-209: Fix the raw generic type usage

Several return new Pair(...) statements use the raw type instead of the generic type. Use new Pair<>(...) or new Pair<HostPowerStatus, ErrorCode>(...) for type safety.

🔎 Suggested fix
     public static Pair<HostPowerStatus, ErrorCode> getPowerStatusWithErrorCode(HostIpmiVO ipmi) {
         if (mockedPowerStatus != null) {
-            return new Pair(mockedPowerStatus, null);
+            return new Pair<>(mockedPowerStatus, null);
         }
 
         if (isIpmiUnConfigured(ipmi)) {
-            return new Pair(HostPowerStatus.UN_CONFIGURED,
+            return new Pair<>(HostPowerStatus.UN_CONFIGURED,
                     operr("ipmi information is not complete."));
         }
 
         ShellResult rst = IPMIToolCaller.fromHostIpmiVo(ipmi).status();
         if (rst.getRetCode() == 0) {
             if (rst.getStdout().trim().equals("Chassis Power is on")) {
-                return new Pair(HostPowerStatus.POWER_ON, null);
+                return new Pair<>(HostPowerStatus.POWER_ON, null);
             } else if (rst.getStdout().trim().equals("Chassis Power is off")) {
-                return new Pair(HostPowerStatus.POWER_OFF, null);
+                return new Pair<>(HostPowerStatus.POWER_OFF, null);
             } else {
-                return new Pair(HostPowerStatus.POWER_UNKNOWN, operr("host[%s] got unexpected return value", ipmi.getUuid()));
+                return new Pair<>(HostPowerStatus.POWER_UNKNOWN, operr("host[%s] got unexpected return value", ipmi.getUuid()));
             }
         } else {
-            return new Pair(HostPowerStatus.POWER_UNKNOWN, operr("host[%s] can not connect ipmi[%s], because:%s",
+            return new Pair<>(HostPowerStatus.POWER_UNKNOWN, operr("host[%s] can not connect ipmi[%s], because:%s",
                     ipmi.getUuid(),
                     ipmi.getIpmiAddress(),
                     rst.getStderr()));
         }
     }
compute/src/main/java/org/zstack/compute/host/HostIpmiPowerExecutor.java-236-242 (1)

236-242: Encapsulation violation: the IPMIToolCaller fields should be private

All fields of the inner class IPMIToolCaller (hostname, port, username, password, passFile) are declared public, which breaks encapsulation. The password field in particular is a security risk when exposed publicly.

Make these fields private; they are already initialized through the factory methods fromHostIpmiVo and fromHostVo, so nothing outside needs direct access.

🔎 Suggested fix
     protected static class IPMIToolCaller {
         private final String interfaceToUse = "lanplus";
-        public String hostname;
-        public int port;
-        public String username;
-        public String password;
-        public String passFile;
+        private String hostname;
+        private int port;
+        private String username;
+        private String password;
+        private String passFile;

Committable suggestion skipped: line range outside the PR's diff.

conf/db/upgrade/V4.3.35__schema.sql-1-8 (1)

1-8: The upgrade script has no idempotency guarantee.

The script runs DROP FOREIGN KEY and ADD CONSTRAINT directly, without IF EXISTS-style checks. If the upgrade is interrupted and re-run, it fails because the foreign key no longer exists.

Per the learnings, ZStack upgrade scripts need information_schema checks for constraint existence so they can be safely re-run.

🔎 Suggested fix
-ALTER TABLE `zstack`.`HostNumaNodeVO` DROP FOREIGN KEY `HostNumaNodeVO_HostEO_uuid_fk`;
-ALTER TABLE `zstack`.`HostNumaNodeVO` ADD CONSTRAINT `HostNumaNodeVO_HostEO_uuid_fk` FOREIGN KEY (`hostUuid`) REFERENCES `HostEO` (`uuid`) ON DELETE CASCADE ;
+-- Drop FK only if exists
+SET @exist := (SELECT COUNT(*) FROM information_schema.TABLE_CONSTRAINTS 
+    WHERE CONSTRAINT_SCHEMA = 'zstack' 
+    AND TABLE_NAME = 'HostNumaNodeVO' 
+    AND CONSTRAINT_NAME = 'HostNumaNodeVO_HostEO_uuid_fk' 
+    AND CONSTRAINT_TYPE = 'FOREIGN KEY');
+SET @sqlstmt := IF(@exist > 0, 'ALTER TABLE `zstack`.`HostNumaNodeVO` DROP FOREIGN KEY `HostNumaNodeVO_HostEO_uuid_fk`', 'SELECT ''FK does not exist''');
+PREPARE stmt FROM @sqlstmt;
+EXECUTE stmt;
+
+-- Add FK only if not exists
+SET @exist := (SELECT COUNT(*) FROM information_schema.TABLE_CONSTRAINTS 
+    WHERE CONSTRAINT_SCHEMA = 'zstack' 
+    AND TABLE_NAME = 'HostNumaNodeVO' 
+    AND CONSTRAINT_NAME = 'HostNumaNodeVO_HostEO_uuid_fk' 
+    AND CONSTRAINT_TYPE = 'FOREIGN KEY');
+SET @sqlstmt := IF(@exist = 0, 'ALTER TABLE `zstack`.`HostNumaNodeVO` ADD CONSTRAINT `HostNumaNodeVO_HostEO_uuid_fk` FOREIGN KEY (`hostUuid`) REFERENCES `HostEO` (`uuid`) ON DELETE CASCADE', 'SELECT ''FK already exists''');
+PREPARE stmt FROM @sqlstmt;
+EXECUTE stmt;

Apply the same pattern to the other two foreign keys.

conf/db/upgrade/V4.6.21__schema.sql-102-107 (1)

102-107: The table rename lacks an idempotency check.

Per existing learnings, a bare RENAME TABLE or ALTER TABLE ... RENAME AS is not idempotent. Check information_schema.tables for table existence so the upgrade script can be safely re-run.

🔎 Suggested change
-- Check if source table exists and target doesn't before renaming
DROP PROCEDURE IF EXISTS renameScsiLunVOToLunVO;
DELIMITER $$
CREATE PROCEDURE renameScsiLunVOToLunVO()
BEGIN
    IF EXISTS (SELECT 1 FROM information_schema.tables WHERE table_schema = 'zstack' AND table_name = 'ScsiLunVO')
       AND NOT EXISTS (SELECT 1 FROM information_schema.tables WHERE table_schema = 'zstack' AND table_name = 'LunVO') THEN
        ALTER TABLE `ScsiLunVO` RENAME AS LunVO;
    END IF;
END $$
DELIMITER ;
CALL renameScsiLunVOToLunVO();
DROP PROCEDURE IF EXISTS renameScsiLunVOToLunVO;

# for compatibility - also check if view already exists
DROP VIEW IF EXISTS `ScsiLunVO`;
CREATE VIEW `ScsiLunVO` AS SELECT uuid, name, wwid, vendor, model, wwn, serial, type, hctl, path, size, state, source, multipathDeviceUuid, createDate, lastOpDate FROM `LunVO` WHERE source IN ('iSCSI', 'fiberChannel');
abstraction/src/main/java/org/zstack/abstraction/sns/EndpointDriver.java-9-11 (1)

9-11: Add Javadoc for the interface method

Per the coding guidelines, interface methods must carry valid Javadoc. The send method currently has no documentation, in particular:

  • the purpose of and constraints on the message parameter
  • the meaning of the boolean return value (success/failure?)
  • possible exceptions
🔎 Suggested addition
 public interface EndpointDriver extends PluginDriver {
+    /**
+     * Send a message to the endpoint.
+     *
+     * @param message the endpoint data to send
+     * @return true if the message was sent successfully, false otherwise
+     */
     boolean send(PluginEndpointData message);
 }

As required by the coding guidelines.

abstraction/src/main/java/org/zstack/abstraction/InvalidPluginDefinitionException.java-3-4 (1)

3-4: Add standard constructors to the exception class

The exception class is completely empty and has no standard constructors, so callers cannot pass a meaningful message or a cause chain. Add at least:

  • a no-arg constructor
  • a message constructor
  • a message-and-cause constructor
🔎 Suggested addition
 public class InvalidPluginDefinitionException extends RuntimeException {
+    public InvalidPluginDefinitionException() {
+        super();
+    }
+
+    public InvalidPluginDefinitionException(String message) {
+        super(message);
+    }
+
+    public InvalidPluginDefinitionException(String message, Throwable cause) {
+        super(message, cause);
+    }
+
+    public InvalidPluginDefinitionException(Throwable cause) {
+        super(cause);
+    }
 }

Per Java exception-class best practices.

abstraction/src/main/java/org/zstack/abstraction/PluginMethod.java-3-5 (1)

3-5: Missing @Retention and @Target meta-annotations

The annotation lacks key meta-annotations:

  • if it must be readable via reflection at runtime (as plugin systems usually require), add @Retention(RetentionPolicy.RUNTIME)
  • add @Target to restrict where it can be used (e.g. ElementType.METHOD)
🔎 Suggested fix
 package org.zstack.abstraction;
 
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+
+@Retention(RetentionPolicy.RUNTIME)
+@Target(ElementType.METHOD)
 public @interface PluginMethod {
     boolean required() default true;
 }
compute/src/main/java/org/zstack/compute/allocator/QuotaAllocatorFlow.java-35-51 (1)

35-51: Coding-guideline violation: new code uses TypedQuery

Per the coding guidelines, avoid TypedQuery in new code; use the project-preferred Q.New() or SimpleQuery instead.

Line 49 is also missing a return after next(candidates), so control flow continues into throwExceptionIfIAmTheFirstFlow(), which can cause unexpected behavior in unit-test mode.

🔎 Suggestion: use Q.New() and add a return
         if (CoreGlobalProperty.UNIT_TEST_ON) {
             if (candidates == null || candidates.isEmpty()) {
-                String sql = "select h from HostVO h where h.clusterUuid in (:cuuids)";
-                Set<String> clusterUuids = new HashSet<>();
-                clusterUuids.add(spec.getVmInstance().getClusterUuid());
-
-                TypedQuery<HostVO> hq = dbf.getEntityManager().createQuery(sql, HostVO.class);
-                hq.setParameter("cuuids", clusterUuids);
-                if (usePagination()) {
-                    hq.setFirstResult(paginationInfo.getOffset());
-                    hq.setMaxResults(paginationInfo.getLimit());
-                }
-                candidates = hq.getResultList();
+                SimpleQuery<HostVO> q = dbf.createQuery(HostVO.class);
+                q.add(HostVO_.clusterUuid, SimpleQuery.Op.EQ, spec.getVmInstance().getClusterUuid());
+                if (usePagination()) {
+                    q.setStart(paginationInfo.getOffset());
+                    q.setLimit(paginationInfo.getLimit());
+                }
+                candidates = q.list();
                 logger.debug(String.format("vm clusteruuid:%s, candidates size:%s", spec.getVmInstance().getClusterUuid(), candidates.size()));
                 next(candidates);
+                return;
             }
         }
compute/src/main/java/org/zstack/compute/vm/NewVmInstanceMsgBuilder.java-51-52 (1)

51-52: Remove the commented-out code

The old implementation left behind as comments should be deleted to keep the code clean.

🔎 Suggested removal of the commented-out code
             nicSpecs.add(vmNicSpec);
-//            nicSpecs.add(new VmNicSpec(l3Invs,
-//                    vmNicParms != null && !vmNicParms.isEmpty() ? vmNicParms.get(i) : null));
         }

As per coding guidelines, commented-out code should be deleted to keep the code clean.

compute/src/main/java/org/zstack/compute/host/HostManagerImpl.java-219-229 (1)

219-229: Problematic extension-point callback handling

deleteHostNetworkLabel and updateHostNetworkLabel iterate over extension points with safeForEach, but every extension point receives the same Completion object. With multiple extension points, once the first one calls completion.success() or completion.fail(), the callbacks of the remaining ones become no-ops or behave unexpectedly.

Use chained invocation or the While pattern to handle the asynchronous callbacks of multiple extension points correctly.

🔎 Suggested approach
private void deleteHostNetworkLabel(APIDeleteHostNetworkServiceTypeMsg msg, Completion completion) {
    HostNetworkLabelVO vo = dbf.findByUuid(msg.getUuid(), HostNetworkLabelVO.class);
    List<HostNetworkLabelExtensionPoint> exts = pluginRgty.getExtensionList(HostNetworkLabelExtensionPoint.class);
    if (exts.isEmpty()) {
        completion.success();
        return;
    }
    
    new While<>(exts).each((ext, whileCompletion) -> {
        ext.deleteHostNetworkLabel(vo.toInventory(), new Completion(whileCompletion) {
            @Override
            public void success() {
                whileCompletion.done();
            }
            
            @Override
            public void fail(ErrorCode errorCode) {
                whileCompletion.addError(errorCode);
                whileCompletion.done();
            }
        });
    }).run(new WhileDoneCompletion(completion) {
        @Override
        public void done(ErrorCodeList errorCodeList) {
            if (!errorCodeList.getCauses().isEmpty()) {
                completion.fail(errorCodeList.getCauses().get(0));
            } else {
                completion.success();
            }
        }
    });
}
compute/src/main/java/org/zstack/compute/allocator/DesignatedHostAllocatorFlow.java-35-37 (1)

35-37: SQL injection risk

Concatenating SQL directly via String.join carries an SQL injection risk. If clusterUuids comes from external input and contains malicious data, it can cause a security problem. Use parameterized queries.

🔎 Suggestion: use parameterized queries
-        if (!CollectionUtils.isEmpty(clusterUuids)) {
-            sql.append(String.format("h.clusterUuid in ('%s') and ", String.join("','", clusterUuids)));
-        }
+        // Move to parameterized query approach with TypedQuery.setParameter

Consider refactoring the whole method to use parameterized queries instead of string concatenation, or at least validate that the values in clusterUuids are well-formed UUIDs.

Committable suggestion skipped: line range outside the PR's diff.
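
A minimal sketch of the parameterized form referred to in the diff comment above; the query text is illustrative rather than the method's real JPQL:

// Illustrative only: bind clusterUuids as a parameter instead of joining it into the SQL string.
String jpql = "select h from HostVO h where h.clusterUuid in (:cuuids)";
TypedQuery<HostVO> q = dbf.getEntityManager().createQuery(jpql, HostVO.class);
q.setParameter("cuuids", clusterUuids);
List<HostVO> candidates = q.getResultList();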

compute/src/main/java/org/zstack/compute/vm/CpuTopology.java-31-44 (1)

31-44: The division can throw divide-by-zero or produce zero

When cpuCores or cpuThreads is 0, the division throws an ArithmeticException. And if cpuNum is smaller than the divisor, the result is 0, which is probably not the intended behavior.

Validate these values before computing:

🔎 Suggested fix

Add validation at the start of calculateValidTopology:

if ((cpuSockets != null && cpuSockets <= 0) ||
    (cpuCores != null && cpuCores <= 0) ||
    (cpuThreads != null && cpuThreads <= 0)) {
    String errMsg = "cpu topology values must be positive";
    if (throwException) {
        throw new OperationFailureException(operr(errMsg));
    }
    return errMsg;
}
compute/src/main/java/org/zstack/compute/vm/VmInstantiateOtherDiskFlow.java-150-163 (1)

150-163: Disk size used during rollback may be incorrect

On line 157 the rollback logic releases primary storage space using diskAO.getSize(). However, when the volume is created via diskOfferingUuid (lines 80-84), the actually allocated size is the size queried from DiskOfferingVO, not diskAO.getSize() (which may be 0 at that point).

As a result, the released space does not match the size that was actually allocated.

🔎 Suggested fix

Store the allocated size in a shared variable and use it during rollback:

+            Long allocatedDiskSize;

             private void setupCreateVolumeFromDiskSizeFlows(Long diskSize) {
+                allocatedDiskSize = diskSize;
                 flow(new Flow() {
                     ...
                     @Override
                     public void rollback(FlowRollback chain, Map data) {
                         if (!isSuccessAllocatePS) {
                             chain.rollback();
                             return;
                         }
                         ReleasePrimaryStorageSpaceMsg msg = new ReleasePrimaryStorageSpaceMsg();
-                        msg.setDiskSize(diskAO.getSize());
+                        msg.setDiskSize(allocatedDiskSize);
                         ...
                     }
                 });

Committable suggestion skipped: line range outside the PR's diff.

abstraction/src/main/java/org/zstack/abstraction/sso/OAuth2ProviderDriver.java-27-36 (1)

27-36: Misspelled constant name should be fixed

"OCDE" in the constant name OAuth2PluginConstants.GRANT_TYPE_OCDE_VALUE is a typo and should be GRANT_TYPE_CODE_VALUE. The constant is defined on line 20 of OAuth2PluginConstants.java as public static final String GRANT_TYPE_OCDE_VALUE = "authorization_code"; rename it to GRANT_TYPE_CODE_VALUE so it follows the naming convention of constants such as GRANT_TYPE_PASSWORD_VALUE.

compute/src/main/java/org/zstack/compute/vm/MacOperator.java-174-178 (1)

174-178: A negative deviceId may produce an invalid MAC address

When deviceId is a negative short value, Integer.toHexString(deviceId) produces an 8-digit hexadecimal string prefixed with ffff (for example, -1 becomes "ffffffff"), so the generated MAC address is malformed.

Add a range check or handle the value as unsigned:

🔎 Suggested fix
-        String deviceIdStr = Integer.toHexString(deviceId);
+        String deviceIdStr = Integer.toHexString(deviceId & 0xFF);
         if (deviceIdStr.length() < 2) {
             deviceIdStr = "0" + deviceIdStr;
+        } else if (deviceIdStr.length() > 2) {
+            deviceIdStr = deviceIdStr.substring(deviceIdStr.length() - 2);
         }
compute/src/main/java/org/zstack/compute/vm/CpuTopology.java-13-19 (1)

13-19: Constructor lacks input validation and may throw NumberFormatException

When a string argument is not a valid number (e.g., an empty string or non-numeric characters), Integer.valueOf() throws a NumberFormatException. Add input validation or exception handling.

🔎 Suggested fix
     public CpuTopology(int cpuNum, String cpuSockets, String cpuCores, String cpuThreads) {
         this.cpuNum = cpuNum;
 
-        this.cpuSockets = cpuSockets == null ? null : Integer.valueOf(cpuSockets);
-        this.cpuCores = cpuCores == null ? null : Integer.valueOf(cpuCores);
-        this.cpuThreads = cpuThreads == null ? null : Integer.valueOf(cpuThreads);
+        this.cpuSockets = parseIntOrNull(cpuSockets);
+        this.cpuCores = parseIntOrNull(cpuCores);
+        this.cpuThreads = parseIntOrNull(cpuThreads);
+    }
+
+    private static Integer parseIntOrNull(String value) {
+        if (value == null || value.trim().isEmpty()) {
+            return null;
+        }
+        try {
+            return Integer.valueOf(value.trim());
+        } catch (NumberFormatException e) {
+            return null;
+        }
     }

Committable suggestion skipped: line range outside the PR's diff.

compute/src/main/java/org/zstack/compute/vm/VmInstanceUtils.java-166-175 (1)

166-175: Potential NullPointerException

Line 167 queries InstanceOfferingVO and then calls ivo.getState() and ivo.getType() directly, but if no record exists for the given instanceOfferingUuid, Q.New(...).find() returns null, causing an NPE.

Add a null check before using the result:

🔎 Suggested fix
        // InstanceOffering takes precedence over CPU/memory settings.
        InstanceOfferingVO ivo = Q.New(InstanceOfferingVO.class).eq(InstanceOfferingVO_.uuid, instanceOfferingUuid).find();

+       if (ivo == null) {
+           throw new ApiMessageInterceptionException(operr("instance offering[uuid:%s] not found", instanceOfferingUuid));
+       }
+
        if (ivo.getState() == InstanceOfferingState.Disabled) {
            throw new ApiMessageInterceptionException(operr("instance offering[uuid:%s] is Disabled, can't create vm from it", instanceOfferingUuid));
        }
compute/src/main/java/org/zstack/compute/vm/VmInstanceManagerImpl.java-2801-2811 (1)

2801-2811: The condition for cleaning up VM_STATE_PAUSED_AFTER_MIGRATE after migration appears inverted

Current implementation:

if (!Objects.equals(d.getOldState(), VmInstanceState.Migrating.toString())
        && Objects.equals(d.getNewState(), VmInstanceState.Running.toString())) {
    VmSystemTags.VM_STATE_PAUSED_AFTER_MIGRATE.delete(d.getVmUuid());
}

This deletes the tag whenever the old state is not Migrating and the new state is Running, which means:

  • Stopped -> Running, Unknown -> Running, etc. all delete the tag;
  • only Migrating -> Running does not, which contradicts the method name "deleteMigrateSystemTagWhenVmStateChangedToRunning" and the intended semantics.

A more reasonable condition is to clean up only on a Migrating -> Running transition:

Suggested change
-                if (!Objects.equals(d.getOldState(), VmInstanceState.Migrating.toString()) && Objects.equals(d.getNewState(), VmInstanceState.Running.toString())) {
+                if (Objects.equals(d.getOldState(), VmInstanceState.Migrating.toString())
+                        && Objects.equals(d.getNewState(), VmInstanceState.Running.toString())) {
                     VmSystemTags.VM_STATE_PAUSED_AFTER_MIGRATE.delete(d.getVmUuid());
                 }
compute/src/main/java/org/zstack/compute/vm/VmInstanceManagerImpl.java-1145-1167 (1)

1145-1167: Avoid calling add on Collections.emptyList(), which throws at runtime

Both extEmitterHandleSystemTag() and extEmitterHandleSshKeyPair() initialize result with Collections.emptyList() and then call result.add(...) in the msg == null branch; as soon as that branch is hit, an UnsupportedOperationException is thrown.

Although the current call sites always pass a non-null msg, this is clearly error-prone and easy to trip over in a later refactor; return a new list or a singleton list instead.

Suggested change
-    private List<ErrorCode> extEmitterHandleSystemTag(final CreateVmInstanceMsg msg, final APICreateMessage cmsg, VmInstanceVO finalVo) {
-        List<ErrorCode> result = Collections.emptyList();
-        if (msg == null) {
-            result.add(operr("CreateVmInstanceMsg cannot be null"));
-            return result;
-        } else if (cmsg != null && cmsg.getSystemTags() != null && !cmsg.getSystemTags().isEmpty()) {
-            return extEmitter.handleSystemTag(finalVo.getUuid(), cmsg.getSystemTags());
-        } else if (cmsg == null && msg.getSystemTags() != null && !msg.getSystemTags().isEmpty()) {
-            return extEmitter.handleSystemTag(finalVo.getUuid(), msg.getSystemTags());
-        }
-        return result;
-    }
+    private List<ErrorCode> extEmitterHandleSystemTag(final CreateVmInstanceMsg msg, final APICreateMessage cmsg, VmInstanceVO finalVo) {
+        if (msg == null) {
+            return Collections.singletonList(operr("CreateVmInstanceMsg cannot be null"));
+        }
+        if (cmsg != null && cmsg.getSystemTags() != null && !cmsg.getSystemTags().isEmpty()) {
+            return extEmitter.handleSystemTag(finalVo.getUuid(), cmsg.getSystemTags());
+        }
+        if (cmsg == null && msg.getSystemTags() != null && !msg.getSystemTags().isEmpty()) {
+            return extEmitter.handleSystemTag(finalVo.getUuid(), msg.getSystemTags());
+        }
+        return Collections.emptyList();
+    }

Likewise, extEmitterHandleSshKeyPair() should return a singleton list or throw a more explicit exception in the msg == null branch instead of calling add on an empty list.

conf/db/upgrade/V5.1.0__schema.sql-19-19 (1)

19-19: lastFailedReason is declared NOT NULL with no default and no handling of historical data

Per the coding guidelines, a NOT NULL constraint requires a stored procedure or similar routine to handle historical data. Newly created task records may not yet have a failure reason.

Change it to DEFAULT '' or allow NULL:

-    `lastFailedReason` varchar(1024) NOT NULL,
+    `lastFailedReason` varchar(1024) DEFAULT NULL,
conf/db/upgrade/V5.3.28__schema.sql-141-152 (1)

141-152: The table structure change needs data migration logic, not a bare DROP TABLE

The HostHaStateVO table structure changes fundamentally from V4.7.0 to V5.3.28 (the primary key changes from uuid to id, columns change, and a foreign key is added); a plain DROP TABLE discards all pre-upgrade data. Add data migration logic, such as a stored procedure that converts the old rows and copies them into the new table, or at least use CREATE TABLE IF NOT EXISTS to check whether the table already exists; a rough sketch follows below.

Also, on line 148, DEFAULT '0000-00-00 00:00:00' should be changed to DEFAULT CURRENT_TIMESTAMP to comply with the coding guidelines.
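A minimal sketch of a migration-friendly alternative; the column names used here (hostUuid, haState, createDate) and the HostEO foreign-key target are assumptions and must be checked against the real V4.7.0 and V5.3.28 definitions before use:

-- Create the new structure only if it does not exist yet (hypothetical columns).
CREATE TABLE IF NOT EXISTS `zstack`.`HostHaStateVONew` (
    `id` bigint unsigned NOT NULL AUTO_INCREMENT,
    `hostUuid` varchar(32) NOT NULL,
    `haState` varchar(64) DEFAULT NULL,
    `createDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (`id`),
    CONSTRAINT `fkHostHaStateVONewHostEO` FOREIGN KEY (`hostUuid`) REFERENCES `HostEO` (`uuid`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Copy whatever still maps from the old table before dropping it.
INSERT INTO `zstack`.`HostHaStateVONew` (`hostUuid`, `haState`, `createDate`)
SELECT `hostUuid`, `haState`, `createDate` FROM `zstack`.`HostHaStateVO`;

DROP TABLE `zstack`.`HostHaStateVO`;
RENAME TABLE `zstack`.`HostHaStateVONew` TO `zstack`.`HostHaStateVO`;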

Comment on lines +56 to +59
if (autoSetClusterResourceConfigs.containsKey(architecture)) {
logger.debug(String.format("The [architecture:%s] has already configured by other class, do not do this", architecture));
return;
}

⚠️ Potential issue | 🔴 Critical

Critical logic error: the return statement exits the whole method prematurely

When an architecture is found to be already configured, return immediately exits the entire initClusterResourceConfig() method, so the remaining ClusterArchitectureResourceConfig subclasses are never processed. Use continue to skip only the current iteration.

🔎 Suggested fix
                 if (autoSetClusterResourceConfigs.containsKey(architecture)) {
                     logger.debug(String.format("The [architecture:%s] has already configured by other class, do not do this", architecture));
-                    return;
+                    continue;
                 }
🤖 Prompt for AI Agents
In
compute/src/main/java/org/zstack/compute/cluster/arch/ClusterResourceConfigInitializerImpl.java
around lines 56 to 59, the code uses "return" when an architecture is already
configured which prematurely exits the entire initClusterResourceConfig() method
and prevents processing remaining ClusterArchitectureResourceConfig
implementations; change the control flow to use "continue" to skip only the
current iteration so subsequent config classes are still handled, ensuring the
loop iterates over all implementations while leaving the existing debug log
intact.

Comment on lines +90 to +91
detachDataVolume(hinvs);


⚠️ Potential issue | 🔴 Critical

Race condition risk: volume detach and host deletion run concurrently.

detachDataVolume(hinvs) executes asynchronously (it uses While + CloudBusCallBack internally), yet the host deletion flow (lines 92-123) continues immediately after the call without waiting for the detach to complete. This can lead to:

  • host deletion messages being sent before the volumes have finished detaching
  • resource conflicts or incomplete cleanup
  • failed deletions or inconsistent state
🔎 Suggested fix

Make detachDataVolume wait synchronously for completion, or move the host deletion logic into detachDataVolume's completion callback:

-        detachDataVolume(hinvs);
-
-        List<HostDeletionMsg> msgs = new ArrayList<HostDeletionMsg>();
-        for (HostInventory hinv : hinvs) {
-            HostDeletionMsg msg = new HostDeletionMsg();
-            msg.setForceDelete(action.isActionCode(CascadeConstant.DELETION_FORCE_DELETE_CODE));
-            msg.setHostUuid(hinv.getUuid());
-            bus.makeTargetServiceIdByResourceUuid(msg, HostConstant.SERVICE_ID, hinv.getUuid());
-            msgs.add(msg);
-        }
-
-        bus.send(msgs, 10, new CloudBusListCallBack(completion) {
-            @Override
-            public void run(List<MessageReply> replies) {
+        detachDataVolume(hinvs, new Completion(completion) {
+            @Override
+            public void success() {
+                List<HostDeletionMsg> msgs = new ArrayList<HostDeletionMsg>();
+                for (HostInventory hinv : hinvs) {
+                    HostDeletionMsg msg = new HostDeletionMsg();
+                    msg.setForceDelete(action.isActionCode(CascadeConstant.DELETION_FORCE_DELETE_CODE));
+                    msg.setHostUuid(hinv.getUuid());
+                    bus.makeTargetServiceIdByResourceUuid(msg, HostConstant.SERVICE_ID, hinv.getUuid());
+                    msgs.add(msg);
+                }
+
+                bus.send(msgs, 10, new CloudBusListCallBack(completion) {
+                    @Override
+                    public void run(List<MessageReply> replies) {
+                        // existing logic
+                    }
+                });
+            }
+
+            @Override
+            public void fail(ErrorCode errorCode) {
+                // decide whether to proceed or fail
+                completion.fail(errorCode);
+            }
+        });

Also change the detachDataVolume method signature accordingly so that it accepts a Completion parameter.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In compute/src/main/java/org/zstack/compute/host/HostCascadeExtension.java
around lines 90-91, detachDataVolume(hinvs) is invoked asynchronously but the
host deletion flow continues immediately, causing a race where deletion may
proceed before volumes are detached; make detachDataVolume synchronous or chain
the deletion logic into its completion callback: change detachDataVolume to
accept a Completion (or return a Future) and either block/wait for its
successful completion before continuing the deletion steps at lines 92-123, or
move the host deletion code into the Completion's success branch and handle
errors in its fail branch to ensure volumes are detached before sending the host
delete messages.

Comment on lines +190 to +225
private void detachDataVolume(List<HostInventory> hinvs) {
List<DetachDataVolumeFromHostMsg> dmsgs = hinvs.stream().map(hostInv -> {
VolumeHostRefVO refVO = Q.New(VolumeHostRefVO.class).eq(VolumeHostRefVO_.hostUuid, hostInv.getUuid()).find();
if (refVO == null) {
return null;
}
String volumeInstallPath = Q.New(VolumeVO.class).select(VolumeVO_.installPath)
.eq(VolumeVO_.uuid, refVO.getVolumeUuid()).findValue();
return new VolumeHostRef(refVO, volumeInstallPath);
}).filter(Objects::nonNull).map(volumeHostRef -> {
DetachDataVolumeFromHostMsg dmsg = new DetachDataVolumeFromHostMsg();
dmsg.setHostUuid(volumeHostRef.refVO.getHostUuid());
dmsg.setMountPath(volumeHostRef.refVO.getMountPath());
dmsg.setDevice(volumeHostRef.refVO.getDevice());
dmsg.setVolumeUuid(volumeHostRef.refVO.getVolumeUuid());
dmsg.setVolumeInstallPath(volumeHostRef.volumeInstallPath);
bus.makeTargetServiceIdByResourceUuid(dmsg, HostConstant.SERVICE_ID, volumeHostRef.refVO.getHostUuid());
return dmsg;
}).collect(Collectors.toList());
if (dmsgs.isEmpty()) {
return;
}
new While<>(dmsgs).each((umsg, com) -> bus.send(umsg, new CloudBusCallBack(com) {
@Override
public void run(MessageReply reply) {
if (!reply.isSuccess()) {
logger.warn(String.format("failed to detach volume on host, %s", reply.getError()));
}
com.done();
}
})).run(new WhileDoneCompletion(null) {
@Override
public void done(ErrorCodeList errorCodeList) {
}
});
}

⚠️ Potential issue | 🔴 Critical

The detachDataVolume method has several serious problems.

Issue 1 - Critical bug: only a single volume is detached (line 192)

Q.New(VolumeHostRefVO.class).eq(...).find() returns only one record. If a host has multiple data volumes attached, only one of them is detached and the rest are missed.

Use list() instead of find(), combined with flatMap, to handle multiple volumes.

Issue 2 - Performance: queries nested inside a loop (lines 191-198)

Inside the stream's map, two database queries are executed per host (line 192 and lines 196-197), creating an N+1 query problem. This violates the coding guideline against nesting queries inside loops, which exists to avoid the performance cost of nested queries.

Batch-query all relevant VolumeHostRefVO and VolumeVO records, then join them in memory.

Issue 3 - Asynchronous execution is never synchronized (lines 212-224)

The completion handler is empty (lines 221-223), confirming a fire-and-forget design. Combined with how it is called on line 90, this makes volume detach and host deletion run concurrently (already flagged on lines 90-91).

🔎 Suggested complete fix
-    private void detachDataVolume(List<HostInventory> hinvs) {
+    private void detachDataVolume(List<HostInventory> hinvs, Completion completion) {
+        // Batch query all VolumeHostRefVO for all hosts
+        List<String> hostUuids = hinvs.stream().map(HostInventory::getUuid).collect(Collectors.toList());
+        List<VolumeHostRefVO> refVOs = Q.New(VolumeHostRefVO.class)
+                .in(VolumeHostRefVO_.hostUuid, hostUuids)
+                .list();
+        
+        if (refVOs.isEmpty()) {
+            completion.success();
+            return;
+        }
+        
+        // Batch query all volumes
+        List<String> volumeUuids = refVOs.stream()
+                .map(VolumeHostRefVO::getVolumeUuid)
+                .collect(Collectors.toList());
+        List<VolumeVO> volumes = Q.New(VolumeVO.class)
+                .select(VolumeVO_.uuid, VolumeVO_.installPath)
+                .in(VolumeVO_.uuid, volumeUuids)
+                .list();
+        
+        // Build volume install path map
+        Map<String, String> volumeInstallPathMap = volumes.stream()
+                .collect(Collectors.toMap(VolumeVO::getUuid, VolumeVO::getInstallPath));
+        
+        // Build detach messages
-        List<DetachDataVolumeFromHostMsg> dmsgs = hinvs.stream().map(hostInv -> {
-            VolumeHostRefVO refVO = Q.New(VolumeHostRefVO.class).eq(VolumeHostRefVO_.hostUuid, hostInv.getUuid()).find();
-            if (refVO == null) {
-                return null;
-            }
-            String volumeInstallPath = Q.New(VolumeVO.class).select(VolumeVO_.installPath)
-                    .eq(VolumeVO_.uuid, refVO.getVolumeUuid()).findValue();
-            return new VolumeHostRef(refVO, volumeInstallPath);
-        }).filter(Objects::nonNull).map(volumeHostRef -> {
+        List<DetachDataVolumeFromHostMsg> dmsgs = refVOs.stream().map(refVO -> {
+            String volumeInstallPath = volumeInstallPathMap.get(refVO.getVolumeUuid());
+            if (volumeInstallPath == null) {
+                logger.warn(String.format("Cannot find install path for volume[uuid:%s]", refVO.getVolumeUuid()));
+                return null;
+            }
             DetachDataVolumeFromHostMsg dmsg = new DetachDataVolumeFromHostMsg();
-            dmsg.setHostUuid(volumeHostRef.refVO.getHostUuid());
-            dmsg.setMountPath(volumeHostRef.refVO.getMountPath());
-            dmsg.setDevice(volumeHostRef.refVO.getDevice());
-            dmsg.setVolumeUuid(volumeHostRef.refVO.getVolumeUuid());
-            dmsg.setVolumeInstallPath(volumeHostRef.volumeInstallPath);
-            bus.makeTargetServiceIdByResourceUuid(dmsg, HostConstant.SERVICE_ID, volumeHostRef.refVO.getHostUuid());
+            dmsg.setHostUuid(refVO.getHostUuid());
+            dmsg.setMountPath(refVO.getMountPath());
+            dmsg.setDevice(refVO.getDevice());
+            dmsg.setVolumeUuid(refVO.getVolumeUuid());
+            dmsg.setVolumeInstallPath(volumeInstallPath);
+            bus.makeTargetServiceIdByResourceUuid(dmsg, HostConstant.SERVICE_ID, refVO.getHostUuid());
             return dmsg;
-        }).collect(Collectors.toList());
+        }).filter(Objects::nonNull).collect(Collectors.toList());
+        
         if (dmsgs.isEmpty()) {
+            completion.success();
             return;
         }
+        
         new While<>(dmsgs).each((umsg, com) -> bus.send(umsg, new CloudBusCallBack(com) {
             @Override
             public void run(MessageReply reply) {
                 if (!reply.isSuccess()) {
                     logger.warn(String.format("failed to detach volume on host, %s", reply.getError()));
                 }
                 com.done();
             }
-        })).run(new WhileDoneCompletion(null) {
+        })).run(new WhileDoneCompletion(completion) {
             @Override
             public void done(ErrorCodeList errorCodeList) {
+                // Detach failures are logged but don't fail the operation
+                // to ensure host deletion can proceed
+                completion.success();
             }
         });
     }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
compute/src/main/java/org/zstack/compute/host/HostCascadeExtension.java around
lines 190-225: the current detachDataVolume implementation only detaches one
VolumeHostRef per host (uses find()), performs N+1 queries inside the stream,
and fires async detach calls without properly waiting/error-aggregating; fix by:
1) replace the single-record query with a batch query — collect all host UUIDs
from hinvs, query VolumeHostRefVO.list() for those host UUIDs, and collect all
VolumeHostRefVOs (use flatMap semantics so multiple refs per host are handled);
2) batch-query VolumeVOs for all referenced volume UUIDs (one query) and build a
map volumeUuid->installPath, then build DetachDataVolumeFromHostMsg for every
VolumeHostRefVO using the map (no per-item DB queries); 3) send the messages
with the existing While<> but provide a proper WhileDoneCompletion that
aggregates errors (or returns a completion/future) so the caller can wait/handle
failures instead of "fire-and-forget" — at minimum log/collect errorCodeList in
done(...) and propagate or surface it to the caller (or change method signature
to return a Completion or throw on failure) so detach operations are
synchronized with host deletion.

Comment on lines +111 to +116
VmNicType type = nicManager.getVmNicType(spec.getVmInventory().getUuid(), nw);
if (type == null) {
errs.add(Platform.operr("there is no available nicType on L3 network [%s]", nw.getUuid()));
wcomp.allDone();
}
VmInstanceNicFactory vnicFactory = vmMgr.getVmInstanceNicFactory(type);

⚠️ Potential issue | 🔴 Critical

Control flow issue: missing return after wcomp.allDone()

When type == null, the code calls wcomp.allDone() but does not return, so execution continues and may throw a NullPointerException (type is used on line 116).

🔎 Suggested fix
             VmNicType type = nicManager.getVmNicType(spec.getVmInventory().getUuid(), nw);
             if (type == null) {
                 errs.add(Platform.operr("there is no available nicType on L3 network [%s]", nw.getUuid()));
                 wcomp.allDone();
+                return;
             }
             VmInstanceNicFactory vnicFactory = vmMgr.getVmInstanceNicFactory(type);
🤖 Prompt for AI Agents
In compute/src/main/java/org/zstack/compute/vm/VmAllocateNicFlow.java around
lines 111 to 116, the control flow calls wcomp.allDone() when type == null but
does not return, which allows execution to continue and use the null type;
modify the flow to immediately return after wcomp.allDone() (or alternatively
place the subsequent VmInstanceNicFactory lookup and usages inside an else
branch) so that no code runs when type is null and a NullPointerException cannot
occur.

Comment on lines 82 to 91
for (VolumeSpec vspec : volumeSpecs) {
CreateVolumeMsg msg = new CreateVolumeMsg();
Set<String> tags = new HashSet<>();
if (vspec != null && vspec.getTags() != null) {
tags.addAll(vspec.getTags());
if (vspec != null) {
if (vspec.getTags() != null) {
tags.addAll(vspec.getTags());
}
} else {
continue;
}

⚠️ Potential issue | 🔴 Critical

Critical: index mismatch causes incorrect data association

The prepareMsg method uses continue to skip VolumeSpec entries that are null or not of Root/Data type, but in the run method's callback (line 144), replies.indexOf(r) is used directly against spec.getVolumeSpecs() to fetch the vspec, which leads to an index mismatch.

For example, if volumeSpecs has 3 elements [A, null, B], msgs will contain only 2 messages; when replies.indexOf(r) returns 0 or 1 in the callback, it incorrectly fetches A or null from volumeSpecs instead of the correct A or B.

🔎 Suggested approach

Maintain a list of the valid VolumeSpec entries while building the messages, or carry correlation information in the messages so the callback can match them correctly:

     protected List<CreateVolumeMsg> prepareMsg(Map<String, Object> ctx) {
+        List<VolumeSpec> validVolumeSpecs = new ArrayList<>();
         // ...
         for (VolumeSpec vspec : volumeSpecs) {
             // ...
             if (vspec == null) {
                 continue;
             }
             // ... build msg ...
+            validVolumeSpecs.add(vspec);
             msgs.add(msg);
         }
+        ctx.put("validVolumeSpecs", validVolumeSpecs);
         return msgs;
     }

Then use validVolumeSpecs instead of spec.getVolumeSpecs() in the run method.

Committable suggestion skipped: line range outside the PR's diff.

@@ -0,0 +1,9 @@
ALTER TABLE `zstack`.`GuestOsCategoryVO` MODIFY COLUMN osRelease varchar(64) NOT NULL;

⚠️ Potential issue | 🔴 Critical

Changing the column to NOT NULL without handling historical data

Per the coding guidelines, adding a NOT NULL constraint requires handling historical data via a stored procedure or other means. If the GuestOsCategoryVO table contains rows where osRelease is NULL, this ALTER statement will fail.

🔎 Suggested fix
+-- Handle any existing NULL values first
+UPDATE `zstack`.`GuestOsCategoryVO` SET osRelease = 'unknown' WHERE osRelease IS NULL;
+
 ALTER TABLE `zstack`.`GuestOsCategoryVO` MODIFY COLUMN osRelease varchar(64) NOT NULL;

Or use a stored procedure to check and handle it:

DELIMITER $$
CREATE PROCEDURE UpdateGuestOsCategoryOsRelease()
BEGIN
    UPDATE `zstack`.`GuestOsCategoryVO` SET osRelease = 'unknown' WHERE osRelease IS NULL;
    ALTER TABLE `zstack`.`GuestOsCategoryVO` MODIFY COLUMN osRelease varchar(64) NOT NULL;
END $$
DELIMITER ;
CALL UpdateGuestOsCategoryOsRelease();
DROP PROCEDURE IF EXISTS UpdateGuestOsCategoryOsRelease;
🤖 Prompt for AI Agents
In conf/db/upgrade/V4.7.0.3__schema.sql around line 1, the ALTER TABLE sets
osRelease to NOT NULL but does not handle existing NULL values; update existing
NULL osRelease rows to a safe default (e.g., 'unknown' or appropriate value)
before applying the NOT NULL constraint. Implement this as a transactional
migration: (1) update GuestOsCategoryVO rows where osRelease IS NULL to the
chosen default, (2) verify no NULLs remain, and (3) run the ALTER TABLE to
MODIFY COLUMN osRelease varchar(64) NOT NULL; optionally wrap the update + alter
in a stored procedure or transaction as per project conventions and drop the
procedure afterward if used.

Comment on lines +25 to +27
ALTER TABLE `zstack`.`BareMetal2BondingNicRefVO` modify column nicUuid varchar(32) DEFAULT NULL;
CALL ADD_COLUMN('BareMetal2BondingNicRefVO', 'provisionNicUuid', 'VARCHAR(32)', 1, NULL);
CALL ADD_CONSTRAINT('BareMetal2BondingNicRefVO', 'fkBareMetal2BondingNicRefVOProvisionNicVO', 'provisionNicUuid', 'BareMetal2InstanceProvisionNicVO', 'uuid', 'CASCADE');

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find the ADD_COLUMN and ADD_CONSTRAINT stored procedure definitions
rg -n "CREATE.*PROCEDURE.*ADD_COLUMN" --type=sql -A 25 | head -60

Repository: MatheMatrix/zstack

Length of output: 2150


🏁 Script executed:

# Find the ADD_CONSTRAINT stored procedure definition
rg -n "CREATE.*PROCEDURE.*ADD_CONSTRAINT" --type=sql -A 25 | head -60

Repository: MatheMatrix/zstack

Length of output: 2076


🏁 Script executed:

# Show the full content and context of V4.7.13__schema.sql
fd "V4.7.13__schema.sql" --type f -x cat -n {}

Repository: MatheMatrix/zstack

Length of output: 1635


Line 25 should use a stored procedure rather than direct DDL, to ensure idempotency

Although making nicUuid nullable is a safe, backward-compatible change, line 25 modifies the column with a direct ALTER TABLE, so it has no protection if the upgrade script is executed more than once. Following ZStack best practice, a stored procedure (e.g., MODIFY_COLUMN) should check the column's current state and perform the change only when necessary.

Lines 26-27 are done correctly:

  • the ADD_COLUMN stored procedure checks column existence via INFORMATION_SCHEMA.COLUMNS (beforeMigrate.sql:160-164), ensuring idempotency
  • the ADD_CONSTRAINT stored procedure checks constraint existence via INFORMATION_SCHEMA.TABLE_CONSTRAINTS (beforeMigrate.sql:197-201), ensuring idempotency

Refactor line 25 into a similar stored-procedure call pattern, along the lines of the sketch below.
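
A rough sketch of such an idempotent change; the procedure name is invented here, and the IS_NULLABLE check should mirror how the existing ADD_COLUMN/ADD_CONSTRAINT helpers in beforeMigrate.sql are written:

DELIMITER $$
CREATE PROCEDURE MODIFY_NIC_UUID_NULLABLE()
BEGIN
    -- Only relax the column when it is still NOT NULL, so re-running the script is a no-op.
    IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_SCHEMA = 'zstack'
                 AND TABLE_NAME = 'BareMetal2BondingNicRefVO'
                 AND COLUMN_NAME = 'nicUuid'
                 AND IS_NULLABLE = 'NO') THEN
        ALTER TABLE `zstack`.`BareMetal2BondingNicRefVO` MODIFY COLUMN nicUuid varchar(32) DEFAULT NULL;
    END IF;
END $$
DELIMITER ;
CALL MODIFY_NIC_UUID_NULLABLE();
DROP PROCEDURE IF EXISTS MODIFY_NIC_UUID_NULLABLE;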

🤖 Prompt for AI Agents
In conf/db/upgrade/V4.7.13__schema.sql around lines 25-27, the direct ALTER
TABLE on line 25 should be replaced with a stored-procedure based change to
ensure idempotency; update the script to call the existing MODIFY_COLUMN (or add
a similar stored procedure) that checks INFORMATION_SCHEMA for the current
nicUuid column definition and only performs the nullable change if it differs,
mirroring how ADD_COLUMN and ADD_CONSTRAINT protect against repeat runs, so the
upgrade becomes safe to re-run without error.

Comment on lines +1 to +2
ALTER TABLE `zstack`.`L2NetworkEO` ADD COLUMN `isolated` boolean NOT NULL DEFAULT FALSE AFTER `virtualNetworkId`;
ALTER TABLE `zstack`.`L2NetworkEO` ADD COLUMN `pvlan` varchar(128) DEFAULT NULL AFTER `virtualNetworkId`;

⚠️ Potential issue | 🔴 Critical

The column positions are wrong and need correcting.

Line 2 defines the pvlan column AFTER `virtualNetworkId`, but line 1 has already added the isolated column after virtualNetworkId. When the two statements run in order, MySQL inserts pvlan between virtualNetworkId and isolated, giving a final column order of virtualNetworkId, pvlan, isolated, ...

However, the view definition on line 4 shows the column order should be isolated, pvlan, which indicates the intent was for pvlan to come after isolated.

🔎 Suggested fix
 ALTER TABLE `zstack`.`L2NetworkEO` ADD COLUMN `isolated` boolean NOT NULL DEFAULT FALSE AFTER `virtualNetworkId`;
-ALTER TABLE `zstack`.`L2NetworkEO` ADD COLUMN `pvlan` varchar(128) DEFAULT NULL AFTER `virtualNetworkId`;
+ALTER TABLE `zstack`.`L2NetworkEO` ADD COLUMN `pvlan` varchar(128) DEFAULT NULL AFTER `isolated`;
🤖 Prompt for AI Agents
In conf/db/upgrade/V4.8.0.6__schema.sql around lines 1-2, the ALTER statements
add isolated then pvlan both AFTER `virtualNetworkId`, which yields column order
`virtualNetworkId, pvlan, isolated` but the view expects `isolated, pvlan`;
change the second ALTER to add pvlan AFTER `isolated` (or swap the two ALTERs so
pvlan is added after isolated) so the final column order is `virtualNetworkId,
isolated, pvlan` and matches the view definition.

CONSTRAINT `fkEthernetVfPciDeviceVOL3NetworkEO` FOREIGN KEY (`l3NetworkUuid`) REFERENCES `L3NetworkEO` (`uuid`) ON DELETE SET NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

ALTER TABLE ConsoleProxyVO ADD COLUMN `expiredDate` timestamp NOT NULL;

⚠️ Potential issue | 🔴 Critical

ConsoleProxyVO.expiredDate is added directly as NOT NULL and may break the upgrade

Current statement:

ALTER TABLE ConsoleProxyVO ADD COLUMN `expiredDate` timestamp NOT NULL;

On MySQL, if ConsoleProxyVO already contains rows, adding a NOT NULL column with no default fails outright under strict SQL mode, and it violates the upgrade guideline that NOT NULL constraints require a stored procedure/function to handle historical data.

Split the change into two or three steps, for example:

  1. First add the column as nullable and backfill existing rows:
    ALTER TABLE ConsoleProxyVO ADD COLUMN `expiredDate` timestamp NULL;
    UPDATE ConsoleProxyVO SET expiredDate = CURRENT_TIMESTAMP WHERE expiredDate IS NULL;
  2. If the column really must be NOT NULL, alter it afterwards:
    ALTER TABLE ConsoleProxyVO MODIFY COLUMN `expiredDate` timestamp NOT NULL;

This satisfies the historical-data handling requirement and prevents the upgrade script from failing outright on existing deployments.

🤖 Prompt for AI Agents
In conf/db/upgrade/V5.0.0__schema.sql around line 81, adding
ConsoleProxyVO.expiredDate as NOT NULL directly will fail on MySQL with existing
rows; instead add the column as nullable, backfill existing rows with an
appropriate value (e.g. CURRENT_TIMESTAMP or other business-appropriate default)
and then alter the column to NOT NULL; implement this as a two-step (ADD COLUMN
... NULL; UPDATE ... SET expiredDate = ... WHERE expiredDate IS NULL) followed
by a MODIFY/ALTER to NOT NULL to avoid upgrade failures on strict SQL mode.

Comment on lines +160 to +169
DECLARE cur CURSOR FOR SELECT pci.uuid, pci.vendor FROM PciDeviceVO pci;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

OPEN cur;
read_loop: LOOP
FETCH cur INTO vendor, pciDeviceUuid;
IF done THEN
LEAVE read_loop;
END IF;
UPDATE MdevDeviceVO SET vendor = vendor WHERE parentUuid = pciDeviceUuid;

⚠️ Potential issue | 🔴 Critical

Critical error: FETCH variable order does not match the SELECT column order

The cursor is declared as SELECT pci.uuid, pci.vendor FROM PciDeviceVO pci, but the FETCH statement is FETCH cur INTO vendor, pciDeviceUuid, i.e., the reverse order. As a result, uuid is assigned to vendor and vendor to pciDeviceUuid, and the UPDATE ends up writing the uuid value into the vendor column.

🔎 Suggested fix
-        DECLARE cur CURSOR FOR SELECT pci.uuid, pci.vendor FROM PciDeviceVO pci;
+        DECLARE cur CURSOR FOR SELECT pci.vendor, pci.uuid FROM PciDeviceVO pci;

Or change the FETCH statement:

-            FETCH cur INTO vendor, pciDeviceUuid;
+            FETCH cur INTO pciDeviceUuid, vendor;
🤖 Prompt for AI Agents
In conf/db/upgrade/V5.1.8__schema.sql around lines 160 to 169 the cursor SELECT
lists columns as (pci.uuid, pci.vendor) but the FETCH uses (vendor,
pciDeviceUuid) in reverse order; update the FETCH to match the SELECT order
(FETCH cur INTO pciDeviceUuid, vendor) or alternatively change the SELECT to
(pci.vendor, pci.uuid) so the fetched variables align, then keep the UPDATE as
UPDATE MdevDeviceVO SET vendor = vendor WHERE parentUuid = pciDeviceUuid.
