-
Notifications
You must be signed in to change notification settings - Fork 0
<fix>[ceph]: support thirdparty_ceph to bm root volume #3189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 5.5.6
Are you sure you want to change the base?
Conversation
总览该变更通过数据库架构升级、新扩展点接口和Ceph存储逻辑集成,实现第三方克隆支持。添加数据库列处理裸金属2实例配置,引入扩展机制验证实例类型并转换块卷,在Ceph克隆流程中集成第三方SDK交互。 变更明细
代码审查难度估计🎯 3 (中等复杂) | ⏱️ ~20 分钟 诗
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ast-grep (0.40.4)plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.javaComment |
support thirdparty_ceph to bm root volume Resolves: ZSTAC-73396 Change-Id: I6f6f616777677a6f74696c6c736c737a626f7863
9fb0ae3 to
7609ddf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @conf/db/upgrade/V4.8.0.8__schema.sql:
- Around line 30-34: The DROP FOREIGN KEY for constraint
fkBareMetal2InstanceProvisionNicVOInstanceVO on table
BareMetal2InstanceProvisionNicVO lacks an existence check and makes the
migration non-idempotent; change the direct ALTER TABLE ... DROP FOREIGN KEY to
first verify the constraint exists (e.g. query INFORMATION_SCHEMA or reuse the
repository's DROP_CONSTRAINT helper) and only execute the drop when present,
then proceed with the CALL ADD_CONSTRAINT(...) as before so the script is safe
to run multiple times.
In @conf/db/upgrade/V5.5.6__schema.sql:
- Around line 30-34: The DROP FOREIGN KEY for
fkBareMetal2InstanceProvisionNicVOInstanceVO on table
BareMetal2InstanceProvisionNicVO lacks an existence check and breaks
idempotency; modify V5.5.6__schema.sql to query information_schema
(constraints/KEY_COLUMN_USAGE) to confirm the constraint exists before executing
ALTER TABLE `BareMetal2InstanceProvisionNicVO` DROP FOREIGN KEY
`fkBareMetal2InstanceProvisionNicVOInstanceVO`, and only run the DROP when the
check returns a positive match so the upgrade script is safe to re-run; keep the
subsequent CALL ADD_CONSTRAINT(...) as-is after the conditional drop.
In
@plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java:
- Around line 2660-2693: The loop over
CephPrimaryStorageCheckInstanceTypeExtensionPoint has wrong control flow and
null handling: replace the current if (!result) { break; } with if
(!Boolean.TRUE.equals(result)) { continue; } so a null/false just skips that
extension instead of aborting all, and after ext.convertToBlockVolume(vo); add a
break; to stop after the first supporting extension; also review usages of
cmd.token/cmd.monIp/cmd.tpTimeout to avoid emitting sensitive token values in
logs or audit output.
🧹 Nitpick comments (1)
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java (1)
1-13: 接口契约不清晰且Boolean返回值易引入 NPE(建议收紧签名并补全 Javadoc)
目前String uuid的语义不明确(不同方法可能需要 vmUuid/volumeUuid/psUuid),并且Boolean存在返回null的潜在风险,调用方若按 primitive 语义使用会出问题;同时接口方法缺少有效 Javadoc。
📜 Review details
Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
sdk/src/main/java/org/zstack/sdk/BareMetal2ChassisNicInventory.javais excluded by!sdk/**sdk/src/main/java/org/zstack/sdk/BareMetal2InstanceInventory.javais excluded by!sdk/**sdk/src/main/java/org/zstack/sdk/BareMetal2InstanceProvisionNicInventory.javais excluded by!sdk/**
📒 Files selected for processing (4)
conf/db/upgrade/V4.8.0.8__schema.sqlconf/db/upgrade/V5.5.6__schema.sqlplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java
🧰 Additional context used
📓 Path-based instructions (3)
**/*.*
⚙️ CodeRabbit configuration file
**/*.*: - 代码里不应当有有中文,包括报错、注释等都应当使用正确的、无拼写错误的英文来写
Files:
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.javaconf/db/upgrade/V4.8.0.8__schema.sqlplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.javaconf/db/upgrade/V5.5.6__schema.sql
**/*.java
⚙️ CodeRabbit configuration file
**/*.java: ## 1. API 设计要求
- API 命名:
- API 名称必须唯一,不能重复。
- API 消息类需要继承
APIMessage;其返回类必须继承APIReply或APIEvent,并在注释中用@RestResponse进行标注。- API 消息上必须添加注解
@RestRequest,并满足如下规范:
path:
- 针对资源使用复数形式。
- 当 path 中引用消息类变量时,使用
{variableName}格式。- HTTP 方法对应:
- 查询操作 →
HttpMethod.GET- 更新操作 →
HttpMethod.PUT- 创建操作 →
HttpMethod.POST- 删除操作 →
HttpMethod.DELETE- API 类需要实现
__example__方法以便生成 API 文档,并确保生成对应的 Groovy API Template 与 API Markdown 文件。
2. 命名与格式规范
类名:
- 使用 UpperCamelCase 风格。
- 特殊情况:
- VO/AO/EO 类型类除外。
- 抽象类采用
Abstract或Base前缀/后缀。- 异常类应以
Exception结尾。- 测试类需要以
Test或Case结尾。方法名、参数名、成员变量和局部变量:
- 使用 lowerCamelCase 风格。
常量命名:
- 全部大写,使用下划线分隔单词。
- 要求表达清楚,避免使用含糊或不准确的名称。
包名:
- 统一使用小写,使用点分隔符,每个部分应是一个具有自然语义的英文单词(参考 Spring 框架的结构)。
命名细节:
- 避免在父子类或同一代码块中出现相同名字的成员或局部变量,防止混淆。
- 命名缩写:
- 不允许使用不必要的缩写,如:
AbsSchedulerJob、condi、Fu等。应使用完整单词提升可读性。
3. 编写自解释代码
意图表达:
- 避免使用布尔型参数造成含义不明确。例如:
- 对于
stopAgent(boolean ignoreError),建议拆分为不同函数(如stopAgentIgnoreError()),或使用枚举表达操作类型。- 命名应尽量用完整的单词组合表达意图,并在名称中体现数据类型或用途(例如在常量与变量名称中,将类型词放在末尾)。
- 避免使用魔法值(Magic Value):
直接使用未经定义的数值或字符串(如 if (status == 5))应替换为枚举或常量。
示例:
// 错误示例:魔法值
if (user.getStatus() == 5) { ... }
// 正确示例:常量或枚举
public static final int STATUS_ACTIVE = 5;
if (user.getStatus() == STATUS_ACTIVE) { ... }
// 或使用枚举
enum UserStatus { ACTIVE, INACTIVE }
注释:
- 代码应尽量做到自解释,对少于两行的说明可以直接写在代码中。
- 对于较长的注释,需要仔细校对并随代码更新,确保内容正确。
- 接口方法不应有多余的修饰符(例如
public),且必须配有有效的 Javadoc 注释。
4. 流程控制和结构优化
if...else 的使用:
- 应尽量减少 if...else 结构的使用,建议:
- 限制嵌套层级最多为两层,且内层不应再出现
else分支。- 尽早返回(Early Return),将条件判断中的处理逻辑提前结束或抽成独立方法。
- 使用 Java Stream 或 Lambda 表达式代替冗长的循环与条件判断。
条件判断:
- if 条件表达不宜过长或过于复杂,必要时可以将条件抽成 boolean 变量描述。
代码块长度:
...
Files:
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
**/*.sql
⚙️ CodeRabbit configuration file
**/*.sql: - Review the SQL code, make sure has no errors and confirm that:
- Upgrading scene has been carefully handled
- Do not use
DEFAULT 0000-00-00 00:00:00, useDEFAULT CURRENT_TIMESTAMPinstead- When
NOT NULLexists, must usestored procedureor other functions to process historical data, this is very very important- 数据库记录中,如果字符串长度不可控,不要用
vchar,用text类型
Files:
conf/db/upgrade/V4.8.0.8__schema.sqlconf/db/upgrade/V5.5.6__schema.sql
🧠 Learnings (7)
📚 Learning: 2025-08-03T04:10:21.683Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 0
File: :0-0
Timestamp: 2025-08-03T04:10:21.683Z
Learning: ZStack 数据库升级脚本 V4.10.16__schema.sql 中的 UPGRADE_VM_METADATA_TABLES_DIRECT 存储过程使用直接 RENAME TABLE 操作,不具备幂等性。需要通过 information_schema 检查表和约束的存在性来确保脚本可以安全重复执行。
Applied to files:
conf/db/upgrade/V4.8.0.8__schema.sqlconf/db/upgrade/V5.5.6__schema.sql
📚 Learning: 2025-08-14T06:56:19.585Z
Learnt from: zstack-robot-2
Repo: MatheMatrix/zstack PR: 2435
File: storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupBase.java:47-47
Timestamp: 2025-08-14T06:56:19.585Z
Learning: 在VolumeSnapshotGroupBase.java中,VmInstanceResourceMetadataManager的注入和SKIP_RESOURCE_ROLLBACK标记虽然在当前版本中未被使用,但这些导入在大型重构PR中是有意为之的,用于保持代码一致性或为后续功能做准备。
Applied to files:
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-14T06:48:00.549Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2435
File: header/src/main/java/org/zstack/header/storage/snapshot/group/APICheckMemorySnapshotGroupConflictMsg.java:3-7
Timestamp: 2025-08-14T06:48:00.549Z
Learning: 在ZStack项目中,同一包内的类(如VolumeSnapshotGroupVO)不需要显式导入。同时,uuid()辅助方法在API消息类的__example__()方法中可以直接使用,无需静态导入,这是通过项目的构建系统或框架配置实现的。
Applied to files:
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-14T06:48:00.549Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2435
File: header/src/main/java/org/zstack/header/storage/snapshot/group/APICheckMemorySnapshotGroupConflictMsg.java:3-7
Timestamp: 2025-08-14T06:48:00.549Z
Learning: In the ZStack project, API message classes may not require explicit imports for VolumeSnapshotGroupVO references in APIParam annotations or for the uuid() helper method used in __example__() methods, even when these classes/methods are defined in other packages. The build system or framework appears to handle these references automatically.
Applied to files:
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-25T03:52:37.301Z
Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2504
File: storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotTreeBase.java:775-819
Timestamp: 2025-08-25T03:52:37.301Z
Learning: 在ZStack项目的VolumeSnapshotTreeBase类中,当通过dbf.findByUuid()方法获取VolumeVO对象时,需要进行null检查,因为该方法可能在找不到对应记录时返回null,直接使用可能导致NullPointerException。
Applied to files:
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-07-22T02:30:46.123Z
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2311
File: plugin/sdnController/src/main/java/org/zstack/sdnController/hardwareVxlan/KVMRealizeHardwareVxlanNetworkBackend.java:52-53
Timestamp: 2025-07-22T02:30:46.123Z
Learning: In ZStack, backend realization code (such as KVMRealizeHardwareVxlanNetworkBackend) can safely assume the existence of the VO for the resource being operated on (e.g., HardwareL2VxlanNetworkVO for a given L2NetworkInventory), due to system lifecycle management and strong foreign key constraints in the database schema. Explicit null checks after findByUuid are not required in these contexts.
Applied to files:
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-12T05:52:18.323Z
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2418
File: compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java:3873-3885
Timestamp: 2025-08-12T05:52:18.323Z
Learning: 在 compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java 的 setNoIpAllocationStaticIp() 流程中,静态 IP 标签删除已统一在 change-ip-in-database 步骤对 voRemoveSet 循环处理:对收集到的旧 IPv4/IPv6 UsedIpVO 均调用 ipOperator.deleteStaticIpByVmUuidAndL3Uuid(vmUuid, l3Uuid, IPv6NetworkUtils.ipv6AddessToTagValue(ip)) 清理对应的 static-ip 标签。因此无需在分支里分别删除 IPv4/IPv6 的标签。
Applied to files:
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
🔍 Remote MCP AtlassianMCP
Excellent! Now I have all the relevant context. Let me compile a comprehensive summary of the findings.
Additional Context for PR #3189 Review
Background and Business Context
ZSTAC-73396 is a P0 high-priority feature request from China Export Credit Insurance Corporation for elastic bare metal (弹性裸金属) to support using high-speed cloud disks as root volumes with HA support. The business impact is significant - customer project acceptance is blocked, affecting ~1M total settlement revenue and ~400k ZStack revenue.
Problem Context:
- Previous deployment used legacy cloud disks as system disks, which encountered IO errors (TIC-3691)
- Gateway node has single-point-of-failure risk without HA groups
- Network deployment has single-link risk without aggregation
- Customer requires high-speed cloud disk support for production use to avoid gateway node bottleneck
Solution: Support high-speed cloud disks as elastic bare metal instance root volumes, targeting March 12 launch date.
Status: ZSTAC-73396 is Closed/Fixed in version 4.8.0-zhongxinbao with 20h DEV effort and 40h QA effort. ZSTAC-81282 is a clone issue for version 5.5.6 (currently Open), assigned to 沈今 (Jin Shen).
Technical Changes Summary
The PR implements third-party Ceph clone support for bare-metal root volumes:
-
Database Schema Upgrades (SQL migration scripts):
- Introduces
UpdateBareMetal2InstanceProvisionNicVO()procedure in both V4.8.0.8 and V5.5.6 - Adds
instanceUuidcolumn tracking for provision NIC table - Adds
isPrimaryProvisionNicboolean flags for both BareMetal2InstanceProvisionNicVO and BareMetal2ChassisNicVO tables - Ensures transactional safety with explicit error handling
- Introduces
-
Ceph Primary Storage Enhancement (CephPrimaryStorageBase.java):
- Introduces extension point mechanism for third-party clone support
- For each
CephPrimaryStorageCheckInstanceTypeExtensionPoint, checks if third-party clone is supported - If conditions met (root volume, baremetal2 VM type, third-party token): discovers Ceph mon IP, assigns token, sets SDK timeout, converts volume to block volume before cloning
- Post-clone: populates block volume details and adjusts target volume path
- Adds new fields to
CloneRsp:volumeIdandvolumeStatus
-
New Extension Point Interface (CephPrimaryStorageCheckInstanceTypeExtensionPoint.java):
isSupportCloneByThirdParty(String uuid)- check third-party clone supportconvertToBlockVolume(VolumeVO vo)- convert to block volumeisBlockVolume(String uuid)- identify block volumespopulateBlockVolumeDetails(String uuid, String volumeId, String volumeStatus)- populate results
Related Issue History
- TIC-3691: IO error issue when elastic bare metal + Ceph storage uses cloud disk as system disk. Status: Closed with unresolved risk assessment on production readiness.
- ZSTAC-79783: Earlier version of this fix for version 4.8.30-zhongxinbao (merged per GitLab automation)
- Referenced Confluence pages document detailed QA test plans and UI/backend implementation specs
Key Review Points
- The extension point design allows flexible third-party Ceph integration without hardcoding
- Schema migrations use defensive patterns (checks
IF ... EXISTSbefore adding columns) - Volume transformation logic (regular cloud disk → block volume) is critical for third-party clone compatibility
- Changes are targeted for 5.5.6 release branch per target PR branch specification
🔇 Additional comments (3)
conf/db/upgrade/V5.5.6__schema.sql (1)
1-74: 存储过程整体结构良好整体设计遵循了良好的升级脚本实践:
- 使用
INFORMATION_SCHEMA检查列存在性确保幂等性- 使用事务 (
START TRANSACTION/COMMIT) 保证原子性- 使用
EXIT HANDLER FOR SQLEXCEPTION处理异常并回滚- 历史数据迁移逻辑正确(
instanceUuid从uuid复制,isPrimaryProvisionNic按条件初始化)除上述外键删除的幂等性问题外,其余逻辑正确。
conf/db/upgrade/V4.8.0.8__schema.sql (1)
1-74: 两个版本分支使用相同的迁移逻辑此文件 (
V4.8.0.8__schema.sql) 与V5.5.6__schema.sql内容完全一致,用于在不同版本分支(4.8.x 和 5.5.x)中执行相同的 schema 升级。这符合跨版本 backport 的常规做法。请确保对两个文件应用相同的修复。
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java (1)
487-517:CloneRsp新增字段 OK,但建议确认序列化兼容性与空值语义
如果 agent 并非总是返回volumeId/volumeStatus,建议在调用侧按“可空”处理并避免作为必填字段使用。
conf/db/upgrade/V4.8.0.8__schema.sql
Outdated
| ALTER TABLE `zstack`.`BareMetal2InstanceProvisionNicVO` | ||
| DROP FOREIGN KEY `fkBareMetal2InstanceProvisionNicVOInstanceVO`; | ||
|
|
||
| CALL ADD_CONSTRAINT('BareMetal2InstanceProvisionNicVO', 'fkBareMetal2InstanceProvisionNicVOInstanceVO', | ||
| 'instanceUuid', 'BareMetal2InstanceVO', 'uuid', 'CASCADE'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
外键删除操作同样缺少存在性检查
与 V5.5.6__schema.sql 相同的问题:DROP FOREIGN KEY 操作缺少存在性检查,需要在删除前验证外键是否存在以确保脚本幂等性。
请参考 V5.5.6__schema.sql 中的修复建议,在此文件中应用相同的修改。
🤖 Prompt for AI Agents
In @conf/db/upgrade/V4.8.0.8__schema.sql around lines 30 - 34, The DROP FOREIGN
KEY for constraint fkBareMetal2InstanceProvisionNicVOInstanceVO on table
BareMetal2InstanceProvisionNicVO lacks an existence check and makes the
migration non-idempotent; change the direct ALTER TABLE ... DROP FOREIGN KEY to
first verify the constraint exists (e.g. query INFORMATION_SCHEMA or reuse the
repository's DROP_CONSTRAINT helper) and only execute the drop when present,
then proceed with the CALL ADD_CONSTRAINT(...) as before so the script is safe
to run multiple times.
| ALTER TABLE `zstack`.`BareMetal2InstanceProvisionNicVO` | ||
| DROP FOREIGN KEY `fkBareMetal2InstanceProvisionNicVOInstanceVO`; | ||
|
|
||
| CALL ADD_CONSTRAINT('BareMetal2InstanceProvisionNicVO', 'fkBareMetal2InstanceProvisionNicVOInstanceVO', | ||
| 'instanceUuid', 'BareMetal2InstanceVO', 'uuid', 'CASCADE'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
外键删除操作缺少存在性检查,影响脚本幂等性
DROP FOREIGN KEY 操作在执行前没有检查外键是否存在。如果数据库处于部分升级状态(外键已被删除但列迁移未完成),此操作会失败并导致整个事务回滚。
建议在删除外键前先检查其是否存在,类似于列的存在性检查方式:
🔧 建议的修复方案
+ -- Check if foreign key exists before dropping
+ SELECT COUNT(*)
+ INTO @fk_exists
+ FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
+ WHERE TABLE_SCHEMA = 'zstack'
+ AND TABLE_NAME = 'BareMetal2InstanceProvisionNicVO'
+ AND CONSTRAINT_NAME = 'fkBareMetal2InstanceProvisionNicVOInstanceVO'
+ AND CONSTRAINT_TYPE = 'FOREIGN KEY';
+
+ IF @fk_exists > 0 THEN
ALTER TABLE `zstack`.`BareMetal2InstanceProvisionNicVO`
DROP FOREIGN KEY `fkBareMetal2InstanceProvisionNicVOInstanceVO`;
+ END IF;Based on learnings,ZStack 升级脚本需要通过 information_schema 检查表和约束的存在性来确保脚本可以安全重复执行。
🤖 Prompt for AI Agents
In @conf/db/upgrade/V5.5.6__schema.sql around lines 30 - 34, The DROP FOREIGN
KEY for fkBareMetal2InstanceProvisionNicVOInstanceVO on table
BareMetal2InstanceProvisionNicVO lacks an existence check and breaks
idempotency; modify V5.5.6__schema.sql to query information_schema
(constraints/KEY_COLUMN_USAGE) to confirm the constraint exists before executing
ALTER TABLE `BareMetal2InstanceProvisionNicVO` DROP FOREIGN KEY
`fkBareMetal2InstanceProvisionNicVOInstanceVO`, and only run the DROP when the
check returns a positive match so the upgrade script is safe to re-run; keep the
subsequent CALL ADD_CONSTRAINT(...) as-is after the conditional drop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java (1)
2652-2715: 安全性:thirdparty token 通过 cmd 下发,建议明确避免日志泄露。
cmd.token看起来是“平台 token”,通常属于敏感信息;建议在AgentCommand.token字段上加@NoLogging(或确认现有链路不会把 cmd 序列化打印到日志)。
🤖 Fix all issues with AI agents
In @conf/db/upgrade/V5.5.6__schema.sql:
- Around line 30-31: ALTER TABLE DROP FOREIGN KEY is executed unconditionally
and will fail if fkBareMetal2InstanceProvisionNicVOInstanceVO doesn't exist;
modify V5.5.6__schema.sql to first check information_schema for a constraint
named fkBareMetal2InstanceProvisionNicVOInstanceVO on table
BareMetal2InstanceProvisionNicVO in schema zstack and only run ALTER TABLE
`zstack`.`BareMetal2InstanceProvisionNicVO` DROP FOREIGN KEY
`fkBareMetal2InstanceProvisionNicVOInstanceVO` when that check returns true,
ensuring the migration is idempotent and safe to re-run.
In
@plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java:
- Around line 2675-2691: The code sets cmd.monIp in CephPrimaryStorageBase but
HttpCaller.doCall() unconditionally overwrites cmd.monIp, so third-party SDKs
cannot rely on the specified mon; fix by either making HttpCaller.doCall()
populate cmd.monIp only when it is null (check HttpCaller.doCall() and change
assignment to if (cmd.monIp == null) cmd.monIp = base.getSelf().getHostname();)
or introduce a distinct field (e.g., cmd.thirdPartyMonIp) set in
CephPrimaryStorageBase (where cmd.monIp is currently assigned) and update all
call sites and the agent/HttpCaller to use thirdPartyMonIp for SDK usage while
preserving monIp for HTTP routing; update references in CephPrimaryStorageBase
(cmd.monIp assignment near ext.convertToBlockVolume(...) and similar blocks at
the other noted location) and in HttpCaller.doCall()/agent handling to use the
new conditional or new field consistently.
- Around line 2704-2711: In the success-callback loop in CephPrimaryStorageBase
where you iterate CephPrimaryStorageCheckInstanceTypeExtensionPoint, replace the
current `if (!ext.isBlockVolume(...)) { break; }` with `continue` so one
non-matching extension doesn't stop others, and change the installPath rewrite
to only call makeVolumeInstallPathByTargetPool when ret.installPath is non-null
and not already a Ceph install path for the target pool (e.g., check
ret.installPath.startsWith("ceph://") or startsWith(targetCephPoolName + "/") or
a similar prefix/pattern); otherwise leave volumePath alone—use these checks
around the assignment that currently does `volumePath = ret.installPath == null
? volumePath : makeVolumeInstallPathByTargetPool(ret.installPath,
targetCephPoolName)`.
🧹 Nitpick comments (2)
conf/db/upgrade/V5.5.6__schema.sql (1)
9-15: DDL 语句会导致隐式 COMMIT,ROLLBACK 无法回滚 DDL 操作MySQL/MariaDB 中
ALTER TABLE、DROP FOREIGN KEY等 DDL 语句会触发隐式 COMMIT。如果在 DDL 执行后发生错误,ROLLBACK无法撤销已完成的 DDL 变更。虽然存在性检查(通过 INFORMATION_SCHEMA)已确保脚本具备幂等性,但异常处理器给人一种"可完全回滚"的错觉。建议:
- 在注释中说明 DDL 的事务限制
- 或移除事务包装,依赖幂等性检查来保证可重复执行
基于 learnings,这是 ZStack 升级脚本中已知的限制。
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java (1)
487-517: CloneRsp 新增字段 OK,但建议统一字段可见性与序列化风格,避免“public 字段 + getter/setter”混用。
当前volumeId/volumeStatus是public,同时又提供 getter/setter;若 JSON 反序列化依赖 public 字段也许没问题,但风格不一致会增加后续维护成本。建议改为 private 字段 + getter/setter(如不影响序列化)
public static class CloneRsp extends AgentResponse { public Long size; public Long actualSize; public String installPath; - public String volumeId; - public String volumeStatus; + private String volumeId; + private String volumeStatus;
📜 Review details
Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
sdk/src/main/java/org/zstack/sdk/BareMetal2ChassisNicInventory.javais excluded by!sdk/**sdk/src/main/java/org/zstack/sdk/BareMetal2InstanceInventory.javais excluded by!sdk/**sdk/src/main/java/org/zstack/sdk/BareMetal2InstanceProvisionNicInventory.javais excluded by!sdk/**
📒 Files selected for processing (3)
conf/db/upgrade/V5.5.6__schema.sqlplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java
🚧 Files skipped from review as they are similar to previous changes (1)
- plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageCheckInstanceTypeExtensionPoint.java
🧰 Additional context used
📓 Path-based instructions (3)
**/*.*
⚙️ CodeRabbit configuration file
**/*.*: - 代码里不应当有有中文,包括报错、注释等都应当使用正确的、无拼写错误的英文来写
Files:
conf/db/upgrade/V5.5.6__schema.sqlplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
**/*.sql
⚙️ CodeRabbit configuration file
**/*.sql: - Review the SQL code, make sure has no errors and confirm that:
- Upgrading scene has been carefully handled
- Do not use
DEFAULT 0000-00-00 00:00:00, useDEFAULT CURRENT_TIMESTAMPinstead- When
NOT NULLexists, must usestored procedureor other functions to process historical data, this is very very important- 数据库记录中,如果字符串长度不可控,不要用
vchar,用text类型
Files:
conf/db/upgrade/V5.5.6__schema.sql
**/*.java
⚙️ CodeRabbit configuration file
**/*.java: ## 1. API 设计要求
- API 命名:
- API 名称必须唯一,不能重复。
- API 消息类需要继承
APIMessage;其返回类必须继承APIReply或APIEvent,并在注释中用@RestResponse进行标注。- API 消息上必须添加注解
@RestRequest,并满足如下规范:
path:
- 针对资源使用复数形式。
- 当 path 中引用消息类变量时,使用
{variableName}格式。- HTTP 方法对应:
- 查询操作 →
HttpMethod.GET- 更新操作 →
HttpMethod.PUT- 创建操作 →
HttpMethod.POST- 删除操作 →
HttpMethod.DELETE- API 类需要实现
__example__方法以便生成 API 文档,并确保生成对应的 Groovy API Template 与 API Markdown 文件。
2. 命名与格式规范
类名:
- 使用 UpperCamelCase 风格。
- 特殊情况:
- VO/AO/EO 类型类除外。
- 抽象类采用
Abstract或Base前缀/后缀。- 异常类应以
Exception结尾。- 测试类需要以
Test或Case结尾。方法名、参数名、成员变量和局部变量:
- 使用 lowerCamelCase 风格。
常量命名:
- 全部大写,使用下划线分隔单词。
- 要求表达清楚,避免使用含糊或不准确的名称。
包名:
- 统一使用小写,使用点分隔符,每个部分应是一个具有自然语义的英文单词(参考 Spring 框架的结构)。
命名细节:
- 避免在父子类或同一代码块中出现相同名字的成员或局部变量,防止混淆。
- 命名缩写:
- 不允许使用不必要的缩写,如:
AbsSchedulerJob、condi、Fu等。应使用完整单词提升可读性。
3. 编写自解释代码
意图表达:
- 避免使用布尔型参数造成含义不明确。例如:
- 对于
stopAgent(boolean ignoreError),建议拆分为不同函数(如stopAgentIgnoreError()),或使用枚举表达操作类型。- 命名应尽量用完整的单词组合表达意图,并在名称中体现数据类型或用途(例如在常量与变量名称中,将类型词放在末尾)。
- 避免使用魔法值(Magic Value):
直接使用未经定义的数值或字符串(如 if (status == 5))应替换为枚举或常量。
示例:
// 错误示例:魔法值
if (user.getStatus() == 5) { ... }
// 正确示例:常量或枚举
public static final int STATUS_ACTIVE = 5;
if (user.getStatus() == STATUS_ACTIVE) { ... }
// 或使用枚举
enum UserStatus { ACTIVE, INACTIVE }
注释:
- 代码应尽量做到自解释,对少于两行的说明可以直接写在代码中。
- 对于较长的注释,需要仔细校对并随代码更新,确保内容正确。
- 接口方法不应有多余的修饰符(例如
public),且必须配有有效的 Javadoc 注释。
4. 流程控制和结构优化
if...else 的使用:
- 应尽量减少 if...else 结构的使用,建议:
- 限制嵌套层级最多为两层,且内层不应再出现
else分支。- 尽早返回(Early Return),将条件判断中的处理逻辑提前结束或抽成独立方法。
- 使用 Java Stream 或 Lambda 表达式代替冗长的循环与条件判断。
条件判断:
- if 条件表达不宜过长或过于复杂,必要时可以将条件抽成 boolean 变量描述。
代码块长度:
...
Files:
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
🧠 Learnings (9)
📚 Learning: 2025-08-03T04:10:21.683Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 0
File: :0-0
Timestamp: 2025-08-03T04:10:21.683Z
Learning: ZStack 数据库升级脚本 V4.10.16__schema.sql 中的 UPGRADE_VM_METADATA_TABLES_DIRECT 存储过程使用直接 RENAME TABLE 操作,不具备幂等性。需要通过 information_schema 检查表和约束的存在性来确保脚本可以安全重复执行。
Applied to files:
conf/db/upgrade/V5.5.6__schema.sql
📚 Learning: 2025-08-03T04:10:21.683Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 0
File: :0-0
Timestamp: 2025-08-03T04:10:21.683Z
Learning: 在 ZStack 数据库升级脚本中,直接使用 RENAME TABLE 不能保证幂等性。应该通过 information_schema.tables 检查表的存在性,只在源表存在且目标表不存在时才执行重命名操作,以确保升级脚本可以安全地重复执行。
Applied to files:
conf/db/upgrade/V5.5.6__schema.sql
📚 Learning: 2025-08-03T03:42:34.349Z
Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2293
File: conf/db/upgrade/V4.10.16__schema.sql:69-82
Timestamp: 2025-08-03T03:42:34.349Z
Learning: 在 ZStack 项目中进行数据库 DDL 升级时,由于 MySQL/MariaDB 中的 RENAME TABLE、ALTER TABLE 等 DDL 语句会隐式 COMMIT,因此应使用临时表 + 数据复制的方式来确保升级可逆,而不是依赖事务的 ROLLBACK 机制。
Applied to files:
conf/db/upgrade/V5.5.6__schema.sql
📚 Learning: 2025-08-25T03:52:37.301Z
Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2504
File: storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotTreeBase.java:775-819
Timestamp: 2025-08-25T03:52:37.301Z
Learning: 在ZStack项目的VolumeSnapshotTreeBase类中,当通过dbf.findByUuid()方法获取VolumeVO对象时,需要进行null检查,因为该方法可能在找不到对应记录时返回null,直接使用可能导致NullPointerException。
Applied to files:
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-14T06:56:19.585Z
Learnt from: zstack-robot-2
Repo: MatheMatrix/zstack PR: 2435
File: storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupBase.java:47-47
Timestamp: 2025-08-14T06:56:19.585Z
Learning: 在VolumeSnapshotGroupBase.java中,VmInstanceResourceMetadataManager的注入和SKIP_RESOURCE_ROLLBACK标记虽然在当前版本中未被使用,但这些导入在大型重构PR中是有意为之的,用于保持代码一致性或为后续功能做准备。
Applied to files:
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-14T06:48:00.549Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2435
File: header/src/main/java/org/zstack/header/storage/snapshot/group/APICheckMemorySnapshotGroupConflictMsg.java:3-7
Timestamp: 2025-08-14T06:48:00.549Z
Learning: 在ZStack项目中,同一包内的类(如VolumeSnapshotGroupVO)不需要显式导入。同时,uuid()辅助方法在API消息类的__example__()方法中可以直接使用,无需静态导入,这是通过项目的构建系统或框架配置实现的。
Applied to files:
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-14T06:48:00.549Z
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2435
File: header/src/main/java/org/zstack/header/storage/snapshot/group/APICheckMemorySnapshotGroupConflictMsg.java:3-7
Timestamp: 2025-08-14T06:48:00.549Z
Learning: In the ZStack project, API message classes may not require explicit imports for VolumeSnapshotGroupVO references in APIParam annotations or for the uuid() helper method used in __example__() methods, even when these classes/methods are defined in other packages. The build system or framework appears to handle these references automatically.
Applied to files:
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-07-22T02:30:46.123Z
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2311
File: plugin/sdnController/src/main/java/org/zstack/sdnController/hardwareVxlan/KVMRealizeHardwareVxlanNetworkBackend.java:52-53
Timestamp: 2025-07-22T02:30:46.123Z
Learning: In ZStack, backend realization code (such as KVMRealizeHardwareVxlanNetworkBackend) can safely assume the existence of the VO for the resource being operated on (e.g., HardwareL2VxlanNetworkVO for a given L2NetworkInventory), due to system lifecycle management and strong foreign key constraints in the database schema. Explicit null checks after findByUuid are not required in these contexts.
Applied to files:
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
📚 Learning: 2025-08-12T05:52:18.323Z
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2418
File: compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java:3873-3885
Timestamp: 2025-08-12T05:52:18.323Z
Learning: 在 compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java 的 setNoIpAllocationStaticIp() 流程中,静态 IP 标签删除已统一在 change-ip-in-database 步骤对 voRemoveSet 循环处理:对收集到的旧 IPv4/IPv6 UsedIpVO 均调用 ipOperator.deleteStaticIpByVmUuidAndL3Uuid(vmUuid, l3Uuid, IPv6NetworkUtils.ipv6AddessToTagValue(ip)) 清理对应的 static-ip 标签。因此无需在分支里分别删除 IPv4/IPv6 的标签。
Applied to files:
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
🧬 Code graph analysis (1)
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java (2)
header/src/main/java/org/zstack/header/vm/VmInstanceSpec.java (2)
VmInstanceSpec(25-911)ImageSpec(123-164)core/src/main/java/org/zstack/core/db/Q.java (1)
Q(16-193)
🔍 Remote MCP AtlassianMCP
Relevant Jira context: ZSTAC-73396
-
Key: ZSTAC-73396 — Summary: "弹性裸金属支持使用云盘作为系统盘且支持HA" (support elastic bare-metal using cloud disk as root + HA). Status: Closed. Assignee: 沈今 (jin.shen).
-
Description highlights business impact (customer acceptance blocked, ~¥1M+ total, ~¥400k ZStack revenue) and blockers: IO errors when using cloud disk as root (TIC-3691), gateway single-point-of-failure (no HA group), and single-link network risk. Solution intent: enable high-speed cloud disk as bm root and HA; development tracked and planned (reference Confluence page linked in Jira).
-
Comments record related merge requests already merged for earlier branches (ZSTAC-79783 → 4.8.30-zhongxinbao) implementing thirdparty_ceph support across repos (premium, zstack-utility, zstack). This PR (targeting 5.5.6) appears to be the backport/sync of that work.
🔇 Additional comments (5)
conf/db/upgrade/V5.5.6__schema.sql (4)
47-52: isPrimaryProvisionNic 设置逻辑需确认业务合理性将所有现有
BareMetal2InstanceProvisionNicVO记录的isPrimaryProvisionNic设为TRUE。请确认这是预期行为 —— 如果一个 instance 可能有多个 provision NIC,将全部设为 primary 可能不符合业务语义(通常只有一个应为 primary)。
61-67: BareMetal2ChassisNicVO 的更新逻辑正确此处仅将
isProvisionNic = TRUE的记录设置isPrimaryProvisionNic = TRUE,逻辑合理。与上面的全量更新形成对比,这里的条件过滤更为精确。
1-74: 整体结构良好:使用存储过程处理 NOT NULL 列,符合编码规范脚本使用存储过程处理历史数据,通过 INFORMATION_SCHEMA 进行存在性检查保证幂等性,符合 ZStack 数据库升级脚本的最佳实践。
17-38: 两个辅助存储过程已具备幂等性
ADD_COLUMN和ADD_CONSTRAINT存储过程都通过INFORMATION_SCHEMA检查实现了幂等性:
ADD_COLUMN:通过检查INFORMATION_SCHEMA.COLUMNS确认列存在性,不存在时才执行ALTER TABLE ADD COLUMNADD_CONSTRAINT:通过检查INFORMATION_SCHEMA.TABLE_CONSTRAINTS确认约束存在性,不存在时才执行ALTER TABLE ADD CONSTRAINT脚本可以安全地重复执行。
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java (1)
57-63: 移除未使用的导入SystemTagVO和SystemTagVO_搜索结果表明这两个导入仅出现在 import 语句中,代码里没有实际使用它们。与
VmInstanceVO/VmInstanceVO_有在后续代码中被使用不同,这两个导入应该被删除,以避免维护混淆。⛔ Skipped due to learnings
Learnt from: zstack-robot-2 Repo: MatheMatrix/zstack PR: 2435 File: storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupBase.java:47-47 Timestamp: 2025-08-14T06:56:19.585Z Learning: 在VolumeSnapshotGroupBase.java中,VmInstanceResourceMetadataManager的注入和SKIP_RESOURCE_ROLLBACK标记虽然在当前版本中未被使用,但这些导入在大型重构PR中是有意为之的,用于保持代码一致性或为后续功能做准备。Learnt from: MatheMatrix Repo: MatheMatrix/zstack PR: 2418 File: compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java:3873-3885 Timestamp: 2025-08-12T05:52:18.323Z Learning: 在 compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java 的 setNoIpAllocationStaticIp() 流程中,静态 IP 标签删除已统一在 change-ip-in-database 步骤对 voRemoveSet 循环处理:对收集到的旧 IPv4/IPv6 UsedIpVO 均调用 ipOperator.deleteStaticIpByVmUuidAndL3Uuid(vmUuid, l3Uuid, IPv6NetworkUtils.ipv6AddessToTagValue(ip)) 清理对应的 static-ip 标签。因此无需在分支里分别删除 IPv4/IPv6 的标签。Learnt from: ZStack-Robot Repo: MatheMatrix/zstack PR: 2504 File: storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotTreeBase.java:775-819 Timestamp: 2025-08-25T03:52:37.301Z Learning: 在ZStack项目的VolumeSnapshotTreeBase类中,当通过dbf.findByUuid()方法获取VolumeVO对象时,需要进行null检查,因为该方法可能在找不到对应记录时返回null,直接使用可能导致NullPointerException。Learnt from: ZStack-Robot Repo: MatheMatrix/zstack PR: 2418 File: compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java:3856-3865 Timestamp: 2025-08-12T05:39:14.846Z Learning: 在 compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java 的 setNoIpAllocationStaticIp 流程中,从数据库加载的 VmNicVO 实例其 getUsedIps() 为空时也不会为 null(返回空集合)。因此无需对 getUsedIps() 再做 Optional/空列表归一化的空指针保护;若找不到对应 NIC,使用 orElse(new VmNicVO()) 的约定允许后续逻辑通过 NPE 暴露问题,与项目既有约定一致。Learnt from: MatheMatrix Repo: MatheMatrix/zstack PR: 2763 File: sdk/src/main/java/org/zstack/sdk/UpdateHostKernelInterfaceAction.java:31-31 Timestamp: 2025-10-20T11:27:25.928Z Learning: UpdateHostKernelInterface API (`sdk/src/main/java/org/zstack/sdk/UpdateHostKernelInterfaceAction.java`) 在 PR #2763 时尚未被实际使用,因此对其参数约束的变更(如 name 字段从必需改为可选)不会造成破坏性影响。Learnt from: MatheMatrix Repo: MatheMatrix/zstack PR: 2311 File: plugin/sdnController/src/main/java/org/zstack/sdnController/hardwareVxlan/KVMRealizeHardwareVxlanNetworkBackend.java:52-53 Timestamp: 2025-07-22T02:30:46.123Z Learning: In ZStack, backend realization code (such as KVMRealizeHardwareVxlanNetworkBackend) can safely assume the existence of the VO for the resource being operated on (e.g., HardwareL2VxlanNetworkVO for a given L2NetworkInventory), due to system lifecycle management and strong foreign key constraints in the database schema. Explicit null checks after findByUuid are not required in these contexts.
| ALTER TABLE `zstack`.`BareMetal2InstanceProvisionNicVO` | ||
| DROP FOREIGN KEY `fkBareMetal2InstanceProvisionNicVOInstanceVO`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DROP FOREIGN KEY 缺少存在性检查,可能导致执行失败
当外键 fkBareMetal2InstanceProvisionNicVOInstanceVO 不存在时(例如新数据库或脚本重复执行),此语句会抛出异常。
建议在执行前检查外键是否存在:
🔧 建议修复:添加外键存在性检查
IF instanceUuid_exists = 0 THEN
CALL ADD_COLUMN('BareMetal2InstanceProvisionNicVO', 'instanceUuid', 'VARCHAR(32)', FALSE, '');
UPDATE `zstack`.`BareMetal2InstanceProvisionNicVO`
SET `instanceUuid` = `uuid`;
+ -- Check if foreign key exists before dropping
+ IF EXISTS (
+ SELECT 1 FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
+ WHERE TABLE_SCHEMA = 'zstack'
+ AND TABLE_NAME = 'BareMetal2InstanceProvisionNicVO'
+ AND CONSTRAINT_NAME = 'fkBareMetal2InstanceProvisionNicVOInstanceVO'
+ AND CONSTRAINT_TYPE = 'FOREIGN KEY'
+ ) THEN
ALTER TABLE `zstack`.`BareMetal2InstanceProvisionNicVO`
DROP FOREIGN KEY `fkBareMetal2InstanceProvisionNicVOInstanceVO`;
+ END IF;
CALL ADD_CONSTRAINT('BareMetal2InstanceProvisionNicVO', 'fkBareMetal2InstanceProvisionNicVOInstanceVO',
'instanceUuid', 'BareMetal2InstanceVO', 'uuid', 'CASCADE');
UPDATE `zstack`.`BareMetal2InstanceProvisionNicVO`
SET `uuid` = REPLACE(UUID(), '-', '');
END IF;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ALTER TABLE `zstack`.`BareMetal2InstanceProvisionNicVO` | |
| DROP FOREIGN KEY `fkBareMetal2InstanceProvisionNicVOInstanceVO`; | |
| -- Check if foreign key exists before dropping | |
| IF EXISTS ( | |
| SELECT 1 FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS | |
| WHERE TABLE_SCHEMA = 'zstack' | |
| AND TABLE_NAME = 'BareMetal2InstanceProvisionNicVO' | |
| AND CONSTRAINT_NAME = 'fkBareMetal2InstanceProvisionNicVOInstanceVO' | |
| AND CONSTRAINT_TYPE = 'FOREIGN KEY' | |
| ) THEN | |
| ALTER TABLE `zstack`.`BareMetal2InstanceProvisionNicVO` | |
| DROP FOREIGN KEY `fkBareMetal2InstanceProvisionNicVOInstanceVO`; | |
| END IF; |
🤖 Prompt for AI Agents
In @conf/db/upgrade/V5.5.6__schema.sql around lines 30 - 31, ALTER TABLE DROP
FOREIGN KEY is executed unconditionally and will fail if
fkBareMetal2InstanceProvisionNicVOInstanceVO doesn't exist; modify
V5.5.6__schema.sql to first check information_schema for a constraint named
fkBareMetal2InstanceProvisionNicVOInstanceVO on table
BareMetal2InstanceProvisionNicVO in schema zstack and only run ALTER TABLE
`zstack`.`BareMetal2InstanceProvisionNicVO` DROP FOREIGN KEY
`fkBareMetal2InstanceProvisionNicVOInstanceVO` when that check returns true,
ensuring the migration is idempotent and safe to re-run.
| List<CephPrimaryStorageCheckInstanceTypeExtensionPoint> exts = pluginRgty.getExtensionList(CephPrimaryStorageCheckInstanceTypeExtensionPoint.class); | ||
| for (CephPrimaryStorageCheckInstanceTypeExtensionPoint ext : exts) { | ||
| Boolean result = ext.isSupportCloneByThirdParty(msg.getVolume().getVmInstanceUuid()); | ||
| if (!result) { | ||
| break; | ||
| } | ||
| boolean isRootVolume = Q.New(VolumeVO.class) | ||
| .eq(VolumeVO_.uuid, msg.getVolume().getUuid()) | ||
| .eq(VolumeVO_.type, VolumeType.Root) | ||
| .isExists(); | ||
| boolean isBareMetal2Instance = Q.New(VmInstanceVO.class) | ||
| .eq(VmInstanceVO_.uuid, msg.getVolume().getVmInstanceUuid()) | ||
| .eq(VmInstanceVO_.type, "baremetal2") | ||
| .isExists(); | ||
| boolean hasToken = CephSystemTags.THIRDPARTY_PLATFORM.hasTag(msg.getPrimaryStorageUuid()); | ||
| if (isRootVolume && isBareMetal2Instance && hasToken) { | ||
| CephPrimaryStorageVO cephPrimaryStorageVO = dbf.findByUuid(msg.getPrimaryStorageUuid(), CephPrimaryStorageVO.class); | ||
| String monIp = cephPrimaryStorageVO.getMons() | ||
| .stream() | ||
| .filter(v -> v.getStatus() == MonStatus.Connected) | ||
| .map(CephPrimaryStorageMonVO::getHostname) | ||
| .findAny() | ||
| .orElseThrow(() -> new OperationFailureException( | ||
| operr("all ceph mons of primary storage[uuid:%s] are not in Connected state", cephPrimaryStorageVO.getUuid()) | ||
| )); | ||
| cmd.token = CephSystemTags.THIRDPARTY_PLATFORM.getTokenByResourceUuid(msg.getPrimaryStorageUuid(), | ||
| CephSystemTags.THIRDPARTY_PLATFORM_TOKEN); | ||
| cmd.monIp = monIp; | ||
| cmd.tpTimeout = CephGlobalConfig.THIRD_PARTY_SDK_TIMEOUT.value(String.class); | ||
| VolumeVO vo = Q.New(VolumeVO.class) | ||
| .eq(VolumeVO_.uuid, msg.getVolume().getUuid()).find(); | ||
| ext.convertToBlockVolume(vo); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
第三方 clone 扩展点循环的控制流有明显 bug:break 会阻断后续扩展点;且 Boolean 可能 NPE。
if (!result) { break; }:一旦第一个扩展返回 false,就不会再检查其他扩展(通常应continue)。Boolean result直接!result:返回 null 会 NPE。- 另外,
isRootVolume/isBareMetal2Instance/hasToken每次循环都会重复 DB 查询,建议提到循环外计算一次。
可行修复(continue + null-safe + 提前计算条件)
- List<CephPrimaryStorageCheckInstanceTypeExtensionPoint> exts = pluginRgty.getExtensionList(CephPrimaryStorageCheckInstanceTypeExtensionPoint.class);
- for (CephPrimaryStorageCheckInstanceTypeExtensionPoint ext : exts) {
- Boolean result = ext.isSupportCloneByThirdParty(msg.getVolume().getVmInstanceUuid());
- if (!result) {
- break;
- }
- boolean isRootVolume = Q.New(VolumeVO.class)
- .eq(VolumeVO_.uuid, msg.getVolume().getUuid())
- .eq(VolumeVO_.type, VolumeType.Root)
- .isExists();
- boolean isBareMetal2Instance = Q.New(VmInstanceVO.class)
- .eq(VmInstanceVO_.uuid, msg.getVolume().getVmInstanceUuid())
- .eq(VmInstanceVO_.type, "baremetal2")
- .isExists();
- boolean hasToken = CephSystemTags.THIRDPARTY_PLATFORM.hasTag(msg.getPrimaryStorageUuid());
- if (isRootVolume && isBareMetal2Instance && hasToken) {
+ List<CephPrimaryStorageCheckInstanceTypeExtensionPoint> exts =
+ pluginRgty.getExtensionList(CephPrimaryStorageCheckInstanceTypeExtensionPoint.class);
+
+ boolean isRootVolume = Q.New(VolumeVO.class)
+ .eq(VolumeVO_.uuid, msg.getVolume().getUuid())
+ .eq(VolumeVO_.type, VolumeType.Root)
+ .isExists();
+ boolean isBareMetal2Instance = Q.New(VmInstanceVO.class)
+ .eq(VmInstanceVO_.uuid, msg.getVolume().getVmInstanceUuid())
+ .eq(VmInstanceVO_.type, "baremetal2")
+ .isExists();
+ boolean hasToken = CephSystemTags.THIRDPARTY_PLATFORM.hasTag(msg.getPrimaryStorageUuid());
+
+ for (CephPrimaryStorageCheckInstanceTypeExtensionPoint ext : exts) {
+ if (!Boolean.TRUE.equals(ext.isSupportCloneByThirdParty(msg.getVolume().getVmInstanceUuid()))) {
+ continue;
+ }
+ if (isRootVolume && isBareMetal2Instance && hasToken) {
CephPrimaryStorageVO cephPrimaryStorageVO = dbf.findByUuid(msg.getPrimaryStorageUuid(), CephPrimaryStorageVO.class);
String monIp = cephPrimaryStorageVO.getMons()
.stream()
.filter(v -> v.getStatus() == MonStatus.Connected)
.map(CephPrimaryStorageMonVO::getHostname)
.findAny()
.orElseThrow(() -> new OperationFailureException(
operr("all ceph mons of primary storage[uuid:%s] are not in Connected state", cephPrimaryStorageVO.getUuid())
));
cmd.token = CephSystemTags.THIRDPARTY_PLATFORM.getTokenByResourceUuid(msg.getPrimaryStorageUuid(),
CephSystemTags.THIRDPARTY_PLATFORM_TOKEN);
cmd.monIp = monIp;
cmd.tpTimeout = CephGlobalConfig.THIRD_PARTY_SDK_TIMEOUT.value(String.class);
VolumeVO vo = Q.New(VolumeVO.class)
.eq(VolumeVO_.uuid, msg.getVolume().getUuid()).find();
ext.convertToBlockVolume(vo);
}
}另外强烈建议把 "baremetal2" 抽成常量/枚举,避免魔法值扩散。
| if (isRootVolume && isBareMetal2Instance && hasToken) { | ||
| CephPrimaryStorageVO cephPrimaryStorageVO = dbf.findByUuid(msg.getPrimaryStorageUuid(), CephPrimaryStorageVO.class); | ||
| String monIp = cephPrimaryStorageVO.getMons() | ||
| .stream() | ||
| .filter(v -> v.getStatus() == MonStatus.Connected) | ||
| .map(CephPrimaryStorageMonVO::getHostname) | ||
| .findAny() | ||
| .orElseThrow(() -> new OperationFailureException( | ||
| operr("all ceph mons of primary storage[uuid:%s] are not in Connected state", cephPrimaryStorageVO.getUuid()) | ||
| )); | ||
| cmd.token = CephSystemTags.THIRDPARTY_PLATFORM.getTokenByResourceUuid(msg.getPrimaryStorageUuid(), | ||
| CephSystemTags.THIRDPARTY_PLATFORM_TOKEN); | ||
| cmd.monIp = monIp; | ||
| cmd.tpTimeout = CephGlobalConfig.THIRD_PARTY_SDK_TIMEOUT.value(String.class); | ||
| VolumeVO vo = Q.New(VolumeVO.class) | ||
| .eq(VolumeVO_.uuid, msg.getVolume().getUuid()).find(); | ||
| ext.convertToBlockVolume(vo); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convertToBlockVolume() 放在 clone 之前且无回滚,失败路径可能把 Volume 状态改脏。
这段在 NoRollbackFlow 内执行:一旦 httpCall(CLONE_PATH...) 失败,当前 flow 会 fail,但 DB 侧转换可能已发生;后续重试/清理可能遇到不一致状态。至少需要:
- 要么把转换放到 clone 成功后;
- 要么把该 flow 改成可 rollback,并在 rollback 里恢复转换(或让扩展点提供 revert)。
| if (isRootVolume && isBareMetal2Instance && hasToken) { | ||
| CephPrimaryStorageVO cephPrimaryStorageVO = dbf.findByUuid(msg.getPrimaryStorageUuid(), CephPrimaryStorageVO.class); | ||
| String monIp = cephPrimaryStorageVO.getMons() | ||
| .stream() | ||
| .filter(v -> v.getStatus() == MonStatus.Connected) | ||
| .map(CephPrimaryStorageMonVO::getHostname) | ||
| .findAny() | ||
| .orElseThrow(() -> new OperationFailureException( | ||
| operr("all ceph mons of primary storage[uuid:%s] are not in Connected state", cephPrimaryStorageVO.getUuid()) | ||
| )); | ||
| cmd.token = CephSystemTags.THIRDPARTY_PLATFORM.getTokenByResourceUuid(msg.getPrimaryStorageUuid(), | ||
| CephSystemTags.THIRDPARTY_PLATFORM_TOKEN); | ||
| cmd.monIp = monIp; | ||
| cmd.tpTimeout = CephGlobalConfig.THIRD_PARTY_SDK_TIMEOUT.value(String.class); | ||
| VolumeVO vo = Q.New(VolumeVO.class) | ||
| .eq(VolumeVO_.uuid, msg.getVolume().getUuid()).find(); | ||
| ext.convertToBlockVolume(vo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmd.monIp 在这里赋值很可能会被 HttpCaller.doCall() 覆盖,导致第三方 clone 依赖的 monIp 不可控。
HttpCaller.doCall() 每次发送前都会 cmd.monIp = base.getSelf().getHostname();,因此你在 clone flow 里提前写入的 cmd.monIp 可能根本不会到 agent(或变成随机挑选的 mon)。如果第三方 SDK 需要“指定的 monIp”,这会直接造成功能不稳定。
修复方向(示例):HttpCaller 仅在 monIp 为空时填充
- cmd.monUuid = base.getSelf().getUuid();
- cmd.monIp = base.getSelf().getHostname();
+ cmd.monUuid = base.getSelf().getUuid();
+ if (cmd.monIp == null) {
+ cmd.monIp = base.getSelf().getHostname();
+ }如果 monIp 语义需要区分“ceph mon 用于 httpCall 的地址”和“第三方平台 SDK 需要的 monIp”,建议新增独立字段(例如 thirdPartyMonIp)而不是复用 monIp。
Also applies to: 3624-3636
🤖 Prompt for AI Agents
In
@plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
around lines 2675 - 2691, The code sets cmd.monIp in CephPrimaryStorageBase but
HttpCaller.doCall() unconditionally overwrites cmd.monIp, so third-party SDKs
cannot rely on the specified mon; fix by either making HttpCaller.doCall()
populate cmd.monIp only when it is null (check HttpCaller.doCall() and change
assignment to if (cmd.monIp == null) cmd.monIp = base.getSelf().getHostname();)
or introduce a distinct field (e.g., cmd.thirdPartyMonIp) set in
CephPrimaryStorageBase (where cmd.monIp is currently assigned) and update all
call sites and the agent/HttpCaller to use thirdPartyMonIp for SDK usage while
preserving monIp for HTTP routing; update references in CephPrimaryStorageBase
(cmd.monIp assignment near ext.convertToBlockVolume(...) and similar blocks at
the other noted location) and in HttpCaller.doCall()/agent handling to use the
new conditional or new field consistently.
| List<CephPrimaryStorageCheckInstanceTypeExtensionPoint> exts = pluginRgty.getExtensionList(CephPrimaryStorageCheckInstanceTypeExtensionPoint.class); | ||
| for (CephPrimaryStorageCheckInstanceTypeExtensionPoint ext : exts) { | ||
| if (!ext.isBlockVolume(msg.getVolume().getUuid())) { | ||
| break; | ||
| } | ||
| volumePath = ret.installPath == null ? volumePath : makeVolumeInstallPathByTargetPool(ret.installPath, targetCephPoolName); | ||
| ext.populateBlockVolumeDetails(msg.getVolume().getUuid(), ret.volumeId, ret.volumeStatus); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
成功回调里的扩展点循环同样不应 break;且 installPath 重写逻辑可能拼错路径。
if (!ext.isBlockVolume(...)) { break; }:同样会阻断后续扩展;更合理是continue(除非你能保证扩展点只有一个)。volumePath = ... makeVolumeInstallPathByTargetPool(ret.installPath, targetCephPoolName):如果ret.installPath已经是完整的ceph://...或pool/uuid,这里会生成错误路径。至少应先判断前缀并保底。
可行修复(continue + installPath 前缀判断)
List<CephPrimaryStorageCheckInstanceTypeExtensionPoint> exts = pluginRgty.getExtensionList(CephPrimaryStorageCheckInstanceTypeExtensionPoint.class);
for (CephPrimaryStorageCheckInstanceTypeExtensionPoint ext : exts) {
if (!ext.isBlockVolume(msg.getVolume().getUuid())) {
- break;
+ continue;
}
- volumePath = ret.installPath == null ? volumePath : makeVolumeInstallPathByTargetPool(ret.installPath, targetCephPoolName);
+ if (ret.installPath != null) {
+ volumePath = ret.installPath.startsWith("ceph://")
+ ? ret.installPath
+ : makeVolumeInstallPathByTargetPool(ret.installPath, targetCephPoolName);
+ }
ext.populateBlockVolumeDetails(msg.getVolume().getUuid(), ret.volumeId, ret.volumeStatus);
}🤖 Prompt for AI Agents
In
@plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
around lines 2704 - 2711, In the success-callback loop in CephPrimaryStorageBase
where you iterate CephPrimaryStorageCheckInstanceTypeExtensionPoint, replace the
current `if (!ext.isBlockVolume(...)) { break; }` with `continue` so one
non-matching extension doesn't stop others, and change the installPath rewrite
to only call makeVolumeInstallPathByTargetPool when ret.installPath is non-null
and not already a Ceph install path for the target pool (e.g., check
ret.installPath.startsWith("ceph://") or startsWith(targetCephPoolName + "/") or
a similar prefix/pattern); otherwise leave volumePath alone—use these checks
around the assignment that currently does `volumePath = ret.installPath == null
? volumePath : makeVolumeInstallPathByTargetPool(ret.installPath,
targetCephPoolName)`.
support thirdparty_ceph to bm root volume
Resolves: ZSTAC-73396
Change-Id: I6f6f616777677a6f74696c6c736c737a626f7863
sync from gitlab !9008