diff --git a/assets/img/automated-deployment-pin.png b/assets/img/automated-deployment-pin.png
new file mode 100644
index 0000000000..410ef27ac9
Binary files /dev/null and b/assets/img/automated-deployment-pin.png differ
diff --git a/assets/img/automated-deployment-production-test.png b/assets/img/automated-deployment-production-test.png
new file mode 100644
index 0000000000..23297a0b78
Binary files /dev/null and b/assets/img/automated-deployment-production-test.png differ
diff --git a/assets/img/automated-deployment-restart.png b/assets/img/automated-deployment-restart.png
new file mode 100644
index 0000000000..efd0b998e8
Binary files /dev/null and b/assets/img/automated-deployment-restart.png differ
diff --git a/assets/img/automated-deployment-supersede.png b/assets/img/automated-deployment-supersede.png
new file mode 100644
index 0000000000..f96a0925ac
Binary files /dev/null and b/assets/img/automated-deployment-supersede.png differ
diff --git a/en/operations/automated-deployments.html b/en/operations/automated-deployments.html
index 9d09410017..e305ef799a 100644
--- a/en/operations/automated-deployments.html
+++ b/en/operations/automated-deployments.html
@@ -8,6 +8,9 @@
+
+ See pipeline graph for details on the visual elements. +
Vespa Cloud provides:
+ If the production zones span multiple cloud providers (e.g., both AWS and GCP), + system tests are run separately for each cloud provider, + using test nodes from that provider. + This ensures the application starts and works correctly on each provider's infrastructure + before production deployment. +
Read more about system tests.
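
As a rough illustration of the multi-provider case described above, a deployment.xml that declares production zones in two clouds might look like the sketch below. The region names are assumed examples and are not taken from this change; substitute the zones the application actually uses.

```xml
<!-- Illustrative sketch only: the region names are assumed examples. -->
<deployment version="1.0">
    <prod>
        <!-- One zone per cloud provider, so system tests run once per provider -->
        <region>aws-us-east-1c</region>
        <region>gcp-us-central1-f</region>
    </prod>
</deployment>
```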
@@ -90,6 +100,10 @@
+ Like system tests, staging tests are run separately for each cloud provider
+ when the production zones span multiple providers.
+
Read more about staging tests.
@@ -134,7 +148,6 @@
The deployment orchestration is flexible. One can configure dependencies between deployments to production zones,
@@ -143,6 +156,188 @@
+
++ The deployment pipeline is visualized as a graph in the + Vespa Cloud Console. + Each node represents a step in the pipeline, + and edges show dependencies between steps. + Hover over any node to see details and available actions. +
| Shape | Step type | Description |
|---|---|---|
| | Instance | The application instance. Hover to see target versions, cancel/deploy/pin controls, and block windows. |
| | Test | System test, staging test, or production test. Hover to see run status, versions, and abort/restart actions. |
| | Production deployment | A deployment to a production zone. Hover to see run status, versions, and abort/restart/defer actions. |
| | Delay | A configured delay between steps. |

| Indicator | Meaning | Description |
|---|---|---|
| | Completed | The step has completed successfully on the current version. The color corresponds to the deployed version. |
| | Running | A deployment or test is currently in progress. Shown as an animated gradient between the source and target version colors. |
| | Failed | The last run of this step failed. |
| | Unknown / initial | No version has been deployed to this step yet. |
| | Pending change | A newer version is queued and waiting to be deployed to this step. |
| | Paused / deferred | Deployments to this step are temporarily postponed. |
| | Application blocked | Application changes are blocked by a block window. Shown as vertical bars. |
| | Platform blocked | Platform upgrades are blocked by a block window. Shown as horizontal bars. |

+ Each version deployed through the pipeline is assigned a distinct color. + This makes it easy to see at a glance which zones are on the same version + and where a rollout is in progress. + A thumbtack icon on a node indicates that the version is + pinned. +
+On a higher level, instances can also depend on each other in the same way. This makes it easy to configure a deployment process @@ -168,6 +363,97 @@
+ The deployment pipeline deploys one revision at a time through the production zones. + When a revision is being deployed, it must complete deployment to all declared production zones + before the next revision begins its production rollout. + System and staging tests for newer revisions may run in parallel, + but production deployment is serialized. +
++ For example, if build 90 is being deployed to the second of two production zones, + build 91 will not start deploying to the first zone until build 90 has completed in all zones — + even if build 91 has already passed system and staging tests. +
+ ++ To override the currently deploying revision and force a newer build through the pipeline, + hover over the instance node in the pipeline graph and use the TARGET VERSIONS controls. + Select the desired build number from the revision dropdown and click deploy. + This updates the instance's deployment target. + Any running production job for the old revision will be aborted, + and the pipeline will start deploying the new revision from the first production zone. +
+
++ To cancel the currently deploying revision without selecting a new one, + click cancel. + This lets the pipeline pick the next revision automatically. +
+ ++ Pinning locks the pipeline to a specific platform version or application revision, + preventing automatic upgrades. + This is useful for forcing a downgrade, holding a known-good revision during an incident, + or preventing the system from picking up a new platform version. +
++ To pin a version, hover over the instance node in the pipeline graph. + Under TARGET VERSIONS, select the desired version from the dropdown + and click pin. + A reason is required — enter a description and click submit pin. + Platform and revision can be pinned independently. +
+
++ While pinned, no newer platform versions or revisions will be deployed for the pinned dimension. + The dropdown and deploy button are disabled to prevent accidental changes. + To unpin, hover over the instance node and click unpin, + which allows newer versions to move through the pipeline again. +
++ For example, to roll back to a previous revision: +
++ When a production deployment fails repeatedly, an exponential cooldown is applied before + the job is automatically retried. The cooldown period grows with the time between the first + failure and the last completed run. This prevents the system from continuously retrying + a failing deployment. +
++ The cooldown applies only when the target versions match those of the failing runs. + If the target changes (e.g., a new revision is set as the deployment target), + the cooldown resets and the new revision can be deployed immediately. +
++ To manually re-trigger a failed deployment and bypass the cooldown, + hover over the failed zone node in the pipeline graph and click restart. +
+
+
++ To temporarily hold off deployments to a specific production zone, + hover over the zone node in the pipeline graph and click defer. + This postpones deployments for 72 hours. + Click enable to resume scheduling before the deferral period expires. +
+Each new submission is assigned an increasing build number,
@@ -207,23 +493,34 @@
{% highlight xml %}
-
- tensor-type-change
-
-{% endhighlight %}
+
+<validation-overrides>
+ <allow until="YYYY-MM-DD"
+ comment="Use fewer dimensions">tensor-type-change</allow>
+</validation-overrides>
+
+Production tests are optional and configured in deployment.xml.
- Production tests do not have access to the Vespa endpoints, for security reasons.
- Dependent steps in the release pipeline will stop if the tests fail,
- but upgraded regions will remain on the version where the test failed.
- A production test is hence used to block deployments to subsequent zones
- and only makes sense in a multi-zone deployment.
+ A production test is placed after a deployment zone in the pipeline and acts as a gate:
+ if it fails, the rollout stops and subsequent zones will not receive the new version.
+ This is useful in multi-zone deployments where the first zone serves as a canary.
+ Production tests run against the endpoints of the preceding production region in the pipeline.
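
A minimal sketch of the gate arrangement described above, assuming a two-zone deployment where the first zone acts as the canary. The region names are illustrative assumptions, not part of this change.

```xml
<!-- Illustrative sketch only: the region names are assumed examples. -->
<deployment version="1.0">
    <prod>
        <!-- First (canary) zone -->
        <region>aws-us-east-1c</region>
        <!-- Production test for the zone deployed just above; a failure stops the rollout here -->
        <test>aws-us-east-1c</test>
        <!-- Receives the new version only after the test step passes -->
        <region>aws-eu-west-1a</region>
    </prod>
</deployment>
```

With a single production zone there is nothing after the test step for it to gate, so this layout is mainly useful in multi-zone deployments.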
- +
Note that one or both of the application revision and platform may be upgraded during the staging test, depending on what upgrade scenario the test is run to verify.
- These changes are usually kept separate, but in some cases is necessary to allow them to roll out together.
+
+ +
+ When both a platform upgrade and a revision change are pending,
+ the rollout setting in
+ deployment.xml
+ controls how they interact in production zones:
+
simultaneous (default): Revision changes deploy independently of platform upgrades.
+ A revision can catch up to and pass an ongoing platform upgrade.
+ leading: When a revision catches up to a platform upgrade,
+ the two changes fuse and roll out together.
+ separate: The revision waits for the platform upgrade to complete,
+ unless the upgrade is failing.
+
+ With the default simultaneous strategy,
+ a new revision will not be held back by an ongoing platform upgrade.
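
As a sketch of how a non-default strategy might be selected, the fragment below sets the rollout attribute on the upgrade element, assuming a single-zone deployment. The region name is an illustrative assumption.

```xml
<!-- Illustrative sketch only: the region name is an assumed example. -->
<deployment version="1.0">
    <!-- rollout takes one of: simultaneous (default), leading, separate -->
    <upgrade rollout="separate" />
    <prod>
        <region>aws-us-east-1c</region>
    </prod>
</deployment>
```

Omitting the rollout attribute leaves the default simultaneous behavior in place.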
In <deployment>, or <instance>.
Determines the strategy for upgrading the application, or one of its instances.
-By default, application revision changes and Vespa platform changes are deployed separately.
-The exception is when an upgrade fails; then, the latest application revision is deployed
-together with the upgrade, as these may be necessary to fix the upgrade failure.
+By default, application revision changes deploy independently of platform upgrades,
+and an application revision can catch up to and pass an ongoing platform upgrade.
+See the rollout attribute below to change this behavior.
| rollout | -No, default separate |
+ No, default simultaneous |
|