[integ-tests-framework] Make capacity reservations for all instance types#7461
Open
hanwen-cluster wants to merge 1 commit into
…ypes

1. With aws#7440, we started making capacity reservations for {"c5.xlarge", "m6g.xlarge", "m6i.xlarge"} and falling back to other similar instance types if a capacity reservation fails to be created. This commit expands the logic to all instance types.
1.1. For instance types <= .xlarge, we make duplicate capacity reservations because multiple tests running in parallel could use the same instance type and therefore need multiple capacity reservations. For instance types > .xlarge, we make only one capacity reservation because tests with larger instance types usually make capacity reservations early in the test definition (e.g. test_efa in commercial makes its capacity reservation in `develop.yaml`), so this second layer of capacity reservation shouldn't make duplicates.
1.2. For instance types that support EFA, create the capacity reservation in a placement group. For instance types that do not support EFA, create the capacity reservation without a placement group.
2. With this commit, resolve_instance_with_capacity allows specifying alternative_instance_types. Prior to this commit, alternative_instance_types was always calculated with `get_similar_instance_types`, which could be too restrictive and give too few alternatives for instance types like `c5n.18xlarge`.
3. Improve test_efa in isolated_regions to take a flag that allows using any EFA instance type, to avoid Insufficient Capacity Errors. test_efa in commercial doesn't need this because it can try different regions. In isolated regions, the test has to run in a specific region.
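The rules in points 1.1, 1.2 and 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `plan_reservation`, `resolve_with_alternatives`, the `duplicates` default of 2, and the `try_reserve` callback are all hypothetical names and values, and the real `resolve_instance_with_capacity` may have a different signature.

```python
from typing import Callable, Iterable, Optional

# Sizes treated as "<= .xlarge" in this sketch (assumption: plain name match).
_SMALL_SIZES = {"nano", "micro", "small", "medium", "large", "xlarge"}


def is_at_most_xlarge(instance_type: str) -> bool:
    """Return True for types such as c5.large or m6i.xlarge."""
    _, _, size = instance_type.partition(".")
    return size in _SMALL_SIZES


def plan_reservation(instance_type: str, supports_efa: bool, duplicates: int = 2) -> dict:
    """Decide how many reservations to make and whether to use a placement group.

    `duplicates` is a hypothetical value; the PR text does not state the count.
    `supports_efa` is supplied by the caller in this sketch.
    """
    return {
        "instance_type": instance_type,
        # Small types may be shared by parallel tests, so reserve more than once.
        "reservation_count": duplicates if is_at_most_xlarge(instance_type) else 1,
        # EFA-capable types get a placement group; others do not.
        "use_placement_group": supports_efa,
    }


def resolve_with_alternatives(
    instance_type: str,
    try_reserve: Callable[[str], bool],
    alternative_instance_types: Optional[Iterable[str]] = None,
) -> Optional[str]:
    """Try the requested type first, then caller-supplied alternatives in order.

    `try_reserve` is a hypothetical callback that returns True when a capacity
    reservation for the given type succeeds.
    """
    candidates = [instance_type, *(alternative_instance_types or [])]
    for candidate in candidates:
        if try_reserve(candidate):
            return candidate
    return None
```

Passing the alternatives in explicitly, rather than always deriving them, mirrors the motivation in point 2: the caller can widen the candidate list when the derived one would be too short.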
Description of changes
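1. With aws#7440, we started making capacity reservations for {"c5.xlarge", "m6g.xlarge", "m6i.xlarge"} and falling back to other similar instance types if a capacity reservation fails to be created. This PR expands the logic to all instance types.
1.1. For instance types <= .xlarge, we make duplicate capacity reservations because multiple tests running in parallel could use the same instance type and therefore need multiple capacity reservations. For instance types > .xlarge, we make only one capacity reservation because tests with larger instance types usually make capacity reservations early in the test definition (e.g. test_efa in commercial makes its capacity reservation in `develop.yaml`), so this second layer of capacity reservation shouldn't make duplicates.
1.2. For instance types that support EFA, create the capacity reservation in a placement group. For instance types that do not support EFA, create the capacity reservation without a placement group.
2. resolve_instance_with_capacity now allows specifying alternative_instance_types. Previously, alternative_instance_types was always calculated with `get_similar_instance_types`, which could be too restrictive and give too few alternatives for instance types like `c5n.18xlarge`.
3. Improve test_efa in isolated_regions to take a flag that allows using any EFA instance type, to avoid Insufficient Capacity Errors. test_efa in commercial doesn't need this because it can try different regions. In isolated regions, the test has to run in a specific region.

Tests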
In the above tests, the test in us-east-1 passed completely. The test in ap-southeast-5 failed some checks in fabtest because it was using g6.8xlarge. This failure is not a regression from this PR, and won't surface in isolated regions because fabtest is not run in isolated regions.
Checklist
`develop` add the branch name as prefix in the PR title (e.g. `[release-3.6]`).

Please review the guidelines for contributing and Pull Request Instructions.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.