-
Notifications
You must be signed in to change notification settings - Fork 23
initial update #879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
engineeredcurlz
wants to merge
66
commits into
main
Choose a base branch
from
test-refactor
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
initial update #879
Changes from all commits
Commits
Show all changes
66 commits
Select commit
Hold shift + click to select a range
984bfb2
initial update
engineeredcurlz aeab336
Merge branch 'main' into test-refactor
lokesh-keyan 032f59b
wip: add create_deployment function to crud
lokesh-keyan bb4bca8
add import for handle_worload_operation function
47caf3c
add test for success
c435e36
change operation name
11be2fc
update operation name in test
5e73464
add test for failure
8530049
add exception test
b724860
Merge branch 'main' into test-refactor
engineeredcurlz 4e7a73b
Linting error: removed elif and else
69ce1e1
Merge branch 'test-refactor' of https://github.com/Azure/telescope in…
7a8dad4
fixed the spacing
5e52b71
removed extra spaces
364c264
Add deployment_name for consistency and to reference later
0bc8275
verify deployment using wait condition
6604ac0
Add logging for maniest and to wait for deployment - debug
3e6beea
add logger for deployment success
ae99b54
verify pods are available in deployment
6623cb2
add failure count
8916a99
add logger to verify deployment
a4273ac
add unit test for create_deployment method
bf6143a
ran lint
712939e
Add test for deployment partial sucess
engineeredcurlz b6f248c
Add test for multiple deployments
engineeredcurlz e0d8037
Add test for progressive scaling failure
engineeredcurlz e6feced
Add test in node_pool_crud for returns false early exit
engineeredcurlz e3d142d
Add test in node_pool_crud for scale up fails but continues to scale …
engineeredcurlz 769573e
Add test for node_pool_crud for scale down fails operation continues
engineeredcurlz ce8197e
Add test in node_pool_crud for deployment partial success
engineeredcurlz be9af74
pipeline test
engineeredcurlz af59e12
linting
engineeredcurlz ba865d5
yaml lint
engineeredcurlz 1a348f9
add python security dependency
engineeredcurlz 193af95
fix dependency
engineeredcurlz d433f19
Merge branch 'main' into test-refactor
engineeredcurlz 7598c61
update
engineeredcurlz a006d4c
added matrix variables to pipeline
engineeredcurlz d1f5ebb
testing: set GPU_NODE_POOL to empty string
engineeredcurlz 2ca536b
update vm size
engineeredcurlz 8a96356
testing change vm size with available quota
engineeredcurlz a65546e
update node count + vm size
engineeredcurlz 0c62bca
update: topology selection
engineeredcurlz 2279753
add deployment step after scale-up operation
engineeredcurlz 1028864
wire deployment parameters through k8s-crud-gpu topology
engineeredcurlz 3ca87d7
correct deployment command routing and kwargs in handle_workload_oper…
engineeredcurlz 8ceb256
correct topology name and add deployment matrix variables
engineeredcurlz b8f876e
update handle_workload_operations tests to match deployment command
engineeredcurlz 767f0cc
fix yamllint and pylint warnings
engineeredcurlz d9ac29f
add correct indentation
engineeredcurlz e6ccf1f
iterate multi-doc YAML generator when applying deployment manifests
engineeredcurlz eae0409
refactor: seperate deploy workloads into its own pipelinee step
engineeredcurlz 3a6a9b0
fix: execute k8s workload operations displayname
engineeredcurlz 7a79129
fix: prevent infinite loop in azure node pool deployment tests
engineeredcurlz d0b6578
Merge branch 'main' into test-refactor
engineeredcurlz 3fa6295
fix: await Azure LRO poller to prevent scale race condition
engineeredcurlz 6b7a448
fix: replace hardcoded timeout with self.step_timeout in create_deplo…
engineeredcurlz 8e3445c
refactor: convert f-string logger calls to %-style in create_deployment
engineeredcurlz 2aac24d
feat: make label_selector derive from parameter
engineeredcurlz 7522f1a
feat: remove hardcoding add namespace parameter
engineeredcurlz 65fdd00
fix: remove --deployment-name CLI
engineeredcurlz b0be1b1
fix: use hyphen for --number-of-deployments
engineeredcurlz c5d01be
fix: return error on unknown workload command
engineeredcurlz 946dea9
revert: restore original docstring line wrapping
engineeredcurlz e349428
Merge branch 'main' into test-refactor
engineeredcurlz 1ed3985
Merge branch 'main' into test-refactor
engineeredcurlz File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -146,6 +146,33 @@ def handle_node_pool_operation(node_pool_crud, args): | |
| logger.error(f"Error during '{command}' operation: {str(e)}") | ||
| return 1 | ||
|
|
||
| def handle_workload_operations(node_pool_crud, args): | ||
| """Handle workload operations (deployment, statefulset, jobs) based on the command""" | ||
| command = args.command | ||
| result = None | ||
|
|
||
| try: | ||
| if command == "deployment": | ||
| # Prepare deploy arguments | ||
| deploy_kwargs = { | ||
| "node_pool_name": args.node_pool_name, | ||
| "replicas": args.replicas, | ||
| "manifest_dir": args.manifest_dir, | ||
| "number_of_deployments": args.number_of_deployments | ||
| } | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. add else here else:
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. have added this to stop false successes and return logged errors |
||
|
|
||
| result = node_pool_crud.create_deployment(**deploy_kwargs) | ||
| else: | ||
| logger.error("Unknown workload command: '%s'", command) | ||
| return 1 | ||
| # Check if the operation was successful | ||
| if result is False: | ||
| logger.error(f"Operation '{command}' failed") | ||
| return 1 | ||
| return 0 | ||
| except Exception as e: | ||
| logger.error(f"Error during '{command}' operation: {str(e)}") | ||
| return 1 | ||
|
|
||
| def handle_node_pool_all(node_pool_crud, args): | ||
| """Handle the all-in-one node pool operation command (create, scale up, scale down, delete)""" | ||
|
|
@@ -320,6 +347,31 @@ def main(): | |
| ) | ||
| all_parser.set_defaults(func=handle_node_pool_operation) | ||
|
|
||
| # Deployment command - add after the "all" command parser | ||
| deployment_parser = subparsers.add_parser( | ||
| "deployment", parents=[common_parser], help="create deployments" | ||
| ) | ||
| deployment_parser.add_argument("--node-pool-name", required=True, help="Node pool name") | ||
| deployment_parser.add_argument( | ||
| "--number-of-deployments", | ||
| type=int, | ||
| default=1, | ||
| help="Number of deployments" | ||
| ) | ||
| deployment_parser.add_argument( | ||
| "--replicas", | ||
| type=int, | ||
| default=10, | ||
| help="Number of deployment replicas" | ||
| ) | ||
| deployment_parser.add_argument( | ||
| "--manifest-dir", | ||
| required=True, | ||
| help="Directory containing Kubernetes manifest files for the deployment" | ||
| ) | ||
|
|
||
| deployment_parser.set_defaults(func=handle_workload_operations) | ||
|
|
||
| # Arguments provided, run node pool operations and collect benchmark results | ||
| try: | ||
| args = parser.parse_args() | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| apiVersion: apps/v1 | ||
| kind: Deployment | ||
| metadata: | ||
| name: myapp-{{NODE_POOL_NAME}}-{{INDEX}} | ||
| labels: | ||
| app: {{LABEL_VALUE}} | ||
| spec: | ||
| template: | ||
| metadata: | ||
| name: | ||
| labels: | ||
| app: {{LABEL_VALUE}} | ||
| spec: | ||
| containers: | ||
| - name: {{LABEL_VALUE}} | ||
| image: mcr.microsoft.com/oss/nginx/nginx:1.21.6 | ||
| ports: | ||
| - containerPort: 80 | ||
| replicas: {{DEPLOYMENT_REPLICAS}} | ||
| selector: | ||
| matchLabels: | ||
| app: {{LABEL_VALUE}} | ||
| --- | ||
| apiVersion: v1 | ||
| kind: Service | ||
| metadata: | ||
| name: myapp-{{NODE_POOL_NAME}}-{{INDEX}} | ||
| spec: | ||
| ports: | ||
| - port: 80 | ||
| name: myapp | ||
| clusterIP: None | ||
| selector: | ||
| app: {{LABEL_VALUE}} |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -12,4 +12,4 @@ coverage==7.6.12 | |
| semver==3.0.4 | ||
| requests==2.32.4 | ||
| pyyaml==6.0.2 | ||
| pyOpenSSL==24.0.0 | ||
| pyopenssl>=24.0.0 | ||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we add poller here ? We do not need it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while running the pipeline I was getting failure error "OperationNotAllowed: Operation is not allowed because there's an in progress scale node pool operation". Operation was trying to move forward, while prior operation was still in progress.
begin_create_or_update was already returning a poller but it was being discarded. Adding poller.result (line 473) enforces a wait and check so that the prior operation can finish before moving on to the next. thereby fixing the failure I was receiving.