Conversation
Pull request overview
This PR adds a new medium-difficulty coding challenge called "Max 2D Subarray Sum" that requires computing the maximum sum of any contiguous 2D subarray of a fixed window size. The challenge includes starter templates for multiple GPU programming frameworks and comprehensive test cases.
- Implements reference solution using 2D prefix sum approach with PyTorch
- Provides starter templates for Triton, PyTorch, Mojo, and CUDA implementations
- Includes example, functional, and performance test cases with various edge cases
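For intuition, the 2D prefix-sum approach named above can be sketched in plain Python (a minimal sketch under the assumption of a square N×N grid and a k×k window; the actual reference solution operates on PyTorch tensors via `cumsum`):

```python
def max_2d_subarray_sum(grid, window_size):
    # Build an (N+1) x (N+1) prefix-sum table P where
    # P[i][j] = sum of grid[0:i][0:j] (row 0 / column 0 are zero padding).
    n = len(grid)
    P = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            P[i + 1][j + 1] = (grid[i][j] + P[i][j + 1]
                               + P[i + 1][j] - P[i][j])
    k = window_size
    best = None
    # With the table built, each k x k window sum is an O(1)
    # inclusion-exclusion query over four corners.
    for i in range(k, n + 1):
        for j in range(k, n + 1):
            s = P[i][j] - P[i - k][j] - P[i][j - k] + P[i - k][j - k]
            best = s if best is None else max(best, s)
    return best
```

The zero-padded first row and column are what make every corner lookup valid without boundary checks, which is also why the reference solution pads its cumulative sums.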
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| challenges/medium/55_max_2d_subarray_sum/challenge.py | Challenge implementation with reference solution, test generation, and function signatures |
| challenges/medium/55_max_2d_subarray_sum/challenge.html | HTML documentation describing the problem, examples, and constraints |
| challenges/medium/55_max_2d_subarray_sum/starter/starter.triton.py | Triton starter template with function signature |
| challenges/medium/55_max_2d_subarray_sum/starter/starter.pytorch.py | PyTorch starter template with function signature |
| challenges/medium/55_max_2d_subarray_sum/starter/starter.mojo | Mojo starter template with function signature |
| challenges/medium/55_max_2d_subarray_sum/starter/starter.cu | CUDA starter template with function signature |
```python
assert output.dtype == torch.int32
# ...
psum = input.cumsum(dim=0).cumsum(dim=1)
padded = torch.zeros((N+1, N+1), dtype=torch.int32)
```
The padded tensor is created on the CPU by default by `torch.zeros`, but the input tensor is on the GPU (`cuda` device). This will cause a runtime error when assigning `psum` to `padded[1:, 1:]`, since both tensors must be on the same device. The padded tensor should be created on the same device as the input tensor.

Suggested change:

```diff
- padded = torch.zeros((N+1, N+1), dtype=torch.int32)
+ padded = torch.zeros((N+1, N+1), dtype=torch.int32, device=input.device)
```
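The device rule behind this comment can be demonstrated without a GPU: a tensor created without `device=` lands on the CPU, and passing `device=x.device` guarantees a companion tensor lives wherever `x` lives. A minimal CPU-only sketch (the `x` tensor here is a hypothetical stand-in for `psum`):

```python
import torch

x = torch.arange(9, dtype=torch.int32).reshape(3, 3)  # stand-in for psum

# torch.zeros defaults to the CPU; device=x.device pins the new tensor
# to the same device as x (CPU here, CUDA in the challenge).
padded = torch.zeros((4, 4), dtype=torch.int32, device=x.device)
padded[1:, 1:] = x  # same device, so the assignment succeeds
```

On a CUDA input the original code would instead raise a RuntimeError about tensors being on different devices at the assignment.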
```python
    "window_size": 7
})
# ...
# increasing_sequence
```
The comment says "increasing_sequence", but the test generates random integers using `torch.randint(-10, 11, ...)`. This comment is misleading and should be updated to reflect what the test actually does, such as "mixed_positive_negative" or "random_values".

Suggested change:

```diff
- # increasing_sequence
+ # mixed_positive_negative
```
```mojo
from math import ceildiv
# ...
# input, output are device pointers (i.e. pointers to memory on the GPU)
@export
```

```cpp
// input, output are device pointers (i.e. pointers to memory on the GPU)
extern "C" void solve(const int* input, int* output, int N, int window_size) {
```
There is trailing whitespace on this line. This should be removed for code cleanliness.
```diff
@@ -0,0 +1,7 @@
+import torch
+import triton
```

Import of `triton` is not used.

Suggested change:

```diff
-import triton
```
```diff
@@ -0,0 +1,7 @@
+import torch
+import triton
+import triton.language as tl
```

Import of `tl` is not used.

Suggested change:

```diff
-import triton.language as tl
```
add max 2d subarray sum