Hardware runtime decision and utility#29

Open
Brandon-Wilson wants to merge 4 commits into harrism:master from Brandon-Wilson:master
Conversation

@Brandon-Wilson

Hardware selection is now a runtime decision, such that precompiled
binaries can execute on a system without a GPU without crashing. In
addition, the target can be selected at runtime to define custom
hardware selection logic.

Several utility features have been added, including device runtime
functions (getting the number of devices / setting the active device),
basic shared memory support, and 3D execution grids.

All of these changes were made with careful attention to avoiding compatibility
issues with pre-existing code.

@Brandon-Wilson
Author

NOTE: My advisors at JPL are interested in using HEMI / CUDA for image processing on upcoming missions, but would like to maintain full CPU compatibility. Preferably, these changes could be pushed to the master branch for support in the future.

Thank you for your time!

Owner

@harrism harrism left a comment


Overall I'm not sure some of the added complexity is in the spirit of hemi's original design. Please address the comments. I also think there's too much in this for one pull request. Please separate the CPU/GPU runtime execution selection from all of the utility pieces. That would make it easier for me to test and integrate.

Also, I won't accept any new functionality without tests. Please add tests (see the google tests already included in hemi for reference) for your new functionality.

}

template <typename Function, typename... Arguments>
cudaError_t getMaxBlocksForDevice(unsigned int& NumBlocks, ExecutionPolicy &p)
Owner

A comment to explain what this getMaxBlocksForDevice... function is for vs. configureGrid would help me review this.


namespace hemi
{
///////Global Grid Gets//////////////////////////////////////////////////////
Owner

This comment doesn't match the style of other comments in the project. And what do you mean by "Gets"?

return 0;
#endif
}
int globalThreadIndex() {
Owner

Why the deep indent?

}
int globalThreadIndex() {
#ifdef HEMI_DEV_CODE
return (blockIdx.x + blockIdx.y * gridDim.x + blockIdx.z * gridDim.x * gridDim.y) // get block idx
Owner

OK, so you went full-3D for the global index.... I think this needs more design thought. The common case is 1D, and this is going to make the common case slow. I would leave globalThreadIndex() alone, and then make new accessors such as : globalThreadIndex1D, globalThreadIndex2D, globalThreadIndex3D. Or something like that. Do you use a lot of 3D blocks/grids?

HEMI_DEV_CALLABLE_INLINE
int globalThreadCount() {
#ifdef HEMI_DEV_CODE
return blockDim.x * gridDim.x * blockDim.y * gridDim.y * blockDim.z * gridDim.z;
Owner

See my comment about globalThreadIndex above.

@@ -0,0 +1,26 @@
///////////////////////////////////////////////////////////////////////////////
Owner

I would consider renaming this hemi.h and renaming hemi.h hemi_core.h... (So the convenient default is to include everything, but if you want to pare it down you can).

hemi/launch.h Outdated

// Automatic parallel launch
template <typename Function, typename... Arguments>
void launch(Arguments... args);
Owner

What is a launch without a kernel?

hemi/launch.h Outdated

// Launch function with an explicit execution policy / configuration
template <typename Function, typename... Arguments>
void launch(const ExecutionPolicy &p, Arguments... args);
Owner

Ditto.

//
// Automatic parallel launch
//
template <typename Function, typename... Arguments>
Owner

How does one pass a concrete function reference to this launch()? Can you show a working example?

ExecutionPolicy p = policy;
if (p.getLocation() == hemi::device && queryForDevice())
{
checkCuda(configureGrid(p, Kernel<Function, Arguments...>));
Owner

Please fix spacing ("Kernel << <" --> "Kernel<<<").
