Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CUDA acceleration #933

Merged
merged 35 commits into from
Apr 17, 2019

Conversation

SamCarlberg
Copy link
Member

@SamCarlberg SamCarlberg commented Apr 10, 2019

Overview

Adds CUDA support to the GRIP runtime. A CUDA-enabled build will set a flag in MANIFEST.MF to let the runtime know that it is is using OpenCV builds with CUDA extensions; this is required since the CUDA-enabled OpenCV binaries cannot load if there is no compatible CUDA runtime installed on the system.

Add a MatWrapper class for semi-transparently working with an image on CPU and GPU memory. Data is lazily copied between host and device when needed.

Add fixed exit codes for use with SafeShutdown, to keep the codes organized. Add an exit code for missing CUDA rutnime for CUDA-enabled builds.

Add a flagChanged method to OutputSocket to avoid hacky calls to socket.setValue(socket.getValue().get())

Add a partial OutputSocket implementation with CudaSocket - contains a boolean value to flag if CUDA acceleration is available. Will always contain false if CUDA is not available, either as a missing runtime or if a CPU-only build of OpenCV is used. Displayed as a regular checkbox, which will be disabled if CUDA is unavailable.

TODO

  • Windows CUDA runtime detection
  • Mac CUDA runtime detection
  • Linux CUDA runtime detection
  • Confirm expected behavior of CUDA-enabled builds on systems without CUDA
  • Investigate more usecases for CUDA acceleration (eg cascade classifiers)

Updated Operations with CUDA Support

  • BlurOperation: Can CUDA-accelerate Gaussian blurs and bilateral filters. Box and median blurs are still CPU-only. Bilateral filter is 10x-100x faster in CUDA depending on blur radius
  • CannyEdgeOperation
  • DesaturateOperation
  • NormalizeOperation (single-channel inputs only)
  • CV Absdiff
  • CV add
  • CV addWeighted
  • CV bitwise_and
  • CV bitwise_not
  • CV bitwise_or
  • CV bitwise_xor
  • CV compare
  • CV subtract
  • CV cvtColor
  • CV Sobel (~15x speedup versus CPU)
  • CV Threshold

Note: speedups of CPU versus CUDA are as measured on my desktop computer

Component Name Speed
CPU i7-7700k 4800MHz
RAM 2x16GB DDR4 2133MHz
GPU GTX 980Ti 1450MHz core/3750MHz mem

Re-enable CompatibilityTest. Fix some backwards compatbility issues.

Not really a fan of injecting the Injector into the Operations class. Need to see if there's a better solution
Currently requires custom CUDA bindings from javaccp-presets (see bytedeco/javacpp-presets#416)
Trying to use CUDA-accelerated operations without an nvidia GPU and running nvidia drivers will probably crash the app
ProjectTest.testPerformSerializedPipelineWithMats is broken... don't know exactly why, but opencv_core.compare is segfauting
No more socket.setValue(socket.getValue()) hacks
Socket is disabled when CUDA is unavailable
Add enumeration of exit codes to keep things organized
CUDA median filter is broken on 3.4.3, so keep it running on CPU only
Makes CUDA operation cleanups actually free the used memory

if (withCuda) {
version = "$version-cuda"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan to just build a different version of GRIP with CUDA support?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. CUDA-enabled OpenCV will fail to load if there's no CUDA runtime available, and I'm not sure if it's possible to trick JavaCPP into loading the non-CUDA version at runtime

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Classloader hackyness? A thought you could experiment with after this PR is merged.

* accessed from CPU land when they have been most recently used from CUDA code.
*/
@SuppressWarnings("PMD.GodClass")
public final class MatWrapper {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this class thread safe? Does it need to be?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, and I don't think it needs to be. Mat is not thread safe and we've been fine so far with our existing synchronization constructs.

} else {
return ifGpu.apply(gpuMat);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Clean & straightforward!

* CUDA-accelerated code path. If no compatible CUDA runtime is available, sockets of this type
* will <i>always</i> have a value of {@code false} and cannot be changed.
*/
public class CudaSocket extends InputSocketImpl<Boolean> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CudaInputSocket?

* CUDA is required by OpenCV but no compatible runtime is available on the system.
*/
public static final int CUDA_UNAVAILABLE = 0x04;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer an enum with code field on it over this.

@JLLeitschuh
Copy link
Member

So far, everything looks good. Poke me when you want me to take another pass.

This lets us create CudaDetectors etc. before loading OpenCV in the core module
Move logger setup to its own class, since the core module may not load before app exits
Call flagChanged in ProjectTest
FIxes issue on Windows when using non-CUDA OpenCV
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

public final class Loggers {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid change.

}

bind(CudaVerifier.class).in(Scopes.SINGLETON);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to move this logic to a construction phase? Like a method GripCudaModule.create() method, that way there's no dynamic logic inside of configure?

Related: /~https://github.com/google/guice/wiki/AvoidConditionalLogicInModules

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class CudaVerifierTest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IOC and composition makes this a pretty trivial class 🙂

Generate a resource file at build time to flag CUDA usage
May also fix Mac, but that remains untested
Since there's no definite location to check, and because linking is funky on Windows (only seems to want CUDA 10.0; 10.1 cannot be used). Linux, of course, will work fine with 10.1
@codecov-io
Copy link

codecov-io commented Apr 12, 2019

Codecov Report

Merging #933 into master will decrease coverage by 1.5%.
The diff coverage is 39.38%.

@@             Coverage Diff              @@
##             master     #933      +/-   ##
============================================
- Coverage     54.24%   52.73%   -1.51%     
  Complexity        1        1              
============================================
  Files           307      325      +18     
  Lines          8372     8853     +481     
  Branches        542      563      +21     
============================================
+ Hits           4541     4669     +128     
- Misses         3633     3979     +346     
- Partials        198      205       +7

Is platform-agnostic and doesn't need to hardcode install locations - CUDA just needs to be on the PATH
@SamCarlberg SamCarlberg marked this pull request as ready for review April 16, 2019 20:21
@JLLeitschuh JLLeitschuh removed the wip label Apr 17, 2019
Copy link
Member

@JLLeitschuh JLLeitschuh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@SamCarlberg SamCarlberg merged commit 34afc3d into WPIRoboticsProjects:master Apr 17, 2019
@SamCarlberg SamCarlberg deleted the experiemental/cuda branch April 17, 2019 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants