Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Double addition of inputs in PrintfNode prevents inlining and as result debuggin #506

Closed
andrii0lomakin opened this issue Jul 25, 2024 · 3 comments

Comments

@andrii0lomakin
Copy link
Contributor

During the addition of PrintfNode, all its input usages are also registered in the constructor because of the calling of NodeList#set method.
After that, it obviously added to the graph of operations by calling b.add(b.append(print mode));.

That causes a second attempt to add printf inputs to the graph. This causes side effect - argument is used as a parameter of printf registers printf extra usage twice, so as a result during inlining, inputs of two printf nodes, instead of one, will be tried to be replaced by the new inlined node that causes an exception during compilation and as result inability to use printf for debugging.

Let us consider the example:

        var currentInputOffset = inputOffset + context.globalIdy * tensorLength + context.globalIdx;
        var localSnippet = context.allocateFloatLocalArray(64);

        float value = inputTensor.get(currentInputOffset);

        var localOffset = context.localGroupSizeX * context.localIdy;
        localSnippet[localOffset + context.localIdx] = value * value;

        if (context.localIdx == 0) {
            Debug.printf("%f\n", value);
        }

In the graph printf argument will be FloatArray#get method, which will be extracted in the plugin and then:
During construction, FloatArray#get will be added as input and then PrintfNode will be added as extra usage of FloatArray#get
image
But then after the call of b.add(b.append(print mode));. , one more extra usage of FloatArray#get is added
image
That causes an exception during FloatArray#get inlining because second attempt to replace the input of second printf node fails (because those nodes are the same instances):

Unable to build sketch for method: squareSumKernel(not found in inputs, usage: 64|printf)
uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to build sketch for method: squareSumKernel(not found in inputs, usage: 64|printf)
	at tornado.runtime@1.0.6/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher.buildSketch(TornadoSketcher.java:190)
	at tornado.runtime@1.0.6/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher$TornadoSketcherCallable.call(TornadoSketcher.java:262)
	at tornado.runtime@1.0.6/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher$TornadoSketcherCallable.call(TornadoSketcher.java:252)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)

So the fix should be simple in a nutshell:

  1. Use NodeList NodeInputList(Node self, T[] elements) instead of set in constructor.
  2. There is no need to call both append and add methods in b.add(b.append(print mode));. The call to the add method is a no-op in this case because the node is already in the graph.

If I get everything right, I will provide the PR soon.

@jjfumero
Copy link
Member

Thanks @andrii0lomakin for the report. We do not use this function and it is only partially supported for OpenCL. Improvemnts on this matter are very welcome.

@andrii0lomakin
Copy link
Contributor Author

@jjfumero, what do you mean by partially supported? How do you debug kernels in such cases? Could you provide more information?

@jjfumero
Copy link
Member

It works for simple test cases (Java primitive types) This was part of the original designed more than 10 years ago and we haven't evolved it. In general, we do not use prints inside GPU kernels. Debugging kernels is a bit tricky and what we do is to use the TornadoVM prebuilt API with the changes we want to include in our compiler optimizations, including extensions for printf if available for the target platform.

We also use external C/C++ tools to check and debug kernels (in the case of SPIR-V, we pass LLVM SPIR-V Khronos validator) for example. In some occasions we have also used Intel VTune for OpenCL:
https://jjfumero.github.io/posts/2022/02/profiling-tornadovm-with-intel-vtune/

jjfumero added a commit that referenced this issue Jul 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants