Double addition of inputs in PrintfNode prevents inlining and, as a result, debugging #506

andrii0lomakin · 2024-07-25T09:30:26Z

When a PrintfNode is constructed, each of its inputs is registered as a usage inside the constructor, because the constructor calls the NodeList#set method.
After that, the node is added to the graph of operations by calling b.add(b.append(printfNode));.

That causes a second attempt to add the printf inputs to the graph, with a side effect: the node passed as an argument to printf records the printf node as a usage twice. As a result, during inlining the compiler tries to replace the input of two printf usages, instead of one, with the new inlined node. That throws an exception during compilation, which makes printf unusable for debugging.

Let us consider the example:

        var currentInputOffset = inputOffset + context.globalIdy * tensorLength + context.globalIdx;
        var localSnippet = context.allocateFloatLocalArray(64);

        float value = inputTensor.get(currentInputOffset);

        var localOffset = context.localGroupSizeX * context.localIdy;
        localSnippet[localOffset + context.localIdx] = value * value;

        if (context.localIdx == 0) {
            Debug.printf("%f\n", value);
        }

In the graph, the printf argument is the FloatArray#get call, which is extracted in the plugin. Then:
During construction, FloatArray#get is added as an input, and the PrintfNode is recorded as a usage of FloatArray#get.

But then, after the call to b.add(b.append(printfNode));, one more usage of FloatArray#get is recorded.

That causes an exception during FloatArray#get inlining, because the second attempt to replace the input fails: both recorded usages are the same PrintfNode instance, and its input was already replaced on the first attempt:

Unable to build sketch for method: squareSumKernel(not found in inputs, usage: 64|printf)
uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to build sketch for method: squareSumKernel(not found in inputs, usage: 64|printf)
	at tornado.runtime@1.0.6/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher.buildSketch(TornadoSketcher.java:190)
	at tornado.runtime@1.0.6/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher$TornadoSketcherCallable.call(TornadoSketcher.java:262)
	at tornado.runtime@1.0.6/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher$TornadoSketcherCallable.call(TornadoSketcher.java:252)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
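
The failure mode described above can be reproduced in a toy model. The sketch below is not TornadoVM or Graal code: ToyNode, addInput, registerInputsAgain and replaceAtUsages are hypothetical names invented for illustration, and the model only assumes that each node keeps a list of its usages and that replacement walks that list.

```java
import java.util.ArrayList;
import java.util.List;

public class DoubleUsageDemo {
    // Hypothetical, heavily simplified stand-in for a compiler graph node.
    static class ToyNode {
        final String name;
        final List<ToyNode> inputs = new ArrayList<>();
        final List<ToyNode> usages = new ArrayList<>();

        ToyNode(String name) {
            this.name = name;
        }

        // Stores the input and records this node as a usage of it (first registration).
        void addInput(ToyNode input) {
            inputs.add(input);
            input.usages.add(this);
        }

        // Records this node as a usage of every input again (the redundant second registration).
        void registerInputsAgain() {
            for (ToyNode input : inputs) {
                input.usages.add(this);
            }
        }

        // Mimics inlining: every recorded usage must swap this node for the replacement.
        void replaceAtUsages(ToyNode replacement) {
            for (ToyNode usage : new ArrayList<>(usages)) {
                int index = usage.inputs.indexOf(this);
                if (index < 0) {
                    throw new IllegalStateException("not found in inputs, usage: " + usage.name);
                }
                usage.inputs.set(index, replacement);
                replacement.usages.add(usage);
            }
            usages.clear();
        }
    }

    // Returns "ok" if the replacement succeeds, otherwise the error message.
    static String run(boolean registerTwice) {
        ToyNode get = new ToyNode("FloatArray#get");
        ToyNode printf = new ToyNode("printf");
        printf.addInput(get);
        if (registerTwice) {
            printf.registerInputsAgain();
        }
        try {
            get.replaceAtUsages(new ToyNode("inlinedLoad"));
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // single registration
        System.out.println(run(true));  // double registration
    }
}
```

With a single registration the replacement goes through. With the duplicate usage, the second pass looks for the old input in the same printf instance, no longer finds it, and fails in the same spirit as the "not found in inputs" message in the trace.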

So the fix should be simple, in a nutshell:

Use the NodeInputList(Node self, T[] elements) constructor instead of calling set in the PrintfNode constructor.
There is no need to call both append and add in b.add(b.append(printfNode));. The call to the add method is a no-op in this case because the node is already in the graph.

If I got everything right, I will provide a PR soon.
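
The idea behind both points can be sketched in a toy model. This is an illustration under assumptions, not the actual patch: ToyNode, ToyGraph and append are hypothetical names, and the model assumes that the constructor only stores inputs while the graph registers usages exactly once when the node is first appended.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SingleRegistrationDemo {
    // Hypothetical, simplified node: the constructor stores inputs but does not touch usage lists.
    static class ToyNode {
        final String name;
        final List<ToyNode> inputs;
        final List<ToyNode> usages = new ArrayList<>();
        boolean inGraph;

        ToyNode(String name, ToyNode... inputs) {
            this.name = name;
            this.inputs = new ArrayList<>(Arrays.asList(inputs));
        }
    }

    // Hypothetical graph: the only place where usages are registered.
    static class ToyGraph {
        // Appending is idempotent: a node that is already in the graph is returned unchanged.
        ToyNode append(ToyNode node) {
            if (!node.inGraph) {
                node.inGraph = true;
                for (ToyNode input : node.inputs) {
                    input.usages.add(node);
                }
            }
            return node;
        }
    }

    // Builds get -> printf and returns how many usages the get node ends up with.
    static int usagesAfterAppend() {
        ToyGraph graph = new ToyGraph();
        ToyNode get = graph.append(new ToyNode("FloatArray#get"));
        ToyNode printf = new ToyNode("printf", get);
        graph.append(printf);
        graph.append(printf); // a repeated call changes nothing
        return get.usages.size();
    }

    public static void main(String[] args) {
        System.out.println(usagesAfterAppend());
    }
}
```

In this model the input sees the printf node as a usage once, however many times the node is handed to the graph, so a later replacement has exactly one usage to update.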

jjfumero · 2024-07-25T10:24:37Z

Thanks @andrii0lomakin for the report. We do not use this function and it is only partially supported for OpenCL. Improvements on this matter are very welcome.

andrii0lomakin · 2024-07-25T10:25:53Z

@jjfumero, what do you mean by partially supported? How do you debug kernels in such cases? Could you provide more information?

jjfumero · 2024-07-25T10:38:08Z

It works for simple test cases (Java primitive types). This was part of the original design from more than 10 years ago, and we haven't evolved it. In general, we do not use prints inside GPU kernels. Debugging kernels is a bit tricky; what we do is use the TornadoVM prebuilt API with the changes we want to include in our compiler optimizations, including extensions for printf if available for the target platform.

We also use external C/C++ tools to check and debug kernels (for example, in the case of SPIR-V, we pass them through the LLVM SPIR-V Khronos validator). On some occasions we have also used Intel VTune for OpenCL:
https://jjfumero.github.io/posts/2022/02/profiling-tornadovm-with-intel-vtune/

andrii0lomakin mentioned this issue Jul 27, 2024

Issue #506 has been fixed. #513

Merged

8 tasks

jjfumero added a commit that referenced this issue Jul 30, 2024

Merge pull request #513 from babylonml/printf-dbl-input

f1e670d

andrii0lomakin closed this as completed Aug 2, 2024