
SIGSEGV calling NDList.encode() #3033

Closed
david-sitsky opened this issue Mar 19, 2024 · 3 comments · Fixed by #3034
Labels
bug Something isn't working

Comments

@david-sitsky
Contributor

Description

The following unit test crashes on my Linux box with a SIGSEGV when the NDList.encode() method is called.

Expected Behavior

No SIGSEGV.

Error Message

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007c7568f29f52, pid=1108173, tid=1108182
#
# JRE version: OpenJDK Runtime Environment JBR-17.0.6+10-829.9-jcef (17.0.6+10) (build 17.0.6+10-b829.9)
# Java VM: OpenJDK 64-Bit Server VM JBR-17.0.6+10-829.9-jcef (17.0.6+10-b829.9, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [0.26.0-libdjl_torch.so+0x129f52]  c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::operator->() const+0xc
#

hs_err_pid1108173.log

How to Reproduce?

    import ai.djl.huggingface.tokenizers.Encoding;
    import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer;
    import ai.djl.ndarray.NDArray;
    import ai.djl.ndarray.NDList;
    import ai.djl.ndarray.NDManager;

    @Test
    public void testCrash()
    {
        try (NDManager ndManager = NDManager.newBaseManager();
             HuggingFaceTokenizer tokenizer = HuggingFaceTokenizer.newInstance("openai/clip-vit-base-patch32"))
        {
            Encoding encoding = tokenizer.encode("fast blue car");
            NDArray attention = ndManager.create(encoding.getAttentionMask());
            NDArray inputIds = ndManager.create(encoding.getIds());
            NDArray placeholder = ndManager.create("");
            placeholder.setName("module_method:get_text_features");
            NDList ndList = new NDList(inputIds.expandDims(0), attention.expandDims(0), placeholder);
            // The JVM crashes with SIGSEGV here
            ndList.encode();
        }
    }

Steps to reproduce

Run the test above. There is no GPU on my machine.

What have you tried to solve it?

I read that the current PyTorch version has issues elsewhere, so I tried pinning a specific version with PYTORCH_VERSION=2.0.1, but it made no difference.

@frankfliu
Contributor

@david-sitsky

This issue is not related to NDList.encode(). The root cause is that PyTorch doesn't really support String tensors.

The crash can be reproduced with the following code:

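    import ai.djl.ndarray.NDArray;
    import ai.djl.ndarray.NDManager;

    try (NDManager ndManager = NDManager.newBaseManager("PyTorch")) {
        NDArray placeholder = ndManager.create("");
        placeholder.setName("module_method:get_text_features");
        // Crashes with SIGSEGV: the PyTorch engine cannot back a String tensor
        placeholder.toByteBuffer();
    }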

I think we should block NDArray operations on PyTorch String tensors to avoid the crash.
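A fail-fast guard of the kind proposed above could look roughly like the following minimal sketch. The `DType`, `Tensor`, and `StringTensorGuard` types are hypothetical stand-ins, not DJL classes; in DJL itself such a check would presumably compare `NDArray.getDataType()` against `DataType.STRING` before the array reaches native code. The only point illustrated is that the caller gets a catchable Java exception instead of a JVM crash.

```java
import java.util.List;

/** Hypothetical stand-ins for an engine's tensor metadata; these are NOT DJL classes. */
public class StringTensorGuard {

    /** Simplified data types; STRING marks a tensor the engine cannot back natively. */
    enum DType { FLOAT32, INT64, STRING }

    /** Minimal tensor description: just a name and a data type. */
    record Tensor(String name, DType dtype) {}

    /** Rejects String tensors up front so the caller sees a Java exception, not a native crash. */
    static void requireSupported(Tensor t) {
        if (t.dtype() == DType.STRING) {
            throw new UnsupportedOperationException(
                    "String tensor '" + t.name() + "' is not supported by this engine");
        }
    }

    /** Validates every tensor before a (hypothetical) native serialization step; returns the count checked. */
    static int checkAll(List<Tensor> tensors) {
        for (Tensor t : tensors) {
            requireSupported(t);
        }
        return tensors.size();
    }

    public static void main(String[] args) {
        List<Tensor> inputs = List.of(
                new Tensor("input_ids", DType.INT64),
                new Tensor("attention_mask", DType.INT64),
                new Tensor("module_method:get_text_features", DType.STRING));
        try {
            checkAll(inputs);
        } catch (UnsupportedOperationException e) {
            // Reached instead of a SIGSEGV: the third input is a String tensor
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` lets the two INT64 inputs pass and prints the rejection message for the `module_method:get_text_features` placeholder.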

@david-sitsky
Contributor Author

Blocking it to avoid crashes sounds good.

I was hoping to use the CLIP model with djl-serving and was using /~https://github.com/deepjavalibrary/djl-demo/blob/master/djl-serving/java-client/src/main/java/ai/djl/examples/serving/javaclient/DJLServingClientExample4.java and /~https://github.com/deepjavalibrary/djl/blob/master/examples/src/main/java/ai/djl/examples/inference/clip/TextTranslator.java as inspiration for this.

Can you recommend the right way to make this work? Thanks in advance.

@david-sitsky
Contributor Author

I assume writing a custom translator for djl-serving will be required for this? /~https://github.com/deepjavalibrary/djl-serving/blob/master/serving/docs/modes.md#serving-example-custom-translator.


2 participants