
feat(ml): ML on Rockchip NPUs #15241

Open · wants to merge 69 commits into main
Conversation

yoni13
Contributor

@yoni13 yoni13 commented Jan 11, 2025

Goal: run ML on Rockchip NPUs.
Testing on board: #13243 (reply in thread)

TODO:

  • It works on my Orange Pi 3B (RK3566).
  • Build Docker images.
  • Build models for more SoCs.
  • Allow setting the number of threads per model type (e.g. visual -> 2 threads) via environment variables.
  • NPU core masks for RK3576/RK3588.
  • Decide the model path (immich-app/ViT-B-32__openai/textual/rk3566.rknn).
  • Export script that accepts CLI arguments for the Immich model name and SoC, then exports the model.
  • Maximize NPU usage by using rknnpool.py.
  • Documentation.
  • Write tests.
  • Publish models on Hugging Face.
  • Make Docker images work out of the box (without downloading models manually).
  • Test on PC and other ARM-based boards to ensure it doesn't break anything.

Nice to have:

  • Rebase my commits (sorry for the ugly commit messages).
  • Test whether it works on RK3588 (I don't have one).
  • Support more models.

Issues:

  • Higher RAM usage: for the facial-recognition models we still need to load the ONNX model just to read its input and output metadata.

#13243

@yoni13
Contributor Author

yoni13 commented Jan 11, 2025

Docker launch command:

```
docker run --security-opt systempaths=unconfined --security-opt apparmor=unconfined --device /dev/dri --device /dev/dma_heap --device /dev/rga --device /dev/mpp_service -v /cache:/cache:ro -p 3004:3003 -v /sys/kernel/debug/:/sys/kernel/debug/:ro --name rknnimmich_name -d rknnimmich
```

and it works (provided the model has already been downloaded to the cache, of course).

With ViT-B-32 and buffalo_l each loaded with two threads and jobs rerunning: 2.7 GB RAM, peaking at 3.5 GB (effectively like running four models at once).
Update: these figures predate the change to load ONNX only when required; I'll update the memory usage when I have time.

@yoni13 yoni13 marked this pull request as ready for review January 14, 2025 12:20
Contributor

@mertalev mertalev left a comment


Nice work! This is a lot to get through and will need more testing, but the core prediction logic looks pretty simple.

Resolved review threads: machine-learning/app/models/base.py, machine-learning/app/sessions/rknn.py, docker/hwaccel.ml.yml, machine-learning/rknn/rknnpool.py
Comment on lines +41 to +61
```python
def _load_ort_session(self) -> None:
    # Load the ONNX model only to read its input/output metadata,
    # then delete the session immediately to free the memory.
    self.ort_session = ort.InferenceSession(
        self.ort_model_path.as_posix(),
    )
    self.inputs: list[SessionNode] = self.ort_session.get_inputs()
    self.outputs: list[SessionNode] = self.ort_session.get_outputs()
    del self.ort_session

def get_inputs(self) -> list[SessionNode]:
    # Lazily populate the metadata on first access.
    try:
        return self.inputs
    except AttributeError:
        self._load_ort_session()
        return self.inputs

def get_outputs(self) -> list[SessionNode]:
    try:
        return self.outputs
    except AttributeError:
        self._load_ort_session()
        return self.outputs
```
Contributor


Just raise NotImplementedError for now and change the recognition code to first check if ORT is being used before calling get_inputs.
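
A minimal sketch of what that suggestion might look like (the session and call-site names here are illustrative assumptions, not the PR's actual code):

```python
# In the RKNN session: metadata introspection is not supported yet.
def get_inputs(self) -> list[SessionNode]:
    raise NotImplementedError

def get_outputs(self) -> list[SessionNode]:
    raise NotImplementedError

# In the recognition code: only query metadata when running on ORT
# (hypothetical names; assumes `import onnxruntime as ort`).
if isinstance(session, ort.InferenceSession):
    input_name = session.get_inputs()[0].name
else:
    input_name = None  # RKNN models have their input shapes baked in
```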

Contributor Author


Just a quick question: are we not running recognition on the NPU?

Resolved review threads: machine-learning/app/config.py, machine-learning/Dockerfile
```python
    run_options: Any = None,
) -> list[NDArray[np.float32]]:
    input_data: list[NDArray[np.float32]] = [np.ascontiguousarray(v) for v in input_feed.values()]
    self.rknnpool.put(input_data)
```
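
For context, a complete `run()` built on the pool would presumably pair each `put()` with a matching blocking `get()`, roughly like this (the pool's `get()` method and its blocking semantics are assumptions about rknnpool.py, not confirmed from the diff):

```python
# Assumes: import numpy as np; from numpy.typing import NDArray; from typing import Any
def run(
    self,
    output_names: list[str] | None,
    input_feed: dict[str, NDArray[np.float32]],
    run_options: Any = None,
) -> list[NDArray[np.float32]]:
    # RKNN wants contiguous buffers; dict order stands in for named inputs.
    input_data = [np.ascontiguousarray(v) for v in input_feed.values()]
    self.rknnpool.put(input_data)
    # Hypothetical: block until a worker thread finishes this job.
    outputs: list[NDArray[np.float32]] = self.rknnpool.get()
    return outputs
```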
Contributor


Have you tested that pool with different job concurrency and TPE settings and verified that there are no errors and that the results don't change based on these settings?

Contributor Author

@yoni13 yoni13 Jan 15, 2025


I just hit one 😅, but I'm not sure where the limit is.
It should be added to the docs.

model: XLM-Roberta-Large-Vit-B-16Plus
visual TPE: 2, running
error while trying to load the textual model (1)
concurrency: all default

```
I RKNN: [09:50:24.865] RKNN Runtime Information, librknnrt version: 2.3.0 (c949ad889d@2024-11-07T11:35:33)
I RKNN: [09:50:24.867] RKNN Driver Information, version: 0.9.8
I RKNN: [09:50:24.872] RKNN Model Information, version: 6, toolkit version: 2.3.0(compiler version: 2.3.0 (c949ad889d@2024-11-07T11:39:30)), target: RKNPU lite, target platform: rk3566, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
E RKNN: [09:50:25.353] failed to allocate handle, ret: -1, errno: 12, errstr: Cannot allocate memory
E RKNN: [09:50:25.353] failed to malloc npu memory, size: 1132926656, flags: 0x2
E RKNN: [09:50:25.353] Import rknn model failed!
E RKNN: [09:50:25.353] rknn_init, load model failed!
```


Contributor Author

@yoni13 yoni13 Jan 15, 2025


And it looks like RKNN doesn't support some operators:

```
E RKNN: [11:18:23.623] Unsupport CPU op: CumSum in this librknnrt.so, please try to register custom op by calling rknn_register_custom_ops or If using rknn, update to the latest toolkit2 and runtime from: https://console.zbox.filez.com/l/I00fc3 (PWD: rknn). If using rknn-llm, update from: /~https://github.com/airockchip/rknn-llm
```

We could register custom ops per RKNPU user guide §5.5.2.1, but probably not for now.

Contributor


> I just hit one 😅, but I'm not sure where the limit is. [quotes the XLM-Roberta-Large-Vit-B-16Plus memory-allocation failure above]

How much RAM do you have? It could be a simple case of not enough memory.

Contributor


> And it looks like RKNN doesn't support some operators... [quotes the unsupported CumSum op log above]

It's normal for it to not support every op. It can just fall back to ONNX for those models like ARM NN does.
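
As a rough sketch of what that fallback could look like (the file names, wrapper classes, and selection logic here are all illustrative assumptions, not the PR's actual code):

```python
from pathlib import Path

def pick_session(model_dir: Path):
    # Hypothetical: use the precompiled RKNN model when one exists for
    # this SoC; otherwise fall back to ONNX Runtime, the way models
    # with unsupported ops fall back under ARM NN.
    rknn_path = model_dir / "rk3566.rknn"
    if rknn_path.exists():
        return RknnSession(rknn_path)  # assumed wrapper from this PR
    return OrtSession(model_dir / "model.onnx")  # assumed ORT wrapper
```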

Contributor Author

@yoni13 yoni13 Jan 15, 2025


> How much RAM do you have? It could be a simple case of not enough memory.

I have the 8 GB model of the Orange Pi 3B.
And the memory usage:

[screenshot: memory usage graphs]

I still have some memory available, so I don't think that's the case(?)

The upper graph shows cache/buffers; the lower one shows usage.

Contributor Author


> It's normal for it to not support every op. It can just fall back to ONNX for those models like ARM NN does.

But there's an issue: converting the model doesn't throw an error for unsupported ops; it just prints a "not supported" warning in the logs.

Maybe I should write a bash script automating it?
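
Something along these lines could catch it automatically, sketched in Python rather than bash for brevity (the export command and the log pattern matched here are assumptions based on the "Unsupport CPU op" message above):

```python
import re
import subprocess
import sys

def convert_and_check(cmd: list[str]) -> None:
    # Run the export and fail loudly if the toolkit merely *warns*
    # about unsupported ops instead of raising an error.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    # Assumed pattern, based on the "Unsupport CPU op: CumSum" log above.
    unsupported = re.findall(r"Unsupport\w* (?:CPU )?op: (\w+)", output)
    if proc.returncode != 0 or unsupported:
        sys.exit(f"Conversion failed or hit unsupported ops: {unsupported}")

# Hypothetical invocation of this PR's export script:
convert_and_check([sys.executable, "rknn/export/build_rknn.py", "ViT-B-32__openai", "rk3566"])
```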

Resolved review thread: machine-learning/rknn/export/build_rknn.py
Labels: changelog:feature · documentation · 🧠machine-learning