MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Paddle Multimodal Integration and eXploration: a high-performance, flexible toolkit supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion model toolbox.
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
Explore LLM deployment on AXera's AI chips.
PicQ: Demo for MiniCPM-o 2.6 to answer questions about images using natural language.
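In the spirit of the PicQ demo above, image Q&A reduces to one call to the model's chat interface. Below is a minimal sketch assuming the Hugging Face `transformers` loading path and `chat()` signature documented in the MiniCPM-V README; the model id is `openbmb/MiniCPM-o-2_6`, and the image path and question are placeholders.

```python
# Minimal single-image Q&A sketch, assuming the chat interface from the
# MiniCPM-V README; 'image.png' and the question are placeholder inputs.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    'openbmb/MiniCPM-o-2_6',
    trust_remote_code=True,
    attn_implementation='sdpa',   # or 'flash_attention_2' if installed
    torch_dtype=torch.bfloat16,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(
    'openbmb/MiniCPM-o-2_6', trust_remote_code=True
)

image = Image.open('image.png').convert('RGB')  # placeholder image file
msgs = [{'role': 'user', 'content': [image, 'What is in this image?']}]

# chat() returns the model's text answer for the multimodal message list.
answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
```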
A Google Colaboratory sample for MiniCPM-V 2.6, a lightweight VLM.
VidiQA: Demo for MiniCPM-V 2.6 to answer questions about videos using natural language.
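Video Q&A in the style of VidiQA follows the same chat interface, with the video first turned into a list of sampled frames. The sketch below assumes the frame-sampling recipe from the MiniCPM-V README (sample roughly one frame per second with `decord`, cap the frame count, and pass frames plus the question in one message); `video.mp4`, the frame cap, and the question are placeholders.

```python
# Video Q&A sketch, assuming the decord-based sampling recipe from the
# MiniCPM-V README; 'video.mp4' and the question are placeholder inputs.
import torch
from decord import VideoReader, cpu
from PIL import Image
from transformers import AutoModel, AutoTokenizer

MAX_NUM_FRAMES = 64  # cap frames to keep the context length manageable

def encode_video(video_path: str):
    """Sample ~1 frame per second, uniformly thinned to MAX_NUM_FRAMES."""
    vr = VideoReader(video_path, ctx=cpu(0))
    step = max(1, round(vr.get_avg_fps()))
    idx = list(range(0, len(vr), step))
    if len(idx) > MAX_NUM_FRAMES:  # uniformly subsample long videos
        stride = len(idx) / MAX_NUM_FRAMES
        idx = [idx[int(i * stride)] for i in range(MAX_NUM_FRAMES)]
    frames = vr.get_batch(idx).asnumpy()
    return [Image.fromarray(f.astype('uint8')) for f in frames]

model = AutoModel.from_pretrained(
    'openbmb/MiniCPM-V-2_6', trust_remote_code=True, torch_dtype=torch.bfloat16
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(
    'openbmb/MiniCPM-V-2_6', trust_remote_code=True
)

frames = encode_video('video.mp4')
msgs = [{'role': 'user',
         'content': frames + ['Describe what happens in this video.']}]

# Per the README, video inputs disable image ids and use fewer slices per frame.
answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer,
                    use_image_id=False, max_slice_nums=2)
print(answer)
```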