[New Model] UDOP: Unifying Vision, Text, and Layout for Universal Document Processing #20650

Closed
2 tasks done
WaterKnight1998 opened this issue Dec 7, 2022 · 7 comments · Fixed by #22940

Comments

@WaterKnight1998

WaterKnight1998 commented Dec 7, 2022

Model description

We propose Universal Document Processing (UDOP), a foundation Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation. UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation. With a novel Vision-Text-Layout Transformer, UDOP unifies pretraining and multi-domain downstream tasks into a prompt-based sequence generation scheme. UDOP is pretrained on both large-scale unlabeled document corpora using innovative self-supervised objectives and diverse labeled data. UDOP also learns to generate document images from text and layout modalities via masked image reconstruction. To the best of our knowledge, this is the first time in the field of document AI that one model simultaneously achieves high-quality neural document editing and content customization. Our method sets the state-of-the-art on 9 Document AI tasks, e.g., document understanding and QA, across diverse data domains like finance reports, academic papers, and websites. UDOP ranks first on the leaderboard of the Document Understanding Benchmark (DUE).
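To make the "prompt-based sequence generation scheme" more concrete, here is a minimal sketch of how a task, an optional query, and OCR words with their layout could be serialized into one input string. This is not the UDOP implementation: the `Word` class, the `quantize_box` and `build_example` helpers, the 500-bin grid, and the `<n>` coordinate-token format are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Word:
    """One OCR word with its pixel bounding box (hypothetical container)."""
    text: str
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def quantize_box(box, page_w, page_h, bins=500):
    """Map pixel coordinates to integer bins relative to the page size.

    The bin count is an arbitrary choice for this sketch.
    """
    sizes = (page_w, page_h, page_w, page_h)
    # Clamp to [0, bins - 1] so a coordinate on the page edge stays in range.
    return tuple(
        min(bins - 1, max(0, int(c * bins / s))) for c, s in zip(box, sizes)
    )


def build_example(task, words, page_w, page_h, query=""):
    """Serialize task, query, text, and layout into a single prompt string.

    The "<n>" coordinate tokens are a made-up format, not UDOP's vocabulary.
    """
    parts = [task + "."]
    if query:
        parts.append(query)
    for w in words:
        b = quantize_box(w.box, page_w, page_h)
        parts.append(f"{w.text} <{b[0]}><{b[1]}><{b[2]}><{b[3]}>")
    return " ".join(parts)
```

Under these assumptions, calling `build_example("Question answering", words, page_w, page_h, query="What is the total?")` yields one string that a sequence-to-sequence model could consume, and a different task would change only the task prefix and the expected target text.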

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

UDOP Paper: https://arxiv.org/abs/2212.02623
UDOP Repo: https://github.com/microsoft/UDOP

UDOP Model Weights: https://huggingface.co/ZinengTang/Udop/tree/main

@WaterKnight1998
Author

@NielsRogge as you implemented Donut, you might be interested :)

@NielsRogge
Contributor

Let's hope they open-source :)

@WaterKnight1998
Author

@uakarsh

uakarsh commented Dec 23, 2022

Hi @NielsRogge, can I help in this implementation?

@WaterKnight1998
Author

@NielsRogge here you have the weights: https://huggingface.co/ZinengTang/Udop/tree/main

@munish0838

@WaterKnight1998 Is the model accessible now?

@WaterKnight1998
Author

WaterKnight1998 commented Mar 14, 2023

> @WaterKnight1998 Is the model accessible now?

No, the PR from @raghavanone was closed. @NielsRogge is working on opening a PR that refactors the UDOP code, as the original was not very good.

I saw he has a branch for this: https://github.com/NielsRogge/transformers/tree/add_udop

NielsRogge mentioned this issue Apr 22, 2023
4 tasks