[New Model] UDOP: Unifying Vision, Text, and Layout for Universal Document Processing #20650

WaterKnight1998 · 2022-12-07T13:48:22Z

Model description

We propose Universal Document Processing (UDOP), a foundation Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation. UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation. With a novel Vision-Text-Layout Transformer, UDOP unifies pretraining and multi-domain downstream tasks into a prompt-based sequence generation scheme. UDOP is pretrained on both large-scale unlabeled document corpora using innovative self-supervised objectives and diverse labeled data. UDOP also learns to generate document images from text and layout modalities via masked image reconstruction. To the best of our knowledge, this is the first time in the field of document AI that one model simultaneously achieves high-quality neural document editing and content customization. Our method sets the state-of-the-art on 9 Document AI tasks, e.g., document understanding and QA, across diverse data domains like finance reports, academic papers, and websites. UDOP ranks first on the leaderboard of the Document Understanding Benchmark (DUE).
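For anyone picking up the port, here is one way to picture "one uniform representation" of text, layout, and image. Roughly, the UDOP paper's layout-induced embedding adds each text token's embedding to the embedding of the image patch that contains the centre of that token's bounding box. Patches with no text are kept as separate inputs. The NumPy snippet below is a minimal sketch of that idea, not code from microsoft/i-Code or transformers. It assumes boxes normalised to [0, 1] as (x0, y0, x1, y1) and a square `grid` x `grid` patch layout. The function names `patch_index` and `fuse` are hypothetical.

```python
import numpy as np


def patch_index(box, grid):
    """Flat index of the patch containing the centre of `box`.

    Assumption: `box` is (x0, y0, x1, y1) normalised to [0, 1] and the image
    is split into a hypothetical grid x grid set of patches, row-major.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    # Clamp so a centre lying exactly on the right/bottom edge maps to the last patch.
    col = min(int(cx * grid), grid - 1)
    row = min(int(cy * grid), grid - 1)
    return row * grid + col


def fuse(token_emb, boxes, patch_emb, grid):
    """Add each token's patch embedding to its text embedding (sketch only).

    Returns (fused_tokens, leftover_patches). Patches that contain no token
    centre are returned separately so they can still be fed to the encoder.
    """
    fused = token_emb.copy()  # leave the caller's array untouched
    used = set()
    for i, box in enumerate(boxes):
        j = patch_index(box, grid)
        fused[i] += patch_emb[j]
        used.add(j)
    leftover = np.array([j for j in range(grid * grid) if j not in used], dtype=int)
    return fused, patch_emb[leftover]
```

With a 2 x 2 grid, a token whose box is centred at (0.2, 0.2) picks up patch 0, and one centred at (0.75, 0.75) picks up patch 3. Patches 1 and 2 come back as leftover image inputs. How the real model handles patches, box normalisation, and text-free regions should be checked against the paper and the released code.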

Open source status

The model implementation is available
The model weights are available

Provide useful links for the implementation

UDOP Paper: https://arxiv.org/abs/2212.02623
UDOP Repo: https://github.com/microsoft/UDOP

UDOP Model Weights: https://huggingface.co/ZinengTang/Udop/tree/main

WaterKnight1998 · 2022-12-07T13:48:48Z

@NielsRogge as you implemented Donut, you might be interested :)

NielsRogge · 2022-12-07T15:27:50Z

Let's hope they open-source :)

WaterKnight1998 · 2022-12-16T11:59:05Z

@NielsRogge they've added the code here: https://github.com/microsoft/i-Code/tree/main/i-Code-Doc

uakarsh · 2022-12-23T10:23:09Z

Hi @NielsRogge, can I help in this implementation?

WaterKnight1998 · 2023-01-23T09:42:20Z

@NielsRogge here are the weights: https://huggingface.co/ZinengTang/Udop/tree/main

munish0838 · 2023-03-14T08:22:37Z

@WaterKnight1998 Is the model accessible now?

WaterKnight1998 · 2023-03-14T08:59:53Z

> @WaterKnight1998 Is the model accessible now?

No, the PR from @raghavanone was closed. @NielsRogge is working on a new PR that refactors the UDOP code, because the original code was not in good shape.

I saw he has a branch for this: https://github.com/NielsRogge/transformers/tree/add_udop

WaterKnight1998 added the New model label Dec 7, 2022

hammer mentioned this issue Jan 10, 2023: Assess usage of LayoutLM for extracting structural elements of PDFs deepset-ai/haystack#3058 (Closed)

raghavanone mentioned this issue Jan 22, 2023: [WIP] Add UDOP models #21239 (Closed)

NielsRogge mentioned this issue Apr 22, 2023: Add UDOP #22940 (Merged)

NielsRogge closed this as completed in #22940 Mar 4, 2024
