Add UDOP #22940

Merged: 159 commits, Mar 4, 2024

Changes from 155 commits
5f245d3
First draft
NielsRogge Mar 12, 2023
48fe92e
More improvements
NielsRogge Mar 12, 2023
db2ba36
More improvements
NielsRogge Mar 12, 2023
4c8de84
More fixes
NielsRogge Mar 12, 2023
f396c5c
Fix copies
NielsRogge Mar 12, 2023
11097a7
More improvements
NielsRogge Mar 12, 2023
77401dd
More fixes
NielsRogge Mar 12, 2023
5c320b8
More improvements
NielsRogge Mar 12, 2023
ac5d893
Convert checkpoint
NielsRogge Mar 12, 2023
119946e
More improvements, set up tests
NielsRogge Mar 12, 2023
ee19584
Fix more tests
NielsRogge Mar 13, 2023
09eed84
Add UdopModel
NielsRogge Mar 13, 2023
3b01717
More improvements
NielsRogge Mar 13, 2023
05f339b
Fix equivalence test
NielsRogge Mar 13, 2023
6a365c2
More fixes
NielsRogge Mar 13, 2023
cff78f1
Redesign model
NielsRogge Mar 13, 2023
5dc6455
Extend conversion script
NielsRogge Mar 13, 2023
ce489a1
Use real inputs for conversion script
NielsRogge Mar 13, 2023
c25ae1e
Add image processor
NielsRogge Mar 13, 2023
9370eec
Improve conversion script
NielsRogge Mar 13, 2023
32f9511
Add UdopTokenizer
NielsRogge Mar 13, 2023
ab839a1
Add fast tokenizer
NielsRogge Mar 13, 2023
5cd941d
Add converter
NielsRogge Mar 13, 2023
b138993
Update README's
NielsRogge Mar 13, 2023
8570bc2
Add processor
NielsRogge Mar 13, 2023
77fbe07
Add fully fledged tokenizer
NielsRogge Mar 13, 2023
d6df92c
Add fast tokenizer
NielsRogge Mar 13, 2023
ff1cbee
Use processor in conversion script
NielsRogge Mar 14, 2023
e19dd45
Add tokenizer tests
NielsRogge Mar 20, 2023
56245fa
Fix one more test
NielsRogge Mar 20, 2023
6c4b674
Fix more tests
NielsRogge Mar 20, 2023
39c3892
Fix tokenizer tests
NielsRogge Mar 20, 2023
84a6109
Enable fast tokenizer tests
NielsRogge Mar 20, 2023
7755b46
Fix more tests
NielsRogge Mar 21, 2023
3cabffe
Fix additional_special_tokens of fast tokenizer
NielsRogge Mar 21, 2023
7fdd034
Fix tokenizer tests
NielsRogge Mar 21, 2023
1033d15
Fix more tests
NielsRogge Mar 26, 2023
767d3d5
Fix equivalence test
NielsRogge Mar 26, 2023
1f283a7
Rename image to pixel_values
NielsRogge Mar 26, 2023
3053b21
Rename seg_data to bbox
NielsRogge Mar 26, 2023
767e076
More renamings
NielsRogge Mar 26, 2023
1e2b84e
Remove vis_special_token
NielsRogge Mar 26, 2023
e0812d0
More improvements
NielsRogge Mar 26, 2023
d683166
Add docs
NielsRogge Mar 26, 2023
6df753b
Fix copied from
NielsRogge Mar 27, 2023
fc79c52
Update slow tokenizer
NielsRogge Mar 27, 2023
fe7b9b2
Update fast tokenizer design
NielsRogge Mar 27, 2023
52a5e4f
Make text input optional
NielsRogge Mar 27, 2023
8f27d5a
Add first draft of processor tests
NielsRogge Mar 27, 2023
c5e1cd6
Fix more processor tests
NielsRogge Mar 27, 2023
2871a15
Fix decoder_start_token_id
NielsRogge Mar 27, 2023
acc88a7
Fix test_initialization
NielsRogge Apr 3, 2023
73c2805
Add integration test
NielsRogge Apr 3, 2023
52fdad4
More improvements
NielsRogge Apr 3, 2023
6c86014
Improve processor, add test
NielsRogge Apr 17, 2023
072e6cf
Add more copied from
NielsRogge Apr 22, 2023
87eb3b2
Add more copied from
NielsRogge Apr 22, 2023
1da2beb
Add more copied from
NielsRogge Apr 22, 2023
644434f
Add more copied from
NielsRogge Apr 22, 2023
285fa31
Remove print statement
NielsRogge Apr 22, 2023
9f880c7
Update README and auto mapping
NielsRogge Apr 22, 2023
b2a5221
Delete files
NielsRogge Apr 22, 2023
79ab6fb
Delete another file
NielsRogge Apr 22, 2023
372e2cc
Remove code
NielsRogge Apr 22, 2023
62c7d82
Fix test
NielsRogge Apr 22, 2023
8782272
Fix docs
NielsRogge Apr 22, 2023
7e2816f
Remove asserts
NielsRogge Apr 22, 2023
038ba28
Add doc tests
NielsRogge Apr 22, 2023
1be820c
Include UDOP in exotic model tests
NielsRogge Apr 22, 2023
efe7474
Add expected tesseract decodings
NielsRogge Apr 22, 2023
35d1ebe
Add sentencepiece
NielsRogge Apr 22, 2023
6c89e9b
Use same design as T5
NielsRogge Apr 24, 2023
6bc7119
Add UdopEncoderModel
NielsRogge Apr 24, 2023
23122ec
Add UdopEncoderModel to tests
NielsRogge Apr 24, 2023
52f3612
More fixes
NielsRogge Apr 24, 2023
5c4e0ae
Fix fast tokenizer
NielsRogge Apr 24, 2023
fdaa56b
Fix one more test
NielsRogge May 8, 2023
f3673d5
Remove parallelisable attribute
NielsRogge Jun 5, 2023
5fffded
Fix copies
NielsRogge Jul 3, 2023
bd3e41b
Remove legacy file
NielsRogge Jul 3, 2023
19b3dc1
Copy from T5Tokenizer
NielsRogge Jul 3, 2023
03d1425
Fix rebase
NielsRogge Jul 31, 2023
29e22a7
More fixes, copy from T5
NielsRogge Jul 31, 2023
a2b8440
More fixes
NielsRogge Jul 31, 2023
caddb28
Fix init
NielsRogge Jul 31, 2023
b4307ab
Use ArthurZ/udop for tests
NielsRogge Jul 31, 2023
36e06c3
Make all model tests pass
NielsRogge Jul 31, 2023
83604c4
Remove UdopForConditionalGeneration from auto mapping
NielsRogge Jul 31, 2023
c9f7a32
Fix more tests
NielsRogge Jul 31, 2023
8f151eb
Merge branch 'main' of github.com:huggingface/transformers into add_udop
ArthurZucker Oct 23, 2023
4bdcc24
fixups
ArthurZucker Oct 23, 2023
0868530
more fixups
ArthurZucker Oct 23, 2023
6d98a92
fix the tokenizers
ArthurZucker Oct 23, 2023
dbbb099
remove un-necessary changes
ArthurZucker Oct 23, 2023
536e339
nits
ArthurZucker Oct 23, 2023
24bc54a
nits
ArthurZucker Nov 11, 2023
c07e6e0
Merge branch 'main' of github.com:huggingface/transformers into add_udop
ArthurZucker Nov 15, 2023
7154a22
replace truncate_sequences_boxes with truncate_sequences for fix-copies
ArthurZucker Nov 15, 2023
8f7e1a2
nit current path
ArthurZucker Nov 15, 2023
47df8f6
Merge branch 'main' of github.com:huggingface/transformers into add_udop
ArthurZucker Nov 15, 2023
ccde1fe
add a test for input ids
ArthurZucker Nov 15, 2023
3cbb734
ids that we should get taken from c9f7a32f57440d90ff79890270d376a1cc0…
ArthurZucker Nov 19, 2023
a89cb50
nits converting
ArthurZucker Nov 19, 2023
6141ef3
nits
ArthurZucker Nov 19, 2023
a442f47
Merge branch 'main' of github.com:huggingface/transformers into add_udop
ArthurZucker Nov 19, 2023
5cced36
apply ruff
ArthurZucker Nov 19, 2023
2287850
nits
ArthurZucker Nov 19, 2023
2c14ffa
Merge branch 'add_udop' of github.com:ArthurZucker/transformers into …
ArthurZucker Nov 19, 2023
be59f3f
nits
ArthurZucker Nov 19, 2023
9786467
style
ArthurZucker Nov 19, 2023
3431c6f
fix slow order of addition
ArthurZucker Nov 19, 2023
2de4bd2
fix udop fast range as well
ArthurZucker Nov 19, 2023
a0e6fbc
fixup
ArthurZucker Nov 19, 2023
f60847b
nits
ArthurZucker Nov 20, 2023
82195ad
Add docstrings
NielsRogge Nov 20, 2023
0f095bc
Fix gradient checkpointing
NielsRogge Nov 20, 2023
f9ea5d7
Update code examples
NielsRogge Nov 20, 2023
13786c7
Skip tests
NielsRogge Nov 20, 2023
916b1bb
Update integration test
NielsRogge Nov 20, 2023
4c3c21a
Address comment
NielsRogge Nov 30, 2023
cc65b6d
Fix merge
NielsRogge Nov 30, 2023
0a10756
Make fixup
NielsRogge Nov 30, 2023
edf20bd
Remove extra ids from tokenizer
NielsRogge Dec 21, 2023
d5a123a
Skip test
NielsRogge Dec 21, 2023
f49209b
Fix merge
NielsRogge Dec 21, 2023
ed3cb5e
Fix merge
NielsRogge Jan 15, 2024
c10bcd9
Apply suggestions from code review
NielsRogge Jan 15, 2024
4f7f568
Update year
NielsRogge Jan 15, 2024
571a419
Address comment
NielsRogge Jan 15, 2024
a6f2ca5
Address more comments
NielsRogge Jan 15, 2024
50d248c
Address comments
NielsRogge Jan 15, 2024
f85e700
Add copied from
NielsRogge Jan 15, 2024
8ea9a5d
Fix merge
NielsRogge Feb 12, 2024
ce109f9
Update CI
NielsRogge Feb 12, 2024
af8233f
Rename script
NielsRogge Feb 12, 2024
d5c91b4
Merge remote-tracking branch 'upstream/main' into add_udop
NielsRogge Feb 19, 2024
ce11f16
Update model id
NielsRogge Feb 19, 2024
83fb8ef
Add AddedToken, skip tests
NielsRogge Feb 19, 2024
5e99341
Update CI
NielsRogge Feb 19, 2024
2f6603c
Fix doc tests
NielsRogge Feb 19, 2024
974a591
Do not use Tesseract for the doc tests
NielsRogge Feb 19, 2024
8b89bd0
Remove kwargs
NielsRogge Feb 19, 2024
22066be
Add original inputs
NielsRogge Feb 19, 2024
0564256
Update casting
NielsRogge Feb 19, 2024
6741df4
Merge remote-tracking branch 'upstream/main' into add_udop
NielsRogge Feb 19, 2024
e47585b
Fix doc test
NielsRogge Feb 19, 2024
081773d
Update question
NielsRogge Feb 19, 2024
a1bc1ba
Update question
NielsRogge Feb 19, 2024
084d358
Use LayoutLMv3ImageProcessor
NielsRogge Feb 26, 2024
aa1780e
Update organization
NielsRogge Feb 26, 2024
0655050
Improve docs
NielsRogge Feb 26, 2024
ec4f8ea
Update forward signature
NielsRogge Feb 26, 2024
a89f8e4
Make images optional
NielsRogge Feb 26, 2024
5216563
Remove deprecated device argument
NielsRogge Feb 26, 2024
171469b
Merge remote-tracking branch 'upstream/main' into add_udop
NielsRogge Feb 29, 2024
b59e4e7
Add comment, add add_prefix_space
NielsRogge Mar 4, 2024
430f377
More improvements
NielsRogge Mar 4, 2024
7e68ced
Remove kwargs
NielsRogge Mar 4, 2024
0b7ee3c
Merge remote-tracking branch 'upstream/main' into add_udop
NielsRogge Mar 4, 2024
2 changes: 2 additions & 0 deletions .circleci/create_circleci_config.py
@@ -475,6 +475,7 @@ def job_name(self):
"pip install -U --upgrade-strategy eager 'git+https://github.com/facebookresearch/detectron2.git'",
"sudo apt install tesseract-ocr",
"pip install -U --upgrade-strategy eager pytesseract",
"pip install --upgrade-strategy eager sentencepiece",
"pip install -U --upgrade-strategy eager natten==0.15.1+torch210cpu -f https://shi-labs.com/natten/wheels",
"pip install -U --upgrade-strategy eager python-Levenshtein",
"pip install -U --upgrade-strategy eager opencv-python",
@@ -485,6 +486,7 @@ def job_name(self):
"tests/models/*layoutlmv*",
"tests/models/*nat",
"tests/models/deta",
"tests/models/udop",
"tests/models/nougat",
],
pytest_num_workers=1,
1 change: 1 addition & 0 deletions README.md
@@ -511,6 +511,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (from UNC Chapel Hill) released with the paper [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156) by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal.
1. **[TVP](https://huggingface.co/docs/transformers/model_doc/tvp)** (from Intel) released with the paper [Text-Visual Prompting for Efficient 2D Temporal Video Grounding](https://arxiv.org/abs/2303.04995) by Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding.
1. **[UDOP](https://huggingface.co/docs/transformers/main/model_doc/udop)** (from Microsoft Research) released with the paper [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/abs/2212.02623) by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal.
1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (from Google Research) released with the paper [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi) by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1 change: 1 addition & 0 deletions README_es.md
@@ -484,6 +484,7 @@ Número actual de puntos de control: ![](https://img.shields.io/endpoint?url=htt
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (from UNC Chapel Hill) released with the paper [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156) by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal.
1. **[TVP](https://huggingface.co/docs/transformers/model_doc/tvp)** (from Intel) released with the paper [Text-Visual Prompting for Efficient 2D Temporal Video Grounding](https://arxiv.org/abs/2303.04995) by Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding.
1. **[UDOP](https://huggingface.co/docs/transformers/main/model_doc/udop)** (from Microsoft Research) released with the paper [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/abs/2212.02623) by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal.
1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (from Google Research) released with the paper [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi) by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1 change: 1 addition & 0 deletions README_fr.md
@@ -505,6 +505,7 @@ Nombre actuel de points de contrôle : ![](https://img.shields.io/endpoint?url=h
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (de Microsoft), publié dans l'article [TrOCR : Reconnaissance optique de caractères basée sur un transformateur avec des modèles pré-entraînés](https://arxiv.org/abs/2109.10282) par Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (de l'UNC Chapel Hill) a été publié dans l'article [TVLT : Transformer Vision-Language sans texte](https://arxiv.org/abs/2209.14156) par Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal.
1. **[TVP](https://huggingface.co/docs/transformers/model_doc/tvp)** (d'Intel) a été publié dans l'article [Text-Visual Prompting for Efficient 2D Temporal Video Grounding](https://arxiv.org/abs/2303.04995) par Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding.
1. **[UDOP](https://huggingface.co/docs/transformers/main/model_doc/udop)** (de Microsoft Research) publié dans l'article [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/abs/2212.02623) parZineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal.
1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (de Google Research) a été publié dans l'article [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) par Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler.
1. **[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (de Google Research) a été publié dans l'article [UniMax : Échantillonnage linguistique plus équitable et plus efficace pour l'entraînement préalable multilingue à grande échelle](https://openreview.net/forum?id=kXwdL1cWOAi) par Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (de Microsoft Research) a été publié dans l'article [UniSpeech : Apprentissage unifié de la représentation de la parole avec des données étiquetées et non étiquetées](https://arxiv.org/abs/2101.07597) par Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1 change: 1 addition & 0 deletions README_hd.md
@@ -458,6 +458,7 @@ conda install conda-forge::transformers
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft) released with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (from UNC Chapel Hill) released with the paper [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156) by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal.
1. **[TVP](https://huggingface.co/docs/transformers/model_doc/tvp)** (from Intel) released with the paper [Text-Visual Prompting for Efficient 2D Temporal Video Grounding](https://arxiv.org/abs/2303.04995) by Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding.
1. **[UDOP](https://huggingface.co/docs/transformers/main/model_doc/udop)** (Microsoft Research से) Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal. द्वाराअनुसंधान पत्र [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/abs/2212.02623) के साथ जारी किया गया
1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (Google Research से) Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant. द्वाराअनुसंधान पत्र [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi) के साथ जारी किया गया
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (माइक्रोसॉफ्ट रिसर्च से) साथ में दिया गया पेपर [UniSpeech: यूनिफाइड स्पीच रिप्रेजेंटेशन लर्निंग विद लेबलेड एंड अनलेबल्ड डेटा](https://arxiv.org/abs/2101.07597) चेंगई वांग, यू वू, याओ कियान, केनिची कुमातानी, शुजी लियू, फुरु वेई, माइकल ज़ेंग, ज़ुएदोंग हुआंग द्वारा।
1 change: 1 addition & 0 deletions README_ja.md
@@ -518,6 +518,7 @@ Flax、PyTorch、TensorFlowをcondaでインストールする方法は、それ
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (Microsoft から), Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei から公開された研究論文: [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282)
1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (from UNC Chapel Hill から), Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal から公開された研究論文: [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156)
1. **[TVP](https://huggingface.co/docs/transformers/model_doc/tvp)** (Intel から), Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding から公開された研究論文: [Text-Visual Prompting for Efficient 2D Temporal Video Grounding](https://arxiv.org/abs/2303.04995)
1. **[UDOP](https://huggingface.co/docs/transformers/main/model_doc/udop)** (Microsoft Research から) Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal. から公開された研究論文 [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/abs/2212.02623)
1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (Google Research から) Yi Tay, Mostafa Dehghani, Vinh Q から公開された研究論文: [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (Google Research から) Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant. から公開された研究論文 [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi)
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (Microsoft Research から) Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang から公開された研究論文: [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597)
1 change: 1 addition & 0 deletions README_ko.md
@@ -433,6 +433,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (Microsoft 에서) Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei 의 [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) 논문과 함께 발표했습니다.
1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (from UNC Chapel Hill 에서) Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal 의 [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156) 논문과 함께 발표했습니다.
1. **[TVP](https://huggingface.co/docs/transformers/model_doc/tvp)** (Intel 에서) Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding 의 [Text-Visual Prompting for Efficient 2D Temporal Video Grounding](https://arxiv.org/abs/2303.04995) 논문과 함께 발표했습니다.
1. **[UDOP](https://huggingface.co/docs/transformers/main/model_doc/udop)** (Microsoft Research 에서 제공)은 Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal.의 [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/abs/2212.02623)논문과 함께 발표했습니다.
1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (Google Research 에서) Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzle 의 [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) 논문과 함께 발표했습니다.
1. **[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (Google Research 에서 제공)은 Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.의 [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi)논문과 함께 발표했습니다.
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (Microsoft Research 에서) Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang 의 [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) 논문과 함께 발표했습니다.
1 change: 1 addition & 0 deletions README_zh-hans.md
@@ -457,6 +457,7 @@ conda install conda-forge::transformers
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (来自 Microsoft) 伴随论文 [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) 由 Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei 发布。
1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (来自 UNC Chapel Hill) 伴随论文 [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156) 由 Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal 发布。
1. **[TVP](https://huggingface.co/docs/transformers/model_doc/tvp)** (来自 Intel) 伴随论文 [Text-Visual Prompting for Efficient 2D Temporal Video Grounding](https://arxiv.org/abs/2303.04995) 由 Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding 发布.
1. **[UDOP](https://huggingface.co/docs/transformers/main/model_doc/udop)** (来自 Microsoft Research) 伴随论文 [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/abs/2212.02623) 由 Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal 发布。
1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (来自 Google Research) 伴随论文 [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi) 由 Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant 发布。
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (来自 Microsoft Research) 伴随论文 [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) 由 Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang 发布。
1 change: 1 addition & 0 deletions README_zh-hant.md
@@ -469,6 +469,7 @@ conda install conda-forge::transformers
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft) released with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (from UNC Chapel Hill) released with the paper [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156) by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal.
1. **[TVP](https://huggingface.co/docs/transformers/model_doc/tvp)** (from Intel) released with the paper [Text-Visual Prompting for Efficient 2D Temporal Video Grounding](https://arxiv.org/abs/2303.04995) by Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding.
1. **[UDOP](https://huggingface.co/docs/transformers/main/model_doc/udop)** (from Microsoft Research) released with the paper [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/abs/2212.02623) by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal.
1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (from Google Research) released with the paper [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi) by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -770,6 +770,8 @@
  title: TVLT
- local: model_doc/tvp
  title: TVP
- local: model_doc/udop
  title: UDOP
- local: model_doc/vilt
  title: ViLT
- local: model_doc/vipllava