💡 To be an AI Researcher, an Artist, and a Good Person...!!
- Learning to Learn without Gradient Descent by Gradient Descent
- Massively Multitask Networks for Drug Discovery
- One-Shot Imitation Learning
- Few-Shot Autoregressive Density Estimation: Towards Learning to Learn Distributions
- Meta-Learning for Low-Resource Neural Machine Translation
- Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
- SYNTHESIZER: Rethinking Self-Attention in Transformer Models
- Fine-tune BERT for Extractive Summarization
- ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations
- Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation
- Papers I only skimmed are not listed
- Only papers I read in full and then researched further on my own are listed
- For example, word2vec is not listed: I know the concept, but I have not dissected the paper itself
Reinforcement Learning
- Asynchronous Methods for Deep Reinforcement Learning
A3C, DeepMind & Montreal
- Continuous Control With Deep Reinforcement Learning
DDPG, DQN+DPG, Replay Buffer, Soft-Update via Polyak Averaging (see the sketch below), Ornstein-Uhlenbeck process, White Gaussian Random process, DeepMind
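A minimal sketch of the soft target-network update via Polyak averaging tagged above, in plain NumPy; the `tau` value and parameter shapes are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def polyak_update(target_params, online_params, tau=0.005):
    """Soft update: target <- tau * online + (1 - tau) * target.

    DDPG replaces DQN's periodic hard copy of the online network with
    this per-step blend, so the target network tracks the online one slowly.
    """
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]

# toy usage: a single critic weight matrix
online = [np.random.randn(4, 4)]
target = [np.zeros((4, 4))]
target = polyak_update(target, online)
```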
- Deterministic Policy Gradient Algorithms
DeepMind, Policy Gradient, Actor-Critic, Deterministic Policy
- Policy Gradient Methods for Reinforcement Learning with Function Approximation
Compatible Function Approximation, Policy Gradient, Sutton
- Approximately Optimal Approximate Reinforcement Learning
Kakade & Langford, Mixture Policy, Policy Improvement
- Trust Region Policy Optimization
Trust Region, Natural Policy Gradient, Kakade & Langford Thm, Policy Improvement, OpenAI
- Proximal Policy Optimization Algorithms
OpenAI, Practical TRPO, Clipped Surrogate Objective (see the sketch below)
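A minimal sketch of PPO's clipped surrogate objective referenced above (NumPy only; the batch values and `eps=0.2` are illustrative):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate: mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)).

    `ratio` is pi_new(a|s) / pi_old(a|s); clipping removes the incentive
    to push the policy far outside the trust region in a single update.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()

# toy usage: two transitions with precomputed advantages
print(ppo_clip_objective(np.array([1.3, 0.7]), np.array([1.0, -0.5])))
```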
Meta-Learning
- Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
MAML, Optimization-Based Meta-Learning (see the sketch below)
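A minimal sketch of MAML's two-level optimization on a toy quadratic loss; for simplicity it uses the first-order approximation (FOMAML) that drops the second-derivative term, and all names and learning rates are illustrative:

```python
import numpy as np

def loss(w, task):   # toy per-task loss; `task` is just a target vector
    return 0.5 * np.sum((w - task) ** 2)

def grad(w, task):
    return w - task

def maml_step(w, tasks, inner_lr=0.01, outer_lr=0.001):
    """One meta-update: adapt per task with a gradient step (inner loop),
    then update the shared initialization from post-adaptation gradients."""
    meta_grad = np.zeros_like(w)
    for task in tasks:
        w_adapted = w - inner_lr * grad(w, task)          # inner loop
        meta_grad += grad(w_adapted, task)                # first-order approx.
    return w - outer_lr * meta_grad / len(tasks)          # outer loop

w = np.zeros(3)
w = maml_step(w, tasks=[np.ones(3), -np.ones(3)])
```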
NLP
- Efficient Estimation of Word Representations in Vector Space
Word2Vec, CBOW, Skip-Gram
- Distributed Representations of Words and Phrases and their Compositionality
Enhanced vector representation quality, SubSampling, Negative Sampling (see the sketch below), Hierarchical Softmax
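A minimal sketch of the skip-gram negative-sampling loss tagged above; the vectors are random stand-ins for embedding rows:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(center, context, negatives):
    """Pull the true (center, context) pair together and push k sampled
    noise words apart: -log s(c.v) - sum_k log s(-n_k.v)."""
    pos = -np.log(sigmoid(context @ center))
    neg = -np.sum(np.log(sigmoid(-negatives @ center)))
    return pos + neg

rng = np.random.default_rng(0)
v = rng.normal(size=8)            # center word vector
c = rng.normal(size=8)            # true context vector
noise = rng.normal(size=(5, 8))   # k = 5 negative samples
print(sgns_loss(v, c, noise))
```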
- Deep contextualized word representations
ELMo, Feature-Based, Pre-ELMo + Linear Combination, SubWord Information by ConvNet
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Transformer's Encoder, MLM (see the masking sketch below), NSP
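A minimal sketch of BERT's MLM corruption rule (mask 15% of tokens; of those, 80% become [MASK], 10% a random token, 10% stay); the token ids and the -100 ignore-index are placeholder conventions, not values from the paper:

```python
import random

MASK_ID = 103  # placeholder id for [MASK]; the real id depends on the vocab

def mask_tokens(tokens, vocab_size, p=0.15):
    """Select ~15% of positions; 80% -> [MASK], 10% -> random, 10% -> keep."""
    corrupted, labels = list(tokens), [-100] * len(tokens)  # -100: not predicted
    for i, tok in enumerate(tokens):
        if random.random() >= p:
            continue
        labels[i] = tok                  # the model must recover the original
        r = random.random()
        if r < 0.8:
            corrupted[i] = MASK_ID
        elif r < 0.9:
            corrupted[i] = random.randrange(vocab_size)
    return corrupted, labels

print(mask_tokens([5, 17, 42, 8], vocab_size=30522))
```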
- Neural Machine Translation By Jointly Learning to Align and Translate
GRU, Seq2Seq with Attention, Bahdanau Attention
- Attention Is All You Need
Transformer, Scaled Dot-Product Self-Attention (see the sketch below), Seq2Seq
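A minimal sketch of scaled dot-product self-attention as tagged above (single head, no masking or learned projections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))                 # 4 tokens, d_model = 16
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)                             # (4, 16)
```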
- Advances in Pre-Training Distributed Word Representations
FastText
- Enriching Word Vectors with Subword Information
FastText
- Minimum Risk Training for Neural Machine Translation
MRT, NMT
- Bag of Tricks for Efficient Text Classification
FastText for Text Classification, Fast!
- A Fast and Accurate Dependency Parser using Neural Networks
Parsing
- MaltParser: A Data-Driven Parser-Generator for Dependency Parsing
Parsing
- Incrementality in Deterministic Dependency Parsing
Parsing
- A Neural Probabilistic Language Model
NPLM
- Universal Language Model Fine-tuning for Text Classification
ULMFiT, Fine-Tuning
- The Natural Language Decathlon: Multitask Learning as Question Answering
MultiTask Learning, Anti-Curriculum Learning
- Phrase-Based & Neural Unsupervised Machine Translation
Initialization, Language Modeling, Back-Translation
- A Structured Self-Attentive Sentence Embedding
Self-Attentive
Graph
- Graph Attention Networks
GNN, Attention (see the GAT sketch below)
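A minimal single-head GAT layer sketch for the entry above; the LeakyReLU slope of 0.2 follows the paper, while the toy graph and dimensions are illustrative:

```python
import numpy as np

def gat_layer(H, A, W, a):
    """e_ij = LeakyReLU(a^T [W h_i || W h_j]) over neighbors,
    softmax-normalized per node, then weighted feature aggregation."""
    Z = H @ W                                        # projected features (N, F')
    N = Z.shape[0]
    logits = np.full((N, N), -np.inf)                # -inf masks non-edges
    for i in range(N):
        for j in range(N):
            if A[i, j]:
                e = np.concatenate([Z[i], Z[j]]) @ a
                logits[i, j] = e if e > 0 else 0.2 * e   # LeakyReLU
    alpha = np.exp(logits - logits.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)        # attention coefficients
    return alpha @ Z

rng = np.random.default_rng(0)
H, A = rng.normal(size=(3, 4)), np.ones((3, 3))      # toy fully connected graph
print(gat_layer(H, A, rng.normal(size=(4, 2)), rng.normal(size=4)).shape)  # (3, 2)
```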
- MAGNET: Multi-Label Text Classification using Attention-based Graph Neural Network
GAT, MLTC
Conversational AI
- Memory Networks
- End-To-End Memory Networks
- Learning Through Dialogue Interactions By Asking Questions
- Hierarchical Attention Networks for Document Classification
- Conversational Decision-Making Model for Predicting the King's Decision in the Annals of the Joseon Dynasty
Fundamental
- Decoupled Neural Interfaces using Synthetic Gradients
- Decoupled Weight Decay Regularization
- Neural Network Ensembles, Cross Validation, and Active Learning
- Sharp Minima Can Generalize For Deep Nets
- Long short-term memory
- Highway Networks
- Recurrent Highway Networks
ETC
- LSTM-SAE Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems
- C3D Learning Spatiotemporal Features with 3D Convolutional Networks
- BPE(Byte-Pair-Encoding); A New Algorithm for Data Compression (C-user journal 1994) paper
- Adjust BPE on NMT; Neural Machine Translation of Rare Words with Subword Units (ACL 2016) paper
- Compare between n-gram and byte-pair-encoding
- Compare between Wordpiece, SentencePiece, and morphological tokenization (see the toy BPE sketch below)
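A toy version of the BPE merge loop from the papers above: repeatedly count adjacent symbol pairs in the vocabulary and merge the most frequent one (the end-of-word marker `</w>` follows the ACL 2016 paper):

```python
from collections import Counter

def bpe_merges(words, num_merges=10):
    """Learn BPE merge rules by greedily merging the most frequent pair."""
    vocab = Counter(tuple(w) + ('</w>',) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for pair in zip(word, word[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)             # most frequent pair
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():             # apply the merge everywhere
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1]); i += 2
                else:
                    merged.append(word[i]); i += 1
            new_vocab[tuple(merged)] = freq
        vocab = new_vocab
    return merges

print(bpe_merges(["low", "lower", "lowest", "low"], num_merges=4))
```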
- NPLM; A Neural Probabilistic Language Model (jmlr 2003) paper
- Modeling High-Dimensional Discrete Data with Multi-Layer Neural Networks (NIPS 2000) paper
- NPLM's Reference -> presents the idea of producing high-dimensional binary distributed representations with a NN; learns a statistical model of the word sequence distribution
- Extracting distributed representations of concepts and relations from positive and negative propositions (IEEE 2000) link
- A case where Prof. Hinton's research was successfully applied
- Natural Language Processing With Modular Pdp Networks and Distributed Lexicon (Cognitive Science 1991 July) link
- An early case of applying a neural network to language modeling
- Sequential neural text compression (IEEE 1996) link
- NPLM's Reference -> learns the role of words in a sentence
- I love Schmidhuber a lot :)
- Word2Vec 2013a; Efficient Estimation of Word Representations in Vector Space (ICLR 2013) paper
- Introduce Skip-Gram & CBOW
- Google Team
- Word2Vec 2013b; Distributed Representations of Words and Phrases and their Compositionality (NIPS 2013) paper
- Propose training optimization methods such as negative sampling
- GloVe(Global Word Vectors); GloVe: Global Vectors for Word Representation (ACL 2014) paper
- Stanford Univ.
- Overcome the limitations of Word2Vec and LSA
- Swivel(Submatrix-Wise Vector Embedding Learner); Swivel: Improving Embeddings by Noticing What’s Missing () paper
- Google, source code
- FastText; Enriching Word Vectors with Subword Information (17.06.16, arxiv) paper
A large annotated corpus for learning natural language inference, Bowman et al., 2015 (EMNLP)
A broad-coverage challenge corpus for sentence understanding through inference, Williams et al., 2018
SQuAD: 100,000+ Questions for Machine Comprehension of Text, Rajpurkar et al., 2016
Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition, Tjong Kim Sang and De Meulder, 2003
- Incrementality in Deterministic Dependency Parsing (ACL, 2003) paper
- MaltParser: A Data-Driven Parser-Generator for Dependency Parsing (LREC, 2005) paper
- A Fast and Accurate Dependency Parser using Neural Network (EMNLP, 2014) paper
- MRT(Minimum Risk Training); Minimum Risk Training for Neural Machine Translation (ACL 2016) paper
- FastText for classification; Bag of Tricks for Efficient Text Classification (EACL 2017) link
- ULMFiT; Universal Language Model Fine-tuning for Text Classification (18.05.23, arxiv) paper
Stochastic Answer Networks for Machine Reading Comprehension https://arxiv.org/abs/1712.03556
Enhanced LSTM for Natural Language Inference https://arxiv.org/abs/1609.06038
Deep Semantic Role Labeling: What Works and What’s Next https://www.aclweb.org/anthology/P17-1044/
Extractive
- BertSum; Fine-tune BERT for Extractive Summarization (19.03.25, arxiv) paper
- BertSum-Full Paper; Text Summarization with Pretrained Encoders (19.08.22, arxiv) paper
- Semi-supervised sequence learning (NIPS 2015) paper
Word Representations: A Simple and General Method for Semi-Supervised Learning
institute | subtitle | title | journal | published | etc |
---|---|---|---|---|---|
AllenAI | ELMo | Deep contextualized word representations | NAACL | 2018 | paper |
AllenAI | LongFormer | Longformer: The Long-Document Transformer | arxiv | 20.04.10 | paper |
GoogleAI | BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | NAACL | 2019 | paper |
GoogleAI | ALBERT | ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations | ICLR | 19.09.26 | paper |
GoogleAI | T5 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | JMLR | 19.10.23 | paper |
GoogleAI | PEGASUS | PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization | ICML | 2020 | paper |
GoogleAI | ELECTRA | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | ICLR | 2020 | paper |
DeepMind | Compressive Transformers | Compressive Transformers for Long-Range Sequence Modelling | arxiv | 19.11.13 | paper |
UNC Chapel Hill | LXMERT | LXMERT: Learning Cross-Modality Encoder Representations from Transformers | arxiv | 19.08.20 | paper |
OpenAI | GPT-1 | Improving language understanding with unsupervised learning | OpenAI | 2018 | paper |
OpenAI | GPT-2 | Language Models are Unsupervised Multitask Learners | OpenAI | 2019 | paper |
OpenAI | GPT-3 | Language Models are Few-Shot Learners | OpenAI | 2020 | paper |
FAIR | FastText | Advances in Pre-Training Distributed Word Representations | arxiv | 17.12.26 | paper |
FAIR | XLM | Cross-lingual Language Model Pretraining | arxiv | 19.01.22 | paper |
FAIR | FSMT | Facebook FAIR's WMT19 News Translation Task Submission | arxiv | 19.07.15 | paper |
FAIR | RoBERTa | RoBERTa: A Robustly Optimized BERT Pretraining Approach | arxiv | 19.07.26 | paper |
FAIR | MMBT | Supervised Multimodal Bitransformers for Classifying Images and Text | arxiv | 19.09.06 | paper |
FAIR | BART | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | arxiv | 19.10.29 | paper |
FAIR | CamemBERT | CamemBERT: a Tasty French Language Model | arxiv | 19.11.10 | paper |
FAIR | mBART | Multilingual Denoising Pre-training for Neural Machine Translation | arxiv | 20.01.22 | paper |
FAIR | RAG | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | arxiv | 20.05.22 | paper |
Hugging Face | DistilBERT | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | arxiv | 19.10.02 | paper |
Microsoft | Marian | Marian: Cost-effective High-Quality Neural Machine Translation in C++ | ACL | 2018 | paper |
Microsoft | MT-DNN | Multi-Task Deep Neural Networks for Natural Language Understanding | arxiv | 19.05.30 | paper |
Microsoft | LayoutLM | LayoutLM: Pre-training of Text and Layout for Document Image Understanding | arxiv | 19.12.31 | paper |
NVIDIA | MegatronLM | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | arxiv | 19.09.17 | paper |
Univ. of Washington | Grover-Mega | Defending Against Neural Fake News | arxiv | 19.10.29 | paper |
Carnegie Mellon & Google Brain | Transformer-XL | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | arxiv | 19.06.02 | paper |
Carnegie Mellon & Google Brain | XLNet | XLNet: Generalized Autoregressive Pretraining for Language Understanding | arxiv | 19.06.19 | paper |
Carnegie Mellon & Google Brain | Funnel | Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing | arxiv | 20.06.05 | paper |
Salesforce | CTRL | CTRL: A Conditional Transformer Language Model for Controllable Generation | arxiv | 19.09.11 | paper |
Anonymous authors | MobileBERT | MobileBERT: Task-Agnostic Compression of BERT by Progressive Knowledge Transfer | ICLR | 2020 | paper |
- Bahdanau Attention; Neural Machine Translation by Jointly Learning to Align and Translate (ICLR 2015) paper
- Multi-Head Attention; Attention Is All You Need (NIPS 2017) paper
- Google Research-Synthesizer; SYNTHESIZER: Rethinking Self-Attention in Transformer Models (20.05.02, arxiv) paper
Tracking the research of Sumit Chopra & Jason Weston
- Memory Networks (14.10.15, arxiv; ICLR 2015) paper
- End-To-End Memory Networks (NIPS 2015) paper
- Learning Through Dialogue Interactions By Asking Questions (16.12.15, ICLR 2017) paper
- Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index, ACL
- Kelvin Guu's REALM (Retrieval-Augmented Language Model Pre-Training), ICML
- DPR; Dense Passage Retrieval for Open-Domain Question Answering (20.04.10) paper
- Original GAN; Generative Adversarial Net (NIPS 2014) paper
- MAML; Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (ICML 2017) paper
- https://ai.googleblog.com/2018/10/curiosity-and-procrastination-in.html
- Meta-learning curiosity algorithms
- Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
- Novelty search (Lehman & Stanley, 2008)
- Buffers and Nearest Neighbors (Fu et al., 2017)
- Generating goals (Srivastava et al., 2013; Kulkarni et al., 2016)
- Learning progress (Oudeyer et al., 2007; Schmidhuber, 2008)
- Generating diverse skills (Eysenbach et al., 2018)
- Stochastic neural networks (Florensa et al., 2017; Fortunato et al., 2017)
- Count-based exploration (Tang et al., 2017)
- Object-based curiosity measures (Forestier & Oudeyer, 2016)
- Bonus-based (Taiga et al., 2019)
- AutoML Style Approach
- Neural Architecture Search (NAS)
- Hyperparameter optimization for deep networks
- Auto-sklearn, Learning loss functions to replace cross-entropy for training a fixed architecture on MNIST and CIFAR
- Meta-learning with genetic programming, evolutionary computing
- Programming Automation
- Searching over mathematical operations within neural networks
- Neural networks that learn programs
- Modular Meta-Learning / Hierarchical Meta-Learning, Reinforcement Learning
- Inspired from Cognitive/Brain Science (Attention, Curiosity, Common Sense, etc)
- Agent57 (DeepMind)
- Policy Gradient Theorem; Policy Gradient Methods for Reinforcement Learning with Function Approximation (NIPS 2000) paper
- Deterministic Policy Gradient Algorithm
- Continuous Control with Deep Reinforcement Learning
- Approximately Optimal Approximate Reinforcement Learning
- Trust Region Policy Optimization
- Proximal Policy Optimization Algorithms
- Accelerated Methods for Deep Reinforcement Learning () paper
- Implementation Matters In Deep RL () paper
- CURL: Contrastive Unsupervised Representations for Reinforcement Learning () paper
- Dream to Control: Learning Behaviors by Latent Imagination () paper
- Neural Network Ensembles, Cross Validation, and Active Learning (NIPS 1995) paper
Batch Normalization
Lipschitz gradient
Global Batch Normalization
Input Covariate Shift
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
How Does Batch Normalization Help Optimization?
Layer Normalization https://arxiv.org/abs/1607.06450
LeCun Initialization Efficient BackProp
Xavier initialization Understanding the difficulty of training deep feedforward neural networks
He Initialization Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Nesterov Optimizer (optimization-related papers)
weight_standardization
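The normalization papers above differ mainly in which axis the statistics are computed over; a minimal sketch contrasting the two (learned gain/bias parameters omitted):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature over the batch axis (axis 0)."""
    return (x - x.mean(axis=0, keepdims=True)) / np.sqrt(x.var(axis=0, keepdims=True) + eps)

def layer_norm(x, eps=1e-5):
    """Normalize each example over the feature axis (axis 1)."""
    return (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)

x = np.random.randn(32, 8)             # (batch, features)
print(batch_norm(x).std(axis=0)[:3])   # ~1 per feature column
print(layer_norm(x).std(axis=1)[:3])   # ~1 per example row
```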
- Long short-term memory (Neural Computation 1997) paper
- LSTM: A Search Space Odyssey (IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017) paper
- Highway Networks (15.05.03, arxiv) paper
- Full Paper: Training Very Deep Networks link
- Recurrent Highway Networks (ICML 2017) paper
- Gradient flow in recurrent nets: the difficulty of learning long-term dependencies (IEEE 2001) paper
- Bidirectional LSTM networks for improved phoneme classification and recognition (International Conference on Artificial Neural Networks 05.09.11)
- Sequential neural text compression (IEEE 1996) paper
- Neural expectation maximization (NIPS 2017) paper
- Accelerated Neural Evolution through Cooperatively Coevolved Synapses (JMLR 2008) paper
- World Models (18.05.09, arxiv) paper
LSTM-SAE Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems
C3D Learning Spatiotemporal Features with 3D Convolutional Networks
Papers related to n-grams (see the interpolation sketch after this list)
- Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer
- Interpolated estimation of Markov source parameters from sparse data
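A minimal sketch of the interpolation idea from the two smoothing papers above: blend sparse bigram estimates with the unigram distribution (the weight `lam` and the toy corpus are illustrative):

```python
from collections import Counter

def interpolated_bigram(corpus, lam=0.7):
    """Jelinek-Mercer style: P(w|prev) = lam * P_bigram + (1 - lam) * P_unigram."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    total = len(corpus)

    def prob(prev, word):
        p_uni = unigrams[word] / total
        p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
        return lam * p_bi + (1 - lam) * p_uni

    return prob

p = interpolated_bigram("the cat sat on the mat".split())
print(p("the", "cat"), p("the", "sat"))  # unseen bigram still gets unigram mass
```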
Pointing the Unknown Words (Univ. of Montreal)
Seq2Seq Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Real-World Anomaly Detection in Surveillance Videos
self-attention on classification - A Structured Self-Attentive Sentence Embedding