diff --git a/docs/faq/add_op_in_backend.md b/docs/faq/add_op_in_backend.md
index c44a0aa05235..64f6c8ae0f98 100644
--- a/docs/faq/add_op_in_backend.md
+++ b/docs/faq/add_op_in_backend.md
@@ -1,3 +1,22 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
# A Beginner's Guide to Implementing Operators in MXNet Backend
## Introduction
diff --git a/docs/faq/bucketing.md b/docs/faq/bucketing.md
index dbfdedde2acf..49994dc4fe97 100644
--- a/docs/faq/bucketing.md
+++ b/docs/faq/bucketing.md
@@ -1,3 +1,22 @@
+
+
# Bucketing in MXNet
When we train recurrent neural networks (RNNs), we _unroll_ the network in time.
For a single example of length T, we would unroll the network T steps.
diff --git a/docs/faq/caffe.md b/docs/faq/caffe.md
index a9bab3cdb549..b99acb07e0bd 100644
--- a/docs/faq/caffe.md
+++ b/docs/faq/caffe.md
@@ -1,3 +1,22 @@
+
+
# How to | Convert from Caffe to MXNet
Key topics covered include the following:
diff --git a/docs/faq/cloud.md b/docs/faq/cloud.md
index 67b28f8b4338..bd2987f0a057 100644
--- a/docs/faq/cloud.md
+++ b/docs/faq/cloud.md
@@ -1,3 +1,22 @@
+
+
# MXNet on the Cloud
Deep learning can require extremely powerful hardware, often for unpredictable durations of time.
diff --git a/docs/faq/develop_and_hack.md b/docs/faq/develop_and_hack.md
index da53fd010bb0..b09824dc2ef6 100644
--- a/docs/faq/develop_and_hack.md
+++ b/docs/faq/develop_and_hack.md
@@ -1,3 +1,22 @@
+
+
# Develop and Hack MXNet
- [Create new operators](new_op.md)
- [Use Torch from MXNet](torch.md)
diff --git a/docs/faq/distributed_training.md b/docs/faq/distributed_training.md
index 8d8666ff066c..32696f34e250 100644
--- a/docs/faq/distributed_training.md
+++ b/docs/faq/distributed_training.md
@@ -1,3 +1,22 @@
+
+
# Distributed Training in MXNet
MXNet supports distributed training, enabling us to leverage multiple machines for faster training.
In this document, we describe how it works, how to launch a distributed training job and
diff --git a/docs/faq/env_var.md b/docs/faq/env_var.md
index 83368bf4d0c3..2f649bedafc1 100644
--- a/docs/faq/env_var.md
+++ b/docs/faq/env_var.md
@@ -1,3 +1,22 @@
+
+
Environment Variables
=====================
MXNet has several settings that you can change with environment variables.
diff --git a/docs/faq/faq.md b/docs/faq/faq.md
index 668587ec6888..4682c95d2a47 100644
--- a/docs/faq/faq.md
+++ b/docs/faq/faq.md
@@ -1,3 +1,22 @@
+
+
# Frequently Asked Questions
This topic provides answers to the frequently asked questions on [mxnet/issues](/~https://github.com/dmlc/mxnet/issues). Before posting an issue, please check this page. If you would like to contribute to this page, please make the questions and answers simple. If your answer is extremely detailed, please post it elsewhere and link to it.
diff --git a/docs/faq/finetune.md b/docs/faq/finetune.md
index 04244d15b0b9..33a4ffb32521 100644
--- a/docs/faq/finetune.md
+++ b/docs/faq/finetune.md
@@ -1,3 +1,22 @@
+
+
# Fine-tune with Pretrained Models
diff --git a/docs/faq/float16.md b/docs/faq/float16.md
index b4cd97b30e5c..0fda178c4bde 100644
--- a/docs/faq/float16.md
+++ b/docs/faq/float16.md
@@ -1,3 +1,22 @@
+
+
# Mixed precision training using float16
In this tutorial, you will learn how to train deep neural networks with mixed precision on supported hardware. You will first see how to use float16 (with both the Gluon and Symbolic APIs) and then some techniques for achieving good performance and accuracy.
diff --git a/docs/faq/gradient_compression.md b/docs/faq/gradient_compression.md
index e2dbd3271d8d..adf7f62bc193 100644
--- a/docs/faq/gradient_compression.md
+++ b/docs/faq/gradient_compression.md
@@ -1,3 +1,22 @@
+
+
# Gradient Compression
Gradient Compression reduces communication bandwidth, and in some scenarios, it can make training more scalable and efficient without significant loss in convergence rate or accuracy. Example implementations with GPUs, CPUs, and distributed training are provided in this document.
diff --git a/docs/faq/index.md b/docs/faq/index.md
index fe91f7ca43b7..99c448da027d 100644
--- a/docs/faq/index.md
+++ b/docs/faq/index.md
@@ -1,3 +1,22 @@
+
+
# MXNet FAQ
```eval_rst
diff --git a/docs/faq/model_parallel_lstm.md b/docs/faq/model_parallel_lstm.md
index b78b2c574dcc..63f4db6d9df9 100644
--- a/docs/faq/model_parallel_lstm.md
+++ b/docs/faq/model_parallel_lstm.md
@@ -1,3 +1,22 @@
+
+
# Training with Multiple GPUs Using Model Parallelism
Training deep learning models can be resource intensive.
Even with a powerful GPU, some models can take days or weeks to train.
diff --git a/docs/faq/multi_devices.md b/docs/faq/multi_devices.md
index a43879cb5233..46ec837025d9 100644
--- a/docs/faq/multi_devices.md
+++ b/docs/faq/multi_devices.md
@@ -1,3 +1,22 @@
+
+
# Run MXNet on Multiple CPU/GPUs with Data Parallelism
_MXNet_ supports training with multiple CPUs and GPUs, which may be located on different physical machines.
diff --git a/docs/faq/new_op.md b/docs/faq/new_op.md
index 994a2a6f823e..910bb64ef5d9 100644
--- a/docs/faq/new_op.md
+++ b/docs/faq/new_op.md
@@ -1,3 +1,22 @@
+
+
# How to Create New Operators (Layers)
This tutorial walks you through the process of creating new MXNet operators (or layers).
diff --git a/docs/faq/nnpack.md b/docs/faq/nnpack.md
index ed38cb07df7e..690315efe54a 100644
--- a/docs/faq/nnpack.md
+++ b/docs/faq/nnpack.md
@@ -1,3 +1,22 @@
+
+
### NNPACK for Multi-Core CPU Support in MXNet
[NNPACK](/~https://github.com/Maratyszcza/NNPACK) is an acceleration package
for neural network computations, which can run on x86-64, ARMv7, or ARM64 architecture CPUs.
diff --git a/docs/faq/perf.md b/docs/faq/perf.md
index f116ede11d56..d00c904c0873 100644
--- a/docs/faq/perf.md
+++ b/docs/faq/perf.md
@@ -1,3 +1,22 @@
+
+
# Some Tips for Improving MXNet Performance
Even after fixing the training or deployment environment and parallelization scheme,
a number of configuration settings and data-handling choices can impact _MXNet_ performance.
diff --git a/docs/faq/recordio.md b/docs/faq/recordio.md
index 3091052ef6f3..3ece38617513 100644
--- a/docs/faq/recordio.md
+++ b/docs/faq/recordio.md
@@ -1,3 +1,22 @@
+
+
## Create a Dataset Using RecordIO
RecordIO implements a file format for a sequence of records. We recommend storing images as records and packing them together. The benefits include:
diff --git a/docs/faq/s3_integration.md b/docs/faq/s3_integration.md
index 024356706339..18bd38df71df 100644
--- a/docs/faq/s3_integration.md
+++ b/docs/faq/s3_integration.md
@@ -1,3 +1,22 @@
+
+
# Use data from S3 for training
AWS S3 is a cloud-based object storage service that allows storage and retrieval of large amounts of data at a very low cost. This makes it an attractive option to store large training datasets. MXNet is deeply integrated with S3 for this purpose.
diff --git a/docs/faq/security.md b/docs/faq/security.md
index 0615acda3435..05153e20aaf1 100644
--- a/docs/faq/security.md
+++ b/docs/faq/security.md
@@ -1,3 +1,22 @@
+
+
# MXNet Security Best Practices
The MXNet framework has no built-in security protections. It assumes that the MXNet entities involved in model training and inferencing (hosting) are fully trusted. It also assumes that their communications cannot be eavesdropped on or tampered with. MXNet consumers shall ensure that the above assumptions are met.
diff --git a/docs/faq/smart_device.md b/docs/faq/smart_device.md
index 2584b4c36caf..1c4a8919d981 100644
--- a/docs/faq/smart_device.md
+++ b/docs/faq/smart_device.md
@@ -1,3 +1,22 @@
+
+
# Deep Learning in a Single File for Smart Devices
Deep learning (DL) systems are complex and often depend on a number of libraries.
diff --git a/docs/faq/visualize_graph.md b/docs/faq/visualize_graph.md
index 06010213242c..4623346a5a81 100644
--- a/docs/faq/visualize_graph.md
+++ b/docs/faq/visualize_graph.md
@@ -1,3 +1,22 @@
+
+
# How to visualize neural networks as a computation graph
Here, we'll demonstrate how to use `mx.viz.plot_network`
diff --git a/docs/faq/why_mxnet.md b/docs/faq/why_mxnet.md
old mode 100755
new mode 100644
index ed8cef143070..00f21043cc41
--- a/docs/faq/why_mxnet.md
+++ b/docs/faq/why_mxnet.md
@@ -1,3 +1,22 @@
+
+
# Why MXNet?
Probably, if you've stumbled upon this page, you've heard of _deep learning_.
diff --git a/docs/gluon/index.md b/docs/gluon/index.md
index 96e8e36dbf20..b136efbe0ae5 100644
--- a/docs/gluon/index.md
+++ b/docs/gluon/index.md
@@ -1,3 +1,22 @@
+
+
# About Gluon
![gluon logo](/~https://github.com/dmlc/web-data/blob/master/mxnet/image/image-gluon-logo.png?raw=true)
diff --git a/docs/index.md b/docs/index.md
index ab6a95dc0ddd..1ecdbde909d5 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,3 +1,22 @@
+
+
# MXNet
```eval_rst
diff --git a/docs/install/amazonlinux_setup.md b/docs/install/amazonlinux_setup.md
index 42a4fcb0eb89..baac53461ffd 100644
--- a/docs/install/amazonlinux_setup.md
+++ b/docs/install/amazonlinux_setup.md
@@ -1,3 +1,22 @@
+
+
diff --git a/docs/install/build_from_source.md b/docs/install/build_from_source.md
index e807fb44b599..0effd3b30979 100644
--- a/docs/install/build_from_source.md
+++ b/docs/install/build_from_source.md
@@ -1,3 +1,22 @@
+
+
# Build MXNet from Source
This document explains how to build MXNet from source code.
diff --git a/docs/install/c_plus_plus.md b/docs/install/c_plus_plus.md
index 6ad67e2803db..f95bcc645746 100644
--- a/docs/install/c_plus_plus.md
+++ b/docs/install/c_plus_plus.md
@@ -1,3 +1,22 @@
+
+
## Build the C++ package
The C++ package has the same prerequisites as the MXNet library.
diff --git a/docs/install/centos_setup.md b/docs/install/centos_setup.md
index e5efe42a61dd..a42c337c0f91 100644
--- a/docs/install/centos_setup.md
+++ b/docs/install/centos_setup.md
@@ -1,3 +1,22 @@
+
+
# Installing MXNet on CentOS and other non-Ubuntu Linux systems
Step 1. Install build tools and git on `CentOS >= 7` and `Fedora >= 19`:
diff --git a/docs/install/download.md b/docs/install/download.md
index d7b9440e0a98..998336731d5f 100644
--- a/docs/install/download.md
+++ b/docs/install/download.md
@@ -1,3 +1,22 @@
+
+
# Source Download
These source archives are generated from tagged releases. Updates and patches will not have been applied. For any updates refer to the corresponding branches in the [GitHub repository](/~https://github.com/apache/incubator-mxnet). Choose your flavor of download from the following links:
diff --git a/docs/install/index.md b/docs/install/index.md
index ad3d083a7d02..c4da719e72d9 100644
--- a/docs/install/index.md
+++ b/docs/install/index.md
@@ -1,3 +1,22 @@
+
+
# Installing MXNet
```eval_rst
diff --git a/docs/install/java_setup.md b/docs/install/java_setup.md
index bd20c9596013..39765730353a 100644
--- a/docs/install/java_setup.md
+++ b/docs/install/java_setup.md
@@ -1,3 +1,22 @@
+
+
# Setup the MXNet Package for Java
The following instructions are provided for macOS and Ubuntu. Windows is not yet available.
diff --git a/docs/install/osx_setup.md b/docs/install/osx_setup.md
index 7d90d3d456f6..e39c006b86ee 100644
--- a/docs/install/osx_setup.md
+++ b/docs/install/osx_setup.md
@@ -1,3 +1,22 @@
+
+
# Installing MXNet from source on OS X (Mac)
**NOTE:** For installing prebuilt MXNet with Python, please refer to the [new install guide](http://mxnet.io/install/index.html).
diff --git a/docs/install/raspbian_setup.md b/docs/install/raspbian_setup.md
index 42a4fcb0eb89..baac53461ffd 100644
--- a/docs/install/raspbian_setup.md
+++ b/docs/install/raspbian_setup.md
@@ -1,3 +1,22 @@
+
+
diff --git a/docs/install/scala_setup.md b/docs/install/scala_setup.md
index 9ee9ceac3a3c..a09fcc949e8f 100644
--- a/docs/install/scala_setup.md
+++ b/docs/install/scala_setup.md
@@ -1,3 +1,22 @@
+
+
# Setup the MXNet Package for Scala
The following instructions are provided for macOS and Ubuntu. Windows is not yet available.
diff --git a/docs/install/tx2_setup.md b/docs/install/tx2_setup.md
index 42a4fcb0eb89..baac53461ffd 100644
--- a/docs/install/tx2_setup.md
+++ b/docs/install/tx2_setup.md
@@ -1,3 +1,22 @@
+
+
diff --git a/docs/install/ubuntu_setup.md b/docs/install/ubuntu_setup.md
index bf964182b50a..726888f377cb 100644
--- a/docs/install/ubuntu_setup.md
+++ b/docs/install/ubuntu_setup.md
@@ -1,3 +1,22 @@
+
+
# Installing MXNet on Ubuntu
The following installation instructions are for installing MXNet on computers running **Ubuntu 16.04**. Support for later versions of Ubuntu is [not yet available](#contributions).
diff --git a/docs/install/validate_mxnet.md b/docs/install/validate_mxnet.md
index dfe8d063f602..1dd4df89b149 100644
--- a/docs/install/validate_mxnet.md
+++ b/docs/install/validate_mxnet.md
@@ -1,3 +1,22 @@
+
+
# Validate Your MXNet Installation
- [Python](#python)
diff --git a/docs/install/windows_setup.md b/docs/install/windows_setup.md
old mode 100755
new mode 100644
index b34936140aea..383bca0b9778
--- a/docs/install/windows_setup.md
+++ b/docs/install/windows_setup.md
@@ -1,3 +1,22 @@
+
+
# Installing MXNet on Windows
The following describes how to install with pip for computers with CPUs, Intel CPUs, and NVIDIA GPUs. Further along in the document you can learn how to build MXNet from source on Windows, or how to install packages that support different language APIs to MXNet.
diff --git a/docs/model_zoo/index.md b/docs/model_zoo/index.md
index 034d360b985a..0a65a8d68ff8 100644
--- a/docs/model_zoo/index.md
+++ b/docs/model_zoo/index.md
@@ -1,3 +1,22 @@
+
+
# MXNet Model Zoo
MXNet features fast implementations of many state-of-the-art models reported in the academic literature. This Model Zoo is an
diff --git a/docs/settings.ini b/docs/settings.ini
index e16177604e8d..ec64dd784cdb 100644
--- a/docs/settings.ini
+++ b/docs/settings.ini
@@ -1,3 +1,20 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
[mxnet]
build_mxnet = 0
diff --git a/docs/tutorials/basic/data.md b/docs/tutorials/basic/data.md
index 4a682e83f9fe..e62ea4fa9737 100644
--- a/docs/tutorials/basic/data.md
+++ b/docs/tutorials/basic/data.md
@@ -1,3 +1,22 @@
+
+
# Iterators - Loading data
In this tutorial, we focus on how to feed data into a training or inference program.
Most training and inference modules in MXNet accept data iterators,
diff --git a/docs/tutorials/basic/index.md b/docs/tutorials/basic/index.md
index 87d72894424f..aab7731342f7 100644
--- a/docs/tutorials/basic/index.md
+++ b/docs/tutorials/basic/index.md
@@ -1,3 +1,22 @@
+
+
# Tutorials
```eval_rst
diff --git a/docs/tutorials/basic/module.md b/docs/tutorials/basic/module.md
index f7a4d6e25de7..67f5d8aecd1f 100644
--- a/docs/tutorials/basic/module.md
+++ b/docs/tutorials/basic/module.md
@@ -1,3 +1,22 @@
+
+
# Module - Neural network training and inference
diff --git a/docs/tutorials/basic/ndarray.md b/docs/tutorials/basic/ndarray.md
index 2c171f2627e8..00cac64dd6ba 100644
--- a/docs/tutorials/basic/ndarray.md
+++ b/docs/tutorials/basic/ndarray.md
@@ -1,3 +1,22 @@
+
+
# NDArray - Imperative tensor operations on CPU/GPU
In _MXNet_, `NDArray` is the core data structure for all mathematical
diff --git a/docs/tutorials/basic/ndarray_indexing.md b/docs/tutorials/basic/ndarray_indexing.md
index 35dd8c17f675..e06e5cb4482d 100644
--- a/docs/tutorials/basic/ndarray_indexing.md
+++ b/docs/tutorials/basic/ndarray_indexing.md
@@ -1,3 +1,22 @@
+
+
# NDArray Indexing - Array indexing features
diff --git a/docs/tutorials/basic/reshape_transpose.md b/docs/tutorials/basic/reshape_transpose.md
index 999b22ca2f7e..7407920b0aea 100644
--- a/docs/tutorials/basic/reshape_transpose.md
+++ b/docs/tutorials/basic/reshape_transpose.md
@@ -1,3 +1,22 @@
+
+
## Difference between reshape and transpose operators
What does it mean if MXNet gives you an error like this?
diff --git a/docs/tutorials/basic/symbol.md b/docs/tutorials/basic/symbol.md
index 5e1e3cd8c62f..5642bab74d64 100644
--- a/docs/tutorials/basic/symbol.md
+++ b/docs/tutorials/basic/symbol.md
@@ -1,3 +1,22 @@
+
+
# Symbol - Neural network graphs
In a [previous tutorial](http://mxnet.io/tutorials/basic/ndarray.html), we introduced `NDArray`,
diff --git a/docs/tutorials/c++/basics.md b/docs/tutorials/c++/basics.md
index aa73a7363b1c..a960b1817635 100644
--- a/docs/tutorials/c++/basics.md
+++ b/docs/tutorials/c++/basics.md
@@ -1,3 +1,22 @@
+
+
Basics
======
diff --git a/docs/tutorials/c++/index.md b/docs/tutorials/c++/index.md
index 87d72894424f..aab7731342f7 100644
--- a/docs/tutorials/c++/index.md
+++ b/docs/tutorials/c++/index.md
@@ -1,3 +1,22 @@
+
+
# Tutorials
```eval_rst
diff --git a/docs/tutorials/c++/mxnet_cpp_inference_tutorial.md b/docs/tutorials/c++/mxnet_cpp_inference_tutorial.md
index ab55a0e787f1..c408b8a06f52 100644
--- a/docs/tutorials/c++/mxnet_cpp_inference_tutorial.md
+++ b/docs/tutorials/c++/mxnet_cpp_inference_tutorial.md
@@ -1,3 +1,22 @@
+
+
# MXNet C++ API inference tutorial
## Overview
diff --git a/docs/tutorials/c++/subgraphAPI.md b/docs/tutorials/c++/subgraphAPI.md
index 0ae4341b287c..a830bcbcf422 100644
--- a/docs/tutorials/c++/subgraphAPI.md
+++ b/docs/tutorials/c++/subgraphAPI.md
@@ -1,3 +1,22 @@
+
+
## Subgraph API
The subgraph API has been proposed and implemented as the default mechanism for integrating backend libraries to MXNet. The subgraph API is a very flexible interface. Although it was proposed as an integration mechanism, it has been used as a tool for manipulating NNVM graphs for graph-level optimizations, such as operator fusion.
diff --git a/docs/tutorials/control_flow/ControlFlowTutorial.md b/docs/tutorials/control_flow/ControlFlowTutorial.md
index 4b6a23136b5d..173027b13bcf 100644
--- a/docs/tutorials/control_flow/ControlFlowTutorial.md
+++ b/docs/tutorials/control_flow/ControlFlowTutorial.md
@@ -1,3 +1,22 @@
+
+
# Hybridize Gluon models with control flows.
MXNet currently provides three control flow operators: `cond`, `foreach` and `while_loop`. Like other MXNet operators, they all have a version for NDArray and a version for Symbol. These two versions have exactly the same semantics. We can take advantage of this and use them in Gluon to hybridize models.
diff --git a/docs/tutorials/control_flow/index.md b/docs/tutorials/control_flow/index.md
index 87d72894424f..aab7731342f7 100644
--- a/docs/tutorials/control_flow/index.md
+++ b/docs/tutorials/control_flow/index.md
@@ -1,3 +1,22 @@
+
+
# Tutorials
```eval_rst
diff --git a/docs/tutorials/embedded/index.md b/docs/tutorials/embedded/index.md
index 87d72894424f..aab7731342f7 100644
--- a/docs/tutorials/embedded/index.md
+++ b/docs/tutorials/embedded/index.md
@@ -1,3 +1,22 @@
+
+
# Tutorials
```eval_rst
diff --git a/docs/tutorials/embedded/wine_detector.md b/docs/tutorials/embedded/wine_detector.md
index 65e6fbaa4d91..b59a52bd8649 100644
--- a/docs/tutorials/embedded/wine_detector.md
+++ b/docs/tutorials/embedded/wine_detector.md
@@ -1,3 +1,22 @@
+
+
# Real-time Object Detection with MXNet On The Raspberry Pi
This tutorial shows developers who work with the Raspberry Pi or similar embedded ARM-based devices how to compile MXNet for those devices and run a pretrained deep network model. It also shows how to use AWS IoT to manage and monitor MXNet models running on your devices.
diff --git a/docs/tutorials/gluon/autograd.md b/docs/tutorials/gluon/autograd.md
index 4b296dd2dd5b..495db34de825 100644
--- a/docs/tutorials/gluon/autograd.md
+++ b/docs/tutorials/gluon/autograd.md
@@ -1,3 +1,22 @@
+
+
# Automatic differentiation
MXNet supports automatic differentiation with the `autograd` package.
diff --git a/docs/tutorials/gluon/custom_layer.md b/docs/tutorials/gluon/custom_layer.md
index 97bdf05aff58..a45622d175cc 100644
--- a/docs/tutorials/gluon/custom_layer.md
+++ b/docs/tutorials/gluon/custom_layer.md
@@ -1,3 +1,22 @@
+
+
# How to write a custom layer in Apache MxNet Gluon API
diff --git a/docs/tutorials/gluon/customop.md b/docs/tutorials/gluon/customop.md
index df10788d6788..5552093484ba 100644
--- a/docs/tutorials/gluon/customop.md
+++ b/docs/tutorials/gluon/customop.md
@@ -1,3 +1,22 @@
+
+
# Creating custom operators with numpy
diff --git a/docs/tutorials/gluon/data_augmentation.md b/docs/tutorials/gluon/data_augmentation.md
index df7462c33044..ce631cea5fbf 100644
--- a/docs/tutorials/gluon/data_augmentation.md
+++ b/docs/tutorials/gluon/data_augmentation.md
@@ -1,3 +1,22 @@
+
+
# Methods of applying data augmentation (Gluon API)
Data Augmentation is a regularization technique that's used to avoid overfitting when training Machine Learning models. Although the technique can be applied in a variety of domains, it's very common in Computer Vision. Adjustments are made to the original images in the training dataset before being used in training. Some example adjustments include translating, cropping, scaling, rotating, changing brightness and contrast. We do this to reduce the dependence of the model on spurious characteristics; e.g. training data may only contain faces that fill 1/4 of the image, so the model trained without data augmentation might unhelpfully learn that faces can only be of this size.
diff --git a/docs/tutorials/gluon/datasets.md b/docs/tutorials/gluon/datasets.md
index 0b0038def633..94b27bbbd331 100644
--- a/docs/tutorials/gluon/datasets.md
+++ b/docs/tutorials/gluon/datasets.md
@@ -1,3 +1,22 @@
+
+
# Gluon `Dataset`s and `DataLoader`
diff --git a/docs/tutorials/gluon/gluon.md b/docs/tutorials/gluon/gluon.md
index 518e99905c04..6cf27ff603d2 100644
--- a/docs/tutorials/gluon/gluon.md
+++ b/docs/tutorials/gluon/gluon.md
@@ -1,3 +1,22 @@
+
+
# Gluon - Neural network building blocks
The Gluon package is a high-level interface for MXNet designed to be easy to use while
diff --git a/docs/tutorials/gluon/gluon_from_experiment_to_deployment.md b/docs/tutorials/gluon/gluon_from_experiment_to_deployment.md
index a3a6aab3593c..0394c84d1e02 100644
--- a/docs/tutorials/gluon/gluon_from_experiment_to_deployment.md
+++ b/docs/tutorials/gluon/gluon_from_experiment_to_deployment.md
@@ -1,3 +1,22 @@
+
+
# Gluon: from experiment to deployment, an end to end tutorial
diff --git a/docs/tutorials/gluon/gotchas_numpy_in_mxnet.md b/docs/tutorials/gluon/gotchas_numpy_in_mxnet.md
index c82c63edbc2b..4b2fa95ce885 100644
--- a/docs/tutorials/gluon/gotchas_numpy_in_mxnet.md
+++ b/docs/tutorials/gluon/gotchas_numpy_in_mxnet.md
@@ -1,3 +1,22 @@
+
+
# Gotchas using NumPy in Apache MXNet
diff --git a/docs/tutorials/gluon/hybrid.md b/docs/tutorials/gluon/hybrid.md
index 17e9e1b20d74..c79f79692637 100644
--- a/docs/tutorials/gluon/hybrid.md
+++ b/docs/tutorials/gluon/hybrid.md
@@ -1,3 +1,22 @@
+
+
# Hybrid - Faster training and easy deployment
*Related Content:*
diff --git a/docs/tutorials/gluon/index.md b/docs/tutorials/gluon/index.md
index 87d72894424f..aab7731342f7 100644
--- a/docs/tutorials/gluon/index.md
+++ b/docs/tutorials/gluon/index.md
@@ -1,3 +1,22 @@
+
+
# Tutorials
```eval_rst
diff --git a/docs/tutorials/gluon/info_gan.md b/docs/tutorials/gluon/info_gan.md
index 8b2668ab22e6..2b8cdfe3fc4e 100644
--- a/docs/tutorials/gluon/info_gan.md
+++ b/docs/tutorials/gluon/info_gan.md
@@ -1,3 +1,22 @@
+
+
# Image similarity search with InfoGAN
diff --git a/docs/tutorials/gluon/learning_rate_finder.md b/docs/tutorials/gluon/learning_rate_finder.md
index b571a53f674c..b4627f25bc39 100644
--- a/docs/tutorials/gluon/learning_rate_finder.md
+++ b/docs/tutorials/gluon/learning_rate_finder.md
@@ -1,3 +1,22 @@
+
+
# Learning Rate Finder
diff --git a/docs/tutorials/gluon/learning_rate_schedules.md b/docs/tutorials/gluon/learning_rate_schedules.md
index 88b109e7f33e..3416ab44f992 100644
--- a/docs/tutorials/gluon/learning_rate_schedules.md
+++ b/docs/tutorials/gluon/learning_rate_schedules.md
@@ -1,3 +1,22 @@
+
+
# Learning Rate Schedules
diff --git a/docs/tutorials/gluon/learning_rate_schedules_advanced.md b/docs/tutorials/gluon/learning_rate_schedules_advanced.md
index bdaf0a9ba38d..0d933997a2bb 100644
--- a/docs/tutorials/gluon/learning_rate_schedules_advanced.md
+++ b/docs/tutorials/gluon/learning_rate_schedules_advanced.md
@@ -1,3 +1,22 @@
+
+
# Advanced Learning Rate Schedules
diff --git a/docs/tutorials/gluon/logistic_regression_explained.md b/docs/tutorials/gluon/logistic_regression_explained.md
index 577a91413b33..93a93ccc9c32 100644
--- a/docs/tutorials/gluon/logistic_regression_explained.md
+++ b/docs/tutorials/gluon/logistic_regression_explained.md
@@ -1,3 +1,22 @@
+
+
# Logistic regression using Gluon API explained
diff --git a/docs/tutorials/gluon/mnist.md b/docs/tutorials/gluon/mnist.md
index 35fb40521f62..aa1efccfb653 100644
--- a/docs/tutorials/gluon/mnist.md
+++ b/docs/tutorials/gluon/mnist.md
@@ -1,3 +1,22 @@
+
+
# Hand-written Digit Recognition
In this tutorial, we'll give you a step-by-step walkthrough of building a hand-written digit classifier using the [MNIST](https://en.wikipedia.org/wiki/MNIST_database) dataset.
diff --git a/docs/tutorials/gluon/naming.md b/docs/tutorials/gluon/naming.md
index 3606a03dcbd2..16383d309fc5 100644
--- a/docs/tutorials/gluon/naming.md
+++ b/docs/tutorials/gluon/naming.md
@@ -1,3 +1,22 @@
+
+
# Naming of Gluon Parameter and Blocks
diff --git a/docs/tutorials/gluon/ndarray.md b/docs/tutorials/gluon/ndarray.md
index 7cf08a88cbf3..f4bae9494454 100644
--- a/docs/tutorials/gluon/ndarray.md
+++ b/docs/tutorials/gluon/ndarray.md
@@ -1,3 +1,22 @@
+
+
# NDArray - Scientific computing on CPU and GPU
NDArray is a tensor data structure similar to numpy's multi-dimensional array.
diff --git a/docs/tutorials/gluon/pretrained_models.md b/docs/tutorials/gluon/pretrained_models.md
index 0de5fdd0b44f..796fe05e731e 100644
--- a/docs/tutorials/gluon/pretrained_models.md
+++ b/docs/tutorials/gluon/pretrained_models.md
@@ -1,3 +1,22 @@
+
+
# Using pre-trained models in MXNet
diff --git a/docs/tutorials/gluon/save_load_params.md b/docs/tutorials/gluon/save_load_params.md
index ebc8103e7b45..feaf5eaf8547 100644
--- a/docs/tutorials/gluon/save_load_params.md
+++ b/docs/tutorials/gluon/save_load_params.md
@@ -1,3 +1,22 @@
+
+
# Saving and Loading Gluon Models
Training large models takes a lot of time, and it is a good idea to save the trained models to files to avoid training them again and again. There are a number of reasons to do this. For example, you might want to do inference on a machine that is different from the one where the model was trained. Sometimes a model's performance on the validation set decreases toward the end of training because of overfitting. If you save your model parameters after every epoch, at the end you can decide to use the model that performs best on the validation set. Another reason is to train your model using one language (like Python, which has a lot of tools for training) and run inference using a different language (like Scala, perhaps because your application is built on Scala).
diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md
index cad9099fcc71..562c66170a01 100644
--- a/docs/tutorials/index.md
+++ b/docs/tutorials/index.md
@@ -1,3 +1,22 @@
+
+
# Tutorials
```eval_rst
diff --git a/docs/tutorials/java/index.md b/docs/tutorials/java/index.md
index 87d72894424f..aab7731342f7 100644
--- a/docs/tutorials/java/index.md
+++ b/docs/tutorials/java/index.md
@@ -1,3 +1,22 @@
+
+
# Tutorials
```eval_rst
diff --git a/docs/tutorials/java/mxnet_java_on_intellij.md b/docs/tutorials/java/mxnet_java_on_intellij.md
index ef2c009f66e8..b36ee3cb0ebc 100644
--- a/docs/tutorials/java/mxnet_java_on_intellij.md
+++ b/docs/tutorials/java/mxnet_java_on_intellij.md
@@ -1,3 +1,22 @@
+
+
# Run MXNet Java Examples Using the IntelliJ IDE (macOS)
This tutorial guides you through setting up a simple Java project in IntelliJ IDE on macOS and demonstrates usage of the MXNet Java APIs.
diff --git a/docs/tutorials/java/ssd_inference.md b/docs/tutorials/java/ssd_inference.md
index 3a20329f9a91..b61fe97036ef 100644
--- a/docs/tutorials/java/ssd_inference.md
+++ b/docs/tutorials/java/ssd_inference.md
@@ -1,3 +1,22 @@
+
+
# Multi Object Detection using pre-trained SSD Model via Java Inference APIs
This tutorial shows how to use MXNet Java Inference APIs to run inference on a pre-trained Single Shot Detector (SSD) Model.
diff --git a/docs/tutorials/nlp/cnn.md b/docs/tutorials/nlp/cnn.md
index b3d7d0d38941..128f5caaae26 100644
--- a/docs/tutorials/nlp/cnn.md
+++ b/docs/tutorials/nlp/cnn.md
@@ -1,3 +1,22 @@
+
+
# Text Classification Using a Convolutional Neural Network on MXNet
This tutorial is based on Yoon Kim's [paper](https://arxiv.org/abs/1408.5882) on using convolutional neural networks for sentence sentiment classification. The tutorial has been tested on MXNet 1.0 running under Python 2.7 and Python 3.6.
diff --git a/docs/tutorials/nlp/index.md b/docs/tutorials/nlp/index.md
index 87d72894424f..aab7731342f7 100644
--- a/docs/tutorials/nlp/index.md
+++ b/docs/tutorials/nlp/index.md
@@ -1,3 +1,22 @@
+
+
# Tutorials
```eval_rst
diff --git a/docs/tutorials/onnx/export_mxnet_to_onnx.md b/docs/tutorials/onnx/export_mxnet_to_onnx.md
index 3f925c7b5b84..6190b9242a74 100644
--- a/docs/tutorials/onnx/export_mxnet_to_onnx.md
+++ b/docs/tutorials/onnx/export_mxnet_to_onnx.md
@@ -1,3 +1,22 @@
+
+
# Exporting MXNet model to ONNX format
diff --git a/docs/tutorials/onnx/fine_tuning_gluon.md b/docs/tutorials/onnx/fine_tuning_gluon.md
index 750a6757272f..70c1469afa94 100644
--- a/docs/tutorials/onnx/fine_tuning_gluon.md
+++ b/docs/tutorials/onnx/fine_tuning_gluon.md
@@ -1,3 +1,22 @@
+
+
# Fine-tuning an ONNX model with MXNet/Gluon
diff --git a/docs/tutorials/onnx/index.md b/docs/tutorials/onnx/index.md
index 87d72894424f..aab7731342f7 100644
--- a/docs/tutorials/onnx/index.md
+++ b/docs/tutorials/onnx/index.md
@@ -1,3 +1,22 @@
+
+
# Tutorials
```eval_rst
diff --git a/docs/tutorials/onnx/inference_on_onnx_model.md b/docs/tutorials/onnx/inference_on_onnx_model.md
index b2522ad0c1f1..f500956f72c6 100644
--- a/docs/tutorials/onnx/inference_on_onnx_model.md
+++ b/docs/tutorials/onnx/inference_on_onnx_model.md
@@ -1,3 +1,22 @@
+
+
# Running inference on MXNet/Gluon from an ONNX model
diff --git a/docs/tutorials/onnx/super_resolution.md b/docs/tutorials/onnx/super_resolution.md
index 36c06b743c8e..d9825090e8ee 100644
--- a/docs/tutorials/onnx/super_resolution.md
+++ b/docs/tutorials/onnx/super_resolution.md
@@ -1,3 +1,22 @@
+
+
# Importing an ONNX model into MXNet
In this tutorial we will:
diff --git a/docs/tutorials/python/data_augmentation.md b/docs/tutorials/python/data_augmentation.md
index e4dbbb672997..1034f621ba6f 100644
--- a/docs/tutorials/python/data_augmentation.md
+++ b/docs/tutorials/python/data_augmentation.md
@@ -1,3 +1,22 @@
+
+
# Methods of applying data augmentation (Module API)
Data Augmentation is a regularization technique that's used to avoid overfitting when training Machine Learning models. Although the technique can be applied in a variety of domains, it's very common in Computer Vision. Adjustments are made to the original images in the training dataset before being used in training. Some example adjustments include translating, cropping, scaling, rotating, changing brightness and contrast. We do this to reduce the dependence of the model on spurious characteristics; e.g. training data may only contain faces that fill 1/4 of the image, so the model trained without data augmentation might unhelpfully learn that faces can only be of this size.
diff --git a/docs/tutorials/python/data_augmentation_with_masks.md b/docs/tutorials/python/data_augmentation_with_masks.md
index ac587ac2f5e2..080708caedb1 100644
--- a/docs/tutorials/python/data_augmentation_with_masks.md
+++ b/docs/tutorials/python/data_augmentation_with_masks.md
@@ -1,3 +1,22 @@
+
+
# Data Augmentation with Masks
diff --git a/docs/tutorials/python/index.md b/docs/tutorials/python/index.md
index 87d72894424f..aab7731342f7 100644
--- a/docs/tutorials/python/index.md
+++ b/docs/tutorials/python/index.md
@@ -1,3 +1,22 @@
+
+
# Tutorials
```eval_rst
diff --git a/docs/tutorials/python/kvstore.md b/docs/tutorials/python/kvstore.md
index 3e6bbf12c393..593c240ad17c 100644
--- a/docs/tutorials/python/kvstore.md
+++ b/docs/tutorials/python/kvstore.md
@@ -1,3 +1,22 @@
+
+
# Distributed Key-Value Store
KVStore is a place for data sharing. Think of it as a single object shared
diff --git a/docs/tutorials/python/linear-regression.md b/docs/tutorials/python/linear-regression.md
index fd336ad2aed5..77ca7f9a7ceb 100644
--- a/docs/tutorials/python/linear-regression.md
+++ b/docs/tutorials/python/linear-regression.md
@@ -1,3 +1,22 @@
+
+
# Linear Regression
In this tutorial we'll walk through how one can implement *linear regression* using MXNet APIs.
diff --git a/docs/tutorials/python/matrix_factorization.md b/docs/tutorials/python/matrix_factorization.md
index 154fa4b3e127..62674e7ad646 100644
--- a/docs/tutorials/python/matrix_factorization.md
+++ b/docs/tutorials/python/matrix_factorization.md
@@ -1,3 +1,22 @@
+
+
# Matrix Factorization
In a recommendation system, there is a group of users and a set of items. Given
diff --git a/docs/tutorials/python/mnist.md b/docs/tutorials/python/mnist.md
index df949d487b63..a8eee40e095e 100644
--- a/docs/tutorials/python/mnist.md
+++ b/docs/tutorials/python/mnist.md
@@ -1,3 +1,22 @@
+
+
# Handwritten Digit Recognition
In this tutorial, we'll give you a step-by-step walk-through of how to build a handwritten digit classifier using the [MNIST](https://en.wikipedia.org/wiki/MNIST_database) dataset. For someone new to deep learning, this exercise is arguably the "Hello World" equivalent.
diff --git a/docs/tutorials/python/predict_image.md b/docs/tutorials/python/predict_image.md
index 8be98d991366..e02db165d7f6 100644
--- a/docs/tutorials/python/predict_image.md
+++ b/docs/tutorials/python/predict_image.md
@@ -1,3 +1,22 @@
+
+
# Predict with pre-trained models
This tutorial explains how to recognize objects in an image with a pre-trained model, and how to perform feature extraction.
diff --git a/docs/tutorials/python/profiler.md b/docs/tutorials/python/profiler.md
index 7dcda10f11b8..1d6c0b2fc7f6 100644
--- a/docs/tutorials/python/profiler.md
+++ b/docs/tutorials/python/profiler.md
@@ -1,3 +1,22 @@
+
+
# Profiling MXNet Models
It is often helpful to understand how much time each operation takes when running a model. This helps you optimize the model to run faster. In this tutorial, we will learn how to profile MXNet models to measure their running time and memory consumption using the MXNet profiler.
diff --git a/docs/tutorials/python/types_of_data_augmentation.md b/docs/tutorials/python/types_of_data_augmentation.md
index 4308932bf483..344f3cd4d34a 100644
--- a/docs/tutorials/python/types_of_data_augmentation.md
+++ b/docs/tutorials/python/types_of_data_augmentation.md
@@ -1,3 +1,22 @@
+
+
# Types of Data Augmentation
diff --git a/docs/tutorials/r/CallbackFunction.md b/docs/tutorials/r/CallbackFunction.md
index 103352dd2907..f3156cd969e8 100644
--- a/docs/tutorials/r/CallbackFunction.md
+++ b/docs/tutorials/r/CallbackFunction.md
@@ -1,3 +1,22 @@
+
+
Callback Function
======================================
diff --git a/docs/tutorials/r/CustomIterator.md b/docs/tutorials/r/CustomIterator.md
index 1ad634bcd669..62a618a51590 100644
--- a/docs/tutorials/r/CustomIterator.md
+++ b/docs/tutorials/r/CustomIterator.md
@@ -1,3 +1,22 @@
+
+
Custom Iterator Tutorial
======================================
diff --git a/docs/tutorials/r/CustomLossFunction.md b/docs/tutorials/r/CustomLossFunction.md
index afb99518894c..87ac40dafc63 100644
--- a/docs/tutorials/r/CustomLossFunction.md
+++ b/docs/tutorials/r/CustomLossFunction.md
@@ -1,3 +1,22 @@
+
+
Customized loss function
======================================
diff --git a/docs/tutorials/r/MultidimLstm.md b/docs/tutorials/r/MultidimLstm.md
index 8692086d180b..49a07361c9d4 100644
--- a/docs/tutorials/r/MultidimLstm.md
+++ b/docs/tutorials/r/MultidimLstm.md
@@ -1,3 +1,22 @@
+
+
LSTM time series example
=============================================
diff --git a/docs/tutorials/r/charRnnModel.md b/docs/tutorials/r/charRnnModel.md
index cb21e77559b5..f0d1f9e4d1fb 100644
--- a/docs/tutorials/r/charRnnModel.md
+++ b/docs/tutorials/r/charRnnModel.md
@@ -1,3 +1,22 @@
+
+
# Character-level Language Model using RNN
diff --git a/docs/tutorials/r/classifyRealImageWithPretrainedModel.md b/docs/tutorials/r/classifyRealImageWithPretrainedModel.md
index b2f2035426ec..8a7daf87e4f1 100644
--- a/docs/tutorials/r/classifyRealImageWithPretrainedModel.md
+++ b/docs/tutorials/r/classifyRealImageWithPretrainedModel.md
@@ -1,3 +1,22 @@
+
+
Classify Images with a Pre-Trained Model
=================================================
MXNet is a flexible and efficient deep learning framework. One of the interesting things that a deep learning
diff --git a/docs/tutorials/r/fiveMinutesNeuralNetwork.md b/docs/tutorials/r/fiveMinutesNeuralNetwork.md
index 6d79cd288d2c..56688a65338e 100644
--- a/docs/tutorials/r/fiveMinutesNeuralNetwork.md
+++ b/docs/tutorials/r/fiveMinutesNeuralNetwork.md
@@ -1,3 +1,22 @@
+
+
Develop a Neural Network with MXNet in Five Minutes
=============================================
diff --git a/docs/tutorials/r/index.md b/docs/tutorials/r/index.md
index fbc8911f2a6d..6ab039561886 100644
--- a/docs/tutorials/r/index.md
+++ b/docs/tutorials/r/index.md
@@ -1,3 +1,22 @@
+
+
# R Tutorials
These tutorials introduce a few fundamental concepts in deep learning and how to implement them in R using _MXNet_.
diff --git a/docs/tutorials/r/mnistCompetition.md b/docs/tutorials/r/mnistCompetition.md
index ed3c2827011d..95efbfd6d7af 100644
--- a/docs/tutorials/r/mnistCompetition.md
+++ b/docs/tutorials/r/mnistCompetition.md
@@ -1,3 +1,22 @@
+
+
Handwritten Digits Classification Competition
=============================================
diff --git a/docs/tutorials/r/ndarray.md b/docs/tutorials/r/ndarray.md
index cb7639a8a44d..30f5d887a3b6 100644
--- a/docs/tutorials/r/ndarray.md
+++ b/docs/tutorials/r/ndarray.md
@@ -1,3 +1,22 @@
+
+
# NDArray: Vectorized Tensor Computations on CPUs and GPUs
`NDArray` is the basic vectorized operation unit in MXNet for matrix and tensor computations.
diff --git a/docs/tutorials/r/symbol.md b/docs/tutorials/r/symbol.md
index 4a87643b9f50..0ab560856bb4 100644
--- a/docs/tutorials/r/symbol.md
+++ b/docs/tutorials/r/symbol.md
@@ -1,3 +1,22 @@
+
+
# Symbol and Automatic Differentiation
While `NDArray` is MXNet's basic computational unit, building neural networks requires a higher-level abstraction. For this, MXNet provides a symbolic interface, named Symbol, that combines both flexibility and efficiency.
diff --git a/docs/tutorials/scala/char_lstm.md b/docs/tutorials/scala/char_lstm.md
index 4d6a5aee921e..fafd5d5a0bd9 100644
--- a/docs/tutorials/scala/char_lstm.md
+++ b/docs/tutorials/scala/char_lstm.md
@@ -1,3 +1,22 @@
+
+
# Developing a Character-level Language model
This tutorial shows how to train a character-level language model with a multilayer recurrent neural network (RNN) using Scala. This model takes one text file as input and trains an RNN that learns to predict the next character in the sequence. In this tutorial, you train a multilayer LSTM (Long Short-Term Memory) network that generates relevant text using Barack Obama's speech patterns.
diff --git a/docs/tutorials/scala/index.md b/docs/tutorials/scala/index.md
index f14337f90f08..55e41e428d38 100644
--- a/docs/tutorials/scala/index.md
+++ b/docs/tutorials/scala/index.md
@@ -1,3 +1,22 @@
+
+
# MXNet-Scala Tutorials
## Installation & Setup
diff --git a/docs/tutorials/scala/mnist.md b/docs/tutorials/scala/mnist.md
index 79f2129ef0ef..1f05fb464fb0 100644
--- a/docs/tutorials/scala/mnist.md
+++ b/docs/tutorials/scala/mnist.md
@@ -1,3 +1,22 @@
+
+
# Handwritten Digit Recognition
This Scala tutorial guides you through a classic computer vision application: identifying handwritten digits.
diff --git a/docs/tutorials/scala/mxnet_scala_on_intellij.md b/docs/tutorials/scala/mxnet_scala_on_intellij.md
index 769a6b4fe506..cae70567de4d 100644
--- a/docs/tutorials/scala/mxnet_scala_on_intellij.md
+++ b/docs/tutorials/scala/mxnet_scala_on_intellij.md
@@ -1,3 +1,22 @@
+
+
# Run MXNet Scala Examples Using the IntelliJ IDE (macOS)
This tutorial guides you through setting up a Scala project in the IntelliJ IDE on macOS, and shows how to use the MXNet package from your application.
diff --git a/docs/tutorials/sparse/csr.md b/docs/tutorials/sparse/csr.md
index 0aede1ab4313..f1718687cfb0 100644
--- a/docs/tutorials/sparse/csr.md
+++ b/docs/tutorials/sparse/csr.md
@@ -1,3 +1,22 @@
+
+
# CSRNDArray - NDArray in Compressed Sparse Row Storage Format
diff --git a/docs/tutorials/sparse/index.md b/docs/tutorials/sparse/index.md
index 87d72894424f..aab7731342f7 100644
--- a/docs/tutorials/sparse/index.md
+++ b/docs/tutorials/sparse/index.md
@@ -1,3 +1,22 @@
+
+
# Tutorials
```eval_rst
diff --git a/docs/tutorials/sparse/row_sparse.md b/docs/tutorials/sparse/row_sparse.md
index 46a5edad075e..2986c4d3ab8f 100644
--- a/docs/tutorials/sparse/row_sparse.md
+++ b/docs/tutorials/sparse/row_sparse.md
@@ -1,3 +1,22 @@
+
+
# RowSparseNDArray - NDArray for Sparse Gradient Updates
diff --git a/docs/tutorials/sparse/train.md b/docs/tutorials/sparse/train.md
index fde4c0e65521..d22fbe9e8f0c 100644
--- a/docs/tutorials/sparse/train.md
+++ b/docs/tutorials/sparse/train.md
@@ -1,3 +1,22 @@
+
+
# Train a Linear Regression Model with Sparse Symbols
In previous tutorials, we introduced `CSRNDArray` and `RowSparseNDArray`,
diff --git a/docs/tutorials/speech_recognition/ctc.md b/docs/tutorials/speech_recognition/ctc.md
index 0b01fb48999c..3f1b8d4ff9f7 100644
--- a/docs/tutorials/speech_recognition/ctc.md
+++ b/docs/tutorials/speech_recognition/ctc.md
@@ -1,3 +1,22 @@
+
+
# Connectionist Temporal Classification
```python
diff --git a/docs/tutorials/speech_recognition/index.md b/docs/tutorials/speech_recognition/index.md
index 87d72894424f..aab7731342f7 100644
--- a/docs/tutorials/speech_recognition/index.md
+++ b/docs/tutorials/speech_recognition/index.md
@@ -1,3 +1,22 @@
+
+
# Tutorials
```eval_rst
diff --git a/docs/tutorials/tensorrt/index.md b/docs/tutorials/tensorrt/index.md
index 9515a5b9fd1e..d3dcaeb9db84 100644
--- a/docs/tutorials/tensorrt/index.md
+++ b/docs/tutorials/tensorrt/index.md
@@ -1,3 +1,22 @@
+
+
# Tutorials
```eval_rst
diff --git a/docs/tutorials/tensorrt/inference_with_trt.md b/docs/tutorials/tensorrt/inference_with_trt.md
index 489a92ec1014..77f82a7a1039 100644
--- a/docs/tutorials/tensorrt/inference_with_trt.md
+++ b/docs/tutorials/tensorrt/inference_with_trt.md
@@ -1,3 +1,22 @@
+
+
# Optimizing Deep Learning Computation Graphs with TensorRT
NVIDIA's TensorRT is a deep learning library that has been shown to provide large speedups when used for network inference. MXNet 1.3.0 ships with experimental integrated support for TensorRT. This means MXNet users can now make use of this acceleration library to efficiently run their networks. In this tutorial we'll see how to install, enable, and run TensorRT with MXNet. We'll also give some insight into what is happening behind the scenes in MXNet to enable TensorRT graph execution.
diff --git a/docs/tutorials/unsupervised_learning/gan.md b/docs/tutorials/unsupervised_learning/gan.md
index 0efdc5565519..9c50fcfdf15b 100644
--- a/docs/tutorials/unsupervised_learning/gan.md
+++ b/docs/tutorials/unsupervised_learning/gan.md
@@ -1,3 +1,22 @@
+
+
# Generative Adversarial Network (GAN)
diff --git a/docs/tutorials/unsupervised_learning/index.md b/docs/tutorials/unsupervised_learning/index.md
index 87d72894424f..aab7731342f7 100644
--- a/docs/tutorials/unsupervised_learning/index.md
+++ b/docs/tutorials/unsupervised_learning/index.md
@@ -1,3 +1,22 @@
+
+
# Tutorials
```eval_rst
diff --git a/docs/tutorials/vision/cnn_visualization.md b/docs/tutorials/vision/cnn_visualization.md
index 5ded6f1587e0..f528d35bfd32 100644
--- a/docs/tutorials/vision/cnn_visualization.md
+++ b/docs/tutorials/vision/cnn_visualization.md
@@ -1,3 +1,22 @@
+
+
# Visualizing Decisions of Convolutional Neural Networks
Convolutional Neural Networks have made a lot of progress in Computer Vision. Their accuracy is as good as that of humans in some tasks. However, it remains difficult to explain the predictions of convolutional neural networks, as they lack the interpretability offered by other models such as decision trees.
diff --git a/docs/tutorials/vision/index.md b/docs/tutorials/vision/index.md
index 87d72894424f..aab7731342f7 100644
--- a/docs/tutorials/vision/index.md
+++ b/docs/tutorials/vision/index.md
@@ -1,3 +1,22 @@
+
+
# Tutorials
```eval_rst
diff --git a/docs/tutorials/vision/large_scale_classification.md b/docs/tutorials/vision/large_scale_classification.md
index aac03e4dd903..5fa87b14821d 100644
--- a/docs/tutorials/vision/large_scale_classification.md
+++ b/docs/tutorials/vision/large_scale_classification.md
@@ -1,3 +1,22 @@
+
+
# Large Scale Image Classification
Training a neural network with a large number of images presents several challenges. Even with the latest GPUs, it is not possible to train large networks on a large number of images in a reasonable amount of time with a single GPU. This problem can be somewhat mitigated by using multiple GPUs in a single machine. But there is a limit to the number of GPUs that can be attached to one machine (typically 8 or 16). This tutorial explains how to train large networks with terabytes of data using multiple machines, each containing multiple GPUs.
diff --git a/example/README.md b/example/README.md
index dea7e289e6cd..5594e3ed0f3d 100644
--- a/example/README.md
+++ b/example/README.md
@@ -1,3 +1,22 @@
+
+
# MXNet Examples
This page contains a curated list of awesome MXNet examples, tutorials and blogs. It is inspired by [awesome-php](/~https://github.com/ziadoz/awesome-php) and [awesome-machine-learning](/~https://github.com/josephmisiti/awesome-machine-learning). See also [Awesome-MXNet](/~https://github.com/chinakook/Awesome-MXNet) for a similar list.
diff --git a/example/adversary/README.md b/example/adversary/README.md
index 5d5b44fb91ba..7f5158c03bbf 100644
--- a/example/adversary/README.md
+++ b/example/adversary/README.md
@@ -1,3 +1,22 @@
+
+
# Adversarial examples
This example demonstrates the concept of "adversarial examples" from [1], showing how to fool a well-trained CNN.
diff --git a/example/autoencoder/README.md b/example/autoencoder/README.md
index 960636cd7d59..15f4851412af 100644
--- a/example/autoencoder/README.md
+++ b/example/autoencoder/README.md
@@ -1,3 +1,22 @@
+
+
# Example of a Convolutional Autoencoder
Autoencoder architectures are often used for unsupervised feature learning. This [link](http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders/) contains an introductory tutorial on autoencoders. This example illustrates a simple autoencoder using a stack of convolutional layers for both the encoder and the decoder.
diff --git a/example/autoencoder/variational_autoencoder/README.md b/example/autoencoder/variational_autoencoder/README.md
index c6e68d54c4f9..07d099848b46 100644
--- a/example/autoencoder/variational_autoencoder/README.md
+++ b/example/autoencoder/variational_autoencoder/README.md
@@ -1,3 +1,22 @@
+
+
Variational Autoencoder (VAE)
=============================
diff --git a/example/bayesian-methods/README.md b/example/bayesian-methods/README.md
index fc35b94219d7..56e4b1dd92eb 100644
--- a/example/bayesian-methods/README.md
+++ b/example/bayesian-methods/README.md
@@ -1,3 +1,22 @@
+
+
Bayesian Methods
================
diff --git a/example/bi-lstm-sort/README.md b/example/bi-lstm-sort/README.md
index f00cc85caa30..e08db18a9bcf 100644
--- a/example/bi-lstm-sort/README.md
+++ b/example/bi-lstm-sort/README.md
@@ -1,3 +1,22 @@
+
+
# Bidirectional LSTM to sort an array
This is an example of using a bidirectional LSTM to sort an array. Please refer to the notebook.
diff --git a/example/caffe/README.md b/example/caffe/README.md
index 466305cc9b88..6541d4dacc04 100644
--- a/example/caffe/README.md
+++ b/example/caffe/README.md
@@ -1,3 +1,22 @@
+
+
# How to use Caffe operator in MXNet
[Caffe](http://caffe.berkeleyvision.org/) is a well-known and widely used deep learning framework. MXNet now supports calling most Caffe operators (layers) and loss functions directly in its symbolic graph! Using your own customized Caffe layer is also effortless.
diff --git a/example/capsnet/README.md b/example/capsnet/README.md
index 500c7df72515..4c05781a3971 100644
--- a/example/capsnet/README.md
+++ b/example/capsnet/README.md
@@ -1,3 +1,22 @@
+
+
**CapsNet-MXNet**
=========================================
diff --git a/example/captcha/README.md b/example/captcha/README.md
index cc97442f6207..3936178ebd3e 100644
--- a/example/captcha/README.md
+++ b/example/captcha/README.md
@@ -1,3 +1,22 @@
+
+
This is the R version of the [captcha recognition](http://blog.xlvector.net/2016-05/mxnet-ocr-cnn/) example by xlvector, and it can be used as an example of multi-label training. For the captcha below, we consider it as an image with 4 labels and train a CNN over the data set.
![](captcha_example.png)
diff --git a/example/cnn_chinese_text_classification/README.md b/example/cnn_chinese_text_classification/README.md
index bfb271dd5c45..5bdaecb8c155 100644
--- a/example/cnn_chinese_text_classification/README.md
+++ b/example/cnn_chinese_text_classification/README.md
@@ -1,3 +1,22 @@
+
+
Implementing CNN + Highway Network for Chinese Text Classification in MXNet
============
Sentiment classification forked from [incubator-mxnet/cnn_text_classification/](/~https://github.com/apache/incubator-mxnet/tree/master/example/cnn_text_classification), with the [Highway Networks](https://arxiv.org/pdf/1505.00387.pdf) architecture added. The final model is a CNN + Highway Network structure, and this version achieves a best dev accuracy of 94.75% on the Chinese corpus.
diff --git a/example/cnn_text_classification/README.md b/example/cnn_text_classification/README.md
index 2f1991f319ac..c04a22cde103 100644
--- a/example/cnn_text_classification/README.md
+++ b/example/cnn_text_classification/README.md
@@ -1,3 +1,22 @@
+
+
Implementing CNN for Text Classification in MXNet
============
It is a slightly simplified implementation of Kim's [Convolutional Neural Networks for Sentence Classification](http://arxiv.org/abs/1408.5882) paper in MXNet.
diff --git a/example/ctc/README.md b/example/ctc/README.md
index a2f54cffaf86..96373db002d8 100644
--- a/example/ctc/README.md
+++ b/example/ctc/README.md
@@ -1,3 +1,22 @@
+
+
# Connectionist Temporal Classification
[Connectionist Temporal Classification](https://www.cs.toronto.edu/~graves/icml_2006.pdf) (CTC) is a cost function that is used to train Recurrent Neural Networks (RNNs) to label unsegmented input sequence data in supervised learning. For example, in a speech recognition application, using a typical cross-entropy loss, the input signal needs to be segmented into words or sub-words. However, using CTC loss, a single unaligned label sequence per input sequence is sufficient for the network to learn both the alignment and labeling. Baidu's warp-ctc page contains a more detailed [introduction to CTC loss](/~https://github.com/baidu-research/warp-ctc#introduction).
diff --git a/example/deep-embedded-clustering/README.md b/example/deep-embedded-clustering/README.md
index 3972f90bda4a..4e626a6bac81 100644
--- a/example/deep-embedded-clustering/README.md
+++ b/example/deep-embedded-clustering/README.md
@@ -1,3 +1,22 @@
+
+
# DEC Implementation
This is based on the paper `Unsupervised deep embedding for clustering analysis` by Junyuan Xie, Ross Girshick, and Ali Farhadi.
diff --git a/example/distributed_training/README.md b/example/distributed_training/README.md
index b0b0447725b5..ba09cb91aede 100644
--- a/example/distributed_training/README.md
+++ b/example/distributed_training/README.md
@@ -1,3 +1,22 @@
+
+
# Distributed Training using Gluon
Deep learning models are usually trained using GPUs because GPUs can do many more computations in parallel than CPUs. But even with modern GPUs, training big models can take several days. Training can be done faster by using multiple GPUs, as described in [this](https://gluon.mxnet.io/chapter07_distributed-learning/multiple-gpus-gluon.html) tutorial. However, only a certain number of GPUs can be attached to one host (typically 8 or 16). To make training even faster, we can use multiple GPUs attached to multiple hosts.
diff --git a/example/dsd/README.md b/example/dsd/README.md
index 0ce5cc5d1f0f..bffacce8cabc 100644
--- a/example/dsd/README.md
+++ b/example/dsd/README.md
@@ -1,3 +1,22 @@
+
+
DSD Training
============
This folder contains an optimizer class that implements DSD training coupled with SGD. The training
diff --git a/example/fcn-xs/README.md b/example/fcn-xs/README.md
index 49c57fc08eaf..98d8edab6f93 100644
--- a/example/fcn-xs/README.md
+++ b/example/fcn-xs/README.md
@@ -1,3 +1,22 @@
+
+
FCN-xs EXAMPLE
--------------
This folder contains an example implementation for Fully Convolutional Networks (FCN) in MXNet.
diff --git a/example/gluon/audio/urban_sounds/README.md b/example/gluon/audio/urban_sounds/README.md
index c85d29db2e5a..18593272f144 100644
--- a/example/gluon/audio/urban_sounds/README.md
+++ b/example/gluon/audio/urban_sounds/README.md
@@ -1,3 +1,22 @@
+
+
# Urban Sounds Classification in MXNet Gluon
This example provides an end-to-end pipeline for the Urban Sounds Classification task, a common data-hack competition.
diff --git a/example/gluon/dc_gan/README.md b/example/gluon/dc_gan/README.md
index 5aacd78a3ed5..1d91d91a3452 100644
--- a/example/gluon/dc_gan/README.md
+++ b/example/gluon/dc_gan/README.md
@@ -1,3 +1,22 @@
+
+
# DCGAN in MXNet
A [Deep Convolutional Generative Adversarial Networks (DCGAN)](https://arxiv.org/abs/1511.06434) implementation with Apache MXNet Gluon.
diff --git a/example/gluon/embedding_learning/README.md b/example/gluon/embedding_learning/README.md
index e7821619a381..de11ea0db072 100644
--- a/example/gluon/embedding_learning/README.md
+++ b/example/gluon/embedding_learning/README.md
@@ -1,3 +1,22 @@
+
+
# Image Embedding Learning
This example implements embedding learning based on a Margin-based Loss with distance weighted sampling [(Wu et al, 2017)](http://www.philkr.net/papers/2017-10-01-iccv/2017-10-01-iccv.pdf). The model obtains a validation Recall@1 of ~64% on the [Caltech-UCSD Birds-200-2011](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) dataset.
diff --git a/example/gluon/sn_gan/README.md b/example/gluon/sn_gan/README.md
index 5b2a750e4efb..60327bc959a6 100644
--- a/example/gluon/sn_gan/README.md
+++ b/example/gluon/sn_gan/README.md
@@ -1,3 +1,22 @@
+
+
# Spectral Normalization GAN
This example implements [Spectral Normalization for Generative Adversarial Networks](https://arxiv.org/abs/1802.05957) based on [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset.
diff --git a/example/gluon/style_transfer/README.md b/example/gluon/style_transfer/README.md
index ef273a5975ab..1162e7241053 100644
--- a/example/gluon/style_transfer/README.md
+++ b/example/gluon/style_transfer/README.md
@@ -1,3 +1,22 @@
+
+
# MXNet-Gluon-Style-Transfer
This repo provides an MXNet implementation of **[Neural Style Transfer](#neural-style)** and **[MSG-Net](#real-time-style-transfer)**.
diff --git a/example/gluon/tree_lstm/README.md b/example/gluon/tree_lstm/README.md
index e14ab4c70afc..93d6289f42af 100644
--- a/example/gluon/tree_lstm/README.md
+++ b/example/gluon/tree_lstm/README.md
@@ -1,3 +1,22 @@
+
+
# Tree-Structured Long Short-Term Memory Networks
This is an [MXNet Gluon](https://mxnet.io/) implementation of Tree-LSTM, as described in the paper [Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks](http://arxiv.org/abs/1503.00075) by Kai Sheng Tai, Richard Socher, and Christopher Manning.
diff --git a/example/gluon/word_language_model/README.md b/example/gluon/word_language_model/README.md
index 4a77950d01bc..7c9b74ad3379 100644
--- a/example/gluon/word_language_model/README.md
+++ b/example/gluon/word_language_model/README.md
@@ -1,3 +1,22 @@
+
+
# Word-level language modeling RNN
This example trains a multi-layer RNN (Elman, GRU, or LSTM) on the WikiText-2 language modeling benchmark.
diff --git a/example/kaggle-ndsb1/README.md b/example/kaggle-ndsb1/README.md
index 82a99f7b6947..54d33b8b1df2 100644
--- a/example/kaggle-ndsb1/README.md
+++ b/example/kaggle-ndsb1/README.md
@@ -1,3 +1,22 @@
+
+
Tutorial for Kaggle NDSB-1
-----
diff --git a/example/kaggle-ndsb2/README.md b/example/kaggle-ndsb2/README.md
index 302e54033c8a..d1e29d4f86bb 100644
--- a/example/kaggle-ndsb2/README.md
+++ b/example/kaggle-ndsb2/README.md
@@ -1,3 +1,22 @@
+
+
# End-to-End Deep Learning Tutorial for Kaggle NDSB-II
In this example, we will demonstrate how to use MXNet to build an end-to-end deep learning system to help diagnose heart disease. The demo network achieves a 0.039222 CRPS on the validation set, which was good enough for a Top-10 placing (as of Dec 22nd, 2015).
diff --git a/example/model-parallel/matrix_factorization/README.md b/example/model-parallel/matrix_factorization/README.md
index 00507d924f81..ee5384565d45 100644
--- a/example/model-parallel/matrix_factorization/README.md
+++ b/example/model-parallel/matrix_factorization/README.md
@@ -1,3 +1,22 @@
+
+
Model Parallel Matrix Factorization
===================================
diff --git a/example/module/README.md b/example/module/README.md
index 99dd756ead63..8dd3d6d87c7c 100644
--- a/example/module/README.md
+++ b/example/module/README.md
@@ -1,3 +1,22 @@
+
+
# Module Usage Example
This folder contains usage examples for MXNet module.
diff --git a/example/multi-task/README.md b/example/multi-task/README.md
index b7756fe378a7..4ca744189221 100644
--- a/example/multi-task/README.md
+++ b/example/multi-task/README.md
@@ -1,3 +1,22 @@
+
+
# Multi-task learning example
This is a simple example showing how to use MXNet for multi-task learning. It uses MNIST as an example, jointly predicting the digit and whether the digit is odd or even.
diff --git a/example/multivariate_time_series/README.md b/example/multivariate_time_series/README.md
index 87baca36d35f..6c9e1a451f10 100644
--- a/example/multivariate_time_series/README.md
+++ b/example/multivariate_time_series/README.md
@@ -1,3 +1,22 @@
+
+
# LSTNet
- This repo contains an MXNet implementation of [this](https://arxiv.org/pdf/1703.07015.pdf) state-of-the-art time series forecasting model.
diff --git a/example/named_entity_recognition/README.md b/example/named_entity_recognition/README.md
index c914a6985dfe..897ea4221b36 100644
--- a/example/named_entity_recognition/README.md
+++ b/example/named_entity_recognition/README.md
@@ -1,3 +1,22 @@
+
+
## Goal
- This repo contains an MXNet implementation of this state-of-the-art [entity recognition model](https://www.aclweb.org/anthology/Q16-1026).
diff --git a/example/nce-loss/README.md b/example/nce-loss/README.md
index 56e43525a7ca..4a847e420732 100644
--- a/example/nce-loss/README.md
+++ b/example/nce-loss/README.md
@@ -1,3 +1,22 @@
+
+
# Examples of NCE Loss
[Noise-contrastive estimation](http://proceedings.mlr.press/v9/gutmann10a/gutmann10a.pdf) loss (NCE loss) is used to speed up multi-class classification when the number of classes is very large.
diff --git a/example/neural-style/README.md b/example/neural-style/README.md
index 5c4b58924827..0f9545cf65b6 100644
--- a/example/neural-style/README.md
+++ b/example/neural-style/README.md
@@ -1,3 +1,22 @@
+
+
# Neural art
This is an implementation of the paper
diff --git a/example/neural-style/end_to_end/README.md b/example/neural-style/end_to_end/README.md
index 4a228c199bb7..0c38f427c300 100644
--- a/example/neural-style/end_to_end/README.md
+++ b/example/neural-style/end_to_end/README.md
@@ -1,3 +1,22 @@
+
+
# End to End Neural Art
Please refer to this [blog](http://dmlc.ml/mxnet/2016/06/20/end-to-end-neural-style.html) for details of how it is implemented.
diff --git a/example/numpy-ops/README.md b/example/numpy-ops/README.md
index aa4911f67414..a1ed1a5573e4 100644
--- a/example/numpy-ops/README.md
+++ b/example/numpy-ops/README.md
@@ -1,3 +1,22 @@
+
+
# Training with Custom Operators in Python
These examples demonstrate custom operator implementations in python.
diff --git a/example/profiler/README.md b/example/profiler/README.md
index 1b9279ccf227..14668fdbe0e5 100644
--- a/example/profiler/README.md
+++ b/example/profiler/README.md
@@ -1,3 +1,22 @@
+
+
# MXNet Profiler Examples
This folder contains examples of using the MXNet profiler to generate profiling results in JSON files.
diff --git a/example/rcnn/README.md b/example/rcnn/README.md
index 5e6127ccb08d..737ef4418914 100644
--- a/example/rcnn/README.md
+++ b/example/rcnn/README.md
@@ -1,3 +1,22 @@
+
+
# Faster R-CNN in MXNet
Please redirect any issue or question of using this symbolic example of Faster R-CNN to /~https://github.com/ijkguo/mx-rcnn.
diff --git a/example/recommenders/README.md b/example/recommenders/README.md
index 628182c849b8..2534074268e8 100644
--- a/example/recommenders/README.md
+++ b/example/recommenders/README.md
@@ -1,3 +1,22 @@
+
+
# Recommender Systems
diff --git a/example/reinforcement-learning/a3c/README.md b/example/reinforcement-learning/a3c/README.md
index 5eaba66a5b86..1a8b4f3ba7a1 100644
--- a/example/reinforcement-learning/a3c/README.md
+++ b/example/reinforcement-learning/a3c/README.md
@@ -1,3 +1,22 @@
+
+
# A3C Implementation
This is an attempt to implement the A3C algorithm from the paper *Asynchronous Methods for Deep Reinforcement Learning*.
diff --git a/example/reinforcement-learning/ddpg/README.md b/example/reinforcement-learning/ddpg/README.md
index 2e299dd5daa3..c3040434823a 100644
--- a/example/reinforcement-learning/ddpg/README.md
+++ b/example/reinforcement-learning/ddpg/README.md
@@ -1,3 +1,22 @@
+
+
# mx-DDPG
MXNet Implementation of DDPG
diff --git a/example/reinforcement-learning/parallel_actor_critic/README.md b/example/reinforcement-learning/parallel_actor_critic/README.md
index d3288492a611..767c67a7d72b 100644
--- a/example/reinforcement-learning/parallel_actor_critic/README.md
+++ b/example/reinforcement-learning/parallel_actor_critic/README.md
@@ -1,3 +1,22 @@
+
+
# 'Parallel Advantage-Actor Critic' Implementation
This repo contains an MXNet implementation of a variant of the A3C algorithm from [Asynchronous Methods for Deep Reinforcement Learning](https://arxiv.org/pdf/1602.01783v2.pdf).
diff --git a/example/restricted-boltzmann-machine/README.md b/example/restricted-boltzmann-machine/README.md
index a8769a51e05a..30ee8cfb9c3d 100644
--- a/example/restricted-boltzmann-machine/README.md
+++ b/example/restricted-boltzmann-machine/README.md
@@ -1,3 +1,22 @@
+
+
# Restricted Boltzmann machine (RBM)
An example of the binary RBM [1] learning the MNIST data. The RBM is implemented as a custom operator, and a gluon block is also provided. `binary_rbm.py` contains the implementation of the RBM. `binary_rbm_module.py` and `binary_rbm_gluon.py` train the MNIST data using the module interface and the gluon interface respectively. The MNIST data is downloaded automatically.
diff --git a/example/rnn/README.md b/example/rnn/README.md
index 1d1df6ed7687..a819dfa1097b 100644
--- a/example/rnn/README.md
+++ b/example/rnn/README.md
@@ -1,3 +1,22 @@
+
+
Recurrent Neural Network Examples
===========
diff --git a/example/rnn/bucketing/README.md b/example/rnn/bucketing/README.md
index 7b7883d79ad1..e06d9f64da27 100644
--- a/example/rnn/bucketing/README.md
+++ b/example/rnn/bucketing/README.md
@@ -1,3 +1,22 @@
+
+
RNN Example
===========
This folder contains RNN examples using the high-level mxnet.rnn interface.
diff --git a/example/rnn/old/README.md b/example/rnn/old/README.md
index 5d73523dd964..806f50b834de 100644
--- a/example/rnn/old/README.md
+++ b/example/rnn/old/README.md
@@ -1,3 +1,22 @@
+
+
RNN Example
===========
This folder contains RNN examples using the low-level symbol interface.
diff --git a/example/rnn/word_lm/README.md b/example/rnn/word_lm/README.md
index ab0a8d704b9c..dbb7832e15dc 100644
--- a/example/rnn/word_lm/README.md
+++ b/example/rnn/word_lm/README.md
@@ -1,3 +1,22 @@
+
+
Word Level Language Modeling
===========
This example trains a multi-layer LSTM on the Sherlock Holmes language modeling benchmark.
diff --git a/example/sparse/factorization_machine/README.md b/example/sparse/factorization_machine/README.md
index 32b956ed0201..42b93741fedb 100644
--- a/example/sparse/factorization_machine/README.md
+++ b/example/sparse/factorization_machine/README.md
@@ -1,3 +1,22 @@
+
+
Factorization Machine
===========
This example trains a factorization machine model using the criteo dataset.
diff --git a/example/sparse/linear_classification/README.md b/example/sparse/linear_classification/README.md
index 926d9234269d..38fde454f1b2 100644
--- a/example/sparse/linear_classification/README.md
+++ b/example/sparse/linear_classification/README.md
@@ -1,3 +1,22 @@
+
+
Linear Classification Using Sparse Matrix Multiplication
===========
This example trains a linear model using the sparse features in MXNet. It is for demonstration purposes only.
diff --git a/example/sparse/matrix_factorization/README.md b/example/sparse/matrix_factorization/README.md
index ddbf662c858f..d11219771012 100644
--- a/example/sparse/matrix_factorization/README.md
+++ b/example/sparse/matrix_factorization/README.md
@@ -1,3 +1,22 @@
+
+
Matrix Factorization w/ Sparse Embedding
===========
This example demonstrates the basic usage of the sparse.Embedding operator in MXNet, adapted from @leopd's recommender examples.
diff --git a/example/sparse/wide_deep/README.md b/example/sparse/wide_deep/README.md
index 3df5e420ee36..80924274e032 100644
--- a/example/sparse/wide_deep/README.md
+++ b/example/sparse/wide_deep/README.md
@@ -1,3 +1,22 @@
+
+
## Wide and Deep Learning
This example demonstrates how to train a [wide and deep model](https://arxiv.org/abs/1606.07792). The [Census Income Data Set](https://archive.ics.uci.edu/ml/datasets/Census+Income) that this example uses for training is hosted by the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/). The feature engineering tricks are adapted from TensorFlow's [wide and deep tutorial](/~https://github.com/tensorflow/models/tree/master/official/wide_deep).
diff --git a/example/speech_recognition/README.md b/example/speech_recognition/README.md
index 6f01911e1300..3dad016a48d8 100644
--- a/example/speech_recognition/README.md
+++ b/example/speech_recognition/README.md
@@ -1,3 +1,22 @@
+
+
**deepSpeech.mxnet: Rich Speech Example**
=========================================
diff --git a/example/ssd/README.md b/example/ssd/README.md
index 713a9ea33c1b..753c967aeaf1 100644
--- a/example/ssd/README.md
+++ b/example/ssd/README.md
@@ -1,3 +1,22 @@
+
+
# SSD: Single Shot MultiBox Object Detector
SSD is a unified framework for object detection with a single network.
diff --git a/example/ssd/dataset/pycocotools/README.md b/example/ssd/dataset/pycocotools/README.md
old mode 100755
new mode 100644
index d358f53105da..0e10d4080aac
--- a/example/ssd/dataset/pycocotools/README.md
+++ b/example/ssd/dataset/pycocotools/README.md
@@ -1,2 +1,21 @@
+
+
This is a modified version of /~https://github.com/pdollar/coco python API.
No `make` is required, but this will not support mask functions.
diff --git a/example/ssd/model/README.md b/example/ssd/model/README.md
index e5bac52f5a83..0ee70c3d1fff 100644
--- a/example/ssd/model/README.md
+++ b/example/ssd/model/README.md
@@ -1 +1,20 @@
+
+
#### This is the default directory to store all the models, including `*.params` and `*.json`
diff --git a/example/ssd/symbol/README.md b/example/ssd/symbol/README.md
index 8fee31985a0d..67f64f584f8b 100644
--- a/example/ssd/symbol/README.md
+++ b/example/ssd/symbol/README.md
@@ -1,3 +1,22 @@
+
+
## How to compose SSD network on top of mainstream classification networks
1. Have the base network ready in this directory as `name.py`, such as `inceptionv3.py`.
diff --git a/example/ssd/tools/caffe_converter/README.md b/example/ssd/tools/caffe_converter/README.md
index 2e74fc56e022..9eb59812e207 100644
--- a/example/ssd/tools/caffe_converter/README.md
+++ b/example/ssd/tools/caffe_converter/README.md
@@ -1,3 +1,22 @@
+
+
# Convert Caffe Model to MXNet Format
This folder contains the source code for this tool.
diff --git a/example/stochastic-depth/README.md b/example/stochastic-depth/README.md
index 08c466eb8b0e..a5dfcd9c8e42 100644
--- a/example/stochastic-depth/README.md
+++ b/example/stochastic-depth/README.md
@@ -1,3 +1,22 @@
+
+
Stochastic Depth
================
diff --git a/example/svm_mnist/README.md b/example/svm_mnist/README.md
index 408f5108b44a..ffcf5c38bbd7 100644
--- a/example/svm_mnist/README.md
+++ b/example/svm_mnist/README.md
@@ -1,3 +1,22 @@
+
+
# Use case with Support Vector Machine
To verify that the implementation not only learns but can also outsmart the softmax, as [this article](https://arxiv.org/pdf/1306.0239.pdf) suggests, I ran the svm_mnist.py script. It is based on the MNIST experiment described in the article and on [this tutorial](/~https://github.com/dmlc/mxnet-gtc-tutorial/blob/master/tutorial.ipynb).
diff --git a/example/svrg_module/README.md b/example/svrg_module/README.md
index 250995a57152..624b50d60f12 100644
--- a/example/svrg_module/README.md
+++ b/example/svrg_module/README.md
@@ -1,3 +1,22 @@
+
+
## SVRGModule Example
SVRGModule is an extension to the Module API that implements SVRG optimization, which stands for Stochastic
diff --git a/example/vae-gan/README.md b/example/vae-gan/README.md
index 469668b9b374..5d7ae272b32b 100644
--- a/example/vae-gan/README.md
+++ b/example/vae-gan/README.md
@@ -1,3 +1,22 @@
+
+
# VAE-GAN in MXNet
* Implementation of [Autoencoding beyond pixels using a learned similarity metric](https://arxiv.org/abs/1512.09300),
diff --git a/julia/LICENSE.md b/julia/LICENSE.md
index 5ecf95ac60bc..4161166b7280 100644
--- a/julia/LICENSE.md
+++ b/julia/LICENSE.md
@@ -1,3 +1,22 @@
+
+
The MXNet.jl package is licensed under version 2.0 of the Apache License:
> Copyright (c) 2015-2018:
diff --git a/julia/NEWS.md b/julia/NEWS.md
index 3da119496fac..1c8cda7176c6 100644
--- a/julia/NEWS.md
+++ b/julia/NEWS.md
@@ -1,3 +1,22 @@
+
+
# v1.5.0 (#TBD)
* Following material from `mx` module got exported (#TBD):
diff --git a/julia/README-DEV.md b/julia/README-DEV.md
index a1d6fa9012fc..9edb8309c0c8 100644
--- a/julia/README-DEV.md
+++ b/julia/README-DEV.md
@@ -1,3 +1,22 @@
+
+
# Workflow for making a release
1. Update `NEWS.md` to list important changes
diff --git a/julia/README.md b/julia/README.md
index 91a3981464be..97a52ae9a4e6 100644
--- a/julia/README.md
+++ b/julia/README.md
@@ -1,3 +1,22 @@
+
+
# MXNet
[![MXNet](http://pkg.julialang.org/badges/MXNet_0.6.svg)](http://pkg.julialang.org/?pkg=MXNet)
diff --git a/julia/docs/src/api.md b/julia/docs/src/api.md
index 4984129863d0..92c04581c7e8 100644
--- a/julia/docs/src/api.md
+++ b/julia/docs/src/api.md
@@ -1,3 +1,22 @@
+
+
# API Documentation
```@contents
diff --git a/julia/docs/src/api/callback.md b/julia/docs/src/api/callback.md
index f67811cc41fe..327014d7dbfc 100644
--- a/julia/docs/src/api/callback.md
+++ b/julia/docs/src/api/callback.md
@@ -1,3 +1,22 @@
+
+
# Callback in training
```@autodocs
diff --git a/julia/docs/src/api/context.md b/julia/docs/src/api/context.md
index 93ccf83e51ba..e82504663b06 100644
--- a/julia/docs/src/api/context.md
+++ b/julia/docs/src/api/context.md
@@ -1,3 +1,22 @@
+
+
# Context
```@autodocs
diff --git a/julia/docs/src/api/executor.md b/julia/docs/src/api/executor.md
index b560c7a0864d..81ce7fa17ef7 100644
--- a/julia/docs/src/api/executor.md
+++ b/julia/docs/src/api/executor.md
@@ -1,3 +1,22 @@
+
+
# Executor
```@autodocs
diff --git a/julia/docs/src/api/initializer.md b/julia/docs/src/api/initializer.md
index d0aad2def4cd..8f55b6e1da51 100644
--- a/julia/docs/src/api/initializer.md
+++ b/julia/docs/src/api/initializer.md
@@ -1,3 +1,22 @@
+
+
# Initializer
```@autodocs
diff --git a/julia/docs/src/api/io.md b/julia/docs/src/api/io.md
index 7312259dbf3c..bd53d2ca667e 100644
--- a/julia/docs/src/api/io.md
+++ b/julia/docs/src/api/io.md
@@ -1,3 +1,22 @@
+
+
# Data Providers
Data providers are wrappers that load external data, be it images, text, or general tensors,
diff --git a/julia/docs/src/api/kvstore.md b/julia/docs/src/api/kvstore.md
index 34a5027f85fb..a8407b54bd34 100644
--- a/julia/docs/src/api/kvstore.md
+++ b/julia/docs/src/api/kvstore.md
@@ -1,3 +1,22 @@
+
+
# Key-Value Store
```@autodocs
diff --git a/julia/docs/src/api/metric.md b/julia/docs/src/api/metric.md
index 63cca0cc41ba..a31bc92e49e6 100644
--- a/julia/docs/src/api/metric.md
+++ b/julia/docs/src/api/metric.md
@@ -1,3 +1,22 @@
+
+
# Evaluation Metrics
Evaluation metrics provide a way to evaluate the performance of a learned model.
diff --git a/julia/docs/src/api/model.md b/julia/docs/src/api/model.md
index f793c7c406c7..8240aeb022bb 100644
--- a/julia/docs/src/api/model.md
+++ b/julia/docs/src/api/model.md
@@ -1,3 +1,22 @@
+
+
# Model
The model API provides a convenient high-level interface to do training and predicting on
diff --git a/julia/docs/src/api/ndarray.md b/julia/docs/src/api/ndarray.md
index 5877d8257758..41161d547881 100644
--- a/julia/docs/src/api/ndarray.md
+++ b/julia/docs/src/api/ndarray.md
@@ -1,3 +1,22 @@
+
+
# NDArray API
## Arithmetic Operations
diff --git a/julia/docs/src/api/nn-factory.md b/julia/docs/src/api/nn-factory.md
index 833d9a3efd53..d8106eff158f 100644
--- a/julia/docs/src/api/nn-factory.md
+++ b/julia/docs/src/api/nn-factory.md
@@ -1,3 +1,22 @@
+
+
# Neural Network Factory
The neural network factory provides convenient helper functions to define
diff --git a/julia/docs/src/api/symbolic-node.md b/julia/docs/src/api/symbolic-node.md
index ef731d9f7d00..c23b67602627 100644
--- a/julia/docs/src/api/symbolic-node.md
+++ b/julia/docs/src/api/symbolic-node.md
@@ -1,3 +1,22 @@
+
+
# Symbolic API
```@autodocs
diff --git a/julia/docs/src/api/visualize.md b/julia/docs/src/api/visualize.md
index 429a927012e4..843922420a30 100644
--- a/julia/docs/src/api/visualize.md
+++ b/julia/docs/src/api/visualize.md
@@ -1,3 +1,22 @@
+
+
# Network Visualization
```@autodocs
diff --git a/julia/docs/src/index.md b/julia/docs/src/index.md
index b6a51fc162ad..8274c712e549 100644
--- a/julia/docs/src/index.md
+++ b/julia/docs/src/index.md
@@ -1,3 +1,22 @@
+
+
# MXNet Documentation
[MXNet.jl](/~https://github.com/dmlc/MXNet.jl) is the
diff --git a/julia/docs/src/tutorial/char-lstm.md b/julia/docs/src/tutorial/char-lstm.md
index 369bcddd53e9..53a371e028fe 100644
--- a/julia/docs/src/tutorial/char-lstm.md
+++ b/julia/docs/src/tutorial/char-lstm.md
@@ -1,3 +1,22 @@
+
+
Generating Random Sentence with LSTM RNN
========================================
diff --git a/julia/docs/src/tutorial/mnist.md b/julia/docs/src/tutorial/mnist.md
index 916e46deb853..96423266db1b 100644
--- a/julia/docs/src/tutorial/mnist.md
+++ b/julia/docs/src/tutorial/mnist.md
@@ -1,3 +1,22 @@
+
+
Digit Recognition on MNIST
==========================
diff --git a/julia/docs/src/user-guide/faq.md b/julia/docs/src/user-guide/faq.md
index 8fd8a6b34551..d288ad0b3b8e 100644
--- a/julia/docs/src/user-guide/faq.md
+++ b/julia/docs/src/user-guide/faq.md
@@ -1,3 +1,22 @@
+
+
FAQ
===
diff --git a/julia/docs/src/user-guide/install.md b/julia/docs/src/user-guide/install.md
index f1d5eeefacfe..52628de7a255 100644
--- a/julia/docs/src/user-guide/install.md
+++ b/julia/docs/src/user-guide/install.md
@@ -1,3 +1,22 @@
+
+
Installation Guide
==================
diff --git a/julia/docs/src/user-guide/overview.md b/julia/docs/src/user-guide/overview.md
index 5815bc6d772c..f0189f7bcd2e 100644
--- a/julia/docs/src/user-guide/overview.md
+++ b/julia/docs/src/user-guide/overview.md
@@ -1,3 +1,22 @@
+
+
# Overview
## MXNet.jl Namespace
diff --git a/julia/examples/char-lstm/README.md b/julia/examples/char-lstm/README.md
index ff16ee0a3ae9..ac77e15b131f 100644
--- a/julia/examples/char-lstm/README.md
+++ b/julia/examples/char-lstm/README.md
@@ -1,3 +1,22 @@
+
+
# LSTM char-rnn
Because we explicitly unroll the LSTM/RNN over time for a fixed sequence length,
diff --git a/julia/plugins/README.md b/julia/plugins/README.md
index 38882889f494..c5ca926ca0ac 100644
--- a/julia/plugins/README.md
+++ b/julia/plugins/README.md
@@ -1,3 +1,22 @@
+
+
# Plugins of MXNet.jl
This directory contains *plugins* of MXNet.jl. A plugin is typically a component that could be part of MXNet.jl but is excluded from the `mx` namespace. The plugins are included here primarily for two reasons:
diff --git a/matlab/README.md b/matlab/README.md
index 939b7011a4f2..13a83922d915 100644
--- a/matlab/README.md
+++ b/matlab/README.md
@@ -1,3 +1,22 @@
+
+
# MATLAB binding for MXNet
### How to use
diff --git a/perl-package/AI-MXNet/examples/gluon/style_transfer/README.md b/perl-package/AI-MXNet/examples/gluon/style_transfer/README.md
index 658a77530a92..3da1f97b517c 100644
--- a/perl-package/AI-MXNet/examples/gluon/style_transfer/README.md
+++ b/perl-package/AI-MXNet/examples/gluon/style_transfer/README.md
@@ -1,3 +1,22 @@
+
+
This directory provides an AI::MXNet implementation of MSG-Net real-time style transfer (https://arxiv.org/abs/1703.06953).
### Stylize Images Using Pre-trained MSG-Net
diff --git a/perl-package/AI-MXNet/examples/sparse/matrix_factorization/README.md b/perl-package/AI-MXNet/examples/sparse/matrix_factorization/README.md
index debad272206f..e4c89db760f4 100644
--- a/perl-package/AI-MXNet/examples/sparse/matrix_factorization/README.md
+++ b/perl-package/AI-MXNet/examples/sparse/matrix_factorization/README.md
@@ -1,3 +1,22 @@
+
+
Matrix Factorization w/ Sparse Embedding
===========
This example demonstrates the basic usage of the SparseEmbedding operator in MXNet, adapted from @leopd's recommender examples.
diff --git a/perl-package/AI-MXNet/examples/sparse/wide_deep/README.md b/perl-package/AI-MXNet/examples/sparse/wide_deep/README.md
index 9a481d69edf2..fc3192623de4 100644
--- a/perl-package/AI-MXNet/examples/sparse/wide_deep/README.md
+++ b/perl-package/AI-MXNet/examples/sparse/wide_deep/README.md
@@ -1,3 +1,22 @@
+
+
## Wide and Deep Learning
This example demonstrates how to train a [wide and deep model](https://arxiv.org/abs/1606.07792). The [Census Income Data Set](https://archive.ics.uci.edu/ml/datasets/Census+Income) that this example uses for training is hosted by the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/). The feature engineering tricks are adapted from TensorFlow's [wide and deep tutorial](/~https://github.com/tensorflow/models/tree/master/official/wide_deep).
diff --git a/perl-package/README.md b/perl-package/README.md
index 93e34d1af37a..20ce7e635e6f 100644
--- a/perl-package/README.md
+++ b/perl-package/README.md
@@ -1,3 +1,22 @@
+
+
[Perl API](https://mxnet.incubator.apache.org/api/perl/index.html)
[![GitHub license](http://dmlc.github.io/img/apache2.svg)](../LICENSE)
diff --git a/plugin/caffe/README.md b/plugin/caffe/README.md
index 466305cc9b88..6541d4dacc04 100644
--- a/plugin/caffe/README.md
+++ b/plugin/caffe/README.md
@@ -1,3 +1,22 @@
+
+
# How to use Caffe operator in MXNet
[Caffe](http://caffe.berkeleyvision.org/) is a well-known and widely used deep learning framework. MXNet now supports calling most Caffe operators (layers) and loss functions directly in its symbolic graph! Using your own customized Caffe layers is also effortless.
diff --git a/python/README.md b/python/README.md
index 1ab7aa4464a3..4e180360f674 100644
--- a/python/README.md
+++ b/python/README.md
@@ -1,3 +1,22 @@
+
+
MXNet Python Package
====================
This directory and its nested files contain the MXNet Python package and language binding.
diff --git a/python/minpy/README.md b/python/minpy/README.md
deleted file mode 100644
index 4f028e3b21ad..000000000000
--- a/python/minpy/README.md
+++ /dev/null
@@ -1,4 +0,0 @@
-MXNet Python Package
-====================
-
-This is the WIP directory for MinPy project.
diff --git a/scala-package/README.md b/scala-package/README.md
index be0fc41a5fe4..7dd5f5ea0680 100644
--- a/scala-package/README.md
+++ b/scala-package/README.md
@@ -1,3 +1,22 @@
+
+
MXNet Package for Scala/Java
=====
diff --git a/scala-package/examples/src/main/java/org/apache/mxnetexamples/javaapi/infer/objectdetector/README.md b/scala-package/examples/src/main/java/org/apache/mxnetexamples/javaapi/infer/objectdetector/README.md
index 55741024d08b..b6c92c1204fa 100644
--- a/scala-package/examples/src/main/java/org/apache/mxnetexamples/javaapi/infer/objectdetector/README.md
+++ b/scala-package/examples/src/main/java/org/apache/mxnetexamples/javaapi/infer/objectdetector/README.md
@@ -1,3 +1,22 @@
+
+
# Single Shot Multi Object Detection using Java Inference API
In this example, you will learn how to use the Java Inference API to run inference on a pre-trained Single Shot Multi Object Detection (SSD) MXNet model.
diff --git a/scala-package/examples/src/main/java/org/apache/mxnetexamples/javaapi/infer/predictor/README.md b/scala-package/examples/src/main/java/org/apache/mxnetexamples/javaapi/infer/predictor/README.md
index 1f2c9e0e813c..cfad6a4e9a6e 100644
--- a/scala-package/examples/src/main/java/org/apache/mxnetexamples/javaapi/infer/predictor/README.md
+++ b/scala-package/examples/src/main/java/org/apache/mxnetexamples/javaapi/infer/predictor/README.md
@@ -1,3 +1,22 @@
+
+
# Image Classification using Java Predictor
In this example, you will learn how to use the Java Inference API to
diff --git a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/benchmark/README.md b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/benchmark/README.md
index 753cb3125410..efeab0a188cf 100644
--- a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/benchmark/README.md
+++ b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/benchmark/README.md
@@ -1,3 +1,22 @@
+
+
# Benchmarking Scala Inference APIs
This folder contains a base class [ScalaInferenceBenchmark](/~https://github.com/apache/incubator-mxnet/tree/master/scala-package/examples/src/main/scala/org/apache/mxnetexamples/benchmark/) and provides a mechanism for benchmarking [MXNet Inference APIs](/~https://github.com/apache/incubator-mxnet/tree/master/scala-package/infer) in Scala.
diff --git a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/cnntextclassification/README.md b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/cnntextclassification/README.md
index 5e3602e8ab15..ae2b68002e3c 100644
--- a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/cnntextclassification/README.md
+++ b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/cnntextclassification/README.md
@@ -1,3 +1,22 @@
+
+
# CNN Text Classification Example for Scala
This example shows CNN text classification using the Scala type-safe API.
It is for illustration only and is not tuned for the best accuracy.
diff --git a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/customop/README.md b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/customop/README.md
index 886fa2cc9d46..bf2429399e94 100644
--- a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/customop/README.md
+++ b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/customop/README.md
@@ -1,3 +1,22 @@
+
+
# Custom Operator Example for Scala
This example shows how to use a custom operator with the type-safe Scala API.
In the example, a `Softmax` operator is implemented to run the MNIST example.
diff --git a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/gan/README.md b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/gan/README.md
index 40db092727c4..fd477c56f54e 100644
--- a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/gan/README.md
+++ b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/gan/README.md
@@ -1,3 +1,22 @@
+
+
# GAN MNIST Example for Scala
This is the GAN MNIST training example implemented with the Scala type-safe API
diff --git a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/imclassification/README.md b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/imclassification/README.md
index cec750acdc92..55e065e1f493 100644
--- a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/imclassification/README.md
+++ b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/imclassification/README.md
@@ -1,3 +1,22 @@
+
+
# Image Classification Models
This example contains a number of image classification models that can be run on various datasets.
diff --git a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/infer/imageclassifier/README.md b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/infer/imageclassifier/README.md
index 541e0ce8dd31..5e8a51789300 100644
--- a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/infer/imageclassifier/README.md
+++ b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/infer/imageclassifier/README.md
@@ -1,3 +1,22 @@
+
+
# Image Classification
This folder contains an example for image classification with the [MXNet Scala Infer API](/~https://github.com/apache/incubator-mxnet/tree/master/scala-package/infer).
diff --git a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/infer/objectdetector/README.md b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/infer/objectdetector/README.md
index 77aec7bb5dee..e3190b2fbcbf 100644
--- a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/infer/objectdetector/README.md
+++ b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/infer/objectdetector/README.md
@@ -1,3 +1,22 @@
+
+
# Single Shot Multi Object Detection using Scala Inference API
In this example, you will learn how to use the Scala Inference API to run inference on a pre-trained Single Shot Multi Object Detection (SSD) MXNet model.
diff --git a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/neuralstyle/README.md b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/neuralstyle/README.md
index fe849343c9d7..2d39a9c67733 100644
--- a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/neuralstyle/README.md
+++ b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/neuralstyle/README.md
@@ -1,3 +1,22 @@
+
+
# Neural Style Example for Scala
## Introduction
diff --git a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/rnn/README.md b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/rnn/README.md
index 5289fc7b1b4e..dea2c12667e9 100644
--- a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/rnn/README.md
+++ b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/rnn/README.md
@@ -1,3 +1,22 @@
+
+
# RNN Example for MXNet Scala
This folder contains the following examples written in the new Scala type-safe API:
- [x] LSTM Bucketing
diff --git a/scala-package/memory-management.md b/scala-package/memory-management.md
index 33c36b6e6ab0..e9a0c71ded02 100644
--- a/scala-package/memory-management.md
+++ b/scala-package/memory-management.md
@@ -1,3 +1,22 @@
+
+
# JVM Memory Management
The Scala and Java bindings of Apache MXNet use native memory (memory from the C++ heap in either RAM or GPU memory) for most of the MXNet objects such as NDArray, Symbol, Executor, KVStore, Data Iterators, etc.
The associated Scala classes act only as wrappers. The operations done on these wrapper objects are then directed to the high-performance MXNet C++ backend via the Java Native Interface (JNI). Therefore, the bytes are stored in the C++ native heap, which allows for fast access.
diff --git a/scala-package/mxnet-demo/java-demo/README.md b/scala-package/mxnet-demo/java-demo/README.md
index 5dfbf14e8df2..3d5041c228f6 100644
--- a/scala-package/mxnet-demo/java-demo/README.md
+++ b/scala-package/mxnet-demo/java-demo/README.md
@@ -1,3 +1,22 @@
+
+
# MXNet Java Sample Project
This is a project created to use the Maven-published Scala/Java package with two Java examples.
## Setup
diff --git a/scala-package/mxnet-demo/scala-demo/README.md b/scala-package/mxnet-demo/scala-demo/README.md
index b994a196b1f6..8a5abf56894b 100644
--- a/scala-package/mxnet-demo/scala-demo/README.md
+++ b/scala-package/mxnet-demo/scala-demo/README.md
@@ -1,3 +1,22 @@
+
+
# MXNet Scala Sample Project
This is a project created to use the Maven-published Scala package with two Scala examples.
## Setup
diff --git a/scala-package/native/README.md b/scala-package/native/README.md
index c87b064fff02..17e913f1a4eb 100644
--- a/scala-package/native/README.md
+++ b/scala-package/native/README.md
@@ -1,3 +1,22 @@
+
+
# MXNet Scala JNI
MXNet Scala JNI is a thin wrapper layer over the underlying libmxnet.so.
diff --git a/scala-package/packageTest/README.md b/scala-package/packageTest/README.md
index e9980f353759..f14cdc09c180 100644
--- a/scala-package/packageTest/README.md
+++ b/scala-package/packageTest/README.md
@@ -1,3 +1,22 @@
+
+
# MXNet Scala Package Test
This is a project created to run the test suite on a fully packaged MXNet jar. The test suite is found locally, but MXNet comes from the target jar file.
diff --git a/scala-package/spark/README.md b/scala-package/spark/README.md
index 503c279038a5..79f637a3dfbf 100644
--- a/scala-package/spark/README.md
+++ b/scala-package/spark/README.md
@@ -1,3 +1,22 @@
+
+
Deep Learning on Spark
=====
diff --git a/tests/README.md b/tests/README.md
index e528edf2a9da..be09cadff3ca 100644
--- a/tests/README.md
+++ b/tests/README.md
@@ -1,3 +1,22 @@
+
+
# Testing MXNET
## Running CPP Tests
diff --git a/tests/nightly/README.md b/tests/nightly/README.md
old mode 100755
new mode 100644
index fa1771a7eeb0..5cf41eb2c3ef
--- a/tests/nightly/README.md
+++ b/tests/nightly/README.md
@@ -1,3 +1,22 @@
+
+
# Nightly Tests for MXNet
These are some longer-running tests that are scheduled to run every night.
diff --git a/tests/nightly/apache_rat_license_check/README.md b/tests/nightly/apache_rat_license_check/README.md
old mode 100755
new mode 100644
index e8578a857224..0d6d37dc7da8
--- a/tests/nightly/apache_rat_license_check/README.md
+++ b/tests/nightly/apache_rat_license_check/README.md
@@ -1,3 +1,22 @@
+
+
# Apache RAT License Check
This is a nightly test that runs the Apache Tool RAT to check the License Headers on all source files
diff --git a/tests/nightly/apache_rat_license_check/rat-excludes b/tests/nightly/apache_rat_license_check/rat-excludes
index 6e7ae8c1bfdc..b3f64c246e6d 100755
--- a/tests/nightly/apache_rat_license_check/rat-excludes
+++ b/tests/nightly/apache_rat_license_check/rat-excludes
@@ -5,18 +5,15 @@
.*html
.*json
.*txt
-.*md
3rdparty/*
R-package/*
trunk/*
-docker/*
.*\\.m
.*\\.mk
.*\\.R
.*svg
.*cfg
.*config
-docs/*
__init__.py
build/*
.*\\.t
@@ -38,7 +35,6 @@ erfinv-inl.h
im2col.cuh
im2col.h
pool.h
-README.rst
dataset.cPickle
image-classification/*
rat-excludes
@@ -51,3 +47,5 @@ Project.toml
include/*
.*.iml
.*.json.ref
+searchtools_custom.js
+theme.conf
diff --git a/tests/nightly/broken_link_checker_test/README.md b/tests/nightly/broken_link_checker_test/README.md
old mode 100755
new mode 100644
index c39abd0d6175..aaad68601798
--- a/tests/nightly/broken_link_checker_test/README.md
+++ b/tests/nightly/broken_link_checker_test/README.md
@@ -1,3 +1,22 @@
+
+
# Broken link checker test
This folder contains the scripts that are required to run the nightly job of checking the broken links.
diff --git a/tests/nightly/model_backwards_compatibility_check/README.md b/tests/nightly/model_backwards_compatibility_check/README.md
index 7a2116ac564e..af17396f0e0f 100644
--- a/tests/nightly/model_backwards_compatibility_check/README.md
+++ b/tests/nightly/model_backwards_compatibility_check/README.md
@@ -1,3 +1,22 @@
+
+
# Model Backwards Compatibility Tests
This folder contains the scripts that are required to run the nightly job of verifying the compatibility and inference results of models (trained on earlier versions of MXNet) when loaded on the latest release candidate. The tests flag if:
diff --git a/tests/nightly/straight_dope/README.md b/tests/nightly/straight_dope/README.md
old mode 100755
new mode 100644
index 65a615b58d7e..869d80afcdfa
--- a/tests/nightly/straight_dope/README.md
+++ b/tests/nightly/straight_dope/README.md
@@ -1,3 +1,22 @@
+
+
# Nightly Tests for MXNet: The Straight Dope
These are some longer-running tests that are scheduled to run every night.
diff --git a/tests/python/README.md b/tests/python/README.md
index 02dcb6ea6818..fd2282d67b9d 100644
--- a/tests/python/README.md
+++ b/tests/python/README.md
@@ -1,3 +1,22 @@
+
+
Python Test Case
================
This folder contains test cases for MXNet in Python.
diff --git a/tools/accnn/README.md b/tools/accnn/README.md
index 02f10d111e2d..ca6d735bba39 100644
--- a/tools/accnn/README.md
+++ b/tools/accnn/README.md
@@ -1,3 +1,22 @@
+
+
# Accelerate Convolutional Neural Networks
This tool aims to accelerate the test-time computation and decrease the number of parameters of deep CNNs.
diff --git a/tools/bandwidth/README.md b/tools/bandwidth/README.md
index f087af7fd147..f82e3218a19c 100644
--- a/tools/bandwidth/README.md
+++ b/tools/bandwidth/README.md
@@ -1,3 +1,22 @@
+
+
# Measure communication bandwidth
MXNet provides multiple ways to communicate data. The best choice depends on
diff --git a/tools/caffe_converter/README.md b/tools/caffe_converter/README.md
index d8ffc5cb83e5..b97b6e42ee5c 100644
--- a/tools/caffe_converter/README.md
+++ b/tools/caffe_converter/README.md
@@ -1,3 +1,22 @@
+
+
# Convert Caffe Model to MXNet Format
This folder contains the source code for this tool.
diff --git a/tools/caffe_translator/README.md b/tools/caffe_translator/README.md
index ad111617b7ed..5d80caea288e 100644
--- a/tools/caffe_translator/README.md
+++ b/tools/caffe_translator/README.md
@@ -1,3 +1,22 @@
+
+
# Caffe Translator
Caffe Translator is a migration tool that helps developers migrate their existing Caffe code to MXNet and continue further development using MXNet. Note that this is different from the Caffe to MXNet model converter which is available [here](/~https://github.com/apache/incubator-mxnet/tree/master/tools/caffe_converter).
diff --git a/tools/caffe_translator/build_from_source.md b/tools/caffe_translator/build_from_source.md
index 09af64e41460..f51d2fcaf3f0 100644
--- a/tools/caffe_translator/build_from_source.md
+++ b/tools/caffe_translator/build_from_source.md
@@ -1,3 +1,22 @@
+
+
### Build Caffe Translator from source
#### Prerequisites:
diff --git a/tools/caffe_translator/faq.md b/tools/caffe_translator/faq.md
index 99d19fef500b..186c0f623a14 100644
--- a/tools/caffe_translator/faq.md
+++ b/tools/caffe_translator/faq.md
@@ -1,3 +1,22 @@
+
+
### Frequently asked questions
[**Why is Caffe required to run the translated code?**](#why_caffe)
diff --git a/tools/cfn/Readme.md b/tools/cfn/Readme.md
index 677a1826fbb7..ecbdf836c9ef 100644
--- a/tools/cfn/Readme.md
+++ b/tools/cfn/Readme.md
@@ -1,2 +1,21 @@
+
+
**Distributed Deep Learning Made Easy has found more love and a new home; please visit
[awslabs/deeplearning-cfn](/~https://github.com/awslabs/deeplearning-cfn)**
\ No newline at end of file
diff --git a/tools/coreml/README.md b/tools/coreml/README.md
index 45f19b608bdb..87f0a953dc71 100644
--- a/tools/coreml/README.md
+++ b/tools/coreml/README.md
@@ -1,3 +1,22 @@
+
+
# Convert MXNet models into Apple CoreML format.
This tool helps convert MXNet models into [Apple CoreML](https://developer.apple.com/documentation/coreml) format which can then be run on Apple devices.
diff --git a/tools/coreml/pip_package/README.rst b/tools/coreml/pip_package/README.rst
index 875d89fcd208..c2e66f708e85 100644
--- a/tools/coreml/pip_package/README.rst
+++ b/tools/coreml/pip_package/README.rst
@@ -1,3 +1,20 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
MXNET -> CoreML Converter
=========================
diff --git a/tools/dependencies/README.md b/tools/dependencies/README.md
index c30e85d519c3..c8a06d868b44 100644
--- a/tools/dependencies/README.md
+++ b/tools/dependencies/README.md
@@ -1,3 +1,22 @@
+
+
# Overview
This folder contains scripts for building the dependencies from source. The static libraries from
diff --git a/tools/staticbuild/README.md b/tools/staticbuild/README.md
index 2def768a1f1e..3297bbdfbd40 100644
--- a/tools/staticbuild/README.md
+++ b/tools/staticbuild/README.md
@@ -1,3 +1,22 @@
+
+
# MXNet Static Build
This folder contains the core script used to build the static library. This README provides information on how to use the scripts in this folder. Please be aware that all of the scripts are designed to be run from the root folder.