Survey the parallelism in TF #6541

Closed · Yancey1989 opened this issue Dec 12, 2017 · 0 comments

The Parallelism in TF

Two Kinds of Parallelism Configuration in Session

config = tf.ConfigProto()
config.intra_op_parallelism_threads = 4
config.inter_op_parallelism_threads = 8
sess = tf.Session(config=config)

where:

  • intra_op_parallelism_threads controls the maximum parallel speedup within a
    single operator; TF uses Eigen::ThreadPoolDevice to parallelize the
    computation:

    struct ThreadPool::Impl : Eigen::ThreadPoolTempl<EigenEnvironment> {
      ...
      void ParallelFor(int64 total, int64 cost_per_unit,
                       std::function<void(int64, int64)> fn) {
        CHECK_GE(total, 0);
        CHECK_EQ(total, (int64)(Eigen::Index)total);
        Eigen::ThreadPoolDevice device(this, this->NumThreads());
        device.parallelFor(
            total, Eigen::TensorOpCost(0, 0, cost_per_unit),
            [&fn](Eigen::Index first, Eigen::Index last) { fn(first, last); });
      }
    };
  • inter_op_parallelism_threads controls how many operators that have no
    dependency path between them can be executed concurrently on different
    threads (see the sketch after this list).
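
To see both knobs together, here is a minimal sketch assuming the TF 1.x graph-mode API. The two matmuls are hypothetical ops with no dependency path between them, so the scheduler can dispatch them on different inter-op threads, while each matmul is itself split across the intra-op Eigen pool:

import tensorflow as tf

# Two ops with no data-dependency path between each other.
a = tf.random_normal([1000, 1000])
b = tf.random_normal([1000, 1000])
x = tf.matmul(a, a)  # independent of y
y = tf.matmul(b, b)  # independent of x

config = tf.ConfigProto()
config.intra_op_parallelism_threads = 4  # threads used inside one matmul (Eigen pool)
config.inter_op_parallelism_threads = 2  # x and y may be scheduled concurrently

with tf.Session(config=config) as sess:
    sess.run([x, y])  # one step; x and y can run in parallel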

Data Parallelism on GPUs

For data parallelism, the user needs to convert the model into a parallel
version: each GPU computes gradients independently, and the gradients are then
merged by averaging. The following is a piece of cifar10_multi_gpu_train.py:

...
# Get images and labels for CIFAR-10.
images, labels = cifar10.distorted_inputs()
batch_queue = tf.contrib.slim.prefetch_queue.prefetch_queue(
    [images, labels], capacity=2 * FLAGS.num_gpus)
# Calculate the gradients for each model tower.
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in xrange(FLAGS.num_gpus):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
                # Dequeues one batch for the GPU
                image_batch, label_batch = batch_queue.dequeue()
                # Calculate the loss for one tower of the CIFAR model. This function
                # constructs the entire CIFAR model but shares the variables across
                # all towers.
                loss = tower_loss(scope, image_batch, label_batch)
                ....
                # Calculate the gradients for the batch of data on this CIFAR tower.
                grads = opt.compute_gradients(loss)

                # Keep track of the gradients across all towers.
                tower_grads.append(grads)

# We must calculate the mean of each gradient. Note that this is the
# synchronization point across all towers.
grads = average_gradients(tower_grads)
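
The average_gradients helper is not shown in the snippet above. A minimal sketch of how the per-tower gradients can be averaged, assuming tower_grads is a list with one entry per tower of (gradient, variable) pairs and that all towers share the same variables (roughly mirroring the helper in cifar10_multi_gpu_train.py):

def average_gradients(tower_grads):
    # tower_grads: [[(grad, var), ...] for each tower]; every tower lists the
    # variables in the same order because the variable scope is shared.
    average_grads = []
    for grad_and_vars in zip(*tower_grads):
        # Stack the gradients of the same variable from every tower and
        # average them along the new tower dimension.
        grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
        grad = tf.reduce_mean(tf.concat(grads, axis=0), axis=0)
        # The variables are shared across towers, so the first tower's
        # variable handle is enough.
        average_grads.append((grad, grad_and_vars[0][1]))
    return average_grads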

Links
