Survey the parallelism in TF #6541

Closed · Yancey1989 opened this issue Dec 12, 2017 · 0 comments

The Parallelism in TF

Two Kinds of Parallelism Configuration in Session

config = tf.ConfigProto()
config.intra_op_parallelism_threads = 4
config.inter_op_parallelism_threads = 8
sess = tf.Session(config=config)

where:

  • intra_op_parallelism_threads controls the maximum parallel speedup within a
    single operator; TF uses Eigen::ThreadPoolDevice to parallelize the
    computation:

    struct ThreadPool::Impl : Eigen::ThreadPoolTempl<EigenEnvironment> {
      ...
      void ParallelFor(int64 total, int64 cost_per_unit,
                       std::function<void(int64, int64)> fn) {
        CHECK_GE(total, 0);
        CHECK_EQ(total, (int64)(Eigen::Index)total);
        Eigen::ThreadPoolDevice device(this, this->NumThreads());
        device.parallelFor(
            total, Eigen::TensorOpCost(0, 0, cost_per_unit),
            [&fn](Eigen::Index first, Eigen::Index last) { fn(first, last); });
      }
    };
  • inter_op_parallelism_threads controls how many operators that have no
    dependency path between them can be executed concurrently on different
    threads (see the sketch after this list).
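
To see both knobs together, here is a minimal sketch assuming the TF 1.x graph-mode API. The two matmuls are hypothetical ops with no dependency path between them, so the scheduler can dispatch them on different inter-op threads, while each matmul is itself split across the intra-op Eigen pool:

import tensorflow as tf

# Two ops with no data-dependency path between each other.
a = tf.random_normal([1000, 1000])
b = tf.random_normal([1000, 1000])
x = tf.matmul(a, a)  # independent of y
y = tf.matmul(b, b)  # independent of x

config = tf.ConfigProto()
config.intra_op_parallelism_threads = 4  # threads used inside one matmul (Eigen pool)
config.inter_op_parallelism_threads = 2  # x and y may be scheduled concurrently

with tf.Session(config=config) as sess:
    sess.run([x, y])  # one step; x and y can run in parallel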

Data Parallelism on GPUs

For data parallelism, the user needs to convert the model into a parallel
version: each GPU computes gradients independently, and the gradients are then
merged by averaging. The following is a piece of cifar10_multi_gpu_train.py:

...
# Get images and labels for CIFAR-10.
images, labels = cifar10.distorted_inputs()
batch_queue = tf.contrib.slim.prefetch_queue.prefetch_queue(
    [images, labels], capacity=2 * FLAGS.num_gpus)
# Calculate the gradients for each model tower.
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in xrange(FLAGS.num_gpus):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
                # Dequeues one batch for the GPU
                image_batch, label_batch = batch_queue.dequeue()
                # Calculate the loss for one tower of the CIFAR model. This function
                # constructs the entire CIFAR model but shares the variables across
                # all towers.
                loss = tower_loss(scope, image_batch, label_batch)
                ....
                # Calculate the gradients for the batch of data on this CIFAR tower.
                grads = opt.compute_gradients(loss)

                # Keep track of the gradients across all towers.
                tower_grads.append(grads)

# We must calculate the mean of each gradient. Note that this is the
# synchronization point across all towers.
grads = average_gradients(tower_grads)
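
The average_gradients helper is not shown in the snippet above. A minimal sketch of how the per-tower gradients can be averaged, assuming tower_grads is a list with one entry per tower of (gradient, variable) pairs and that all towers share the same variables (roughly mirroring the helper in cifar10_multi_gpu_train.py):

def average_gradients(tower_grads):
    # tower_grads: [[(grad, var), ...] for each tower]; every tower lists the
    # variables in the same order because the variable scope is shared.
    average_grads = []
    for grad_and_vars in zip(*tower_grads):
        # Stack the gradients of the same variable from every tower and
        # average them along the new tower dimension.
        grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
        grad = tf.reduce_mean(tf.concat(grads, axis=0), axis=0)
        # The variables are shared across towers, so the first tower's
        # variable handle is enough.
        average_grads.append((grad, grad_and_vars[0][1]))
    return average_grads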

Links
