Improve backend performance #284

Superjomn · 2018-02-28T03:20:46Z

Currently, some parts of the backend implementation are naive: they work, but slowly. Because users will embed the backend SDK into their training phase, the backend will be triggered frequently, so the logger's performance is crucial.

From a rough look at the details, several modules should be tuned. I list them in order of importance:

1. storage/Storage::PersistToDisk: this method saves all the tablets from memory to disk, even those that have not changed at all.
2. WRITE_GUARD: it is a trick that uses a counter, taken modulo some frequency, to avoid the need for concurrency. The write operation itself still adds overhead, so it would be better to use an async operation instead.
3. Adding a record is expensive. For example, adding an Image record needs to rescale all the pixels; such operations should become async.
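The counter-and-modulo idea in the second item can be sketched as follows. This is only one reading of the description, not the actual WRITE_GUARD macro: `ThrottledWriter`, `OnWrite`, and the flush callback are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical stand-in for the counter-and-modulo trick;
// this is NOT VisualDL's actual WRITE_GUARD implementation.
class ThrottledWriter {
public:
  ThrottledWriter(std::size_t frequency, std::function<void()> flush)
      : frequency_(frequency == 0 ? 1 : frequency), flush_(std::move(flush)) {}

  // Call once per write. The flush callback runs on every
  // `frequency`-th call; returns true when a flush happened.
  bool OnWrite() {
    ++counter_;
    if (counter_ % frequency_ != 0) return false;
    flush_();  // synchronous: the caller pays the full persist cost here
    return true;
  }

private:
  std::size_t frequency_;
  std::size_t counter_ = 0;
  std::function<void()> flush_;
};
```

The sketch shows why the trick only reduces how often the cost is paid: every `frequency`-th call still blocks the caller for the whole flush, which is the overhead an async operation would hide.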

All in all, there are two aspects to improve. First, PersistToDisk should skip the tablets that haven't changed; second, some expensive operations should be made async.
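A minimal sketch of the first improvement, assuming a simple dirty set is enough to track modified tablets. `DirtyTrackingStorage`, `SetTablet`, and the `Writer` callback are hypothetical and do not reflect the real Storage class; the callback stands in for the actual file write.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical in-memory store; names and layout are illustrative,
// not the real storage/Storage class.
class DirtyTrackingStorage {
public:
  using Writer =
      std::function<void(const std::string& tag, const std::string& data)>;

  // Update a tablet and remember that it needs to be written out.
  void SetTablet(const std::string& tag, std::string data) {
    tablets_[tag] = std::move(data);
    dirty_.insert(tag);
  }

  // Write only the tablets modified since the last call.
  // Returns the number of tablets written.
  std::size_t PersistToDisk(const Writer& write) {
    std::size_t written = 0;
    for (const std::string& tag : dirty_) {
      write(tag, tablets_.at(tag));
      ++written;
    }
    dirty_.clear();
    return written;
  }

private:
  std::map<std::string, std::string> tablets_;
  std::set<std::string> dirty_;
};
```

With this bookkeeping, a second PersistToDisk call with no intervening SetTablet writes nothing, so the cost of persisting scales with the number of changed tablets rather than the total.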

The first issue is quite intuitive; let's focus on the second one.

For async tasks, a thread queue is usually a good choice, but it is not suitable for this task. The operations on tablets have dependencies that are hard to describe with stateless threads, and it is painful to introduce more condition variables or mutexes. A dependency engine is a better fit: it handles dependencies naturally and supports concurrent programming without the need for condition variables or mutexes.

dependency engine as a concurrent programming framework

VisualDL might be used in a parallel system; that is, the SDK might be called in parallel. The dependency engine is similar to a task queue: tasks can be added in parallel, with a single mutex protecting the internal state.

The tasks can be executed in parallel by a thread pool. Both the state control and the thread pool are hidden inside the dependency engine, so the only change to VisualDL is the task-pushing logic.

For example, the heavy Image::SetSample can be wrapped in a task and accelerated by the underlying thread pool.
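To make the idea concrete, here is a rough sketch of the dependency bookkeeping such an engine might do. Everything in it is an assumption for illustration: `DependencyEngine`, `Push`, and `RunReady` are hypothetical names, and the thread pool is left out. `RunReady` is the loop that each worker thread would run; here it is called on a single thread.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch of dependency tracking only; no thread pool.
class DependencyEngine {
public:
  using TaskId = std::size_t;

  // Register a task that may run only after every task in `deps` is done.
  // A single mutex guards the internal state, so callers may push in parallel.
  TaskId Push(std::function<void()> fn, const std::vector<TaskId>& deps = {}) {
    std::lock_guard<std::mutex> lock(mu_);
    const TaskId id = tasks_.size();
    Task t;
    t.fn = std::move(fn);
    for (TaskId d : deps) {
      assert(d < id);  // dependencies must already be registered
      if (!tasks_[d].done) {
        tasks_[d].dependents.push_back(id);
        ++t.pending;
      }
    }
    const bool ready = (t.pending == 0);
    tasks_.push_back(std::move(t));
    if (ready) ready_.push_back(id);
    return id;
  }

  // Run tasks until none is ready. Task bodies execute outside the lock.
  void RunReady() {
    for (;;) {
      std::function<void()> fn;
      TaskId id = 0;
      {
        std::lock_guard<std::mutex> lock(mu_);
        if (ready_.empty()) return;
        id = ready_.front();
        ready_.pop_front();
        fn = std::move(tasks_[id].fn);
      }
      fn();
      std::lock_guard<std::mutex> lock(mu_);
      tasks_[id].done = true;
      // Finishing a task may unblock the tasks that were waiting on it.
      for (TaskId next : tasks_[id].dependents) {
        if (--tasks_[next].pending == 0) ready_.push_back(next);
      }
    }
  }

private:
  struct Task {
    std::function<void()> fn;
    std::size_t pending = 0;         // unfinished dependencies
    std::vector<TaskId> dependents;  // tasks waiting on this one
    bool done = false;
  };

  std::mutex mu_;
  std::vector<Task> tasks_;
  std::deque<TaskId> ready_;
};
```

In this sketch the caller only states what each task depends on; ordering falls out of the pending counters, so no condition variable appears in user code. A heavy step such as the image rescale could be pushed as one task and the record write as a second task that depends on it.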

performance stats

We can refer to a CPU profile to get some details of the backend performance.


ZeyuChen · 2020-05-22T16:31:46Z

VisualDL 2.0 now uses a streaming downsampling method to solve the backend performance issues.

Superjomn self-assigned this Feb 28, 2018

Superjomn added the enhancement label Feb 28, 2018

jetfuel mentioned this issue Mar 9, 2018

Only write the modified tablets to file system. #304

Merged

jetfuel mentioned this issue Apr 17, 2018

Research if data can append to a protobuf file #325

Closed

ZeyuChen closed this as completed May 22, 2020
