Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hybrid Parallel AD (Part 3/?) #1294

Merged
merged 47 commits into from
Aug 7, 2021
Merged

Hybrid Parallel AD (Part 3/?) #1294

merged 47 commits into from
Aug 7, 2021

Conversation

jblueh
Copy link
Contributor

@jblueh jblueh commented Jun 1, 2021

Proposed Changes

  1. Fix preaccumulation issues with RealReverseIndex.
  2. Fix Identify issues regarding index reuse and partial tape evaluations.
  3. Add regression tests for hybrid parallel AD (subset of parallel_regression_AD).
  4. Re-enable parallel preaccumulation.
  5. Shared reading optimizations.

Related Work

continues #1284

PR Checklist

  • I am submitting my contribution to the develop branch.
  • My contribution generates no new compiler warnings (try with the '-Wall -Wextra -Wno-unused-parameter -Wno-empty-body' compiler flags, or simply --warnlevel=2 when using meson).
  • My contribution is commented and consistent with SU2 style.
  • I have added a test case that demonstrates my contribution, if necessary.
  • I have updated appropriate documentation (Tutorials, Docs Page, config_template.cpp) , if necessary.

@jblueh jblueh changed the title WIP: Hybrid Parallel AD (Part 3/?) [WIP] Hybrid Parallel AD (Part 3/?) Jun 1, 2021
@TobiKattmann
Copy link
Contributor

Part 3, but the branch-name is is hybrid_parallel_ad4 ... who are you trying to fool?

(jk :D , that mismatch was already in the previous PR)

@jblueh
Copy link
Contributor Author

jblueh commented Jun 11, 2021

f501dc1 fixes divergence of the disc_adj_fsi case among the parallel regression tests that occurred when using the CoDiPack type RealReverseIndex instead of RealReverse.

Since the bug is not obvious from the commit and there might be a better way to fix this, let me put this up for discussion.

In CDiscAdjMultizoneDriver::SetRecording, the tape is recorded part by part. The part of the code that is at fault involves three of them.

  • DEPENDENCIES -> OBJECTIVE_FUNCTION: an index is last used on a right hand side
  • OBJECTIVE_FUNCTION -> TRANSFER: the index is available for reuse and used on the left hand side
  • recording for a zone: the index is used on a right hand side

In CDiscAdjMultizoneDriver.cpp::ComputeAdjoints, however, the TRANSFER -> OBJECTIVE_FUNCTION part of the evaluation is skipped in some cases. After the recording for a zone evaluation accumulated adjoints for the said index, the adjoint reset that would be performed in the skipped tape part is missing. Instead, the tape evaluation DEPENDENCIES -> OBJECTIVE_FUNCTION receives the adjoints for this index, which is wrong because the index is associated with a different primal variable in this part of the tape.

Have there been issues regarding dependencies between tape parts previously? Are there mechanisms established that address such issues?

If someone wants to give it a try, I added a build option for the tape type in 183c3ca. You can switch to the index tape by calling meson with -Dcodi-tape=JacobianIndex.

Comment on lines +61 to +62
if get_option('codi-tape') == 'JacobianIndex'
codi_rev_args += '-DCODI_INDEX_TAPE'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jblueh
Copy link
Contributor Author

jblueh commented Jun 30, 2021

Parallel preaccumulations must not share any variables. In particular, they must not have any common inputs and outputs. I suspect (judging from the code) that the preaccumulations treated in 781092a violate this pattern, and that this is the reason why excluding them makes several hybrid parallel AD tests pass. Other preaccumulations might be affected as well. The correctness of preaccumulations inside numerics objects, for example, depends on the values of pointers like V_i, V_j. Other preaccumulations might look wrong but are ok because common variables are prevented by color loops.

@pcarruscag pcarruscag changed the title [WIP] Hybrid Parallel AD (Part 3/?) Hybrid Parallel AD (Part 3/?) Aug 1, 2021
Copy link
Member

@pcarruscag pcarruscag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the only thing left is to decide whether hybrid AD compiles by default if -Dwith-omp=true or if a special flag is added?

@jblueh
Copy link
Contributor Author

jblueh commented Aug 2, 2021

So the only thing left is to decide whether hybrid AD compiles by default if -Dwith-omp=true or if a special flag is added?

You mean a flag like -Dwith-opdi? This would only be meaningful for reverse AD + OpenMP. Conversely, reverse AD + OpenMP are not very meaningful without OpDiLib (you could probably execute it with -t 1, but then why bother using OpenMP in the first place). So if that was the question, I would say we leave it the way it is at the moment.

@pcarruscag
Copy link
Member

The omp simd directive is used in the reverse mode for passive linear algebra things, not a huge deal.

@jblueh
Copy link
Contributor Author

jblueh commented Aug 2, 2021

I see, you can't drop -Dwith-omp without losing vectorization. For a vectorized serial code, you want to be able to compile with OpenMP but make everything else, or at least AD, behave as if you wouldn't?

@pcarruscag
Copy link
Member

We don't really lose vectorization... the compiler will still generate it eventually, it will just be a bit less consistent across compilers and what they include in -O2 or -O3.
This is mostly for vector operations which probably don't take a huge time anyway, to justify another flag.
Maybe FORCE_OPDI_OMPT_BACKEND could be used to encode an "off-switch"?

@jblueh
Copy link
Contributor Author

jblueh commented Aug 2, 2021

Have a look at a9466bb for a suggestion on how to disable OpDiLib within the scope of the existing build options (casually tested).

To clarify the AD implications: If you use OpDiLib, you will have atomic adjoints where no shared reading optimizations are applied, even with -t 1. Disabling OpDiLib - if we add that option - will amend this. The presence of _OPENMP, however, makes SU2 use the thread-safe CoDiPack type and enables thread-safety augmentations in CoDiPack. Therefore, a reverse AD + OpenMP - OpDiLib build is not the same as a classical serial reverse AD build that is vectorized one way or the other.

If we add this, reverse AD + OpenMP - OpDiLib should also print an error for -t 2 or higher.

Since I am not 100% sure about it yet, could you clarify again what it is that you want to achieve in the end, and why?

@pcarruscag
Copy link
Member

Right, let's get this moving so we can have 7.2.0

@pcarruscag pcarruscag merged commit bd75cb3 into develop Aug 7, 2021
@pcarruscag pcarruscag deleted the hybrid_parallel_ad4 branch August 7, 2021 10:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants