Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hybrid Parallel AD (Part 2/?) #1284

Merged
merged 31 commits into from
May 29, 2021
Merged

Hybrid Parallel AD (Part 2/?) #1284

merged 31 commits into from
May 29, 2021

Conversation

pcarruscag
Copy link
Member

@pcarruscag pcarruscag commented May 9, 2021

Proposed Changes

1 - Local indices for solution variables (which are overwritten during recording), even in singlezone problems (it was already used in multizone), to make things consistent for the two drivers.

2 - SU2_TYPE::GetDerivative was based on a global counter and a shared vector of input indices, which meant derivatives had to be extracted in the same order they were registered, and made the whole process not thread-safe. This is gone, if an input variable is overwritten during recording its (initial) index needs to be managed explicitly.

3 - Simplify and improve the restart logic because of unsteady FSI issues (not an Hybrid AD topic, but I had enough branches).

Related Work

Continues #1214

PR Checklist

  • I am submitting my contribution to the develop branch.
  • My contribution generates no new compiler warnings (try with the '-Wall -Wextra -Wno-unused-parameter -Wno-empty-body' compiler flags, or simply --warnlevel=2 when using meson).
  • My contribution is commented and consistent with SU2 style.
  • I have added a test case that demonstrates my contribution, if necessary.
  • I have updated appropriate documentation (Tutorials, Docs Page, config_template.cpp) , if necessary.

@pcarruscag pcarruscag changed the title Hybrid Parallel AD (Part 2/?) [WIP] Hybrid Parallel AD (Part 2/?) May 9, 2021
@pr-triage pr-triage bot removed the PR: unreviewed label May 9, 2021
@TobiKattmann
Copy link
Contributor

Do you plan to add any hybrid parallel AD stuff here? I would prefer if this was a follow-up to #1260 where the Single zone DA input index storing is changed to the multizone strategy with explicit index-containers owned by the solver. You know ... smaller PR, faster merging

I will also add a preliminary unsteady cht adjoint case myself for coverage. Then we could also move the !CrossTerm of ExtractAdjoint_Solution thing in this PR and see if it breaks stuff.

@pcarruscag
Copy link
Member Author

pcarruscag commented May 12, 2021

To some extent this is a fix for hybrid parallel AD, because the single zone driver used the version of RegisterInput that pushed indices into that global "inputValues" vector, and that would not work with OpenMP.
This PR will not be very big anyway. You can test your testcase here, or in #1260, or in a different PR, and then we test the cross term thing.

SU2_CFD/src/solvers/CDiscAdjSolver.cpp Show resolved Hide resolved
/*--- Residuals and time_n terms are not needed when evaluating multizone cross terms. ---*/
if (CrossTerm) return;

SetResToZero();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have an unwanted addition to the residual because of moving the SetResToZero below the CrossTerm-return? The residual should compare the (full) adjoint solution so aren't we missing the External (crossterms + dualTime terms) here anyway. I guess just the dualTime Terms are constant so we can leave them out. Or is it correct now as CrossTerms are extracted first. I have to look that up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the residual should measure the norm of the residual of the (linear) adjoint equations, otherwise it would measure stagnation instead of convergence.
It just so happens that differences between consecutive fixed point iterations give you that residual (because the fixed point is the same as a Richardson iteration for the right-preconditioned adjoint equations).
The single zone uses differences of the total terms, the multizone uses differences of the product between total terms and the Jacobian of the iteration (it is easier to do it that way), this is equivalent since the right hand side is constant during inner iterations.

SU2_CFD/src/solvers/CDiscAdjSolver.cpp Show resolved Hide resolved
Comment on lines -1187 to -1210
switch (config->GetKind_Solver()) {
case TEMPLATE_SOLVER: template_solver = true; break;
case EULER : case INC_EULER: euler = true; break;
case NEMO_EULER : NEMO_euler = true; break;
case NAVIER_STOKES: case INC_NAVIER_STOKES: ns = true; heat = config->GetWeakly_Coupled_Heat(); break;
case NEMO_NAVIER_STOKES: NEMO_ns = true; break;
case RANS : case INC_RANS: ns = true; turbulent = true; heat = config->GetWeakly_Coupled_Heat(); break;
case FEM_EULER : fem_euler = true; break;
case FEM_NAVIER_STOKES: fem_ns = true; break;
case FEM_RANS : fem_ns = true; break;
case FEM_LES : fem_ns = true; break;
case HEAT_EQUATION: heat = true; break;
case FEM_ELASTICITY: fem = true; break;
case ADJ_EULER : euler = true; adj_euler = true; break;
case ADJ_NAVIER_STOKES : ns = true; turbulent = (config->GetKind_Turb_Model() != NONE); adj_ns = true; break;
case ADJ_RANS : ns = true; turbulent = true; adj_ns = true; adj_turb = (!config->GetFrozen_Visc_Cont()); break;
case DISC_ADJ_EULER: case DISC_ADJ_INC_EULER: euler = true; disc_adj = true; break;
case DISC_ADJ_NAVIER_STOKES: case DISC_ADJ_INC_NAVIER_STOKES: ns = true; disc_adj = true; heat = config->GetWeakly_Coupled_Heat(); break;
case DISC_ADJ_RANS: case DISC_ADJ_INC_RANS: ns = true; turbulent = true; disc_adj = true; disc_adj_turb = (!config->GetFrozen_Visc_Disc()); heat = config->GetWeakly_Coupled_Heat(); break;
case DISC_ADJ_FEM_EULER: fem_euler = true; disc_adj = true; break;
case DISC_ADJ_FEM_NS: fem_ns = true; disc_adj = true; break;
case DISC_ADJ_FEM_RANS: fem_ns = true; turbulent = true; disc_adj = true; disc_adj_turb = (!config->GetFrozen_Visc_Disc()); break;
case DISC_ADJ_FEM: fem = true; disc_adj_fem = true; break;
case DISC_ADJ_HEAT: heat = true; disc_adj_heat = true; break;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 🎆 ☠️

Copy link
Contributor

@TobiKattmann TobiKattmann May 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... this kind of cleanup will also makes the potential change to enum class Kind_Solver less tedious, so kudos from someone in the future and from me in the present :)

Copy link
Contributor

@TobiKattmann TobiKattmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the adjoint consolidation of index storing things and all the other stuff 💐 . Imo this helps quite a bit to understand the DA solver (single & multizone). I have mostly question for my own understanding below. (I also have to come back to some of your previous comments/asnwers which I have to understand ;) )

inline void GetAdjoint_MeshCoord(unsigned long iPoint, su2double *adj_mesh) final {
for (unsigned long iDim = 0; iDim < nDim; iDim++) {
adj_mesh[iDim] = AD::GetDerivative(AD_InputIndex(iPoint,iDim));
AD::ResetInput(Mesh_Coord(iPoint,iDim));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit off-topic: I was digging a bit about AD::ResetInput and what it does. The code documentation says

   * \brief Reset the variable (set index to zero).
   * \param[in] data - the variable to be unregistered from the tape.

and the code is the following

FORCEINLINE void ResetInput(su2double &data) {data.getGradientData() = su2double::GradientData();}

now data.getGradientData() returns some index on the tape and su2double::GradientData() is the default constructor (which I guess returns 0) of the datatype codi::RealForward::GradientData which matches the left sides datatype. Alright but why are we doing that? I am struggling a bit with that

We also AD::ResetInput in CDiscAdjSolver::SetRecording but only for Solution_n/1 not for Solution itself, i.e. in steady simulation AD::ResetInput is not called during the flow adjoint.

AD::ResetInput(direct_solver->GetNodes()->GetSolution_time_n(iPoint)[iVar]);

Thanks for any insight already

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the trend is that if something is an input the index is not cleared by the dummy recording (since nothing is written to an input) and so you have to clear it explicitly.
Now, I guess this is only strictly required if sometimes (i.e. some recordings) you register that variable and sometimes you don't. If you always register it then the index is always valid, but if you leave it with "dangling" indices then you have a problem.
From my comment above it also seems that the location where this reset is performed matters... (but I might also have changed two things at the same time while testing).

SU2_CFD/include/variables/CVariable.hpp Outdated Show resolved Hide resolved
SU2_CFD/include/variables/CVariable.hpp Show resolved Hide resolved
iteration_container[iZone] = new CIteration* [nInst[iZone]] ();
solver_container[iZone] = new CSolver*** [nInst[iZone]] ();
integration_container[iZone] = new CIntegration** [nInst[iZone]] ();
numerics_container[iZone] = new CNumerics**** [nInst[iZone]] ();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ understanding question: With () you call the default constructor, right? What is the benefit of that? Does it automatically initialize with nullptr as that set-ting was deleted below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Comment on lines +1127 to +1133
for (auto iSol = 0u; iSol < MAX_SOLS; ++iSol) {
auto sol = solver[MESH_0][iSol];
if (sol && !sol->GetAdjoint()) {
/*--- Note that the mesh solver always loads the most recent file (and not -2). ---*/
SU2_OMP_PARALLEL_(if(sol->GetHasHybridParallel()))
sol->LoadRestart(geometry, solver, config, val_iter + (iSol==MESH_SOL && dt_step_2nd), update_geo);
END_SU2_OMP_PARALLEL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome cleanup of the restart logic🧹

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the only advantage of the solver container being an array.

direct_solver->GetNodes()->GetAdjointSolution(iPoint,Solution);
}
su2double Solution[MAXNVAR] = {0.0};
direct_solver->GetNodes()->GetAdjointSolution(iPoint,Solution);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to have the input-index & output-index storing-strategies consolidated between single and multizone adjoint 👍

TestCases/hybrid_regression.py Show resolved Hide resolved
Copy link
Contributor

@TobiKattmann TobiKattmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Man this is good stuff 💐 Also without any problems (and modifications) with Register/Extract_Variable. I hope there is at least some Testcases that had e.g. AOA sensitivity in their screen output? (Did you maybe even check that)

And if I understood it correctly that this change revealed, that in two occasions

  FORCEINLINE passivedouble GetDerivative(const su2double& data) {
    return AD::getGlobalTape().getGradient(AD::inputValues[AD::adjointVectorPosition++]);
  }

fooled people with its fake input value.

Thanks again! As this is now at +500 -1000 and quite some changes some to the DA machine room I would personally prefer to have this merged without any major other updates. Also kudos for updating the PR explanation on top 👍

Common/include/basic_types/ad_structure.hpp Show resolved Hide resolved
else AD::RegisterOutput(variable(iPoint,iVar));

AD::SetIndex(ad_index(iPoint,iVar), variable(iPoint,iVar));
if (ad_index) AD::SetIndex((*ad_index)(iPoint,iVar), variable(iPoint,iVar));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 working with additional index containers and without

Comment on lines -98 to -99
su2matrix<int> AD_Time_n_InputIndex; /*!< \brief Indices of Solution variables in the adjoint vector. */
su2matrix<int> AD_Time_n1_InputIndex; /*!< \brief Indices of Solution variables in the adjoint vector. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now we are making explicit that solution_time_n/1 should not be changed in one dual-time step. I like that. The regression test didn't fail so I see this as good thing

@@ -56,5 +56,5 @@ TEST_CASE("Simple AD Test", "[AD tests]") {
AD::ComputeAdjoint();

CHECK(SU2_TYPE::GetValue(y) == Approx(64));
CHECK(SU2_TYPE::GetDerivative(y) == Approx(48));
CHECK(SU2_TYPE::GetDerivative(x) == Approx(48));
Copy link
Contributor

@TobiKattmann TobiKattmann May 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record: here GetDerivative(y) was not get-ting the derivative of y but out of the hidden datastructure inputValues which (is/)was filled with RegisterInput -> that's why it worked before

Comment on lines -720 to +719

for (iDV_Value = 0; iDV_Value < nDV_Value; iDV_Value++){
for (iDV_Value = 0; iDV_Value < config->GetnDV_Value(iDV); iDV_Value++){

/*--- Initilization with su2double resets the index ---*/
config->SetDV_Value(iDV, iDV_Value, 0.0);

DV_Value = 0.0;

AD::RegisterInput(DV_Value);

config->SetDV_Value(iDV, iDV_Value, DV_Value);
AD::RegisterInput(config->GetDV_Value(iDV, iDV_Value));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered what made this break before and I can only think that before the local DV_VALUE was registered and below we now(!) extracted what is in config->GetDV_Value(...). Before this PR it accessed the hidden inputValues so everything was good (no matter the input value to SU2_TYPE::GetDerivative(...)).

      DV_Value = config->GetDV_Value(iDV, iDV_Value);
      my_Gradient = SU2_TYPE::GetDerivative(DV_Value);

Now you changed it to only work with the DV_Value that lives in the config and that is consistent with this change up here.

The problem before this very commit was probably index management optimization because config->SetDV_Value(iDV, iDV_Value, DV_Value); before was just an assignment. But that is sth I would ask @jblueh in the Dev-meeting as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were using always the same variable (declared at the top of the function).
On each loop iteration the =0 part would reset the previously registered index, then on extraction we were also writing over the same variable.

Comment on lines +2 to +3
0 -3.461460672772851e-03
1 -1.841786315630031e-03
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the changes are marginal and probably ok, but if this is not a fully converged adjoint and DOT step (not just 20 SU2_CFD_AD steps and then SU2_DOT_AD) I would prefer to have that checked on the 'converged' gradient. Maybe you did this already

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the settings of the case to make it converge better/faster.

@pcarruscag pcarruscag changed the title [WIP] Hybrid Parallel AD (Part 2/?) Hybrid Parallel AD (Part 2/?) May 26, 2021
@pcarruscag pcarruscag changed the title Hybrid Parallel AD (Part 2/?) [WIP] Hybrid Parallel AD (Part 2/?) May 27, 2021
@pcarruscag
Copy link
Member Author

Back to WIP, sorry @TobiKattmann.
It may be worth git cherry-picking 19d3329 to see if it fixes the issues @oleburghardt found.

@pcarruscag pcarruscag changed the title [WIP] Hybrid Parallel AD (Part 2/?) Hybrid Parallel AD (Part 2/?) May 29, 2021
@pcarruscag pcarruscag merged commit dc2c9e1 into develop May 29, 2021
@pcarruscag pcarruscag deleted the hybrid_parallel_ad3 branch May 29, 2021 14:27
@TobiKattmann
Copy link
Contributor

Well thanks for reverting the latest two commits with RealReverseIndex and the CoDiPack update and merging this. I guess that work will go on in another PR

@jblueh jblueh mentioned this pull request Jun 1, 2021
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants