-
Notifications
You must be signed in to change notification settings - Fork 849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hybrid Parallel AD (Part 2/?) #1284
Conversation
Do you plan to add any hybrid parallel AD stuff here? I would prefer if this was a follow-up to #1260 where the Single zone DA input index storing is changed to the multizone strategy with explicit index-containers owned by the solver. You know ... smaller PR, faster merging I will also add a preliminary unsteady cht adjoint case myself for coverage. Then we could also move the |
To some extent this is a fix for hybrid parallel AD, because the single zone driver used the version of RegisterInput that pushed indices into that global "inputValues" vector, and that would not work with OpenMP. |
/*--- Residuals and time_n terms are not needed when evaluating multizone cross terms. ---*/ | ||
if (CrossTerm) return; | ||
|
||
SetResToZero(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have an unwanted addition to the residual because of moving the SetResToZero
below the CrossTerm-return? The residual should compare the (full) adjoint solution so aren't we missing the External (crossterms + dualTime terms) here anyway. I guess just the dualTime Terms are constant so we can leave them out. Or is it correct now as CrossTerms are extracted first. I have to look that up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the residual should measure the norm of the residual of the (linear) adjoint equations, otherwise it would measure stagnation instead of convergence.
It just so happens that differences between consecutive fixed point iterations give you that residual (because the fixed point is the same as a Richardson iteration for the right-preconditioned adjoint equations).
The single zone uses differences of the total terms, the multizone uses differences of the product between total terms and the Jacobian of the iteration (it is easier to do it that way), this is equivalent since the right hand side is constant during inner iterations.
switch (config->GetKind_Solver()) { | ||
case TEMPLATE_SOLVER: template_solver = true; break; | ||
case EULER : case INC_EULER: euler = true; break; | ||
case NEMO_EULER : NEMO_euler = true; break; | ||
case NAVIER_STOKES: case INC_NAVIER_STOKES: ns = true; heat = config->GetWeakly_Coupled_Heat(); break; | ||
case NEMO_NAVIER_STOKES: NEMO_ns = true; break; | ||
case RANS : case INC_RANS: ns = true; turbulent = true; heat = config->GetWeakly_Coupled_Heat(); break; | ||
case FEM_EULER : fem_euler = true; break; | ||
case FEM_NAVIER_STOKES: fem_ns = true; break; | ||
case FEM_RANS : fem_ns = true; break; | ||
case FEM_LES : fem_ns = true; break; | ||
case HEAT_EQUATION: heat = true; break; | ||
case FEM_ELASTICITY: fem = true; break; | ||
case ADJ_EULER : euler = true; adj_euler = true; break; | ||
case ADJ_NAVIER_STOKES : ns = true; turbulent = (config->GetKind_Turb_Model() != NONE); adj_ns = true; break; | ||
case ADJ_RANS : ns = true; turbulent = true; adj_ns = true; adj_turb = (!config->GetFrozen_Visc_Cont()); break; | ||
case DISC_ADJ_EULER: case DISC_ADJ_INC_EULER: euler = true; disc_adj = true; break; | ||
case DISC_ADJ_NAVIER_STOKES: case DISC_ADJ_INC_NAVIER_STOKES: ns = true; disc_adj = true; heat = config->GetWeakly_Coupled_Heat(); break; | ||
case DISC_ADJ_RANS: case DISC_ADJ_INC_RANS: ns = true; turbulent = true; disc_adj = true; disc_adj_turb = (!config->GetFrozen_Visc_Disc()); heat = config->GetWeakly_Coupled_Heat(); break; | ||
case DISC_ADJ_FEM_EULER: fem_euler = true; disc_adj = true; break; | ||
case DISC_ADJ_FEM_NS: fem_ns = true; disc_adj = true; break; | ||
case DISC_ADJ_FEM_RANS: fem_ns = true; turbulent = true; disc_adj = true; disc_adj_turb = (!config->GetFrozen_Visc_Disc()); break; | ||
case DISC_ADJ_FEM: fem = true; disc_adj_fem = true; break; | ||
case DISC_ADJ_HEAT: heat = true; disc_adj_heat = true; break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 🎆 ☠️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... this kind of cleanup will also makes the potential change to enum class Kind_Solver
less tedious, so kudos from someone in the future and from me in the present :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the adjoint consolidation of index storing things and all the other stuff 💐 . Imo this helps quite a bit to understand the DA solver (single & multizone). I have mostly question for my own understanding below. (I also have to come back to some of your previous comments/asnwers which I have to understand ;) )
inline void GetAdjoint_MeshCoord(unsigned long iPoint, su2double *adj_mesh) final { | ||
for (unsigned long iDim = 0; iDim < nDim; iDim++) { | ||
adj_mesh[iDim] = AD::GetDerivative(AD_InputIndex(iPoint,iDim)); | ||
AD::ResetInput(Mesh_Coord(iPoint,iDim)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit off-topic: I was digging a bit about AD::ResetInput
and what it does. The code documentation says
* \brief Reset the variable (set index to zero).
* \param[in] data - the variable to be unregistered from the tape.
and the code is the following
FORCEINLINE void ResetInput(su2double &data) {data.getGradientData() = su2double::GradientData();}
now data.getGradientData()
returns some index on the tape and su2double::GradientData()
is the default constructor (which I guess returns 0) of the datatype codi::RealForward::GradientData
which matches the left sides datatype. Alright but why are we doing that? I am struggling a bit with that
We also AD::ResetInput
in CDiscAdjSolver::SetRecording
but only for Solution_n/1 not for Solution itself, i.e. in steady simulation AD::ResetInput
is not called during the flow adjoint.
AD::ResetInput(direct_solver->GetNodes()->GetSolution_time_n(iPoint)[iVar]);
Thanks for any insight already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the trend is that if something is an input the index is not cleared by the dummy recording (since nothing is written to an input) and so you have to clear it explicitly.
Now, I guess this is only strictly required if sometimes (i.e. some recordings) you register that variable and sometimes you don't. If you always register it then the index is always valid, but if you leave it with "dangling" indices then you have a problem.
From my comment above it also seems that the location where this reset is performed matters... (but I might also have changed two things at the same time while testing).
iteration_container[iZone] = new CIteration* [nInst[iZone]] (); | ||
solver_container[iZone] = new CSolver*** [nInst[iZone]] (); | ||
integration_container[iZone] = new CIntegration** [nInst[iZone]] (); | ||
numerics_container[iZone] = new CNumerics**** [nInst[iZone]] (); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C++ understanding question: With ()
you call the default constructor, right? What is the benefit of that? Does it automatically initialize with nullptr
as that set-ting was deleted below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
for (auto iSol = 0u; iSol < MAX_SOLS; ++iSol) { | ||
auto sol = solver[MESH_0][iSol]; | ||
if (sol && !sol->GetAdjoint()) { | ||
/*--- Note that the mesh solver always loads the most recent file (and not -2). ---*/ | ||
SU2_OMP_PARALLEL_(if(sol->GetHasHybridParallel())) | ||
sol->LoadRestart(geometry, solver, config, val_iter + (iSol==MESH_SOL && dt_step_2nd), update_geo); | ||
END_SU2_OMP_PARALLEL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome cleanup of the restart logic🧹
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the only advantage of the solver container being an array.
direct_solver->GetNodes()->GetAdjointSolution(iPoint,Solution); | ||
} | ||
su2double Solution[MAXNVAR] = {0.0}; | ||
direct_solver->GetNodes()->GetAdjointSolution(iPoint,Solution); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to have the input-index & output-index storing-strategies consolidated between single and multizone adjoint 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Man this is good stuff 💐 Also without any problems (and modifications) with Register/Extract_Variable. I hope there is at least some Testcases that had e.g. AOA sensitivity in their screen output? (Did you maybe even check that)
And if I understood it correctly that this change revealed, that in two occasions
FORCEINLINE passivedouble GetDerivative(const su2double& data) {
return AD::getGlobalTape().getGradient(AD::inputValues[AD::adjointVectorPosition++]);
}
fooled people with its fake input value.
Thanks again! As this is now at +500 -1000
and quite some changes some to the DA machine room I would personally prefer to have this merged without any major other updates. Also kudos for updating the PR explanation on top 👍
else AD::RegisterOutput(variable(iPoint,iVar)); | ||
|
||
AD::SetIndex(ad_index(iPoint,iVar), variable(iPoint,iVar)); | ||
if (ad_index) AD::SetIndex((*ad_index)(iPoint,iVar), variable(iPoint,iVar)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 working with additional index containers and without
su2matrix<int> AD_Time_n_InputIndex; /*!< \brief Indices of Solution variables in the adjoint vector. */ | ||
su2matrix<int> AD_Time_n1_InputIndex; /*!< \brief Indices of Solution variables in the adjoint vector. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now we are making explicit that solution_time_n/1 should not be changed in one dual-time step. I like that. The regression test didn't fail so I see this as good thing
@@ -56,5 +56,5 @@ TEST_CASE("Simple AD Test", "[AD tests]") { | |||
AD::ComputeAdjoint(); | |||
|
|||
CHECK(SU2_TYPE::GetValue(y) == Approx(64)); | |||
CHECK(SU2_TYPE::GetDerivative(y) == Approx(48)); | |||
CHECK(SU2_TYPE::GetDerivative(x) == Approx(48)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record: here GetDerivative(y)
was not get-ting the derivative of y
but out of the hidden datastructure inputValues
which (is/)was filled with RegisterInput
-> that's why it worked before
|
||
for (iDV_Value = 0; iDV_Value < nDV_Value; iDV_Value++){ | ||
for (iDV_Value = 0; iDV_Value < config->GetnDV_Value(iDV); iDV_Value++){ | ||
|
||
/*--- Initilization with su2double resets the index ---*/ | ||
config->SetDV_Value(iDV, iDV_Value, 0.0); | ||
|
||
DV_Value = 0.0; | ||
|
||
AD::RegisterInput(DV_Value); | ||
|
||
config->SetDV_Value(iDV, iDV_Value, DV_Value); | ||
AD::RegisterInput(config->GetDV_Value(iDV, iDV_Value)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondered what made this break before and I can only think that before the local DV_VALUE
was registered and below we now(!) extracted what is in config->GetDV_Value(...)
. Before this PR it accessed the hidden inputValues
so everything was good (no matter the input value to SU2_TYPE::GetDerivative(...)
).
DV_Value = config->GetDV_Value(iDV, iDV_Value);
my_Gradient = SU2_TYPE::GetDerivative(DV_Value);
Now you changed it to only work with the DV_Value
that lives in the config and that is consistent with this change up here.
The problem before this very commit was probably index management optimization because config->SetDV_Value(iDV, iDV_Value, DV_Value);
before was just an assignment. But that is sth I would ask @jblueh in the Dev-meeting as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were using always the same variable (declared at the top of the function).
On each loop iteration the =0 part would reset the previously registered index, then on extraction we were also writing over the same variable.
0 -3.461460672772851e-03 | ||
1 -1.841786315630031e-03 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the changes are marginal and probably ok, but if this is not a fully converged adjoint and DOT step (not just 20 SU2_CFD_AD steps and then SU2_DOT_AD) I would prefer to have that checked on the 'converged' gradient. Maybe you did this already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the settings of the case to make it converge better/faster.
…into hybrid_parallel_ad3
…on and MAXNVAR fix for flow.
Back to WIP, sorry @TobiKattmann. |
Well thanks for reverting the latest two commits with RealReverseIndex and the CoDiPack update and merging this. I guess that work will go on in another PR |
Proposed Changes
1 - Local indices for solution variables (which are overwritten during recording), even in singlezone problems (it was already used in multizone), to make things consistent for the two drivers.
2 - SU2_TYPE::GetDerivative was based on a global counter and a shared vector of input indices, which meant derivatives had to be extracted in the same order they were registered, and made the whole process not thread-safe. This is gone, if an input variable is overwritten during recording its (initial) index needs to be managed explicitly.
3 - Simplify and improve the restart logic because of unsteady FSI issues (not an Hybrid AD topic, but I had enough branches).
Related Work
Continues #1214
PR Checklist