Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Framework: Recent STK update breaks ROL's downward dependency testing #3407

Closed
dridzal opened this issue Sep 7, 2018 · 16 comments
Closed

Framework: Recent STK update breaks ROL's downward dependency testing #3407

dridzal opened this issue Sep 7, 2018 · 16 comments
Assignees
Labels
Framework tasks Framework tasks (used internally by Framework team) pkg: Panzer pkg: ROL pkg: STK

Comments

@dridzal
Copy link
Contributor

dridzal commented Sep 7, 2018

@trilinos/framework @trilinos/stk @trilinos/panzer @trilinos/rol

Expectations

Restore ROL's downward dependency testing.

Current Behavior

https://testing-vm.sandia.gov/cdash/viewBuildError.php?buildid=3913799

Build error:

/ascldap/users/dridzal/development/TEST/rol-trilinos/Trilinos/packages/stk/stk_util/stk_util/parallel/CommNeighbors.cpp: In member function ‘virtual ompi_communicator_t* stk::CommNeighbors::setup_neighbor_comm(stk::ParallelMachine, const std::vector<int>&, const std::vector<int>&)’: /ascldap/users/dridzal/development/TEST/rol-trilinos/Trilinos/packages/stk/stk_util/stk_util/parallel/CommNeighbors.cpp:81:30: error: ‘MPI_UNWEIGHTED’ was not declared in this scope const int* weights = (int*)MPI_UNWEIGHTED; ^ /ascldap/users/dridzal/development/TEST/rol-trilinos/Trilinos/packages/stk/stk_util/stk_util/parallel/CommNeighbors.cpp:86:43: error: ‘MPI_Dist_graph_create_adjacent’ was not declared in this scope info, reorder, &neighborComm); ^ /ascldap/users/dridzal/development/TEST/rol-trilinos/Trilinos/packages/stk/stk_util/stk_util/parallel/CommNeighbors.cpp: In member function ‘virtual void stk::CommNeighbors::perform_neighbor_communication(MPI_Comm, const std::vector<unsigned char>&, const std::vector<int>&, const std::vector<int>&, std::vector<unsigned char>&, std::vector<int>&, std::vector<int>&)’: /ascldap/users/dridzal/development/TEST/rol-trilinos/Trilinos/packages/stk/stk_util/stk_util/parallel/CommNeighbors.cpp:221:71: error: ‘MPI_Neighbor_alltoall’ was not declared in this scope (void*)recvCountsPtr, 1, MPI_INT, neighborComm); ^ /ascldap/users/dridzal/development/TEST/rol-trilinos/Trilinos/packages/stk/stk_util/stk_util/parallel/CommNeighbors.cpp:236:78: error: ‘MPI_Neighbor_alltoallv’ was not declared in this scope (void*)recvBufPtr, recvCountsPtr, recvDisplsPtr, MPI_BYTE, neighborComm); ^

Possible Solution

I believe that MPI that is loaded for ROL's nightly testing (1.6.5??) no longer works for STK. This needs to be updated ASAP. I simply call

$TRILINOS_DIR/cmake/load_sems_dev_env.sh "default"

so the issue must be in load_sems_dev_env or the definition of "default". I wonder why others are not seeing this issue. Is the "default" abandoned or renamed? On the other hand, pre-push test scripts seem to work just fine ... how are they different?

@dridzal dridzal added pkg: STK Framework tasks Framework tasks (used internally by Framework team) pkg: Panzer pkg: ROL labels Sep 7, 2018
@prwolfe
Copy link
Contributor

prwolfe commented Sep 7, 2018

Is this a duplicate of #3390? See @mhoemmen's comment there about the age of this MPI version.

I will talk with the team.

@dridzal
Copy link
Contributor Author

dridzal commented Sep 7, 2018

It's definitely related to #3390, but it's somewhat specific to fixing the load_sems_dev_env.sh script and/or the definition of "default" for this script. The ATDM builds don't use this script -- do they?

@mhoemmen
Copy link
Contributor

mhoemmen commented Sep 7, 2018

OpenMPI 1.6.5 is waaaaaaaay too old. Sierra doesn't test with it, as far as I know.

@dridzal
Copy link
Contributor Author

dridzal commented Sep 7, 2018

Agreed. The bottom line is that Trilinos does not compile with the sems dev env defaults, due to the recent STK changes. So we either change the defaults or retrofit STK with ancient MPI alternatives. The change in the default/minimum MPI version would need to be communicated to the stakeholders. An important question is if there are projects that track our master/develop branches but absolutely rely on OpenMPI 1.6.5 (have we heard from anyone?). Their use case would no longer be supported.

@bartlettroscoe
Copy link
Member

@dridzal, I will go ahead and make the default match the current CI build which is using OpenMPI 1.10.

bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Sep 7, 2018
…rilinos#3407)

The default SEMS env should match the env used in the post-push CI build.
This change should have been made when the CI build was updated to use OpenMPI
1.10.1.

NOTE: This also updates the default CMake to 3.11.1.

This should resolve the errors reported in trilinos#3407.
@dridzal
Copy link
Contributor Author

dridzal commented Sep 7, 2018

@jwillenbring @bartlettroscoe Is there a minimum version of MPI that Trilinos officially supports?

@bartlettroscoe
Copy link
Member

@dridzal said:

@jwillenbring @bartlettroscoe Is there a minimum version of MPI that Trilinos officially supports?

Don't know. I only know what the ATDM builds are using. My guess is that OpenMPI 1.10.1 is the lowest.

@dridzal
Copy link
Contributor Author

dridzal commented Sep 7, 2018

Well, by allowing the STK failures and abandoning 1.6.5 testing, we are effectively setting the minimum MPI version ... to something higher than 1.6.5. 1.10.1? 1.8.7?

@mhoemmen
Copy link
Contributor

mhoemmen commented Sep 7, 2018

Sierra had problems with OpenMPI 1.8.x, and decided to skip over it and require 1.10.y.

trilinos-autotester added a commit that referenced this issue Sep 8, 2018
…lt-openmpi-1.10.1

Automatically Merged using Trilinos Pull Request AutoTester
PR Title: Switch SEMS env default to match GCC 4.8.4 OpenMPI 1.10.1 CI build (#3407)
PR Author: bartlettroscoe
@alanw0
Copy link
Contributor

alanw0 commented Sep 9, 2018

Sorry I was out of town for a couple days and missed much of the discussion. I can easily add ifdefs to allow building with old mpi versions. The 'CommNeighbors' class which uses these missing MPI calls will then throw at run-time. Since it's a relatively new class, it is reasonable for any user of it to use a reasonably new MPI implementation. But then at least Trilinos will still build for existing users who aren't using this class and who have an old MPI installation. If ATDM builds are the issue, they aren't using STK anyway, so they shouldn't be impacted by a STK MPI version requirement.

prwolfe added a commit to prwolfe/Trilinos that referenced this issue Sep 10, 2018
This imports stk development as of

commit e27c191218c3e6412114091434afae1417cb74d0
Author: Alan Williams <william@sandia.gov>
Date:   Sun Sep 9 13:03:52 2018 -0600

    Add ifdefs for MPI_VERSION in stk CommNeighbors class.

    CommNeighbors uses MPI_Neighbor* functions, which only exist in
    MPI implementations which support the MPI 3 standard. In the case
    of building with an MPI implementation older than that (which is
    rare, but Trilinos still has a build for OpenMPI 1.6),
    the code will still build but a run-time error will occur,
    stating that the user must use a newer MPI if they wish to use
    the stk CommNeighbors class.

    Change-Id: I365923bb52ae6aef5ef73dd69470d104b650c357
    Reviewed-on: https://sierra-git.sandia.gov:4443/349628
    Tested-by: Mark E Hamilton <sierra@sandia.gov>
    Reviewed-by: Kendall Hugh Pierson <khpiers@sandia.gov>

And includes fixes for issue trilinos#3377 trilinos#3390 and trilinos#3407
@bartlettroscoe
Copy link
Member

@alanw0 said:

If ATDM builds are the issue, they aren't using STK anyway, so they shouldn't be impacted by a STK MPI version requirement.

FYI: The current ATDM Trilinos configuration that is modeled after the EMPIRE Trilinos configuration currently enables the following STK subpackages:

  • Final set of enabled SE packages: ... STKUtil STKSimd STKTopology STKMesh STKNGP STKIO STKMath STKTools STKExprEval STK ... 82

as shown, for example, on CDash here.

And the SPARC Trilinos configuration enables the STK subpackages:

  • Final set of enabled SE packages: ...STKUtil STKSearch STKTransfer STK ... 95

as shown here.

So that is a several STK subpackages being used by ATDM (at least as I currently understand things).

@alanw0
Copy link
Contributor

alanw0 commented Sep 10, 2018

@bartlettroscoe thanks for the info, that's certainly news to me.

@prwolfe
Copy link
Contributor

prwolfe commented Sep 11, 2018

@dridzal - PR #3416 should have solved this, but todays dashboard still shows an issue. Are you testing against develop or master?

@dridzal
Copy link
Contributor Author

dridzal commented Sep 11, 2018 via email

@dridzal
Copy link
Contributor Author

dridzal commented Sep 11, 2018

The sync won't be done until tomorrow, Sep. 12 (due to PR #3425), so we'll have to check the dashboard on Sep. 13.

@dridzal
Copy link
Contributor Author

dridzal commented Sep 13, 2018

ROL's downward dependency tests are passing again.

@dridzal dridzal closed this as completed Sep 13, 2018
tjfulle pushed a commit to tjfulle/Trilinos that referenced this issue Dec 6, 2018
…rilinos#3407)

The default SEMS env should match the env used in the post-push CI build.
This change should have been made when the CI build was updated to use OpenMPI
1.10.1.

NOTE: This also updates the default CMake to 3.11.1.

This should resolve the errors reported in trilinos#3407.
tjfulle pushed a commit to tjfulle/Trilinos that referenced this issue Dec 6, 2018
This imports stk development as of

commit e27c191218c3e6412114091434afae1417cb74d0
Author: Alan Williams <william@sandia.gov>
Date:   Sun Sep 9 13:03:52 2018 -0600

    Add ifdefs for MPI_VERSION in stk CommNeighbors class.

    CommNeighbors uses MPI_Neighbor* functions, which only exist in
    MPI implementations which support the MPI 3 standard. In the case
    of building with an MPI implementation older than that (which is
    rare, but Trilinos still has a build for OpenMPI 1.6),
    the code will still build but a run-time error will occur,
    stating that the user must use a newer MPI if they wish to use
    the stk CommNeighbors class.

    Change-Id: I365923bb52ae6aef5ef73dd69470d104b650c357
    Reviewed-on: https://sierra-git.sandia.gov:4443/349628
    Tested-by: Mark E Hamilton <sierra@sandia.gov>
    Reviewed-by: Kendall Hugh Pierson <khpiers@sandia.gov>

And includes fixes for issue trilinos#3377 trilinos#3390 and trilinos#3407
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Framework tasks Framework tasks (used internally by Framework team) pkg: Panzer pkg: ROL pkg: STK
Projects
None yet
Development

No branches or pull requests

5 participants