STK build errors on ATDM rhel6 gcc 7.2 build #3390

Closed

fryeguy52 opened this issue Sep 5, 2018 · 26 comments
Labels
  • CLOSED_DUE_TO_INACTIVITY - Issue or PR has been closed by the GitHub Actions bot due to inactivity.
  • MARKED_FOR_CLOSURE - Issue or PR is marked for auto-closure by the GitHub Actions bot.
  • type: bug - The primary issue is a bug in Trilinos code or tests

Comments

@fryeguy52
Contributor

fryeguy52 commented Sep 5, 2018

CC: @trilinos/stk, @kddevin (Trilinos Data Services Product Lead), @bartlettroscoe

Next Action Status

The Jenkins job driving this build was disabled on 9/8/2018 (see here). Next: watch for the build no longer being submitted to CDash starting 9/9/2018 ...

Description

As shown in this query, there are 3 build failures in STK that began on 8/21/2018 and are still failing. On CDash the failures are the following:

Error building CMakeFiles/STKUnit_tests_util_parallel_UnitTest.dir/UnitTestCommNeighbors.cpp.o
Error building libstk_util_parallel.so.12.13
Error building CMakeFiles/stk_util_parallel.dir/CommNeighbors.cpp.o

output here

These failures are occurring in the build:

  • Trilinos-atdm-sems-gcc-7-2-0

Some examples of the kinds of errors in the output:

‘MPI_UNWEIGHTED’ was not declared in this scope
...
‘MPI_Dist_graph_create_adjacent’ was not declared in this scope
...
‘MPI_Neighbor_alltoall’ was not declared in this scope

Steps to Reproduce

One should be able to reproduce this failure on the machine as described in:

More specifically, the commands given for the system are provided at:

The exact commands to reproduce this issue should be:


$ cd <some_build_dir>/

$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh intel-opt-openmp

$ cmake \
  -GNinja \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
  -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_STK=ON \
  $TRILINOS_DIR

$ make NP=16

$ ctest -j16 \
@fryeguy52 fryeguy52 added type: bug The primary issue is a bug in Trilinos code or tests pkg: STK client: ATDM Any issue primarily impacting the ATDM project labels Sep 5, 2018
@alanw0
Contributor

alanw0 commented Sep 5, 2018

Looks like this build uses an old MPI that doesn't have the MPI_Neighbor functions? I'll look into a solution as soon as possible.

@mhoemmen
Contributor

mhoemmen commented Sep 5, 2018

@alanw0 You can check the MPI_VERSION macro -- Trilinos does that in some places. You could use Tpetra::Distributor as a fall-back in case you don't have MPI 3.
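For illustration only, here is a minimal sketch of the kind of guard being suggested, assuming a symmetric neighbor list and one int exchanged per neighbor. The function and variable names are hypothetical, not STK's actual interface; the pre-MPI-3 branch plays the role a fall-back like Tpetra::Distributor would fill.

  #include <cstddef>
  #include <mpi.h>
  #include <vector>

  // Sketch of a compile-time guard around the MPI-3 neighborhood collectives.
  void exchangeWithNeighbors(MPI_Comm comm,
                             const std::vector<int>& neighbors,
                             const std::vector<int>& sendBuf,  // one entry per neighbor
                             std::vector<int>& recvBuf)        // one entry per neighbor
  {
  #if MPI_VERSION >= 3
    // MPI 3 path: build a neighborhood communicator and use the neighborhood
    // collective that stk's CommNeighbors relies on.
    MPI_Comm neighborComm;
    MPI_Dist_graph_create_adjacent(comm,
        static_cast<int>(neighbors.size()), neighbors.data(), MPI_UNWEIGHTED,
        static_cast<int>(neighbors.size()), neighbors.data(), MPI_UNWEIGHTED,
        MPI_INFO_NULL, 0, &neighborComm);
    MPI_Neighbor_alltoall(sendBuf.data(), 1, MPI_INT,
                          recvBuf.data(), 1, MPI_INT, neighborComm);
    MPI_Comm_free(&neighborComm);
  #else
    // Pre-MPI-3 fallback: a plain point-to-point exchange with each neighbor.
    std::vector<MPI_Request> requests(2 * neighbors.size());
    for (std::size_t i = 0; i < neighbors.size(); ++i) {
      MPI_Irecv(&recvBuf[i], 1, MPI_INT, neighbors[i], 0, comm, &requests[2*i]);
      // const_cast needed because pre-MPI-3 MPI_Isend takes a non-const buffer.
      MPI_Isend(const_cast<int*>(&sendBuf[i]), 1, MPI_INT, neighbors[i], 0, comm,
                &requests[2*i + 1]);
    }
    MPI_Waitall(static_cast<int>(requests.size()), requests.data(),
                MPI_STATUSES_IGNORE);
  #endif
  }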

@bartlettroscoe
Member

bartlettroscoe commented Sep 6, 2018

If you look at the configure output, for example here, you can see that this is using OpenMPI 1.6.5.

@mhoemmen and @fryeguy52,

Now that we have GCC 7.2.0 builds up and running on 'white', 'ride', and 'waterman', I wonder if it is worthwhile to keep this auxiliary GCC 7.2.0 build? This is not an ATDM build env. Also, SEMS now has a GCC 7.3.0 build env, so perhaps we should just set up one of those builds using the SEMS env?

@mhoemmen
Contributor

mhoemmen commented Sep 6, 2018

Sierra has moved on to OpenMPI 1.10.x, and OpenMPI upstream doesn't even support that version any more. 2.0 is already a "retired" series; 1.6.5 is so old that it's one step away from "ancient."

https://www.open-mpi.org/software/ompi/v3.1/

Can't we move on from 1.6.5?

@bartlettroscoe
Member

FYI: I disabled the Jenkins job that runs this ATDM SEMS GCC 7.2.0 + OpenMPI 1.6.5 build. Therefore, this build and these build errors should be gone starting 9/9/2018. See:

I added the issue TRIL-226 to add a SEMS-based GCC 7.3.0 build using OpenMPI 1.10.1.

Therefore, I am putting this issue in review for confirmation that this build will disappear.

@bartlettroscoe bartlettroscoe added the stage: in review Primary work is completed and now is just waiting for human review and/or test feedback label Sep 9, 2018
@mhoemmen
Contributor

mhoemmen commented Sep 9, 2018

Awesome, thanks @bartlettroscoe ! :D

@bartlettroscoe
Member

The GCC 7.2.0 build with OpenMPI 1.6.5 has been disabled and did not run yesterday, as shown in the "missing builds" email below.

It would be good if Trilinos, STK, or other Trilinos packages that use MPI could add a configure-time check for the minimum version of OpenMPI required to build. That would be much better than a cryptic build-time error like this.

Otherwise, we can close this issue.


From: CDash [mailto:admin@cdash.org]
Sent: Monday, September 10, 2018 12:01 AM
To: Perschbacher, Brent M bmpersc@sandia.gov; Cole, David (External
Contact) david.cole@kitware.com; zack.galbreath@kitware.com; Bartlett,
Roscoe A rabartl@sandia.gov; Frye, Joe jfrye@sandia.gov
Subject: [EXTERNAL] CDash [Trilinos] - Missing Builds

The following expected build(s) for the project Trilinos didn't submit
yesterday:

  • sems-rhel6 - Trilinos-atdm-sems-gcc-7-2-0 (ATDM)
  • chama - Trilinos-atdm-chama-intel-debug-openmp (ATDM)

https://testing-vm.sandia.gov/cdash/index.php?project=Trilinos

-CDash on testing-vm.sandia.gov

@mhoemmen
Contributor

Thanks @bartlettroscoe ! :D

It would be good if Trilinos or STK or other Trilinos packages that use MPI could add a configure-time check for a minimum version of OpenMPI required to build.

The relevant check is MPI_VERSION >= 3. The MPI standard requires all implementations of the C binding to have the MPI_VERSION macro. That means we don't actually have to figure out the OpenMPI version; code just needs to check MPI_VERSION at compile time and report a helpful error if it's insufficient.
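
A minimal sketch of such a compile-time check (illustrative, not actual Trilinos code):

  #include <mpi.h>

  // MPI_VERSION is required by the MPI standard, so this works regardless of
  // which implementation (OpenMPI, MPICH, ...) is underneath.
  #if !defined(MPI_VERSION) || (MPI_VERSION < 3)
  #  error "This component uses MPI 3 neighborhood collectives; please build with an MPI 3 implementation (e.g. OpenMPI >= 1.10)."
  #endif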

@bartlettroscoe
Member

@mhoemmen wrote:

The relevant check is `MPI_VERSION >= 3`. The MPI standard requires all implementations of the C binding to have the `MPI_VERSION` macro. That means we don't actually have to figure out the OpenMPI version; code just needs to check `MPI_VERSION` at compile time and report a helpful error if it's insufficient.

@trilinos/stk, @kddevin, @mhoemmen, @alanw0, can someone set up a configure-time check for STK or Trilinos to assert that Trilinos now requires MPI 3? We don't need this for the ATDM Trilinos builds but it seems like we need to do this for general users of Trilinos.

prwolfe added a commit to prwolfe/Trilinos that referenced this issue Sep 10, 2018
This imports stk development as of

commit e27c191218c3e6412114091434afae1417cb74d0
Author: Alan Williams <william@sandia.gov>
Date:   Sun Sep 9 13:03:52 2018 -0600

    Add ifdefs for MPI_VERSION in stk CommNeighbors class.

    CommNeighbors uses MPI_Neighbor* functions, which only exist in
    MPI implementations which support the MPI 3 standard. In the case
    of building with an MPI implementation older than that (which is
    rare, but Trilinos still has a build for OpenMPI 1.6),
    the code will still build but a run-time error will occur,
    stating that the user must use a newer MPI if they wish to use
    the stk CommNeighbors class.

    Change-Id: I365923bb52ae6aef5ef73dd69470d104b650c357
    Reviewed-on: https://sierra-git.sandia.gov:4443/349628
    Tested-by: Mark E Hamilton <sierra@sandia.gov>
    Reviewed-by: Kendall Hugh Pierson <khpiers@sandia.gov>

And includes fixes for issues trilinos#3377, trilinos#3390, and trilinos#3407.
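
For reference, a rough sketch of the pattern that commit message describes; the class and method names are illustrative, not the actual stk CommNeighbors code. The MPI 3 calls are compiled only when MPI_VERSION >= 3, and older MPIs get a clear run-time error instead of a build failure.

  #include <mpi.h>
  #include <stdexcept>

  // Illustrative only: compiles against any MPI, but refuses to run the
  // neighborhood exchange when the MPI implementation predates MPI 3.
  class NeighborCommSketch {
  public:
    void communicate(MPI_Comm comm) {
  #if MPI_VERSION >= 3
      // With MPI 3 available, the real class would call
      // MPI_Dist_graph_create_adjacent / MPI_Neighbor_alltoall here.
      (void)comm;
  #else
      (void)comm;
      throw std::runtime_error(
          "NeighborCommSketch requires MPI 3 (neighborhood collectives); "
          "please rebuild with a newer MPI implementation.");
  #endif
    }
  };
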
@alanw0
Contributor

alanw0 commented Sep 10, 2018

A pull-request should be coming through today, which adds ifdefs that allow stk to build with the ancient MPI version. So MPI 3 will not be required for stk to build.

@mhoemmen
Contributor

@bartlettroscoe I'm not super keen on adding yet another configure-time check; I've added too many already ;-). It should be enough for the code to check MPI_VERSION and #error out if it's too low.

@bartlettroscoe
Member

As this is not an ATDM issue anymore, I am going to take the "ATDM" label off of this. ATDM should not be using versions of OpenMPI older than 1.10.1.

@bartlettroscoe bartlettroscoe removed client: ATDM Any issue primarily impacting the ATDM project stage: in review Primary work is completed and now is just waiting for human review and/or test feedback labels Sep 10, 2018
@kddevin
Contributor

kddevin commented Sep 10, 2018

I don't see how we can require MPI 3 for Trilinos builds when it isn't stable on all platforms yet (e.g., see #3356). Plus, increasing the MPI version requirement without at least warning users is likely to evoke grumbling. Can all our users (e.g., Xyce) make this move?

@mhoemmen
Contributor

@kddevin You might be confusing the OpenMPI version with the MPI version. OpenMPI 1.10 implements the neighborhood collectives from MPI 3 that STK is using.

@kddevin
Contributor

kddevin commented Sep 10, 2018

@mhoemmen Thanks for the clarification.
Increasing the version may still be disruptive to users; key stakeholders should be asked (or at least given some warning).

@mhoemmen
Contributor

@kddevin I'm OK with that being a per-package decision. STK has more control over its target audience's build environment than most Trilinos packages do. Tpetra does not yet require MPI_VERSION >= 3; I went through some trouble (e.g., in Tpetra::idot) to have a working fall-back for the MPI_VERSION < 3 case.

@prwolfe
Contributor

prwolfe commented Sep 11, 2018

@fryeguy52 - PR #3416 should have resolved this, and @bartlettroscoe has disabled the build in question. As such I would like to close this or at least remove the stk label. Really it seems like the discussion has become about the MPI version, not the original issue.

Thanks!

@prwolfe prwolfe removed the pkg: STK label Sep 11, 2018
@bartlettroscoe
Member

@kddevin, @alanw0, @mhoemmen and @prwolfe, with PR #3416 now merged, should we re-enable this GCC 7.2.0 + OpenMPI 1.6.5 build to see that STK and the rest of Trilinos work with older MPI implementations (i.e., without the MPI 3 standard functions)? Would that provide value?

@prwolfe
Contributor

prwolfe commented Sep 11, 2018

I don't see the older MPI providing serious value. I would argue for removing all builds using it where we can.

@bartlettroscoe
Member

@prwolfe said:

I don't see the older MPI providing serious value. I would argue for removing all builds using it where we can.

This is really an issue for the Trilinos Product Area leads. ATDM has no interest in supporting older MPI implementations (ATDM is looking ahead). But if the Trilinos leadership wants Trilinos to support older MPI implementations, then there must be automated builds of Trilinos using older MPI implementations or this support will not be maintained. This GitHub issue is proof of that. So I will leave this to the Trilinos leadership team and the @trilinos/framework team.

We can close this ATDM issue as far as I am concerned.

@alanw0
Contributor

alanw0 commented Sep 11, 2018

I'm fine with letting the Trilinos leads decide. The stk team is neutral on it. stk should now build with the ancient MPI. Certainly the Sierra builds (where stk development is done) will never go back to something that old again. But it could become an issue again in the future if/when the MPI standard moves from 3 to 4, etc.

@mhoemmen
Contributor

@bartlettroscoe wrote:

should we re-enable this GCC 7.2.0 + OpenMPI 1.6.5 build to see that STK and the rest of Trilinos works with older MPI implementations (and with MPI 3 standard functions)? Would that provide value?

Thanks for asking, but I'm not sure this has much value. MPI 3 has a lot of useful stuff in it, and it's no newer than the C++11 that we require. If people can update their compilers to support C++11, then they can update their MPI installations too. People also need OpenMPI >= 1.8 in order to get CUDA support (and even then only for point-to-point messages, not collectives).

@dridzal
Contributor

dridzal commented Sep 11, 2018

The question is whether there are Trilinos users who are stuck on OpenMPI 1.6.5 or similar. For example, we support some products that are deployed on hardware with older RHEL builds. Therefore, as @bartlettroscoe says, this issue needs to go back to the Trilinos leadership team and the @trilinos/framework team, who need to survey the use cases.

@bartlettroscoe
Member

Any reason to keep this issue open? Since we removed this SEMS-based GCC 7.2.0 + OpenMPI 1.6.5 build more than a month ago, it is no longer an ATDM issue.

tjfulle pushed a commit to tjfulle/Trilinos that referenced this issue Dec 6, 2018
This imports the same stk development snapshot (commit e27c191, "Add ifdefs for MPI_VERSION in stk CommNeighbors class.") quoted in full above, and includes fixes for issues trilinos#3377, trilinos#3390, and trilinos#3407.

@github-actions

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

@github-actions github-actions bot added the MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. label Aug 14, 2021
@github-actions

This issue was closed due to inactivity for 395 days.

@github-actions github-actions bot added the CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. label Sep 15, 2021