mpirun can mangle tagged output lines, so use heuristics to fix that. #510
Of course, the saga is not over.
When specifying `--tag-output`, mpirun is supposed to "tag each line" with `[jobid, rank]<stdxxx>:`. It mostly does. However, it occasionally does something else. Assume that `a.txt` and `b.txt` contain `ABCD` and `EFGH`, respectively. Running `mpirun --tag-output -n 1 cat a.txt b.txt` mostly produces the expected tagged output. Occasionally, a tag also shows up in the middle of the line.
That is indistinguishable from `b.txt` having contained `[1,0]<stdout>:EFGH`. My guess is that this is due to a brief delay that `cat` introduces between the files. This can be verified by adding more files for `cat`
and seeing all kinds of combinations of tags popping out in the middle of a line.

One solution is to use heuristics: consider an output line to begin with the tag, and assume that it is very unlikely for the application itself to produce the tag in the middle of a line. Hence, we can filter on lines that start with the tag and then remove any other tags that appear in the middle. This should significantly reduce the likelihood of random mishaps, but it turns them into less likely, deterministic mishaps (e.g., running `echo "[1, 0]<stdout>:bla"` through mpirun).

Another choice is `--xml`. Unfortunately, parsing XML using only POSIX tools is difficult, and many simplifying assumptions are made. Nonetheless, that branch appears to work fine with OpenMPI 4, so, perhaps, the loss in clarity might not outweigh the benefits.
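
For illustration, the tag-based heuristic described above could be sketched roughly as follows. This is not necessarily the code in this PR: the function name `untag` is hypothetical, and the tag pattern is an assumption inferred from the `[1,0]<stdout>:` example (two integers separated by a comma with no space, followed by `<std...>:`).

```shell
# Hypothetical helper: keep only lines that start with an mpirun tag,
# then strip that tag and any further tags found in the middle of the line.
untag() {
    # Assumed tag shape: [<digits>,<digits>]<std...>:
    # In a POSIX basic regular expression, a "]" outside a bracket
    # expression is an ordinary character, so only "[" is escaped.
    tag='\[[0-9][0-9]*,[0-9][0-9]*]<std[a-z][a-z]*>:'
    sed -n "/^${tag}/{
s/${tag}//g
p
}"
}
```

It would be used as a filter, for example `mpirun --tag-output -n 1 cat a.txt b.txt | untag`. The deterministic mishap mentioned above remains: if the application itself prints text matching the tag pattern, that text is removed as well.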