Debugging a C/C++ MPI program

How do you debug a C/C++ MPI program?

One way is to start a separate terminal and gdb session for each of the processes:

mpirun -n 4 xterm -hold -e gdb -ex run --args ./program [arg1] [arg2] [...]

where 4 is the number of processes.

Including -ex run above makes each gdb instance start its process immediately (otherwise you have to type run manually in each terminal window).
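If one window per process is more than you need, a small wrapper can put only a single rank under gdb and run the others normally. The following is a minimal sketch, not a tested recipe. It assumes the launcher exports the rank as OMPI_COMM_WORLD_RANK (Open MPI) or PMI_RANK (MPICH's Hydra launcher). The script name debug_one_rank.sh and the function is_debug_rank are made up for illustration.

```shell
#!/bin/bash
# Hypothetical wrapper (call it debug_one_rank.sh): only one rank gets gdb.
# Assumption: the launcher exports the rank as OMPI_COMM_WORLD_RANK (Open MPI)
# or PMI_RANK (MPICH/Hydra); with neither set, the rank is treated as 0.

# Succeeds (exit status 0) if this process is the rank we want to debug.
is_debug_rank() {
  local rank="${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-0}}"
  [ "$rank" -eq "$1" ]
}

# Usage: debug_one_rank.sh <RANK> <PROGRAM> [arg1] [arg2] [...]
# With fewer than two arguments the script does nothing.
if [ $# -ge 2 ]; then
  debug_rank=$1
  shift
  if is_debug_rank "$debug_rank"; then
    # The chosen rank runs under gdb in its own terminal window
    exec xterm -hold -e gdb -ex run --args "$@"
  else
    # Every other rank runs the program directly
    exec "$@"
  fi
fi
```

It would be launched as, for example, mpirun -n 4 ./debug_one_rank.sh 2 ./program [arg1] [arg2] [...] to debug rank 2 only.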

What if you don't have a GUI handy?

(This section is not guaranteed to work.)

(See below for a handy script.)

Start the MPI program under gdb in one detached screen session per process:

mpirun -np 4 screen -AdmS mpi gdb --args ./parallel_pit_fill.exe one retain ./beauford.tif 500 500

To have gdb start each process automatically, add -ex run:

mpirun -np 4 screen -AdmS mpi gdb -ex run --args ./parallel_pit_fill.exe one retain ./beauford.tif 500 500

Spin up a new screen session to access the debugger:

screen -AdmS debug

Load the debuggers' screen sessions into the new screen session, one tab per process:

screen -list |                   #Get list of screen sessions
   grep -E "[0-9]+\.mpi" |       #Extract the relevant ones
   awk '{print NR-1,$1}' |       #Generate tab #s and session ids, drop rest of the string
   xargs -n 2 sh -c '
     screen -S debug -X screen -t "tab$0" screen -r "$1"
   '

Jump into the new screen session:

screen -r debug
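To see what the grep and awk stages hand to xargs, they can be run on captured screen -list output. This is a sketch for illustration only: the function name pair_sessions is hypothetical, and the sample listing (including its PIDs and socket path) is made up to resemble typical output.

```shell
#!/bin/bash
# Turn `screen -list` output (read from stdin) into "<tab number> <session id>"
# pairs for the sessions named <name>. The function name is hypothetical.
pair_sessions() {
  local name=$1
  grep -E "[0-9]+\.${name}[[:space:]]" |  # Keep only sessions called <name>
    awk '{print NR-1, $1}'                # Number them from 0, keep the id column
}

# Illustrative sample of `screen -list` output; the PIDs are made up.
sample=$'There are screens on:\n\t4242.mpi\t(Detached)\n\t4243.mpi\t(Detached)\n\t4250.debug\t(Detached)\n3 Sockets in /run/screen/S-user.'

pair_sessions mpi <<< "$sample"
```

For this sample it prints "0 4242.mpi" and "1 4243.mpi" on separate lines. xargs -n 2 then passes each pair to sh -c as $0 (the tab number) and $1 (the session to reattach).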

I've encapsulated the above in a handy script:

#!/bin/bash
if [ $# -lt 2 ]
then
  echo "Parallel Debugger Syntax: $0 <NP> <PROGRAM> [arg1] [arg2] [...]"
  exit 1
fi

the_time=$(date +%s) #Use this so we can run multiple debugging sessions at once
                     #(assumes we are only starting one per second)

#The first argument is the number of processes. Everything else is what we want
#to run. Make a new mpi screen for each process.
#--args tells gdb that everything after the program name is an argument to it
mpirun -np "$1" screen -AdmS "${the_time}.mpi" gdb --args "${@:2}"

#Create a new screen for debugging from
screen -AdmS ${the_time}.debug

#The following are used for loading the debuggers into the debugging screen
firstpart="screen -S ${the_time}.debug"
secondpart=' -X screen -t tab$0 screen -r $1'

screen -list |                         #Get list of mpi screens
   grep -E "[0-9]+\.${the_time}\.mpi" | #Extract the relevant ones
   awk '{print NR-1,$1}' |             #Generate tab #s and session ids, drop rest of the string
   xargs -n 2 sh -c "$firstpart$secondpart"

screen -r ${the_time}.debug            #Enter debugging screen
