How do you debug a C/C++ MPI program?
One way is to start a separate terminal and gdb session for each of the processes:
mpirun -n 4 xterm -hold -e gdb -ex run --args ./program [arg1] [arg2] [...]
where 4
is the number of processes.
Including -ex run
above automatically starts all the processes (otherwise you have to type run
manually in each terminal window).
What if you don't have a GUI handy?
(This section is not guaranteed to work.)
(See below for a handy script.)
Spin up the mpi program in its debugger in a number of screen sessions:
mpirun -np 4 screen -AdmS mpi gdb ./parallel_pit_fill.exe one retain ./beauford.tif 500 500
:::bash
mpirun -np 4 screen -AdmS mpi gdb -ex run --args ./parallel_pit_fill.exe one retain ./beauford.tif 500 500
Spin up a new screen session to access the debugger:
screen -AdmS debug
Load the debugger's screen sessions in to the new screen session
screen -list | #Get list of screen sessions
grep -E "[0-9]+.mpi" | #Extract the relevant ones
awk '{print NR-1,$1}' | #Generate tab #s and session ids, drop rest of the string
xargs -n 2 sh -c '
screen -S debug -X screen -t tab$0 screen -r $1
'
Jump into the new screen session:
screen -r debug
I've encapsulated the above in a handy script:
#!/bin/bash
if [ $# -lt 2 ]
then
echo "Parallel Debugger Syntax: $0 <NP> <PROGRAM> [arg1] [arg2] [...]"
exit 1
fi
the_time=`date +%s` #Use this so we can run multiple debugging sessions at once
#(assumes we are only starting one per second)
#The first argument is the number of processes. Everything else is what we want
#to run. Make a new mpi screen for each process.
mpirun -np $1 screen -AdmS ${the_time}.mpi gdb "${@:2}"
#Create a new screen for debugging from
screen -AdmS ${the_time}.debug
#The following are used for loading the debuggers into the debugging screen
firstpart="screen -S ${the_time}.debug"
secondpart=' -X screen -t tab$0 screen -r $1'
screen -list | #Get list of mpi screens
grep -E "[0-9]+.${the_time}.mpi" | #Extract the relevant ones
awk '{print NR-1,$1}' | #Generate tab #s and session ids, drop rest of the string
xargs -n 2 sh -c "$firstpart$secondpart"
screen -r ${the_time}.debug #Enter debugging screen