This blog post walks through a simple client/server example using OpenMPI: the ompi-server name server, plus simple commands to publish a named server, look up that server from a client, then connect and exchange data between the server and client.
If you need a refresher on OpenMPI first, then this is a good start.
The Gist is located here: https://gist.github.com/DaemonDave/21fea476847d94326ec6c9664c15fb87
I got this working on:
uname -a
Linux hellion 3.19.0-77-generic #85-Ubuntu SMP Fri Dec 2 03:43:54 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
mpicc --version
gcc (Ubuntu 4.9.2-10ubuntu13) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
This post assumes you've got OpenMPI installed completely and working properly.
This builds on pseudo-examples left here and mainly here: http://www.mcs.anl.gov/research/projects/mpi/mpi-standard/mpi-report-2.0/node106.htm#Node108
You compile the client and server. The client is here:
https://gist.github.com/DaemonDave/21fea476847d94326ec6c9664c15fb87#file-name-client-c
mpicc -o client name-client.c
The server is here:
https://gist.github.com/DaemonDave/21fea476847d94326ec6c9664c15fb87#file-name-server-c
mpicc -o server name-server.c
Run ompi-server first to establish the name server. Note that I launch it through mpirun rather than just running it on its own:
/*!
 * \brief How to execute the MPI name server:
 *
 *   mpirun -np 1 ompi-server --no-daemonize -r + &
 *
 * Success looks like this:
 *
 *   server available at 3653042176.0;tcp://192.168.10.191:52434+3653042177.0;tcp://192.168.10.191:48880:300
 */
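Once ompi-server is running, you can refer to it by file, but if it's local you can also refer to it by its PID. To get things running with the minimum of overlapping errors, I start with simple local setups.
You find the PID with ps. The PID to use is that of the mpirun process that launched ompi-server (16050 below), not ompi-server's own (16051), because mpirun's --ompi-server pid:# option expects the PID of an mpirun: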
ps -ef | grep ompi
dave  1634  1458 0 Dec15 ?      00:03:41 compiz
dave 16050  1954 0 10:26 pts/18 00:00:00 mpirun -n 1 ompi-server --no-daemonize -r +
dave 16051 16050 0 10:26 pts/18 00:00:00 ompi-server --no-daemonize -r +
You run the server like this:
mpirun -np 1 --ompi-server pid:16050 ./server
It looks successful like this:
server available at 3500802048.0;tcp://192.168.10.191:50129+3500802049.0;tcp://192.168.10.191:53693:300
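Now you mpirun the client. The --ompi-server PID has to be the same one the server was given; the 14341 below appears to come from a different session than the ps listing above, so substitute your own mpirun's PID:
mpirun -np 1 --ompi-server pid:14341 ./client
The server responds like this: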
looking up server ...
we got a client's data: 25.500000
we got a client's data: 26.500000
we got a client's data: 27.500000
^Cmpirun: killing job...
Instead of giving the ompi-server location by PID, you can also point mpirun at a file containing the server's URI (for example, one written by starting ompi-server with -r nameserver.cfg), like this:
mpirun -np 1 --ompi-server file:./nameserver.cfg ./server
server available at 2886402048.0;tcp://192.168.10.191:51344+2886402049.0;tcp://192.168.10.191:54857:300
I am trying to run this example, and the client and server both crash. After the server crashes I see the following output: https://imgur.com/a/ij2lv
I am not sure where ORTE_ERROR_LOG can be found, or what the "Data unpack" error message means.
Any thoughts on what is going on here?
1. I launch ompi-server:
/usr/bin/ompi-server --no-daemonize -r mpiuri
2. I launch server code:
/usr/bin/mpirun -np 1 --ompi-server file:mpiuri ./out_name_server
3. I launch client code:
/usr/bin/mpirun -np 1 --ompi-server file:mpiuri ./out_name_client
All 3 commands are issued in separate terminal windows, in the order indicated above. After launching the client application, I see the server application crashes as well as the client application.
Any ideas on what is going on? After the server crashes, there is a message that says "An error occurred in MPI_Comm_connect"
Any help or thoughts are much appreciated! I am lost!
Thanks,
Matt Overlin
Hi Matt,
Well, to be clear, I am not an MPI expert per se. I am learning just like you; you're just a few lessons behind me.
Looking at the data errors, the data overrun in dpm.c, in my experience, probably means you have a library mismatch in the software build or in a library it depends on. Without the full compile output it's hard to say, but guess I will.
So it compiled OK and it started OK; why would it crash then?
It found all the right libraries and the right headers, so the compiler knew all the right symbols to link together. Then it crashed with MPI_ERR_UNKNOWN, which in general (speaking as a coder) means an unforeseen case that the developer didn't spend a lot of time on.
So how could that be?
Either you have a bug in your code that causes a segfault because it sends the wrong data (most often a NULL pointer), or the symbols link but a data structure or function call differs from the version the code expects. Make sense?
So what else could go wrong?
Well, once all the variable declarations are correct, take your code out and test it in isolation. That eliminates you as the problem.
Then try running the MPI processes without your code. That isolates MPI as the problem. If it doesn't work there, you have two possibilities: the configuration is wrong or your libraries are wrong. Another source of segfaults is uninitialized data; mine works exactly as built, so if you added anything else, that's also suspect.
Are you running on Xen or another virtual machine? Does that system allow what MPI expects? You haven't excluded those error sources yet.
Since I don't know whether you cross-compiled, Canadian cross-compiled, are running it on an Arduino, or what system you started from, that's the best I can offer, I'm afraid.
What I recommend is to start by isolating one component and proving it works independently of the others. Then add one more. Repeat until happy.
Lots of my time is spent debugging.
Best of Luck!