I needed distributed computation using as many different IPs as possible. I decided to use IPython and its `ipcluster` command for the task. Unfortunately, I found several problems with this setup.

For example, you cannot specify the port and username for the SSH connection. There is also a problem if you want to run Python from a specific virtualenv whose location differs on every machine. Here is what I ended up doing.

In my configuration there is `main_host`, where the main Hub is running and where I also wanted to use several cores for engines. Then there are `host1`, `host2`, ..., `hostN`, which are used only for engines (over SSH). I created a new parallel profile with `ipython profile create --parallel distcomp`, which creates all the necessary configuration files.

Step-by-step
============

Configuring SSH
---------------
Open `.ssh/config` and add a configuration entry for each host. Every entry should have at least its port (can be omitted if it is 22), username, hostname and identity key.
Here is an example of mine:
```language-bash
Host host1
    HostName host1.example.com
    IdentityFile /home/username/.ssh/host1
    User bob
    Port 1933
Host host2
    HostName host2.example.com
    IdentityFile /home/username/.ssh/host2
    User alice
    Port 4931
...
Host hostN
    HostName hostN.example.com
    IdentityFile /home/username/.ssh/hostN
    User adam
    Port 1033
```
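The identity files can be generated with `ssh-keygen -t dsa` and an empty passphrase (just press Enter); recent OpenSSH releases disable DSA keys by default, so `-t rsa` or `-t ed25519` may be needed instead. This creates two files, `keynamehostX` (the private key) and `keynamehostX.pub` (the public key). Log in to `hostX` and append the content of `keynamehostX.pub` to `/home/hostXusername/.ssh/authorized_keys`.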
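
If you prefer to script the `authorized_keys` step, here is a minimal sketch. It assumes the public key file has already been copied to the remote host (for example with `scp`) and that it is run there; the function name `authorize_key` is made up for this illustration and is not part of IPython or OpenSSH.

```python
import os
from pathlib import Path


def authorize_key(pub_key_file, authorized_keys_file):
    """Append a public key to authorized_keys unless it is already listed.

    Returns True if the key was added, False if it was already present.
    """
    pub_key = Path(pub_key_file).read_text().strip()
    auth = Path(authorized_keys_file)
    # Create ~/.ssh if it does not exist yet, readable by the owner only.
    auth.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    current = auth.read_text() if auth.exists() else ""
    if pub_key in (line.strip() for line in current.splitlines()):
        return False
    with auth.open("a") as fh:
        # Avoid gluing the new key onto a last line that lacks a newline.
        if current and not current.endswith("\n"):
            fh.write("\n")
        fh.write(pub_key + "\n")
    # sshd may refuse a key file that other users can write to.
    os.chmod(str(auth), 0o600)
    return True
```

Calling it twice with the same key leaves the file unchanged, so it is safe to re-run while setting up the hosts.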
Additionally, if you want to run some engines on the same machine where `ipcluster` is running (in my case `main_host`), you need to add an SSH record for localhost. Again, create a key and copy its public part to `authorized_keys`. Then add this to `.ssh/config`:
```language-bash
Host main_host
    HostName localhost
    IdentityFile /home/username/.ssh/main_host_key
    User username
```
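
With many hosts, writing these entries by hand gets tedious. The sketch below renders them from a list of dictionaries; the helper `render_ssh_config` and the dictionary keys (`alias`, `hostname`, `identity_file`, `user`, `port`) are hypothetical names chosen for this example, and the output still has to be pasted or written into `.ssh/config` yourself.

```python
def render_ssh_config(hosts):
    """Return ssh config text with one Host entry per dictionary in hosts."""
    stanzas = []
    for h in hosts:
        lines = [
            "Host {}".format(h["alias"]),
            "    HostName {}".format(h["hostname"]),
            "    IdentityFile {}".format(h["identity_file"]),
            "    User {}".format(h["user"]),
        ]
        # Port 22 is the SSH default, so it is only written when it differs.
        port = h.get("port", 22)
        if port != 22:
            lines.append("    Port {}".format(port))
        stanzas.append("\n".join(lines))
    return "\n\n".join(stanzas) + "\n"


hosts = [
    {"alias": "host1", "hostname": "host1.example.com",
     "identity_file": "/home/username/.ssh/host1", "user": "bob", "port": 1933},
    {"alias": "main_host", "hostname": "localhost",
     "identity_file": "/home/username/.ssh/main_host_key", "user": "username"},
]
print(render_ssh_config(hosts))
```
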
Configuring ipcluster
---------------------
Open `~/.ipython/profile_distcomp/ipcluster_config.py` and find the section headed `SSHEngineSetLauncher configuration`. There you need to add this:
```language-python
c.SSHEngineSetLauncher.engine_cmd = ['/tmp/dist_python', '-m', 'IPython.parallel.engine']
# Keys are the Host aliases from .ssh/config, values are engine counts.
c.SSHEngineSetLauncher.engines = {
    'main_host': 4,
    'host1': 2,
    # ...
    'hostN': 4,
}
```
This will create 4 engines on `main_host`, 2 engines on `host1`, and so on. Because `engine_cmd` is the same for every host, each host now needs a link at `/tmp/dist_python` pointing to the Python interpreter of its own virtualenv. Say the virtualenv interpreter on `host1` is `/home/bob/virtualenv_for_distrubuted_computation/bin/python`. Then go to `/tmp` on `host1` and run `ln -s /home/bob/virtualenv_for_distrubuted_computation/bin/python dist_python`. Every user should have write permission for `/tmp`. Of course, IPython must be installed in each virtualenv.

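
The link can also be created from Python, which is handy if you are already running a setup script on each host. This is only a sketch: `link_python` is a made-up helper, and it assumes the only thing that may already exist at the link path is an older symlink.

```python
import os


def link_python(venv_python, link_path="/tmp/dist_python"):
    """Point link_path at venv_python, replacing a stale symlink if present."""
    if not os.path.isfile(venv_python):
        raise FileNotFoundError(venv_python)
    if os.path.islink(link_path):
        if os.readlink(link_path) == venv_python:
            return  # the link is already correct
        os.remove(link_path)  # drop the outdated link
    # Raises FileExistsError if a regular file or directory is in the way.
    os.symlink(venv_python, link_path)
```

On `host1` from the example above this would be called as `link_python("/home/bob/virtualenv_for_distrubuted_computation/bin/python")`.
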
Running ipcluster and ipython notebook
--------------------------------------
All you need to do now is start the cluster with `ipcluster start --profile=distcomp`. If you want to use `ipython notebook`, start it with the same profile: `ipython notebook --profile=distcomp`.
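
Engines launched over SSH may take a little while to register with the Hub, so a script that connects right after `ipcluster start` can see fewer engines than configured. Below is a small polling sketch under that assumption. `wait_for_engines` is a hypothetical helper, not an IPython function; it takes any callable that returns the current engine ids.

```python
import time


def wait_for_engines(get_ids, expected, timeout=60.0, poll=1.0):
    """Poll get_ids() until it reports at least `expected` engines.

    Returns the number of engines seen, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        count = len(get_ids())
        if count >= expected:
            return count
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "only {} of {} engines registered".format(count, expected)
            )
        time.sleep(poll)
```

With a `Client` instance `rc` from `IPython.parallel`, it could be used as `wait_for_engines(lambda: rc.ids, expected=10)`, where the expected count is the sum of the values in the `engines` dictionary of your configuration.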
Now you can check that everything is working by creating a new notebook and running, for example, this:
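```language-python
from IPython.parallel import Client

rc = Client()
# rc.ids lists the registered engines (one per core in this setup).
print("Number of cores:", len(rc.ids))


def testingfunc():
    # Imports go inside the function because it runs on the remote engines.
    import os
    import getpass
    q = os.getcwdb()  # working directory of the engine, as bytes
    u = getpass.getuser()  # user the engine runs as
    return (q, u)


# Run on all engines; returns one (cwd, user) tuple per engine.
rc[:].apply_sync(testingfunc)
```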
And that's it!