Speed up figure generation by running an external Python script in parallel, using Python and matplotlib.pyplot


This page shows one way to process data and generate figures in parallel by launching an external Python script several times.
This method has two advantages: you don't have to wait for the processing to finish, so you can keep executing other cells/lines in Jupyter Notebook, and you don't need the more complicated ipyparallel or multiprocessing. First, the original Python code that processes the data serially is shown; then the Python code that processes the data in parallel. The resulting figures are not shown because they are not the point of this page.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pickle
from os.path import join
import subprocess
%matplotlib agg

First, generate the data to plot.
In this example, data suitable for contourf is saved in pickle format.

In [2]:
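import os
NUMFIG = 1000
datadir = './data2graph_parallelly'
os.makedirs(datadir, exist_ok=True)  # create the output directory if it is missing
xs = np.linspace(-2.0,2.0,10)
ys = np.linspace(-2.0,2.0,10)
XX,YY = np.meshgrid(xs,ys)
# One Z array per figure: base**(x*y) for NUMFIG bases between 1 and 5
ZZs = [base**(XX*YY) for base in np.linspace(1,5,NUMFIG)]
with open(join(datadir,'xyz.pkl'), mode='wb') as f:
    pickle.dump([XX,YY,ZZs],f)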

Convert the data to PNG files serially.

In [3]:
# The agg backend selected in the first cell keeps figures from being shown in this notebook
for idx in range(NUMFIG):
    plt.figure(facecolor='w',figsize=(5,4))
    cont = plt.contourf(XX, YY, ZZs[idx], 50)
    cbar = plt.colorbar(cont)
    plt.savefig(join(datadir,'xyz%03d.png'%idx))
    plt.close()  # free the figure before drawing the next one
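
To judge whether running in parallel really helps, the serial loop can be timed first. The following is only a minimal sketch using `time.perf_counter`; `timed` and `render_all` are hypothetical names, and `render_all` is a cheap placeholder for the plotting loop above, not the real workload.

```python
import time

def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start

def render_all(n):
    # Hypothetical placeholder for the plotting loop above.
    return sum(i * i for i in range(n))

result, elapsed = timed(render_all, 1000)
print('serial run took %.3f s' % elapsed)
```

When timing background processes in the same way, remember that launching them returns almost immediately, so the measurement has to include waiting for them to exit.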

Prepare for parallel processing.
Write the plotting code and save it as an external .py file.

In [4]:
scriptname = 'data2graph.py'
with open(scriptname, mode='w') as f:
    f.write("""
import numpy as np
import pickle
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files only
import matplotlib.pyplot as plt
import argparse
from os.path import join

def main():
    # Load the grid and the list of Z arrays written by the notebook
    with open(join(args.datadir,args.fname), mode='rb') as f:
        XX,YY,ZZs = pickle.load(f)

    for idx in [int(i) for i in args.idxs.split(',')]:
        plt.figure(facecolor='w',figsize=(5,4))
        cont = plt.contourf(XX, YY, ZZs[idx], 50)
        cbar = plt.colorbar(cont)
        plt.savefig(join(args.datadir,'xyz%03d.png'%idx))
        plt.close()  # free the figure before drawing the next one

if __name__ == '__main__':
    p = argparse.ArgumentParser(description='load image and convert to image')
    p.add_argument('datadir', type=str,
                        help='directory name to store output graph(s)')
    p.add_argument('fname', type=str,
                        help='file name which contains data to process')
    p.add_argument('idxs', type=str,
                        help='comma separated index number(s) of data to process')
    args = p.parse_args()
    main()
""")

In [5]:
# Pass the command as a list so it also runs without a shell on Linux/macOS
ret = subprocess.check_output(['python', scriptname, '-h'])
print(ret.decode())
usage: data2graph.py [-h] datadir fname idxs

load image and convert to image

positional arguments:
  datadir     directory name to store output graph(s)
  fname       file name which contains data to process
  idxs        comma separated index number(s) of data to process

optional arguments:
  -h, --help  show this help message and exit
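
The `idxs` argument is a plain comma-separated string, so the work can be divided by giving each process every n-th index. The sketch below illustrates that round-robin split and parses the strings back the same way the script does; `split_round_robin` and `parse_idxs` are hypothetical helper names used only for this illustration.

```python
def split_round_robin(n_items, n_workers):
    """Return one comma-separated index string per worker (round-robin)."""
    return [','.join(map(str, range(start, n_items, n_workers)))
            for start in range(n_workers)]

def parse_idxs(idxs):
    """Parse the string the same way data2graph.py does."""
    return [int(i) for i in idxs.split(',')]

chunks = split_round_robin(10, 3)
print(chunks)  # ['0,3,6,9', '1,4,7', '2,5,8']

# Every index appears exactly once across all workers.
covered = sorted(i for chunk in chunks for i in parse_idxs(chunk))
assert covered == list(range(10))
```

Note that the upper bound of the range is the number of items itself: the last valid index is one less than that. This sketch also assumes there are at least as many items as workers, since an empty string cannot be parsed.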

Run 8 Python processes in parallel (because my PC has 8 cores).

In [6]:
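NCORE = 8
procs = []
for st in range(NCORE):
    # Process st handles indices st, st+NCORE, st+2*NCORE, ... below NUMFIG
    idxs = ','.join(map(str, range(st,NUMFIG,NCORE)))
    cmd = ['python', scriptname, datadir, 'xyz.pkl', idxs]
    # Popen returns immediately, so the notebook stays responsive
    procs.append(subprocess.Popen(cmd))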
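
Because `subprocess.Popen` returns right away, it can be useful to keep the returned objects in a list so that progress and failures can be checked later. The following is a minimal sketch of that idea, not part of the original notebook: the child commands are stand-ins that only sleep briefly, and `workers` is a hypothetical variable name.

```python
import subprocess
import sys

# Stand-in commands: each child just sleeps briefly and exits.
# In the notebook these would be the data2graph.py command lines.
cmds = [[sys.executable, '-c', 'import time; time.sleep(0.2)']
        for _ in range(3)]

workers = [subprocess.Popen(cmd) for cmd in cmds]

# poll() returns None while a child is still running; it never blocks.
running = sum(w.poll() is None for w in workers)
print('%d of %d still running' % (running, len(workers)))

# wait() blocks until the child exits and returns its exit code.
exit_codes = [w.wait() for w in workers]
failed = [i for i, code in enumerate(exit_codes) if code != 0]
print('failed workers:', failed)
```

Calling `poll()` from a later cell gives a quick, non-blocking status check, while `wait()` is the place to stop if the following cells need all the PNG files to exist.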