What this article covers:
This page shows how to read data out of a tar.gz file by expanding it in RAM rather than on the HDD/SSD, and then plot that data using matplotlib.
In work such as simulation, data logging, and image processing, you may have to deal with a great many small files. In such situations, bundling them into a tar.gz file is an efficient way to speed up data transfer and reduce the number of files. To generate figures, however, you have to extract the data from the tar.gz file again. If you expand the data directly in RAM instead of on the HDD/SSD, as shown on this page, you don't have to deal with annoying intermediate files.
See also:
This page shows my suggestion for processing data and generating figures in parallel by running external Python scripts.
Python Matplotlib Tips: Arrange multiple images in one large image using Python PIL.Image
An example tip for the PIL module: arrange small figures into one large figure.
In [1]:
import platform
print('python: '+platform.python_version())
import matplotlib.pyplot as plt
from matplotlib import __version__ as matplotlibversion
print('matplotlib: '+matplotlibversion)
import numpy as np
print('numpy: '+np.__version__)
%matplotlib inline
In [2]:
from os import makedirs
from os.path import join
subdir = "tardir"
# Create the output directory; do nothing if it already exists
makedirs(subdir, exist_ok=True)
ts = np.linspace(0,2,11)
shift = np.linspace(0,5,11)
# Write one small file per shift value: row 0 is ts, row 1 is ts**2+phi
for num,phi in enumerate(shift):
    wave = np.array([ts,ts**2+phi])
    with open(join(subdir,"%d.dat"%num), "wb") as f:
        np.savetxt(f,wave,delimiter=",")
Check one of the output dat files
In [3]:
with open(join(subdir,"0.dat"), "r") as f:
    print(f.readlines())
Compress the bulk files into one tar.gz file
In [4]:
import subprocess
# Pass the command as a list so it also runs without a shell (e.g. on Linux/macOS)
subprocess.call(["tar", "cvzf", "tardir.tar.gz", "tardir"])
Out[4]:
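If the external tar command is not available, the same kind of archive can probably be created with Python's standard tarfile module instead. The following is only a minimal sketch under assumptions: the scratch directory from tempfile, the three placeholder files, and the variable names are hypothetical and not part of the original notebook.

```python
import os
import tarfile
import tempfile

# Hypothetical scratch directory so the sketch does not touch real data.
workdir = tempfile.mkdtemp()
srcdir = os.path.join(workdir, "tardir")
os.makedirs(srcdir, exist_ok=True)

# Write three small placeholder files to be archived.
for num in range(3):
    with open(os.path.join(srcdir, "%d.dat" % num), "w") as f:
        f.write("0.0,1.0,2.0\n")

# "w:gz" writes a gzip-compressed archive; arcname stores relative paths.
archive = os.path.join(workdir, "tardir.tar.gz")
with tarfile.open(archive, mode="w:gz") as tar:
    tar.add(srcdir, arcname="tardir")

# Read the member names back to confirm what was stored.
with tarfile.open(archive, mode="r:gz") as tar:
    names = sorted(tar.getnames())
print(names)
```

Because tar.add() is given a directory, it adds the files inside it recursively, so no shell command is needed.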
Remove bulk files
In [5]:
import shutil
shutil.rmtree(subdir)
Check the filenames in tardir.tar.gz
In [6]:
import tarfile
with tarfile.open("tardir.tar.gz", mode="r:gz") as tar:
    print(tar.getnames())
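Besides the names, each member's size and type can be inspected without extracting anything. The sketch below builds a tiny placeholder archive in an io.BytesIO buffer so that it is self-contained; the member names and contents are made up for illustration.

```python
import io
import tarfile

# Build a tiny gzip-compressed archive entirely in memory (placeholder data).
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    for num in range(2):
        payload = ("%d.0,1.0\n" % num).encode("utf-8")
        info = tarfile.TarInfo(name="tardir/%d.dat" % num)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

# Rewind and list each member's name, size in bytes, and whether it is a file.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as tar:
    listing = [(m.name, m.size, m.isfile()) for m in tar.getmembers()]
print(listing)
```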
Plot each dataset by extracting the files into RAM rather than onto the HDD/SSD
In [7]:
import csv
from io import StringIO
In [8]:
with tarfile.open("tardir.tar.gz", mode="r:gz") as tar:
    for tarinfo in tar:
        if not tarinfo.isfile(): continue
        # Extract each file in tar.gz as binary
        binary = b''.join(tar.extractfile(tarinfo).readlines())
        # Convert binary to string
        strdata = binary.decode("utf-8")
        # Convert string to np.array
        arr = np.array(list(csv.reader(StringIO(strdata),delimiter=',')),dtype="float32")
        # Draw figure
        plt.plot(arr[0],arr[1])
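As a possible alternative to csv.reader, the decoded text can be handed to np.loadtxt through an io.StringIO object. This is a sketch rather than the method used in this article: the in-memory archive, the member name tardir/0.dat, and the arrays dictionary are assumptions made so the example runs on its own. Note that np.loadtxt returns float64 by default, not float32.

```python
import io
import tarfile

import numpy as np

# Placeholder archive built in memory: one file with two comma-separated rows.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    payload = b"0.0,1.0,2.0\n0.0,1.0,4.0\n"
    info = tarfile.TarInfo(name="tardir/0.dat")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Read every regular file into a NumPy array without writing to disk.
buffer.seek(0)
arrays = {}
with tarfile.open(fileobj=buffer, mode="r:gz") as tar:
    for member in tar:
        if not member.isfile():
            continue
        text = tar.extractfile(member).read().decode("utf-8")
        arrays[member.name] = np.loadtxt(io.StringIO(text), delimiter=",")

print(arrays["tardir/0.dat"].shape)
```

Each array can then be plotted as before, for example with plt.plot(arr[0], arr[1]).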
If you already know the names of the files to be plotted, you can extract their data from the tar file directly, as follows:
In [9]:
toplotfile1 = subdir+"/"+"0.dat"
toplotfile2 = subdir+"/"+"5.dat"
with tarfile.open("tardir.tar.gz", mode="r:gz") as tar:
    # plot toplotfile1
    binary = b''.join(tar.extractfile(toplotfile1).readlines())
    strdata = binary.decode("utf-8")
    arr = np.array(list(csv.reader(StringIO(strdata),delimiter=',')),dtype="float32")
    plt.plot(arr[0],arr[1])
    # plot toplotfile2
    binary = b''.join(tar.extractfile(toplotfile2).readlines())
    strdata = binary.decode("utf-8")
    arr = np.array(list(csv.reader(StringIO(strdata),delimiter=',')),dtype="float32")
    plt.plot(arr[0],arr[1])
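When members are looked up by name, two failure cases are worth handling: tar.extractfile() raises KeyError if the name is not in the archive, and returns None if the member is not a regular file (for example, a directory). The helper below, read_member_text, is a hypothetical name introduced here as a sketch; the in-memory archive is placeholder data.

```python
import io
import tarfile


def read_member_text(tar, name, encoding="utf-8"):
    """Return the decoded contents of one regular file in an open TarFile."""
    try:
        fileobj = tar.extractfile(name)
    except KeyError:
        raise FileNotFoundError("no member named %r in archive" % name) from None
    if fileobj is None:
        # extractfile() returns None for directories and other non-file members.
        raise IsADirectoryError("%r is not a regular file" % name)
    with fileobj:
        return fileobj.read().decode(encoding)


# Placeholder archive in memory: one directory entry and one data file.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    dirinfo = tarfile.TarInfo(name="tardir")
    dirinfo.type = tarfile.DIRTYPE
    tar.addfile(dirinfo)
    payload = b"0.0,1.0\n"
    info = tarfile.TarInfo(name="tardir/0.dat")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as tar:
    text = read_member_text(tar, "tardir/0.dat")
    try:
        read_member_text(tar, "tardir/missing.dat")
        missing_error = None
    except FileNotFoundError as exc:
        missing_error = str(exc)

print(repr(text))
print(missing_error)
```

The returned string can be passed to csv.reader through StringIO in the same way as in the cells above, which avoids repeating the extract-and-decode lines for every file.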