Monday, August 27, 2012

Decent scientific plots with matplotlib

I think matplotlib is poorly documented and too object-oriented to be immediately usable. It tries to be too many things at once. Besides, its default values and default behaviour are sometimes kooky.
For instance, histogram bar widths: they differ from dataset to dataset, and you cannot compare two distributions when that's the case. You have to resort to hacks like this:

  hist, bins = np.histogram(data, bins=10)
  width = 1 * (bins[1] - bins[0])  # factor 1: each bar is exactly one bin wide
And I think it's a bloody hack, calling a histogram function from another module (NumPy) just to be able to call matplotlib's histogram plotting routine. Nevertheless, matplotlib is flexible and, since I already use Python to pull my data from databases, I decided to give it a try when I had to prepare some plots for a review poster.
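One way around the bar-width problem is to compute a single set of bin edges that spans both datasets and hand the same edges to every histogram call. The sketch below assumes two made-up samples (`data_a`, `data_b`) and ten bins; none of these names come from my plotting code.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data_a = rng.normal(0.0, 1.0, 500)  # made-up sample data
data_b = rng.normal(1.0, 2.0, 500)

# One set of bin edges covering the full range of both datasets
lo = min(data_a.min(), data_b.min())
hi = max(data_a.max(), data_b.max())
shared_bins = np.linspace(lo, hi, 11)  # 11 edges give 10 equal-width bins

fig, ax = plt.subplots()
# Passing an array as `bins` makes hist() use exactly these edges
counts_a, edges_a, _ = ax.hist(data_a, bins=shared_bins, alpha=0.5, label="a")
counts_b, edges_b, _ = ax.hist(data_b, bins=shared_bins, alpha=0.5, label="b")
ax.legend()
```

Because both calls share the same edges, the bars line up and have identical widths, so the two distributions can be compared directly.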
So, what makes an ugly plot look decent? First of all, ticks:

      x_major_locator = plt.MultipleLocator(plotTitles.xMajorTicks)
      x_minor_locator = plt.MultipleLocator(plotTitles.xMinorTicks)
      y_major_locator = plt.MultipleLocator(plotTitles.yMajorTicks)
      y_minor_locator = plt.MultipleLocator(plotTitles.yMinorTicks)
      ax.xaxis.set_major_locator(x_major_locator)
      ax.xaxis.set_minor_locator(x_minor_locator)
      ax.yaxis.set_major_locator(y_major_locator)
      ax.yaxis.set_minor_locator(y_minor_locator)
They have to be set separately for each axis; I pass them as parameters from a wrapper class. Then, hatched bars.
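A self-contained sketch of both steps, tick locators plus hatched bars, might look like this. The bar heights, the tick spacings and the `//` hatch pattern are made-up values for illustration; in my own code the spacings come from the wrapper class.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

counts = np.array([3, 7, 12, 9, 4])  # made-up bar heights
left_edges = np.arange(len(counts))  # bars start at x = 0, 1, 2, ...

fig, ax = plt.subplots()
# Unfilled bars with a black outline and diagonal hatching print well in greyscale
bars = ax.bar(left_edges, counts, width=1.0, align="edge",
              fill=False, edgecolor="black", hatch="//")

# Major and minor ticks, set separately for each axis
ax.xaxis.set_major_locator(MultipleLocator(1))
ax.xaxis.set_minor_locator(MultipleLocator(0.5))
ax.yaxis.set_major_locator(MultipleLocator(5))
ax.yaxis.set_minor_locator(MultipleLocator(1))
```

Repeating the hatch character (for example `"////"`) gives a denser pattern, which helps when several hatched histograms share one plot.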
Some parameters, redefining matplotlib's defaults (these plots are for printouts, so font sizes are big):

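params = {'backend': 'ps',
          'axes.labelsize': 10,    # applies to both the x and the y axis label
          'legend.fontsize': 10,
          'xtick.labelsize': 8,
          'ytick.labelsize': 8,
          'text.usetex': True,     # needs a working LaTeX installation
          'font.family': 'serif',  # 'font' on its own is not an rcParams key
          'font.size': 16}         # 'text.fontsize' was an old alias of this key
# 'ylabel.fontsize' is not an rcParams key either; for a single plot use
# ax.set_ylabel(..., fontsize=20) instead.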
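A dictionary like this does nothing until it is applied. Here is a minimal sketch, assuming the usual route through `rcParams.update`. The check against the known keys is an addition for illustration (matplotlib rejects keys it does not know), and `wanted`, `valid` and `skipped` are names made up for this example. The backend and `text.usetex` entries are left out so the sketch runs without LaTeX.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt

wanted = {
    "axes.labelsize": 10,
    "legend.fontsize": 10,
    "xtick.labelsize": 8,
    "ytick.labelsize": 8,
    "font.family": "serif",
    "font.size": 16,
    "ylabel.fontsize": 20,  # not a real key, kept here to show the check
}

# Keep only the keys this matplotlib version actually knows about
valid = {k: v for k, v in wanted.items() if k in matplotlib.rcParams}
skipped = sorted(set(wanted) - set(valid))

# Apply the settings; they affect every figure created afterwards
plt.rcParams.update(valid)
```

Printing `skipped` after an upgrade is a cheap way to notice that a key has been renamed or removed.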
If you'd like to have different symbols (markers) in a scatterplot, this is useful:

      markers = itertools.cycle('.^*')
      # next(markers) replaces markers.next(), which only works in Python 2
      p1 = ax.plot(gd.data[0], gd.data[1], linestyle='none', marker=next(markers),
                   markersize=8, color=gd.colour, mec=gd.colour, alpha=1)
But why are there double symbols in the legend? It makes no sense.
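The doubled symbols most likely come from the legend's `numpoints` setting, which defaulted to 2 in matplotlib releases before 2.0; passing `numpoints=1` to `legend()` asks for a single marker per entry. The sketch below cycles through the three markers and sets that option. The random data and the sample names are made up for illustration.

```python
import itertools
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
markers = itertools.cycle(".^*")  # starts over after the third marker

fig, ax = plt.subplots()
used = []
for name in ("sample A", "sample B", "sample C", "sample D"):
    x, y = rng.random(20), rng.random(20)  # made-up scatter data
    marker = next(markers)
    used.append(marker)
    ax.plot(x, y, linestyle="none", marker=marker, markersize=8, label=name)

# One marker per legend entry instead of two
legend = ax.legend(numpoints=1)
```

With four datasets and three markers, the fourth dataset reuses the first marker, so the cycle string should be at least as long as the number of datasets that need to be told apart.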
I would not claim the plots shown are publication-quality (just look at the slightly differing histogram bar widths). But they look way better than the default plots one would get with matplotlib.
