Tuesday, April 24, 2012

numpy: combining 'less' and 'more' operators in where expression, bitwise AND

This caused a major problem last night, so I'm putting it here.

Efficiency 101: reading large FITS table with Python

Last week I hastily wrote some code to search the NSAtlas. It was almost ok for a single galaxy: the array is huge, and iterating through it 1000 times...it took 1 minute for each loop, so I would have to wait all day for the full query to complete.

I've changed the 'if' loop to numpy.where statements and used the collections module: here. It is almost usable, though it takes 15 minutes to search for some variable for 1000 galaxies, and the machine slows down to an almost complete halt.

There is a significant overhead associated with opening and closing files, especially as the one in question was quite huge (0.5 GB) in this case. Not calling the function from within a loop, but passing an array of search arguments and then looping through it within the function reduced the running time to 16 seconds. A 60-fold improvement in two minutes.

I'd also like to mention a fine module called collections, which helped me find an overlap between two lists, like that:

      a_multiset = collections.Counter(list(ras[0]))
      b_multiset = collections.Counter(list(decs[0]))
      overlap = list((a_multiset & b_multiset).elements())

Wednesday, April 18, 2012

NSAtlas, surface photometry

I wrote some [terribly inefficient, as of now] code for searching NASA Sloan Atlas, given a ra and dec of a galaxy. Just like I expected, my photometry results (SDSS r band) fall somewhere in between the original SDSS photometry and Sloan Atlas (they re-did the photometry, using more sophisticated sky-fitting techniques). And I am still unsure if mine makes sense. Just a good point illustrating the lesson from the previous post: due to the sheer number of pixels in the outskirts of the galaxy, the fine systematic differences on where to stop integrating give rise to significant variations in the final brightness.

Monday, April 16, 2012

Surface photometry

I used to think that galaxy photometry is difficult only in that sense that it necessarily includes errors. As galaxy edges are diffuse, it's impossible to decide where the limit is, so any estimate is subjective and uncertain. I used to think that those are only second-order errors, important in statistical sense (every approach brings its own systematics), but not for individual galaxies. Last week I was struck by a simple plot I made -- a graph showing cumulative flux, flux per pixel and flux at a ring of given distance vs. distance from the galaxy center. The cumulative flux curve does not flatten or grow linear as one would be inclined to think -- as the number of pixels at each distance increases quadratically, their contributed flux increases as well. Mean flux per pixel, however, does flatten (and gives us a rough estimate of the sky value). It might just be a pretty good measure, but I'm worried about gradient in SDSS sky.

Friday, April 13, 2012

SDSS database utilities

I've digested some available scripts for SDSS into my code, namely, Min-Su Shin's code that fetches SDSS ugriz magnitudes (actually, not necessarily, any query can do) from SDSS database, and Tamas Budavari's sqlcl.py. The first one was disemboweled to be a callable function instead of standalone command-line tool. It's in github.

Thursday, April 12, 2012

Latex degree symbol

Kelvins are official and basic, but I needed to make my students be aware of the conversion between Kelvin and Celsius scales. I'll probably remember that, but will keep it here for digital posterity: $^{\circ}$

Wednesday, April 4, 2012

Creating a discrete Gaussian kernel with Python

Discrete Gaussian kernels are often used for convolution in signal processing, or, in my case, weighting. I used some hardcoded values before, but here's a recipe for making it on-the-fly.

def gauss_kern(size, sizey=None):
    """ Returns a normalized 2D gauss kernel array for convolutions """
    size = int(size)
    if not sizey:
        sizey = size
        sizey = int(sizey)
    x, y = mgrid[-size:size+1, -sizey:sizey+1]
    g = exp(-(x**2/float(size)+y**2/float(sizey)))
    return g / g.sum()

Note that 'size' here refers to (shape[0]/2 - 1), i.e. the resulting kernel side is size*2 + 1 long. I need to use a kernel with a central value of 1, so I adjusted the normalisation accordingly:

def gauss_kern(size, sizey=None):
    """ Returns a normalized 2D gauss kernel array for convolutions """
    size = int(size)
    if not sizey:
        sizey = size
        sizey = int(sizey)
    x, y = scipy.mgrid[-size:size+1, -sizey:sizey+1]
    g = scipy.exp(-(x**2/float(size)+y**2/float(sizey)))
    return g / g.max()

Tuesday, April 3, 2012

A quick way to check if a list of files is complete

That's a pretty dumb way to do it, but I'm tired. It involves touch, however, it doesn't check if new files are created.

# -*- coding: utf-8 -*-
import os
import csv
import string

def touch(fname, times = None):
    with file(fname, 'a'):
        os.utime(fname, times)

for i in range(0, 939):
  with open('../data/maskFilenames.csv', 'rb') as f:
    mycsv = csv.reader(f)
    mycsv = list(mycsv)
    fname = string.strip(mycsv[i][1])
    print fname

Check if files in a list exist with Python

I needed to test whether a large list of galaxies have corresponding mask files in a certain directory. The tentative mask filenames were created by a code that cross-correlated internal survey galaxy IDs with their respective NED names.

  wrongFilenames = []
  for i in range(0, noOfGalaxies):
    maskFile = GalaxyParameters.getMaskUrl(listFile, dataDir, simpleFile, i)
    print i, 'maskfile: ', maskFile  
      f = open(maskFile)
      print "maskfile exists:", maskFile
    except IOError as e:

Creating masks with GIMP

Last time I had to make some really fine masking, and I resorted to GIMP. At least under Linux it can read and save fits files, albeit it treats them as _images_, not _data_, which I consider fundamentally wrong in many cases. So, you open the fits file in GIMP, draw something white or black, using any tool and save it. However, there are many pixels that are not 0 or 1, meaning that the image cannot be used as mask. Here's a script to normalise it back. It is cautious, meaning all pixels that are not equal to 0 will be treated as ones (and your mask may get bigger than you were expecting, at least theoretically).

masked = np.where(mask != 0) mask[masked] = mask[masked]/mask[masked]

Monday, April 2, 2012

WCS to pixel values and vice versa in Python

It's often necessary to convert pixel coordinates within a binary image to physical(ra, dec) sky coordinates. For instance, one might need to find brightness at the central pixel of the galaxy, calculate flux, any photometry-related task would do, actually. I used astLib.astWCS for this purpose, though I think pywcs might work just as well.
from astLib import astWCS


center = WCS.wcs2pix(ra, dec)
print center
print 'brightness at the galaxy center', img[center[1], center[0]] 
#thanks, numpy -- python indexes arrays ass-backward (or, rather, column-backward) for some reason. One have to be careful not to swap them.

Getting tsField files from SDSS

I'm always looking for this or that little piece of code I really wrote sometime ago but can't remember where I put it. Then I end up rewriting it because it is faster than grepping entire work directory and sorting through the offal. I wrote about batch downloading of SDSS images, now it seems I need a specific pipeline parameter from all 900 tsField files. Here's a oneliner (from a multi-line blob of code I'm now keeping at GitHub).
wget http://das.sdss.org/imaging/'+run+'/'+rerun+'/calibChunks/'+camcol+'/tsField-'+runstr+'-'+camcol+'-'+rerun+'-'+field_str+'.fit

After downloading (or just reading the parameter from header and moving happily along, I'm currently figuring that out)-- no downloading. I'm reading the value from header via http, as pyfits are capable of that.