Thursday, March 27, 2014

A select to compile the final photometry data table

I had to come back to that once more, so here it goes:
SELECT u.califa_id, m.califa_str, m.name, mo.ra, mo.dec,
       u.el_mag, err.u_err, g.el_mag, err.g_err, r.el_mag, err.r_err,
       i.el_mag, err.i_err, z.el_mag, err.z_err,
       0.396*r.elHLR, 0.396*err.r_elHLR_err_lo, 0.396*err.r_el_HLR_err_hi,
       0.396*r.elR90, 0.396*err.r_R90_err_lo, 0.396*err.r_R90_err_hi,
       b.ba, n.pa, 0.396*r_sky.isoA, flags.sum_flag
FROM gc2_u AS u, morph AS m, mothersample AS mo, gc2_errors AS err,
     gc2_g AS g, gc2_r AS r, gc2_i AS i, gc2_z AS z,
     bestBA AS b, nadine AS n, gc2_flags AS flags, gc2_r_sky AS r_sky
WHERE u.califa_id = g.califa_id
  AND g.califa_id = r.califa_id
  AND r.califa_id = i.califa_id
  AND i.califa_id = z.califa_id
  AND z.califa_id = err.califa_id
  AND err.califa_id = b.califa_id
  AND b.califa_id = n.califa_id
  AND n.califa_id = r_sky.califa_id
  AND r_sky.califa_id = flags.califa_id
  AND flags.califa_id = m.califa_id
  AND m.califa_id = mo.califa_id

Thursday, December 5, 2013

stats: the covariance matrix

The covariance matrix is the N-dimensional generalisation of the scalar variance in 1 dimension. Here's a simple explanation of it, which I needed in order to code up a test case for fitting a 2D Gaussian with intrinsic scatter.
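To make that concrete, here's a minimal sketch of such a test case (the function name and the numbers are made up for illustration): build a 2D covariance matrix from two principal-axis dispersions and a position angle, draw a mock sample from the corresponding Gaussian, and check that np.cov roughly recovers the input matrix.

import numpy as np

def cov_2d(sigma1, sigma2, theta):
    # rotation matrix for the position angle theta (radians)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    D = np.diag([sigma1 ** 2, sigma2 ** 2])   # variances along the principal axes
    return np.dot(R, np.dot(D, R.T))

cov = cov_2d(2.0, 0.5, np.pi / 6)
mock = np.random.multivariate_normal([0.0, 0.0], cov, size=5000)

# the sample covariance should be close to the input matrix
print(np.cov(mock, rowvar=False))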

astroML: calling hist

If you're trying to use astroML.plotting.hist and it spits blood saying something like:
slice1[axis] = slice(1, None)
IndexError: list assignment index out of range

check the namespace: astroML's hist should simply be called as hist(data), whereas ax.hist() and plt.hist() are the normal Matplotlib methods, which choke on astroML's extra options.
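A toy example of the difference (the data here are made up, and I'm assuming the usual astroML import path):

import numpy as np
import matplotlib.pyplot as plt
from astroML.plotting import hist   # astroML's hist, not Matplotlib's

data = np.random.normal(size=1000)  # some toy data

fig, ax = plt.subplots()
hist(data, bins='knuth', ax=ax)     # fine: astroML handles the string bin option
# ax.hist(data, bins='knuth')       # Matplotlib's own hist chokes on it, as above
plt.show()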

Wednesday, December 4, 2013

lit: The Habitable Epoch of the Early Universe

by A. Loeb.
That's a fantastically crazy paper. Loeb is one of the best-known names in areas as diverse as gravitational microlensing, black hole evolution, reionisation and the 21 cm signal, high-z GRBs, the Event Horizon Telescope and many others. He has written several papers about wild ideas before: exploring the (sad and very distant) future of observational cosmology, bio-markers in the atmospheres of white dwarf planets, planets of hypervelocity stars, the search for artificially illuminated objects in and beyond the Solar System, and cosmology measurements from hypervelocity stars, which show that not all is lost for future cosmologists.
In this paper he looks at the dawn of the Universe, when it was only ~15 million years old. Loeb points out that the temperature of the cosmic microwave background was roughly 0-30 °C then, and therefore liquid water could have existed on any solid surface, meaning that there might have been conditions suitable for life as we know it. However, to form any rocky planet one needs some massive stars to have exploded beforehand (to produce heavier elements such as oxygen, iron, silicon, etc., which is what rocky planets are made of). At that epoch that leaves only the very first stars, formed in the very first, tiny dark matter haloes at the far end of the density distribution. Calculations show that the number of such protogalaxies was incredibly small at those redshifts.
However, if we assume that the initial density distribution was not perfectly Gaussian (there are many theories explaining why that might have been the case, although Planck and other observations haven't found any evidence of non-Gaussianity yet), there might have been some haloes that had formed massive stars by that time.
It's a marvelous article, although AFAIK it takes much longer than a few Myr for rocky planets to assemble from the accretion disks of their stars (and then to cool down, given the heating from the decay of radioactive elements...). But think of it: at some point in the history of the Universe, outer space was warm (and at least a million times denser than it is now).

Python: bootstrapping with sklearn

sklearn.cross_validation.Bootstrap returns indices of random bootstrap samples drawn from your data. I am not yet very familiar with using bootstrapping for two-sample comparison, so I'm using the means as a way to characterise the two distributions (that's better than a 2D K-S test, which doesn't exist anyway!). This is of course an incomplete statistic (I have reason to suspect that two of the samples I am working with have different skewnesses). I'm working with three sets of absolute r-band magnitudes (M_r) here.
Here's how it looks in practice:

from sklearn import cross_validation
import numpy as np

# let's call our sample array data; it can be N-dimensional

# len(data) -- total number of data points in the dataset,
# nBoot -- number of bootstrap samples,
# train_size -- bootstrap sample size (a proportion of the whole sample, or just a number)

# create an empty array in which we will store the means of the bootstrapped samples

means = np.empty((nBoot, ))

# then we get the class instance, drawing half of the sample with replacement every time

bs = cross_validation.Bootstrap(len(data), nBoot, train_size=0.5, random_state=0)

# fill the means array while iterating over the bootstrap samples (indexed by train_index)

for i, (train_index, test_index) in enumerate(bs):
    means[i] = np.mean(data[train_index])



I've repeated this for all three distributions I'm interested in, and here is a histogram of all the bootstrap sample means. The difference is just what I expected: basically, some faint galaxies (with M_r < 20 or so) were manually rejected from the green and red distributions, so their mean is shifted towards brighter absolute magnitudes. It remains to be seen how important this is. If we think it's worth it, I'll try using other statistics of the distributions some other time.
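For reference, the comparison plot is just a few overlaid histograms of the bootstrap means; a rough sketch, with made-up array names (means_all, means_green, means_red for the three samples):

import matplotlib.pyplot as plt

for m, label in [(means_all, 'full sample'), (means_green, 'green'), (means_red, 'red')]:
    plt.hist(m, bins=30, histtype='step', label=label)
plt.xlabel('mean M_r of the bootstrap sample')
plt.ylabel('N')
plt.legend()
plt.show()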

Tuesday, November 26, 2013