These are really good lecture notes, explaining the linear algebra, covariance matrices and so on. They help directly with coding.
Friday, December 20, 2013
Tuesday, December 17, 2013
Thursday, December 5, 2013
stats: the covariance matrix
The covariance matrix is the N-dimensional generalisation of the scalar variance in 1 dimension. Here's a simple explanation of it, which I needed in order to code up a test case for fitting a 2D Gaussian with intrinsic scatter.
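To make this concrete, here is a minimal sketch of what such a test case might look like. The values of sigma_x, sigma_y and rho are made up for illustration, and this is only one way of setting it up: build the 2x2 covariance matrix by hand, draw points from the corresponding 2D Gaussian, and check that np.cov recovers the matrix.

```python
import numpy as np

# Hypothetical parameters of a 2D Gaussian (illustrative values only)
sigma_x, sigma_y, rho = 2.0, 0.5, 0.6

# Diagonal terms are the variances; the off-diagonal term is the
# covariance, rho * sigma_x * sigma_y
cov_true = np.array([
    [sigma_x**2,              rho * sigma_x * sigma_y],
    [rho * sigma_x * sigma_y, sigma_y**2],
])

# Draw correlated points from the 2D Gaussian
rng = np.random.default_rng(42)
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov_true, size=50_000)

# rowvar=False: each column is a variable, each row is one observation
cov_est = np.cov(samples, rowvar=False)
print(cov_est)
```

With N = 1 the same call reduces to the ordinary sample variance, which is the sense in which the covariance matrix generalises it.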
astroML: calling hist
If you're trying to use astroML.plotting.hist and it spits blood saying something like:
slice1[axis] = slice(1, None)
IndexError: list assignment index out of range
check the namespace: astroML's hist should simply be called as hist(data), whereas ax.hist() and plt.hist() are the standard Matplotlib methods, which choke on astroML's options.
Wednesday, December 4, 2013
lit: The Habitable Epoch of the Early Universe
by A. Loeb.
That's a fantastically crazy paper (by A. Loeb, who's one of the best known names in areas as diverse as gravitational microlensing/black hole evolution/reionisation and 21 cm signal/high-z GRBs/Event Horizon telescope, and many others). He's written several papers about wild ideas before: exploring the (sad and very distant) future of observational cosmology, bio-markers in white dwarf planets' atmospheres, planets of hypervelocity stars, the search for artificially-illuminated objects in and beyond the Solar System, and cosmology measurements from hypervelocity stars, which show that not all is lost for future cosmologists.
In this paper he looks at the dawn of the Universe, when it was only ~15 million years old. A. Loeb points out that the temperature of the cosmic microwave background was roughly 0-30 °C then, and therefore liquid water could have existed on any solid surface, meaning that there might have been conditions suitable for life as we know it. In order to form any rocky planet one first needs to explode some massive stars (to get some heavier elements like oxygen, iron, silicon, etc., which are what rocky planets are made of). At that time we're left with the formation of the very first stars in the very first, tiny, dark matter haloes at the far end of the density distribution. Calculations show that the number of such protogalaxies was incredibly small at these redshifts.
However, if we assume that the initial density distribution was not perfectly Gaussian (there are many theories explaining why that might have been the case, although Planck and other observations haven't found any evidence of non-Gaussianity yet), there might have been some haloes that had formed massive stars by that time.
It's a marvelous article, although AFAIK it takes much longer than a few Myr for rocky planets to assemble from the accretion disks of stars (and cool down due to the decay of radioactive elements...). But think of it: at some time in the history of the Universe, outer space was warm (and at least a million times denser than now).
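The "warm and a million times denser" remark can be checked on the back of an envelope. This is my own quick sketch rather than anything taken from the paper: it assumes the standard scalings T = T0 (1 + z) for the CMB temperature and (1 + z)^3 for the mean matter density, with T0 = 2.725 K today.

```python
T0_K = 2.725  # present-day CMB temperature in kelvin

def one_plus_z(temp_celsius):
    """Return (1 + z) at which the CMB had the given temperature."""
    return (temp_celsius + 273.15) / T0_K

zp1_cold = one_plus_z(0.0)    # CMB at the freezing point of water
zp1_warm = one_plus_z(30.0)   # CMB at 30 degrees Celsius

# Mean matter density scales as (1 + z)^3 relative to today
density_factor = zp1_cold**3

print(f"1 + z between {zp1_cold:.1f} and {zp1_warm:.1f}")
print(f"density at least {density_factor:.2e} times today's")
```

So 0-30 °C corresponds to 1 + z of roughly 100 to 111, and the cube of 100 is where the factor of a million comes from.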
Python: bootstrapping with sklearn
sklearn.cross_validation.Bootstrap returns indices of random bootstrap samples from your data. I'm not yet very familiar with the use of bootstrapping for two-sample comparison, so I'm using means as a way to characterise the two distributions (that's better than a 2D K-S test, which doesn't exist anyway!). This is of course an incomplete statistic (I have reason to suspect that two of the samples I am working with have different skewnesses). I'm working with three sets of absolute r magnitudes (M_r) here.
Here's how it looks in practice:
import numpy as np
from sklearn import cross_validation

# Let's call our sample array data. It can be N-dimensional.
# len(data) -- total number of data points in my dataset,
# nBoot -- number of bootstrap samples,
# train_size -- bootstrap sample size (proportion of the whole sample, or just a number)

# Here we create an empty array in which we will store the means of the bootstrap samples
means = np.empty((nBoot, ))
# Then we get the class instance, drawing half of the sample with replacement every time
bs = cross_validation.Bootstrap(len(data), nBoot, train_size=0.5, random_state=0)
# Filling the means array while iterating over the bootstrap samples (indexed by train_index):
for i, (train_index, test_index) in enumerate(bs):
    means[i] = np.mean(data[train_index])
I've repeated it for all three distributions I'm interested in, and here is a histogram of all the bootstrap sample means. The difference is just what I expected: basically, some faint galaxies (with M_r < 20 or so) were manually rejected from the green and red distributions, so their means are shifted towards brighter absolute magnitudes. It remains to be seen how important this is. If we think this is worth it, I'll try using other statistics of the distributions some other time.
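A note for anyone trying this later: sklearn.cross_validation.Bootstrap has since been removed from scikit-learn, so here is a rough NumPy-only sketch of the same idea. The function name bootstrap_means, its parameters, and the two synthetic samples are all made up for illustration; they are not my real magnitudes, and the sketch assumes 1D data.

```python
import numpy as np

def bootstrap_means(data, n_boot=1000, frac=0.5, seed=0):
    """Means of n_boot resamples drawn with replacement from 1D data."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    size = max(1, int(frac * len(data)))
    # One row of random indices per bootstrap sample
    idx = rng.integers(0, len(data), size=(n_boot, size))
    return data[idx].mean(axis=1)

# Synthetic stand-ins for two sets of absolute magnitudes (not real data)
rng = np.random.default_rng(1)
sample_a = rng.normal(-20.5, 1.0, size=400)
sample_b = rng.normal(-21.0, 1.0, size=400)

means_a = bootstrap_means(sample_a)
means_b = bootstrap_means(sample_b)

# Shift between the two distributions of bootstrap means
diff = means_a.mean() - means_b.mean()
print(f"shift between bootstrap means: {diff:.2f} mag")
```

Swapping .mean(axis=1) for another statistic computed along axis 1 (a median, or scipy.stats.skew) would be one way of looking at the skewness question mentioned above.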