Last week I hastily wrote some code to search the NSAtlas. It was almost OK for a single galaxy, but the array is huge and I had to iterate through it 1000 times: each loop took 1 minute, so I would have had to wait all day for the full query to complete.
I've replaced the 'if' tests inside the loop with numpy.where statements and used the collections module: here. It is almost usable, though it still takes 15 minutes to search for some variable for 1000 galaxies, and the machine slows down to an almost complete halt.
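To illustrate the kind of change described above, here is a minimal sketch comparing an explicit 'if' test inside a loop with a numpy.where call. The array `ra`, the tolerance and both function names are made up for the example; they are not taken from the original search code.

```python
import numpy as np

# Hypothetical stand-in for one catalogue column (e.g. right ascension in degrees).
ra = np.array([10.0, 150.2, 150.3, 200.5, 150.25])

def find_loop(values, target, tol):
    """Element-by-element search with an explicit 'if' test."""
    matches = []
    for i, v in enumerate(values):
        if abs(v - target) < tol:
            matches.append(i)
    return matches

def find_where(values, target, tol):
    """Same search, vectorised: np.where returns a tuple of index arrays."""
    return np.where(np.abs(values - target) < tol)[0]

loop_idx = find_loop(ra, 150.25, 0.1)
where_idx = find_where(ra, 150.25, 0.1)
```

Both versions return the same indices; the numpy.where version does the comparison in compiled code rather than in the Python interpreter, which is usually where the speed-up on a large array comes from.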
There is a significant overhead associated with opening and closing files, especially when the file in question is large (0.5 GB in this case). Instead of calling the function from within a loop, I now pass it an array of search arguments and loop through that array inside the function, so the file is not reopened for every galaxy. This reduced the running time to 16 seconds. A roughly 60-fold improvement in two minutes.
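The difference between the two call patterns can be sketched roughly as follows. This is only an illustration under assumptions: a small temporary .npy file stands in for the real catalogue, and `load_catalogue`, `search_one` and `search_many` are hypothetical names, not the functions from the actual code.

```python
import os
import tempfile
import numpy as np

# Hypothetical stand-in for the large catalogue file: a small .npy array of IDs.
tmpdir = tempfile.mkdtemp()
catalogue_path = os.path.join(tmpdir, "catalogue.npy")
np.save(catalogue_path, np.arange(100, 110))

stats = {"loads": 0}  # counts how often the file is actually read

def load_catalogue(path):
    stats["loads"] += 1
    return np.load(path)

def search_one(path, target):
    """Slow pattern: every call reopens and rereads the file."""
    data = load_catalogue(path)
    return np.where(data == target)[0]

def search_many(path, targets):
    """Faster pattern: read the file once, then loop over the targets inside."""
    data = load_catalogue(path)
    return [np.where(data == t)[0] for t in targets]

targets = [101, 105, 109]

# Calling the single-target function from a loop: one file read per target.
slow = [search_one(catalogue_path, t) for t in targets]
loads_slow = stats["loads"]

# Passing all targets at once: a single file read.
stats["loads"] = 0
fast = search_many(catalogue_path, targets)
loads_fast = stats["loads"]
```

With three targets the first pattern reads the file three times and the second reads it once; with 1000 galaxies and a 0.5 GB file, that difference dominates the running time.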
I'd also like to mention a fine module called collections, which helped me find the overlap between two lists, like this:
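import collections

# Count how many times each value occurs in each of the two lists
a_multiset = collections.Counter(list(ras[0]))
b_multiset = collections.Counter(list(decs[0]))
# '&' keeps only the values present in both, with the smaller of the two counts
overlap = list((a_multiset & b_multiset).elements())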
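For a self-contained illustration of what the Counter intersection does, here is a small sketch with made-up index lists (`a` and `b` are hypothetical, standing in for `ras[0]` and `decs[0]`). It also shows `numpy.intersect1d` for comparison, which is an alternative I am adding here, not something the code above uses.

```python
import collections
import numpy as np

# Hypothetical index lists; in the post these would come from ras[0] and decs[0].
a = [1, 2, 2, 3, 5]
b = [2, 2, 2, 5, 7]

# Multiset intersection: each common value is kept min(count in a, count in b) times.
overlap = sorted((collections.Counter(a) & collections.Counter(b)).elements())

# Set-style intersection with numpy: each common value appears once, sorted.
unique_overlap = np.intersect1d(a, b)
```

If the lists never contain repeated values (as is typically the case for indices returned by numpy.where), both approaches give the same result; they differ only in how duplicates are treated.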