## Friday, March 9, 2012

### Healing holes in arrays in Python: interpolation vs. inpainting

I had solved that interpolation task I was working on previously, by the way. I post my approach here, in case someone finds it useful.

If the bad pixel regions are big, standard methods of interpolation are not applicable. Interpolation works under assumption that the regions are contiguous, whereas this is not necessarily the case. Besides, none of interpolation implementations in python seem to work with masked arrays, and I've tried a metric ton of them. miceuz, who was sitting on this task half night as well, asked about it on stackoverflow, and someone mentioned the right keyword: inpainting.

That's still inventing the data, but the option is having unquantifiable errors in photometry if we leave out the bad pixels or don't mask the bad areas at all.

OpenCV is a huge piece of software, and using it to fill those holes is tantamount to using a hammer to crack a nut. However, I reused this tiny piece of code, adding a [hardcoded ,)] Gaussian weights kernel as I needed to use inverse distance weighting for interpolation. Here's the code:

import numpy as np
import pyfits
from scipy import ndimage
import inpaint

filled = inpaint.replace_nans(NANMask, 5, 0.5, 2, 'iwd')
hdu = pyfits.PrimaryHDU(filled)
hdu.writeto('filled_lm.fits')


inpaint module boils down to this -- I used the code linked above, just lobotomised the Cython:

# -*- coding: utf-8 -*-

"""A module for various utilities and helper functions"""

import numpy as np
#cimport numpy as np
#cimport cython

DTYPEf = np.float64
#ctypedef np.float64_t DTYPEf_t

DTYPEi = np.int32
#ctypedef np.int32_t DTYPEi_t

#@cython.boundscheck(False) # turn of bounds-checking for entire function
#@cython.wraparound(False) # turn of bounds-checking for entire function
def replace_nans(array, max_iter, tol,kernel_size=1,method='localmean'):
"""Replace NaN elements in an array using an iterative image inpainting algorithm.
The algorithm is the following:
1) For each element in the input array, replace it by a weighted average
of the neighbouring elements which are not NaN themselves. The weights depends
of the method type. If method=localmean weight are equal to 1/( (2*kernel_size+1)**2 -1 )
2) Several iterations are needed if there are adjacent NaN elements.
If this is the case, information is "spread" from the edges of the missing
regions iteratively, until the variation is below a certain threshold.
Parameters
----------
array : 2d np.ndarray
an array containing NaN elements that have to be replaced
max_iter : int
the number of iterations
kernel_size : int
the size of the kernel, default is 1
method : str
the method used to replace invalid values. Valid options are
localmean, 'idw'.
Returns
-------
filled : 2d np.ndarray
a copy of the input array, where NaN elements have been replaced.
"""

#    cdef int i, j, I, J, it, n, k, l
#    cdef int n_invalids

filled = np.empty( [array.shape, array.shape], dtype=DTYPEf)
kernel = np.empty( (2*kernel_size+1, 2*kernel_size+1), dtype=DTYPEf )

#    cdef np.ndarray[np.int_t, ndim=1] inans
#    cdef np.ndarray[np.int_t, ndim=1] jnans

# indices where array is NaN
inans, jnans = np.nonzero( np.isnan(array) )

# number of NaN elements
n_nans = len(inans)

# arrays which contain replaced values to check for convergence
replaced_new = np.zeros( n_nans, dtype=DTYPEf)
replaced_old = np.zeros( n_nans, dtype=DTYPEf)

# depending on kernel type, fill kernel array
if method == 'localmean':

print 'kernel_size', kernel_size
for i in range(2*kernel_size+1):
for j in range(2*kernel_size+1):
kernel[i,j] = 1
print kernel, 'kernel'

elif method == 'idw':
kernel = np.array([[0, 0.5, 0.5, 0.5,0],
[0.5,0.75,0.75,0.75,0.5],
[0.5,0.75,1,0.75,0.5],
[0.5,0.75,0.75,0.5,1],
[0, 0.5, 0.5 ,0.5 ,0]])
print kernel, 'kernel'
else:
raise ValueError( 'method not valid. Should be one of localmean.')

# fill new array with input elements
for i in range(array.shape):
for j in range(array.shape):
filled[i,j] = array[i,j]

# make several passes
# until we reach convergence
for it in range(max_iter):
print 'iteration', it
# for each NaN element
for k in range(n_nans):
i = inans[k]
j = jnans[k]

# initialize to zero
filled[i,j] = 0.0
n = 0

# loop over the kernel
for I in range(2*kernel_size+1):
for J in range(2*kernel_size+1):

# if we are not out of the boundaries
if i+I-kernel_size < array.shape and i+I-kernel_size >= 0:
if j+J-kernel_size < array.shape and j+J-kernel_size >= 0:

# if the neighbour element is not NaN itself.
if filled[i+I-kernel_size, j+J-kernel_size] == filled[i+I-kernel_size, j+J-kernel_size] :

# do not sum itself
if I-kernel_size != 0 and J-kernel_size != 0:

# convolve kernel with original array
filled[i,j] = filled[i,j] + filled[i+I-kernel_size, j+J-kernel_size]*kernel[I, J]
n = n + 1*kernel[I,J]
print n

# divide value by effective number of added elements
if n != 0:
filled[i,j] = filled[i,j] / n
replaced_new[k] = filled[i,j]
else:
filled[i,j] = np.nan

# check if mean square difference between values of replaced
#elements is below a certain tolerance
print 'tolerance', np.mean( (replaced_new-replaced_old)**2 )
if np.mean( (replaced_new-replaced_old)**2 ) < tol:
break
else:
for l in range(n_nans):
replaced_old[l] = replaced_new[l]

return filled

def sincinterp(image, x,  y, kernel_size=3 ):
"""Re-sample an image at intermediate positions between pixels.
This function uses a cardinal interpolation formula which limits
the loss of information in the resampling process. It uses a limited
number of neighbouring pixels.
The new image :math:im^+ at fractional locations :math:x and :math:y is computed as:
.. math::
im^+(x,y) = \sum_{i=-\mathtt{kernel\_size}}^{i=\mathtt{kernel\_size}} \sum_{j=-\mathtt{kernel\_size}}^{j=\mathtt{kernel\_size}} \mathtt{image}(i,j) sin[\pi(i-\mathtt{x})] sin[\pi(j-\mathtt{y})] / \pi(i-\mathtt{x}) / \pi(j-\mathtt{y})
Parameters
----------
image : np.ndarray, dtype np.int32
the image array.
x : two dimensions np.ndarray of floats
an array containing fractional pixel row
positions at which to interpolate the image
y : two dimensions np.ndarray of floats
an array containing fractional pixel column
positions at which to interpolate the image
kernel_size : int
interpolation is performed over a (2*kernel_size+1)*(2*kernel_size+1)
submatrix in the neighbourhood of each interpolation point.
Returns
-------
im : np.ndarray, dtype np.float64
the interpolated value of image at the points specified
by x and y
"""

# indices
#    cdef int i, j, I, J

# the output array
r = np.zeros( [x.shape, x.shape], dtype=DTYPEf)

# fast pi
pi = 3.1419

# for each point of the output array
for I in range(x.shape):
for J in range(x.shape):

#loop over all neighbouring grid points
for i in range( int(x[I,J])-kernel_size, int(x[I,J])+kernel_size+1 ):
for j in range( int(y[I,J])-kernel_size, int(y[I,J])+kernel_size+1 ):
# check that we are in the boundaries
if i >= 0 and i <= image.shape and j >= 0 and j <= image.shape:
if (i-x[I,J]) == 0.0 and (j-y[I,J]) == 0.0:
r[I,J] = r[I,J] + image[i,j]
elif (i-x[I,J]) == 0.0:
r[I,J] = r[I,J] + image[i,j] * np.sin( pi*(j-y[I,J]) )/( pi*(j-y[I,J]) )
elif (j-y[I,J]) == 0.0:
r[I,J] = r[I,J] + image[i,j] * np.sin( pi*(i-x[I,J]) )/( pi*(i-x[I,J]) )
else:
r[I,J] = r[I,J] + image[i,j] * np.sin( pi*(i-x[I,J]) )*np.sin( pi*(j-y[I,J]) )/( pi*pi*(i-x[I,J])*(j-y[I,J]))
return r

#cdef extern from "math.h":
#   double sin(double)

The thing that's bugging me is that this algorithm doesn't traverse around the edges of the hole, rather, it indexes the bad pixels incrementally. However, it does several iterations of smoothing, so maybe it's ok. At least, there are no visible spooky artifacts.

1. Hi - I just wanted to say thank you for this post! I've been dealing with some patchy radar data, trying to build a digital elevation model for the Totten Ice Shelf in Antarctica. Your method healed the gaps in my data set beautifully and has save me a bunch of time, so thank you :)

2. Thank you for this post! I found it very useful in filling in some holes in astronomical images. It is a much better approach than filling with np.nanmean.
Cheers!

3. Thanks Matthew! Great to hear it's been useful. Here's a newer, cleaned-up version of the script: https://github.com/Technariumas/Inpainting

1. Hugely useful algorithm, using it to fill holes in DEMs generated from LIDAR data thank you!

4. i'm using your inpainting.py from github to clean up bad pixels and cosmic ray hits from hubble jupiter images.

unfortunately STScI has made it increasingly difficult (although still theoretically possible) to use IRAF tools like fixpix, which i formerly used for this application. there is no replacement for iraf.fixpix that i know of within STScI's analysis software... so i was delighted to find that your code not only replaces fixpix, but improves the results as well !!

5. Curious why you didn't use the inpaint function in OpenCV? https://docs.opencv.org/3.3.1/df/d3d/tutorial_py_inpainting.html