In the last few days I’ve been fortunate to have my research work overlap with my class work on the topic of computer vision. I started to familiarize myself with the following concepts:
Extrinsic (world-to-camera coordinate transformation) and intrinsic (internal camera parameters, i.e. how the camera is “tuned”) matrices
This was by far the most complicated concept covered and I have only a surface-level understanding of it as of now. The hardest part was the math for how 3D space is mapped onto a 2D plane (the image). One thing that was intuitive is that to reconstruct a representation of 3D space, we need at least two images (stereo images, like our eyes!) or two images at different time steps. This makes sense: when we take a picture we lose a dimension, so we need at least one other source of information to reconstruct depth.
I didn’t know that cameras have to be “tuned” (calibrated) to find these parameters, or that the camera is effectively doing all of this math for us. Pretty incredible.
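For my own notes, the textbook pinhole projection is the piece that ties the two matrices together (this is the standard form, not anything specific to my research; $f_x, f_y$ are the focal lengths and $(c_x, c_y)$ the principal point that calibration recovers):

$$ s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \underbrace{\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}}_{\text{intrinsic } K} \underbrace{\begin{bmatrix} R & t \end{bmatrix}}_{\text{extrinsic}} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} $$

The extrinsic part $[R \mid t]$ moves a world point into the camera’s frame, $K$ projects it onto the pixel grid, and the scale factor $s$ (the depth) is exactly what gets thrown away, which is why a second image is needed to recover it.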
In particular, the research I am working on uses Sobel edge detection, but the core idea behind most edge detection is similar. Essentially you have a “kernel” (fancy word for a matrix used in convolution). On one side you have negative numbers, on the other side positive:
$$ G_x = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix} $$
The matrix above would detect vertical edges; transpose it to get $G_y$ for horizontal edges. The idea is that this convolution measures the change in pixel intensity across the 0 row/column, which makes sense because an edge is just a place where the derivative of pixel intensity is large.
Because this transformation is linear, we can say that the gradient of the image is $\nabla f = (G_x, G_y)$, and we can get the magnitude of this gradient via $\sqrt{G_x^2+G_y^2}$. Then we can set some threshold for this magnitude above which we say that a pixel is in fact on an edge. Pretty cool.
The exact values in the kernel change slightly depending on which technique you adopt (Sobel, for example, weights the middle row/column by 2), but the core idea is the same across most.
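Here is a rough sketch of the whole pipeline in Python with NumPy and SciPy; the kernel values, test image, and threshold are placeholders for illustration, not the actual parameters used in the research:

```python
import numpy as np
from scipy.ndimage import convolve

# Kernels as in the matrices above; Sobel would weight the middle
# row/column by 2, i.e. [-1, 0, 1], [-2, 0, 2], [-1, 0, 1].
Gx = np.array([[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]], dtype=float)
Gy = Gx.T  # transposed kernel responds to horizontal edges

def edge_map(image, threshold=50.0):
    """Return a boolean mask of edge pixels for a grayscale image (0-255)."""
    img = image.astype(float)
    gx = convolve(img, Gx)              # horizontal intensity change
    gy = convolve(img, Gy)              # vertical intensity change
    magnitude = np.sqrt(gx**2 + gy**2)  # gradient magnitude at each pixel
    return magnitude > threshold        # "is this an edge?" by thresholding

# Example: a synthetic image with a sharp vertical edge down the middle
test = np.zeros((8, 8))
test[:, 4:] = 255
print(edge_map(test))
```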
Histogram equalization can be used to increase the contrast of an image. For example, the grad student I am doing research with is using it to increase the contrast of images taken from an AUV, because the images are typically low-light and most of the pixels fall within similar intensity values. This leads to an image PDF (histogram) that looks something like this:
*(X-axis: pixel intensity; Y-axis: pixel count.)*
Low-contrast images are hard to analyze and detect features in. Consider the edge detection above: even if there were an edge, if the change in pixel intensity across it was only from 200 → 215, would that be enough to classify it as an edge?
So, histogram equalization proposes a solution.
Essentially we set some threshold value and chop off everything in the histogram above it (to make sure the next step isn’t dominated by those values). Then we say $G_{new} = 255*CDF(G_{original})$, where the CDF is built from the image’s own histogram, i.e. the fraction of pixels at or below a given intensity. This of course assumes we are working with gray values from 0-255, but it can be adjusted to multiple color channels quite easily. This increases the contrast considerably:
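A minimal NumPy sketch of the core remapping step (leaving out the clipping step, and assuming an 8-bit grayscale image) might look like this:

```python
import numpy as np

def equalize_histogram(image):
    """Remap an 8-bit grayscale image so its intensity CDF becomes ~linear."""
    img = image.astype(np.uint8)

    # Histogram of gray levels 0-255, then the normalized CDF.
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    cdf = hist.cumsum() / img.size          # fraction of pixels <= each level

    # G_new = 255 * CDF(G_original), applied as a lookup table.
    lookup = np.round(255 * cdf).astype(np.uint8)
    return lookup[img]

# Example: a dim, low-contrast image crammed into the 100-130 intensity range
dim = np.random.randint(100, 131, size=(64, 64), dtype=np.uint8)
bright = equalize_histogram(dim)
print(dim.min(), dim.max(), "->", bright.min(), bright.max())
```

In practice a library routine (e.g. OpenCV’s `equalizeHist`) would normally handle this, but the formula above is the whole trick.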