IMAGE ANALYSIS INTRO USING PYTHON & OPENCV
It’s been a while since I first wrote about how useful computer vision can be in product development, and I recently put together a quick demo for other engineers at my new gig (@ Continuum) that is cleaner and more thorough than previous versions (plus it uses the cv2 library instead of the deprecated cv library I used before).
(and of course it’s written in python)
…
…
…
Image analysis is hugely powerful, particularly in the context of product development. Most of the challenges in computer vision (and AI in general) come from trying to process unstructured and/or uncontrolled information.
Fortunately, we product development engineers spend a bunch of time setting up experiments in the lab. In those cases we have much more control over lighting and what we’re tracking than we would if we were trying to track something in the “real” world.
So to kick off with the basics we’re going to take a look at an image and see if we can identify an object in it!
Python’s most interesting functionality comes from its wide range of libraries. These are typically imported at the top of a file using the syntax shown below.
In [1]:
from ipywidgets import interact #this allows interactivity in the notebook
import cv2 # a computer vision library
import matplotlib.pyplot as plt #a plotting library
import os #library for navigating the os
First create a variable that holds the location of the folder. Then list all the files in that folder using the listdir function from the os library.
In [2]:
file_dir = r"C:\Users\cloughnane\code\demo"
print os.listdir(file_dir)
Out [2]:
['.ipynb_checkpoints', 'abduction.avi', 'ball.jpg', 'cv blog video.mov', 'Image Analysis Tutorial.ipynb', 'Untitled.ipynb', 'Untitled1.ipynb']
Next create a variable that has the full path of the file we want
In [3]:
filepath = file_dir+r'\ball.jpg'
print filepath
Out [3]:
C:\Users\cloughnane\code\demo\ball.jpg
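As a side note, os.path.join from the os library (already imported above) will build the same path in a way that also works on other operating systems. A minimal sketch, using the same file_dir and filename as above:

```python
import os

# equivalent to file_dir + r'\ball.jpg' on Windows, but portable across operating systems
filepath = os.path.join(file_dir, 'ball.jpg')
print(filepath)
```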
Now that we’ve got the file location, we can read the image in using the imread function from the cv2 library and assign it to the variable img. Once we’ve read it in we’ll print the shape property of the img array. We can also query the object using the ? syntax.
In [4]:
img = cv2.imread(filepath)
print img.shape
print '------'
print img
Out [4]:
(1080, 1920, 3)
------
[[[195 147 81]
  [196 148 82]
  [197 149 83]
  ...,
  [210 157 94]
  [211 157 96]
  [211 157 96]]

 [[195 147 81]
  [195 147 81]
  [196 148 82]
  ...,
  [214 161 98]
  [214 160 99]
  [213 159 98]]

 [[196 148 82]
  [196 148 82]
  [195 147 81]
  ...,
  [210 159 96]
  [210 159 97]
  [209 158 96]]

 ...,
 [[229 184 123]
  [228 183 122]
  [228 183 122]
  ...,
  [220 173 117]
  [220 173 117]
  [220 173 117]]

 [[228 182 124]
  [228 182 124]
  [228 182 124]
  ...,
  [221 174 118]
  [221 174 118]
  [221 174 118]]

 [[230 184 126]
  [230 184 126]
  [230 184 126]
  ...,
  [221 174 118]
  [222 175 119]
  [221 174 118]]]
Note the shape of the image: (1080, 1920, 3) indicates that the image has 1080 rows (y dimension), 1920 columns (x dimension), and 3 channels (Blue, Green, and Red).
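Because the image is just a NumPy array, you can inspect individual pixels by indexing it. A quick sketch (the row and column numbers here are arbitrary, and the printed values will simply be whatever happens to be at those locations):

```python
# top-left pixel: a 3-element array of (Blue, Green, Red) values
print(img[0, 0])

# just the red value (channel index 2 in BGR) of the pixel at row 500, column 1000
print(img[500, 1000, 2])
```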
Let’s take a look at the image itself. To do that we’ll use the imshow function from matplotlib.pyplot (which we’ve shortened to plt)
In [5]:
plt.imshow(img)
Out[5]:
<matplotlib.image.AxesImage at 0x7251d90>

Much better, although the colors seem a bit off. This is a quirk where the cv2 library reads images in as BGR (Blue Green Red), but the plt.imshow function assumes RGB.
This is a little annoying for displaying images, but doesn’t really matter for analysis: the RGB color space is pretty useless for analyzing images (as we will see); it’s much more useful for generating them.
For now, let’s overwrite the img variable with an RGB image, using the cv2.cvtColor function, and then display it again
In [6]:
img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
plt.imshow(img)
Out[6]:
<matplotlib.image.AxesImage at 0x7603c90>

To show what I mean about the RGB color space being useless for analyzing images, let’s look at each channel individually. We do this by slicing our img array (now in RGB color space) into three individual arrays.
The syntax for slicing is [rows,columns,channels]. A colon (:) indicates that you want to keep everything.
In our case we want to keep all the rows and columns, but only keep one channel per image. This results in a grayscale image (i.e. each pixel has a single value from 0–255, as opposed to three for our original RGB image).
We then plot each image with our plt.imshow function. The plt.subplot function lets us plot multiple images at once.
In [7]:
img_r = img[:,:,0] #get red channel
img_g = img[:,:,1] #get green channel
img_b = img[:,:,2] #get blue channel
plt.subplot('131')
plt.imshow(img_r,cmap='gray')
plt.subplot('132')
plt.imshow(img_g,cmap='gray')
plt.subplot('133')
plt.imshow(img_b,cmap='gray')
Out[7]:
<matplotlib.image.AxesImage at 0xc3814b0>

Looking at those images, the ball doesn’t stand out as either the darkest (pixels all low) or the brightest (pixels all high) thing in any of the three channels. That makes it very difficult to identify.
Instead we are going to use the LAB color space. We do this because the nonlinear relations for L*, a*, and b* are intended to mimic the nonlinear response of the eye.
To convert to LAB we use the same cv2.cvtColor function as before, but pass it the cv2.COLOR_RGB2LAB parameter.
We then show all the channels as we did with the RGB image, though the code is a little more pythonic.
In [8]:
lab = cv2.cvtColor(img,cv2.COLOR_RGB2LAB)
l = lab[:,:,0]
a = lab[:,:,1]
b = lab[:,:,2]
channels = [l,a,b]
for i,channel in enumerate(channels):
    plt.subplot('13'+str(i+1))
    plt.imshow(channel,cmap='gray')
Much better! Though the L channel (which stands for lightness) is pretty useless, a and b look promising.
The next step is to threshold the image, making everything at or below a certain threshold value go to 0, and everything above it go to 1.
This is done using the cv2.threshold function. To better understand what information you need to give the function we can again use the ? syntax
In [9]:
cv2.threshold?
So it takes the following parameters:
- `src`: the source image file to threshold
- `thresh`: the threshold value
- `maxval`: the value to be assigned to all values equal to or greater than the threshold
- `type`: thresholding type (we will use `cv2.THRESH_BINARY` but there are [others as well](http://docs.opencv.org/modules/imgproc/doc/miscellaneous_transformations.html?highlight=threshold#threshold))
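To make the behavior concrete, here is a minimal sketch of `cv2.threshold` applied to a tiny made-up NumPy array (the values here are just for illustration and aren't from the tutorial image):

```python
import numpy as np
import cv2

tiny = np.array([[10, 100, 200, 250]], dtype=np.uint8)  # made-up pixel values
ret, out = cv2.threshold(tiny, 128, 255, cv2.THRESH_BINARY)
print(out)  # [[  0   0 255 255]] -- values of 128 or below become 0, values above become 255
```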
The annoying thing about thresholding is that it’s kind of a guess-and-check process to find the best value (i.e. one that keeps most everything you want and eliminates most everything you don’t). In cases like this I like to use the IPython Notebook’s `interact` function.
`interact` takes a function and gives you a way to vary that function's input live. In our case, our function will take the input `thresh_val`, calculate the threshold using the `cv2.threshold` function, and use `plt.imshow` to show the image.
*The `interact` function will only work in an active IPython notebook.*
In [10]:
def threshold(thresh_val):
    ret,thresh = cv2.threshold(b,thresh_val,1,cv2.THRESH_BINARY)
    plt.imshow(thresh,cmap='gray')

interact(threshold,thresh_val=(0,255))
Out[10]:
<function __main__.threshold>

Playing with the slider, a value somewhere around 177 seems to do the trick for the b channel, so let’s lock that threshold in (this time using a maxval of 255).
In [11]:
ret,thresh = cv2.threshold(b,177,255,cv2.THRESH_BINARY)
plt.imshow(thresh,cmap='gray')
Out[11]:

In trying to figure out how to find the center of that region, I searched for “how to find the center of an area opencv python” and came up with this tutorial.
It uses the cv2.moments function to “calculate all the moments up to the third order of a polygon or rasterized shape”… or something.
Anyway, some of those moments can be used to calculate the center. So let’s calculate the moments and assign the result to the variable M.
In [12]:
M = cv2.moments(thresh)
print M
Out [12]:
{'mu02': 95754201.53900146, 'mu03': 80503297.40625, 'm11': 175173119295.0, 'nu02': 0.00031738476817687447, 'm12': 113530069679385.0, 'mu21': -92491757.57328033, 'mu20': 96604484.71447754, 'nu20': 0.0003202030980694093, 'm30': 65672888208105.0, 'nu21': -4.136550528978932e-07, 'mu11': -6438801.4972229, 'mu12': 50503144.48964691, 'nu11': -2.1341909677985636e-05, 'nu12': 2.2586748758475254e-07, 'm02': 230634317265.0, 'm03': 149542102639215.0, 'm00': 549270.0, 'm01': 355848165.0, 'mu30': -54730518.03125, 'nu30': -2.4477376066087524e-07, 'nu03': 3.6003852257487303e-07, 'm10': 270398685.0, 'm20': 133210462605.0, 'm21': 86294838216045.0}
So it looks like it returns a dictionary (a series of key:value pairs). It’s actually tough to read so let’s print it out more nicely using the iteritems function to iterate through the dictionary
In [13]:
for key,value in M.iteritems():
    print key,value
Out[13]:
mu02 95754201.539
mu03 80503297.4062
m11 1.75173119295e+11
nu02 0.000317384768177
m12 1.13530069679e+14
mu21 -92491757.5733
mu20 96604484.7145
nu20 0.000320203098069
m30 6.56728882081e+13
nu21 -4.13655052898e-07
mu11 -6438801.49722
mu12 50503144.4896
nu11 -2.1341909678e-05
nu12 2.25867487585e-07
m02 2.30634317265e+11
m03 1.49542102639e+14
m00 549270.0
m01 355848165.0
mu30 -54730518.0312
nu30 -2.44773760661e-07
nu03 3.60038522575e-07
m10 270398685.0
m20 1.33210462605e+11
m21 8.6294838216e+13
Much cleaner to read, but still Greek to me. Following along with the tutorial…
In [14]:
center_col = M['m10']/M['m00']
center_row = M['m01']/M['m00']
print 'center row: {}\ncenter_col: {}'.format(center_row,center_col)
Out [14]:
center row: 647.856545961
center_col: 492.287372331
With that we can use cv2.circle to draw a red (255,0,0), 20px circle at the center.
We need to use integers because pixel coordinates can’t have decimals.
In [15]:
cv2.circle(img,(int(center_col),int(center_row)),20,(255,0,0),-1)
Now we just show the image (cv2.circle drew directly on img) and…
In [16]:
plt.imshow(img)
Out[16]:
<matplotlib.image.AxesImage at 0xca67750>

Looks good!
Now what’s even better is that once you have something like this automated, you can easily do video analysis… as videos are just a series of images (usually at 30 frames per second). If you combine that power with a well-positioned high-speed camera… you can easily capture data on really neat phenomena.
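As a rough illustration (not the code behind the videos below), here’s a minimal sketch of how the same LAB + threshold + moments pipeline could be run frame by frame on a video file. The filename 'ball.avi' and the threshold of 177 are placeholders you’d swap out and tune for your own footage:

```python
import cv2

cap = cv2.VideoCapture('ball.avi')  # hypothetical video of the same ball
centers = []
while True:
    ret, frame = cap.read()  # frames come back in BGR, just like cv2.imread
    if not ret:
        break  # no more frames
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    b = cv2.cvtColor(rgb, cv2.COLOR_RGB2LAB)[:, :, 2]  # b channel, as before
    ret, thresh = cv2.threshold(b, 177, 255, cv2.THRESH_BINARY)
    M = cv2.moments(thresh)
    if M['m00'] > 0:  # skip frames where nothing passed the threshold
        centers.append((M['m10'] / M['m00'], M['m01'] / M['m00']))
cap.release()
print(centers[:5])  # (col, row) centers of the ball in the first few frames
```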
Here are a few things I’ve done in the past that show how the techniques shown above can be applied to video.
In [17]:
from IPython.display import YouTubeVideo,display
In [18]:
vid = YouTubeVideo("njab2bBps6U")
display(vid)
Out [18]:
In [19]:
vid = YouTubeVideo("03AoAlm-szU")
display(vid)
Out [19]: