Earlier this week Nick Randolph asked me if I knew of any .NET barcode recognition software. Not an SDK for an existing laser scanner, but something that might take a picture of a barcode and decode it.
Early on in my development career I dealt with barcode scanners and I’ve always found the technology to be fairly interesting. I remembered some work Casey Chesnut had done quite some time ago on barcode recognition that I had always wanted to spend a little time building on. Of course Casey didn’t provide any code, but that’s fine – I’d rather attack this as a pure mental exercise for myself.
I actually decided to attack the problem a little differently than Casey. I knew that decoding a “pure,” clean barcode would be easy and not a realistic scenario anyway, so I didn’t even bother working on code to do it. I knew I was going to be able to do the decoding algorithm part – that’s simply turning bits into text, and anyone can do that given the algorithm. The challenge, and the fun, is in extracting the binary data bits from an “analog” picture. I’m saying “digital” and “analog” here in the sense that an ideal barcode bit is either black or white, on or off, whereas a picture (even a digital picture) is not going to be so clear-cut.
So step one was to take a picture of a barcode. I grabbed a book off the shelf and used my phone to snap a picture (bonus points if you know what book it is):
As you can see, it’s not an “optimal” image – it’s dark and there are a lot of variations in color. This seemed way more realistic.
Next I needed to get rid of the “color” and I decided the best path would be to simply “draw” a logical line horizontally through the center of the image assuming that it would cross the barcode.
I extracted every pixel across this line and determined the luminance of each, essentially turning the line of pixels into greyscale.
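As a rough illustration of that step, here is a minimal Python sketch. It is not the library's code: the function names are made up for this example, the image is assumed to be a plain list of rows of (r, g, b) tuples, and the Rec. 601 luma weights are an assumption, since any reasonable luminance formula would do.

```python
def luminance(rgb):
    """Approximate brightness (0-255) of one (r, g, b) pixel.

    The Rec. 601 weights below are an assumption made for this sketch.
    """
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def scanline_luminance(pixels):
    """Return the luminance of every pixel on the middle row.

    `pixels` is a list of rows; each row is a list of (r, g, b) tuples.
    """
    middle_row = pixels[len(pixels) // 2]
    return [luminance(pixel) for pixel in middle_row]
```

Calling `scanline_luminance(image)` on such a structure gives one number per pixel column, which is exactly the data a luminance chart would plot.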
Next I needed to turn this analog grey data into binary, which introduced a variable I’ll call the “threshold.” Luminance values above the threshold (brighter) would be a zero; values below it would be a one. This means that you can alter what the software “sees” for bars by simply adjusting that threshold.
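The thresholding itself is tiny. A sketch in Python, with a hypothetical `binarize` function (treating a value exactly equal to the threshold as "light" is an arbitrary choice made here):

```python
def binarize(luminance_values, threshold):
    """Map each luminance sample to 1 (dark, a bar) or 0 (light, a space)."""
    return [1 if value < threshold else 0 for value in luminance_values]


samples = [200, 40, 35, 210, 120]
print(binarize(samples, 128))  # [0, 1, 1, 0, 1]
print(binarize(samples, 100))  # [0, 1, 1, 0, 0]
```

The two calls show the effect of the threshold: the mid-grey sample (120) counts as a bar at a threshold of 128 but as a space at 100.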
I put together a library and a sample application that showed all of this process at once – the barcode, a chart of the luminance and a chart of the “binary” representation of that luminance based on a given threshold.
The next step was to try to turn this “binary” data into actual bits that I could decode. My first attempt followed this reasoning:
1. An EAN13 barcode (which is what I’m working with) starts with three “guard” or delimiter bits: 1-0-1.
2. I start at the left edge of the image and traverse until I hit the first “on” pixel and record that as the start position.
3. I traverse until the next “off” pixel. The distance between the on and off is the width of a bit.
4. I “back up” 1/2 of a bit width (to give me the best odds of hitting the actual bit value) and then step forward by bit widths, checking the pixel value.
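The four steps above can be sketched roughly as follows. This is an illustration only, not the library's code: `sample_bits` is a made-up name, the input is assumed to be a list of 0/1 pixel values, and the default of 95 samples comes from the EAN-13 layout (3 + 42 + 5 + 42 + 3 modules).

```python
def sample_bits(pixels, count=95):
    """Sample `count` bit values from a row of 0/1 pixels.

    Raises ValueError if the row contains no bar, or no space after it.
    """
    start = pixels.index(1)        # first "on" pixel: left edge of the guard bar
    end = pixels.index(0, start)   # next "off" pixel: right edge of that bar
    width = end - start            # width of one bit, in pixels
    position = end - width / 2     # back up half a bit width, to the bar's centre
    bits = []
    for _ in range(count):
        index = int(position)
        if index >= len(pixels):
            break                  # ran off the right edge of the image
        bits.append(pixels[index])
        position += width          # step forward one bit width
    return bits
```

Because the bit width is measured once from a single bar, any error in that measurement accumulates as the sampler walks to the right, which is why sample points can drift toward an edge on a real photo.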
The logic needed a little tweaking: I’d move the current and subsequent sample points away from any nearby “edge” to help keep me in the middle of a data point. With only this minor logic improvement I was able to decode the picture I started with. I had a sample app that showed each of these steps:
– The loaded barcode image
– The luminance across the barcode at the mid point as a graph
– A graph of the binary representation of that luminance based on a threshold
– The decoding of that binary set
All that in just about 1 day of work.
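For reference, the decode step for EAN-13 looks roughly like this in Python. It is a sketch built from the standard EAN-13 symbol layout (guards, L/G/R digit codes, parity-encoded first digit, check digit), not the code from my library; `decode_ean13` and the table names are made up for the example, and the input is assumed to be a clean 95-character string of '0' and '1'.

```python
L_CODES = ["0001101", "0011001", "0010011", "0111101", "0100011",
           "0110001", "0101111", "0111011", "0110111", "0001011"]
# R-codes are the bitwise complement of the L-codes; G-codes are R-codes reversed.
R_CODES = ["".join("1" if c == "0" else "0" for c in code) for code in L_CODES]
G_CODES = [code[::-1] for code in R_CODES]
# The L/G pattern of the six left-hand digits encodes the leading digit.
PARITY = ["LLLLLL", "LLGLGG", "LLGGLG", "LLGGGL", "LGLLGG",
          "LGGLLG", "LGGGLL", "LGLGLG", "LGLGGL", "LGGLGL"]


def decode_ean13(bits):
    """Decode a 95-character string of '0'/'1' modules into 13 digits."""
    if len(bits) != 95:
        raise ValueError("an EAN-13 symbol has 95 modules")
    if bits[:3] != "101" or bits[45:50] != "01010" or bits[92:] != "101":
        raise ValueError("guard patterns not found")
    digits = []
    parity = ""
    for i in range(6):                       # left half: L- or G-codes
        chunk = bits[3 + 7 * i:10 + 7 * i]
        if chunk in L_CODES:
            digits.append(L_CODES.index(chunk))
            parity += "L"
        elif chunk in G_CODES:
            digits.append(G_CODES.index(chunk))
            parity += "G"
        else:
            raise ValueError("unreadable left-hand digit %d" % (i + 1))
    for i in range(6):                       # right half: R-codes only
        chunk = bits[50 + 7 * i:57 + 7 * i]
        if chunk not in R_CODES:
            raise ValueError("unreadable right-hand digit %d" % (i + 1))
        digits.append(R_CODES.index(chunk))
    if parity not in PARITY:
        raise ValueError("invalid parity pattern " + parity)
    digits.insert(0, PARITY.index(parity))   # first digit is implied by parity
    # Check digit: weights 1, 3, 1, 3, ... over the first twelve digits.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    if (10 - total % 10) % 10 != digits[12]:
        raise ValueError("check digit mismatch")
    return "".join(str(d) for d in digits)
```

Note that the first digit is never drawn as bars; it is inferred from which of the six left-hand digits use L-codes versus G-codes. The check digit is what lets a decoder reject a misread line instead of returning garbage.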
Of course the second barcode picture I took and tried failed. It worked through all of the steps until the decode, where it always fell apart, being unable to correctly identify any digit but the first. The failure, I think, is because the book cover is glossy, so there is a lot more “noise” in the luminance graph. That simply means I need to improve my image recognition algorithm, which will be the focus of the next blog entry on this library.
Now you might be asking “well where’s the code for this?” Patience. It’s on Codeplex right now; it’s just not yet published. I want to get it to a state that’s a little less ugly before I turn it loose. When can you expect it? Well, I can’t say for certain, but Codeplex requires publication within 30 days, so that gives you a “latest possible release” date – though I hope to publish earlier.