A fruit image classifier with Python and SimpleCV
I had to solve an image recognition problem for a project that I'm working on. I knew there were a lot of Python tools that could help me, but I never imagined it could be this easy with SimpleCV. SimpleCV is an OpenCV wrapper with “batteries included”: it comes with a lot of extra features, such as integration with the Tesseract OCR engine and with the well-known machine learning framework Orange.
There are several ways to get SimpleCV working, but be aware that it has tons of dependencies, so installing it can be painful. It is worth it, though. I chose to use a virtualenv on both Linux Mint and OS X, and I have to say that it was difficult to set up (it seems that there are superpackages for Windows).
I wrote a Trainer class that can classify many different kinds of images. I tried it with different things, and it works pretty well in certain cases. The class looks as follows:
from SimpleCV import *  # feature extractors, classifiers and Color

class Trainer():

    def __init__(self, classes, trainPaths):
        self.classes = classes        # class names, e.g. 'pear'
        self.trainPaths = trainPaths  # one folder of training images per class

    def getExtractors(self):
        hhfe = HueHistogramFeatureExtractor(10)
        ehfe = EdgeHistogramFeatureExtractor(10)
        haarfe = HaarLikeFeatureExtractor(fname='../SimpleCV/SimpleCV/Features/haar.txt')
        return [hhfe, ehfe, haarfe]

    def getClassifiers(self, extractors):
        svm = SVMClassifier(extractors)
        tree = TreeClassifier(extractors)
        bayes = NaiveBayesClassifier(extractors)
        knn = KNNClassifier(extractors)
        return [svm, tree, bayes, knn]

    def train(self):
        self.classifiers = self.getClassifiers(self.getExtractors())
        for classifier in self.classifiers:
            classifier.train(self.trainPaths, self.classes, verbose=False)

    def test(self, testPaths):
        # Print the test result of every trained classifier
        for classifier in self.classifiers:
            print(classifier.test(testPaths, self.classes, verbose=False))

    def visualizeResults(self, classifier, imgs):
        # Draw the predicted class name on each image, then show them all
        for img in imgs:
            className = classifier.classify(img)
            img.drawText(className, 10, 10, fontsize=60, color=Color.BLUE)
        imgs.show()
The constructor receives a list of the classes that we want to recognize (in our case “pear”, “orange” and “strawberry”) and the paths where the training images for each class are stored.
The getExtractors method returns the feature extractors that we will use. An extractor looks for a particular kind of pattern in our images. In our case we're using a hue histogram extractor, an edge histogram extractor and a Haar-like feature extractor. SimpleCV has many more extractors that we could use.
The getClassifiers method returns four classifiers (in order to use them we have to install Orange).
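To give an idea of what a hue histogram feature is, here is a minimal sketch in plain Python. It is not SimpleCV's implementation: the function name `hue_histogram` and the sample pixel values are made up for illustration, and it only uses the standard `colorsys` module.

```python
import colorsys

def hue_histogram(pixels, bins=10):
    """Return a normalized hue histogram for a list of (r, g, b) pixels (0-255)."""
    counts = [0] * bins
    for r, g, b in pixels:
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # h lies in [0, 1]; clamp so that h == 1.0 lands in the last bin
        index = min(int(h * bins), bins - 1)
        counts[index] += 1
    total = float(len(pixels))
    return [c / total for c in counts] if total else counts

# Hypothetical pixel samples: greenish-yellow for a pear, red for a strawberry
pear_pixels = [(170, 200, 60), (180, 210, 70), (160, 190, 50)]
strawberry_pixels = [(200, 20, 30), (220, 10, 40), (190, 20, 30)]

print(hue_histogram(pear_pixels))
print(hue_histogram(strawberry_pixels))
```

The two fruits end up in different bins of the histogram, which is the kind of difference a classifier can learn from.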
->The SVM classifier is a support vector machine.
->The TreeClassifier encapsulates tree-based machine learning approaches (decision trees, boosted adaptive decision trees, random forests and bootstrap aggregation).
->The NaiveBayesClassifier is a naive Bayes classifier.
->The KNNClassifier is a k-nearest neighbors classifier.
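As an illustration of the simplest of these ideas, k-nearest neighbors, here is a rough sketch in plain Python. It is not how SimpleCV or Orange implement it: the function `knn_classify` and the two-dimensional feature vectors are invented for the example.

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify query by majority vote among its k nearest training samples.

    train is a list of (feature_vector, label) pairs.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(train, key=lambda item: squared_distance(item[0], query))[:k]
    votes = Counter(label for _features, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical feature vectors (for example, two bins of a hue histogram)
train = [
    ([0.9, 0.1], 'strawberry'),
    ([0.8, 0.2], 'strawberry'),
    ([0.1, 0.9], 'pear'),
    ([0.2, 0.8], 'pear'),
    ([0.5, 0.5], 'orange'),
    ([0.55, 0.45], 'orange'),
]

print(knn_classify(train, [0.85, 0.15]))
print(knn_classify(train, [0.15, 0.85]))
```

In the real Trainer class the feature vectors come from the extractors, and the classifier objects handle all of this for us.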
The train method instantiates the classifiers and trains them.
The test method is useful to see whether our classifiers work and which one works best. It needs the paths to a set of test images, which must be different from the training images.
Finally, visualizeResults takes a classifier and a set of images as parameters, classifies each image, draws the predicted class name on it and then shows all the images with their guesses to the user (see the video below).
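If all your images start out in a single folder per class, one way to keep the two sets separate is to split the file names before copying them into the train and test folders. This is only a sketch: `split_train_test`, the 25% test fraction and the file names are assumptions for the example, not something SimpleCV provides.

```python
import random

def split_train_test(filenames, test_fraction=0.25, seed=0):
    """Shuffle filenames reproducibly and split them into (train, test) lists."""
    shuffled = sorted(filenames)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical file names for one class
files = ['pear_%02d.jpg' % i for i in range(20)]
train_files, test_files = split_train_test(files)

# No image may appear in both sets
assert not set(train_files) & set(test_files)
print(len(train_files), len(test_files))
```

Using a fixed seed means the same images end up in the test set every time, so results from different runs stay comparable.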
Now it's time to try out our Trainer class.
import random
from SimpleCV import ImageSet

classes = ['pear', 'orange', 'strawberry']

def main():
    trainPaths = ['./post/' + c + '/train/' for c in classes]
    testPaths = ['./post/' + c + '/test/' for c in classes]
    trainer = Trainer(classes, trainPaths)
    trainer.train()
    # getClassifiers returns [svm, tree, bayes, knn]; pick the tree classifier
    tree = trainer.classifiers[1]
    # Collect the test images of every class and shuffle them
    imgs = ImageSet()
    for p in testPaths:
        imgs += ImageSet(p)
    random.shuffle(imgs)
    print("Result test")
    trainer.test(testPaths)
    trainer.visualizeResults(tree, imgs)

main()
This code trains our classifiers, loads the test images, shuffles them, tests every classifier and then draws the tree classifier's guess on each image.
The results from the tests are:
svm [33.33333333333333, 66.66666666666666, [[0.0, 0.0, 5.0], [0.0, 0.0, 5.0], [0.0, 0.0, 5.0]]]
tree [93.33333333333333, 6.666666666666667, [[5.0, 0.0, 0.0], [0.0, 4.0, 1.0], [0.0, 0.0, 5.0]]]
bayes [33.33333333333333, 66.66666666666666, [[0.0, 0.0, 5.0], [0.0, 0.0, 5.0], [0.0, 0.0, 5.0]]]
knn [33.33333333333333, 66.66666666666666, [[5.0, 0.0, 0.0], [5.0, 0.0, 0.0], [5.0, 0.0, 0.0]]]
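For each classifier, the first element is the percentage of hits, the second is the percentage of misses and the third is the confusion matrix, with one row per real class and one column per predicted class.
As we can see, in our case (with a small set of training images) the tree classifier works pretty well: it classifies 14 of the 15 test images correctly, that is, 93% of them. The other three classifiers assign every image to the same class, so they only get 33% right.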
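The percentage of hits can be checked by hand from the confusion matrix, because the correct guesses are the ones on the diagonal. A small sketch (the helper name `accuracy_percent` is mine, the matrices are copied from the results above):

```python
def accuracy_percent(confusion):
    """Percentage of correct predictions given a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return 100.0 * correct / total

tree_matrix = [[5.0, 0.0, 0.0], [0.0, 4.0, 1.0], [0.0, 0.0, 5.0]]
svm_matrix = [[0.0, 0.0, 5.0], [0.0, 0.0, 5.0], [0.0, 0.0, 5.0]]

print(accuracy_percent(tree_matrix))
print(accuracy_percent(svm_matrix))
```

For the tree classifier this gives 14 hits out of 15 images, and for the SVM 5 out of 15, which matches the first element of each result.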
If you want to see the tree classifier in action check out this video: