Over the past few years, deep learning has been widely used in image recognition and has obtained very good results. In this project, several state-of-the-art deep learning models and their combinations are applied to fish recognition in images, in particular to 9 common species of fish in Missouri rivers. Four data processing and machine learning pipelines were developed, and extensive experiments were conducted to evaluate their performance. The deep convolutional neural network (CNN) models used in these pipelines include SSD, VGG16, ResNet50, etc. The four pipelines are, in order of increasing complexity: image-based, instance-based, instance-rotation-based, and ensemble. The image-based pipeline takes the entire image as input, without any preprocessing, and classifies it into one of the target classes using deep CNNs. This pipeline achieved up to 75.57% classification accuracy on our test dataset. The instance-based pipeline consists of object detection by one deep CNN followed by classification by another deep CNN. This method achieved up to 80.03% accuracy on our test dataset. The instance-rotation-based pipeline adds a deep CNN that performs pose estimation between object detection and classification. The posture-adjusted fish image is used as the input to the classification model, which helps the pipeline achieve up to 82.83% accuracy on the same dataset. Finally, the ensemble pipeline combines two instance-rotation-based pipelines that differ only in the classification model: one uses VGG16 and the other ResNet50. The ensemble pipeline achieved up to 87.22% accuracy, outperforming all other pipelines.
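As a rough illustration of the ensemble idea, the sketch below combines the class probabilities of two classifiers by simple averaging. The description above does not say how the VGG16 and ResNet50 outputs are actually merged, so the averaging rule, the function name `ensemble_predict`, and the example probabilities are all assumptions made for illustration.

```python
def ensemble_predict(probs_a, probs_b):
    """Average two probability vectors and return (best_class_index, averaged_probs).

    probs_a and probs_b are per-class probabilities from two classifiers
    (for example a VGG16 branch and a ResNet50 branch). Averaging is an
    assumed combination rule, not necessarily the one used in this project.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same set of classes")
    averaged = [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return best, averaged


# Hypothetical 3-class example: the two models disagree on the top class.
vgg_probs = [0.6, 0.3, 0.1]
resnet_probs = [0.2, 0.7, 0.1]
best_class, avg_probs = ensemble_predict(vgg_probs, resnet_probs)
```

Averaging lets a confident prediction from one model outweigh a weaker, conflicting prediction from the other, which is one common reason an ensemble can beat either member alone.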
For detection, we used SSD (Caffe implementation) and YOLO v2.
SSD (Single Shot MultiBox Detector)
- Framework: Caffe
- Input size: 512*512
- Base net: VGG pretrained on ImageNet
With SSD, you usually need to train a new model on your own dataset. For convenience, I recommend generating the training data in the required format with Kitti-SSD: https://github.com/jinfagang/kitti-ssd
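For orientation, here is a minimal sketch of writing one 2D bounding box as a line in the standard KITTI object label layout (15 space-separated fields). It assumes the converter consumes standard KITTI label files, which should be checked against the Kitti-SSD repository itself. The helper name `kitti_label_line` and the class name in the example are hypothetical, and the 3D fields are filled with zeros because only the 2D box matters here.

```python
def kitti_label_line(class_name, xmin, ymin, xmax, ymax):
    """Format one 2D box as a KITTI-style label line.

    Field order: type, truncated, occluded, alpha, bbox (left, top, right,
    bottom), dimensions (h, w, l), location (x, y, z), rotation_y.
    Unused 3D fields are set to zero (an assumption for 2D-only data).
    """
    if " " in class_name:
        raise ValueError("class name must not contain spaces")
    fields = [
        class_name,
        "0.00",                      # truncated
        "0",                         # occluded
        "0.00",                      # alpha
        f"{xmin:.2f}", f"{ymin:.2f}", f"{xmax:.2f}", f"{ymax:.2f}",
        "0.00", "0.00", "0.00",      # 3D dimensions (unused)
        "0.00", "0.00", "0.00",      # 3D location (unused)
        "0.00",                      # rotation_y (unused)
    ]
    return " ".join(fields)


# Hypothetical class name and box coordinates in pixels.
line = kitti_label_line("fish_a", 34, 50, 210.5, 180)
```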
YOLO v2 (You Only Look Once)
- Framework: Darknet
- Input size: 544*544
- Real-Time Object Detection
- Base net: Darknet19 pretrained on ImageNet
Detection is still a work in progress. YOLO v2 may be dropped later.
For the training phase, we have:
VGG16 Instance Pipeline
VGG16 Instance Rotation Pipeline
ResNet50 Instance Rotation Pipeline
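The instance rotation pipelines listed above adjust the fish posture before classification. The sketch below shows one way such an adjustment could be computed, assuming the pose estimation step yields head and tail keypoints in pixel coordinates. That output format, and the helper names `angle_to_horizontal` and `rotate_point`, are assumptions made for illustration and are not taken from the project.

```python
import math


def angle_to_horizontal(head, tail):
    """Angle in degrees that rotates the tail-to-head axis onto the +x axis."""
    dx = head[0] - tail[0]
    dy = head[1] - tail[1]
    return -math.degrees(math.atan2(dy, dx))


def rotate_point(point, center, angle_deg):
    """Rotate a 2D point around a center by angle_deg using the standard rotation matrix."""
    a = math.radians(angle_deg)
    x = point[0] - center[0]
    y = point[1] - center[1]
    return (
        center[0] + x * math.cos(a) - y * math.sin(a),
        center[1] + x * math.sin(a) + y * math.cos(a),
    )


# Hypothetical keypoints: the fish body runs diagonally across the crop.
tail = (0.0, 0.0)
head = (10.0, 10.0)
angle = angle_to_horizontal(head, tail)
aligned_head = rotate_point(head, tail, angle)
```

Applying the same angle to every pixel of the detected crop (for example with an image library's rotate function) would give the classifier a consistently oriented fish.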
Fish pixel only:
Tool: Mask RCNN
Only some preliminary results are shown here because this part is still in progress.
Left: ground truth; Right: prediction