Classification of Galaxy Morphologies Using Support Vector Machines
For our Computer Vision final project during Spring 2014, Computer Science students Ettienne Montagner, José Ruiz Cepeda and I designed a procedure to automatically perform galaxy morphology classification and reproduce manual classification results.
The first task is to preprocess the data provided by GalaxyZoo by converting the RGB images to grayscale and then removing noise with a spatial Gaussian smoothing filter. Next, we use Otsu's method to find a threshold that lets us convert the image to a binary one. The holes in the binary image are then filled, and the program determines which object has the largest area; this is how the biggest galaxy within the image is identified. The resulting binary image acts as a mask that is multiplied with the original image, so that the new image shows only the biggest galaxy. This approach was applied in a publication that can be found here: A spatial-color layout feature for representing galaxy images
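The masking steps above can be sketched roughly as follows. This is a minimal, dependency-free Python illustration, not the project's actual code: all function names are hypothetical, the Gaussian smoothing step is omitted for brevity, 4-connectivity is assumed when grouping pixels into objects, and the mask is applied to the grayscale image rather than to the RGB original.

```python
from collections import deque


def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to integer luminance values in 0..255."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb]


def otsu_threshold(gray):
    """Return the threshold t maximizing between-class variance (foreground: v > t)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def flood(grid, seeds, target):
    """Return the cells 4-connected to any seed whose value equals `target`."""
    h, w = len(grid), len(grid[0])
    seen = {s for s in seeds if grid[s[0]][s[1]] == target}
    queue = deque(seen)
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in seen
                    and grid[ny][nx] == target):
                seen.add((ny, nx))
                queue.append((ny, nx))
    return seen


def fill_holes(binary):
    """Set to 1 every background pixel that is not reachable from the image border."""
    h, w = len(binary), len(binary[0])
    border = [(y, x) for y in range(h) for x in range(w)
              if y in (0, h - 1) or x in (0, w - 1)]
    outside = flood(binary, border, 0)
    return [[1 if binary[y][x] or (y, x) not in outside else 0 for x in range(w)]
            for y in range(h)]


def largest_component(binary):
    """Return the set of (row, col) cells in the largest foreground object."""
    visited, best = set(), set()
    for y, row in enumerate(binary):
        for x, v in enumerate(row):
            if v and (y, x) not in visited:
                comp = flood(binary, [(y, x)], 1)
                visited |= comp
                if len(comp) > len(best):
                    best = comp
    return best


def isolate_largest_object(rgb):
    """Threshold, fill holes, and keep only the pixels of the largest object."""
    gray = to_gray(rgb)
    t = otsu_threshold(gray)
    binary = [[1 if v > t else 0 for v in row] for row in gray]
    keep = largest_component(fill_holes(binary))
    return [[v if (y, x) in keep else 0 for x, v in enumerate(row)]
            for y, row in enumerate(gray)]
```

Calling `isolate_largest_object` on an image stored as rows of `(r, g, b)` tuples returns a grayscale image in which everything outside the largest object is set to zero. For real images, an array library with built-in thresholding and labeling routines would be much faster than these pure-Python loops.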
After preprocessing the data, we obtained the SIFT descriptors for each image in the training set and quantized them using K-means clustering to create a codebook from the cluster centers. We then trained the SVMs using the quantized descriptors from the training data. Once the SVMs had been trained (one for each response to every question in the decision tree), we used them to classify the test images, following the decision tree with One vs. All for the answers to each question. Our poster below demonstrates the advantages over certain descriptors as well as the limitations.
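As a rough illustration of the quantization step described above, the sketch below builds a codebook with a plain K-means loop and turns one image's descriptors into a normalized histogram of codebook entries. It is an assumption-laden toy, not the project's code: the function names are hypothetical, the descriptors are short made-up vectors rather than real 128-dimensional SIFT descriptors, the centers are initialized naively from the first `k` descriptors, and both SIFT extraction and SVM training are left out.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers, d):
    """Index of the codebook center closest to descriptor d."""
    return min(range(len(centers)), key=lambda i: sq_dist(centers[i], d))


def kmeans(descriptors, k, iters=20):
    """Plain Lloyd's K-means; returns k cluster centers (the codebook)."""
    # Naive deterministic initialization (an assumption made for this sketch).
    centers = [list(d) for d in descriptors[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(centers, d)].append(d)
        # Recompute each center as the mean of its group; keep empty clusters as-is.
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for c, g in zip(centers, groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def bow_histogram(descriptors, centers):
    """Normalized count of how many descriptors fall on each codebook entry."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        hist[nearest(centers, d)] += 1
    n = len(descriptors) or 1
    return [h / n for h in hist]


# Toy 2-D "descriptors" pooled from the training images (placeholder data).
train = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
codebook = kmeans(train, k=2)

# Descriptors from a single image become one fixed-length feature vector.
feature = bow_histogram([(0, 0), (1, 1), (10, 10), (0, 1)], codebook)
```

Each image ends up as one fixed-length vector regardless of how many descriptors it produced, which is what makes it usable as input to a classifier such as an SVM.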