Tags: machine-learning, data-science, python, jupyter

Classification of Point Sources in Space

By Vidas Sileikis
Duration
4 Months
Role
Machine Learning Research
Classification of Point Sources analysis
Model comparison charts

Overview

Developed a machine learning pipeline to classify astronomical point sources in space using low-resolution photometric measurements as an alternative to expensive high-resolution imaging. The project tested and compared multiple classification algorithms to determine which approach most accurately identifies and categorizes celestial objects from photometric survey data.


Problem Statement

Traditional classification of astronomical point sources—such as stars, quasars, and galaxies—relies heavily on high-resolution spectroscopic imaging, which is both time-consuming and expensive. As the volume of astronomical survey data grows exponentially, there is a pressing need for automated classification methods that can work with more cost-effective, low-resolution photometric data without sacrificing accuracy.


Approach

The project followed a structured data science workflow, beginning with comprehensive data preprocessing and exploratory data analysis, followed by the implementation and evaluation of multiple machine learning models. Each model was trained, tuned, and benchmarked against the others to identify the most effective classifier for this domain.
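The train–tune–benchmark loop described above can be sketched as a cross-validated grid search. This is a minimal illustration on synthetic data, not the project's actual pipeline; the real photometric features, grid values, and class labels are assumptions here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the photometric measurements.
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale features, then tune the classifier with 5-fold cross-validation.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", KNeighborsClassifier())])
search = GridSearchCV(pipe, {"clf__n_neighbors": [3, 5, 9]}, cv=5)
search.fit(X_train, y_train)

# Held-out score is the benchmark number compared across models.
test_score = search.score(X_test, y_test)
```

The same scale-tune-score skeleton applies to each classifier below; only the estimator and its parameter grid change.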


Algorithms Implemented

K-Nearest Neighbors (KNN) — Instance-based learning algorithm that classifies point sources based on proximity to labeled training examples in feature space.
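A minimal KNN sketch on synthetic data (the real photometric features and the tuned value of k are not reproduced here):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for photometric features and source classes.
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each test source takes the majority label of its k nearest training sources.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
knn_score = knn.score(X_test, y_test)
```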

Decision Tree — Tree-based classifier that creates interpretable decision boundaries, useful for understanding which photometric features drive classification.
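The interpretability claim can be seen directly through the fitted tree's feature importances. A sketch on synthetic data (depth limit and feature count are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

# Limiting depth keeps the tree small enough to read and explain.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Normalized importances reveal which features drive the splits.
importances = tree.feature_importances_
top_feature = int(importances.argmax())
```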

Linear and Gaussian SVM — Support Vector Machines with both linear and radial basis function kernels, effective at finding optimal hyperplanes in high-dimensional photometric feature spaces.
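Both kernels can be compared with the same scaled pipeline; a sketch, again on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for kernel in ("linear", "rbf"):  # "rbf" is the Gaussian kernel
    # SVMs are sensitive to feature scale, so standardize first.
    svm = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores[kernel] = svm.fit(X_train, y_train).score(X_test, y_test)
```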

Extreme Gradient Boosting (XGBoost) — Ensemble method combining multiple weak learners for high-accuracy classification with built-in regularization.
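A gradient-boosting sketch using scikit-learn's `GradientBoostingClassifier` as a stand-in; the project itself used `xgboost.XGBClassifier`, which exposes the same `fit`/`predict` interface and adds built-in L1/L2 regularization. Hyperparameter values here are illustrative defaults, not the project's tuned settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each boosting stage fits a shallow tree to the errors of the ensemble so far.
gbt = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                 learning_rate=0.1, random_state=0)
gbt.fit(X_train, y_train)
gbt_score = gbt.score(X_test, y_test)
```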

Gaussian Mixture Model — Probabilistic model that captures the underlying distribution of point source classes, useful for identifying overlapping populations.
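The "overlapping populations" idea shows up in the mixture's soft responsibilities. A sketch with one component per class on synthetic data (the project's exact mixture formulation and thresholds are not reproduced):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.mixture import GaussianMixture

X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

# Fit one Gaussian component per source class.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

# Soft "responsibilities": each row sums to 1 across the 3 components.
resp = gmm.predict_proba(X)

# Sources with no dominant component sit between overlapping populations.
ambiguous = resp.max(axis=1) < 0.9
```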

Neural Network — Deep learning model that captures complex nonlinear relationships between photometric features and source classifications.
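A small feed-forward sketch using scikit-learn's `MLPClassifier` as a stand-in; the project's network was built with TensorFlow/Keras, and the layer sizes here are illustrative assumptions, not the actual architecture.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers; standardize first, since MLPs are scale-sensitive.
nn = make_pipeline(StandardScaler(),
                   MLPClassifier(hidden_layer_sizes=(32, 16),
                                 max_iter=2000, random_state=0))
nn.fit(X_train, y_train)
nn_score = nn.score(X_test, y_test)
```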


Model Comparison

A dedicated model comparison pipeline was built to systematically evaluate all algorithms against consistent metrics including accuracy, precision, recall, and F1-score. This allowed for an objective assessment of each model's strengths and weaknesses when applied to astronomical classification tasks.
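The comparison loop can be sketched as a single table of the four metrics; two classifiers and synthetic data stand in for the full model set here. Macro averaging is an assumption — it treats all source classes equally regardless of size.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {"KNN": KNeighborsClassifier(),
          "Decision Tree": DecisionTreeClassifier(random_state=0)}

results = {}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = {
        "accuracy":  accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, average="macro"),
        "recall":    recall_score(y_test, pred, average="macro"),
        "f1":        f1_score(y_test, pred, average="macro"),
    }
```

Every model sees the identical train/test split, so the resulting scores are directly comparable.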


Technical Implementation

The entire pipeline was developed in Jupyter Notebooks with a separate notebook for each algorithm, enabling clear documentation and reproducibility. Data preprocessing handled missing values, feature scaling, and class balancing. The neural network component was implemented as a standalone Python script for modularity.
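The three preprocessing steps named above — missing values, scaling, class balancing — can be sketched on a toy photometric matrix; the column meanings and values are invented for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_class_weight

# Toy matrix: two photometric columns, one missing measurement (np.nan).
X = np.array([[14.2, 0.31], [15.8, np.nan], [13.9, 0.27], [16.1, 0.88]])
y = np.array([0, 1, 0, 1])

# 1. Fill missing values with the per-feature median.
X_imputed = SimpleImputer(strategy="median").fit_transform(X)

# 2. Standardize each feature to zero mean, unit variance.
X_scaled = StandardScaler().fit_transform(X_imputed)

# 3. Per-class weights to counter imbalance; passed to any estimator
#    that accepts a class_weight argument.
weights = compute_class_weight("balanced", classes=np.unique(y), y=y)
```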


Tech Stack

  • Python — Core programming language for all data processing and modeling
  • Jupyter Notebooks — Interactive development environment for analysis and visualization
  • scikit-learn — Machine learning library for classical algorithms (KNN, SVM, Decision Trees)
  • XGBoost — Gradient boosting framework for ensemble classification
  • TensorFlow / Keras — Neural network implementation
  • Pandas & NumPy — Data manipulation and numerical computation
  • Matplotlib & Seaborn — Data visualization and result plotting
