Classification of Point Sources in Space
A machine learning research project that classifies astronomical point sources from cost-effective low-resolution photometric measurements, avoiding the expense of spectroscopic imaging.
Project Details
Problem / Approach / Result
Spectroscopic imaging is too expensive for large-scale classification
Traditional classification of astronomical point sources -- stars, quasars, and galaxies -- relies heavily on high-resolution spectroscopic imaging, which is both time-consuming and expensive. As the volume of astronomical survey data grows exponentially, there is a pressing need for automated methods that work with cost-effective, low-resolution photometric data.
- High-resolution spectroscopic imaging is prohibitively expensive at scale
- Astronomical survey data is growing exponentially, outpacing manual classification
- No systematic comparison of ML algorithms existed for photometric classification
- Researchers needed reproducible pipelines to evaluate multiple approaches objectively
Build and benchmark a multi-model classification pipeline
The project followed a structured data science workflow, implementing six distinct classification algorithms and evaluating each against consistent metrics. Every model was trained, tuned, and benchmarked in separate Jupyter notebooks with clear documentation for full reproducibility.
- Implemented KNN, Decision Tree, SVM, XGBoost, Gaussian Mixture, and Neural Network classifiers
- Comprehensive data preprocessing including missing values, feature scaling, and class balancing
- Consistent evaluation metrics -- accuracy, precision, recall, and F1-score across all models
- Separate Jupyter notebook per algorithm for modularity and reproducibility
- Neural network component built as a standalone Python script for flexibility
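The benchmark workflow described above can be sketched as a single loop that trains several classifiers and scores them with the same metrics. This is a minimal illustration with synthetic stand-in data, not the project's actual notebooks; the feature matrix and model hyperparameters are assumptions.

```python
# Sketch of a multi-model benchmark loop with consistent metrics.
# The synthetic data stands in for the photometric feature table.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score

# Stand-in for the photometric table: 5 features, 3 classes
# (e.g. star / quasar / galaxy)
X, y = make_classification(n_samples=600, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(kernel="rbf", random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1_macro": f1_score(y_test, pred, average="macro"),
    }

for name, scores in results.items():
    print(f"{name}: acc={scores['accuracy']:.3f} f1={scores['f1_macro']:.3f}")
```

Extending the dictionary with XGBoost, a Gaussian mixture wrapper, and a neural network keeps every model under the same evaluation protocol, which is the point of the pipeline.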
Cost-effective classification with systematic model comparison
A complete, reproducible ML pipeline that demonstrates that photometric data can replace expensive spectroscopic imaging for celestial object classification. The systematic comparison across six algorithms provides clear evidence for which approaches work best in this domain.
- Six classification algorithms tested and benchmarked with consistent metrics
- Demonstrated viability of low-resolution photometric data for accurate classification
- Fully reproducible research with documented Jupyter notebooks per algorithm
- Comprehensive model comparison pipeline for objective algorithm assessment
- Neural network captured complex nonlinear relationships in photometric features
Key Features
Multi-Model Pipeline
Six classification algorithms tested and benchmarked -- KNN, Decision Tree, SVM, XGBoost, Gaussian Mixture, and Neural Network.
Cost Reduction
Uses low-resolution photometric data instead of expensive high-resolution spectroscopic imaging for celestial classification.
Systematic Evaluation
Consistent metrics including accuracy, precision, recall, and F1-score across all models for objective comparison.
Reproducible Research
Separate Jupyter notebooks for each algorithm with clear documentation and reproducible workflows.
Deep Learning
Neural network implemented in Python to capture complex nonlinear relationships between photometric features and source classifications.
Data Pipeline
Comprehensive preprocessing handling missing values, feature scaling, and class balancing before model training.
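The preprocessing steps named above can be sketched in a few lines of scikit-learn. The toy values and the choice of per-class weights (rather than resampling) for balancing are illustrative assumptions, not the project's exact configuration.

```python
# Minimal sketch of the preprocessing stage: impute missing values,
# scale features, and compute balanced class weights. Values are toy data.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_class_weight

# Toy photometric magnitude matrix with one missing measurement
X = np.array([[18.2, 17.1], [19.5, np.nan], [16.8, 15.9], [20.1, 19.4]])
y = np.array([0, 0, 1, 2])  # e.g. star / galaxy / quasar labels

X = SimpleImputer(strategy="median").fit_transform(X)  # fill missing values
X = StandardScaler().fit_transform(X)                  # zero mean, unit variance

# Class balancing via per-class weights, an alternative to resampling;
# the weights can be passed to most classifiers at fit time.
weights = compute_class_weight("balanced", classes=np.unique(y), y=y)
print(dict(zip(np.unique(y).tolist(), weights)))
```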
Architecture
Low-resolution photometric measurements from astronomical surveys, targeting cost-effective classification without spectroscopic imaging.
Comprehensive preprocessing pipeline handles missing values, normalizes features, and balances class distributions before training.
Three classical machine learning algorithms trained and tuned in separate Jupyter notebooks with consistent evaluation metrics.
Advanced models including gradient boosting, probabilistic clustering, and deep learning to capture complex nonlinear relationships.
Systematic evaluation using accuracy, precision, recall, and F1-score for objective algorithm comparison across all models.
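The probabilistic-clustering step above can be turned into a supervised classifier by fitting one Gaussian mixture per class and assigning each point to the class whose mixture gives the highest log-likelihood. This is one common construction, sketched here on synthetic blobs; the project's actual Gaussian Mixture setup may differ.

```python
# Hedged sketch: per-class Gaussian mixtures used as a generative classifier.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for photometric features with 3 source classes
X, y = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

# Fit one mixture model on the training points of each class
gmms = {c: GaussianMixture(n_components=2, random_state=0).fit(X[y == c])
        for c in np.unique(y)}

def predict(points):
    # score_samples gives each point's log-likelihood under a class model;
    # pick the class model that explains the point best.
    scores = np.column_stack(
        [gmms[c].score_samples(points) for c in sorted(gmms)])
    return scores.argmax(axis=1)

acc = (predict(X) == y).mean()
print(f"training accuracy: {acc:.3f}")
```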
Tech Stack
| Category | Technology | Purpose |
|---|---|---|
| Language | Python | Core language for all data processing and modeling |
| Environment | Jupyter Notebooks | Interactive development for analysis and visualization |
| Classical ML | scikit-learn | KNN, SVM, Decision Trees, and evaluation metrics |
| Ensemble | XGBoost | Gradient boosting framework for ensemble classification |
| Deep Learning | TensorFlow / Keras | Neural network implementation for nonlinear patterns |
| Data Tools | Pandas & NumPy | Data manipulation and numerical computation |
| Visualization | Matplotlib & Seaborn | Data visualization and result plotting |