Hunt for Exoplanets

Hunt for Exoplanets — Project Overview

Summary

Space-based surveys have produced vast volumes of transit photometry and catalog data from missions such as Kepler, K2, and TESS. Many discoveries relied on manual vetting, but modern AI / ML techniques enable automated, scalable, and repeatable identification of exoplanet signals. This project trains ML models that classify candidates as Confirmed, Candidate, or False Positive, and deploys the best-performing models in an interactive web interface for exploration and validation.

Background

Many exoplanets are detected using the transit method — monitoring stellar brightness for characteristic dips produced when a planet crosses in front of its host star.
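The Kepler spacecraft operated for nearly a decade (2009-2018): its primary mission monitored a single field almost continuously for about four years, and the K2 extension then observed a sequence of roughly 80-day campaigns along the ecliptic. TESS has been surveying bright, nearby stars across most of the sky since 2018.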
Each mission publishes public catalogs containing confirmed planets, candidates, and false positives alongside features such as orbital period, transit duration, planet radius, stellar parameters, and transit depth.
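Transit depth and planet radius are closely linked, which is part of why both appear in the catalogs. To first order, ignoring limb darkening, grazing transits, and light from nearby stars, the fractional dip is depth ≈ (Rp / Rs)^2. The sketch below encodes that approximation; the function names are illustrative, and the constants are the IAU nominal solar radius and Earth's mean radius.

```python
import math

R_SUN_KM = 695_700.0   # IAU 2015 nominal solar radius
R_EARTH_KM = 6_371.0   # Earth's mean radius


def transit_depth_ppm(planet_radius_km: float, star_radius_km: float) -> float:
    """Approximate transit depth in parts per million: (Rp / Rs)^2 * 1e6.

    First-order only: ignores limb darkening, grazing geometry, and dilution.
    """
    return (planet_radius_km / star_radius_km) ** 2 * 1e6


def planet_radius_km(depth_ppm: float, star_radius_km: float) -> float:
    """Invert the same approximation to estimate planet radius from an observed depth."""
    return math.sqrt(depth_ppm / 1e6) * star_radius_km


earth_depth = transit_depth_ppm(R_EARTH_KM, R_SUN_KM)
print(f"Earth-Sun transit depth: {earth_depth:.0f} ppm")  # Earth-Sun transit depth: 84 ppm
```

An Earth-size planet crossing a Sun-like star dims it by less than 0.01%, which is why noise handling and de-trending matter so much for small planets.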

Challenge

Manual classification is slow and can be inconsistent across human vetters.
Large-scale automated analysis must robustly handle noise, gaps, and variable-quality observations.
The central goal: create models that can reliably label new observations as Confirmed, Candidate, or False Positive.

Objectives

Train ML models on one or more NASA mission datasets to classify exoplanetary candidates.
Compare performance between models trained on author-provided catalog features and models trained on engineered or learned features from raw light curves.
Provide a web interface that allows users to upload or paste candidate data, run predictions, and visualize model outputs and confidence.
Document preprocessing, model decisions, and evaluation so results are transparent and reproducible.

Data Sources

Primary datasets come from NASA’s public archives for:

Kepler — long-baseline photometry and object catalogs.
K2 — extended Kepler observations with multiple campaigns.
TESS — wide-field short-cadence survey of nearby stars.

Dataset fields commonly used: orbital period, transit duration, transit depth, planet radius, stellar effective temperature, surface gravity, metallicity, and photometric time series (light curves).
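As a sketch of how these fields might become a training table, the snippet below assumes a CSV exported from the NASA Exoplanet Archive's Kepler cumulative KOI table, whose columns carry a `koi_` prefix. The column list, units, and disposition strings are assumptions to check against the archive's column definitions; K2 and TESS tables use different names and disposition codes. Archive CSV downloads may start with `#` comment lines, which `comment="#"` skips. `load_catalog` is a hypothetical helper name.

```python
import io

import pandas as pd

# Assumed column names, modeled on the Kepler cumulative KOI table; verify before use.
FEATURE_COLUMNS = [
    "koi_period",    # orbital period [days]
    "koi_duration",  # transit duration [hours]
    "koi_depth",     # transit depth [ppm]
    "koi_prad",      # planet radius [Earth radii]
    "koi_steff",     # stellar effective temperature [K]
    "koi_slogg",     # stellar surface gravity [log10(cm/s^2)]
    "koi_smet",      # stellar metallicity [dex]
]
LABEL_COLUMN = "koi_disposition"
# Map archive disposition strings onto the project's three classes.
LABELS = {
    "CONFIRMED": "Confirmed",
    "CANDIDATE": "Candidate",
    "FALSE POSITIVE": "False Positive",
}


def load_catalog(source) -> tuple[pd.DataFrame, pd.Series]:
    """Read a catalog CSV (path or file-like) and return (features, three-class labels)."""
    df = pd.read_csv(source, comment="#")          # skip '#' header comment lines
    df = df[df[LABEL_COLUMN].isin(list(LABELS))]   # keep only the three known dispositions
    X = df[FEATURE_COLUMNS].apply(pd.to_numeric, errors="coerce")  # non-numeric -> NaN
    y = df[LABEL_COLUMN].map(LABELS)
    return X, y


# Toy rows with made-up values, only to show the expected input shape.
sample = io.StringIO(
    "# comment lines like this one are skipped\n"
    "koi_period,koi_duration,koi_depth,koi_prad,koi_steff,koi_slogg,koi_smet,koi_disposition\n"
    "10.0,3.0,500.0,2.0,5500,4.5,0.0,CONFIRMED\n"
    "3.5,2.0,12000.0,15.0,6000,4.3,,FALSE POSITIVE\n"
    "50.0,6.0,150.0,1.5,5200,4.6,-0.1,CANDIDATE\n"
    "7.0,2.5,400.0,1.8,5800,4.4,0.1,UNKNOWN\n"
)
X, y = load_catalog(sample)
print(X.shape, y.tolist())  # (3, 7) ['Confirmed', 'False Positive', 'Candidate']
```

Missing values (the blank metallicity above) are left as NaN on purpose, so that imputation can be fit inside the training folds rather than on the whole catalog.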

Approach & Methodology

Preprocessing: impute or remove missing values, normalize/scale numerical features, encode categoricals, de-trend and denoise light curves, and handle class imbalance.
Feature strategies: evaluate author-provided catalog features vs. automatically extracted features (e.g., TSFEL, custom time-series feature extraction) and learned representations (1D CNN / LSTM).
Modeling: classical classifiers (Random Forest, SVM, Logistic Regression) for tabular features; deep models (1D CNN, LSTM, Transformer-style encoders) for raw light-curve sequences.
Evaluation: use precision, recall, F1, ROC-AUC, and confusion matrices across cross-validation folds and holdout sets. Emphasize calibration and false-positive control.
Deployment: expose model inference through a lightweight API (Flask / FastAPI) and integrate into the frontend with interactive visualizations and downloadable CSV results.
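For the tabular branch, the preprocessing, modeling, and evaluation steps above can be chained in a scikit-learn `Pipeline`, so that imputation and scaling are refit on each training fold instead of leaking information from the held-out data. A minimal sketch, assuming a Random Forest with `class_weight="balanced"` as the class-imbalance strategy and stratified 5-fold cross-validation; synthetic data from `make_classification` stands in for real catalog features, and `build_tabular_model` is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_tabular_model(random_state: int = 0):
    """Impute -> scale -> classify; the whole pipeline is refit inside every CV fold."""
    return make_pipeline(
        SimpleImputer(strategy="median"),  # fill missing catalog values
        StandardScaler(),                  # no effect on trees, but needed if swapped for SVM/LogReg
        RandomForestClassifier(
            n_estimators=200,
            class_weight="balanced",       # weight classes inversely to their frequency
            random_state=random_state,
        ),
    )


# Synthetic, imbalanced 3-class stand-in for real features; labels are 0, 1, 2.
X, y = make_classification(
    n_samples=600, n_features=7, n_informative=5, n_classes=3,
    weights=[0.2, 0.3, 0.5], random_state=0,
)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # knock out ~5% of values to exercise the imputer

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_pred = cross_val_predict(build_tabular_model(), X, y, cv=cv)

print(confusion_matrix(y, y_pred))
# Class names attached to 0/1/2 purely for readability of the report.
print(classification_report(y, y_pred, target_names=["Confirmed", "Candidate", "False Positive"]))
```

The same pipeline accepts other final estimators (for example `LogisticRegression` or `SVC(probability=True)`), and for the calibration emphasis in the evaluation plan the classifier can be wrapped in `sklearn.calibration.CalibratedClassifierCV`.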

Technical Stack

Data & ML: Python, NumPy, pandas, scikit-learn, TensorFlow / PyTorch, TSFEL (optional)
Backend: Flask or FastAPI to serve model predictions and file uploads
Frontend: React.js for UI, Three.js for 3D visualizations (optional), and D3 or Chart.js for plots
DevOps: Docker for reproducible environments, GitHub for version control, and optional CI for model testing

User Interaction

Upload CSV files or paste candidate parameters for instant classification.
Visualize light curves with model overlays and per-sample confidence scores.
Download prediction results and view model diagnostics to support scientific review.
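The download step could be as simple as attaching per-class probabilities to each uploaded row and writing a CSV. A minimal sketch, assuming any fitted scikit-learn classifier that exposes `predict_proba` and `classes_` (here a small scaled Logistic Regression trained on toy data) and a hypothetical helper `predictions_to_csv`. Reporting the top-class probability as "confidence" is one simple choice, and it is only meaningful if the model's probabilities are reasonably calibrated.

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def predictions_to_csv(model, features: pd.DataFrame) -> str:
    """Return CSV text with the input columns, one probability column per class,
    the predicted label, and that label's probability as a confidence score."""
    proba = model.predict_proba(features)
    out = features.copy()
    for i, label in enumerate(model.classes_):
        out[f"prob_{label}"] = proba[:, i]
    best = proba.argmax(axis=1)
    out["prediction"] = model.classes_[best]
    out["confidence"] = proba[np.arange(len(best)), best]
    return out.to_csv(index=False)


# Toy training data with made-up values, only to produce a fitted model.
train = pd.DataFrame({
    "period_days": [1.0, 2.0, 30.0, 40.0, 5.0, 6.0],
    "depth_ppm": [15000, 14000, 300, 250, 800, 900],
})
labels = ["False Positive", "False Positive", "Confirmed", "Confirmed", "Candidate", "Candidate"]
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(train, labels)

csv_text = predictions_to_csv(model, train.head(2))
print(csv_text)
```

Keeping the full probability vector, not just the winning label, lets reviewers see borderline cases where two classes are close.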

Call to action

Visit the Explore page for datasets, model training details, and interactive demos, and the Resources tab for direct links to NASA archives and example CSVs to test locally.