SAM-RFI: Radio Frequency Interference Detection with SAM2
Automated RFI detection for radio astronomy using Meta's Segment Anything Model 2 (SAM2), fine-tuned on radio visibility data.
Overview
SAM-RFI adapts SAM2's segmentation capabilities to identify and flag Radio Frequency Interference (RFI) in radio astronomy measurement sets. The models are trained on physics-based synthetic RFI data and can detect various interference patterns, including narrowband carriers, broadband interference, and transient events.
Key Features:
- High accuracy: 80-90% IoU on validation data
- Multiple model sizes: Tiny to Large (balance speed vs accuracy)
- Iterative flagging: Progressive deep cleaning with multiple passes
- Easy integration: Compatible with CASA measurement sets
- One-line usage: Auto-download and run with the samrfi CLI
Model Sizes
All models are trained with vision encoder + mask decoder fine-tuning on radio astronomy data.
| Size | Best Train Loss | Best Val Loss | Learning Rate | Batch Size |
|---|---|---|---|---|
| tiny | 0.0708 | 0.0724 | 1e-06 | 8 |
| small | 0.0810 | 0.0764 | 1e-06 | 8 |
| base_plus | 0.0708 | 0.0740 | 1e-06 | 8 |
| large | 0.0708 | 0.0770 | 1e-06 | 8 |
Recommended Use Cases
- tiny (40M params): Quick testing, low-memory environments (~4GB VRAM)
- small (180M params): Balanced performance for general use (~8GB VRAM)
- base_plus (330M params): High accuracy for production pipelines (~12GB VRAM)
- large (850M params): Best performance, research applications (~16GB VRAM)
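The list above can be turned into a small lookup. The following is a minimal sketch, not part of the samrfi package: `pick_model_size` and `model_id` are hypothetical helper names, and the rule "choose the largest model that fits" is an assumption. The VRAM figures are the approximate ones quoted above.

```python
# Approximate VRAM needs in GB, copied from the list above.
MODEL_VRAM_GB = {"tiny": 4, "small": 8, "base_plus": 12, "large": 16}


def pick_model_size(available_vram_gb: float) -> str:
    """Return the largest model whose approximate VRAM need fits the budget.

    The selection rule is an illustrative assumption, not samrfi behaviour.
    """
    fitting = [name for name, need in MODEL_VRAM_GB.items()
               if need <= available_vram_gb]
    if not fitting:
        raise ValueError(f"No model fits in {available_vram_gb} GB of VRAM")
    # The dict is ordered smallest to largest, so the last fit is the largest.
    return fitting[-1]


def model_id(size: str) -> str:
    """Build the model path in the form used by the examples in this card."""
    return f"polarimetic/sam-rfi/{size}"


print(model_id(pick_model_size(10)))  # a 10 GB card fits "small"
```

Treat the result as a starting point only: actual memory use depends on batch size and input dimensions.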
Quick Start
Installation
pip install samrfi[gpu]
Single-Pass Prediction
# Use any model size: tiny, small, base_plus, large
samrfi predict \
  --model polarimetic/sam-rfi/large \
  --input observation.ms
Iterative Prediction (Recommended)
For deep cleaning, use 2-3 iterations to progressively find fainter RFI:
samrfi predict \
  --model polarimetic/sam-rfi/large \
  --input observation.ms \
  --iterations 3
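This card does not describe how the iterative mode works internally. The sketch below only illustrates the general idea of progressive flagging: once bright RFI is masked, the statistics of the remaining data tighten and fainter RFI becomes detectable. A simple sigma-clipping detector stands in for the SAM2 model, and `sigma_clip_detector` and `iterative_flag` are hypothetical names.

```python
import numpy as np


def sigma_clip_detector(amplitude, flags, n_sigma=3.0):
    """Stand-in detector (NOT the SAM2 model): flag samples that lie more
    than n_sigma standard deviations above the mean of the unflagged data."""
    good = amplitude[~flags]
    if good.size == 0:
        return np.zeros(amplitude.shape, dtype=bool)
    return amplitude > good.mean() + n_sigma * good.std()


def iterative_flag(amplitude, detector=sigma_clip_detector, num_iterations=3):
    """Run the detector repeatedly, hiding already-flagged samples each pass."""
    flags = np.zeros(amplitude.shape, dtype=bool)
    for _ in range(num_iterations):
        new_flags = detector(amplitude, flags) & ~flags
        if not new_flags.any():
            break  # converged: this pass found nothing new
        flags |= new_flags
    return flags
```

With one bright and one faint interferer in the same data, the first pass typically catches only the bright one, because it inflates the standard deviation. The faint one appears in the second pass. The same mechanism explains why too many passes can start to remove clean data.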
Python API
from samrfi.inference import RFIPredictor
# Initialize predictor (auto-downloads model)
predictor = RFIPredictor(
    model_path="polarimetic/sam-rfi/large",
    device="cuda"
)
# Single-pass prediction
flags = predictor.predict_ms("observation.ms")
# Iterative prediction (3 passes)
flags = predictor.predict_iterative("observation.ms", num_iterations=3)
Training Details
Architecture
- Base Model: SAM2 (Segment Anything Model 2) from Meta AI
- Vision Encoder: Hiera hierarchical transformer (fine-tuned for radio astronomy)
- Prompt Encoder: Positional encoding for bounding boxes (frozen)
- Mask Decoder: Transformer decoder (fine-tuned for RFI segmentation)
The vision encoder and mask decoder are trained on radio astronomy data, adapting SAM2's visual features to recognize RFI patterns in visibility waterfalls.
Training Data
- Type: Physics-based synthetic RFI simulations
- RFI Patterns: Narrowband carriers, broadband interference, impulsive events, satellite glint
- Dynamic Range: 10^6 to 10^7 (matching real observations)
- Samples: 4000-10000 training samples per model
Input Preprocessing
Radio visibility data is converted to 3-channel RGB-like features:
- Channel 1: Spatial gradient (edge detection for RFI boundaries)
- Channel 2: Log amplitude (intensity, range [-3, 4])
- Channel 3: Phase information ([-π, π] → [0, 1])
All channels are normalized with ImageNet statistics for SAM2 compatibility.
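The three-channel mapping can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the package's preprocessing code: the function name `visibilities_to_channels` is hypothetical, and the card does not specify the gradient operator or how the gradient and log-amplitude channels are scaled. The sketch assumes `np.gradient` on the log amplitude, rescaling to [0, 1], and the standard ImageNet per-channel mean and standard deviation.

```python
import numpy as np

# Standard ImageNet per-channel statistics (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def visibilities_to_channels(vis: np.ndarray) -> np.ndarray:
    """Map a complex (time, frequency) waterfall to a (3, time, frequency) array."""
    amp = np.abs(vis)
    # Channel 2: log10 amplitude clipped to [-3, 4], then rescaled to [0, 1].
    log_amp = np.clip(np.log10(np.maximum(amp, 1e-12)), -3.0, 4.0)
    log_amp01 = (log_amp + 3.0) / 7.0
    # Channel 1: gradient magnitude of the log amplitude (assumed edge detector),
    # rescaled by its maximum.
    d_time, d_freq = np.gradient(log_amp01)
    grad = np.hypot(d_time, d_freq)
    peak = grad.max()
    grad01 = grad / peak if peak > 0 else grad
    # Channel 3: phase mapped from [-pi, pi] to [0, 1].
    phase01 = (np.angle(vis) + np.pi) / (2.0 * np.pi)
    channels = np.stack([grad01, log_amp01, phase01])
    # ImageNet normalization, broadcast over the channel axis.
    return (channels - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]
```

Whatever the exact recipe, inference should use the same preprocessing as training, since results depend on the two matching.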
Hardware
- Platform: NAIRR Jetstream-2
- GPU: NVIDIA H100 (80 GB HBM3)
- Framework: PyTorch 2.0+ with HuggingFace Transformers
Performance
Typical validation metrics:
- IoU (Intersection over Union): 80-90%
- Precision: 85-95%
- Recall: 80-90%
- F1 Score: 82-92%
Performance varies by RFI type, severity, and model size. Iterative prediction (2-3 passes) improves detection of faint RFI.
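For reference, the four metrics above can be computed from a predicted and a ground-truth boolean mask as in this minimal sketch. The function name `mask_metrics` is illustrative, and the card does not state how the reported figures were aggregated across samples.

```python
import numpy as np


def mask_metrics(pred, truth) -> dict:
    """Pixel-wise IoU, precision, recall and F1 for two boolean masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)    # RFI correctly flagged
    fp = np.count_nonzero(pred & ~truth)   # clean data flagged by mistake
    fn = np.count_nonzero(~pred & truth)   # RFI that was missed

    def ratio(num, den):
        # Define empty cases as 0.0 rather than dividing by zero.
        return num / den if den else 0.0

    return {
        "iou": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }
```

Precision tracks over-flagging (clean data lost), while recall tracks missed RFI, so the two are worth reading together when comparing single-pass and iterative runs.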
Usage Examples
Different Model Sizes
# Fast testing with tiny model
samrfi predict --model polarimetic/sam-rfi/tiny --input obs.ms
# Balanced performance with base_plus
samrfi predict --model polarimetic/sam-rfi/base_plus --input obs.ms --iterations 2
# Best performance with large model
samrfi predict --model polarimetic/sam-rfi/large --input obs.ms --iterations 3
Custom Flagging Strategies
from samrfi.inference import RFIPredictor
# Conservative flagging (fewer false positives)
predictor = RFIPredictor("polarimetic/sam-rfi/large", device="cuda")
flags = predictor.predict_ms("obs.ms") # Single pass
# Aggressive deep cleaning (more thorough)
flags = predictor.predict_iterative("obs.ms", num_iterations=3)
Integration with CASA
SAM-RFI works directly with CASA measurement sets:
from samrfi.inference import RFIPredictor
# Flag RFI in measurement set
predictor = RFIPredictor("polarimetic/sam-rfi/large", device="cuda")
flags = predictor.predict_ms("observation.ms")
# Flags are automatically written to MS FLAG column
# Continue with CASA calibration pipeline...
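After flagging, a quick sanity check is to read the FLAG column back and see how much data was flagged. The sketch below uses the `table` tool from `casatools` and is not part of samrfi. The helper names are hypothetical, and `getcol` assumes every row has the same shape, which may not hold for measurement sets with several spectral window setups.

```python
import numpy as np


def read_flag_column(ms_path: str) -> np.ndarray:
    """Read the FLAG column of a measurement set (requires casatools)."""
    # Imported lazily so flagged_fraction() below works without CASA installed.
    from casatools import table

    tb = table()
    tb.open(ms_path)
    try:
        return tb.getcol("FLAG")
    finally:
        tb.close()


def flagged_fraction(flags) -> float:
    """Fraction of samples flagged, between 0 and 1 (0.0 for empty input)."""
    flags = np.asarray(flags, dtype=bool)
    return float(flags.mean()) if flags.size else 0.0
```

Calling `flagged_fraction(read_flag_column("observation.ms"))` before and after a run shows how much additional data the prediction flagged.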
Limitations
- CASA dependency: Requires CASA tools for measurement set I/O
- Preprocessing sensitivity: Best results when preprocessing matches training configuration
- Data domain: Optimized for VLA-like interferometric data (single-dish and VLBI not extensively tested)
- Over-flagging: Iterative prediction with >3 passes may flag clean data
Recommendations
- Start with 1-2 iterations on new datasets, validate before using 3+
- Validate on known clean data to assess over-flagging
- Use appropriate model size for your hardware (tiny for testing, large for production)
- Monitor statistics (mean, std, MAD) before/after flagging to balance RFI removal vs data retention
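The monitoring step in the last recommendation could look like the following sketch. `amplitude_stats` is a hypothetical helper, not a samrfi function. It reports mean, standard deviation, MAD, and the fraction of data retained for the unflagged samples.

```python
import numpy as np


def amplitude_stats(amplitude, flags=None) -> dict:
    """Summary statistics of the unflagged amplitudes."""
    amp = np.asarray(amplitude, dtype=float)
    total = amp.size
    # Boolean indexing drops flagged samples and flattens the array.
    kept = amp.ravel() if flags is None else amp[~np.asarray(flags, dtype=bool)]
    if kept.size == 0:
        raise ValueError("All samples are flagged; no statistics to compute")
    median = np.median(kept)
    return {
        "mean": float(kept.mean()),
        "std": float(kept.std()),
        "mad": float(np.median(np.abs(kept - median))),
        "retained_fraction": kept.size / total,
    }
```

A large drop in standard deviation with little change in MAD and a high retained fraction suggests that mostly outliers were removed. A falling retained fraction with nearly unchanged statistics is a hint of over-flagging.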
Citation
A paper describing SAM-RFI is in preparation. If you use these models, please cite:
@software{samrfi2024,
author = {polarimetic},
title = {SAM-RFI: Radio Frequency Interference Detection with SAM2},
year = {2024},
url = {https://github.com/preshanth/SAM-RFI},
note = {Models available at https://huggingface.co/polarimetic/sam-rfi}
}
Acknowledgments
Training compute provided by the [National AI Research Resource (NAIRR)](https://nairrpilot.org/) Pilot via Jetstream-2 at Indiana University.
Repository & Documentation
- GitHub: https://github.com/preshanth/SAM-RFI
- Documentation: See repository README for installation, training, and advanced usage
- Issues: Report bugs or request features on GitHub
License
MIT License - See repository for details.
Generated with SAM-RFI • Models: tiny, small, base_plus, large
Base model
facebook/sam2.1-hiera-base-plus