HEDWIG: Learning Geospatial Embeddings for Large-Scale Retrieval
One-line summary: Built a ViCLIP-based geolocation system that learns richer geospatial embeddings from multi-frame panoramic imagery and captions.
Key Results
- Reduced median top-1 prediction error by over 1600 km compared with CLIP.
- Increased the proportion of predictions within 750 km by nearly 4x.
- Improved top-1 retrieval quality across all evaluated distance thresholds.
What I Built
- ViCLIP-based multimodal embedding pipeline.
- Projection and classification layers for geolocation clustering.
- Geocell-based retrieval workflow using similarity search.
- Experiment pipeline for captioning, clustering, and ablation studies.
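The projection and classification layers above can be sketched as a small PyTorch head on top of frozen ViCLIP embeddings. All dimensions and names here are illustrative assumptions, not the actual configuration:

```python
import torch
import torch.nn as nn

class GeocellHead(nn.Module):
    """Sketch: project a ViCLIP embedding into a geospatial space,
    then classify it into a geocell. Dimensions are illustrative."""
    def __init__(self, embed_dim=768, proj_dim=256, num_geocells=2048):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(embed_dim, proj_dim),
            nn.ReLU(),
            nn.LayerNorm(proj_dim),
        )
        self.classifier = nn.Linear(proj_dim, num_geocells)

    def forward(self, x):
        z = self.proj(x)              # geospatial embedding, used for retrieval
        logits = self.classifier(z)   # geocell logits, used for classification
        return z, logits

head = GeocellHead()
emb = torch.randn(4, 768)             # stand-in for a batch of ViCLIP embeddings
z, logits = head(emb)
print(z.shape, logits.shape)          # torch.Size([4, 256]) torch.Size([4, 2048])
```

Returning both the projected embedding and the logits lets one head serve the retrieval workflow and the geocell classifier at once.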
Technical Approach
- Built a preprocessing pipeline that converts panoramic viewpoints into multi-frame representations.
- Trained embedding and clustering heads for geocell prediction.
- Benchmarked retrieval and geolocation quality against CLIP baselines.
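The geocell-based retrieval step above amounts to a nearest-neighbor search over normalized embeddings. A minimal NumPy sketch, with toy 4-d embeddings and hypothetical geocell ids standing in for the real index:

```python
import numpy as np

def retrieve_geocells(query, index_embs, geocell_ids, k=3):
    """Return the k geocell ids whose embeddings are most
    cosine-similar to the query. Names are illustrative."""
    q = query / np.linalg.norm(query)
    db = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    sims = db @ q                # cosine similarity to every indexed geocell
    top = np.argsort(-sims)[:k]  # indices of the k highest similarities
    return [geocell_ids[i] for i in top]

# Toy index: three 4-d embeddings, one clearly aligned with the query.
ids = ["cell_paris", "cell_tokyo", "cell_lima"]
index = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
query = np.array([0.9, 0.1, 0.0, 0.0])
print(retrieve_geocells(query, index, ids, k=2))  # ['cell_paris', 'cell_tokyo']
```

At scale, the brute-force `argsort` would be replaced by an approximate nearest-neighbor index, but the similarity computation is the same.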
Key Insight
Averaging embeddings across viewpoints dilutes the location signal; weighted multi-frame representations improve geospatial retrieval and clustering.
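The contrast can be sketched in a few lines: uniform mean pooling treats every viewpoint equally, while softmax-weighted pooling lets informative frames dominate. The relevance scores here are hypothetical stand-ins for what a learned weighting would produce:

```python
import numpy as np

def mean_pool(frames):
    """Uniform average across per-viewpoint embeddings."""
    return frames.mean(axis=0)

def weighted_pool(frames, scores):
    """Softmax-weighted average: frames with higher relevance scores
    (e.g. from a learned attention head) dominate the pooled embedding."""
    w = np.exp(scores - scores.max())
    w = w / w.sum()
    return (w[:, None] * frames).sum(axis=0)

# Four viewpoints with 3-d embeddings: two informative, two near-noise.
frames = np.array([[1.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0],
                   [0.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0]])
scores = np.array([2.0, 2.0, -2.0, -2.0])  # hypothetical learned relevance

print(mean_pool(frames))             # pulled toward the noise frames
print(weighted_pool(frames, scores)) # dominated by the informative frames
```

With uniform averaging the noise frames contribute as much as the informative ones; weighting concentrates the pooled vector on the frames that actually carry location signal.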
Tools / Models Used
Python, PyTorch, ViCLIP, CLIP, geospatial clustering, retrieval, similarity search.