GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
Modi Jin1 · Yiming Zhang1 · Boyuan Sun1 · Dingwen Zhang2 · Ming-Ming Cheng1 · Qibin Hou1†
1VCIP, Nankai University · 2School of Automation, Northwestern Polytechnical University
†Corresponding author
GeoAgent is a vision-language model for image geolocation that reasons in a human-like way and arrives at fine-grained, address-level conclusions. Built upon Qwen2.5-VL, it achieves strong performance across multiple geographic granularities (city, region, country, continent) while generating interpretable chain-of-thought reasoning.
GeoAgent introduces:
- A geo-similarity reward that combines spatial and semantic similarity to handle the many-to-one mapping between natural-language descriptions and geographic locations;
- A consistency reward, assessed by a consistency agent, that ensures the integrity and consistency of reasoning chains.
The model is trained on GeoSeek, a novel geolocation dataset with human-annotated chain-of-thought (CoT) data and bias-reducing sampling.
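The README does not give the reward formula, so the following is only a minimal sketch of how a reward might combine a spatial term and a semantic term. Everything in it is an assumption for illustration: the function names, the exponential distance decay with `scale_km`, the Jaccard token overlap used as a stand-in for semantic similarity, and the mixing weight `alpha`. It also assumes the predicted address has already been geocoded to coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def spatial_similarity(pred_coords, target_coords, scale_km=100.0):
    """Assumed form: exponential decay of the distance, in (0, 1]."""
    return math.exp(-haversine_km(*pred_coords, *target_coords) / scale_km)


def semantic_similarity(pred_text, target_text):
    """Stand-in for a semantic score: Jaccard overlap of lowercased tokens."""
    a = set(pred_text.lower().replace(",", " ").split())
    b = set(target_text.lower().replace(",", " ").split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def geo_similarity_reward(pred_coords, target_coords, pred_text, target_text,
                          alpha=0.5, scale_km=100.0):
    """Hypothetical weighted mix of the spatial and semantic terms, in [0, 1]."""
    spatial = spatial_similarity(pred_coords, target_coords, scale_km)
    semantic = semantic_similarity(pred_text, target_text)
    return alpha * spatial + (1 - alpha) * semantic
```

The point of mixing the two terms is that differently worded answers for the same place (for example "NYC" and "New York City") share few tokens but geocode to nearly the same point, so the spatial term keeps the reward high even when the text overlap is low.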
GeoSeek is a new geolocation dataset comprising three subsets:
- GeoSeek-CoT (10k): High-quality chain-of-thought data labeled by geography experts and professional geolocation game players. Each entry includes street-view images, GPS coordinates, three-level location labels (country, city, precise location), and human reasoning processes, all standardized into a unified CoT format.
- GeoSeek-Loc (20k): Images for RL-based finetuning, sampled via a stratified strategy considering population, land area, and highway mileage to reduce geographic bias.
- GeoSeek-Val (3k): Validation benchmark with locatability scores and scene categories (manmade structures, natural landscapes, etc.) for evaluation.
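The stratified sampling used for GeoSeek-Loc is not specified beyond the three factors listed, so the sketch below shows just one way such an allocation could work. It assumes per-region statistics are available, normalises each factor to a share, averages the shares with hypothetical equal `weights`, and turns the result into integer image quotas with the largest-remainder method. The function names and the `highway_km` key are illustrative, not taken from the GeoSeek pipeline.

```python
def normalise(values):
    """Scale a {region: value} mapping so the values sum to 1."""
    total = sum(values.values())
    return {region: value / total for region, value in values.items()}


def region_quotas(stats, total_images, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Allocate an image budget across regions from three per-region factors.

    stats maps region -> {"population": ..., "land_area": ..., "highway_km": ...}.
    """
    factors = ("population", "land_area", "highway_km")
    shares = [normalise({r: s[f] for r, s in stats.items()}) for f in factors]

    # Composite score: weighted average of each region's share per factor.
    score = {r: sum(w * share[r] for w, share in zip(weights, shares)) for r in stats}
    total_score = sum(score.values())

    # Largest-remainder rounding so the quotas sum exactly to total_images.
    exact = {r: total_images * s / total_score for r, s in score.items()}
    quotas = {r: int(e) for r, e in exact.items()}
    leftover = total_images - sum(quotas.values())
    by_remainder = sorted(exact, key=lambda r: exact[r] - quotas[r], reverse=True)
    for region in by_remainder[:leftover]:
        quotas[region] += 1
    return quotas
```

Calling `region_quotas(stats, 20000)` with made-up statistics returns a per-region image count; images would then be drawn at random within each region up to its quota.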
Installation
Requirements
- Python>=3.9
- torch==2.6.0
- torchvision==0.21.0
- torchaudio==2.6.0
- ms-swift>=3.8.0
- xformers==0.0.27.post2
- deepspeed==0.15.0
- cuda==12.4
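Because several of these packages are pinned to exact versions, it can help to confirm what is actually installed before training. The helper below is a small sketch, not part of the repository: `PINNED` and `find_mismatches` are names introduced here, and the check uses only the standard-library `importlib.metadata`. Local build suffixes such as `+cu124` are ignored when comparing.

```python
from importlib import metadata

# Exact pins copied from the requirements list above.
PINNED = {
    "torch": "2.6.0",
    "torchvision": "0.21.0",
    "torchaudio": "2.6.0",
    "xformers": "0.0.27.post2",
    "deepspeed": "0.15.0",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (wanted, found)} for missing or mismatched packages.

    found is None when the package is not installed.
    """
    problems = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = (wanted, None)
            continue
        # Drop a local version suffix such as "+cu124" before comparing.
        if found.split("+")[0] != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    for pkg, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{pkg}: expected {wanted}, found {found or 'not installed'}")
```

An empty result means every pinned package matches; the CUDA toolkit version is not a Python package and has to be checked separately.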
Setup
git clone https://github.com/HVision-NKU/GeoAgent.git
cd GeoAgent
conda create -n GeoAgent python=3.9
conda activate GeoAgent
pip install -r requirements.txt
Usage
Get GeoAgent Model
Download the pre-trained checkpoints from Hugging Face:
mkdir checkpoints
cd checkpoints
# (Optional) Using huggingface mirrors
export HF_ENDPOINT=https://hf-mirror.com
# download GeoAgent model from huggingface
huggingface-cli download --resume-download ghost233lism/GeoAgent --local-dir ghost233lism/GeoAgent
Quick Inference
Quick inference scripts for single-image and batch input are provided in infer/. See infer/README for details.
Training
bash tools/train_sft.sh
bash tools/train_grpo.sh
Citation
@article{jin2026geoagent,
title={GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics},
author={Jin, Modi and Zhang, Yiming and Sun, Boyuan and Zhang, Dingwen and Cheng, Ming-Ming and Hou, Qibin},
journal={arXiv preprint arXiv:2602.12617},
year={2026}
}
License
This code is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) and may be used for non-commercial purposes only.
Any commercial use of this code requires formal permission in advance.
Contact
For technical questions, please contact jin_modi[AT]mail.nankai.edu.cn.
For commercial licensing, please contact andrewhoux[AT]gmail.com.
Acknowledgments
We sincerely thank Yue Zhang, H.M., Haowen He, Yuke Jun, and other experts in geography, as well as outstanding geolocation game players, for their valuable guidance, prompt design suggestions, and data support throughout the construction of the GeoSeek dataset.
We also thank Zhixiang Wang, Chilin Chen, Jincheng Shi, Liupeng Zhang, Yuan Gu, Yanghang Shao, Jinhua Zhang, Jiachen Zhu, Gucheng Qiuyue, Qingyang Guo, Jingchen Yang, Weilong Kong, Xinyuan Li, and Mr. Xu (an anonymous volunteer) for their outstanding contributions in providing high-quality reasoning process data.
Base model: Qwen/Qwen2.5-VL-7B-Instruct