Video Moment Retrieval in Practical Setting: A Dataset of Ranked Moments for Imprecise Queries

The benchmark and dataset for the paper Video Moment Retrieval in Practical Settings: A Dataset of Ranked Moments for Imprecise Queries.

We recommend cloning the code, data, and feature files from the Hugging Face repository at TVR-Ranking. This repository only includes the code for ReLoCLNet_RVMR. You can download the other baseline models from XML_RVMR and CONQUER_RVMR.

Getting started

1. Install the requisites

The Python packages we used are listed as follows. Commonly, the most recent versions work well.

conda create --name tvr_ranking python=3.11
conda activate tvr_ranking
pip install pytorch # 2.2.1+cu121
pip install tensorboard 
pip install h5py pandas tqdm easydict pyyaml

2. Download full dataset

For the full dataset, please go down from Hugging Face TVR-Ranking.
The detailed introduction and raw annotations is available at Dataset Introduction.

TVR_Ranking/
  -val.json                  
  -test.json                 
  -train_top01.json
  -train_top20.json
  -train_top40.json
  -video_corpus.json

3. Download features

For the query BERT features, you can download them from Hugging Face TVR-Ranking.
For the video and subtitle features, please request them at TVR.

tar -xf tvr_feature_release.tar.gz -C data/TVR_Ranking/feature

4. Training

# modify the data path first 
sh run_top20.sh

5. Inferring

The checkpoint can all be accessed from Hugging Face TVR-Ranking.

sh infer_top20.sh

Experiment Results

Baseline

The baseline performance of $NDGC@40$ was shown as follows. Top $N$ moments were comprised of a pseudo training set by the query-caption similarity.

Model	Train Set Top N	IoU=0.3		IoU=0.5		IoU=0.7
		Val	Test	Val	Test	Val	Test
XML	1	0.1077	0.1016	0.0775	0.0727	0.0273	0.0294
	20	0.2580	0.2512	0.1874	0.1853	0.0705	0.0753
	40	0.2408	0.2432	0.1740	0.1791	0.0666	0.0720
CONQUER	1	0.0952	0.0835	0.0808	0.0687	0.0526	0.0484
	20	0.2130	0.1995	0.1976	0.1867	0.1527	0.1368
	40	0.2183	0.1968	0.2022	0.1851	0.1524	0.1365
ReLoCLNet	1	0.1533	0.1489	0.1321	0.1304	0.0878	0.0869
	20	0.4039	0.4031	0.3656	0.3648	0.2542	0.2567
	40	0.4725	0.4735	0.4337	0.4337	0.3015	0.3079

ReLoCLNet Performance

Model	Train Set Top N	IoU=0.3		IoU=0.5		IoU=0.7
		Val	Test	Val	Test	Val	Test
NDCG@10
ReLoCLNet	1	0.1575	0.1525	0.1358	0.1349	0.0908	0.0916
ReLoCLNet	20	0.3751	0.3751	0.3407	0.3397	0.2316	0.2338
ReLoCLNet	40	0.4339	0.4353	0.3984	0.3986	0.2693	0.2807
NDCG@20
ReLoCLNet	1	0.1504	0.1439	0.1303	0.1269	0.0866	0.0849
ReLoCLNet	20	0.3815	0.3792	0.3462	0.3427	0.2381	0.2386
ReLoCLNet	40	0.4418	0.4439	0.4060	0.4059	0.2787	0.2877
NDCG@40
ReLoCLNet	1	0.1533	0.1489	0.1321	0.1304	0.0878	0.0869
ReLoCLNet	20	0.4039	0.4031	0.3656	0.3648	0.2542	0.2567
ReLoCLNet	40	0.4725	0.4735	0.4337	0.4337	0.3015	0.3079

Citation

If you feel this project helpful to your research, please cite our work.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Dataset used to train axgroup/TVR-Ranking

Paper for axgroup/TVR-Ranking

TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries

Paper • 2407.06597 • Published Jul 9, 2024 • 1