SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos

@article{Giancola2018SoccerNetAS,
  title={SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos},
  author={Silvio Giancola and Mohieddine Amine and Tarek Dghaily and Bernard Ghanem},
  journal={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year={2018},
  pages={1792-179210},
  url={https://api.semanticscholar.org/CorpusID:5047207}
}
This paper introduces SoccerNet, a benchmark for action spotting in soccer videos, and shows that the best model for classifying temporal segments of length one minute reaches a mean Average Precision (mAP) of 67.8%.

A Context-Aware Loss Function for Action Spotting in Soccer Videos

This paper proposes a novel loss function that specifically considers the temporal context naturally present around each action, rather than focusing on the single annotated frame to spot, and demonstrates the generalization capability of this loss for generic activity proposals and detection on ActivityNet.

SoccerDB: A Large-Scale Database for Comprehensive Video Understanding

This paper proposes a new soccer video database named SoccerDB, comprising 171,191 video segments from 346 high-quality soccer games, which is the largest database for comprehensive sports video understanding on various aspects.

SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos

This work proposes SoccerNet-v2, a novel large-scale corpus of manual annotations for the SoccerNet video dataset, along with open challenges to encourage more research in soccer understanding and broadcast production, and extends current tasks in the realm of soccer to include action spotting, camera shot segmentation with boundary detection, and a novel replay grounding task.

Improved Soccer Action Spotting using both Audio and Video Streams

This work used the SoccerNet benchmark dataset, which contains annotated events for 500 soccer game videos from the Big Five European leagues, and evaluated several ways to integrate the audio stream into video-only architectures.

A Graph-Based Method for Soccer Action Spotting Using Unsupervised Player Classification

This work identifies the players, referees, and goalkeepers and represents them as nodes in a graph, models their temporal interactions as sequences of graphs, and obtains an overall performance that surpasses similar graph-based methods and is competitive with computationally heavy methods.

STE: Spatio-Temporal Encoder for Action Spotting in Soccer Videos

A modified version of the Spatio-Temporal Encoder (STE) model, STE-v2, is introduced; it improves the tight a-mAP to 58.71% on the challenge split and 58.48% on the test split.

SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries

A novel task of dense video captioning focusing on the generation of textual commentaries anchored with single timestamps that has the potential to enhance the accessibility and understanding of soccer content for a wider audience and bring the excitement of the game to more people.

Feature Combination Meets Attention: Baidu Soccer Embeddings and Transformer based Temporal Detection

This tech report presents a two-stage paradigm to detect what events happen in soccer broadcast videos and when: it fine-tunes multiple action recognition models on soccer data to extract high-level semantic features, then uses a transformer-based temporal detection module to locate the target events.

RMS-Net: Regression and Masking for Soccer Event Spotting

A lightweight and modular network which can simultaneously predict the event label and its temporal offset using the same underlying features is devised, which reaches a gain of more than 10 Average-mAP points on the test set when fine-tuned in combination with a strong 2D backbone.

Temporally Precise Action Spotting in Soccer Videos Using Dense Detection Anchors

A model for temporally precise action spotting in videos uses a dense set of detection anchors, predicting a detection confidence and a corresponding fine-grained temporal displacement for each anchor. The authors experiment with two trunk architectures: a one-dimensional version of a U-Net and a Transformer encoder (TE).
...

Dense-Captioning Events in Videos

This work proposes a new model that is able to identify all events in a single pass of the video while simultaneously describing the detected events with natural language, and introduces a new captioning module that uses contextual information from past and future events to jointly describe all events.

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

I3D models considerably improve upon the state-of-the-art in action classification, reaching 80.2% on HMDB-51 and 97.9% on UCF-101 after pre-training on Kinetics, and a new Two-Stream Inflated 3D ConvNet that is based on 2D ConvNet inflation is introduced.

Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos

This paper introduces a proposal method that aims to recover temporal segments containing actions in untrimmed videos and introduces a learning framework to represent and retrieve activity proposals.

Automatic Soccer Video Analysis and Summarization

A fully automatic and computationally efficient framework for analysis and summarization of soccer videos using cinematic and object-based features, which includes some novel low-level soccer video processing algorithms, as well as some higher-level algorithms for goal detection, referee detection, and penalty-box detection.

Soccer Video Event Annotation by Synchronization of Attack–Defense Clips and Match Reports With Coarse-Grained Time Information

A more generalized approach that synchronizes video events with text descriptions using high-level semantics with coarse time constraints, rather than assuming that the timestamp is given exactly in the text description.

Automatic Soccer Video Event Detection Based on a Deep Neural Network Combined CNN and RNN

A deep neural network is constructed to detect soccer video events; it uses an RNN to map the semantic features of key frames from PB to soccer event types, including goal, goal attempt, card, and corner.

Detecting Events and Key Actors in Multi-person Videos

This paper proposes a model which learns to detect events in videos while automatically "attending" to the people responsible for the event, and outperforms state-of-the-art methods for both event classification and detection on this new dataset.

Goal!! Event detection in sports video

Experimental results demonstrate that extremely high classification accuracy can be achieved, from a dramatically limited number of examples, by leveraging pre-trained models with fusion of spatio-temporal features.

Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs

A novel loss function for the localization network is proposed to explicitly consider temporal overlap and achieve high temporal localization accuracy in untrimmed long videos.

Leveraging Contextual Cues for Generating Basketball Highlights

The informativeness of five different cues derived from the video and from the environment is explored through user studies, which show that, for study participants, the highlights produced by the system are comparable to the ones produced by ESPN for the same games.
...