A Closer Look at Spatiotemporal Convolutions for Action Recognition
Abstract - Summary: The paper discusses several forms of spatiotemporal convolutions for video analysis and studies their effects on action recognition. It empirically demonstrates the accuracy advantage of 3D CNNs over 2D CNNs within the framework of residual learning. The paper also proposes a new spatiotemporal convolutional block, “R(2+1)D”, which factorizes a 3D convolution into a 2D spatial convolution followed by a 1D temporal convolution. This block produces results superior or comparable to the state of the art on popular action recognition datasets such as Sports-1M, Kinetics, UCF101 and HMDB51.
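To make the R(2+1)D factorization concrete, here is a minimal parameter-count sketch in Python. It assumes the paper's rule for choosing the number of intermediate channels M between the spatial and temporal convolutions, namely M = floor(t·d²·N_in·N_out / (d²·N_in + t·N_out)), so that the factorized block has roughly as many weights as the full t×d×d 3D convolution it replaces. Bias terms are ignored, and the function names are hypothetical (they are not taken from the linked repository).

```python
def conv3d_params(t: int, d: int, n_in: int, n_out: int) -> int:
    """Weight count of a full t x d x d 3D convolution (bias ignored)."""
    return t * d * d * n_in * n_out


def intermediate_channels(t: int, d: int, n_in: int, n_out: int) -> int:
    """Hypothetical helper: width M of the hidden layer in a (2+1)D block,
    chosen so its weight count approximately matches the 3D convolution.
    Integer division equals floor here because all inputs are positive."""
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)


def conv2plus1d_params(t: int, d: int, n_in: int, n_out: int) -> int:
    """Weight count of the factorized block (bias ignored)."""
    m = intermediate_channels(t, d, n_in, n_out)
    spatial = d * d * n_in * m   # 1 x d x d spatial conv: N_in -> M channels
    temporal = t * m * n_out     # t x 1 x 1 temporal conv: M -> N_out channels
    return spatial + temporal


if __name__ == "__main__":
    # (t, d, N_in, N_out) pairs chosen only for illustration
    for t, d, n_in, n_out in [(3, 3, 64, 64), (3, 7, 3, 64)]:
        print(
            f"t={t} d={d} {n_in}->{n_out}: "
            f"3D={conv3d_params(t, d, n_in, n_out)} "
            f"M={intermediate_channels(t, d, n_in, n_out)} "
            f"(2+1)D={conv2plus1d_params(t, d, n_in, n_out)}"
        )
```

With 3×3×3 kernels and 64 input and output channels, the rule gives M = 144 and both variants use 110,592 weights; for a 3×7×7 kernel on 3-channel RGB input, flooring leaves the factorized block slightly smaller (28,137 vs 28,224). Matching the parameter count in this way lets the comparison attribute accuracy differences to the factorization itself, which adds an extra nonlinearity between the spatial and temporal steps, rather than to model size.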
Paper - https://arxiv.org/abs/1711.11248
Code - https://github.com/irhum/R2Plus1D-PyTorch
Datasets -
HMDB51: https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/
milkbottletoy dataset: https://github.com/microsoft/computervision-recipes/blob/master/scenarios/action_recognition/01_training_introduction.ipynb