Details
Paper ID 69
Difficulty - Hard

Categories

  • Computer Vision
  • Action Recognition
  • hard

Abstract - Summary: The paper discusses several forms of spatiotemporal convolution for video analysis and studies their effect on action recognition. It empirically demonstrates the accuracy advantage of 3D CNNs over 2D CNNs within the framework of residual learning. The paper also proposes a new spatiotemporal convolution block, “R(2+1)D”, which produces results superior or comparable to the state of the art on popular action recognition datasets such as Sports-1M, Kinetics, UCF101, and HMDB51.
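
As a rough illustration of the "R(2+1)D" idea: per the linked paper, the block replaces a full t x d x d 3D convolution with a 1 x d x d spatial convolution followed by a t x 1 x 1 temporal convolution, with an intermediate channel width chosen so the parameter count stays close to that of the 3D convolution. The sketch below only counts weights (biases ignored). The function names are hypothetical, and the width formula is reproduced from the paper as recalled here, so treat it as an assumption to check against the paper rather than a reference implementation.

```python
def conv3d_params(n_in, n_out, t=3, d=3):
    """Weights in a full t x d x d 3D convolution (bias ignored)."""
    return n_in * n_out * t * d * d


def mid_channels(n_in, n_out, t=3, d=3):
    """Intermediate width M for the factorized block.

    Assumed formula: M = floor(t*d^2*n_in*n_out / (d^2*n_in + t*n_out)),
    chosen so the (2+1)D block has roughly as many weights as the 3D one.
    """
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)


def conv2plus1d_params(n_in, n_out, t=3, d=3):
    """Weights in a 1 x d x d spatial conv followed by a t x 1 x 1 temporal conv."""
    m = mid_channels(n_in, n_out, t, d)
    spatial = n_in * m * d * d   # 2D spatial convolution: n_in -> m channels
    temporal = m * n_out * t     # 1D temporal convolution: m -> n_out channels
    return spatial + temporal


if __name__ == "__main__":
    for n_in, n_out in [(64, 64), (64, 128), (3, 64)]:
        print(n_in, n_out,
              conv3d_params(n_in, n_out),
              mid_channels(n_in, n_out),
              conv2plus1d_params(n_in, n_out))
```

Because the width is rounded down, the factorized block never has more weights than the 3D convolution it replaces. The factorization also leaves room for an extra nonlinearity between the spatial and temporal convolutions.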

Paper - https://arxiv.org/abs/1711.11248
Code - https://github.com/irhum/R2Plus1D-PyTorch
Dataset -

  • HMDB51: https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/
  • milkbottletoydataset: https://github.com/microsoft/computervision-recipes/blob/master/scenarios/action_recognition/01_training_introduction.ipynb