Browsing by Author "Jitendra Malik"

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
(2024) Kristen Grauman; Andrew Westbury; Lorenzo Torresani; Kris Kitani; Jitendra Malik; Triantafyllos Afouras; Kumar Ashutosh; Vijay Baiyya; Siddhant Bansal; Bikram Boote
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from 1 to 42 minutes each and 1,286 hours of video combined. The multimodal nature of the dataset is unprecedented: the video is accompanied by multichannel audio, eye gaze, 3D point clouds, camera poses, IMU, and multiple paired language descriptions, including a novel “expert commentary” done by coaches and teachers and tailored to the skilled-activity domain. To push the frontier of first-person video understanding of skilled human activity, we also present a suite of benchmark tasks and their annotations, including fine-grained activity understanding, proficiency estimation, cross-view translation, and 3D hand/body pose. All resources are open sourced to fuel new research in the community.
Learning Rich Features from RGB-D Images for Object Detection and Segmentation
(Springer Science+Business Media, 2014) Saurabh Gupta; Ross Girshick; Pablo Arbeláez; Jitendra Malik
Learning Rich Features from RGB-D Images for Object Detection and Segmentation
(Cornell University, 2014) Saurabh Gupta; Ross Girshick; Pablo Arbeláez; Jitendra Malik
In this paper we study the problem of object detection for RGB-D images using semantically rich image and depth features. We propose a new geocentric embedding for depth images that encodes height above ground and angle with gravity for each pixel in addition to the horizontal disparity. We demonstrate that this geocentric embedding works better than using raw depth images for learning feature representations with convolutional neural networks. Our final object detection system achieves an average precision of 37.3%, which is a 56% relative improvement over existing methods. We then focus on the task of instance segmentation where we label pixels belonging to object instances found by our detector. For this task, we propose a decision forest approach that classifies pixels in the detection window as foreground or background using a family of unary and binary tests that query shape and geocentric pose features. Finally, we use the output from our object detectors in an existing superpixel classification framework for semantic scene segmentation and achieve a 24% relative improvement over current state-of-the-art for the object categories that we study. We believe advances such as those represented in this paper will facilitate the use of perception in fields like robotics.
Simultaneous Detection and Segmentation
(Cornell University, 2014) Bharath Hariharan; Pablo Arbeláez; Ross Girshick; Jitendra Malik
We aim to detect all instances of a category in an image and, for each instance, mark the pixels that belong to it. We call this task Simultaneous Detection and Segmentation (SDS). Unlike classical bounding box detection, SDS requires a segmentation and not just a box. Unlike classical semantic segmentation, we require individual object instances. We build on recent work that uses convolutional neural networks to classify category-independent region proposals (R-CNN [16]), introducing a novel architecture tailored for SDS. We then use category-specific, top-down figure-ground predictions to refine our bottom-up proposals. We show a 7-point boost (16% relative) over our baselines on SDS, a 5-point boost (10% relative) over state-of-the-art on semantic segmentation, and state-of-the-art performance in object detection. Finally, we provide diagnostic tools that unpack performance and provide directions for future work.
Simultaneous Detection and Segmentation
(Springer Science+Business Media, 2014) Bharath Hariharan; Pablo Arbeláez; Ross Girshick; Jitendra Malik