LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 150

  1. Book ; Online: Learning Space-Time Semantic Correspondences

    Tran, Du / Malik, Jitendra

    2023  

    Abstract We propose a new task of space-time semantic correspondence prediction in videos. Given a source video, a target video, and a set of space-time key-points in the source video, the task requires predicting a set of keypoints in the target video that are the semantic correspondences of the provided source keypoints. We believe that this task is important for fine-grain video understanding, potentially enabling applications such as activity coaching, sports analysis, robot imitation learning, and more. Our contributions in this paper are: (i) proposing a new task and providing annotations for space-time semantic correspondences on two existing benchmarks: Penn Action and Pouring; and (ii) presenting a comprehensive set of baselines and experiments to gain insights about the new problem. Our main finding is that the space-time semantic correspondence prediction problem is best approached jointly in space and time rather than in their decomposed sub-problems: time alignment and spatial correspondences.
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2023-06-16
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

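    The task is specified entirely by its inputs and outputs, which a small interface can make concrete. Below is a minimal sketch with invented names (this is not a published API), plus the weakest conceivable baseline for contrast with the joint space-time methods the paper advocates:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SpaceTimeKeypoint:
    t: int      # frame index
    x: float    # horizontal pixel coordinate
    y: float    # vertical pixel coordinate

@dataclass
class CorrespondenceTask:
    source_video: list                         # frames of the source video
    target_video: list                         # frames of the target video
    source_keypoints: List[SpaceTimeKeypoint]  # query points to match

def identity_baseline(task: CorrespondenceTask) -> List[SpaceTimeKeypoint]:
    # Trivial baseline: assume the two videos are already aligned in space
    # and time and copy the source keypoints unchanged. Real methods must
    # solve time alignment and spatial matching, ideally jointly.
    return list(task.source_keypoints)

task = CorrespondenceTask(source_video=[], target_video=[],
                          source_keypoints=[SpaceTimeKeypoint(t=0, x=12.0, y=34.0)])
print(identity_baseline(task))
```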

  2. Article ; Online: Real-world humanoid locomotion with reinforcement learning.

    Radosavovic, Ilija / Xiao, Tete / Zhang, Bike / Darrell, Trevor / Malik, Jitendra / Sreenath, Koushil

    Science robotics

    2024  Volume 9, Issue 89, Page(s) eadi9579

    Abstract Humanoid robots that can autonomously operate in diverse environments have the potential to help address labor shortages in factories, assist elderly at home, and colonize new planets. Although classical controllers for humanoid robots have shown impressive results in a number of settings, they are challenging to generalize and adapt to new environments. Here, we present a fully learning-based approach for real-world humanoid locomotion. Our controller is a causal transformer that takes the history of proprioceptive observations and actions as input and predicts the next action. We hypothesized that the observation-action history contains useful information about the world that a powerful transformer model can use to adapt its behavior in context, without updating its weights. We trained our model with large-scale model-free reinforcement learning on an ensemble of randomized environments in simulation and deployed it to the real-world zero-shot. Our controller could walk over various outdoor terrains, was robust to external disturbances, and could adapt in context.
    MeSH term(s) Humans ; Aged ; Robotics/methods ; Locomotion ; Walking ; Learning ; Reinforcement, Psychology
    Language English
    Publishing date 2024-04-17
    Publishing country United States
    Document type Journal Article
    ISSN 2470-9476
    ISSN (online) 2470-9476
    DOI 10.1126/scirobotics.adi9579
    Database MEDical Literature Analysis and Retrieval System OnLINE

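    The controller described above is architecturally simple: a causal transformer over the interleaved observation-action history that predicts the next action. Below is a minimal PyTorch sketch of that idea; all dimensions, layer counts, and the token layout are illustrative guesses, not the paper's configuration:

```python
import torch
import torch.nn as nn

class HistoryPolicy(nn.Module):
    """Causal transformer over interleaved (observation, action) tokens."""
    def __init__(self, obs_dim=47, act_dim=12, d_model=192, n_heads=4, n_layers=4, max_steps=16):
        super().__init__()
        self.obs_embed = nn.Linear(obs_dim, d_model)
        self.act_embed = nn.Linear(act_dim, d_model)
        self.pos_embed = nn.Embedding(2 * max_steps, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, act_dim)

    def forward(self, obs_hist, act_hist):
        # obs_hist: (B, T, obs_dim), act_hist: (B, T, act_dim)
        B, T, _ = obs_hist.shape
        tokens = torch.stack([self.obs_embed(obs_hist), self.act_embed(act_hist)], dim=2)
        tokens = tokens.reshape(B, 2 * T, -1)    # interleave o_1, a_1, ..., o_T, a_T
        tokens = tokens + self.pos_embed(torch.arange(2 * T, device=tokens.device))
        mask = torch.triu(torch.full((2 * T, 2 * T), float("-inf")), diagonal=1)
        h = self.encoder(tokens, mask=mask)      # causal: each token attends only to the past
        return self.head(h[:, -2])               # next action, read from the latest observation token

policy = HistoryPolicy()
action = policy(torch.randn(1, 16, 47), torch.randn(1, 16, 12))   # -> (1, 12)
```

    In-context adaptation in this setup comes only from conditioning on the history; the weights stay fixed at deployment, which is exactly the hypothesis the abstract states.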

  3. Book ; Online: Decoupling Human and Camera Motion from Videos in the Wild

    Ye, Vickie / Pavlakos, Georgios / Malik, Jitendra / Kanazawa, Angjoo

    2023  

    Abstract We propose a method to reconstruct global human trajectories from videos in the wild. Our optimization method decouples the camera and human motion, which allows us to place people in the same world coordinate frame. Most existing methods do not model the camera motion; methods that rely on the background pixels to infer 3D human motion usually require a full scene reconstruction, which is often not possible for in-the-wild videos. However, even when existing SLAM systems cannot recover accurate scene reconstructions, the background pixel motion still provides enough signal to constrain the camera motion. We show that relative camera estimates along with data-driven human motion priors can resolve the scene scale ambiguity and recover global human trajectories. Our method robustly recovers the global 3D trajectories of people in challenging in-the-wild videos, such as PoseTrack. We quantify our improvement over existing methods on 3D human dataset Egobody. We further demonstrate that our recovered camera scale allows us to reason about motion of multiple people in a shared coordinate frame, which improves performance of downstream tracking in PoseTrack. Code and video results can be found at https://vye16.github.io/slahmr.

    Comment: Project site: https://vye16.github.io/slahmr
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2023-02-24
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

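    The core idea, camera motion known from SLAM only up to an unknown scale plus a prior on human motion, can be shown in a toy joint optimization. The sketch below ignores rotations and uses a smoothness term as a stand-in for the learned motion prior; it is schematic, not the paper's actual objective:

```python
import torch

T = 60
cam_t = torch.cumsum(0.05 * torch.randn(T, 3), dim=0)   # unit-scale camera positions from SLAM
obs_local = torch.randn(T, 3)                           # person's position in each camera's frame

s = torch.ones(1, requires_grad=True)                   # unknown metric scale of the SLAM trajectory
X = obs_local.clone().requires_grad_(True)              # person's world-frame trajectory (optimized)

opt = torch.optim.Adam([s, X], lr=1e-2)
for step in range(500):
    opt.zero_grad()
    data = ((X - (s * cam_t + obs_local)) ** 2).mean()  # world position = camera position + local offset
    prior = ((X[1:] - X[:-1]) ** 2).mean()              # stand-in for the learned human motion prior
    (data + 0.1 * prior).backward()
    opt.step()
```

    The data term alone leaves the scale ambiguous; it is the motion prior that pins down a plausible trajectory and hence the scale, which is the resolution mechanism the abstract describes.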

  4. Book ; Online: Learning Vision-based Pursuit-Evasion Robot Policies

    Bajcsy, Andrea / Loquercio, Antonio / Kumar, Ashish / Malik, Jitendra

    2023  

    Abstract Learning strategic robot behavior -- like that required in pursuit-evasion interactions -- under real-world constraints is extremely challenging. It requires exploiting the dynamics of the interaction, and planning through both physical state and latent intent uncertainty. In this paper, we transform this intractable problem into a supervised learning problem, where a fully-observable robot policy generates supervision for a partially-observable one. We find that the quality of the supervision signal for the partially-observable pursuer policy depends on two key factors: the balance of diversity and optimality of the evader's behavior and the strength of the modeling assumptions in the fully-observable policy. We deploy our policy on a physical quadruped robot with an RGB-D camera on pursuit-evasion interactions in the wild. Despite all the challenges, the sensing constraints bring about creativity: the robot is pushed to gather information when uncertain, predict intent from noisy measurements, and anticipate in order to intercept. Project webpage: https://abajcsy.github.io/vision-based-pursuit/

    Comment: Includes Supplementary. Project webpage at https://abajcsy.github.io/vision-based-pursuit/
    Keywords Computer Science - Robotics ; Computer Science - Artificial Intelligence
    Subject code 629
    Publishing date 2023-08-30
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

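    The supervision scheme, a fully-observable teacher labeling data for a partially-observable student, reduces to ordinary regression. Below is a toy PyTorch sketch in which the teacher rule, the networks, and all shapes are invented:

```python
import torch
import torch.nn as nn

def teacher_action(pursuer_state, evader_state):
    # Privileged "fully-observable" policy: sees the evader's true position.
    return (evader_state - pursuer_state).clamp(-1.0, 1.0)

student = nn.GRU(input_size=4, hidden_size=64, batch_first=True)
head = nn.Linear(64, 2)
opt = torch.optim.Adam(list(student.parameters()) + list(head.parameters()), lr=1e-3)

for step in range(200):
    pursuer = torch.randn(32, 20, 2)                        # batch of rollouts, 20 steps, 2D positions
    evader = torch.randn(32, 20, 2)
    noisy_obs = evader + 0.3 * torch.randn_like(evader)     # student only sees noisy detections
    labels = teacher_action(pursuer, evader)                # privileged supervision
    h, _ = student(torch.cat([pursuer, noisy_obs], dim=-1)) # history lets the student filter noise
    loss = ((head(h) - labels) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

    The two factors the abstract highlights map directly onto this sketch: the distribution of `evader` rollouts controls the diversity/optimality of the supervision, and `teacher_action` encodes the modeling assumptions of the fully-observable policy.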

  5. Book ; Online: Interactive Task Planning with Language Models

    Li, Boyi / Wu, Philipp / Abbeel, Pieter / Malik, Jitendra

    2023  

    Abstract An interactive robot framework accomplishes long-horizon task planning and can easily generalize to new goals or distinct tasks, even during execution. However, most traditional methods require predefined module design, which makes it hard to generalize to different goals. Recent large language model based approaches can allow for more open-ended planning but often require heavy prompt engineering or domain-specific pretrained models. To tackle this, we propose a simple framework that achieves interactive task planning with language models. Our system incorporates both high-level planning and low-level function execution via language. We verify the robustness of our system in generating novel high-level instructions for unseen objectives and its ease of adaptation to different tasks by merely substituting the task guidelines, without the need for additional complex prompt engineering. Furthermore, when the user sends a new request, our system is able to replan accordingly with precision based on the new request, task guidelines and previously executed steps. Please check more details on our https://wuphilipp.github.io/itp_site and https://youtu.be/TrKLuyv26_g.
    Keywords Computer Science - Robotics ; Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Human-Computer Interaction
    Subject code 004
    Publishing date 2023-10-16
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

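    The replanning behavior falls out of the prompt structure: the task guidelines, the user's latest request, and the steps executed so far are re-sent on every call. Below is a schematic loop with an invented function set and a canned stand-in where a real chat-completion call would go:

```python
GUIDELINES = "You are a kitchen robot. Emit one function call per step; reply DONE when finished."
FUNCTIONS = {
    "pick": lambda obj: print(f"picking {obj}"),
    "pour": lambda src, dst: print(f"pouring {src} into {dst}"),
}

_script = iter(['pick("kettle")', 'pour("kettle", "cup")', "DONE"])
def llm(prompt: str) -> str:
    return next(_script)             # stand-in: replays a canned plan

def run_task(request: str) -> None:
    executed = []
    while True:
        step = llm(f"{GUIDELINES}\nRequest: {request}\nExecuted: {executed}\nNext step:")
        if step.strip() == "DONE":
            break                    # because the history is re-sent on every call, a changed
        executed.append(step)        # user request naturally triggers replanning mid-task
        name, argstr = step.strip().rstrip(")").split("(", 1)
        FUNCTIONS[name](*[a.strip().strip('"') for a in argstr.split(",")])

run_task("make tea")
```

    Swapping tasks then amounts to substituting `GUIDELINES` and `FUNCTIONS`, mirroring the abstract's claim that adaptation needs only new task guidelines rather than new prompt engineering.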

  6. Book ; Online: Adaptive Human Trajectory Prediction via Latent Corridors

    Thakkar, Neerja / Mangalam, Karttikeya / Bajcsy, Andrea / Malik, Jitendra

    2023  

    Abstract Human trajectory prediction is typically posed as a zero-shot generalization problem: a predictor is learnt on a dataset of human motion in training scenes, and then deployed on unseen test scenes. While this paradigm has yielded tremendous progress, it fundamentally assumes that trends in human behavior within the deployment scene are constant over time. As such, current prediction models are unable to adapt to scene-specific transient human behaviors, such as crowds temporarily gathering to see buskers, pedestrians hurrying through the rain and avoiding puddles, or a protest breaking out. We formalize the problem of scene-specific adaptive trajectory prediction and propose a new adaptation approach inspired by prompt tuning called latent corridors. By augmenting the input of any pre-trained human trajectory predictor with learnable image prompts, the predictor can improve in the deployment scene by inferring trends from extremely small amounts of new data (e.g., 2 humans observed for 30 seconds). With less than 0.1% additional model parameters, we see up to 23.9% ADE improvement in MOTSynth simulated data and 16.4% ADE in MOT and Wildtrack real pedestrian data. Qualitatively, we observe that latent corridors imbue predictors with an awareness of scene geometry and scene-specific human behaviors that non-adaptive predictors struggle to capture. The project website can be found at https://neerja.me/atp_latent_corridors/.

    Comment: Project website can be found at https://neerja.me/atp_latent_corridors/
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2023-12-11
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

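    Mechanically, latent corridors resemble prompt tuning: the pre-trained predictor is frozen and only a small input prompt is learned per deployment scene. Below is a minimal sketch with a stand-in predictor; the real method uses learnable image prompts, not the flat additive vector shown here:

```python
import torch
import torch.nn as nn

predictor = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))  # frozen base model
for p in predictor.parameters():
    p.requires_grad_(False)

prompt = nn.Parameter(torch.zeros(16))         # the only per-scene learnable parameters
opt = torch.optim.Adam([prompt], lr=1e-2)

past = torch.randn(64, 16)                     # flattened observed trajectories in this scene
future = torch.randn(64, 16)                   # corresponding ground-truth futures
for step in range(100):
    pred = predictor(past + prompt)            # prompt steers the frozen predictor
    loss = ((pred - future) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

    Because only `prompt` is trained, the adapted parameters stay a tiny fraction of the model, consistent with the under-0.1% figure in the abstract.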

  7. Article ; Online: Navigating to objects in the real world.

    Gervet, Theophile / Chintala, Soumith / Batra, Dhruv / Malik, Jitendra / Chaplot, Devendra Singh

    Science robotics

    2023  Volume 8, Issue 79, Page(s) eadf6991

    Abstract Semantic navigation is necessary to deploy mobile robots in uncontrolled environments such as homes or hospitals. Many learning-based approaches have been proposed in response to the lack of semantic understanding of the classical pipeline for spatial navigation, which builds a geometric map using depth sensors and plans to reach point goals. Broadly, end-to-end learning approaches reactively map sensor inputs to actions with deep neural networks, whereas modular learning approaches enrich the classical pipeline with learning-based semantic sensing and exploration. However, learned visual navigation policies have predominantly been evaluated in sim, with little known about what works on a robot. We present a large-scale empirical study of semantic visual navigation methods comparing representative methods with classical, modular, and end-to-end learning approaches across six homes with no prior experience, maps, or instrumentation. We found that modular learning works well in the real world, attaining a 90% success rate. In contrast, end-to-end learning does not, dropping from 77% sim to a 23% real-world success rate because of a large image domain gap between sim and reality. For practitioners, we show that modular learning is a reliable approach to navigate to objects: Modularity and abstraction in policy design enable sim-to-real transfer. For researchers, we identify two key issues that prevent today's simulators from being reliable evaluation benchmarks-a large sim-to-real gap in images and a disconnect between sim and real-world error modes-and propose concrete steps forward.
    Language English
    Publishing date 2023-06-28
    Publishing country United States
    Document type Journal Article
    ISSN 2470-9476
    ISSN (online) 2470-9476
    DOI 10.1126/scirobotics.adf6991
    Database MEDical Literature Analysis and Retrieval System OnLINE

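    The modular pipeline the study evaluates has a clean structural outline: learned semantic sensing writes into an explicit map, a learned exploration policy proposes where to look, and a classical planner moves the robot. Below is a deliberately toy skeleton showing only those module boundaries, with random stand-ins for every learned component:

```python
import numpy as np

GRID = 32
semantic_map = np.zeros((GRID, GRID), dtype=bool)   # cells where the goal object was detected

def sense(pose):
    # stand-in for learned semantic segmentation projected into the map
    return [pose] if np.random.rand() < 0.05 else []

def exploration_goal():
    # stand-in for a learned exploration policy (e.g. frontier selection)
    return tuple(np.random.randint(0, GRID, size=2))

pose = (GRID // 2, GRID // 2)
for step in range(200):
    for cell in sense(pose):
        semantic_map[cell] = True
    detected = next(zip(*np.nonzero(semantic_map)), None)
    target = detected or exploration_goal()
    # classical point-goal planner, reduced here to a greedy step toward the target
    pose = tuple(np.clip(np.add(pose, np.sign(np.subtract(target, pose))), 0, GRID - 1))
    if semantic_map[pose]:
        break   # reached a detected instance of the goal object
```

    The paper's sim-to-real argument hinges on this decomposition: the map and planner are abstractions that transfer, while an end-to-end policy is exposed directly to the image domain gap.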

  8. Article ; Online: Multi-View Supervision for Single-View Reconstruction via Differentiable Ray Consistency.

    Tulsiani, Shubham / Zhou, Tinghui / Efros, Alexei A / Malik, Jitendra

    IEEE transactions on pattern analysis and machine intelligence

    2022  Volume 44, Issue 12, Page(s) 8754–8765

    Abstract We study the notion of consistency between a 3D shape and a 2D observation and propose a differentiable formulation which allows computing gradients of the 3D shape given an observation from an arbitrary view. We do so by reformulating view consistency using a differentiable ray consistency (DRC) term. We show that this formulation can be incorporated in a learning framework to leverage different types of multi-view observations e.g., foreground masks, depth, color images, semantics etc. as supervision for learning single-view 3D prediction. We present empirical analysis of our technique in a controlled setting. We also show that this approach allows us to improve over existing techniques for single-view reconstruction of objects from the PASCAL VOC dataset.
    Language English
    Publishing date 2022-11-07
    Publishing country United States
    Document type Journal Article
    ISSN 1939-3539
    ISSN (online) 1939-3539
    DOI 10.1109/TPAMI.2019.2898859
    Database MEDical Literature Analysis and Retrieval System OnLINE

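    The ray consistency term admits a compact worked example: for a ray crossing voxels with occupancy probabilities x_1..x_N, the ray terminates at voxel i with probability x_i * (1-x_1)...(1-x_{i-1}), and each termination event is paired with a cost derived from the 2D observation. The sketch below instantiates this for a foreground-mask observation; it follows the general formulation rather than the paper's implementation:

```python
import torch

def ray_consistency_loss(x, pixel_is_foreground):
    # x: (N,) occupancy probabilities along one ray, ordered from the camera
    escape = torch.cumprod(1 - x, dim=0)                    # prob of passing voxels 1..i
    p_stop = x * torch.cat([torch.ones(1), escape[:-1]])    # prob of terminating at voxel i
    p_escape = escape[-1]                                   # prob of exiting the volume
    # mask observation: a foreground ray should terminate, a background ray should escape
    cost_stop = torch.zeros_like(x) if pixel_is_foreground else torch.ones_like(x)
    cost_escape = 1.0 if pixel_is_foreground else 0.0
    return (p_stop * cost_stop).sum() + p_escape * cost_escape

x = torch.full((8,), 0.5, requires_grad=True)
loss = ray_consistency_loss(x, pixel_is_foreground=True)
loss.backward()    # gradient descent raises occupancies so the ray terminates
```

    Depth, color, or semantic supervision fits the same template by changing the per-event costs, which is what lets one formulation absorb the different observation types the abstract lists.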

  9. Book ; Online: CA$^2$T-Net: Category-Agnostic 3D Articulation Transfer from Single Image

    Collins, Jasmine / Liang, Anqi / Malik, Jitendra / Zhang, Hao / Devernay, Frédéric

    2023  

    Abstract We present a neural network approach to transfer the motion from a single image of an articulated object to a rest-state (i.e., unarticulated) 3D model. Our network learns to predict the object's pose, part segmentation, and corresponding motion parameters to reproduce the articulation shown in the input image. The network is composed of three distinct branches that take a shared joint image-shape embedding and is trained end-to-end. Unlike previous methods, our approach is independent of the topology of the object and can work with objects from arbitrary categories. Our method, trained with only synthetic data, can be used to automatically animate a mesh, infer motion from real images, and transfer articulation to functionally similar but geometrically distinct 3D models at test time.

    Comment: 8 pages
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence ; Computer Science - Graphics ; Computer Science - Machine Learning
    Publishing date 2023-01-05
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

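    The described layout, three branches decoding a shared joint image-shape embedding, is straightforward to sketch. All encoders, dimensions, and output parameterizations below are invented stand-ins:

```python
import torch
import torch.nn as nn

class ArticulationNet(nn.Module):
    """Three heads over a shared image-shape embedding, trained end-to-end."""
    def __init__(self, n_points=1024, n_parts=4, d=128):
        super().__init__()
        self.n_points, self.n_parts = n_points, n_parts
        self.image_enc = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, d))
        self.shape_enc = nn.Sequential(nn.Linear(n_points * 3, d), nn.ReLU())
        self.pose_head = nn.Linear(2 * d, 6)                   # object pose (rotation + translation)
        self.seg_head = nn.Linear(2 * d, n_points * n_parts)   # per-point part logits
        self.motion_head = nn.Linear(2 * d, n_parts)           # e.g. one articulation angle per part

    def forward(self, image, points):
        z = torch.cat([self.image_enc(image), self.shape_enc(points.flatten(1))], dim=-1)
        seg = self.seg_head(z).view(-1, self.n_points, self.n_parts)
        return self.pose_head(z), seg, self.motion_head(z)

net = ArticulationNet()
pose, seg, motion = net(torch.randn(2, 3, 64, 64), torch.randn(2, 1024, 3))
```

    Operating on a generic point set rather than a fixed mesh topology is what makes the category-agnostic claim plausible in this kind of design.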

  10. Book ; Online: On the Benefits of 3D Pose and Tracking for Human Action Recognition

    Rajasegaran, Jathushan / Pavlakos, Georgios / Kanazawa, Angjoo / Feichtenhofer, Christoph / Malik, Jitendra

    2023  

    Abstract In this work we study the benefits of using tracking and 3D poses for action recognition. To achieve this, we take the Lagrangian view on analysing actions over a trajectory of human motion rather than at a fixed point in space. Taking this stand allows us to use the tracklets of people to predict their actions. In this spirit, first we show the benefits of using 3D pose to infer actions, and study person-person interactions. Subsequently, we propose a Lagrangian Action Recognition model by fusing 3D pose and contextualized appearance over tracklets. To this end, our method achieves state-of-the-art performance on the AVA v2.2 dataset on both pose only settings and on standard benchmark settings. When reasoning about the action using only pose cues, our pose model achieves +10.0 mAP gain over the corresponding state-of-the-art while our fused model has a gain of +2.8 mAP over the best state-of-the-art model. Code and results are available at: https://brjathu.github.io/LART

    Comment: CVPR2023 (project page: https://brjathu.github.io/LART)
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2023-04-03
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

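    The Lagrangian fusion reduces to a familiar pattern: per-frame 3D pose and contextualized appearance features for one tracked person are fused and processed over the tracklet. Below is a minimal sketch; the feature sizes, the concatenation fusion, and the class count are illustrative only:

```python
import torch
import torch.nn as nn

class TrackletActionModel(nn.Module):
    """Fuse pose and appearance per frame, then reason over the tracklet."""
    def __init__(self, pose_dim=72, app_dim=256, d=256, n_classes=80):
        super().__init__()
        self.fuse = nn.Linear(pose_dim + app_dim, d)
        layer = nn.TransformerEncoderLayer(d, 4, 4 * d, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, 2)
        self.cls = nn.Linear(d, n_classes)

    def forward(self, pose, appearance):
        # pose: (B, T, pose_dim) 3D pose parameters per tracked frame
        # appearance: (B, T, app_dim) contextualized appearance per frame
        tokens = self.fuse(torch.cat([pose, appearance], dim=-1))
        h = self.temporal(tokens)         # reason along the person's trajectory (Lagrangian view)
        return self.cls(h.mean(dim=1))    # tracklet-level action logits

model = TrackletActionModel()
logits = model(torch.randn(2, 12, 72), torch.randn(2, 12, 256))   # -> (2, 80)
```

    Attending along a person's tracklet rather than a fixed image location is the Lagrangian stance the abstract takes, with the pose-only variant obtained by dropping the appearance input.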
