LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 19

  1. Book ; Online: Grounding Language in Play

    Lynch, Corey / Sermanet, Pierre

    2020  

    Abstract Natural language is perhaps the most versatile and intuitive way for humans to communicate tasks to a robot. Prior work on Learning from Play (LfP) [Lynch et al., 2019] provides a simple approach for learning a wide variety of robotic behaviors from general sensors. However, each task must be specified with a goal image, something that is not practical in open-world environments. In this work we present a simple and scalable way to condition policies on human language instead. We extend LfP by pairing short robot experiences from play with relevant human language after the fact. To make this efficient, we introduce multicontext imitation, which allows us to train a single agent to follow image or language goals, then use just language conditioning at test time. This reduces the cost of language pairing to less than 1% of collected robot experience, with the majority of control still learned via self-supervised imitation. At test time, a single agent trained in this manner can perform many different robotic manipulation skills in a row in a 3D environment, directly from images, and specified only with natural language (e.g. "open the drawer... now pick up the block... now press the green button."). Finally, we introduce a simple technique that transfers knowledge from large unlabeled text corpora to robotic learning. We find that transfer significantly improves downstream robotic manipulation. It also allows our agent to follow thousands of novel instructions at test time, zero-shot, in 16 different languages. See videos of our experiments at language-play.github.io
    Keywords Computer Science - Robotics ; Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Computer Vision and Pattern Recognition
    Subject code 121 ; 004
    Publishing date 2020-05-15
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

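A minimal sketch of the multicontext imitation idea described in the record above: one policy is trained on a mix of image-goal and language-goal examples, but only ever sees a shared latent goal, so language alone can drive it at test time. All weights, shapes, and function names below are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoders mapping either goal modality into one shared latent goal space.
W_img = rng.normal(size=(512, 32)) * 0.01    # image-goal features -> latent goal
W_lang = rng.normal(size=(128, 32)) * 0.01   # sentence embedding -> latent goal
W_pi = rng.normal(size=(64 + 32, 7)) * 0.01  # [observation, latent goal] -> 7-DoF action

def encode_goal(goal: np.ndarray, modality: str) -> np.ndarray:
    """Project an image goal or a language goal into the shared goal space."""
    W = W_img if modality == "image" else W_lang
    return np.tanh(goal @ W)

def policy(obs: np.ndarray, goal: np.ndarray, modality: str) -> np.ndarray:
    """Single goal-conditioned policy; it never sees which modality supplied the goal."""
    z = encode_goal(goal, modality)
    return np.tanh(np.concatenate([obs, z]) @ W_pi)

# Training mixes (obs, image_goal) pairs from play with the small fraction of
# (obs, language_goal) pairs labeled after the fact; at test time only language
# goals are passed, e.g. policy(obs, sentence_embedding, "language").
```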

  2. Book ; Online: Learning to Play by Imitating Humans

    Dinyari, Rostam / Sermanet, Pierre / Lynch, Corey

    2020  

    Abstract Acquiring multiple skills has commonly involved collecting a large number of expert demonstrations per task or engineering custom reward functions. Recently it has been shown that it is possible to acquire a diverse set of skills by self-supervising control on top of human teleoperated play data. Play is rich in state space coverage, and a policy trained on this data can generalize to specific tasks at test time, outperforming policies trained on individual expert task demonstrations. In this work, we explore the question of whether robots can learn to play to autonomously generate play data that can ultimately enhance performance. By training a behavioral cloning policy on a relatively small quantity of human play, we autonomously generate a large quantity of cloned play data that can be used as additional training data. We demonstrate that a general purpose goal-conditioned policy trained on this augmented dataset substantially outperforms one trained only with the original human data on 18 difficult user-specified manipulation tasks in a simulated robotic tabletop environment. A video example of a robot imitating human play can be seen here: https://learning-to-play.github.io/videos/undirected_play1.mp4
    Keywords Computer Science - Robotics ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning ; Electrical Engineering and Systems Science - Systems and Control
    Subject code 006
    Publishing date 2020-06-11
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

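The augmentation loop the abstract describes (train behavioral cloning on a small human play set, roll the policy out to produce "cloned play", then retrain on the union) can be sketched as follows; the environment interface and function names are assumptions, not the paper's code.

```python
# A minimal sketch of the cloned-play augmentation loop, assuming a generic
# env with reset()/step() and a trained behavioral-cloning policy bc_policy(obs).
def generate_cloned_play(bc_policy, env, n_episodes: int, horizon: int):
    cloned = []
    for _ in range(n_episodes):
        obs = env.reset()
        for _ in range(horizon):
            action = bc_policy(obs)
            cloned.append((obs, action))        # record state-action pairs like play data
            obs, _, done, _ = env.step(action)  # classic gym-style step signature (assumed)
            if done:
                break
    return cloned

# augmented = human_play + generate_cloned_play(bc_policy, env, n_episodes=1000, horizon=500)
# The goal-conditioned policy is then trained on `augmented` rather than human_play alone.
```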

  3. Book ; Online: Broadly-Exploring, Local-Policy Trees for Long-Horizon Task Planning

    Ichter, Brian / Sermanet, Pierre / Lynch, Corey

    2020  

    Abstract Long-horizon planning in realistic environments requires the ability to reason over sequential tasks in high-dimensional state spaces with complex dynamics. Classical motion planning algorithms, such as rapidly-exploring random trees, are capable of efficiently exploring large state spaces and computing long-horizon, sequential plans. However, these algorithms generally struggle with complex, stochastic, and high-dimensional state spaces, as well as with the narrow passages that naturally emerge in tasks that interact with the environment. Machine learning offers a promising solution for its ability to learn general policies that can handle complex interactions and high-dimensional observations. However, these policies are generally limited in horizon length. Our approach, Broadly-Exploring, Local-policy Trees (BELT), merges these two paradigms to leverage the strengths of both through a task-conditioned, model-based tree search. BELT uses an RRT-inspired tree search to efficiently explore the state space. Locally, the exploration is guided by a task-conditioned, learned policy capable of performing general short-horizon tasks. This task space can be quite general and abstract; its only requirements are to be sampleable and to well-cover the space of useful tasks. This search is aided by a task-conditioned model that temporally extends dynamics propagation to allow long-horizon search and sequential reasoning over tasks. BELT is demonstrated experimentally to be able to plan long-horizon, sequential trajectories with a goal-conditioned policy and generate plans that are robust.
    Keywords Computer Science - Robotics ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2020-10-13
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

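A much-simplified sketch of the tree search described above: nodes are states, and each edge is a short rollout of the task-conditioned policy through the learned model. The callables are hypothetical stand-ins, and node selection here is uniform where RRT-style planners would bias it toward unexplored regions.

```python
import numpy as np

rng = np.random.default_rng(0)

def belt_style_search(start_state, sample_task, rollout, reaches_goal,
                      n_iterations: int = 1000, horizon: int = 20):
    """Grow a tree of states; each edge is a short task-conditioned rollout.

    sample_task()                  -> a random short-horizon task from the task space
    rollout(state, task, horizon)  -> state reached by the learned policy + dynamics model
    reaches_goal(state)            -> True when the long-horizon goal is satisfied
    """
    nodes, parents = [start_state], {0: None}
    for _ in range(n_iterations):
        i = int(rng.integers(len(nodes)))       # node to expand (uniform here for simplicity)
        task = sample_task()
        new_state = rollout(nodes[i], task, horizon)
        nodes.append(new_state)
        parents[len(nodes) - 1] = i
        if reaches_goal(new_state):
            break
    return nodes, parents                       # walk parents back from the last node for a plan
```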

  4. Book ; Online: With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations

    Dwibedi, Debidatta / Aytar, Yusuf / Tompson, Jonathan / Sermanet, Pierre / Zisserman, Andrew

    2021  

    Abstract Self-supervised learning algorithms based on instance discrimination train encoders to be invariant to pre-defined transformations of the same instance. While most methods treat different views of the same image as positives for a contrastive loss, we are interested in using positives from other instances in the dataset. Our method, Nearest-Neighbor Contrastive Learning of visual Representations (NNCLR), samples the nearest neighbors from the dataset in the latent space, and treats them as positives. This provides more semantic variations than pre-defined transformations. We find that using the nearest-neighbor as a positive in contrastive losses improves performance significantly on ImageNet classification, from 71.7% to 75.6%, outperforming previous state-of-the-art methods. On semi-supervised learning benchmarks we improve performance significantly when only 1% of ImageNet labels are available, from 53.8% to 56.5%. On transfer learning benchmarks our method outperforms state-of-the-art methods (including supervised learning with ImageNet) on 8 out of 12 downstream datasets. Furthermore, we demonstrate empirically that our method is less reliant on complex data augmentations. We see a relative reduction of only 2.1% in ImageNet Top-1 accuracy when we train using only random crops.

    Comment: Accepted at ICCV 2021
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 006
    Publishing date 2021-04-29
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

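The core change NNCLR makes to an instance-discrimination loss (replacing the positive with its nearest neighbor drawn from a support set of earlier embeddings) fits in a few lines. This NumPy sketch illustrates the idea only; the temperature and queue handling are assumptions, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def nnclr_style_loss(z1: np.ndarray, z2: np.ndarray, support: np.ndarray,
                     temperature: float = 0.1) -> float:
    """z1, z2: (N, D) embeddings of two views of a batch; support: (Q, D) queue.

    For each z1[i], its nearest neighbor in the support set becomes the positive
    for z2[i]; the other elements of z2 in the batch act as negatives.
    """
    z1, z2, support = map(l2_normalize, (z1, z2, support))
    nn = support[np.argmax(z1 @ support.T, axis=1)]   # (N, D) nearest neighbors
    logits = (nn @ z2.T) / temperature                 # (N, N) similarity matrix
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))          # positives sit on the diagonal
```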

  5. Book ; Online: Counting Out Time: Class Agnostic Video Repetition Counting in the Wild

    Dwibedi, Debidatta / Aytar, Yusuf / Tompson, Jonathan / Sermanet, Pierre / Zisserman, Andrew

    2020  

    Abstract We present an approach for estimating the period with which an action is repeated in a video. The crux of the approach lies in constraining the period prediction module to use temporal self-similarity as an intermediate representation bottleneck that allows generalization to unseen repetitions in videos in the wild. We train this model, called RepNet, with a synthetic dataset that is generated from a large unlabeled video collection by sampling short clips of varying lengths and repeating them with different periods and counts. This combination of synthetic data and a powerful yet constrained model allows us to predict periods in a class-agnostic fashion. Our model substantially exceeds state-of-the-art performance on existing periodicity (PERTUBE) and repetition counting (QUVA) benchmarks. We also collect a new challenging dataset called Countix (~90 times larger than existing datasets) which captures the challenges of repetition counting in real-world videos. Project webpage: https://sites.google.com/view/repnet .

    Comment: Accepted at CVPR 2020. Project webpage: https://sites.google.com/view/repnet
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2020-06-27
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

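The temporal self-similarity bottleneck mentioned above can be shown directly: the period predictor reads only a T x T matrix of pairwise frame similarities, never the raw frame embeddings. The distance and row-softmax below are plausible choices for illustration, not the exact RepNet recipe.

```python
import numpy as np

def temporal_self_similarity(frame_embeddings: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """frame_embeddings: (T, D) per-frame embeddings of one video.

    Returns a (T, T) row-normalized similarity matrix; repeated actions show up
    as periodic off-diagonal stripes, which is what the period predictor reads.
    """
    diff = frame_embeddings[:, None, :] - frame_embeddings[None, :, :]
    sim = -np.square(diff).sum(axis=-1) / temperature   # negative squared distance
    sim -= sim.max(axis=1, keepdims=True)                # numerically stable softmax per row
    e = np.exp(sim)
    return e / e.sum(axis=1, keepdims=True)
```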

  6. Book ; Online: Learning Actionable Representations from Visual Observations

    Dwibedi, Debidatta / Tompson, Jonathan / Lynch, Corey / Sermanet, Pierre

    2018  

    Abstract In this work we explore a new approach for robots to teach themselves about the world simply by observing it. In particular we investigate the effectiveness of learning task-agnostic representations for continuous control tasks. We extend Time-Contrastive Networks (TCN) that learn from visual observations by embedding multiple frames jointly in the embedding space as opposed to a single frame. We show that by doing so, we are now able to encode both position and velocity attributes significantly more accurately. We test the usefulness of this self-supervised approach in a reinforcement learning setting. We show that the representations learned by agents observing themselves taking random actions, or observing other agents performing tasks successfully, can enable the learning of continuous control policies using algorithms like Proximal Policy Optimization (PPO) with only the learned embeddings as input. We also demonstrate significant improvements on the real-world Pouring dataset with a relative error reduction of 39.4% for motion attributes and 11.1% for static attributes compared to the single-frame baseline. Video results are available at https://sites.google.com/view/actionablerepresentations .

    Comment: This work is accepted in IROS 2018. Project website: https://sites.google.com/view/actionablerepresentations
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Machine Learning ; Computer Science - Robotics
    Subject code 006
    Publishing date 2018-08-02
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

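The modification described above, embedding a short window of consecutive frames jointly rather than a single frame so that velocity as well as position can be encoded, amounts to restacking the input before the encoder. A small sketch (the window size is an assumption):

```python
import numpy as np

def stack_frame_windows(frames: np.ndarray, window: int = 3) -> np.ndarray:
    """frames: (T, H, W, C) video.

    Returns (T - window + 1, window, H, W, C): each element is a short clip that
    the embedding network sees jointly, rather than one frame at a time.
    """
    return np.stack([frames[i:i + window] for i in range(len(frames) - window + 1)])

# Example: a 30-frame clip becomes 28 overlapping 3-frame stacks.
video = np.zeros((30, 64, 64, 3))
print(stack_frame_windows(video).shape)   # (28, 3, 64, 64, 3)
```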

  7. Book ; Online: Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations

    Heravi, Negin / Wahid, Ayzaan / Lynch, Corey / Florence, Pete / Armstrong, Travis / Tompson, Jonathan / Sermanet, Pierre / Bohg, Jeannette / Dwibedi, Debidatta

    2022  

    Abstract Perceptual understanding of the scene and the relationship between its different components is important for successful completion of robotic tasks. Representation learning has been shown to be a powerful technique for this, but most of the current methodologies learn task-specific representations that do not necessarily transfer well to other tasks. Furthermore, representations learned by supervised methods require large labeled datasets for each task that are expensive to collect in the real world. Using self-supervised learning to obtain representations from unlabeled data can mitigate this problem. However, current self-supervised representation learning methods are mostly object-agnostic, and we demonstrate that the resulting representations are insufficient for general purpose robotics tasks as they fail to capture the complexity of scenes with many components. In this paper, we explore the effectiveness of using object-aware representation learning techniques for robotic tasks. Our self-supervised representations are learned by observing the agent freely interacting with different parts of the environment and are queried in two different settings: (i) policy learning and (ii) object location prediction. We show that our model learns control policies in a sample-efficient manner and outperforms state-of-the-art object-agnostic techniques as well as methods trained on raw RGB images. Our results show a 20 percent increase in performance in low data regimes (1000 trajectories) in policy training using implicit behavioral cloning (IBC). Furthermore, our method outperforms the baselines for the task of object localization in multi-object scenes.
    Keywords Computer Science - Robotics ; Computer Science - Artificial Intelligence ; Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Machine Learning
    Subject code 006 ; 004
    Publishing date 2022-05-12
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  8. Book ; Online: Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models

    Xiao, Ted / Chan, Harris / Sermanet, Pierre / Wahid, Ayzaan / Brohan, Anthony / Hausman, Karol / Levine, Sergey / Tompson, Jonathan

    2022  

    Abstract In recent years, much progress has been made in learning robotic manipulation policies that follow natural language instructions. Such methods typically learn from corpora of robot-language data that was either collected with specific tasks in mind or expensively re-labelled by humans with rich language descriptions in hindsight. Recently, large-scale pretrained vision-language models (VLMs) like CLIP or ViLD have been applied to robotics for learning representations and scene descriptors. Can these pretrained models serve as automatic labelers for robot data, effectively importing Internet-scale knowledge into existing datasets to make them useful even for tasks that are not reflected in their ground truth annotations? To accomplish this, we introduce Data-driven Instruction Augmentation for Language-conditioned control (DIAL): we utilize semi-supervised language labels leveraging the semantic understanding of CLIP to propagate knowledge onto large datasets of unlabelled demonstration data and then train language-conditioned policies on the augmented datasets. This method enables cheaper acquisition of useful language descriptions compared to expensive human labels, allowing for more efficient label coverage of large-scale datasets. We apply DIAL to a challenging real-world robotic manipulation domain where 96.5% of the 80,000 demonstrations do not contain crowd-sourced language annotations. DIAL enables imitation learning policies to acquire new capabilities and generalize to 60 novel instructions unseen in the original dataset.

    Comment: Published as a conference paper at RSS 2023
    Keywords Computer Science - Robotics ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning
    Subject code 004
    Publishing date 2022-11-21
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

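The relabeling step at the heart of DIAL (scoring candidate instructions against unlabeled demonstrations with a pretrained vision-language model) can be illustrated with a simple loop. The encoders are hypothetical stand-ins for a CLIP-like model, and scoring only the final frame is an assumption made for brevity.

```python
import numpy as np

def relabel_episode(frames, candidate_instructions, image_encoder, text_encoder) -> str:
    """Pick the candidate instruction whose embedding best matches the episode outcome.

    image_encoder / text_encoder map into a shared embedding space (CLIP-like, assumed).
    Here the last frame stands in for "what the episode accomplished".
    """
    img = image_encoder(frames[-1])
    img = img / np.linalg.norm(img)
    txt = np.stack([text_encoder(t) for t in candidate_instructions])
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    return candidate_instructions[int(np.argmax(txt @ img))]

# Each unlabeled demonstration receives an instruction this way, and the
# language-conditioned policy is then trained on the original plus relabeled data.
```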

  9. Book ; Online: Temporal Cycle-Consistency Learning

    Dwibedi, Debidatta / Aytar, Yusuf / Tompson, Jonathan / Sermanet, Pierre / Zisserman, Andrew

    2019  

    Abstract We introduce a self-supervised representation learning method based on the task of temporal alignment between videos. The method trains a network using temporal cycle consistency (TCC), a differentiable cycle-consistency loss that can be used to find correspondences across time in multiple videos. The resulting per-frame embeddings can be used to align videos by simply matching frames using the nearest-neighbors in the learned embedding space. To evaluate the power of the embeddings, we densely label the Pouring and Penn Action video datasets for action phases. We show that (i) the learned embeddings enable few-shot classification of these action phases, significantly reducing the supervised training requirements; and (ii) TCC is complementary to other methods of self-supervised learning in videos, such as Shuffle and Learn and Time-Contrastive Networks. The embeddings are also used for a number of applications based on alignment (dense temporal correspondence) between video pairs, including transfer of metadata of synchronized modalities between videos (sounds, temporal semantic labels), synchronized playback of multiple videos, and anomaly detection. Project webpage: https://sites.google.com/view/temporal-cycle-consistency .

    Comment: Accepted at CVPR 2019. Project webpage: https://sites.google.com/view/temporal-cycle-consistency
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2019-04-16
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

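The temporal cycle-consistency loss can be sketched concretely: soft-match each frame of one embedded video into the other, cycle back, and penalize how far the returned index lands from its start. This NumPy version shows only the cycle-back regression form and omits the variance-aware variant; the distance and temperature are assumptions.

```python
import numpy as np

def _softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cycle_back_regression(u: np.ndarray, v: np.ndarray, temperature: float = 0.1) -> float:
    """u: (T1, D) and v: (T2, D) per-frame embeddings of two videos.

    For each frame u[i]: take the soft nearest neighbor in v, map it back to u,
    and penalize the squared gap between the expected returned index and i.
    """
    d_uv = np.square(u[:, None, :] - v[None, :, :]).sum(axis=-1)   # (T1, T2) distances
    v_soft = _softmax(-d_uv / temperature, axis=1) @ v             # (T1, D) soft neighbors in v
    d_vu = np.square(v_soft[:, None, :] - u[None, :, :]).sum(axis=-1)
    beta = _softmax(-d_vu / temperature, axis=1)                    # (T1, T1) cycle-back weights
    idx = np.arange(len(u))
    mu = beta @ idx                                                 # expected index after the cycle
    return float(np.mean((mu - idx) ** 2))
```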

  10. Book ; Online: Online Object Representations with Contrastive Learning

    Pirk, Sören / Khansari, Mohi / Bai, Yunfei / Lynch, Corey / Sermanet, Pierre

    2019  

    Abstract We propose a self-supervised approach for learning representations of objects from monocular videos and demonstrate it is particularly useful in situated settings such as robotics. The main contributions of this paper are: 1) a self-supervising objective trained with contrastive learning that can discover and disentangle object attributes from video without using any labels; 2) we leverage object self-supervision for online adaptation: the longer our online model looks at objects in a video, the lower the object identification error, while the offline baseline retains a large fixed error; 3) to explore the possibilities of a system entirely free of human supervision, we let a robot collect its own data, train on this data with our self-supervised scheme, and then show the robot can point to objects similar to the one presented in front of it, demonstrating generalization of object attributes. An interesting and perhaps surprising finding of this approach is that given a limited set of objects, object correspondences will naturally emerge when using contrastive learning without requiring explicit positive pairs. Videos illustrating online object adaptation and robotic pointing are available at: https://online-objects.github.io/.

    Comment: 10 pages
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Machine Learning ; Computer Science - Robotics
    Subject code 004
    Publishing date 2019-06-10
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
