LIVIVO - The Search Portal for Life Sciences



Search results

Hits 1 - 10 of 142 in total for the query AU="Schmid, Cordelia"

  1. Book ; Online: RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks

    Ghosh, Partha / Sanyal, Soubhik / Schmid, Cordelia / Schölkopf, Bernhard

    2024

    Abstract: We present a novel unconditional video generative model designed to address long-term spatial and temporal dependencies. To capture these dependencies, our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks developed for three-dimensional object representation and employs a singular latent code to model an entire video sequence. Individual video frames are then synthesized from an intermediate tri-plane representation, which itself is derived from the primary latent code. This novel strategy reduces computational complexity by a factor of 2 as measured in FLOPs. Consequently, our approach facilitates the efficient and temporally coherent generation of videos. Moreover, our joint frame modeling approach, in contrast to autoregressive methods, mitigates the generation of visual artifacts. We further enhance the model's capabilities by integrating an optical flow-based module within our Generative Adversarial Network (GAN) based generator architecture, thereby compensating for the constraints imposed by a smaller generator size. As a result, our model is capable of synthesizing high-fidelity video clips at a resolution of 256×256 pixels, with durations extending to more than 5 seconds at a frame rate of 30 fps. The efficacy and versatility of our approach are empirically validated through qualitative and quantitative assessments across three different datasets comprising both synthetic and real video clips.
    Keywords: Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Machine Learning
    Subject/Category (code): 004
    Publication date: 2024-01-11
    Country of publication: us
    Document type: Book ; Online
    Data source: BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

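    The record above describes synthesizing frames by sampling a hybrid tri-plane representation derived from a single latent code. The following is a minimal, hypothetical PyTorch sketch of a tri-plane lookup for video; the plane resolution, channel count, helper names (sample_plane, decode_pixels), and the toy MLP decoder are illustrative assumptions, not the authors' implementation.

      # Minimal sketch of a tri-plane lookup for video: a pixel at (x, y, t) is
      # decoded from features bilinearly sampled on the xy-, xt-, and yt-planes.
      import torch
      import torch.nn.functional as F

      C, R = 32, 64                               # feature channels, plane resolution (illustrative)
      planes = {k: torch.randn(1, C, R, R) for k in ("xy", "xt", "yt")}

      def sample_plane(plane, u, v):
          """Bilinearly sample plane features at normalized coords u, v in [-1, 1]."""
          grid = torch.stack([u, v], dim=-1).view(1, -1, 1, 2)   # (1, N, 1, 2)
          feat = F.grid_sample(plane, grid, align_corners=True)  # (1, C, N, 1)
          return feat.squeeze(-1).squeeze(0).t()                 # (N, C)

      def decode_pixels(x, y, t, mlp):
          """Aggregate the three plane features and decode them to RGB."""
          f = (sample_plane(planes["xy"], x, y)
               + sample_plane(planes["xt"], x, t)
               + sample_plane(planes["yt"], y, t))
          return mlp(f)                                          # (N, 3)

      mlp = torch.nn.Sequential(torch.nn.Linear(C, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3))
      x = torch.rand(1024) * 2 - 1
      y = torch.rand(1024) * 2 - 1
      t = torch.rand(1024) * 2 - 1
      rgb = decode_pixels(x, y, t, mlp)
      print(rgb.shape)                                           # torch.Size([1024, 3])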

  2. Book ; Online: gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction

    Chen, Zerui / Chen, Shizhe / Schmid, Cordelia / Laptev, Ivan

    2023

    Abstract: Signed distance functions (SDFs) are an attractive framework that has recently shown promising results for 3D shape reconstruction from images. SDFs seamlessly generalize to different shape resolutions and topologies but lack explicit modelling of the underlying 3D geometry. In this work, we exploit the hand structure and use it as guidance for SDF-based shape reconstruction. In particular, we address reconstruction of hands and manipulated objects from monocular RGB images. To this end, we estimate poses of hands and objects and use them to guide 3D reconstruction. More specifically, we predict kinematic chains of pose transformations and align SDFs with highly-articulated hand poses. We improve the visual features of 3D points with geometry alignment and further leverage temporal information to enhance the robustness to occlusion and motion blurs. We conduct extensive experiments on the challenging ObMan and DexYCB benchmarks and demonstrate significant improvements of the proposed method over the state of the art.

    Comment: Accepted by CVPR 2023. Project Page: https://zerchen.github.io/projects/gsdf.html
    Keywords: Computer Science - Computer Vision and Pattern Recognition
    Subject/Category (code): 004
    Publication date: 2023-04-24
    Country of publication: us
    Document type: Book ; Online
    Data source: BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

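    The abstract above hinges on aligning SDF queries with estimated hand poses. Below is a minimal, hypothetical PyTorch sketch of the general idea: query points are expressed relative to estimated hand joints before a small MLP predicts a signed distance. The joint count, feature sizes, and the MLP are illustrative stand-ins for the paper's kinematic-chain alignment.

      # Sketch of pose-guided SDF queries (illustrative shapes and modules).
      import torch
      import torch.nn as nn

      J = 16                                         # number of hand joints (illustrative)
      sdf_mlp = nn.Sequential(nn.Linear(3 * J, 128), nn.ReLU(), nn.Linear(128, 1))

      def query_sdf(points, joints):
          """points: (N, 3) query locations; joints: (J, 3) estimated joint positions."""
          rel = points[:, None, :] - joints[None, :, :]   # (N, J, 3) pose-aligned offsets
          return sdf_mlp(rel.flatten(1))                  # (N, 1) signed distances

      points = torch.rand(2048, 3)
      joints = torch.rand(J, 3)
      print(query_sdf(points, joints).shape)              # torch.Size([2048, 1])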

  3. Book ; Online: Retrieval-Enhanced Contrastive Vision-Text Models

    Iscen, Ahmet / Caron, Mathilde / Fathi, Alireza / Schmid, Cordelia

    2023  

    Abstract: Contrastive image-text models such as CLIP form the building blocks of many state-of-the-art systems. While they excel at recognizing common generic concepts, they still struggle on fine-grained entities which are rare, or even absent from the pre-training dataset. Hence, a key ingredient to their success has been the use of large-scale curated pre-training data aiming at expanding the set of concepts that they can memorize during the pre-training stage. In this work, we explore an alternative to encoding fine-grained knowledge directly into the model's parameters: we instead train the model to retrieve this knowledge from an external memory. Specifically, we propose to equip existing vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time, which greatly improves their zero-shot predictions. Remarkably, we show that this can be done with a light-weight, single-layer, fusion transformer on top of a frozen CLIP. Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks: for example +10.9 on Stanford Cars, +10.2 on CUB-2011 and +7.3 on the recent OVEN benchmark.
    Keywords: Computer Science - Computer Vision and Pattern Recognition
    Subject/Category (code): 006
    Publication date: 2023-06-12
    Country of publication: us
    Document type: Book ; Online
    Data source: BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

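    The abstract above describes refining a frozen CLIP embedding with retrieved cross-modal items using a single-layer fusion transformer. Below is a minimal PyTorch sketch of that retrieve-and-fuse pattern, assuming a precomputed memory of normalized embeddings; the memory size, neighbour count, and the refine helper are illustrative assumptions, not the RECO implementation.

      # Retrieve nearest memory items for a query embedding, then fuse with one transformer layer.
      import torch
      import torch.nn.functional as F
      from torch import nn

      D, K = 512, 8                                    # embedding dim, neighbours retrieved (illustrative)
      memory = F.normalize(torch.randn(10_000, D), dim=-1)   # stand-in for an external memory
      fusion = nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True)

      def refine(query):
          """query: (B, D) frozen image embeddings; returns retrieval-refined embeddings."""
          q = F.normalize(query, dim=-1)
          sims = q @ memory.t()                        # (B, M) cosine similarities
          idx = sims.topk(K, dim=-1).indices           # indices of the K nearest memory items
          neighbours = memory[idx]                     # (B, K, D)
          tokens = torch.cat([q.unsqueeze(1), neighbours], dim=1)   # (B, 1+K, D)
          return fusion(tokens)[:, 0]                  # refined query token

      print(refine(torch.randn(4, D)).shape)           # torch.Size([4, 512])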

  4. Book ; Online: Learning Video-Conditioned Policies for Unseen Manipulation Tasks

    Chane-Sane, Elliot / Schmid, Cordelia / Laptev, Ivan

    2023  

    Abstract: The ability to specify robot commands by a non-expert user is critical for building generalist agents capable of solving a large variety of tasks. One convenient way to specify the intended robot goal is by a video of a person demonstrating the target task. While prior work typically aims to imitate human demonstrations performed in robot environments, here we focus on a more realistic and challenging setup with demonstrations recorded in natural and diverse human environments. We propose Video-conditioned Policy learning (ViP), a data-driven approach that maps human demonstrations of previously unseen tasks to robot manipulation skills. To this end, we learn our policy to generate appropriate actions given current scene observations and a video of the target task. To encourage generalization to new tasks, we avoid particular tasks during training and learn our policy from unlabelled robot trajectories and corresponding robot videos. Both robot and human videos in our framework are represented by video embeddings pre-trained for human action recognition. At test time we first translate human videos to robot videos in the common video embedding space, and then use resulting embeddings to condition our policies. Notably, our approach enables robot control by human demonstrations in a zero-shot manner, i.e., without using robot trajectories paired with human instructions during training. We validate our approach on a set of challenging multi-task robot manipulation environments and outperform state of the art. Our method also demonstrates excellent performance in a new challenging zero-shot setup where no paired data is used during training.

    Comment: ICRA 2023. See the project webpage at https://www.di.ens.fr/willow/research/vip/
    Keywords: Computer Science - Robotics ; Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Machine Learning
    Subject/Category (code): 629 ; 004
    Publication date: 2023-05-10
    Country of publication: us
    Document type: Book ; Online
    Data source: BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

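    The abstract above conditions a policy on an embedding of the demonstration video after translating it into the robot-video embedding space. The following is a minimal, hypothetical PyTorch sketch of that conditioning; the dimensions, the linear translation module, and the MLP policy are stand-ins for the learned components described in the paper.

      # Condition an action policy on the current observation plus a demo-video embedding.
      import torch
      from torch import nn

      D_VID, D_OBS, D_ACT = 512, 64, 7                  # illustrative dimensions
      translate = nn.Linear(D_VID, D_VID)               # human-video -> robot-video embedding space
      policy = nn.Sequential(nn.Linear(D_OBS + D_VID, 256), nn.ReLU(), nn.Linear(256, D_ACT))

      def act(observation, human_video_embedding):
          """Predict an action from the observation and the translated goal-video embedding."""
          goal = translate(human_video_embedding)       # move into the robot-video space
          return policy(torch.cat([observation, goal], dim=-1))

      obs = torch.randn(1, D_OBS)
      demo = torch.randn(1, D_VID)                      # pre-trained video embedding of the demo
      print(act(obs, demo).shape)                       # torch.Size([1, 7])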

  5. Book ; Online: Improving Image Recognition by Retrieving from Web-Scale Image-Text Data

    Iscen, Ahmet / Fathi, Alireza / Schmid, Cordelia

    2023  

    Abstract: Retrieval augmented models are becoming increasingly popular for computer vision tasks after their recent success in NLP problems. The goal is to enhance the recognition capabilities of the model by retrieving similar examples for the visual input from an external memory set. In this work, we introduce an attention-based memory module, which learns the importance of each retrieved example from the memory. Compared to existing approaches, our method removes the influence of the irrelevant retrieved examples, and retains those that are beneficial to the input query. We also thoroughly study various ways of constructing the memory dataset. Our experiments show the benefit of using a massive-scale memory dataset of 1B image-text pairs, and demonstrate the performance of different memory representations. We evaluate our method in three different classification tasks, namely long-tailed recognition, learning with noisy labels, and fine-grained classification, and show that it achieves state-of-the-art accuracies in ImageNet-LT, Places-LT and Webvision datasets.

    Comment: Accepted to CVPR 2023
    Keywords: Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Machine Learning
    Subject/Category (code): 006
    Publication date: 2023-04-11
    Country of publication: us
    Document type: Book ; Online
    Data source: BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

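    The abstract above introduces an attention-based memory module that weights retrieved examples by their relevance to the query. Below is a minimal PyTorch sketch of such a module using standard multi-head attention; the embedding size, neighbour count, and aggregate helper are illustrative assumptions rather than the paper's exact module.

      # Weight retrieved neighbours by attention so irrelevant examples contribute little.
      import torch
      from torch import nn

      D, K = 256, 16                                    # illustrative embedding dim, neighbours
      attn = nn.MultiheadAttention(embed_dim=D, num_heads=4, batch_first=True)

      def aggregate(query, retrieved):
          """query: (B, D); retrieved: (B, K, D) neighbours from the memory set."""
          out, weights = attn(query.unsqueeze(1), retrieved, retrieved)
          return out.squeeze(1), weights                # (B, D) fused feature, (B, 1, K) weights

      q = torch.randn(2, D)
      mem = torch.randn(2, K, D)
      fused, w = aggregate(q, mem)
      print(fused.shape, w.shape)                       # torch.Size([2, 256]) torch.Size([2, 1, 16])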

  6. Book ; Online: AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR

    Seo, Paul Hongsuck / Nagrani, Arsha / Schmid, Cordelia

    2023

    Abstract: Audiovisual automatic speech recognition (AV-ASR) aims to improve the robustness of a speech recognition system by incorporating visual information. Training fully supervised multimodal models for this task from scratch, however, is limited by the need for large labelled audiovisual datasets (in each downstream domain of interest). We present AVFormer, a simple method for augmenting audio-only models with visual information, at the same time performing lightweight domain adaptation. We do this by (i) injecting visual embeddings into a frozen ASR model using lightweight trainable adaptors. We show that these can be trained on a small amount of weakly labelled video data with minimum additional training time and parameters. (ii) We also introduce a simple curriculum scheme during training which we show is crucial to enable the model to jointly process audio and visual information effectively; and finally (iii) we show that our model achieves state-of-the-art zero-shot results on three different AV-ASR benchmarks (How2, VisSpeech and Ego4D), while also crucially preserving decent performance on traditional audio-only speech recognition benchmarks (LibriSpeech). Qualitative results show that our model effectively leverages visual information for robust speech recognition.

    Comment: CVPR 2023
    Keywords: Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Sound ; Electrical Engineering and Systems Science - Audio and Speech Processing
    Subject/Category (code): 006 ; 004
    Publication date: 2023-03-29
    Country of publication: us
    Document type: Book ; Online
    Data source: BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

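    The abstract above injects visual embeddings into a frozen speech model through lightweight trainable adaptors. The following is a minimal, hypothetical PyTorch sketch of that pattern: only a small projection is trained while the speech encoder stays frozen. The encoder, adaptor shape, and token counts are illustrative, not the AVFormer architecture.

      # Prepend projected visual tokens to a frozen speech encoder; only the adapter is trainable.
      import torch
      from torch import nn

      D = 512
      asr_encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(D, 8, batch_first=True), 2)
      for p in asr_encoder.parameters():
          p.requires_grad = False                       # the speech model stays frozen

      visual_adapter = nn.Linear(768, D)                # lightweight trainable projection (illustrative)

      def forward(audio_tokens, visual_features):
          """audio_tokens: (B, Na, D); visual_features: (B, Nv, 768)."""
          vis = visual_adapter(visual_features)         # (B, Nv, D)
          return asr_encoder(torch.cat([vis, audio_tokens], dim=1))

      out = forward(torch.randn(1, 100, D), torch.randn(1, 4, 768))
      print(out.shape)                                  # torch.Size([1, 104, 512])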

  7. Book ; Online: PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation

    Chen, Shizhe / Garcia, Ricardo / Schmid, Cordelia / Laptev, Ivan

    2023

    Abstract: The ability for robots to comprehend and execute manipulation tasks based on natural language instructions is a long-term goal in robotics. The dominant approaches for language-guided manipulation use 2D image representations, which face difficulties in combining multi-view cameras and inferring precise 3D positions and relationships. To address these limitations, we propose a 3D point cloud based policy called PolarNet for language-guided manipulation. It leverages carefully designed point cloud inputs, efficient point cloud encoders, and multimodal transformers to learn 3D point cloud representations and integrate them with language instructions for action prediction. PolarNet is shown to be effective and data efficient in a variety of experiments conducted on the RLBench benchmark. It outperforms state-of-the-art 2D and 3D approaches in both single-task and multi-task learning. It also achieves promising results on a real robot.

    Comment: Accepted to CoRL 2023. Project website: https://www.di.ens.fr/willow/research/polarnet/
    Keywords: Computer Science - Robotics ; Computer Science - Computer Vision and Pattern Recognition
    Subject/Category (code): 629
    Publication date: 2023-09-27
    Country of publication: us
    Document type: Book ; Online
    Data source: BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

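    The abstract above combines point cloud features with language instructions in a multimodal transformer to predict actions. Below is a minimal, hypothetical PyTorch sketch of that pipeline; the per-point encoder, fusion layer, action head, and all dimensions are illustrative stand-ins for PolarNet's actual components.

      # Fuse per-point features with instruction tokens and predict a pooled action.
      import torch
      from torch import nn

      D = 256
      point_encoder = nn.Sequential(nn.Linear(6, 128), nn.ReLU(), nn.Linear(128, D))  # xyz + rgb per point
      fuse = nn.TransformerEncoderLayer(D, 8, batch_first=True)
      action_head = nn.Linear(D, 8)                     # e.g. gripper pose + open/close (illustrative)

      def predict_action(points, lang_tokens):
          """points: (B, N, 6) coloured point cloud; lang_tokens: (B, L, D) instruction tokens."""
          pts = point_encoder(points)                   # per-point features
          tokens = fuse(torch.cat([pts, lang_tokens], dim=1))
          return action_head(tokens.mean(dim=1))        # pooled prediction

      print(predict_action(torch.randn(1, 512, 6), torch.randn(1, 12, D)).shape)   # torch.Size([1, 8])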

  8. Book ; Online: How can objects help action recognition?

    Zhou, Xingyi / Arnab, Anurag / Sun, Chen / Schmid, Cordelia

    2023  

    Abstract: Current state-of-the-art video models process a video clip as a long sequence of spatio-temporal tokens. However, they do not explicitly model objects or their interactions across the video, and instead process all the tokens in the video. In this paper, we investigate how we can use knowledge of objects to design better video models, namely to process fewer tokens and to improve recognition accuracy. This is in contrast to prior works which either drop tokens at the cost of accuracy, or increase accuracy whilst also increasing the computation required. First, we propose an object-guided token sampling strategy that enables us to retain a small fraction of the input tokens with minimal impact on accuracy. And second, we propose an object-aware attention module that enriches our feature representation with object information and improves overall accuracy. Our resulting framework achieves better performance when using fewer tokens than strong baselines. In particular, we match our baseline with 30%, 40%, and 60% of the input tokens on SomethingElse, Something-something v2, and Epic-Kitchens, respectively. When we use our model to process the same number of tokens as our baseline, we improve by 0.6 to 4.2 points on these datasets.

    Comment: CVPR 2023
    Keywords: Computer Science - Computer Vision and Pattern Recognition
    Subject/Category (code): 006
    Publication date: 2023-06-20
    Country of publication: us
    Document type: Book ; Online
    Data source: BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

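    The abstract above keeps only a small fraction of input tokens, guided by object locations. Below is a minimal PyTorch sketch of an object-guided token sampling step, assuming patch and object boxes in (x1, y1, x2, y2) form; the overlap scoring and keep ratio are illustrative, not the paper's exact strategy.

      # Keep the tokens whose patches overlap most with any detected object box.
      import torch

      def object_guided_sampling(tokens, patch_boxes, object_boxes, keep_ratio=0.3):
          """tokens: (N, D) patch tokens; patch_boxes: (N, 4); object_boxes: (M, 4)."""
          x1 = torch.maximum(patch_boxes[:, None, 0], object_boxes[None, :, 0])
          y1 = torch.maximum(patch_boxes[:, None, 1], object_boxes[None, :, 1])
          x2 = torch.minimum(patch_boxes[:, None, 2], object_boxes[None, :, 2])
          y2 = torch.minimum(patch_boxes[:, None, 3], object_boxes[None, :, 3])
          overlap = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)   # (N, M) intersection areas
          score = overlap.max(dim=1).values                           # best overlap per token
          keep = score.topk(int(keep_ratio * tokens.shape[0])).indices
          return tokens[keep]

      tokens = torch.randn(196, 768)
      patch_boxes = torch.rand(196, 4).sort(dim=1).values             # dummy, well-ordered boxes
      object_boxes = torch.rand(3, 4).sort(dim=1).values
      print(object_guided_sampling(tokens, patch_boxes, object_boxes).shape)   # torch.Size([58, 768])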

  9. Book ; Online: Dense Video Object Captioning from Disjoint Supervision

    Zhou, Xingyi / Arnab, Anurag / Sun, Chen / Schmid, Cordelia

    2023  

    Abstract: We propose a new task and model for dense video object captioning -- detecting, tracking, and captioning trajectories of all objects in a video. This task unifies spatial and temporal understanding of the video, and requires fine-grained language description. Our model for dense video object captioning is trained end-to-end and consists of different modules for spatial localization, tracking, and captioning. As such, we can train our model with a mixture of disjoint tasks, and leverage diverse, large-scale datasets which supervise different parts of our model. This results in noteworthy zero-shot performance. Moreover, by finetuning a model from this initialization, we can further improve our performance, surpassing strong image-based baselines by a significant margin. Although we are not aware of other work performing this task, we are able to repurpose existing video grounding datasets for our task, namely VidSTG and VLN. We show our task is more general than grounding, and models trained on our task can directly be applied to grounding by finding the bounding box with the maximum likelihood of generating the query sentence. Our model outperforms dedicated, state-of-the-art models for spatial grounding on both VidSTG and VLN.
    Keywords: Computer Science - Computer Vision and Pattern Recognition
    Subject/Category (code): 004
    Publication date: 2023-06-20
    Country of publication: us
    Document type: Book ; Online
    Data source: BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

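    The abstract above notes that the captioning model can be applied to spatial grounding by selecting the object whose trajectory maximizes the likelihood of the query sentence. The following is a minimal, hypothetical PyTorch sketch of that scoring step; the caption_model interface and all shapes are assumptions made for illustration.

      # Pick the tracked object whose features give the query sentence the highest log-likelihood.
      import torch

      def ground_query(trajectory_features, query_token_ids, caption_model):
          """caption_model is assumed to return per-token log-probabilities of shape
          (num_objects, seq_len, vocab); query_token_ids: (seq_len,)."""
          log_probs = caption_model(trajectory_features)                  # hypothetical interface
          idx = query_token_ids.view(1, -1, 1).expand(log_probs.shape[0], -1, 1)
          token_ll = log_probs.gather(-1, idx).squeeze(-1)                # (num_objects, seq_len)
          return token_ll.sum(dim=-1).argmax().item()                     # best-scoring object index

      # toy stand-in for the captioning head
      caption_model = lambda feats: torch.log_softmax(torch.randn(feats.shape[0], 5, 100), dim=-1)
      best = ground_query(torch.randn(7, 256), torch.randint(0, 100, (5,)), caption_model)
      print(best)                                                         # index of the best-matching object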

  10. Book ; Online: Dense Optical Tracking: Connecting the Dots

    Le Moing, Guillaume / Ponce, Jean / Schmid, Cordelia

    2023

    Abstract: Recent approaches to point tracking are able to recover the trajectory of any scene point through a large portion of a video despite the presence of occlusions. They are, however, too slow in practice to track every point observed in a single frame in a reasonable amount of time. This paper introduces DOT, a novel, simple and efficient method for solving this problem. It first extracts a small set of tracks from key regions at motion boundaries using an off-the-shelf point tracking algorithm. Given source and target frames, DOT then computes rough initial estimates of a dense flow field and visibility mask through nearest-neighbor interpolation, before refining them using a learnable optical flow estimator that explicitly handles occlusions and can be trained on synthetic data with ground-truth correspondences. We show that DOT is significantly more accurate than current optical flow techniques, outperforms sophisticated "universal" trackers like OmniMotion, and is on par with, or better than, the best point tracking algorithms like CoTracker while being at least two orders of magnitude faster. Quantitative and qualitative experiments with synthetic and real videos validate the promise of the proposed approach. Code, data, and videos showcasing the capabilities of our approach are available in the project webpage: https://16lemoing.github.io/dot .
    Keywords: Computer Science - Computer Vision and Pattern Recognition
    Subject/Category (code): 004 ; 006
    Publication date: 2023-12-01
    Country of publication: us
    Document type: Book ; Online
    Data source: BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

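    The abstract above initializes a dense flow field and visibility mask from sparse tracks via nearest-neighbor interpolation before learned refinement. Below is a minimal PyTorch sketch of that initialization step; the function name and the brute-force nearest-neighbour search are illustrative, and the learned refinement stage is omitted.

      # Nearest-neighbour initialisation of dense flow and visibility from sparse tracks.
      import torch

      def dense_init_from_tracks(src_points, displacements, visibility, height, width):
          """src_points: (N, 2) as (x, y) in the source frame; displacements: (N, 2)
          motion to the target frame; visibility: (N,) in {0, 1}."""
          ys, xs = torch.meshgrid(torch.arange(height), torch.arange(width), indexing="ij")
          grid = torch.stack([xs, ys], dim=-1).float().view(-1, 2)    # (H*W, 2) pixel coords
          nearest = torch.cdist(grid, src_points).argmin(dim=1)       # index of the closest track
          flow = displacements[nearest].view(height, width, 2)
          vis = visibility[nearest].view(height, width)
          return flow, vis                      # rough estimates, to be refined by a learned flow estimator

      flow, vis = dense_init_from_tracks(torch.rand(64, 2) * 32, torch.randn(64, 2),
                                         torch.randint(0, 2, (64,)).float(), 32, 32)
      print(flow.shape, vis.shape)              # torch.Size([32, 32, 2]) torch.Size([32, 32])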
