LIVIVO - The Search Portal for Life Sciences


Search results

Results 1–10 of 100


  1. Article: [(2)Neural Network].

    Harada, Tatsuya

    No shinkei geka. Neurological surgery

    2020  Volume 48, Issue 2, Page(s) 173–188

    MeSH term(s) Algorithms ; Humans ; Neural Networks, Computer
    Language Japanese
    Publishing date 2020-02-24
    Publishing country Japan
    Document type Journal Article
    ZDB-ID 197053-7
    ISSN (online) 1882-1251
    ISSN 0301-2603
    DOI 10.11477/mf.1436204155
    Database MEDical Literature Analysis and Retrieval System OnLINE


  2. Article ; Online: Spherical Image Generation From a Few Normal-Field-of-View Images by Considering Scene Symmetry.

    Hara, Takayuki / Mukuta, Yusuke / Harada, Tatsuya

    IEEE transactions on pattern analysis and machine intelligence

    2023  Volume 45, Issue 5, Page(s) 6339–6353

    Abstract Spherical images taken in all directions (360 degrees by 180 degrees) can represent an entire space including the subject, providing free direction viewing and an immersive experience to viewers. It is convenient and expands the usage scenarios to generate a spherical image from a few normal-field-of-view (NFOV) images, which are partial observations. The primary challenge is generating a plausible image and controlling the high degree of freedom involved in generating a wide area that includes all directions. We focus on scene symmetry, which is a basic property of the global structure of spherical images, such as the rotational and plane symmetries. We propose a method for generating a spherical image from a few NFOV images and controlling the generated regions using scene symmetry. We incorporate the intensity of the symmetry as a latent variable into conditional variational autoencoders to estimate the possible range of symmetry and decode a spherical image whose features are represented through a combination of symmetric transformations of the NFOV image features. Our experiments show that the proposed method can generate various plausible spherical images controlled from asymmetrically to symmetrically, and can reduce the reconstruction errors of the generated images based on the estimated symmetry.
    Language English
    Publishing date 2023-04-03
    Publishing country United States
    Document type Journal Article
    ISSN (online) 1939-3539
    DOI 10.1109/TPAMI.2022.3215933
    Database MEDical Literature Analysis and Retrieval System OnLINE
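
    A minimal, hypothetical sketch (in Python/PyTorch) of the symmetry idea described in the abstract above: an equirectangular (spherical) feature map is blended with rotated and mirrored copies of itself, weighted by a scalar "symmetry intensity" standing in for the latent variable the paper estimates with a conditional VAE. The function name, the choice of transformations, and the blending rule are illustrative assumptions, not the authors' implementation.

    # Sketch only: symmetric transformations of an equirectangular feature map,
    # mixed according to an assumed symmetry-intensity value.
    import torch

    def symmetrize_features(feat: torch.Tensor, symmetry_intensity: torch.Tensor) -> torch.Tensor:
        """feat: (C, H, W) equirectangular feature map.
        symmetry_intensity: scalar in [0, 1]; larger values mix in more of the
        symmetric copies."""
        c, h, w = feat.shape
        # Rotation about the vertical axis corresponds to a circular shift along width.
        rotated = torch.roll(feat, shifts=w // 2, dims=2)
        # Plane (mirror) symmetry corresponds to flipping along the width axis.
        mirrored = torch.flip(feat, dims=[2])
        alpha = symmetry_intensity.clamp(0.0, 1.0)
        return (1 - alpha) * feat + alpha * 0.5 * (rotated + mirrored)

    # Example usage with random features:
    f = torch.randn(64, 32, 64)
    out = symmetrize_features(f, torch.tensor(0.7))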


  3. Book ; Online: Self-Supervised Learning for Group Equivariant Neural Networks

    Mukuta, Yusuke / Harada, Tatsuya

    2023  

    Abstract This paper proposes a method to construct pretext tasks for self-supervised learning on group equivariant neural networks. Group equivariant neural networks are the models whose structure is restricted to commute with the transformations on the input. Therefore, it is important to construct pretext tasks for self-supervised learning that do not contradict this equivariance. To ensure that training is consistent with the equivariance, we propose two concepts for self-supervised tasks: equivariant pretext labels and invariant contrastive loss. Equivariant pretext labels use a set of labels on which we can define the transformations that correspond to the input change. Invariant contrastive loss uses a modified contrastive loss that absorbs the effect of transformations on each input. Experiments on standard image recognition benchmarks demonstrate that the equivariant neural networks exploit the proposed equivariant self-supervised tasks.

    Comment: 12 pages, 4 figures
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 006
    Publishing date 2023-03-08
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
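
    A hedged sketch of an invariance-aware contrastive loss in the spirit of the abstract above: embeddings of two augmented views are averaged over a small group orbit before the standard InfoNCE similarity is computed, so the loss does not penalize the equivariant response of the encoder. The group used here (identity plus horizontal flip), the function names, and the pooling choice are assumptions for illustration, not the paper's construction.

    # Sketch only: orbit-averaged ("invariant") contrastive loss.
    import torch
    import torch.nn.functional as F

    def orbit_pool(encoder, x: torch.Tensor) -> torch.Tensor:
        """Average embeddings over a hand-picked group orbit (identity + flip)."""
        views = [x, torch.flip(x, dims=[-1])]          # group elements acting on the input
        feats = [encoder(v) for v in views]            # equivariant encoder outputs
        return torch.stack(feats, dim=0).mean(dim=0)   # orbit-averaged feature

    def invariant_info_nce(encoder, x1, x2, temperature: float = 0.1) -> torch.Tensor:
        z1 = F.normalize(orbit_pool(encoder, x1), dim=-1)
        z2 = F.normalize(orbit_pool(encoder, x2), dim=-1)
        logits = z1 @ z2.t() / temperature             # (B, B) similarity matrix
        targets = torch.arange(z1.size(0), device=z1.device)
        return F.cross_entropy(logits, targets)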


  4. Book ; Online: K-VQG

    Uehara, Kohei / Harada, Tatsuya

    Knowledge-aware Visual Question Generation for Common-sense Acquisition

    2022  

    Abstract Visual Question Generation (VQG) is a task to generate questions from images. When humans ask questions about an image, their goal is often to acquire some new knowledge. However, existing studies on VQG have mainly addressed question generation from answers or question categories, overlooking the objectives of knowledge acquisition. To introduce a knowledge acquisition perspective into VQG, we constructed a novel knowledge-aware VQG dataset called K-VQG. This is the first large, humanly annotated dataset in which questions regarding images are tied to structured knowledge. We also developed a new VQG model that can encode and use knowledge as the target for a question. The experiment results show that our model outperforms existing models on the K-VQG dataset.
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Computation and Language
    Publishing date 2022-03-15
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  5. Book ; Online: SATTS

    Goswami, Nabarun / Harada, Tatsuya

    Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate

    2022  

    Abstract The mapping of text to speech (TTS) is non-deterministic: letters may be pronounced differently based on context, or phonemes can vary depending on various physiological and stylistic factors like gender, age, accent, emotions, etc. Neural speaker embeddings, trained to identify or verify speakers, are typically used to represent and transfer such characteristics from reference speech to synthesized speech. Speech separation, on the other hand, is the challenging task of separating individual speakers from an overlapping mixed signal of various speakers. Speaker attractors are high-dimensional embedding vectors that pull the time-frequency bins of each speaker's speech towards themselves while repelling those belonging to other speakers. In this work, we explore the possibility of using these powerful speaker attractors for zero-shot speaker adaptation in multi-speaker TTS synthesis and propose speaker attractor text to speech (SATTS). Through various experiments, we show that SATTS can synthesize natural speech from text from an unseen target speaker's reference signal which might have less than ideal recording conditions, i.e. reverberations or mixed with other speakers.

    Comment: Accepted to Interspeech 2022. Visit https://naba89.github.io/SATTS-demo/ for a demo
    Keywords Electrical Engineering and Systems Science - Audio and Speech Processing ; Computer Science - Sound
    Publishing date 2022-07-13
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
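
    A rough sketch of how a speaker attractor is commonly computed in deep-attractor-style separation networks, the concept the abstract above builds on: the attractor is a mask-weighted mean of per-time-frequency embeddings. SATTS reuses such attractors as speaker representations for TTS; this snippet shows only an assumed attractor computation, with illustrative names and shapes, not the paper's model.

    # Sketch only: mask-weighted mean of time-frequency embeddings as a speaker attractor.
    import torch

    def speaker_attractor(embeddings: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        """embeddings: (T*F, D) embedding for every time-frequency bin.
        mask: (T*F,) soft assignment of each bin to the target speaker."""
        weights = mask.unsqueeze(-1)                                           # (T*F, 1)
        return (weights * embeddings).sum(dim=0) / weights.sum().clamp_min(1e-8)  # (D,)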


  6. Book ; Online: Enhancement of Novel View Synthesis Using Omnidirectional Image Completion

    Hara, Takayuki / Harada, Tatsuya

    2022  

    Abstract In this study, we present a method for synthesizing novel views from a single 360-degree RGB-D image based on the neural radiance field (NeRF). Prior studies relied on the neighborhood interpolation capability of multi-layer perceptrons to complete missing regions caused by occlusion and zooming, which leads to artifacts. In the method proposed in this study, the input image is reprojected to 360-degree RGB images at other camera positions, the missing regions of the reprojected images are completed by a 2D image generative model, and the completed images are utilized to train the NeRF. Because multiple completed images contain inconsistencies in 3D, we introduce a method to learn the NeRF model using a subset of completed images that cover the target scene with less overlap of completed regions. The selection of such a subset of images can be formulated as the maximum weight independent set problem, which is solved through simulated annealing. Experiments demonstrated that the proposed method can synthesize plausible novel views while preserving the features of the scene for both artificial and real-world data.

    Comment: 20 pages, 19 figures
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2022-03-18
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
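
    An illustrative sketch of solving a maximum weight independent set (MWIS) problem with simulated annealing, the selection step the abstract above describes for picking completed images with little overlap. The conflict graph, weights, penalty, and cooling schedule are placeholders; building the overlap graph from completed regions is not shown.

    # Sketch only: generic simulated annealing for MWIS on a conflict graph.
    import math
    import random

    def mwis_simulated_annealing(weights, edges, steps=10000, t0=1.0, t_min=1e-3):
        n = len(weights)
        adj = [set() for _ in range(n)]
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)

        def value(selected):
            # Sum of selected weights minus a large penalty per conflicting edge.
            score = sum(weights[i] for i in selected)
            conflicts = sum(1 for i in selected for j in adj[i] if j in selected) // 2
            return score - 10.0 * max(weights) * conflicts

        current = set()
        best, best_val = set(), value(current)
        for step in range(steps):
            t = max(t_min, t0 * (1 - step / steps))
            i = random.randrange(n)
            candidate = set(current)
            candidate.symmetric_difference_update({i})   # flip membership of node i
            delta = value(candidate) - value(current)
            if delta >= 0 or random.random() < math.exp(delta / t):
                current = candidate
                if value(current) > best_val:
                    best, best_val = set(current), value(current)
        return best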


  7. Book ; Online: Deforming Radiance Fields with Cages

    Xu, Tianhan / Harada, Tatsuya

    2022  

    Abstract Recent advances in radiance fields enable photorealistic rendering of static or dynamic 3D scenes, but still do not support explicit deformation that is used for scene manipulation or animation. In this paper, we propose a method that enables a new type of deformation of the radiance field: free-form radiance field deformation. We use a triangular mesh that encloses the foreground object called cage as an interface, and by manipulating the cage vertices, our approach enables the free-form deformation of the radiance field. The core of our approach is cage-based deformation which is commonly used in mesh deformation. We propose a novel formulation to extend it to the radiance field, which maps the position and the view direction of the sampling points from the deformed space to the canonical space, thus enabling the rendering of the deformed scene. The deformation results of the synthetic datasets and the real-world datasets demonstrate the effectiveness of our approach.

    Comment: ECCV 2022. Project page: https://xth430.github.io/deforming-nerf/
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Publishing date 2022-07-25
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
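
    A minimal sketch of the cage-based mapping idea in the abstract above: a query point sampled in the deformed scene is written as a weighted combination of the deformed cage vertices, and the same weights applied to the canonical cage vertices give the canonical-space point fed to the radiance field. The inverse-distance weights below are a crude stand-in for the generalized barycentric (e.g., mean value) coordinates normally used in cage-based deformation; all names are illustrative.

    # Sketch only: deformed-to-canonical mapping via shared cage weights.
    import torch

    def cage_weights(point: torch.Tensor, cage_vertices: torch.Tensor) -> torch.Tensor:
        """point: (3,), cage_vertices: (V, 3). Returns normalized weights (V,)."""
        d = torch.norm(cage_vertices - point, dim=-1).clamp_min(1e-6)
        w = 1.0 / d
        return w / w.sum()

    def deformed_to_canonical(point, deformed_cage, canonical_cage):
        w = cage_weights(point, deformed_cage)   # weights w.r.t. the deformed cage
        return w @ canonical_cage                # same weights on canonical vertices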


  8. Book ; Online: Non-rigid Point Cloud Registration with Neural Deformation Pyramid

    Li, Yang / Harada, Tatsuya

    2022  

    Abstract Non-rigid point cloud registration is a key component in many computer vision and computer graphics applications. The high complexity of the unknown non-rigid motion makes this task a challenging problem. In this paper, we break down this problem via hierarchical motion decomposition. Our method, called Neural Deformation Pyramid (NDP), represents non-rigid motion using a pyramid architecture. Each pyramid level, denoted by a Multi-Layer Perceptron (MLP), takes as input a sinusoidally encoded 3D point and outputs its motion increments from the previous level. The sinusoidal function starts with a low input frequency and gradually increases when the pyramid level goes down. This allows a multi-level rigid to nonrigid motion decomposition and also speeds up the solving by 50 times compared to the existing MLP-based approach. Our method achieves advanced partial-to-partial non-rigid point cloud registration results on the 4DMatch/4DLoMatch benchmark under both no-learned and supervised settings.

    Comment: Code: https://github.com/rabbityl/DeformationPyramid
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 629
    Publishing date 2022-05-25
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
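
    A rough sketch of the pyramid idea summarized in the abstract above: each level has a small MLP that takes a sinusoidally encoded 3D point, with the encoding frequency growing with depth, and predicts a motion increment added to the output of the previous level. Layer sizes and the increment parameterization (a plain 3D offset here) are simplifying assumptions, not the paper's exact design.

    # Sketch only: coarse-to-fine deformation pyramid of per-level MLPs.
    import torch
    import torch.nn as nn

    def sinusoidal_encoding(x: torch.Tensor, freq: float) -> torch.Tensor:
        """x: (N, 3) points -> (N, 6) features at a single frequency."""
        return torch.cat([torch.sin(freq * x), torch.cos(freq * x)], dim=-1)

    class DeformationPyramid(nn.Module):
        def __init__(self, levels: int = 4, hidden: int = 64):
            super().__init__()
            self.freqs = [2.0 ** i for i in range(levels)]   # frequency grows with level
            self.mlps = nn.ModuleList(
                nn.Sequential(nn.Linear(6, hidden), nn.ReLU(), nn.Linear(hidden, 3))
                for _ in range(levels)
            )

        def forward(self, points: torch.Tensor) -> torch.Tensor:
            warped = points
            for freq, mlp in zip(self.freqs, self.mlps):
                increment = mlp(sinusoidal_encoding(warped, freq))  # motion increment
                warped = warped + increment                         # refine previous level
            return warped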


  9. Book ; Online: HiPerformer

    Umagami, Ryo / Ono, Yu / Mukuta, Yusuke / Harada, Tatsuya

    Hierarchically Permutation-Equivariant Transformer for Time Series Forecasting

    2023  

    Abstract It is imperative to discern the relationships between multiple time series for accurate forecasting. In particular, for stock prices, components are often divided into groups with the same characteristics, and a model that extracts relationships consistent with this group structure should be effective. Thus, we propose the concept of hierarchical permutation-equivariance, focusing on index swapping of components within and among groups, to design a model that considers this group structure. When the prediction model has hierarchical permutation-equivariance, the prediction is consistent with the group relationships of the components. Therefore, we propose a hierarchically permutation-equivariant model that considers both the relationship among components in the same group and the relationship among groups. The experiments conducted on real-world data demonstrate that the proposed method outperforms existing state-of-the-art methods.

    Comment: 10 pages, 3 figures
    Keywords Computer Science - Machine Learning
    Subject code 330
    Publishing date 2023-05-14
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
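
    A hand-wavy sketch of a hierarchically permutation-equivariant layer in the spirit of the abstract above: each series gets a per-element transform plus pooled context from its own group and from the set of group summaries, so permuting series within a group, or permuting whole groups, permutes the output in the same way. Dimensions, pooling choices, and class names are illustrative assumptions, not the HiPerformer architecture.

    # Sketch only: permutation equivariance within and among groups via mean pooling.
    import torch
    import torch.nn as nn

    class HierEquivariantLayer(nn.Module):
        def __init__(self, dim: int):
            super().__init__()
            self.element = nn.Linear(dim, dim)   # acts on each series independently
            self.within = nn.Linear(dim, dim)    # context pooled within the group
            self.among = nn.Linear(dim, dim)     # context pooled over group summaries

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            """x: (groups, members, dim) features of grouped time series."""
            group_mean = x.mean(dim=1, keepdim=True)            # (G, 1, D) within-group pool
            global_mean = group_mean.mean(dim=0, keepdim=True)  # (1, 1, D) among-group pool
            return self.element(x) + self.within(group_mean) + self.among(global_mean)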


  10. Book ; Online: Combining inherent knowledge of vision-language models with unsupervised domain adaptation through self-knowledge distillation

    Westfechtel, Thomas / Zhang, Dexuan / Harada, Tatsuya

    2023  

    Abstract Unsupervised domain adaptation (UDA) tries to overcome the tedious work of labeling data by leveraging a labeled source dataset and transferring its knowledge to a similar but different target dataset. On the other hand, current vision-language models exhibit astonishing zero-shot prediction capabilities. In this work, we combine knowledge gained through UDA with the inherent knowledge of vision-language models. In a first step, we generate the zero-shot predictions of the source and target dataset using the vision-language model. Since zero-shot predictions usually exhibit a large entropy, meaning that the class probabilities are rather evenly distributed, we first adjust the distribution to accentuate the winning probabilities. This is done using both source and target data to keep the relative confidence between source and target data. We then employ a conventional DA method, to gain the knowledge from the source dataset, in combination with self-knowledge distillation, to maintain the inherent knowledge of the vision-language model. We further combine our method with a gradual source domain expansion strategy (GSDE) and show that this strategy can also benefit by including zero-shot predictions. We conduct experiments and ablation studies on three benchmarks (OfficeHome, VisDA, and DomainNet) and outperform state-of-the-art methods. We further show in ablation studies the contributions of different parts of our algorithm.
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2023-12-07
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
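
    A loose sketch of the "accentuate the winning probabilities" step described in the abstract above: zero-shot class probabilities from a vision-language model are sharpened with a temperature applied identically to source and target predictions, so their relative confidence is preserved. The exponent-based sharpening rule and the temperature value are assumptions, not the paper's exact procedure.

    # Sketch only: shared-temperature sharpening of zero-shot class probabilities.
    import torch

    def sharpen(probs: torch.Tensor, temperature: float) -> torch.Tensor:
        """probs: (N, C) class probabilities; smaller temperature -> sharper."""
        p = probs.clamp_min(1e-8) ** (1.0 / temperature)
        return p / p.sum(dim=-1, keepdim=True)

    # Apply the same temperature to both domains so their confidences stay comparable.
    source_zero_shot = torch.softmax(torch.randn(8, 5), dim=-1)
    target_zero_shot = torch.softmax(torch.randn(8, 5), dim=-1)
    shared_t = 0.5
    source_sharp = sharpen(source_zero_shot, shared_t)
    target_sharp = sharpen(target_zero_shot, shared_t)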

