LIVIVO - The Search Portal for Life Sciences


Search results

Results 1–10 of 762,844 total


  1. Article ; Online: X

    Zeng, Yan / Zhang, Xinsong / Li, Hang / Wang, Jiawei / Zhang, Jipeng / Zhou, Wangchunshu

    IEEE transactions on pattern analysis and machine intelligence

    2024  Volume 46, Issue 5, Page(s) 3156–3168


    Abstract Vision language pre-training aims to learn alignments between vision and language from a large amount of data. Most existing methods only learn image-text alignments. Some others utilize pre-trained object detectors to leverage vision language alignments at the object level. In this paper, we propose to learn multi-grained vision language alignments by a unified pre-training framework that learns multi-grained aligning and multi-grained localization simultaneously. Based on it, we present X
    Language English
    Publishing date 2024-04-03
    Publishing country United States
    Document type Journal Article
    ISSN 1939-3539
    ISSN (online) 1939-3539
    DOI 10.1109/TPAMI.2023.3339661
    Database MEDical Literature Analysis and Retrieval System OnLINE


  2. Book ; Online: X-HRNet

    Zhou, Yixuan / Wang, Xuanhan / Xu, Xing / Zhao, Lei / Song, Jingkuan

    Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention

    2023  


    Abstract High-resolution representation is necessary for human pose estimation to achieve high performance, and the ensuing problem is high computational complexity. In particular, predominant pose estimation methods estimate human joints by 2D single-peak heatmaps. Each 2D heatmap can be horizontally and vertically projected to and reconstructed by a pair of 1D heat vectors. Inspired by this observation, we introduce a lightweight and powerful alternative, Spatially Unidimensional Self-Attention (SUSA), to the pointwise (1x1) convolution that is the main computational bottleneck in the depthwise separable 3x3 convolution. Our SUSA reduces the computational complexity of the pointwise (1x1) convolution by 96% without sacrificing accuracy. Furthermore, we use the SUSA as the main module to build our lightweight pose estimation backbone X-HRNet, where `X' represents the estimated cross-shape attention vectors. Extensive experiments on the COCO benchmark demonstrate the superiority of our X-HRNet, and comprehensive ablation studies show the effectiveness of the SUSA modules. The code is publicly available at https://github.com/cool-xuan/x-hrnet.
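    The projection observation in this abstract is easy to verify numerically. The sketch below is a minimal NumPy illustration (not the authors' code; the Gaussian heatmap and all sizes are hypothetical): it projects a 2D single-peak heatmap onto a pair of 1D heat vectors and recovers the joint location from their argmaxes alone.

```python
import numpy as np

def gaussian_heatmap(h, w, cy, cx, sigma=2.0):
    """2D single-peak (Gaussian) heatmap centred on a joint at (cy, cx)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def project_to_1d(heatmap):
    """Project a 2D heatmap onto a pair of 1D heat vectors."""
    vec_x = heatmap.sum(axis=0)  # vertical projection onto the x-axis
    vec_y = heatmap.sum(axis=1)  # horizontal projection onto the y-axis
    return vec_x, vec_y

hm = gaussian_heatmap(64, 48, cy=20, cx=30)
vx, vy = project_to_1d(hm)
# The joint location is recoverable from the two 1D argmaxes alone,
# which is what makes the 1D reformulation attractive.
assert (int(np.argmax(vx)), int(np.argmax(vy))) == (30, 20)
```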

    Comment: Accepted by ICME 2022
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2023-10-12
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  3. Book ; Online: X-MLP

    Wang, Xinyue / Cai, Zhicheng / Peng, Chenglei

    A Patch Embedding-Free MLP Architecture for Vision

    2023  


    Abstract Convolutional neural networks (CNNs) and vision transformers (ViT) have achieved great success in computer vision. Recently, research into multi-layer perceptron (MLP) architectures for vision has become popular again. Vision MLPs are designed to be independent of convolutions and self-attention operations. However, existing vision MLP architectures always depend on convolution for patch embedding. Thus we propose X-MLP, an architecture constructed entirely from fully connected layers and free from patch embedding. It decouples the features extremely and utilizes MLPs to exchange information across the width, height, and channel dimensions independently and alternately. X-MLP is tested on ten benchmark datasets, obtaining better performance than other vision MLP models on all of them. It even surpasses CNNs by a clear margin on various datasets. Furthermore, by mathematically restoring the spatial weights, we visualize the information communication between any pair of pixels in the feature map and observe the phenomenon of capturing long-range dependency.
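    The axis-by-axis mixing described here can be sketched with plain fully connected layers. The following is a minimal NumPy illustration (not the paper's implementation; shapes, initialization, and the ReLU nonlinearity are assumptions) that mixes a (H, W, C) feature map along height, width, and channel alternately, with no patch embedding step.

```python
import numpy as np

def axis_mlp(x, axis, rng):
    """Mix information along one axis of an (H, W, C) map with a fully connected layer."""
    d = x.shape[axis]
    w = rng.standard_normal((d, d)) / np.sqrt(d)  # simple scaled-random weights
    x = np.moveaxis(x, axis, -1)  # move the target axis last
    x = np.maximum(x @ w, 0.0)    # linear map + ReLU
    return np.moveaxis(x, -1, axis)  # move it back

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32, 16))  # (H, W, C) feature map, directly from pixels
for axis in (0, 1, 2):                 # height, width, channel -- alternately
    x = axis_mlp(x, axis, rng)
assert x.shape == (32, 32, 16)         # shape is preserved by each mixing step
```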

    Comment: IJCNN 2023
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 006
    Publishing date 2023-07-02
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  4. Book ; Online: X-Adapter

    Ran, Lingmin / Cun, Xiaodong / Liu, Jia-Wei / Zhao, Rui / Zijie, Song / Wang, Xintao / Keppo, Jussi / Shou, Mike Zheng

    Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

    2023  


    Abstract We introduce X-Adapter, a universal upgrader to enable the pretrained plug-and-play modules (e.g., ControlNet, LoRA) to work directly with the upgraded text-to-image diffusion model (e.g., SDXL) without further retraining. We achieve this goal by training an additional network to control the frozen upgraded model with the new text-image data pairs. In detail, X-Adapter keeps a frozen copy of the old model to preserve the connectors of different plugins. Additionally, X-Adapter adds trainable mapping layers that bridge the decoders from models of different versions for feature remapping. The remapped features will be used as guidance for the upgraded model. To enhance the guidance ability of X-Adapter, we employ a null-text training strategy for the upgraded model. After training, we also introduce a two-stage denoising strategy to align the initial latents of X-Adapter and the upgraded model. Thanks to our strategies, X-Adapter demonstrates universal compatibility with various plugins and also enables plugins of different versions to work together, thereby expanding the functionalities of the diffusion community. To verify the effectiveness of the proposed method, we conduct extensive experiments and the results show that X-Adapter may facilitate wider application in the upgraded foundational diffusion model.
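    The frozen-copy-plus-mapper design can be sketched in the abstract. This NumPy sketch (all dimensions and the additive-guidance form are hypothetical; the real X-Adapter bridges U-Net decoder features of specific model versions) shows a trainable mapping layer remapping an old-model decoder feature into the upgraded model's feature space.

```python
import numpy as np

class Mapper:
    """Trainable mapping layer bridging decoder features of two model versions.

    Near-zero initialization, so the upgraded model is untouched at the start
    of training (a common choice for adapter-style modules)."""
    def __init__(self, d_old, d_new, rng):
        self.w = rng.standard_normal((d_old, d_new)) * 0.01

    def __call__(self, feat_old):
        return feat_old @ self.w

rng = np.random.default_rng(0)
d_old, d_new = 320, 640                       # hypothetical decoder widths
mapper = Mapper(d_old, d_new, rng)

feat_old = rng.standard_normal((64, d_old))   # feature from the frozen old-model copy
feat_new = rng.standard_normal((64, d_new))   # feature from the frozen upgraded model
guided = feat_new + mapper(feat_old)          # remapped feature used as guidance
assert guided.shape == (64, d_new)
```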

    Comment: Project page: https://showlab.github.io/X-Adapter/
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence ; Computer Science - Multimedia
    Subject code 004
    Publishing date 2023-12-04
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  5. Book ; Online: X-Mesh

    Ma, Yiwei / Zhang, Xiaioqing / Sun, Xiaoshuai / Ji, Jiayi / Wang, Haowei / Jiang, Guannan / Zhuang, Weilin / Ji, Rongrong

    Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance

    2023  


    Abstract Text-driven 3D stylization is a complex and crucial task in the fields of computer vision (CV) and computer graphics (CG), aimed at transforming a bare mesh to fit a target text. Prior methods adopt text-independent multilayer perceptrons (MLPs) to predict the attributes of the target mesh with the supervision of CLIP loss. However, such text-independent architecture lacks textual guidance during predicting attributes, thus leading to unsatisfactory stylization and slow convergence. To address these limitations, we present X-Mesh, an innovative text-driven 3D stylization framework that incorporates a novel Text-guided Dynamic Attention Module (TDAM). The TDAM dynamically integrates the guidance of the target text by utilizing text-relevant spatial and channel-wise attentions during vertex feature extraction, resulting in more accurate attribute prediction and faster convergence speed. Furthermore, existing works lack standard benchmarks and automated metrics for evaluation, often relying on subjective and non-reproducible user studies to assess the quality of stylized 3D assets. To overcome this limitation, we introduce a new standard text-mesh benchmark, namely MIT-30, and two automated metrics, which will enable future research to achieve fair and objective comparisons. Our extensive qualitative and quantitative experiments demonstrate that X-Mesh outperforms previous state-of-the-art methods.

    Comment: Technical report
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 410
    Publishing date 2023-03-28
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  6. Book ; Online: X-Adv

    Liu, Aishan / Guo, Jun / Wang, Jiakai / Liang, Siyuan / Tao, Renshuai / Zhou, Wenbo / Liu, Cong / Liu, Xianglong / Tao, Dacheng

    Physical Adversarial Object Attacks against X-ray Prohibited Item Detection

    2023  


    Abstract Adversarial attacks are valuable for evaluating the robustness of deep learning models. Existing attacks are primarily conducted on the visible light spectrum (e.g., pixel-wise texture perturbation). However, attacks targeting texture-free X-ray images remain underexplored, despite the widespread application of X-ray imaging in safety-critical scenarios such as the X-ray detection of prohibited items. In this paper, we take the first step toward the study of adversarial attacks targeted at X-ray prohibited item detection, and reveal the serious threats posed by such attacks in this safety-critical scenario. Specifically, we posit that successful physical adversarial attacks in this scenario should be specially designed to circumvent the challenges posed by color/texture fading and complex overlapping. To this end, we propose X-Adv to generate physically printable metals that act as an adversarial agent capable of deceiving X-ray detectors when placed in luggage. To resolve the issues associated with color/texture fading, we develop a differentiable converter that facilitates the generation of 3D-printable objects with adversarial shapes, using the gradients of a surrogate model rather than directly generating adversarial textures. To place the printed 3D adversarial objects in luggage with complex overlapped instances, we design a policy-based reinforcement learning strategy to find locations eliciting strong attack performance in worst-case scenarios whereby the prohibited items are heavily occluded by other items. To verify the effectiveness of the proposed X-Adv, we conduct extensive experiments in both the digital and the physical world (employing a commercial X-ray security inspection system for the latter case). Furthermore, we present the physical-world X-ray adversarial attack dataset XAD.

    Comment: Accepted by USENIX Security 2023
    Keywords Computer Science - Cryptography and Security ; Computer Science - Artificial Intelligence ; Computer Science - Computer Vision and Pattern Recognition
    Subject code 006
    Publishing date 2023-02-19
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  7. Book ; Online: X-Transfer

    Zhang, Lei / Chen, Hao / Hu, Shu / Zhu, Bin / Lin, Ching Sheng / Wu, Xi / Hu, Jinrong / Wang, Xin

    A Transfer Learning-Based Framework for GAN-Generated Fake Image Detection

    2023  


    Abstract Generative adversarial networks (GANs) have advanced remarkably in diverse domains, especially image generation and editing. However, the misuse of GANs for generating deceptive images, such as face replacement, raises significant security concerns, which have gained widespread attention. Therefore, it is urgent to develop effective detection methods to distinguish between real and fake images. Current research centers on the application of transfer learning. Nevertheless, it encounters challenges such as knowledge forgetting from the original dataset and inadequate performance when dealing with imbalanced data during training. To alleviate this issue, this paper introduces a novel GAN-generated image detection algorithm called X-Transfer, which enhances transfer learning by utilizing two neural networks that employ interleaved parallel gradient transmission. In addition, we combine AUC loss and cross-entropy loss to improve the model's performance. We carry out comprehensive experiments on multiple facial image datasets. The results show that our model outperforms the general transferring approach, and the best metric reaches 99.04%, an improvement of approximately 10%. Furthermore, we demonstrate excellent performance on non-face datasets, validating its generality and broader application prospects.
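    Combining an AUC surrogate with cross-entropy can be illustrated in a few lines. This NumPy sketch is not the paper's code: the pairwise squared-hinge surrogate, the margin, and the equal weighting are all assumptions; it only shows the general shape of such a combined objective for a fake-vs-real detector.

```python
import numpy as np

def bce(scores, labels, eps=1e-7):
    """Binary cross-entropy on raw logits (labels: 1 = fake, 0 = real)."""
    p = 1.0 / (1.0 + np.exp(-scores))
    return -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))

def auc_hinge(scores, labels, margin=1.0):
    """Pairwise squared-hinge surrogate for AUC: every fake score should
    exceed every real score by at least the margin."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = margin - (pos[:, None] - neg[None, :])
    return np.mean(np.maximum(diff, 0.0) ** 2)

def combined_loss(scores, labels, alpha=0.5):
    """Weighted sum of the AUC surrogate and cross-entropy."""
    return alpha * auc_hinge(scores, labels) + (1 - alpha) * bce(scores, labels)

scores = np.array([2.0, 1.5, -1.0, -2.0])  # detector logits
labels = np.array([1, 1, 0, 0])            # perfectly separated by a wide margin
assert auc_hinge(scores, labels) == 0.0    # all pairs clear the margin
assert combined_loss(scores, labels) > 0.0 # cross-entropy term is still nonzero
```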

    Comment: 9 pages, 3 figures, and 6 tables; references added
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 006
    Publishing date 2023-10-06
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  8. Book ; Online: X-Dreamer

    Ma, Yiwei / Fan, Yijun / Ji, Jiayi / Wang, Haowei / Sun, Xiaoshuai / Jiang, Guannan / Shu, Annan / Ji, Rongrong

    Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation

    2023  


    Abstract In recent times, automatic text-to-3D content creation has made significant progress, driven by the development of pretrained 2D diffusion models. Existing text-to-3D methods typically optimize the 3D representation to ensure that the rendered image aligns well with the given text, as evaluated by the pretrained 2D diffusion model. Nevertheless, a substantial domain gap exists between 2D images and 3D assets, primarily attributed to variations in camera-related attributes and the exclusive presence of foreground objects. Consequently, employing 2D diffusion models directly for optimizing 3D representations may lead to suboptimal outcomes. To address this issue, we present X-Dreamer, a novel approach for high-quality text-to-3D content creation that effectively bridges the gap between text-to-2D and text-to-3D synthesis. The key components of X-Dreamer are two innovative designs: Camera-Guided Low-Rank Adaptation (CG-LoRA) and Attention-Mask Alignment (AMA) Loss. CG-LoRA dynamically incorporates camera information into the pretrained diffusion models by employing camera-dependent generation for trainable parameters. This integration enhances the alignment between the generated 3D assets and the camera's perspective. AMA loss guides the attention map of the pretrained diffusion model using the binary mask of the 3D object, prioritizing the creation of the foreground object. This module ensures that the model focuses on generating accurate and detailed foreground objects. Extensive evaluations demonstrate the effectiveness of our proposed method compared to existing text-to-3D approaches. Our project webpage: https://xmuxiaoma666.github.io/Projects/X-Dreamer .
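    Camera-conditioned low-rank adaptation, as CG-LoRA is described here, can be sketched as a small hypernetwork that generates LoRA factors from a camera embedding. The following NumPy illustration is hypothetical throughout (dimensions, the linear hypernetwork, and the camera embedding are assumptions, not the paper's architecture); it only shows the idea of trainable parameters that depend on the camera.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 64, 4                          # feature dim and LoRA rank (hypothetical)
w_frozen = rng.standard_normal((d, d))  # frozen pretrained weight

# Camera-dependent generation of the low-rank factors: linear maps from a
# camera embedding (e.g. azimuth/elevation features) to the LoRA matrices.
cam = rng.standard_normal(8)                      # camera embedding
hyper_a = rng.standard_normal((8, d * r)) * 0.01  # near-zero init
hyper_b = rng.standard_normal((8, r * d)) * 0.01

A = (cam @ hyper_a).reshape(d, r)
B = (cam @ hyper_b).reshape(r, d)

x = rng.standard_normal(d)
y = x @ (w_frozen + A @ B)  # frozen weight plus camera-conditioned low-rank update
assert y.shape == (d,)
```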

    Comment: Technical report
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2023-11-30
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  9. Book ; Online: PaLI-X

    Chen, Xi / Djolonga, Josip / Padlewski, Piotr / Mustafa, Basil / Changpinyo, Soravit / Wu, Jialin / Ruiz, Carlos Riquelme / Goodman, Sebastian / Wang, Xiao / Tay, Yi / Shakeri, Siamak / Dehghani, Mostafa / Salz, Daniel / Lucic, Mario / Tschannen, Michael / Nagrani, Arsha / Hu, Hexiang / Joshi, Mandar / Pang, Bo /
    Montgomery, Ceslee / Pietrzyk, Paulina / Ritter, Marvin / Piergiovanni, AJ / Minderer, Matthias / Pavetic, Filip / Waters, Austin / Li, Gang / Alabdulmohsin, Ibrahim / Beyer, Lucas / Amelot, Julien / Lee, Kenton / Steiner, Andreas Peter / Li, Yang / Keysers, Daniel / Arnab, Anurag / Xu, Yuanzhong / Rong, Keran / Kolesnikov, Alexander / Seyedhosseini, Mojtaba / Angelova, Anelia / Zhai, Xiaohua / Houlsby, Neil / Soricut, Radu

    On Scaling up a Multilingual Vision and Language Model

    2023  


    Abstract We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture. Our model achieves new levels of performance on a wide range of varied and complex tasks, including multiple image-based captioning and question-answering tasks, image-based document understanding and few-shot (in-context) learning, as well as object detection, video question answering, and video captioning. PaLI-X advances the state-of-the-art on most vision-and-language benchmarks considered (25+ of them). Finally, we observe emerging capabilities, such as complex counting and multilingual object detection, tasks that are not explicitly in the training mix.
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Computation and Language ; Computer Science - Machine Learning
    Publishing date 2023-05-29
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  10. Article: X-ray Detectors Based on Ga

    Zhang, Chongyang / Dou, Wenjie / Yang, Xun / Zang, Huaping / Chen, Yancheng / Fan, Wei / Wang, Shaoyi / Zhou, Weimin / Chen, Xuexia / Shan, Chongxin

    Materials (Basel, Switzerland)

    2023  Volume 16, Issue 13


    Abstract X-ray detectors have numerous applications in medical imaging, industrial inspection, and crystal structure analysis. Gallium oxide (Ga
    Language English
    Publishing date 2023-06-30
    Publishing country Switzerland
    Document type Journal Article
    ZDB-ID 2487261-1
    ISSN 1996-1944
    DOI 10.3390/ma16134742
    Database MEDical Literature Analysis and Retrieval System OnLINE

