LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 35

  1. Book ; Online: CLIP for Lightweight Semantic Segmentation

    Jin, Ke / Yang, Wankou

    2023  

    Abstract The large-scale pretrained model CLIP, trained on 400 million image-text pairs, offers a promising paradigm for tackling vision tasks, albeit at the image level. Later works, such as DenseCLIP and LSeg, extend this paradigm to dense prediction, including semantic segmentation, and have achieved excellent results. However, the above methods either rely on CLIP-pretrained visual backbones or use non-pretrained but heavy backbones such as Swin, and become ineffective when applied to lightweight backbones. The reason is that lightweight networks, whose feature extraction ability is relatively limited, struggle to produce image features that align well with text embeddings. In this work, we present a new feature fusion module which tackles this problem and enables the language-guided paradigm to be applied to lightweight networks. Specifically, the module is a parallel design of CNN and transformer with a two-way bridge in between, where the CNN extracts spatial information and visual context from the image encoder's feature map, and the transformer propagates text embeddings from the text encoder forward. The core of the module is the bidirectional fusion of visual and text features across the bridge, which promotes their proximity and alignment in the embedding space. The module is model-agnostic: it not only makes language-guided lightweight semantic segmentation practical, but also fully exploits the pretrained knowledge of language priors and achieves better performance than previous SOTA work such as DenseCLIP, whatever the vision backbone is. Extensive experiments have been conducted to demonstrate the superiority of our method.
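    A minimal PyTorch sketch of the two-way bridge idea, assuming a CNN branch over the visual feature map and a transformer branch over text embeddings with cross-attention in both directions; all names here (FusionBridge, dim, heads) are illustrative, not taken from the paper:

      import torch
      import torch.nn as nn

      class FusionBridge(nn.Module):
          """Sketch of a bidirectional CNN/transformer fusion bridge."""

          def __init__(self, dim: int = 256, heads: int = 4):
              super().__init__()
              self.cnn = nn.Conv2d(dim, dim, kernel_size=3, padding=1)
              self.txt = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                                    batch_first=True)
              # the two-way bridge: vision attends to text and vice versa
              self.v2t = nn.MultiheadAttention(dim, heads, batch_first=True)
              self.t2v = nn.MultiheadAttention(dim, heads, batch_first=True)

          def forward(self, vis, txt):
              # vis: (B, C, H, W) image-encoder feature map
              # txt: (B, T, C) text-encoder embeddings
              b, c, h, w = vis.shape
              v = self.cnn(vis).flatten(2).transpose(1, 2)  # (B, H*W, C)
              t = self.txt(txt)                             # (B, T, C)
              v = v + self.t2v(v, t, t)[0]  # pull vision toward language
              t = t + self.v2t(t, v, v)[0]  # pull language toward vision
              return v.transpose(1, 2).reshape(b, c, h, w), t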
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2023-10-11
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  2. Article ; Online: Vision-Based UAV Self-Positioning in Low-Altitude Urban Environments.

    Dai, Ming / Zheng, Enhui / Feng, Zhenhua / Qi, Lei / Zhuang, Jiedong / Yang, Wankou

    IEEE transactions on image processing : a publication of the IEEE Signal Processing Society

    2023  Volume PP

    Abstract Unmanned Aerial Vehicles (UAVs) rely on satellite systems for stable positioning. However, due to limited satellite coverage or communication disruptions, UAVs may lose signals for positioning. In such situations, vision-based techniques can serve as an alternative, ensuring the self-positioning capability of UAVs. However, most existing datasets are developed for the geo-localization of objects captured by UAVs, rather than for UAV self-positioning. Furthermore, existing UAV datasets apply discrete sampling to synthetic data, such as Google Maps, neglecting the crucial aspects of dense sampling and the uncertainties commonly experienced in practical scenarios. To address these issues, this paper presents DenseUAV, the first publicly available dataset tailored for the UAV self-positioning task. DenseUAV adopts dense sampling on UAV images obtained in low-altitude urban areas. In total, over 27K UAV- and satellite-view images of 14 university campuses are collected and annotated. In terms of methodology, we first verify the superiority of Transformers over CNNs for the proposed task. Then we incorporate metric learning into representation learning to enhance the model's discriminative capacity and to reduce the modality discrepancy. Besides, to facilitate joint learning from both the satellite and UAV views, we introduce a mutually supervised learning approach. Last, we enhance the Recall@K metric and introduce a new measurement, SDM@K, to evaluate both the retrieval and localization performance for the proposed task. As a result, the proposed baseline method achieves a remarkable Recall@1 score of 83.01% and an SDM@1 score of 86.50% on DenseUAV. The dataset and code have been made publicly available on https://github.com/Dmmm1997/DenseUAV.
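    The standard Recall@K half of the evaluation can be computed as below; the spatial weighting that defines SDM@K is specific to the paper and is not reproduced here. A short illustrative sketch:

      import numpy as np

      def recall_at_k(similarity: np.ndarray, gt_index: np.ndarray,
                      k: int = 1) -> float:
          """Fraction of queries whose true match is in the top-k results.

          similarity: (num_queries, num_gallery) score matrix.
          gt_index:   (num_queries,) correct gallery index per query.
          """
          topk = np.argsort(-similarity, axis=1)[:, :k]
          hits = (topk == gt_index[:, None]).any(axis=1)
          return float(hits.mean())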
    Language English
    Publishing date 2023-12-29
    Publishing country United States
    Document type Journal Article
    ISSN 1941-0042
    ISSN (online) 1941-0042
    DOI 10.1109/TIP.2023.3346279
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Book ; Online: SSPNet

    Shen, Jifeng / Guo, Teng / Zuo, Xin / Fan, Heng / Yang, Wankou

    Scale and Spatial Priors Guided Generalizable and Interpretable Pedestrian Attribute Recognition

    2023  

    Abstract Global feature based Pedestrian Attribute Recognition (PAR) models are often poorly localized when using Grad-CAM for attribute response analysis, which has a significant impact on interpretability, generalizability and performance. Previous studies have attempted to improve generalization and interpretation through meticulous model design, yet they have often neglected or underutilized effective prior information crucial for PAR. To this end, a novel Scale and Spatial Priors Guided Network (SSPNet) is proposed for PAR, mainly composed of the Adaptive Feature Scale Selection (AFSS) and Prior Location Extraction (PLE) modules. The AFSS module learns to provide reasonable scale prior information for different attribute groups, allowing the model to focus on different levels of feature maps with varying semantic granularity. The PLE module reveals potential attribute spatial prior information, which avoids unnecessary attention on irrelevant areas and lowers the risk of model over-fitting. More specifically, the scale prior in AFSS is adaptively learned from different layers of the feature pyramid with maximum accuracy, while the spatial priors in PLE can be revealed from part features at different granularities (such as image blocks, human pose keypoints and sparse sampling points). Besides, a novel IoU based attribute localization metric is proposed for Weakly-supervised Pedestrian Attribute Localization (WPAL), based on the improved Grad-CAM attribute response mask. The experimental results on intra-dataset and cross-dataset evaluations demonstrate the effectiveness of our proposed method in terms of mean accuracy (mA). Furthermore, it also achieves superior performance on the PCS dataset for attribute localization in terms of IoU. Code will be released at https://github.com/guotengg/SSPNet.
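    An IoU-based localization score of this kind can be sketched as follows; the threshold and mask construction are illustrative assumptions, since the paper's exact Grad-CAM refinement is not reproduced here:

      import numpy as np

      def attribute_localization_iou(cam: np.ndarray, gt_mask: np.ndarray,
                                     thresh: float = 0.5) -> float:
          """IoU between a thresholded attribute response map and a
          ground-truth attribute mask.

          cam:     (H, W) Grad-CAM response, scaled to [0, 1].
          gt_mask: (H, W) binary mask of the annotated attribute region.
          """
          pred = cam >= thresh
          gt = gt_mask.astype(bool)
          union = np.logical_or(pred, gt).sum()
          if union == 0:
              return 0.0
          return float(np.logical_and(pred, gt).sum() / union)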

    Comment: 39 pages, 11 figures, Accepted by Pattern Recognition
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 006
    Publishing date 2023-12-10
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  4. Book ; Online: Probabilistic Decomposition Transformer for Time Series Forecasting

    Tong, Junlong / Xie, Liping / Yang, Wankou / Zhang, Kanjian

    2022  

    Abstract Time series forecasting is crucial for many fields, such as disaster warning, weather prediction, and energy consumption. Transformer-based models are considered to have revolutionized the field of sequence modeling. However, the complex temporal patterns of time series hinder such models from mining reliable temporal dependencies. Furthermore, the autoregressive form of the Transformer introduces cumulative errors at the inference step. In this paper, we propose the probabilistic decomposition Transformer model, which combines the Transformer with a conditional generative model to provide hierarchical and interpretable probabilistic forecasts for intricate time series. The Transformer is employed to learn temporal patterns and produce primary probabilistic forecasts, while the conditional generative model is used to achieve non-autoregressive hierarchical probabilistic forecasts by introducing latent space feature representations. In addition, the conditional generative model reconstructs typical features of the series, such as seasonality and trend terms, from probability distributions in the latent space, enabling complex pattern separation and providing interpretable forecasts. Extensive experiments on several datasets demonstrate the effectiveness and robustness of the proposed model, indicating that it compares favorably with the state of the art.
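    The decomposition idea — sampling a latent code and decoding it into interpretable trend and seasonal parts whose sum is the forecast — can be sketched as follows (a hypothetical illustration, not the paper's architecture):

      import torch
      import torch.nn as nn

      class DecompositionHead(nn.Module):
          """Sketch: decode a latent sample into trend + seasonal terms."""

          def __init__(self, hidden: int = 64, horizon: int = 24):
              super().__init__()
              self.mu = nn.Linear(hidden, hidden)
              self.logvar = nn.Linear(hidden, hidden)
              self.trend = nn.Linear(hidden, horizon)
              self.season = nn.Linear(hidden, horizon)

          def forward(self, h):
              # h: (B, hidden) summary state from the Transformer
              mu, logvar = self.mu(h), self.logvar(h)
              z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
              trend, season = self.trend(z), self.season(z)
              # forecast plus its interpretable components
              return trend + season, trend, season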
    Keywords Computer Science - Machine Learning
    Subject code 330 ; 006
    Publishing date 2022-10-31
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Article: Hourly solar irradiance forecasting based on encoder–decoder model using series decomposition and dynamic error compensation

    Tong, Junlong / Xie, Liping / Fang, Shixiong / Yang, Wankou / Zhang, Kanjian

    Energy conversion and management. 2022 July 21

    2022  

    Abstract Accurate solar irradiance prediction is crucial for harnessing solar energy resources. However, the pattern of an irradiance sequence is intricate due to its nonlinear and non-stationary characteristics. In this paper, a deep hybrid model based on an encoder–decoder is proposed to cope with this complexity for hourly irradiance forecasting. The hybrid deep model integrates complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), an encoder–decoder module, and a dynamic error compensation (DEC) architecture. CEEMDAN is implemented to reduce the nonlinearity and non-stationarity of the irradiance sequence. The encoder–decoder integrates temporal convolutional networks (TCN), long short-term memory networks (LSTM), and a multi-layer perceptron (MLP) for temporal feature extraction and multi-step prediction. The DEC architecture dynamically updates the model based on adjacent error information to mine the predictable components of the error signal. Furthermore, a new loss function is proposed for multi-objective optimization to balance the performance of multi-step forecasting. In hourly irradiance forecasting experiments on three public datasets, the root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (R) of the proposed model are observed to be in the ranges of 30.693–34.433 W/m², 19.398–22.900 W/m², and 0.9872–0.9902, respectively. Compared to the benchmark models (including MLP, LSTM, and TCN), the RMSE and MAE are reduced by 10.76%–22.00% and 5.47%–20.40%, respectively. The experimental results indicate that the proposed model delivers accurate and robust forecasting performance and is a reliable alternative for hourly irradiance forecasting.
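    For reference, the three reported scores are standard and can be computed as follows (a straightforward sketch; units follow the paper's W/m²):

      import numpy as np

      def forecast_scores(y_true: np.ndarray, y_pred: np.ndarray):
          """RMSE and MAE in the units of the series, plus Pearson R."""
          err = y_pred - y_true
          rmse = float(np.sqrt(np.mean(err ** 2)))
          mae = float(np.mean(np.abs(err)))
          r = float(np.corrcoef(y_true, y_pred)[0, 1])
          return rmse, mae, r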
    Keywords administrative management ; data collection ; energy conversion ; light intensity ; neural networks ; prediction ; solar energy ; solar radiation
    Language English
    Dates of publication 2022-07-21
    Publishing place Elsevier Ltd
    Document type Article
    Note Pre-press version
    ZDB-ID 2000891-0
    ISSN 0196-8904
    DOI 10.1016/j.enconman.2022.116049
    Database NAL-Catalogue (AGRICOLA)

  6. Article ; Online: Unsupervised Eyeglasses Removal in the Wild.

    Hu, Bingwen / Zheng, Zhedong / Liu, Ping / Yang, Wankou / Ren, Mingwu

    IEEE transactions on cybernetics

    2021  Volume 51, Issue 9, Page(s) 4373–4385

    Abstract Eyeglasses removal is challenging: it requires removing different kinds of eyeglasses, e.g., rimless glasses, full-rim glasses, and sunglasses, and recovering plausible eyes. Due to the significant visual variation, conventional methods lack scalability. Most existing works focus on frontal face images captured in controlled environments, such as the laboratory, and need to design specific systems for different eyeglass types. To address this limitation, we propose a unified eyeglass removal model called the eyeglasses removal generative adversarial network (ERGAN), which can handle different types of glasses in the wild. The proposed method does not depend on dense annotation of eyeglasses locations but benefits from large-scale face images with weak annotations. Specifically, we study two relevant tasks simultaneously, i.e., removing eyeglasses and wearing eyeglasses. Given two face images with and without eyeglasses, the proposed model learns to swap the eye area between the two faces. The generation mechanism focuses on the eye area and thereby avoids the difficulty of generating an entirely new face. In our experiments, the proposed method achieves competitive removal quality in terms of realism and diversity. Furthermore, we evaluate ERGAN on several downstream tasks, such as face verification and facial expression recognition. The experiments show that our method can serve as a preprocessing step for these tasks.
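    The swap mechanism can be caricatured as follows: encode each face into an eye-area code and a rest-of-face code, then decode with the eye codes exchanged. This is a toy sketch with stand-in linear layers, not the paper's generator:

      import torch
      import torch.nn as nn

      class EyeSwapSketch(nn.Module):
          """Toy sketch: swapping eye-area codes between two faces."""

          def __init__(self, dim: int = 128):
              super().__init__()
              self.enc_eye = nn.Linear(dim, dim)    # stand-in encoders
              self.enc_face = nn.Linear(dim, dim)
              self.dec = nn.Linear(2 * dim, dim)    # stand-in decoder

          def forward(self, face_a, face_b):
              # face_a wears glasses, face_b does not
              ea, fa = self.enc_eye(face_a), self.enc_face(face_a)
              eb, fb = self.enc_eye(face_b), self.enc_face(face_b)
              a_removed = self.dec(torch.cat([eb, fa], dim=-1))
              b_wearing = self.dec(torch.cat([ea, fb], dim=-1))
              return a_removed, b_wearing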
    MeSH term(s) Eyeglasses
    Language English
    Publishing date 2021-09-15
    Publishing country United States
    Document type Journal Article
    ISSN 2168-2275
    ISSN (online) 2168-2275
    DOI 10.1109/TCYB.2020.2995496
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article ; Online: Multiview Learning With Robust Double-Sided Twin SVM.

    Ye, Qiaolin / Huang, Peng / Zhang, Zhao / Zheng, Yuhui / Fu, Liyong / Yang, Wankou

    IEEE transactions on cybernetics

    2022  Volume 52, Issue 12, Page(s) 12745–12758

    Abstract Multiview learning (MVL), which enhances learners' performance by coordinating complementarity and consistency among different views, has attracted much attention. The multiview generalized eigenvalue proximal support vector machine (MvGSVM) is a recently proposed and effective binary classification method that introduces the concept of MVL into the classical generalized eigenvalue proximal support vector machine (GEPSVM). However, this approach does not yet guarantee good classification performance and robustness. In this article, we develop multiview robust double-sided twin SVM (MvRDTSVM) with SVM-type problems, which introduces a set of double-sided constraints into the proposed model to promote classification performance. To improve the robustness of MvRDTSVM against outliers, we take the L1-norm as the distance metric. A fast version of MvRDTSVM (called MvFRDTSVM) is also presented. The reformulated problems are complex, and solving them is very challenging. As one of the main contributions of this article, we design two effective iterative algorithms to optimize the proposed nonconvex problems and then conduct theoretical analysis of the algorithms. The experimental results verify the effectiveness of our proposed methods.
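    The motivation for the L1-norm distance can be seen with a small numeric example: a single outlier inflates a squared (L2-type) distance far more than an absolute (L1) one, which is why L1 is the more robust choice:

      import numpy as np

      center = np.ones(4)
      clean = np.array([1.0, 1.1, 0.9, 1.0])
      corrupt = np.array([1.0, 1.1, 0.9, 10.0])  # one outlier

      sq = lambda x: np.sum((x - center) ** 2)   # L2-type distance
      ab = lambda x: np.sum(np.abs(x - center))  # L1 distance

      print(sq(corrupt) / sq(clean))  # squared distance grows ~4051x
      print(ab(corrupt) / ab(clean))  # absolute distance grows only ~46x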
    Language English
    Publishing date 2022-11-18
    Publishing country United States
    Document type Journal Article
    ISSN 2168-2275
    ISSN (online) 2168-2275
    DOI 10.1109/TCYB.2021.3088519
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Book ; Online: Finding Point with Image

    Dai, Ming / Zheng, Enhui / Feng, Zhenhua / Chen, Jiahao / Yang, Wankou

    A Simple and Efficient Method for UAV Self-Localization

    2022  

    Abstract Image retrieval has emerged as a prominent solution for the self-localization task of unmanned aerial vehicles (UAVs). However, this approach involves complicated pre-processing and post-processing operations, placing significant demands on both computational and storage resources. To mitigate this issue, this paper presents an end-to-end positioning framework, namely Finding Point with Image (FPI), which aims to directly identify the corresponding location of a UAV in satellite-view images via a UAV-view image. To validate the practicality of our framework, we construct a paired dataset, namely UL14, that consists of UAV and satellite views. In addition, we establish two transformer-based baseline models, Post Fusion and Mix Fusion, for end-to-end training and inference. Through experiments, we conclude that fusion in the backbone network achieves better performance than later fusion. Furthermore, considering the limited diversity of paired images, Random Scale Crop (RSC) is proposed to enrich the diversity of the paired data. Also, the ratio and weighting of positive and negative samples play a key role in model convergence; we verify this experimentally and propose a Weight Balance Loss (WBL) to weigh the impact of positive and negative samples. Last, our proposed baseline based on the Mix Fusion structure exhibits superior time and storage efficiency, requiring just 1/24 of the time and 1/68 of the storage of the image retrieval method, while delivering comparable or even superior performance. The dataset and code will be made publicly available.
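    The positive/negative balancing idea behind WBL can be illustrated with a generic weighted binary cross-entropy; the paper's exact WBL formulation is not reproduced, and pos_weight here is an assumed knob:

      import torch
      import torch.nn.functional as F

      def weighted_balance_loss(logits, targets, pos_weight: float = 10.0):
          """Weighted BCE: up-weights scarce positive locations against
          abundant negatives.

          logits, targets: (B, H, W); targets are 1 at the true UAV
          location in the satellite view and 0 elsewhere.
          """
          w = torch.ones_like(targets) + (pos_weight - 1.0) * targets
          return F.binary_cross_entropy_with_logits(logits, targets,
                                                    weight=w)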

    Comment: 15 pages, 14 figures
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 006
    Publishing date 2022-08-12
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  9. Book ; Online: EfficientFace

    Wang, Guangtao / Li, Jun / Wu, Zhijian / Xu, Jianhua / Shen, Jifeng / Yang, Wankou

    An Efficient Deep Network with Feature Enhancement for Accurate Face Detection

    2023  

    Abstract In recent years, deep convolutional neural networks (CNNs) have significantly advanced face detection. In particular, lightweight CNN-based architectures have achieved great success due to their low-complexity structure facilitating real-time detection tasks. However, current lightweight CNN-based face detectors, trading accuracy for efficiency, have inadequate capability in handling insufficient feature representation, faces with unbalanced aspect ratios, and occlusion. Consequently, they exhibit deteriorated performance, lagging far behind deep heavyweight detectors. To achieve efficient face detection without sacrificing accuracy, we design an efficient deep face detector termed EfficientFace in this study, which contains three modules for feature enhancement. To begin with, we design a novel cross-scale feature fusion strategy to facilitate bottom-up information propagation, such that the fusion of low-level and high-level features is further strengthened. This is also conducive to estimating the locations of faces and enhancing the descriptive power of face features. Secondly, we introduce a Receptive Field Enhancement module to handle faces with various aspect ratios. Thirdly, we add an Attention Mechanism module to improve the representational capability of occluded faces. We have evaluated EfficientFace on four public benchmarks and the experimental results demonstrate the appealing performance of our method. In particular, our model achieves 95.1% (Easy), 94.0% (Medium) and 90.1% (Hard) on the validation set of the WIDER Face dataset, which is competitive with heavyweight models at only 1/15 of the computational cost of the state-of-the-art MogFace detector.
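    A generic bottom-up cross-scale fusion of the kind described can be sketched as below; this is illustrative only, and the paper's actual fusion strategy may differ:

      import torch
      import torch.nn as nn

      class BottomUpFusion(nn.Module):
          """Each finer level is downsampled and added into the next
          coarser level, mixing low-level detail into high-level maps."""

          def __init__(self, channels: int = 64, levels: int = 3):
              super().__init__()
              self.down = nn.ModuleList(
                  nn.Conv2d(channels, channels, 3, stride=2, padding=1)
                  for _ in range(levels - 1)
              )

          def forward(self, feats):
              # feats: list of (B, C, H_i, W_i) maps, finest first,
              # each level at half the resolution of the previous one
              out = [feats[0]]
              for i, conv in enumerate(self.down):
                  out.append(feats[i + 1] + conv(out[-1]))
              return out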
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 006
    Publishing date 2023-02-23
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  10. Book ; Online: Capturing the motion of every joint

    Yang, Sen / Heng, Wen / Liu, Gang / Luo, Guozhong / Yang, Wankou / Yu, Gang

    3D human pose and shape estimation with independent tokens

    2023  

    Abstract In this paper we present a novel method to estimate 3D human pose and shape from monocular videos. This task requires directly recovering pixel-aligned 3D human pose and body shape from monocular images or videos, which is challenging due to its inherent ambiguity. To improve precision, existing methods rely heavily on the initialized mean pose and shape as prior estimates, and on parameter regression in an iterative error-feedback manner. In addition, video-based approaches model the overall change over image-level features to temporally enhance the single-frame feature, but fail to capture rotational motion at the joint level and cannot guarantee local temporal consistency. To address these issues, we propose a novel Transformer-based model with a design of independent tokens. First, we introduce three types of tokens independent of the image feature: joint rotation tokens, a shape token, and a camera token. By progressively interacting with image features through Transformer layers, these tokens learn to encode the prior knowledge of human 3D joint rotations, body shape, and position information from large-scale data, and are updated to estimate SMPL parameters conditioned on a given image. Second, benefiting from the proposed token-based representation, we further use a temporal model to focus on capturing the rotational temporal information of each joint, which is empirically conducive to preventing large jitters in local parts. Despite being conceptually simple, the proposed method attains superior performance on the 3DPW and Human3.6M datasets. Using ResNet-50 and Transformer architectures, it obtains a 42.0 mm error on the PA-MPJPE metric of the challenging 3DPW benchmark, outperforming state-of-the-art counterparts by a large margin. Code will be publicly available at https://github.com/yangsenius/INT_HMR_Model
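    PA-MPJPE, the metric quoted above, is the mean per-joint position error after a Procrustes (similarity) alignment of the prediction to the ground truth; a standard computation looks like this:

      import numpy as np

      def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
          """pred, gt: (J, 3) joint positions in millimetres."""
          p = pred - pred.mean(0)
          g = gt - gt.mean(0)
          # optimal rotation and scale from the cross-covariance SVD
          u, s, vt = np.linalg.svd(p.T @ g)
          if np.linalg.det(u @ vt) < 0:   # avoid reflections
              vt[-1] *= -1
              s[-1] *= -1
          rot = u @ vt                    # maps centered p onto g
          scale = s.sum() / (p ** 2).sum()
          aligned = scale * p @ rot + gt.mean(0)
          return float(np.linalg.norm(aligned - gt, axis=1).mean())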

    Comment: 17 pages, 12 figures. ICLR 2023 (spotlight)
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2023-03-01
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
