LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 32

  1. Book ; Online: Understanding Depth Map Progressively

    Cheng, Xianhui / Qiu, Shoumeng / Zou, Zhikang / Pu, Jian / Xue, Xiangyang

    Adaptive Distance Interval Separation for Monocular 3D Object Detection

    2023  

    Abstract Monocular 3D object detection aims to locate objects in different scenes with just a single image. Due to the absence of depth information, several monocular 3D detection techniques have emerged that rely on auxiliary depth maps from the depth estimation task. There are multiple approaches to understanding the representation of depth maps, including treating them as pseudo-LiDAR point clouds, leveraging implicit end-to-end learning of depth information, or considering them as an image input. However, these methods have certain drawbacks, such as their reliance on the accuracy of estimated depth maps and suboptimal utilization of depth maps due to their image-based nature. Although LiDAR-based methods can be applied to pseudo point clouds and convolutional neural networks (CNNs) to depth maps, the two representations have so far been treated as mutually exclusive alternatives. In this paper, we propose a framework named the Adaptive Distance Interval Separation Network (ADISN) that adopts a novel perspective on depth maps, treating them as a form that lies between LiDAR and images. We utilize an adaptive separation approach that partitions the depth map into various subgraphs based on distance and treats each of these subgraphs as an individual image for feature extraction. After adaptive separation, each subgraph contains only pixels within a learned interval range. If an object is truncated within this range, an evident curved edge appears, which we can leverage for texture extraction using CNNs to obtain rich depth information in pixels. Meanwhile, to mitigate the inaccuracy of depth estimation, we design an uncertainty module. To take advantage of both images and depth maps, we use separate branches to learn the localization and appearance tasks.
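
    To make the separation step concrete, here is a minimal Python sketch (illustrative only, not the authors' code): a depth map is partitioned into distance intervals, and each interval becomes its own image in which truncated objects leave the curved edges that the CNN branch can exploit. The fixed boundaries here stand in for the learned, adaptive intervals of ADISN.

    import numpy as np

    def separate_depth_intervals(depth_map, boundaries):
        """Split a depth map into per-interval subgraphs (illustrative only).

        depth_map:  (H, W) array of estimated per-pixel depths.
        boundaries: sorted interval edges, e.g. [0, 10, 25, inf]; ADISN learns
                    these adaptively, here they are fixed for the sketch.
        Returns one (H, W) image per interval; pixels outside the interval are
        zeroed, so a truncated object leaves a visible curved edge for a CNN.
        """
        subgraphs = []
        for lo, hi in zip(boundaries[:-1], boundaries[1:]):
            mask = (depth_map >= lo) & (depth_map < hi)
            subgraphs.append(np.where(mask, depth_map, 0.0))
        return subgraphs

    # Toy 4x4 depth map split into near / mid / far intervals.
    depth = np.array([[ 2.0,  3.0, 12.0, 14.0],
                      [ 2.5, 11.0, 13.0, 30.0],
                      [ 9.0, 12.0, 28.0, 33.0],
                      [10.0, 26.0, 31.0, 40.0]])
    for i, sub in enumerate(separate_depth_intervals(depth, [0, 10, 25, np.inf])):
        print(f"interval {i}:\n{sub}")
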
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2023-06-19
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  2. Book ; Online: CrowdCLIP

    Liang, Dingkang / Xie, Jiahao / Zou, Zhikang / Ye, Xiaoqing / Xu, Wei / Bai, Xiang

    Unsupervised Crowd Counting via Vision-Language Model

    2023  

    Abstract Supervised crowd counting relies heavily on costly manual labeling, which is difficult and expensive, especially in dense scenes. To alleviate this problem, we propose a novel unsupervised framework for crowd counting, named CrowdCLIP. The core idea is built on two observations: 1) the recent contrastive pre-trained vision-language model (CLIP) has shown impressive performance on various downstream tasks; 2) there is a natural mapping between crowd patches and count text. To the best of our knowledge, CrowdCLIP is the first to investigate vision-language knowledge for solving the counting problem. Specifically, in the training stage, we exploit a multi-modal ranking loss, constructing ranking text prompts to match the size-sorted crowd patches and thereby guide the image encoder's learning. In the testing stage, to deal with the diversity of image patches, we propose a simple yet effective progressive filtering strategy that first selects the most likely crowd patches and then maps them into the language space with various counting intervals. Extensive experiments on five challenging datasets demonstrate that CrowdCLIP achieves superior performance compared to previous unsupervised state-of-the-art counting methods. Notably, CrowdCLIP even surpasses some popular fully supervised methods under the cross-dataset setting. The source code will be available at https://github.com/dk-liang/CrowdCLIP.
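
    The ranking idea lends itself to a short sketch. The PyTorch code below (hypothetical shapes and names, not the released CrowdCLIP code) shows one way a margin-based ranking loss over size-sorted patch embeddings and count-ordered prompt embeddings could look:

    import torch
    import torch.nn.functional as F

    def ranking_loss(patch_embs, text_embs, margin=0.1):
        """Multi-modal ranking loss sketch (illustrative only).

        patch_embs: (N, D) embeddings of crowd patches, sorted by patch size so
                    that index i should match the i-th count-ordered prompt.
        text_embs:  (N, D) embeddings of ranking prompts such as
                    "a photo with about 10 people", "... about 50 people", ...
        Penalizes any patch that is more similar to a mismatched prompt than
        to its rank-matched one.
        """
        patch_embs = F.normalize(patch_embs, dim=-1)
        text_embs = F.normalize(text_embs, dim=-1)
        sim = patch_embs @ text_embs.T            # (N, N) cosine similarities
        n = sim.size(0)
        loss = sim.new_zeros(())
        for i in range(n):
            for j in range(n):
                if i != j:
                    loss = loss + F.relu(margin - sim[i, i] + sim[i, j])
        return loss / (n * (n - 1))

    # Random embeddings stand in for CLIP image/text encoder outputs.
    patches, prompts = torch.randn(4, 512), torch.randn(4, 512)
    print(ranking_loss(patches, prompts).item())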

    Comment: Accepted by CVPR 2023
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2023-04-09
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  3. Book ; Online: SOOD

    Hua, Wei / Liang, Dingkang / Li, Jingyu / Liu, Xiaolong / Zou, Zhikang / Ye, Xiaoqing / Bai, Xiang

    Towards Semi-Supervised Oriented Object Detection

    2023  

    Abstract Semi-Supervised Object Detection (SSOD), which aims to exploit unlabeled data to boost object detectors, has become an active task in recent years. However, existing SSOD approaches mainly focus on horizontal objects, leaving the multi-oriented objects common in aerial images unexplored. This paper proposes a novel semi-supervised oriented object detection model, termed SOOD, built upon the mainstream pseudo-labeling framework. Targeting oriented objects in aerial scenes, we design two loss functions to provide better supervision. Focusing on object orientation, the first loss regularizes the consistency between each pseudo-label-prediction pair (a prediction and its corresponding pseudo-label) with adaptive weights based on their orientation gap. Focusing on image layout, the second loss regularizes the similarity between the sets of pseudo-labels and predictions and explicitly builds their many-to-many relation. Such a global consistency constraint can further boost semi-supervised learning. Our experiments show that when trained with the two proposed losses, SOOD surpasses state-of-the-art SSOD methods under various settings on the DOTA-v1.5 benchmark. The code will be available at https://github.com/HamPerdredes/SOOD.
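
    The first loss can be pictured with a small sketch. The PyTorch code below is an illustration under assumed inputs (already-matched pairs of angles and confidences), not the SOOD implementation; it simply up-weights the consistency term for pairs with a larger orientation gap:

    import math
    import torch

    def orientation_aware_consistency(pred_angles, pseudo_angles,
                                      pred_scores, pseudo_scores):
        """Orientation-gap-weighted consistency loss (illustrative only).

        pred_angles / pseudo_angles: (N,) box orientations in radians for
        matched prediction/pseudo-label pairs; *_scores: (N,) confidences.
        Pairs whose rotations disagree more get a larger adaptive weight.
        """
        diff = pred_angles - pseudo_angles
        gap = torch.abs(torch.atan2(torch.sin(diff), torch.cos(diff)))  # [0, pi]
        weight = 1.0 + gap / math.pi              # adaptive weight in [1, 2]
        consistency = torch.abs(pred_scores - pseudo_scores)
        return (weight * consistency).mean()

    print(orientation_aware_consistency(
        torch.tensor([0.10, 1.20, -0.50]), torch.tensor([0.15, 0.90, -0.40]),
        torch.tensor([0.8, 0.6, 0.7]), torch.tensor([0.9, 0.7, 0.6])).item())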

    Comment: Accepted to CVPR 2023. Code will be available at https://github.com/HamPerdredes/SOOD
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 006
    Publishing date 2023-04-10
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  4. Book ; Online: CityTrack

    Lu, Jincheng / Yang, Xipeng / Ye, Jin / Zhang, Yifu / Zou, Zhikang / Zhang, Wei / Tan, Xiao

    Improving City-Scale Multi-Camera Multi-Target Tracking by Location-Aware Tracking and Box-Grained Matching

    2023  

    Abstract Multi-Camera Multi-Target Tracking (MCMT) is a computer vision technique that involves tracking multiple targets simultaneously across multiple cameras. MCMT in urban traffic visual analysis faces great challenges due to the complex and dynamic nature of urban traffic scenes, where multiple cameras with different views and perspectives are often used to cover a large city-scale area. Targets in urban traffic scenes often undergo occlusion, illumination changes, and perspective changes, making it difficult to associate targets accurately across different cameras. To overcome these challenges, we propose a novel systematic MCMT framework, called CityTrack. Specifically, we present a Location-Aware Single-Camera Multi-Target (SCMT) tracker that integrates various advanced techniques to improve its effectiveness in the MCMT task, and we propose a novel Box-Grained Matching (BGM) method for the Inter-Camera Association (ICA) module to solve the aforementioned problems. We evaluated our approach on the public test set of the CityFlowV2 dataset and achieved an IDF1 of 84.91%, ranking 1st in the 2022 AI CITY CHALLENGE. Our experimental results demonstrate the effectiveness of our approach in overcoming the challenges posed by urban traffic scenes.
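
    As a rough picture of inter-camera association (not the CityTrack code; a real box-grained matcher scores individual detection boxes rather than whole tracklets), cross-camera identities can be recovered by Hungarian matching over an appearance-distance cost matrix:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match_tracklets(feats_a, feats_b, max_dist=0.7):
        """Toy cross-camera association (illustrative only).

        feats_a: (M, D), feats_b: (N, D) L2-normalized appearance features of
        tracklets from two cameras. Builds a cosine-distance cost matrix and
        solves the assignment with the Hungarian algorithm, keeping only
        matches below a distance threshold.
        """
        cost = 1.0 - feats_a @ feats_b.T          # cosine distance
        rows, cols = linear_sum_assignment(cost)
        return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_dist]

    a = np.eye(3)                                 # three tracklets, camera A
    b = np.eye(3)[[2, 0, 1]]                      # permuted copies, camera B
    print(match_tracklets(a, b))                  # recovers the permutation
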
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 380
    Publishing date 2023-07-05
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Book ; Online: SAM3D

    Zhang, Dingyuan / Liang, Dingkang / Yang, Hongcheng / Zou, Zhikang / Ye, Xiaoqing / Liu, Zhe / Bai, Xiang

    Zero-Shot 3D Object Detection via Segment Anything Model

    2023  

    Abstract With the development of large language models, many remarkable linguistic systems like ChatGPT have thrived and achieved astonishing success on many tasks, showing the incredible power of foundation models. In the spirit of unleashing the capability of foundation models on vision tasks, the Segment Anything Model (SAM), a vision foundation model for image segmentation, has been proposed recently and presents strong zero-shot ability on many downstream 2D tasks. However, whether SAM can be adapted to 3D vision tasks has yet to be explored, especially 3D object detection. With this inspiration, we explore adapting the zero-shot ability of SAM to 3D object detection in this paper. We propose a SAM-powered BEV processing pipeline to detect objects and get promising results on the large-scale Waymo open dataset. As an early attempt, our method takes a step toward 3D object detection with vision foundation models and presents the opportunity to unleash their power on 3D vision tasks. The code is released at https://github.com/DYZhang09/SAM3D.
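
    The BEV side of such a pipeline is easy to sketch. The Python below (illustrative only; ranges and resolution are assumptions) rasterizes a LiDAR cloud into a bird's-eye-view pseudo-image that a 2D segmenter such as SAM could consume; lifting the resulting masks back to 3D boxes is left out:

    import numpy as np

    def lidar_to_bev(points, x_range=(-50, 50), y_range=(-50, 50), res=0.25):
        """Rasterize LiDAR points into a BEV intensity image (illustrative).

        points: (N, 4) array of x, y, z, intensity. Returns an (H, W) uint8
        image; masks found in it by a 2D segmenter would then be lifted to 3D
        boxes using the heights of the points falling in each mask.
        """
        h = int((y_range[1] - y_range[0]) / res)
        w = int((x_range[1] - x_range[0]) / res)
        bev = np.zeros((h, w), dtype=np.float32)
        keep = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
                (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
        pts = points[keep]
        cols = ((pts[:, 0] - x_range[0]) / res).astype(int)
        rows = ((pts[:, 1] - y_range[0]) / res).astype(int)
        np.maximum.at(bev, (rows, cols), pts[:, 3])   # max intensity per cell
        return (255 * bev / max(bev.max(), 1e-6)).astype(np.uint8)

    cloud = np.random.rand(1000, 4) * [80, 80, 3, 1] - [40, 40, 1.5, 0]
    print(lidar_to_bev(cloud).shape)                  # (400, 400)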

    Comment: Accepted by Science China Information Sciences (SCIS)
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Electrical Engineering and Systems Science - Image and Video Processing
    Subject code 004
    Publishing date 2023-06-03
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Book ; Online: Paint and Distill

    Ju, Bo / Zou, Zhikang / Ye, Xiaoqing / Jiang, Minyue / Tan, Xiao / Ding, Errui / Wang, Jingdong

    Boosting 3D Object Detection with Semantic Passing Network

    2022  

    Abstract The 3D object detection task from lidar or camera sensors is essential for autonomous driving. Pioneering attempts at multi-modality fusion complement sparse lidar point clouds with rich semantic texture information from images, at the cost of extra network designs and overhead. In this work, we propose a novel semantic passing framework, named SPNet, to boost the performance of existing lidar-based 3D detection models with the guidance of rich context painting, at no extra computation cost during inference. Our key design is to first exploit the instructive semantic knowledge within the ground-truth labels by training a semantic-painted teacher model, and then guide the pure-lidar network to learn the semantic-painted representation via knowledge passing modules at different granularities: class-wise passing, pixel-wise passing, and instance-wise passing. Experimental results show that the proposed SPNet can seamlessly cooperate with most existing 3D detection frameworks, yielding a 1-5% AP gain, and even achieves new state-of-the-art 3D detection performance on the KITTI test benchmark. Code is available at: https://github.com/jb892/SPNet.
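
    The knowledge passing can be sketched as distillation losses at different granularities. The PyTorch code below is a hedged illustration with assumed tensor shapes, covering only the class-wise and pixel-wise terms (the paper adds an instance-wise term over detected objects):

    import torch
    import torch.nn.functional as F

    def semantic_passing_loss(student_feats, teacher_feats,
                              student_logits, teacher_logits):
        """Two-granularity knowledge passing sketch (illustrative only).

        student_feats / teacher_feats: (B, C, H, W) feature maps from the
        pure-lidar student and the semantic-painted teacher.
        student_logits / teacher_logits: (B, K) classification logits.
        """
        class_wise = F.kl_div(F.log_softmax(student_logits, dim=-1),
                              F.softmax(teacher_logits, dim=-1),
                              reduction="batchmean")
        pixel_wise = F.mse_loss(student_feats, teacher_feats)
        return class_wise + pixel_wise

    s_feat, t_feat = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
    s_log, t_log = torch.randn(2, 3), torch.randn(2, 3)
    print(semantic_passing_loss(s_feat, t_feat, s_log, t_log).item())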

    Comment: Accepted by ACMMM2022
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 006
    Publishing date 2022-07-12
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  7. Article: Relationships Between Haloes and Objective Visual Quality in Healthy Eyes.

    Yao, Lu / Xu, Ye / Han, Tian / Qi, Linsong / Shi, Jiumei / Zou, Zhikang / Zhou, Xingtao

    Translational vision science & technology

    2020  Volume 9, Issue 10, Page(s) 13

    Abstract Purpose: To determine the normal values and relationships between haloes and objective optical quality in healthy eyes.
    Methods: In this cross-sectional study, haloes, pupillary responses to light, and objective optical quality were measured with the optical quality analysis system (OQAS) and a vision monitor (MonCv3) in 138 right eyes of 138 healthy young men with mean spherical equivalent of 0.32 ± 0.47 D.
    Results: The mean disc halo size was 77.17 ± 25.03 arcmin. The mean objective optical quality values were as follows: objective scatter index (OSI), 0.58 ± 0.33; Strehl ratio (SR), 0.21 ± 0.05; modulation transfer function cutoff, 36.27 ± 7.98 cpd; OQAS value (OV)100%, 1.21 ± 0.27; OV20%, 0.91 ± 0.23; and OV9%, 0.59 ± 0.16. Disc halo size correlated independently with OSI (…).
    Conclusions: Reference values for disc halo size and objective optical quality in healthy young subjects were established. Eyes with worse objective vision quality exhibited larger haloes.
    Translational relevance: The study provided the knowledge and the relationships of OQAS and halo measurements from a well-defined group of healthy young subjects. Both measurements are useful in clinical practice to help quantify the vision quality and complement each other.
    MeSH term(s) Cross-Sectional Studies ; Eye ; Humans ; Male ; Reference Values ; Refraction, Ocular ; Vision, Ocular
    Language English
    Publishing date 2020-09-11
    Publishing country United States
    Document type Journal Article
    ZDB-ID 2674602-5
    ISSN 2164-2591
    DOI 10.1167/tvst.9.10.13
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Book ; Online: Multi-Modal 3D Object Detection by Box Matching

    Liu, Zhe / Ye, Xiaoqing / Zou, Zhikang / He, Xinwei / Tan, Xiao / Ding, Errui / Wang, Jingdong / Bai, Xiang

    2023  

    Abstract Multi-modal 3D object detection has received growing attention because the information from different sensors, such as LiDAR and cameras, is complementary. Most fusion methods for 3D detection rely on accurate alignment and calibration between the 3D point clouds and RGB images. However, such an assumption is not reliable in a real-world self-driving system, as the alignment between modalities is easily affected by asynchronous sensors and disturbed sensor placement. We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection, which provides an alternative way to align cross-modal features by learning correspondences at the bounding-box level, freeing the model from its dependency on calibration during inference. With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features. Extensive experiments on the nuScenes dataset demonstrate that our method is much more stable than existing fusion methods in dealing with challenging cases such as asynchronous sensors, misaligned sensor placement, and degenerated camera images. We hope that FBMNet provides a viable solution for dealing with these challenging cases, for safety in real autonomous driving scenarios. Code will be publicly available at https://github.com/happinesslz/FBMNet.
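
    The box-matching fusion admits a compact sketch. The PyTorch code below is an illustration under assumed feature shapes, not FBMNet itself: a soft assignment from 3D to 2D proposals is computed from feature affinities alone, so no calibration enters the fusion step:

    import torch

    def fuse_by_box_matching(feats_3d, feats_2d, temperature=0.07):
        """Calibration-free fusion by proposal matching (illustrative only).

        feats_3d: (M, D) ROI features of 3D proposals; feats_2d: (N, D) ROI
        features of 2D proposals. Each 3D proposal aggregates image evidence
        from the 2D proposals it is softly assigned to.
        """
        affinity = feats_3d @ feats_2d.T / temperature  # (M, N)
        assign = affinity.softmax(dim=-1)               # soft 3D -> 2D assignment
        matched_2d = assign @ feats_2d                  # (M, D) image cues
        return torch.cat([feats_3d, matched_2d], dim=-1)

    f3d, f2d = torch.randn(5, 128), torch.randn(8, 128)
    print(fuse_by_box_matching(f3d, f2d).shape)         # torch.Size([5, 256])
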
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2023-05-12
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  9. Book ; Online: Coarse to Fine

    Zou, Zhikang / Qu, Xiaoye / Zhou, Pan / Xu, Shuangjie / Ye, Xiaoqing / Wu, Wenhao / Ye, Jin

    Domain Adaptive Crowd Counting via Adversarial Scoring Network

    2021  

    Abstract Recent deep networks have convincingly demonstrated high capability in crowd counting, a critical task attracting widespread attention due to its various industrial applications. Despite such progress, trained data-dependent models usually cannot generalize well to unseen scenarios because of the inherent domain shift. To address this issue, this paper proposes a novel adversarial scoring network (ASNet) to gradually bridge the gap across domains, from coarse to fine granularity. Specifically, at the coarse-grained stage, we design a dual-discriminator strategy that adapts the source domain to be close to the target from the perspectives of both global and local feature space via adversarial learning. The distributions of the two domains can thus be roughly aligned. At the fine-grained stage, we explore the transferability of source characteristics by scoring how similar the source samples are to target ones, at multiple levels, based on generative probabilities derived from the coarse stage. Guided by these hierarchical scores, the transferable source features are properly selected to enhance knowledge transfer during the adaptation process. With the coarse-to-fine design, the generalization bottleneck induced by the domain discrepancy can be effectively alleviated. Three sets of migration experiments show that the proposed method achieves state-of-the-art counting performance compared with major unsupervised methods.
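
    The fine-grained stage can be pictured as reweighting the source loss by how target-like each source sample appears to the coarse-stage discriminator. A minimal PyTorch sketch with hypothetical names and a simple power sharpening, not the ASNet code:

    import torch

    def transferability_weights(disc_probs, sharpen=2.0):
        """Score-based sample weights (illustrative only).

        disc_probs: (N,) discriminator probabilities that each source sample
        is target-like. Target-like samples get larger normalized weights, so
        they dominate the adaptation loss.
        """
        w = disc_probs.clamp(1e-6, 1.0) ** sharpen
        return w / w.sum()

    def weighted_counting_loss(pred_counts, gt_counts, weights):
        """Per-sample L1 counting loss, reweighted by transferability."""
        return (weights * (pred_counts - gt_counts).abs()).sum()

    w = transferability_weights(torch.tensor([0.9, 0.2, 0.6]))
    print(weighted_counting_loss(torch.tensor([10., 55., 30.]),
                                 torch.tensor([12., 50., 33.]), w).item())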

    Comment: Accepted by ACMMM2021
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Multimedia
    Subject code 004
    Publishing date 2021-07-27
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  10. Book ; Online: Towards Adversarial Patch Analysis and Certified Defense against Crowd Counting

    Wu, Qiming / Zou, Zhikang / Zhou, Pan / Ye, Xiaoqing / Wang, Binghui / Li, Ang

    2021  

    Abstract Crowd counting has drawn much attention due to its importance in safety-critical surveillance systems. In particular, deep neural network (DNN) methods have significantly reduced estimation errors for crowd counting missions. Recent studies have demonstrated that DNNs are vulnerable to adversarial attacks, i.e., normal images with human-imperceptible perturbations can mislead DNNs into making false predictions. In this work, we propose a robust attack strategy called Adversarial Patch Attack with Momentum (APAM) to systematically evaluate the robustness of crowd counting models, where the attacker's goal is to create an adversarial perturbation that severely degrades their performance, potentially leading to public-safety accidents (e.g., stampedes). The proposed attack leverages the extreme-density background information of input images to generate robust adversarial patches via a series of transformations (e.g., interpolation, rotation, etc.). We observe that by perturbing less than 6% of image pixels, our attacks severely degrade the performance of crowd counting systems, both digitally and physically. To better enhance the adversarial robustness of crowd counting models, we propose the first regression-model-based Randomized Ablation (RA), which is more effective than Adversarial Training (ADT): the Mean Absolute Error of RA is 5 lower than that of ADT on clean samples and 30 lower on adversarial examples. Extensive experiments on five crowd counting models demonstrate the effectiveness and generality of the proposed method. The supplementary materials and certified retrained models are available at https://www.dropbox.com/s/hc4fdx133vht0qb/ACM_MM2021_Supp.pdf?dl=0
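
    The momentum ingredient can be sketched in a few lines. The PyTorch code below shows a generic momentum-style patch update, in the spirit of momentum iterative attacks; it is an assumption-laden illustration, not the actual APAM procedure:

    import torch

    def momentum_patch_step(patch, grad, velocity, mu=0.9, step=0.01):
        """One momentum update of an adversarial patch (illustrative only).

        patch:    (3, h, w) current patch pixels in [0, 1].
        grad:     gradient of the attack objective w.r.t. the patch (e.g. to
                  maximize a crowd counter's estimation error).
        velocity: running momentum buffer, same shape as the patch.
        Momentum stabilizes the attack direction across iterations.
        """
        velocity = mu * velocity + grad / grad.abs().mean().clamp(min=1e-12)
        patch = (patch + step * velocity.sign()).clamp(0.0, 1.0)
        return patch, velocity

    p = torch.rand(3, 32, 32)
    v = torch.zeros_like(p)
    g = torch.randn_like(p)                        # stand-in gradient
    p, v = momentum_patch_step(p, g, v)
    print(p.min().item() >= 0.0, p.max().item() <= 1.0)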

    Comment: Accepted by ACM Multimedia 2021
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 006
    Publishing date 2021-04-22
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
