LIVIVO - The Search Portal for Life Sciences


Search results

Results 1 - 10 of 48


  1. Article ; Online: Privacy-Preserving Synthetic Continual Semantic Segmentation for Robotic Surgery.

    Xu, Mengya / Islam, Mobarakol / Bai, Long / Ren, Hongliang

    IEEE transactions on medical imaging

    2024  Volume PP

    Abstract Deep Neural Network (DNN)-based semantic segmentation of robotic instruments and tissues can enhance the precision of surgical activities in robot-assisted surgery. However, unlike biological learning, DNNs cannot learn incremental tasks over time and exhibit catastrophic forgetting, which refers to the sharp decline in performance on previously learned tasks after learning a new one. Specifically, when data scarcity is the issue, the model shows a rapid drop in performance on previously learned instruments after learning new data with new instruments. The problem becomes worse when privacy concerns prevent releasing the old instruments' dataset used for the old model, and data for new or updated versions of the instruments is unavailable to the continual learning model. For this purpose, we develop a privacy-preserving synthetic continual semantic segmentation framework by blending and harmonizing (i) open-source old-instrument foregrounds onto synthesized backgrounds, without revealing real patient data in public, and (ii) new-instrument foregrounds onto extensively augmented real backgrounds. To boost balanced logit distillation from the old model to the continual learning model, we design overlapping class-aware temperature normalization (CAT) by controlling model learning utility. We also introduce multi-scale shifted-feature distillation (SD) to maintain long- and short-range spatial relationships among the semantic objects, where conventional short-range spatial features with limited information reduce the power of feature distillation. We demonstrate the effectiveness of our framework on the EndoVis 2017 and 2018 instrument segmentation datasets under a generalized continual learning setting. Code is available at https://github.com/XuMengyaAmy/Synthetic_CAT_SD.
    Language English
    Publishing date 2024-02-21
    Publishing country United States
    Document type Journal Article
    ZDB-ID 622531-7
    ISSN (online) 1558-254X
    ISSN 0278-0062
    DOI 10.1109/TMI.2024.3364969
    Database MEDical Literature Analysis and Retrieval System OnLINE
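
    A rough sketch of the temperature-scaled logit distillation that the framework above builds on. This is a generic illustration, not the authors' overlapping class-aware CAT implementation; the per-class temperature tensor, shapes, and values are illustrative assumptions (PyTorch).

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature):
    """KL-divergence distillation from a frozen old model to the new one.

    `temperature` may be a scalar or a per-class tensor broadcastable over
    the class dimension (a rough stand-in for class-aware normalization).
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    t_sq = temperature.mean() ** 2 if torch.is_tensor(temperature) else temperature ** 2
    # The T^2 factor keeps gradient magnitudes comparable to a hard-label loss.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * t_sq

# Example: per-class temperatures for a 5-class segmentation head,
# applied to logits of shape (batch, classes, H, W).
old_logits = torch.randn(8, 5, 64, 64)      # frozen old model
new_logits = torch.randn(8, 5, 64, 64)      # continual-learning model
temps = torch.tensor([2.0, 2.0, 3.0, 3.0, 2.5]).view(1, 5, 1, 1)
loss = distillation_loss(new_logits, old_logits, temps)
```

    Replacing the scalar temperature with a per-class tensor is the simplest way to bias how strongly each class is softened during distillation.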


  2. Article ; Online: Surgical-DINO: adapter learning of foundation models for depth estimation in endoscopic surgery.

    Cui, Beilei / Islam, Mobarakol / Bai, Long / Ren, Hongliang

    International journal of computer assisted radiology and surgery

    2024  

    Abstract Purpose: Depth estimation in robotic surgery is vital for 3D reconstruction, surgical navigation and augmented reality visualization. Although foundation models exhibit outstanding performance in many vision tasks, including depth estimation (e.g., DINOv2), recent works have observed their limitations in medical and surgical domain-specific applications. This work presents a low-rank adaptation (LoRA) of the foundation model for surgical depth estimation.
    Methods: We design a foundation model-based depth estimation method, referred to as Surgical-DINO, a low-rank adaptation of DINOv2 for depth estimation in endoscopic surgery. We build LoRA layers and integrate them into DINO to adapt to surgery-specific domain knowledge instead of performing conventional fine-tuning. During training, we freeze the DINO image encoder, which shows excellent visual representation capacity, and optimize only the LoRA layers and the depth decoder to integrate features from the surgical scene.
    Results: Our model is extensively validated on the SCARED MICCAI challenge dataset, collected from da Vinci Xi endoscopic surgery. We empirically show that Surgical-DINO significantly outperforms state-of-the-art models in endoscopic depth estimation tasks. Ablation studies show evidence of the remarkable effect of our LoRA layers and adaptation.
    Conclusion: Surgical-DINO sheds light on the successful adaptation of foundation models to the surgical domain for depth estimation. The results provide clear evidence that zero-shot prediction with weights pre-trained on computer vision datasets, or naive fine-tuning, is not sufficient to use a foundation model directly in the surgical domain.
    Language English
    Publishing date 2024-03-08
    Publishing country Germany
    Document type Journal Article
    ZDB-ID 2365628-1
    ISSN (online) 1861-6429
    ISSN 1861-6410
    DOI 10.1007/s11548-024-03083-5
    Database MEDical Literature Analysis and Retrieval System OnLINE
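
    A minimal sketch of the low-rank adaptation (LoRA) idea referenced above, assuming PyTorch; the rank, scaling, and the choice of wrapping a single linear projection are illustrative assumptions, not the released Surgical-DINO code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (W + scale * B @ A)."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():     # freeze the pretrained weights
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # start as an identity update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Usage sketch: wrap an attention projection of a frozen ViT-style encoder,
# then train only the LoRA parameters and the task decoder.
proj = nn.Linear(768, 768)
adapted = LoRALinear(proj, rank=4)
out = adapted(torch.randn(2, 196, 768))
```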


  3. Book ; Online: Surgical-DINO

    Cui, Beilei / Islam, Mobarakol / Bai, Long / Ren, Hongliang

    Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery

    2024  

    Abstract Purpose: Depth estimation in robotic surgery is vital for 3D reconstruction, surgical navigation and augmented reality visualization. Although foundation models exhibit outstanding performance in many vision tasks, including depth estimation (e.g., DINOv2), recent works have observed their limitations in medical and surgical domain-specific applications. This work presents a low-rank adaptation (LoRA) of the foundation model for surgical depth estimation. Methods: We design a foundation model-based depth estimation method, referred to as Surgical-DINO, a low-rank adaptation of DINOv2 for depth estimation in endoscopic surgery. We build LoRA layers and integrate them into DINO to adapt to surgery-specific domain knowledge instead of performing conventional fine-tuning. During training, we freeze the DINO image encoder, which shows excellent visual representation capacity, and optimize only the LoRA layers and the depth decoder to integrate features from the surgical scene. Results: Our model is extensively validated on the SCARED MICCAI challenge dataset, collected from da Vinci Xi endoscopic surgery. We empirically show that Surgical-DINO significantly outperforms state-of-the-art models in endoscopic depth estimation tasks. Ablation studies show evidence of the remarkable effect of our LoRA layers and adaptation. Conclusion: Surgical-DINO sheds light on the successful adaptation of foundation models to the surgical domain for depth estimation. The results provide clear evidence that zero-shot prediction with weights pre-trained on computer vision datasets, or naive fine-tuning, is not sufficient to use a foundation model directly in the surgical domain. Code is available at https://github.com/BeileiCui/SurgicalDINO.

    Comment: Accepted by IPCAI 2024 (IJCARS Special Issue)
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence
    Subject code 004
    Publishing date 2024-01-11
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  4. Book ; Online: Class Balanced PixelNet for Neurological Image Segmentation

    Islam, Mobarakol / Ren, Hongliang

    2022  

    Abstract In this paper, we propose an automatic brain tumor segmentation approach (PixelNet) using a pixel-level convolutional neural network (CNN). The model extracts features from multiple convolutional layers and concatenates them to form a hyper-column, from which it samples a modest number of pixels for optimization. The hyper-column ensures both local and global contextual information for pixel-wise prediction. The model achieves statistical efficiency by sampling only a few pixels in the training phase, since spatial redundancy limits the information learned from neighboring pixels in conventional pixel-level semantic segmentation approaches. Besides, label skewness in training data often leads the convolutional model to converge to certain classes, a common problem in medical datasets. We deal with this problem by selecting an equal number of pixels for all classes at sampling time. The proposed model has achieved promising results on brain tumor and ischemic stroke lesion segmentation datasets.
    Keywords Electrical Engineering and Systems Science - Image and Video Processing ; Computer Science - Computer Vision and Pattern Recognition
    Subject code 004 ; 006
    Publishing date 2022-04-23
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
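
    The class-balanced pixel sampling described above can be sketched as follows (NumPy); the per-class quota and the sampling-with-replacement fallback are assumptions for illustration: an equal number of pixel indices is drawn from every class present in the label map before the loss is computed.

```python
import numpy as np

def sample_balanced_pixels(label_map, pixels_per_class=500, rng=None):
    """Return flat pixel indices with an equal quota drawn from each class."""
    rng = rng or np.random.default_rng(0)
    flat = label_map.reshape(-1)
    chosen = []
    for cls in np.unique(flat):
        idx = np.flatnonzero(flat == cls)
        # Sample with replacement if the class has fewer pixels than the quota.
        take = rng.choice(idx, size=pixels_per_class,
                          replace=len(idx) < pixels_per_class)
        chosen.append(take)
    return np.concatenate(chosen)

# Example: a 256x256 label map with background plus 4 lesion sub-classes.
labels = np.random.randint(0, 5, size=(256, 256))
indices = sample_balanced_pixels(labels, pixels_per_class=200)
```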


  5. Book ; Online: Curriculum-Based Augmented Fourier Domain Adaptation for Robust Medical Image Segmentation

    Wang, An / Islam, Mobarakol / Xu, Mengya / Ren, Hongliang

    2023  

    Abstract Accurate and robust medical image segmentation is fundamental and crucial for enhancing the autonomy of computer-aided diagnosis and intervention systems. Medical data collection normally involves different scanners, protocols, and populations, making domain adaptation (DA) a highly demanding research field for alleviating model degradation at the deployment site. To preserve model performance across multiple testing domains, this work proposes Curriculum-based Augmented Fourier Domain Adaptation (Curri-AFDA) for robust medical image segmentation. In particular, our curriculum learning strategy is based on the causal relationship of a model under different levels of data shift in the deployment phase: the higher the shift, the harder it is to recognize the variance. Considering this, we progressively introduce more amplitude information from the target domain into the source domain in the frequency space during curriculum-style training, to smoothly schedule the semantic knowledge transfer in an easier-to-harder manner. Besides, we incorporate training-time chained augmentation mixing to help expand the data distributions while preserving the domain-invariant semantics, which helps the acquired model to be more robust and generalize better to unseen domains. Extensive experiments on two segmentation tasks, Retina and Nuclei, collected from multiple sites and scanners suggest that our proposed method yields superior adaptation and generalization performance. Meanwhile, our approach proves to be more robust under various corruption types and increasing severity levels. In addition, we show our method is also beneficial in the domain-adaptive classification task with skin lesion datasets. The code is available at https://github.com/lofrienger/Curri-AFDA.

    Comment: Work under review. First three authors contributed equally
    Keywords Electrical Engineering and Systems Science - Image and Video Processing ; Computer Science - Computer Vision and Pattern Recognition
    Subject code 006
    Publishing date 2023-06-06
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
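
    A minimal sketch of the Fourier-domain amplitude transfer that the curriculum above schedules, assuming NumPy and grayscale images of equal size; blending the full spectrum with a single ratio `beta` is a simplification of the paper's augmented, curriculum-scheduled variant.

```python
import numpy as np

def fourier_amplitude_mix(source_img, target_img, beta=0.3):
    """Blend the target image's FFT amplitude into the source image.

    beta in [0, 1]; a curriculum would start small and increase it over training.
    """
    fft_src = np.fft.fft2(source_img)
    fft_tgt = np.fft.fft2(target_img)
    amp_src, phase_src = np.abs(fft_src), np.angle(fft_src)
    amp_tgt = np.abs(fft_tgt)
    amp_mixed = (1 - beta) * amp_src + beta * amp_tgt   # mix style (amplitude)
    mixed = amp_mixed * np.exp(1j * phase_src)           # keep content (phase)
    return np.real(np.fft.ifft2(mixed))

# Example with random stand-in images of the same size.
src = np.random.rand(128, 128)
tgt = np.random.rand(128, 128)
adapted = fourier_amplitude_mix(src, tgt, beta=0.2)
```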


  6. Book ; Online: SurgicalGPT

    Seenivasan, Lalithkumar / Islam, Mobarakol / Kannan, Gokul / Ren, Hongliang

    End-to-End Language-Vision GPT for Visual Question Answering in Surgery

    2023  

    Abstract Advances in GPT-based large language models (LLMs) are revolutionizing natural language processing, exponentially increasing their use across various domains. Incorporating uni-directional attention, these autoregressive LLMs can generate long and coherent paragraphs. However, for visual question answering (VQA) tasks that require both vision and language processing, models with bi-directional attention, or models employing fusion techniques, are often used to capture the context of multiple modalities at once. As GPT does not natively process vision tokens, to exploit the advancements in GPT models for VQA in robotic surgery, we design an end-to-end trainable Language-Vision GPT (LV-GPT) model that expands the GPT2 model to include vision input (image). The proposed LV-GPT incorporates a feature extractor (vision tokenizer) and vision token embedding (token type and pose). Given the limitations of unidirectional attention in GPT models and their ability to generate coherent long paragraphs, we carefully sequence the word tokens before the vision tokens, mimicking the human thought process of understanding the question before inferring an answer from an image. Quantitatively, we show that the LV-GPT model outperforms other state-of-the-art VQA models on two publicly available surgical-VQA datasets (based on the endoscopic vision challenge robotic scene segmentation 2018 and CholecTriplet2021) and on our newly annotated dataset (based on the holistic surgical scene dataset). We further annotate all three datasets to include question-type annotations to allow sub-type analysis. Furthermore, we extensively study and present the effects of token sequencing, token type and pose embedding for vision tokens in the LV-GPT model.

    Comment: The manuscript is under review
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence ; Electrical Engineering and Systems Science - Image and Video Processing
    Subject code 004
    Publishing date 2023-04-19
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
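
    A rough sketch of the token sequencing and token-type embedding described above (question tokens placed before vision tokens for a left-to-right GPT-style decoder), assuming PyTorch; the module name and dimensions are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TokenSequencer(nn.Module):
    """Place word embeddings before vision embeddings for a causal decoder."""

    def __init__(self, dim=768):
        super().__init__()
        # 0 = word token, 1 = vision token (token-type embedding).
        self.token_type = nn.Embedding(2, dim)

    def forward(self, word_emb, vision_emb):
        b, n_w, _ = word_emb.shape
        n_v = vision_emb.shape[1]
        types = torch.cat([
            torch.zeros(b, n_w, dtype=torch.long, device=word_emb.device),
            torch.ones(b, n_v, dtype=torch.long, device=word_emb.device),
        ], dim=1)
        # Question tokens first, then image patch tokens, so a left-to-right
        # model "reads the question" before attending over the image.
        seq = torch.cat([word_emb, vision_emb], dim=1)
        return seq + self.token_type(types)

# Usage: 20 question tokens followed by 196 vision patch tokens.
seq = TokenSequencer()(torch.randn(2, 20, 768), torch.randn(2, 196, 768))
```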


  7. Book ; Online: Surgical-VQLA

    Bai, Long / Islam, Mobarakol / Seenivasan, Lalithkumar / Ren, Hongliang

    Transformer with Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery

    2023  

    Abstract Despite the availability of computer-aided simulators and recorded videos of surgical procedures, junior residents still heavily rely on experts to answer their queries. However, expert surgeons are often overloaded with clinical and academic workloads and have limited time to answer. For this purpose, we develop a surgical question-answering system to facilitate robot-assisted surgical scene and activity understanding from recorded videos. Most existing VQA methods require an object detector and a region-based feature extractor to extract visual features and fuse them with the embedded text of the question for answer generation. However, (1) surgical object detection models are scarce due to small datasets and a lack of bounding-box annotations; (2) current fusion strategies for heterogeneous modalities like text and image are naive; (3) localized answering is missing, which is crucial in complex surgical scenarios. In this paper, we propose Visual Question Localized-Answering in Robotic Surgery (Surgical-VQLA) to localize the specific surgical area during answer prediction. To deal with the fusion of heterogeneous modalities, we design a gated vision-language embedding (GVLE) to build input patches for the Language Vision Transformer (LViT) to predict the answer. For localization, we add a detection head in parallel with the prediction head of the LViT. We also integrate GIoU loss to boost localization performance while preserving the accuracy of the question-answering model. We annotate two VQLA datasets by utilizing publicly available surgical videos from the MICCAI challenges EndoVis-17 and 18. Our validation results suggest that Surgical-VQLA can better understand the surgical scene and localize the specific area related to the question. GVLE presents an efficient language-vision embedding technique, showing superior performance over the existing benchmarks.

    Comment: To appear in IEEE ICRA 2023. Code and data availability: https://github.com/longbai1006/Surgical-VQLA
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Machine Learning ; Computer Science - Robotics
    Subject code 004
    Publishing date 2023-05-19
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
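
    The GIoU loss mentioned above is a standard detection loss; a compact PyTorch version for boxes given as (x1, y1, x2, y2) is sketched below. How it is weighted against the question-answering loss in Surgical-VQLA is not shown here.

```python
import torch

def giou_loss(pred, target, eps=1e-7):
    """Generalized IoU loss for boxes of shape (N, 4) in (x1, y1, x2, y2) format."""
    # Intersection area.
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / (union + eps)
    # Smallest enclosing box penalizes predictions far from the target.
    ex1 = torch.min(pred[:, 0], target[:, 0])
    ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2])
    ey2 = torch.max(pred[:, 3], target[:, 3])
    enclose = (ex2 - ex1) * (ey2 - ey1)
    giou = iou - (enclose - union) / (enclose + eps)
    return (1.0 - giou).mean()

# Example with two predicted/ground-truth box pairs.
loss = giou_loss(torch.tensor([[0., 0., 10., 10.], [5., 5., 15., 15.]]),
                 torch.tensor([[1., 1., 9., 11.], [6., 4., 14., 16.]]))
```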


  8. Book ; Online: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery

    Bai, Long / Islam, Mobarakol / Ren, Hongliang

    2023  

    Abstract Medical students and junior surgeons often rely on senior surgeons and specialists to answer their questions when learning surgery. However, experts are often busy with clinical and academic work and have little time to give guidance. Meanwhile, existing deep learning (DL)-based surgical Visual Question Answering (VQA) systems can only provide simple answers without the locations of those answers. In addition, vision-language (ViL) embedding is still under-explored in these kinds of tasks. Therefore, a surgical Visual Question Localized-Answering (VQLA) system would help medical students and junior surgeons learn and understand from recorded surgical videos. We propose an end-to-end Transformer with Co-Attention gaTed Vision-Language (CAT-ViL) embedding for VQLA in surgical scenarios, which does not require feature extraction through detection models. The CAT-ViL embedding module is designed to fuse heterogeneous features from visual and textual sources. The fused embedding feeds a standard Data-Efficient Image Transformer (DeiT) module, before the parallel classifier and detector for joint prediction. We conduct experimental validation on public surgical videos from the MICCAI EndoVis Challenges 2017 and 2018. The experimental results highlight the superior performance and robustness of our proposed model compared to state-of-the-art approaches. Ablation studies further prove the outstanding performance of all the proposed components. The proposed method provides a promising solution for surgical scene understanding and opens up a primary step toward Artificial Intelligence (AI)-based VQLA systems for surgical training. Our code is publicly available.

    Comment: To appear in MICCAI 2023. Code availability: https://github.com/longbai1006/CAT-ViL
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence ; Computer Science - Robotics
    Subject code 004
    Publishing date 2023-07-11
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
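
    A minimal sketch of gated vision-language fusion in the spirit of the embedding modules above (GVLE in the previous record, CAT-ViL here), assuming PyTorch; the gate form, residual projection, and dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse vision and text features through a learned sigmoid gate."""

    def __init__(self, dim=768):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, vision_feat, text_feat):
        joint = torch.cat([vision_feat, text_feat], dim=-1)
        g = self.gate(joint)                     # per-feature weight: vision vs. text
        fused = g * vision_feat + (1 - g) * text_feat
        return fused + self.proj(joint)          # residual projection of the joint features

# Usage: pooled features from an image encoder and a question encoder.
fused = GatedFusion()(torch.randn(4, 768), torch.randn(4, 768))
```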


  9. Book ; Online: Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis

    Wang, An / Islam, Mobarakol / Xu, Mengya / Ren, Hongliang

    2023  

    Abstract Despite their impressive performance in various surgical scene understanding tasks, deep learning-based methods are frequently hindered from deployment in real-world surgical applications for various reasons. In particular, data collection, annotation, and domain shift between sites and patients are the most common obstacles. In this work, we mitigate data-related issues by efficiently leveraging minimal source images to generate synthetic surgical instrument segmentation datasets and achieve outstanding generalization performance on unseen real domains. Specifically, in our framework, only one background tissue image and at most three images of each foreground instrument are taken as the seed images. These source images are extensively transformed and used to build up the foreground and background image pools, from which randomly sampled tissue and instrument images are composed with multiple blending techniques to generate new surgical scene images. Besides, we introduce hybrid training-time augmentations to further diversify the training data. Extensive evaluation on three real-world datasets, i.e., Endo2017, Endo2018, and RoboTool, demonstrates that our one-to-many synthetic surgical instrument dataset generation and segmentation framework achieves encouraging performance compared with training on real data. Notably, on the RoboTool dataset, where a more significant domain gap exists, our framework shows superior generalization by a considerable margin. We expect that our results will attract research attention to improving model generalization with data synthesis.

    Comment: First two authors contributed equally. Accepted by IROS2023
    Keywords Electrical Engineering and Systems Science - Image and Video Processing ; Computer Science - Computer Vision and Pattern Recognition
    Subject code 006 ; 004
    Publishing date 2023-06-28
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
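
    The foreground/background composition step described above can be sketched with simple alpha blending (NumPy); the paper uses several blending techniques and extensive transformations, so this single soft-mask paste is only the simplest case, with illustrative array shapes.

```python
import numpy as np

def compose_scene(background, instrument, mask, alpha=0.9):
    """Paste an instrument foreground onto a tissue background with a soft mask.

    background, instrument: HxWx3 float arrays in [0, 1];
    mask: HxW float array in [0, 1] marking instrument pixels.
    """
    m = (alpha * mask)[..., None]                # soften the seam between regions
    return m * instrument + (1.0 - m) * background

# Example with random stand-in images and a circular instrument mask.
h, w = 256, 256
bg = np.random.rand(h, w, 3)
fg = np.random.rand(h, w, 3)
yy, xx = np.mgrid[0:h, 0:w]
mask = ((yy - 128) ** 2 + (xx - 128) ** 2 < 40 ** 2).astype(float)
scene = compose_scene(bg, fg, mask)
```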


  10. Article ; Online: Glioblastoma multiforme prognosis: MRI missing modality generation, segmentation and radiogenomic survival prediction.

    Islam, Mobarakol / Wijethilake, Navodini / Ren, Hongliang

    Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society

    2021  Volume 91, Page(s) 101906

    Abstract The accurate prognosis of glioblastoma multiforme (GBM) plays an essential role in planning correlated surgeries and treatments. Conventional models of survival prediction rely on radiomic features from magnetic resonance imaging (MRI). In this paper, we propose a radiogenomic overall survival (OS) prediction approach that incorporates gene expression data with radiomic features such as shape, geometry, and clinical information. We exploit the TCGA (The Cancer Genome Atlas) dataset and synthesize the missing MRI modalities using a fully convolutional network (FCN) within a conditional generative adversarial network (cGAN). Meanwhile, the same FCN architecture enables tumor segmentation from the available and the synthesized MRI modalities. The proposed FCN architecture comprises octave convolution (OctConv) and a novel decoder with a skip-connected spatial and channel squeeze-and-excitation (skip-scSE) block. OctConv can process low- and high-frequency features individually and improve model efficiency by reducing channel-wise redundancy. Skip-scSE applies spatial and channel-wise excitation to emphasize essential features and uses skip connections to reduce sparsity in the learned parameters of deeper layers. The proposed approaches are evaluated in comparative experiments with state-of-the-art models in synthesis, segmentation, and overall survival (OS) prediction. We observe that adding the missing MRI modality improves segmentation, that expression levels of gene markers contribute strongly to GBM prognosis prediction, and that fused radiogenomic features boost OS estimation.
    MeSH term(s) Glioblastoma/diagnostic imaging ; Glioblastoma/genetics ; Humans ; Image Processing, Computer-Assisted ; Magnetic Resonance Imaging ; Prognosis
    Language English
    Publishing date 2021-05-05
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 639451-6
    ISSN (online) 1879-0771
    ISSN 0895-6111
    DOI 10.1016/j.compmedimag.2021.101906
    Database MEDical Literature Analysis and Retrieval System OnLINE
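
    The decoder above extends concurrent spatial and channel squeeze-and-excitation (scSE) with skip connections; the sketch below shows only the standard scSE building block (PyTorch), with the skip-scSE modification left out.

```python
import torch
import torch.nn as nn

class SCSEBlock(nn.Module):
    """Concurrent spatial and channel squeeze-and-excitation."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        # Channel excitation: squeeze spatially, re-weight channels.
        self.cse = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial excitation: squeeze channels, re-weight pixels.
        self.sse = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x):
        return x * self.cse(x) + x * self.sse(x)

# Usage on a decoder feature map.
out = SCSEBlock(64)(torch.randn(1, 64, 32, 32))
```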

