LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 84

  1. Article ; Online: In-Domain GAN Inversion for Faithful Reconstruction and Editability.

    Zhu, Jiapeng / Shen, Yujun / Xu, Yinghao / Zhao, Deli / Chen, Qifeng / Zhou, Bolei

    IEEE transactions on pattern analysis and machine intelligence

    2024  Volume 46, Issue 5, Page(s) 2607–2621

    Abstract Generative Adversarial Networks (GANs) have significantly advanced image synthesis by mapping randomly sampled latent codes to high-fidelity synthesized images. However, applying well-trained GANs to real image editing remains challenging. A common solution is to find an approximate latent code that can adequately recover the input image to be edited, a task also known as GAN inversion. To invert a GAN model, prior works typically focus on reconstructing the target image at the pixel level, yet few studies examine whether the inverted result can support manipulation at the semantic level. This work fills this gap by proposing in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer, to regularize the inverted code in the native latent space of the pre-trained GAN model. In this way, we manage to sufficiently reuse the knowledge learned by GANs for image reconstruction, facilitating a wide range of editing applications without any retraining. We further provide comprehensive analyses of the effects of the encoder structure, the starting inversion point, and the inversion parameter space, and observe a trade-off between reconstruction quality and editability. This trade-off sheds light on how a GAN model represents an image with various semantics encoded in the learned latent distribution.
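
    A minimal sketch of the two-stage inversion this abstract describes, assuming hypothetical PyTorch callables G (the pre-trained generator) and E (the domain-guided encoder); the step count and loss weight are illustrative, not the paper's settings.

        import torch

        def invert(G, E, x, steps=200, lr=0.01, lam=2.0):
            # Start from the domain-guided encoder's code, then optimize it so
            # that G(z) matches x while E pulls z back toward the native domain.
            z = E(x).detach().clone().requires_grad_(True)
            opt = torch.optim.Adam([z], lr=lr)
            for _ in range(steps):
                x_rec = G(z)
                loss_rec = torch.nn.functional.mse_loss(x_rec, x)     # pixel level
                loss_dom = torch.nn.functional.mse_loss(E(x_rec), z)  # in-domain term
                loss = loss_rec + lam * loss_dom
                opt.zero_grad()
                loss.backward()
                opt.step()
            return z.detach()
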
    Language English
    Publishing date 2024-04-03
    Publishing country United States
    Document type Journal Article
    ISSN (online) 1939-3539
    DOI 10.1109/TPAMI.2023.3310872
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article ; Online: Development and Validation of Esophageal Squamous Cell Carcinoma Risk Prediction Models Based on an Endoscopic Screening Program.

    Han, Junming / Guo, Xiaolei / Zhao, Li / Zhang, Huan / Ma, Siqi / Li, Yan / Zhao, Deli / Wang, Jialin / Xue, Fuzhong

    JAMA network open

    2023  Volume 6, Issue 1, Page(s) e2253148

    Abstract Importance: Assessment tools are lacking for screening of esophageal squamous cell cancer (ESCC) in China, especially for the follow-up stage. Risk prediction to optimize the screening procedure is urgently needed.
    Objective: To develop and validate ESCC prediction models for identifying people at high risk for follow-up decision-making.
    Design, setting, and participants: This open, prospective multicenter diagnostic study has been performed since September 1, 2006, in Shandong Province, China. This study used baseline and follow-up data until December 31, 2021. The data were analyzed between April 6 and May 31, 2022. Eligibility criteria consisted of rural residents aged 40 to 69 years who had no contraindications for endoscopy. Among 161 212 eligible participants, those diagnosed with cancer or who had cancer at baseline, did not complete the questionnaire, were younger than 40 years or older than 69 years, or were found to have severe dysplasia or worse lesions were excluded from the analysis.
    Exposures: Risk factors obtained by questionnaire and endoscopy.
    Main outcomes and measures: Pathological diagnosis of ESCC and confirmation by cancer registry data.
    Results: In this diagnostic study of 104 129 participants (56.39% women; mean [SD] age, 54.31 [7.64] years), 59 481 (mean [SD] age, 53.83 [7.64] years; 58.55% women) formed the derivation set while 44 648 (mean [SD] age, 54.95 [7.60] years; 53.51% women) formed the validation set. A total of 252 new cases of ESCC were diagnosed during 424 903.50 person-years of follow-up in the derivation cohort and 61 new cases from 177 094.10 person-years follow-up in the validation cohort. Model A included the covariates age, sex, and number of lesions; model B included age, sex, smoking status, alcohol use status, body mass index, annual household income, history of gastrointestinal tract diseases, consumption of pickled food, number of lesions, distinct lesions, and mild or moderate dysplasia. The Harrell C statistic of model A was 0.80 (95% CI, 0.77-0.83) in the derivation set and 0.90 (95% CI, 0.87-0.93) in the validation set; the Harrell C statistic of model B was 0.83 (95% CI, 0.81-0.86) and 0.91 (95% CI, 0.88-0.95), respectively. The models also had good calibration performance and clinical usefulness.
    Conclusions and relevance: The findings of this diagnostic study suggest that the models developed are suitable for selecting high-risk populations for follow-up decision-making and optimizing the cancer screening process.
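
    The reported Harrell C statistics point to a time-to-event model; the sketch below therefore fits a Cox proportional hazards model with the lifelines library, using its bundled demo dataset since the screening cohort data are not public. The model class and column names are assumptions.

        from lifelines import CoxPHFitter
        from lifelines.datasets import load_rossi

        df = load_rossi()              # stand-in data: duration "week", event "arrest"
        cph = CoxPHFitter()
        cph.fit(df, duration_col="week", event_col="arrest")
        print(cph.concordance_index_)  # Harrell C statistic on the fitted data
        # For the study's models, the covariates would instead be age, sex, and
        # number of lesions (model A), plus lifestyle and endoscopic findings (model B).
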
    MeSH term(s) Humans ; Female ; Middle Aged ; Male ; Esophageal Squamous Cell Carcinoma/diagnosis ; Esophageal Squamous Cell Carcinoma/epidemiology ; Esophageal Neoplasms/diagnosis ; Esophageal Neoplasms/epidemiology ; Esophageal Neoplasms/pathology ; Prospective Studies ; Risk Factors ; Endoscopy, Gastrointestinal
    Language English
    Publishing date 2023-01-03
    Publishing country United States
    Document type Multicenter Study ; Journal Article ; Research Support, Non-U.S. Gov't
    ISSN (online) 2574-3805
    DOI 10.1001/jamanetworkopen.2022.53148
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Article: Purine salvage-associated metabolites as biomarkers for early diagnosis of esophageal squamous cell carcinoma: a diagnostic model-based study.

    Sun, Yawen / Liu, Wenjuan / Su, Mu / Zhang, Tao / Li, Xia / Liu, Wenbin / Cai, Yuping / Zhao, Deli / Yang, Ming / Zhu, Zhengjiang / Wang, Jialin / Yu, Jinming

    Cell death discovery

    2024  Volume 10, Issue 1, Page(s) 139

    Abstract Esophageal squamous cell carcinoma (ESCC) remains an important health concern in developing countries. Patients with advanced ESCC have a poor prognosis and survival rate, and achieving early diagnosis remains a challenge. Metabolic biomarkers are gradually gaining attention as early diagnostic biomarkers. Hence, this multicenter study comprehensively evaluated metabolic dysregulation in ESCC through an integrated research strategy to identify key metabolite biomarkers of ESCC. First, the metabolic profiles were examined in tissue and serum samples from the discovery cohort (n = 162; ESCC patients, n = 81; healthy volunteers, n = 81), and ESCC tissue-induced metabolite alterations were observed in the serum. Afterward, RNA sequencing of tissue samples (n = 46) was performed, followed by an integrated analysis of metabolomics and transcriptomics. Potential biomarkers for ESCC were then identified by screening gene-metabolite regulatory networks. The diagnostic value of the identified biomarkers was validated in a validation cohort (n = 220), and their biological function was verified. A total of 457 dysregulated metabolites were identified in the serum, of which 36 were induced by tumor tissues. The integrated analyses revealed significant alterations in the purine salvage pathway, wherein the abundance of hypoxanthine/xanthine exhibited a positive correlation with HPRT1 expression and tumor size. A diagnostic model was developed using two purine salvage-associated metabolites. This model could accurately discriminate patients with ESCC from normal individuals, with an area under the curve (AUC) of 0.765 (95% confidence interval (CI): 0.680-0.843) in the external cohort. Hypoxanthine and HPRT1 exerted a synergistic effect in terms of promoting ESCC progression. These findings are anticipated to provide valuable support in developing novel diagnostic approaches for early ESCC and to enhance our comprehension of the metabolic mechanisms underlying this disease.
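
    As a rough illustration of how a two-metabolite diagnostic model of this kind is typically built and scored, here is a logistic-regression sketch on synthetic data; the model class and feature values are assumptions, and only the AUC-based evaluation mirrors the paper's reporting.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 2))                       # mock metabolite abundances
        y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(int)

        clf = LogisticRegression().fit(X, y)
        auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])  # paper: AUC 0.765, external cohort
        print(f"AUC = {auc:.3f}")
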
    Language English
    Publishing date 2024-03-14
    Publishing country United States
    Document type Journal Article
    ISSN 2058-7716
    DOI 10.1038/s41420-024-01896-6
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Book ; Online: Region-Based Semantic Factorization in GANs

    Zhu, Jiapeng / Shen, Yujun / Xu, Yinghao / Zhao, Deli / Chen, Qifeng

    2022  

    Abstract Despite the rapid advancement of semantic discovery in the latent space of Generative Adversarial Networks (GANs), existing approaches either are limited to finding global attributes or rely on a number of segmentation masks to identify local attributes. In this work, we present a highly efficient algorithm to factorize the latent semantics learned by GANs with respect to an arbitrary image region. Concretely, we revisit the task of local manipulation with pre-trained GANs and formulate region-based semantic discovery as a dual optimization problem. Through an appropriately defined generalized Rayleigh quotient, we manage to solve such a problem without any annotations or training. Experimental results on various state-of-the-art GAN models demonstrate the effectiveness of our approach, as well as its superiority over prior art in terms of precise control, region robustness, implementation speed, and simplicity of use.
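
    The closed-form tool named here, a generalized Rayleigh quotient, reduces to a generalized eigenproblem. A minimal sketch with random stand-in matrices follows; in the paper, A and B would be derived from the generator for the chosen region, which is omitted here.

        import numpy as np
        from scipy.linalg import eigh

        rng = np.random.default_rng(0)
        M = rng.normal(size=(16, 16))
        A = M @ M.T                    # symmetric (region-of-interest term)
        N = rng.normal(size=(16, 16))
        B = N @ N.T + 16 * np.eye(16)  # symmetric positive definite (complement term)

        w, V = eigh(A, B)              # solves A v = w B v; eigenvalues w ascending
        direction = V[:, -1]           # top eigenvector maximizes a^T A a / a^T B a
        print(w[-1])
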
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 006
    Publishing date 2022-02-19
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Book ; Online: UKnow

    Gong, Biao / Xie, Xiaoying / Feng, Yutong / Lv, Yiliang / Shen, Yujun / Zhao, Deli

    A Unified Knowledge Protocol for Common-Sense Reasoning and Vision-Language Pre-training

    2023  

    Abstract This work presents a unified knowledge protocol, called UKnow, which facilitates knowledge-based studies from the perspective of data. Focusing on the visual and linguistic modalities in particular, we categorize data knowledge into five unit types, namely, in-image, in-text, cross-image, cross-text, and image-text, and set up an efficient pipeline to help construct a multimodal knowledge graph from any data collection. Thanks to the logical information naturally contained in a knowledge graph, organizing datasets in the UKnow format opens up more possibilities for data usage than the commonly used image-text pairs. Following the UKnow protocol, we collect, from public international news, a large-scale multimodal knowledge graph dataset that consists of 1,388,568 nodes (571,791 of them vision-related) and 3,673,817 triplets. The dataset is also annotated with rich event tags, including 11 coarse labels and 9,185 fine labels. Experiments on four benchmarks demonstrate the potential of UKnow in supporting common-sense reasoning and boosting vision-language pre-training with a single dataset, benefiting from its unified form of knowledge organization. Code, dataset, and models will be made publicly available.
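
    A toy sketch of how triplets under the five unit types might be represented; the field names and example content are invented for illustration and are not taken from the released protocol.

        from dataclasses import dataclass

        UNIT_TYPES = {"in-image", "in-text", "cross-image", "cross-text", "image-text"}

        @dataclass(frozen=True)
        class Triplet:
            head: str      # source node id
            relation: str
            tail: str      # target node id
            unit: str      # which of the five knowledge unit types this edge encodes

            def __post_init__(self):
                assert self.unit in UNIT_TYPES, f"unknown unit type: {self.unit}"

        kg = [
            Triplet("img:001", "depicts", "entity:flood", "in-image"),
            Triplet("txt:news42", "mentions", "entity:flood", "in-text"),
            Triplet("img:001", "illustrates", "txt:news42", "image-text"),
        ]
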
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2023-02-14
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Book ; Online: Composer

    Huang, Lianghua / Chen, Di / Liu, Yu / Shen, Yujun / Zhao, Deli / Zhou, Jingren

    Creative and Controllable Image Synthesis with Composable Conditions

    2023  

    Abstract Recent large-scale generative models trained on big data are capable of synthesizing incredible images yet suffer from limited controllability. This work offers a new generation paradigm that allows flexible control of the output image, such as its spatial layout and palette, while maintaining synthesis quality and model creativity. With compositionality as the core idea, we first decompose an image into representative factors, and then train a diffusion model with all these factors as the conditions to recompose the input. At the inference stage, the rich intermediate representations work as composable elements, leading to a huge design space (i.e., exponentially proportional to the number of decomposed factors) for customizable content creation. Notably, our approach, which we call Composer, supports various levels of conditions, such as a text description as the global information, a depth map and sketch as the local guidance, a color histogram for low-level details, etc. Besides improving controllability, we confirm that Composer serves as a general framework and facilitates a wide range of classical generative tasks without retraining. Code and models will be made available.
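
    A shape-level sketch of the composable conditioning described above: a dictionary of decomposed factors, any subset of which can be supplied at inference. The tensor shapes and the diffusion_step callable are hypothetical placeholders.

        import torch

        conditions = {
            "caption": "a red barn in the snow",  # global: text description
            "depth": torch.zeros(1, 1, 64, 64),   # local: depth-map guidance
            "sketch": torch.zeros(1, 1, 64, 64),  # local: sketch guidance
            "palette": torch.rand(1, 24),         # low-level: color histogram
        }

        def recompose(diffusion_step, x_t, t, conditions, drop=()):
            # One denoising step conditioned on any subset of the factors; dropping
            # or swapping entries is what opens up the combinatorial design space.
            active = {k: v for k, v in conditions.items() if k not in drop}
            return diffusion_step(x_t, t, active)
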

    Comment: Project page: https://damo-vilab.github.io/composer-page/
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Graphics
    Subject code 004
    Publishing date 2023-02-20
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  7. Book ; Online: ViM

    Feng, Yutong / Gong, Biao / Jiang, Jianwen / Lv, Yiliang / Shen, Yujun / Zhao, Deli / Zhou, Jingren

    Vision Middleware for Unified Downstream Transferring

    2023  

    Abstract Foundation models are pre-trained on massive data and transferred to downstream tasks via fine-tuning. This work presents Vision Middleware (ViM), a new learning paradigm that targets unified transferring from a single foundation model to a variety of downstream tasks. ViM consists of a zoo of lightweight plug-in modules, each of which is independently learned on a midstream dataset with a shared frozen backbone. Downstream tasks can then benefit from an adequate aggregation of the module zoo thanks to the rich knowledge inherited from midstream tasks. There are three major advantages of such a design. From the efficiency aspect, the upstream backbone can be trained only once and reused for all downstream tasks without tuning. From the scalability aspect, we can easily append additional modules to ViM with no influence on existing modules. From the performance aspect, ViM can include as many midstream tasks as possible, narrowing the task gap between upstream and downstream. Considering these benefits, we believe that ViM, which the community could maintain and develop together, would serve as a powerful tool to assist foundation models.
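
    A minimal sketch of the frozen-backbone-plus-module-zoo design the abstract outlines; aggregation by fixed per-task weights is one plausible reading, and all names here are invented.

        import torch.nn as nn

        class ViMSketch(nn.Module):
            def __init__(self, backbone, plugins, weights):
                super().__init__()
                self.backbone = backbone              # shared foundation model
                for p in self.backbone.parameters():  # frozen: trained only once
                    p.requires_grad_(False)
                self.zoo = nn.ModuleList(plugins)     # lightweight midstream modules
                self.weights = weights                # per-task aggregation weights

            def forward(self, x):
                feat = self.backbone(x)
                return sum(w * m(feat) for w, m in zip(self.weights, self.zoo))
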
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2023-03-13
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  8. Book ; Online: CLIP-guided Prototype Modulating for Few-shot Action Recognition

    Wang, Xiang / Zhang, Shiwei / Cen, Jun / Gao, Changxin / Zhang, Yingya / Zhao, Deli / Sang, Nong

    2023  

    Abstract Learning from large-scale contrastive language-image pre-training like CLIP has shown remarkable success in a wide range of downstream tasks recently, but it is still under-explored on the challenging few-shot action recognition (FSAR) task. In this work, we aim to transfer the powerful multimodal knowledge of CLIP to alleviate the inaccurate prototype estimation issue due to data scarcity, which is a critical problem in low-shot regimes. To this end, we present a CLIP-guided prototype modulating framework called CLIP-FSAR, which consists of two key components: a video-text contrastive objective and a prototype modulation. Specifically, the former bridges the task discrepancy between CLIP and the few-shot video task by contrasting videos and corresponding class text descriptions. The latter leverages the transferable textual concepts from CLIP to adaptively refine visual prototypes with a temporal Transformer. By this means, CLIP-FSAR can take full advantage of the rich semantic priors in CLIP to obtain reliable prototypes and achieve accurate few-shot classification. Extensive experiments on five commonly used benchmarks demonstrate the effectiveness of our proposed method, and CLIP-FSAR significantly outperforms existing state-of-the-art methods under various settings. The source code and models will be publicly available at https://github.com/alibaba-mmai-research/CLIP-FSAR.
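
    A sketch of the video-text contrastive objective described above, assuming precomputed CLIP video embeddings and class-text embeddings; the temperature value is illustrative.

        import torch.nn.functional as F

        def video_text_contrastive(video_emb, text_emb, labels, tau=0.07):
            # video_emb: (B, d) few-shot video features; text_emb: (C, d) embeddings
            # of the class text descriptions; labels: (B,) ground-truth class indices.
            v = F.normalize(video_emb, dim=-1)
            t = F.normalize(text_emb, dim=-1)
            logits = v @ t.t() / tau  # (B, C) temperature-scaled cosine similarities
            return F.cross_entropy(logits, labels)
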

    Comment: This work has been submitted to the Springer for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 006
    Publishing date 2023-03-06
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  9. Book ; Online: Rethinking Efficient Tuning Methods from a Unified Perspective

    Jiang, Zeyinzi / Mao, Chaojie / Huang, Ziyuan / Lv, Yiliang / Zhao, Deli / Zhou, Jingren

    2023  

    Abstract Parameter-efficient transfer learning (PETL) based on large-scale pre-trained foundation models has achieved great success in various downstream applications. Existing tuning methods, such as prompt, prefix, and adapter, perform task-specific lightweight adjustments to different parts of the original architecture. However, they take effect on only some parts of the pre-trained models, i.e., only the feed-forward layers or the self-attention layers, which leaves the remaining frozen structures unable to adapt to the data distributions of downstream tasks. Further, the existing structures are strongly coupled with the Transformers, hindering parameter-efficient deployment as well as the design flexibility for new approaches. In this paper, we revisit the design paradigm of PETL and derive a unified framework, U-Tuning, for parameter-efficient transfer learning, which is composed of an operation with frozen parameters and a unified tuner that adapts the operation for downstream applications. The U-Tuning framework can simultaneously encompass existing methods and derive new approaches for parameter-efficient transfer learning, which achieve on-par or better performance on the CIFAR-100 and FGVC datasets compared with existing PETL methods.
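
    The core decomposition, a frozen operation plus a trainable unified tuner applied in parallel, can be sketched as follows; the bottleneck tuner is one plausible instantiation, not necessarily the paper's.

        import torch.nn as nn

        class UTuned(nn.Module):
            def __init__(self, op, dim, bottleneck=16):
                super().__init__()
                self.op = op                   # pre-trained operation, frozen
                for p in self.op.parameters():
                    p.requires_grad_(False)
                self.tuner = nn.Sequential(    # the only trainable parameters
                    nn.Linear(dim, bottleneck), nn.GELU(), nn.Linear(bottleneck, dim),
                )

            def forward(self, x):
                return self.op(x) + self.tuner(x)  # y = OP(x) + U-Tuner(x)
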
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2023-03-01
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  10. Book ; Online: Scanning Only Once

    Pan, Yulin / He, Xiangteng / Gong, Biao / Lv, Yiliang / Shen, Yujun / Peng, Yuxin / Zhao, Deli

    An End-to-end Framework for Fast Temporal Grounding in Long Videos

    2023  

    Abstract Video temporal grounding aims to pinpoint a video segment that matches the query description. Despite the recent advance in short-form videos (e.g., in minutes), temporal grounding in long videos (e.g., in hours) is still at its early stage. To address this challenge, a common practice is to employ a sliding window, yet this can be inefficient and inflexible due to the limited number of frames within the window. In this work, we propose an end-to-end framework for fast temporal grounding, which is able to model an hours-long video with one-time network execution. Our pipeline is formulated in a coarse-to-fine manner, where we first extract context knowledge from non-overlapped video clips (i.e., anchors), and then supplement the anchors that respond strongly to the query with detailed content knowledge. Besides the remarkably high pipeline efficiency, another advantage of our approach is the capability of capturing long-range temporal correlation, thanks to modeling the entire video as a whole, which facilitates more accurate grounding. Experimental results suggest that, on the long-form video datasets MAD and Ego4d, our method significantly outperforms the state of the art, achieving 14.6x / 102.8x higher efficiency respectively. The code will be released at https://github.com/afcedf/SOONet.git
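
    A coarse-to-fine sketch of the one-pass anchor scheme described: score non-overlapping anchors once against the query, then compute fine-grained scores only for the top candidates. The dot-product scoring and all shapes are simplifying assumptions.

        import torch

        def ground(video_feats, query_feat, anchor_len=128, k=4):
            # video_feats: (T, d) frame features from a single scan of the video;
            # query_feat: (d,) query embedding.
            anchors = video_feats.split(anchor_len)          # non-overlapping clips
            coarse = torch.stack([a.mean(0) @ query_feat for a in anchors])
            top = coarse.topk(min(k, len(anchors))).indices  # anchors worth refining
            fine = {int(i): anchors[int(i)] @ query_feat for i in top}  # per-frame scores
            return top, fine
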

    Comment: 12 pages, 8 figures
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004 ; 006
    Publishing date 2023-03-14
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
