LIVIVO - The Search Portal for Life Sciences

Search results

Results 1–10 of 25

  1. Book ; Online: Full-Spectrum Out-of-Distribution Detection

    Yang, Jingkang / Zhou, Kaiyang / Liu, Ziwei

    2022  

    Abstract Existing out-of-distribution (OOD) detection literature clearly defines semantic shift as a sign of OOD but has no consensus over covariate shift. Samples experiencing covariate shift but not semantic shift are either excluded from the test set or treated as OOD, which contradicts the primary goal of machine learning -- being able to generalize beyond the training distribution. In this paper, we take both shift types into account and introduce full-spectrum OOD (FS-OOD) detection, a more realistic problem setting that considers both detecting semantic shift and tolerating covariate shift, and we design three benchmarks accordingly. These new benchmarks have a more fine-grained categorization of distributions (i.e., training ID, covariate-shifted ID, near-OOD, and far-OOD) for the purpose of more comprehensively evaluating the pros and cons of algorithms. To address the FS-OOD detection problem, we propose SEM, a simple feature-based semantics score function. SEM is mainly composed of two probability measures: one is based on high-level features containing both semantic and non-semantic information, while the other is based on low-level feature statistics capturing only non-semantic image styles. With a simple combination, the non-semantic part is cancelled out, which leaves only semantic information in SEM that can better handle FS-OOD detection. Extensive experiments on the three new benchmarks show that SEM significantly outperforms current state-of-the-art methods. Our code and benchmarks are released at https://github.com/Jingkang50/OpenOOD.
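
    The score combination described in the abstract can be pictured with a minimal sketch, assuming two pre-computed log-densities; the names below are illustrative, not the authors' released code:

        import numpy as np

        def sem_score(log_p_high: np.ndarray, log_p_low: np.ndarray) -> np.ndarray:
            # The high-level density reflects semantics plus style; the
            # low-level density reflects style only. Subtracting cancels
            # the shared style term, leaving a mostly semantic score.
            return log_p_high - log_p_low

        # Rank test samples by the score: low values flag semantic (true) OOD,
        # while covariate-shifted ID samples keep comparatively high scores.
        scores = sem_score(np.array([-3.2, -9.5]), np.array([-2.9, -3.1]))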

    Comment: Code and benchmarks are integrated in OpenOOD: https://github.com/Jingkang50/OpenOOD, a unified codebase for OOD detection
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2022-04-11
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  2. Article ; Online: The PSG challenge: towards comprehensive scene understanding.

    Yang, Jingkang / Ma, Zheng / Wang, Qixun / Guo, Xiaofeng / Wang, Haofan / Liu, Ziwei / Zhang, Wayne / Xu, Xing / Zhang, Hai

    National Science Review

    2023  Volume 10, Issue 6, Page(s) nwad126

    Abstract The Panoptic Scene Graph Generation (PSG) challenge evaluates the ability of computer vision models to identify relations in images beyond object classification and localization, enabling a deeper understanding of scenes for real-world AI applications.
    Language English
    Publishing date 2023-05-04
    Publishing country China
    Document type Journal Article
    ZDB-ID 2745465-4
    ISSN (online) 2053-714X
    DOI 10.1093/nsr/nwad126
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Book ; Online: Conditional Prompt Learning for Vision-Language Models

    Zhou, Kaiyang / Yang, Jingkang / Loy, Chen Change / Liu, Ziwei

    2022  

    Abstract With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets. A recently proposed method named Context Optimization (CoOp) introduces the concept of prompt learning -- a recent trend in NLP -- to the vision domain for adapting pre-trained vision-language models. Specifically, CoOp turns context words in a prompt into a set of learnable vectors and, with only a few labeled images for learning, can achieve huge improvements over intensively-tuned manual prompts. In our study we identify a critical problem of CoOp: the learned context is not generalizable to wider unseen classes within the same dataset, suggesting that CoOp overfits base classes observed during training. To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector). Compared to CoOp's static prompts, our dynamic prompts adapt to each instance and are thus less sensitive to class shift. Extensive experiments show that CoCoOp generalizes much better than CoOp to unseen classes, even showing promising transferability beyond a single dataset; and yields stronger domain generalization performance as well. Code is available at https://github.com/KaiyangZhou/CoOp.
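
    The instance-conditional mechanism lends itself to a short sketch. A minimal rendering with illustrative dimensions (the released code at the URL above is authoritative):

        import torch
        import torch.nn as nn

        class CoCoOpPrompt(nn.Module):
            """Learnable context vectors plus a lightweight meta-network
            producing one input-conditional bias vector per image."""
            def __init__(self, n_ctx: int = 4, ctx_dim: int = 512, feat_dim: int = 512):
                super().__init__()
                self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim) * 0.02)
                self.meta_net = nn.Sequential(
                    nn.Linear(feat_dim, feat_dim // 16),
                    nn.ReLU(),
                    nn.Linear(feat_dim // 16, ctx_dim),
                )

            def forward(self, image_features: torch.Tensor) -> torch.Tensor:
                bias = self.meta_net(image_features)              # (B, ctx_dim)
                # Shift every static context token by the instance bias,
                # making the prompt dynamic rather than fixed per dataset.
                return self.ctx.unsqueeze(0) + bias.unsqueeze(1)  # (B, n_ctx, ctx_dim)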

    Comment: CVPR 2022. TL;DR: We propose a conditional prompt learning approach to solve the generalizability issue of static prompts
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Machine Learning
    Subject code 006 ; 004
    Publishing date 2022-03-10
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  4. Book ; Online: OtterHD

    Li, Bo / Zhang, Peiyuan / Yang, Jingkang / Zhang, Yuanhan / Pu, Fanyi / Liu, Ziwei

    A High-Resolution Multi-modality Model

    2023  

    Abstract In this paper, we present OtterHD-8B, an innovative multimodal model evolved from Fuyu-8B, specifically engineered to interpret high-resolution visual inputs with granular precision. Unlike conventional models that are constrained by fixed-size vision encoders, OtterHD-8B boasts the ability to handle flexible input dimensions, ensuring its versatility across various inference requirements. Alongside this model, we introduce MagnifierBench, an evaluation framework designed to scrutinize models' ability to discern minute details and spatial relationships of small objects. Our comparative analysis reveals that while current leading models falter on this benchmark, OtterHD-8B, particularly when directly processing high-resolution inputs, outperforms its counterparts by a substantial margin. The findings illuminate the structural variances in visual information processing among different models and the influence that the vision encoders' pre-training resolution disparities have on model effectiveness within such benchmarks. Our study highlights the critical role of flexibility and high-resolution input capabilities in large multimodal models and also exemplifies the potential inherent in the Fuyu architecture's simplicity for handling complex visual data.
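
    The claim about flexible input dimensions can be made concrete with a sketch of Fuyu-style patch handling, in which an image of any resolution becomes a variable-length token sequence rather than passing through a fixed-size vision encoder; the patch size and cropping policy here are assumptions for illustration:

        import torch

        def patchify(image: torch.Tensor, patch: int = 30) -> torch.Tensor:
            # image: (C, H, W) at any resolution; crop to a patch multiple.
            c, h, w = image.shape
            image = image[:, : h - h % patch, : w - w % patch]
            # Cut into non-overlapping patches and flatten each into one
            # token, so sequence length grows with input resolution.
            patches = image.unfold(1, patch, patch).unfold(2, patch, patch)
            return patches.permute(1, 2, 0, 3, 4).reshape(-1, c * patch * patch)

        tokens = patchify(torch.rand(3, 1024, 768))  # (850, 2700) token sequence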

    Comment: Early Technical Report
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence
    Subject code 006
    Publishing date 2023-11-07
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Book ; Online: Generalized Out-of-Distribution Detection

    Yang, Jingkang / Zhou, Kaiyang / Li, Yixuan / Liu, Ziwei

    A Survey

    2021  

    Abstract Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over control to humans when it detects unusual scenes or objects that it has never seen during training and about which it cannot make a safe decision. The term 'OOD detection' first emerged in 2017 and has since received increasing attention from the research community, leading to a plethora of methods, ranging from classification-based to density-based to distance-based ones. Meanwhile, several other problems, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD), are closely related to OOD detection in terms of motivation and methodology. Despite common goals, these topics have developed in isolation, and their subtle differences in definition and problem setting often confuse readers and practitioners. In this survey, we first present a unified framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, these five problems can be seen as special cases or sub-tasks that become easier to distinguish. We then review each of the five areas by summarizing their recent technical developments, with a special focus on OOD detection methodologies. We conclude the survey with open challenges and potential research directions.

    Comment: Comments on our Overleaf manuscript are welcome: https://www.overleaf.com/read/hwphdzjfqxbs
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2021-10-21
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Book ; Online: Otter

    Li, Bo / Zhang, Yuanhan / Chen, Liangyu / Wang, Jinghao / Yang, Jingkang / Liu, Ziwei

    A Multi-Modal Model with In-Context Instruction Tuning

    2023  

    Abstract Large language models (LLMs) have demonstrated significant universal capabilities as few/zero-shot learners in various tasks due to their pre-training on vast amounts of text data, as exemplified by GPT-3, which evolved into InstructGPT and ChatGPT, models that effectively follow natural language instructions to accomplish real-world tasks. In this paper, we propose to introduce instruction tuning into multi-modal models, motivated by the Flamingo model's upstream interleaved-format pretraining dataset. We adopt a similar approach to construct our MultI-Modal In-Context Instruction Tuning (MIMIC-IT) dataset. We then introduce Otter, a multi-modal model based on OpenFlamingo (the open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following ability and in-context learning. We also optimize OpenFlamingo's implementation for researchers, reducing the required training resources from one A100 GPU to four RTX 3090 GPUs, and integrate both OpenFlamingo and Otter into Hugging Face Transformers so that more researchers can incorporate the models into their customized training and inference pipelines.
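
    The in-context instruction-tuning format can be pictured with a hypothetical sample; the field names below are assumptions for illustration, not the released MIMIC-IT schema:

        # One training sample: in-context exemplars precede the query, so
        # the model learns to follow instructions conditioned on examples.
        sample = {
            "context": [
                {"image": "ctx_0.jpg",
                 "instruction": "What is the man holding?",
                 "answer": "A red umbrella."},
            ],
            "query": {"image": "query.jpg",
                      "instruction": "What is the woman holding?"},
            "target": "A blue backpack.",
        }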

    Comment: Technical Report
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Computation and Language
    Publishing date 2023-05-05
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  7. Book ; Online: Pair then Relation

    Wang, Jinghao / Wen, Zhengyu / Li, Xiangtai / Guo, Zujin / Yang, Jingkang / Liu, Ziwei

    Pair-Net for Panoptic Scene Graph Generation

    2023  

    Abstract Panoptic Scene Graph (PSG) is a challenging task in Scene Graph Generation (SGG) that aims to create a more comprehensive scene graph representation using panoptic segmentation instead of boxes. However, current PSG methods have limited performance, which can hinder downstream task development. To improve PSG methods, we conducted an in-depth analysis to identify the bottleneck of current PSG models, finding that inter-object pair-wise recall is a crucial factor that previous PSG methods ignored. Based on this, we present a novel framework, Pair then Relation (Pair-Net), which uses a Pair Proposal Network (PPN) to learn and filter sparse pair-wise relationships between subjects and objects. We also observed the sparse nature of object pairs and used this insight to design a lightweight Matrix Learner within the PPN. Through extensive ablation and analysis, our approach significantly improves upon the strong segmenter baseline. Notably, our approach achieves new state-of-the-art results on the PSG benchmark, with over 10% absolute gains compared to PSGFormer. The code of this paper is publicly available at https://github.com/king159/Pair-Net.
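
    The pair-then-relation idea can be sketched in a few lines; the bilinear scorer, dimensions, and top-k policy are illustrative assumptions rather than the released Pair-Net code:

        import torch
        import torch.nn as nn

        class PairProposal(nn.Module):
            """Score all subject-object pairs, keep only the top-k sparse
            pairs, and leave relation classification to a later head."""
            def __init__(self, dim: int = 256, k: int = 100):
                super().__init__()
                self.subj = nn.Linear(dim, dim)
                self.obj = nn.Linear(dim, dim)
                self.k = k

            def forward(self, feats: torch.Tensor):
                # feats: (N, dim) object queries from a panoptic segmenter.
                logits = self.subj(feats) @ self.obj(feats).t()   # (N, N)
                top = logits.flatten().topk(min(self.k, logits.numel())).indices
                n = feats.size(0)
                return top // n, top % n  # subject indices, object indices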

    Comment: Project Page: https://github.com/king159/Pair-Net
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence
    Subject code 004
    Publishing date 2023-07-17
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  8. Article ; Online: Predicting Synthetic Lethality in Human Cancers via Multi-Graph Ensemble Neural Network.

    Lai, Mincai / Chen, Guangyao / Yang, Haochen / Yang, Jingkang / Jiang, Zhihao / Wu, Min / Zheng, Jie

    Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference

    2021  Volume 2021, Page(s) 1731–1734

    Abstract Synthetic lethality (SL) is currently one of the most effective approaches to identifying new drugs for cancer treatment. It refers to the phenomenon in which simultaneous inactivation of two non-lethal genes causes cell death, while loss of either gene alone does not. However, detecting SL pairs experimentally is challenging due to the costs involved. Artificial intelligence (AI) is a low-cost way to predict potential SL relations between genes. In this paper, a new Multi-Graph Ensemble (MGE) network structure combining graph neural networks with existing knowledge about genes is proposed to predict SL pairs; it integrates the embedding of each feature with different neural networks to predict whether a pair of genes has an SL relation. It achieves higher prediction performance than existing SL prediction methods. Moreover, through the integration of other biological knowledge, it has the potential for interpretability.
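
    The ensemble idea can be sketched roughly as follows; per-view gene embeddings stand in for the paper's per-graph neural networks, and the fusion head is an assumption for illustration:

        import torch
        import torch.nn as nn

        class MultiGraphEnsemble(nn.Module):
            """Fuse one gene representation per knowledge-graph view to
            score whether a gene pair is synthetically lethal."""
            def __init__(self, n_genes: int, n_graphs: int = 3, dim: int = 64):
                super().__init__()
                self.views = nn.ModuleList(
                    nn.Embedding(n_genes, dim) for _ in range(n_graphs))
                self.scorer = nn.Sequential(
                    nn.Linear(2 * n_graphs * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

            def forward(self, gene_a: torch.Tensor, gene_b: torch.Tensor) -> torch.Tensor:
                za = torch.cat([v(gene_a) for v in self.views], dim=-1)
                zb = torch.cat([v(gene_b) for v in self.views], dim=-1)
                return torch.sigmoid(self.scorer(torch.cat([za, zb], dim=-1)))

        model = MultiGraphEnsemble(n_genes=10000)
        prob = model(torch.tensor([42]), torch.tensor([7]))  # P(SL) for one pair
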
    MeSH term(s) Artificial Intelligence ; Humans ; Neoplasms/genetics ; Neural Networks, Computer ; Synthetic Lethal Mutations/genetics
    Language English
    Publishing date 2021-12-07
    Publishing country United States
    Document type Journal Article
    ISSN 2694-0604
    ISSN (online) 2694-0604
    DOI 10.1109/EMBC46164.2021.9630716
    Database MEDical Literature Analysis and Retrieval System OnLINE

  9. Book ; Online: Learning to Prompt for Vision-Language Models

    Zhou, Kaiyang / Yang, Jingkang / Loy, Chen Change / Liu, Ziwei

    2021  

    Abstract Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks. Different from traditional representation learning, which is based mostly on discretized labels, vision-language pre-training aligns images and texts in a common feature space, which allows zero-shot transfer to a downstream task via prompting, i.e., classification weights are synthesized from natural language describing the classes of interest. In this work, we show that a major challenge for deploying such models in practice is prompt engineering, which requires domain expertise and is extremely time-consuming -- one needs to spend a significant amount of time on tuning words, since a slight change in wording could have a huge impact on performance. Inspired by recent advances in prompt learning research in natural language processing (NLP), we propose Context Optimization (CoOp), a simple approach specifically for adapting CLIP-like vision-language models for downstream image recognition. Concretely, CoOp models a prompt's context words with learnable vectors while the entire set of pre-trained parameters is kept fixed. To handle different image recognition tasks, we provide two implementations of CoOp: unified context and class-specific context. Through extensive experiments on 11 datasets, we demonstrate that CoOp requires as few as one or two shots to beat hand-crafted prompts by a decent margin and is able to gain significant improvements over prompt engineering with more shots, e.g., with 16 shots the average gain is around 15% (with the highest reaching over 45%). Despite being a learning-based approach, CoOp achieves superb domain generalization performance compared with the zero-shot model using hand-crafted prompts.
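
    The core mechanism fits a short sketch: learnable context vectors replace the words of a hand-crafted prompt while the CLIP encoders stay frozen; shapes below are illustrative assumptions:

        import torch
        import torch.nn as nn

        class CoOpPrompt(nn.Module):
            """Unified-context variant: one shared set of learnable
            context vectors is prepended to every class-name embedding."""
            def __init__(self, n_ctx: int = 16, ctx_dim: int = 512):
                super().__init__()
                self.ctx = nn.Parameter(torch.empty(n_ctx, ctx_dim))
                nn.init.normal_(self.ctx, std=0.02)

            def forward(self, class_embeddings: torch.Tensor) -> torch.Tensor:
                # class_embeddings: (n_cls, n_tok, ctx_dim) from the frozen
                # token-embedding layer; only self.ctx receives gradients.
                n_cls = class_embeddings.size(0)
                ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
                return torch.cat([ctx, class_embeddings], dim=1)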

    Comment: International Journal of Computer Vision (IJCV), 2022
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning
    Subject code 004 ; 006
    Publishing date 2021-09-02
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  10. Book ; Online: Panoptic Scene Graph Generation

    Yang, Jingkang / Ang, Yi Zhe / Guo, Zujin / Zhou, Kaiyang / Zhang, Wayne / Liu, Ziwei

    2022  

    Abstract Existing research addresses scene graph generation (SGG) -- a critical technology for scene understanding in images -- from a detection perspective, i.e., objects are detected using bounding boxes followed by prediction of their pairwise relationships. We argue that such a paradigm causes several problems that impede the progress of the field. For instance, bounding box-based labels in current datasets usually contain redundant classes like hairs, and leave out background information that is crucial to the understanding of context. In this work, we introduce panoptic scene graph generation (PSG), a new problem task that requires the model to generate a more comprehensive scene graph representation based on panoptic segmentations rather than rigid bounding boxes. A high-quality PSG dataset, which contains 49k well-annotated overlapping images from COCO and Visual Genome, is created for the community to keep track of its progress. For benchmarking, we build four two-stage baselines, which are modified from classic methods in SGG, and two one-stage baselines called PSGTR and PSGFormer, which are based on the efficient Transformer-based detector, i.e., DETR. While PSGTR uses a set of queries to directly learn triplets, PSGFormer separately models the objects and relations in the form of queries from two Transformer decoders, followed by a prompting-like relation-object matching mechanism. In the end, we share insights on open challenges and future directions.
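
    The output structure the task asks for can be pictured with a small sketch; the field names are illustrative, not the OpenPSG API:

        from dataclasses import dataclass, field
        from typing import List, Tuple
        import numpy as np

        @dataclass
        class PanopticSceneGraph:
            """Nodes are panoptic segments (pixel masks, not boxes); edges
            are relation triplets (subject index, predicate, object index)."""
            masks: List[np.ndarray]                   # boolean mask per segment
            classes: List[str]                        # thing or stuff class
            relations: List[Tuple[int, str, int]] = field(default_factory=list)

        graph = PanopticSceneGraph(
            masks=[np.zeros((480, 640), bool), np.zeros((480, 640), bool)],
            classes=["person", "grass"],              # stuff classes included
            relations=[(0, "standing on", 1)],
        )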

    Comment: Accepted to ECCV'22 (Paper ID #222, Final Score 2222). Project Page: https://psgdataset.org/. OpenPSG Codebase: https://github.com/Jingkang50/OpenPSG
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Machine Learning ; Computer Science - Multimedia
    Subject code 004
    Publishing date 2022-07-22
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
