LIVIVO - The Search Portal for Life Sciences


Your recent searches

  1. AU="Hoffman, Judy"
  2. AU="Schubert, Dirk"
  3. AU=Jia Xiao-yu
  4. AU="Patra, Dhabaleswar"
  5. AU="Knill, Carly"
  6. AU=Jabbour Elias
  7. AU="Rodríguez-Maresca, Manuel Ángel"
  8. AU="Yang, Chang-Jung"
  9. AU="Atul Kaushik"
  10. AU="Peters, Jaime"
  11. AU="Dorothee von Laer"
  12. AU="Sreeja Attur"
  13. AU=Song Kyung Chul
  14. AU=Klimovich Pavel V.
  15. AU="Jingbo Chen"
  16. AU="Viazlo, Oleksander"
  17. AU="Toshiki Iwabuchi"
  18. AU="Dissanayake, Lakmali"
  19. AU="Michael Denkinger"
  20. AU="Abilio J. F. N. Sobral"
  21. AU="Geller, Alan"
  22. AU=Petrat Sören
  23. AU="Sterling, Shanique"
  24. AU="Qi, Zeqiang"
  25. AU="Thongstisubskul, A"
  26. AU="Daniel C. Schneider, PhD"
  27. AU="Völker, Christoph"
  28. AU="El Aoud, S"
  29. AU="Yi, Tongpei"
  30. AU="Anil K. Mantha"
  31. AU="Artzner, Christoph"
  32. AU=Diana Giovanni
  33. AU="Kinloch, Sabine"
  34. AU="Nuertey, David"
  35. AU="Ojubolamo, Olakunle"

Search results

Hits 1 - 10 of 35 in total

  1. Book ; Online: Token Merging for Fast Stable Diffusion

    Bolya, Daniel / Hoffman, Judy

    2023  

    Abstract The landscape of image generation has been forever changed by open vocabulary diffusion models. However, at their core these models use transformers, which makes generation slow. Better implementations to increase the throughput of these transformers have emerged, but they still evaluate the entire model. In this paper, we instead speed up diffusion models by exploiting natural redundancy in generated images by merging redundant tokens. After making some diffusion-specific improvements to Token Merging (ToMe), our ToMe for Stable Diffusion can reduce the number of tokens in an existing Stable Diffusion model by up to 60% while still producing high quality images without any extra training. In the process, we speed up image generation by up to 2x and reduce memory consumption by up to 5.6x. Furthermore, this speed-up stacks with efficient implementations such as xFormers, minimally impacting quality while being up to 5.4x faster for large images. Code is available at https://github.com/dbolya/tomesd.

    Comment: Check out the code at https://github.com/dbolya/tomesd
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject/Category (Code) 006
    Publication date 2023-03-30
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)
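
    The core token-merging step is easy to illustrate independently of the paper's diffusion-specific changes. Below is a minimal Python sketch of bipartite token merging, assuming cosine similarity and plain averaging of matched pairs; merge_tokens is a hypothetical illustration, not the authors' tomesd implementation.

        import torch

        def merge_tokens(x: torch.Tensor, r: int) -> torch.Tensor:
            """Reduce an (N, C) token matrix by r tokens.

            Tokens are split into two alternating sets; each token in set A
            is matched to its most similar token in set B, and the r most
            similar pairs are merged by averaging (duplicate matches are
            simply overwritten here, which a real implementation would
            reduce properly)."""
            a, b = x[::2], x[1::2]                    # alternating bipartite split
            a_n = a / a.norm(dim=-1, keepdim=True)
            b_n = b / b.norm(dim=-1, keepdim=True)
            scores = a_n @ b_n.T                      # cosine similarity, A x B
            best_val, best_idx = scores.max(dim=-1)   # best partner in B per A-token
            order = best_val.argsort(descending=True)
            src, keep = order[:r], order[r:]          # r most redundant A-tokens
            b = b.clone()
            b[best_idx[src]] = (b[best_idx[src]] + a[src]) / 2
            return torch.cat([a[keep], b], dim=0)

        tokens = torch.randn(64, 320)                 # e.g. one latent's tokens
        print(merge_tokens(tokens, r=16).shape)       # torch.Size([48, 320])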


  2. Book ; Online: Mitigating Bias in Visual Transformers via Targeted Alignment

    Sudhakar, Sruthi / Prabhu, Viraj / Krishnakumar, Arvindkumar / Hoffman, Judy

    2023  

    Abstract As transformer architectures become increasingly prevalent in computer vision, it is critical to understand their fairness implications. We perform the first study of the fairness of transformers applied to computer vision and benchmark several bias mitigation approaches from prior work. We visualize the feature space of the transformer self-attention modules and discover that a significant portion of the bias is encoded in the query matrix. With this knowledge, we propose TADeT, a targeted alignment strategy for debiasing transformers that aims to discover and remove bias primarily from query matrix features. We measure performance using Balanced Accuracy and Standard Accuracy, and fairness using Equalized Odds and Balanced Accuracy Difference. TADeT consistently leads to improved fairness over prior work on multiple attribute prediction tasks on the CelebA dataset, without compromising performance.
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject/Category (Code) 004
    Publication date 2023-02-08
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)
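
    The targeted-alignment idea can be pictured as an auxiliary penalty that pulls the query-feature statistics of two protected groups together. A minimal sketch, assuming a simple mean-difference penalty rather than the paper's exact TADeT objective:

        import torch

        def query_alignment_penalty(q_feats: torch.Tensor,
                                    group: torch.Tensor) -> torch.Tensor:
            """Squared distance between per-group means of (N, D)
            self-attention query features, added to the task loss with
            some weight; group holds binary protected-attribute labels."""
            mu0 = q_feats[group == 0].mean(dim=0)
            mu1 = q_feats[group == 1].mean(dim=0)
            return (mu0 - mu1).pow(2).sum()

        q = torch.randn(8, 64, requires_grad=True)    # toy query features
        g = torch.tensor([0, 1, 0, 1, 1, 0, 0, 1])
        loss = query_alignment_penalty(q, g)
        loss.backward()                               # gradients reach the queries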


  3. Book ; Online: Window Attention is Bugged

    Bolya, Daniel / Ryali, Chaitanya / Hoffman, Judy / Feichtenhofer, Christoph

    How not to Interpolate Position Embeddings

    2023  

    Abstract Window attention, position embeddings, and high resolution finetuning are core concepts in the modern transformer era of computer vision. However, we find that naively combining these near ubiquitous components can have a detrimental effect on performance. The issue is simple: interpolating position embeddings while using window attention is wrong. We study two state-of-the-art methods that have these three components, namely Hiera and ViTDet, and find that both do indeed suffer from this bug. To fix it, we introduce a simple absolute window position embedding strategy, which solves the bug outright in Hiera and allows us to increase both speed and performance of the model in ViTDet. We finally combine the two to obtain HieraDet, which achieves 61.7 box mAP on COCO, making it state-of-the-art for models that only use ImageNet-1k pretraining. This all stems from what is essentially a 3 line bug fix, which we name "absolute win".

    Comment: Preprint. Code release will be coming in the future
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Publication date 2023-11-09
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)
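
    The gist of the "absolute win" fix can be sketched in a few lines: keep the learned embedding at its native window size and tile it across windows, instead of interpolating one global embedding over a larger grid, which stretches the pattern across window boundaries. Names below are illustrative, not from the paper's code.

        import torch

        def absolute_window_pos_embed(embed_win: torch.Tensor,
                                      h_windows: int, w_windows: int) -> torch.Tensor:
            """Tile a (win, win, C) per-window position embedding over the
            full feature map, so every window sees the identical pattern
            that window attention was trained with."""
            return embed_win.repeat(h_windows, w_windows, 1)

        embed_win = torch.randn(8, 8, 96)             # learned 8x8 window embedding
        full = absolute_window_pos_embed(embed_win, h_windows=4, w_windows=4)
        print(full.shape)                             # torch.Size([32, 32, 96])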


  4. Book ; Online: LANCE

    Prabhu, Viraj / Yenamandra, Sriram / Chattopadhyay, Prithvijit / Hoffman, Judy

    Stress-testing Visual Models by Generating Language-guided Counterfactual Images

    2023  

    Abstract We propose an automated algorithm to stress-test a trained visual model by generating language-guided counterfactual test images (LANCE). Our method leverages recent progress in large language modeling and text-based image editing to augment an IID test set with a suite of diverse, realistic, and challenging test images without altering model weights. We benchmark the performance of a diverse set of pre-trained models on our generated data and observe significant and consistent performance drops. We further analyze model sensitivity across different types of edits, and demonstrate its applicability at surfacing previously unknown class-level model biases in ImageNet. Code is available at https://github.com/virajprabhu/lance.

    Comment: NeurIPS 2023 camera ready. Project webpage: https://virajprabhu.github.io/lance-web/
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Computation and Language ; Computer Science - Machine Learning
    Subject/Category (Code) 004
    Publication date 2023-05-30
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)
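
    The stress-testing loop itself is a small pipeline around three pluggable models. A schematic sketch with the captioner, language perturber, and image editor passed in as callables; all stand-ins here are hypothetical placeholders, not the LANCE code.

        from typing import Callable, List

        def stress_test(images: List[object],
                        model: Callable[[object], str],
                        caption: Callable[[object], str],
                        perturb: Callable[[str], str],
                        edit: Callable[[object, str], object]) -> float:
            """Fraction of images whose prediction flips under a
            language-guided counterfactual edit (model weights untouched)."""
            flips = 0
            for img in images:
                before = model(img)
                new_caption = perturb(caption(img))    # e.g. an LLM rewrites one attribute
                after = model(edit(img, new_caption))  # text-based image editor
                flips += before != after
            return flips / len(images)

        # Toy stand-ins so the sketch runs end to end.
        rate = stress_test(
            images=["img0", "img1"],
            model=lambda x: "dog" if "edited" not in x else "cat",
            caption=lambda x: f"a photo of {x}",
            perturb=lambda c: c + ", in snow",
            edit=lambda x, c: x + " edited",
        )
        print(rate)                                    # 1.0 with these toy stand-ins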


  5. Book ; Online: ICON$^2$

    Sudhakar, Sruthi / Prabhu, Viraj / Russakovsky, Olga / Hoffman, Judy

    Reliably Benchmarking Predictive Inequity in Object Detection

    2023  

    Abstract As computer vision systems are being increasingly deployed at scale in high-stakes applications like autonomous driving, concerns about social bias in these systems are rising. Analysis of fairness in real-world vision systems, such as object detection in driving scenes, has been limited to observing predictive inequity across attributes such as pedestrian skin tone, and lacks a consistent methodology to disentangle the role of confounding variables e.g. does my model perform worse for a certain skin tone, or are such scenes in my dataset more challenging due to occlusion and crowds? In this work, we introduce ICON$^2$, a framework for robustly answering this question. ICON$^2$ leverages prior knowledge on the deficiencies of object detection systems to identify performance discrepancies across sub-populations, compute correlations between these potential confounders and a given sensitive attribute, and control for the most likely confounders to obtain a more reliable estimate of model bias. Using our approach, we conduct an in-depth study on the performance of object detection with respect to income from the BDD100K driving dataset, revealing useful insights.

    Comment: Accepted to CVPR 2023 SSAD Workshop
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject/Category (Code) 004
    Publication date 2023-06-07
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)
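
    The confounder-control step can be illustrated with a stratified comparison on synthetic data: measure the group performance gap within strata of the most correlated confounder rather than over the pooled data. A hypothetical numpy sketch, not the ICON$^2$ code:

        import numpy as np

        rng = np.random.default_rng(0)
        n = 2000
        sensitive = rng.integers(0, 2, n)              # e.g. low/high-income region
        occlusion = rng.random(n) + 0.3 * sensitive    # confounder correlated with it
        correct = (rng.random(n) > 0.2 + 0.4 * occlusion).astype(float)

        print("corr(confounder, attribute):",
              round(np.corrcoef(occlusion, sensitive)[0, 1], 3))
        print("raw gap:", round(correct[sensitive == 1].mean()
                                - correct[sensitive == 0].mean(), 3))

        # Control for the confounder: compare groups within occlusion strata.
        bins = np.digitize(occlusion, np.quantile(occlusion, [0.25, 0.5, 0.75]))
        gaps = [correct[(bins == b) & (sensitive == 1)].mean()
                - correct[(bins == b) & (sensitive == 0)].mean()
                for b in range(4)]
        print("stratified gap:", round(float(np.mean(gaps)), 3))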


  6. Book ; Online: Benchmarking Low-Shot Robustness to Natural Distribution Shifts

    Singh, Aaditya / Sarangmath, Kartik / Chattopadhyay, Prithvijit / Hoffman, Judy

    2023  

    Abstract Robustness to natural distribution shifts has seen remarkable progress thanks to recent pre-training strategies combined with better fine-tuning methods. However, such fine-tuning assumes access to large amounts of labelled data, and the extent to which the observations hold when the amount of training data is not as high remains unknown. We address this gap by performing the first in-depth study of robustness to various natural distribution shifts in different low-shot regimes: spanning datasets, architectures, pre-trained initializations, and state-of-the-art robustness interventions. Most importantly, we find that there is no single model of choice that is often more robust than others, and existing interventions can fail to improve robustness on some datasets even if they do so in the full-shot regime. We hope that our work will motivate the community to focus on this problem of practical importance.

    Comment: 22 Pages, 18 Tables, 12 Figures
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning
    Subject/Category (Code) 006
    Publication date 2023-04-21
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)
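
    The protocol itself is simple to state: fine-tune every (architecture, intervention) pair on k labelled examples per class and evaluate under each distribution shift. A skeletal sketch with hypothetical names, not the paper's harness:

        from itertools import product

        def benchmark(models, interventions, shots, finetune, evaluate, shifts):
            """Grid over low-shot regimes; returns
            {(model, intervention, k): {shift_name: accuracy}}."""
            results = {}
            for m, iv, k in product(models, interventions, shots):
                ckpt = finetune(m, intervention=iv, examples_per_class=k)
                results[(m, iv, k)] = {s: evaluate(ckpt, s) for s in shifts}
            return results

        # Toy stand-ins so the skeleton runs.
        out = benchmark(
            models=["resnet50", "vit-b"],
            interventions=["none", "robust-ft"],
            shots=[5, 25, 100],
            finetune=lambda m, intervention, examples_per_class: (m, intervention),
            evaluate=lambda ckpt, shift: 0.5,
            shifts=["id", "imagenet-r", "imagenet-sketch"],
        )
        print(len(out))                                # 12 configurations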


  7. Book ; Online: FACTS

    Yenamandra, Sriram / Ramesh, Pratik / Prabhu, Viraj / Hoffman, Judy

    First Amplify Correlations and Then Slice to Discover Bias

    2023  

    Abstract Computer vision datasets frequently contain spurious correlations between task-relevant labels and (easy to learn) latent task-irrelevant attributes (e.g. context). Models trained on such datasets learn "shortcuts" and underperform on bias-conflicting slices of data where the correlation does not hold. In this work, we study the problem of identifying such slices to inform downstream bias mitigation strategies. We propose First Amplify Correlations and Then Slice to Discover Bias (FACTS), wherein we first amplify correlations to fit a simple bias-aligned hypothesis via strongly regularized empirical risk minimization. Next, we perform correlation-aware slicing via mixture modeling in bias-aligned feature space to discover underperforming data slices that capture distinct correlations. Despite its simplicity, our method considerably improves over prior work (by as much as 35% precision@10) in correlation bias identification across a range of diverse evaluation settings. Our code is available at: https://github.com/yvsriram/FACTS.

    Comment: Accepted to ICCV 2023
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Publication date 2023-09-29
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)
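
    The two stages map onto standard tools: fit a strongly regularized classifier so it leans on the shortcut, then fit a mixture model in its feature space (simplified here to 1-D logit scores) to surface underperforming slices. A toy scikit-learn sketch, not the released FACTS code:

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 16))
        spurious = X[:, 0] > 0                         # easy-to-learn latent attribute
        y = np.where(rng.random(1000) < 0.9, spurious, ~spurious).astype(int)

        # Stage 1: amplify correlations -- strongly regularized ERM
        # (small C = high L2) pushes the model onto the shortcut.
        clf = LogisticRegression(C=1e-3).fit(X, y)

        # Stage 2: slice discovery -- mixture modeling over the
        # bias-aligned feature space (the model's logit scores here).
        scores = clf.decision_function(X).reshape(-1, 1)
        slices = GaussianMixture(n_components=2, random_state=0).fit_predict(scores)
        for s in range(2):
            mask = slices == s
            acc = (clf.predict(X[mask]) == y[mask]).mean()
            print(f"slice {s}: n={mask.sum()}, accuracy={acc:.2f}")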


  8. Book ; Online: We're Not Using Videos Effectively

    Kareer, Simar / Vijaykumar, Vivek / Maheshwari, Harsh / Chattopadhyay, Prithvijit / Hoffman, Judy / Prabhu, Viraj

    An Updated Domain Adaptive Video Segmentation Baseline

    2024  

    Abstract There has been abundant work in unsupervised domain adaptation for semantic segmentation (DAS) seeking to adapt a model trained on images from a labeled source domain to an unlabeled target domain. While the vast majority of prior work has studied this as a frame-level Image-DAS problem, a few Video-DAS works have sought to additionally leverage the temporal signal present in adjacent frames. However, Video-DAS works have historically studied a distinct set of benchmarks from Image-DAS, with minimal cross-benchmarking. In this work, we address this gap. Surprisingly, we find that (1) even after carefully controlling for data and model architecture, state-of-the-art Image-DAS methods (HRDA and HRDA+MIC) outperform Video-DAS methods on established Video-DAS benchmarks (+14.5 mIoU on Viper$\rightarrow$CityscapesSeq, +19.0 mIoU on Synthia$\rightarrow$CityscapesSeq), and (2) naive combinations of Image-DAS and Video-DAS techniques only lead to marginal improvements across datasets. To avoid siloed progress between Image-DAS and Video-DAS, we open-source our codebase with support for a comprehensive set of Video-DAS and Image-DAS methods on a common benchmark. Code available at https://github.com/SimarKareer/UnifiedVideoDA

    Comment: TMLR 2024
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject/Category (Code) 004
    Publication date 2024-02-01
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)
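
    The "temporal signal" that Video-DAS methods exploit is typically some form of cross-frame consistency. A minimal sketch of one common ingredient, keeping target-domain pseudo-labels only where adjacent frames agree; this is illustrative, not this paper's baseline, and assumes frame t+1 has already been warped into frame t's coordinates.

        import torch

        def consistent_pseudo_labels(logits_t: torch.Tensor,
                                     logits_t1: torch.Tensor,
                                     thresh: float = 0.9) -> torch.Tensor:
            """Per-pixel pseudo-labels from (C, H, W) segmentation logits
            of two adjacent frames; disagreeing or low-confidence pixels
            are marked -1 and ignored during self-training."""
            prob_t, lab_t = logits_t.softmax(0).max(0)
            prob_t1, lab_t1 = logits_t1.softmax(0).max(0)
            keep = (lab_t == lab_t1) & (prob_t > thresh) & (prob_t1 > thresh)
            return torch.where(keep, lab_t, torch.full_like(lab_t, -1))

        labels = consistent_pseudo_labels(torch.randn(19, 64, 64),
                                          torch.randn(19, 64, 64))
        print((labels >= 0).float().mean())            # fraction of retained pixels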


  9. Book ; Online: ZipIt! Merging Models from Different Tasks without Training

    Stoica, George / Bolya, Daniel / Bjorner, Jakob / Hearn, Taylor / Hoffman, Judy

    2023  

    Abstract Typical deep visual recognition models are capable of performing the one task they were trained on. In this paper, we tackle the extremely difficult problem of combining completely distinct models with different initializations, each solving a separate task, into one multi-task model without any additional training. Prior work in model merging permutes one model to the space of the other then adds them together. While this works for models trained on the same task, we find that this fails to account for the differences in models trained on disjoint tasks. Thus, we introduce "ZipIt!", a general method for merging two arbitrary models of the same architecture that incorporates two simple strategies. First, in order to account for features that aren't shared between models, we expand the model merging problem to additionally allow for merging features within each model by defining a general "zip" operation. Second, we add support for partially zipping the models up until a specified layer, naturally creating a multi-head model. We find that these two changes combined account for a staggering 20-60% improvement over prior work, making the merging of models trained on disjoint tasks feasible.
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Machine Learning
    Subject/Category (Code) 004
    Publication date 2023-05-04
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)
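
    The "zip" operation can be pictured on a single layer: concatenate both models' unit activations on shared inputs, then greedily merge the most correlated unit pairs, which may come from within one model as well as across models. A toy sketch of that matching step, not the released ZipIt! code:

        import torch

        def zip_features(feats_a: torch.Tensor, feats_b: torch.Tensor):
            """Greedily pair the 2D units of two same-architecture layers
            down to D merged units. feats_*: (N, D) activations on shared
            inputs. Unlike permutation-based merging, a pair may join two
            units of the same model."""
            f = torch.cat([feats_a, feats_b], dim=1)       # (N, 2D)
            f = (f - f.mean(0)) / (f.std(0) + 1e-6)
            corr = (f.T @ f) / f.shape[0]                  # (2D, 2D) correlations
            corr.fill_diagonal_(float("-inf"))
            pairs = []
            for _ in range(feats_a.shape[1]):              # D merges in total
                idx = int(corr.argmax())
                i, j = idx // corr.shape[1], idx % corr.shape[1]
                pairs.append((i, j))
                corr[[i, j], :] = float("-inf")            # units are consumed
                corr[:, [i, j]] = float("-inf")
            return pairs

        print(zip_features(torch.randn(256, 8), torch.randn(256, 8)))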


  10. Book ; Online: AUGCAL

    Chattopadhyay, Prithvijit / Goyal, Bharat / Ecsedi, Boglarka / Prabhu, Viraj / Hoffman, Judy

    Improving Sim2Real Adaptation by Uncertainty Calibration on Augmented Synthetic Images

    2023  

    Abstract Synthetic data (SIM) drawn from simulators have emerged as a popular alternative for training models where acquiring annotated real-world images is difficult. However, transferring models trained on synthetic images to real-world applications can be challenging due to appearance disparities. A commonly employed solution to counter this SIM2REAL gap is unsupervised domain adaptation, where models are trained using labeled SIM data and unlabeled REAL data. Mispredictions made by such SIM2REAL adapted models are often associated with miscalibration - stemming from overconfident predictions on real data. In this paper, we introduce AUGCAL, a simple training-time patch for unsupervised adaptation that improves SIM2REAL adapted models by - (1) reducing overall miscalibration, (2) reducing overconfidence in incorrect predictions and (3) improving confidence score reliability by better guiding misclassification detection - all while retaining or improving SIM2REAL performance. Given a base SIM2REAL adaptation algorithm, at training time, AUGCAL involves replacing vanilla SIM images with strongly augmented views (AUG intervention) and additionally optimizing for a training time calibration loss on augmented SIM predictions (CAL intervention). We motivate AUGCAL using a brief analytical justification of how to reduce miscalibration on unlabeled REAL data. Through our experiments, we empirically show the efficacy of AUGCAL across multiple adaptation methods, backbones, tasks and shifts.
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Machine Learning
    Subject/Category (Code) 006
    Publication date 2023-12-10
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)
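
    Both interventions slot into an ordinary adaptation training step on the labeled SIM batch. A schematic sketch that uses a negative-entropy confidence penalty as a stand-in for the paper's calibration loss:

        import torch
        import torch.nn.functional as F

        def augcal_step(model, sim_x, sim_y, strong_aug, lam: float = 0.1):
            """One AUGCAL-style step: (AUG) train on strongly augmented
            SIM views instead of vanilla ones, (CAL) add a calibration
            term that discourages overconfident predictions."""
            logits = model(strong_aug(sim_x))              # AUG intervention
            task_loss = F.cross_entropy(logits, sim_y)
            probs = logits.softmax(dim=1)
            # Negative entropy: minimizing it pushes predictions away
            # from overconfidence (a simple calibration surrogate).
            cal_loss = (probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
            return task_loss + lam * cal_loss

        model = torch.nn.Linear(32, 10)                    # toy classifier
        x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
        loss = augcal_step(model, x, y,
                           strong_aug=lambda t: t + 0.5 * torch.randn_like(t))
        loss.backward()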

