LIVIVO - The Search Portal for Life Sciences

Search results

Result 1 - 3 of total 3

  1. Book ; Online: Conformal prediction under ambiguous ground truth

    Stutz, David / Roy, Abhijit Guha / Matejovicova, Tatiana / Strachan, Patricia / Cemgil, Ali Taylan / Doucet, Arnaud

    2023  

    Abstract: In safety-critical classification tasks, conformal prediction allows rigorous uncertainty quantification by providing confidence sets that include the true class with a user-specified probability. This generally assumes the availability of a held-out calibration set with access to ground truth labels. Unfortunately, in many domains, such labels are difficult to obtain and usually approximated by aggregating expert opinions. In fact, this holds true for almost all datasets, including well-known ones such as CIFAR and ImageNet. Applying conformal prediction using such labels underestimates uncertainty. Indeed, when expert opinions are not resolvable, there is inherent ambiguity present in the labels. That is, we do not have "crisp", definitive ground truth labels and this uncertainty should be taken into account during calibration. In this paper, we develop a conformal prediction framework for such ambiguous ground truth settings which relies on an approximation of the underlying posterior distribution of labels given inputs. We demonstrate our methodology on synthetic and real datasets, including a case study of skin condition classification in dermatology.
    Keywords: Computer Science - Machine Learning ; Computer Science - Computer Vision and Pattern Recognition ; Statistics - Methodology ; Statistics - Machine Learning
    Subject code: 006
    Publishing date: 2023-07-18
    Publishing country: us
    Document type: Book ; Online
    Database: BASE - Bielefeld Academic Search Engine (life sciences selection)

  2. Book ; Online: Evaluating AI systems under uncertain ground truth

    Stutz, David / Cemgil, Ali Taylan / Roy, Abhijit Guha / Matejovicova, Tatiana / Barsbey, Melih / Strachan, Patricia / Schaekermann, Mike / Freyberg, Jan / Rikhye, Rajeev / Freeman, Beverly / Matos, Javier Perez / Telang, Umesh / Webster, Dale R. / Liu, Yuan / Corrado, Greg S. / Matias, Yossi / Kohli, Pushmeet / Liu, Yun / Doucet, Arnaud / Karthikesalingam, Alan

    a case study in dermatology

    2023  

    Abstract: For safety, AI systems in health undergo thorough evaluations before deployment, validating their predictions against a ground truth that is assumed certain. However, this is actually not the case and the ground truth may be uncertain. Unfortunately, this is largely ignored in standard evaluation of AI models but can have severe consequences such as overestimating the future performance. To avoid this, we measure the effects of ground truth uncertainty, which we assume decomposes into two main components: annotation uncertainty which stems from the lack of reliable annotations, and inherent uncertainty due to limited observational information. This ground truth uncertainty is ignored when estimating the ground truth by deterministically aggregating annotations, e.g., by majority voting or averaging. In contrast, we propose a framework where aggregation is done using a statistical model. Specifically, we frame aggregation of annotations as posterior inference of so-called plausibilities, representing distributions over classes in a classification setting, subject to a hyper-parameter encoding annotator reliability. Based on this model, we propose a metric for measuring annotation uncertainty and provide uncertainty-adjusted metrics for performance evaluation. We present a case study applying our framework to skin condition classification from images where annotations are provided in the form of differential diagnoses. The deterministic adjudication process called inverse rank normalization (IRN) from previous work ignores ground truth uncertainty in evaluation. Instead, we present two alternative statistical models: a probabilistic version of IRN and a Plackett-Luce-based model. We find that a large portion of the dataset exhibits significant ground truth uncertainty and standard IRN-based evaluation severely over-estimates performance without providing uncertainty estimates.
    Keywords: Computer Science - Machine Learning ; Computer Science - Computer Vision and Pattern Recognition ; Statistics - Methodology ; Statistics - Machine Learning
    Subject code: 006
    Publishing date: 2023-07-05
    Publishing country: us
    Document type: Book ; Online
    Database: BASE - Bielefeld Academic Search Engine (life sciences selection)

  3. Book ; Online: Perception Test

    Pătrăucean, Viorica / Smaira, Lucas / Gupta, Ankush / Continente, Adrià Recasens / Markeeva, Larisa / Banarse, Dylan / Koppula, Skanda / Heyward, Joseph / Malinowski, Mateusz / Yang, Yi / Doersch, Carl / Matejovicova, Tatiana / Sulsky, Yury / Miech, Antoine / Frechette, Alex / Klimczak, Hanna / Koster, Raphael / Zhang, Junlin / Winkler, Stephanie / Aytar, Yusuf / Osindero, Simon / Damen, Dima / Zisserman, Andrew / Carreira, João

    A Diagnostic Benchmark for Multimodal Video Models

    2023  

    Abstract: We propose a novel multimodal video benchmark - the Perception Test - to evaluate the perception and reasoning skills of pre-trained multimodal models (e.g. Flamingo, SeViLA, or GPT-4). Compared to existing benchmarks that focus on computational tasks (e.g. classification, detection or tracking), the Perception Test focuses on skills (Memory, Abstraction, Physics, Semantics) and types of reasoning (descriptive, explanatory, predictive, counterfactual) across video, audio, and text modalities, to provide a comprehensive and efficient evaluation tool. The benchmark probes pre-trained models for their transfer capabilities, in a zero-shot / few-shot or limited finetuning regime. For these purposes, the Perception Test introduces 11.6k real-world videos, 23s average length, designed to show perceptually interesting situations, filmed by around 100 participants worldwide. The videos are densely annotated with six types of labels (multiple-choice and grounded video question-answers, object and point tracks, temporal action and sound segments), enabling both language and non-language evaluations. The fine-tuning and validation splits of the benchmark are publicly available (CC-BY license), in addition to a challenge server with a held-out test split. Human baseline results compared to state-of-the-art video QA models show a substantial gap in performance (91.4% vs 46.2%), suggesting that there is significant room for improvement in multimodal video understanding. Dataset, baseline code, and challenge server are available at https://github.com/deepmind/perception_test

    Comment: 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks
    Keywords: Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning
    Subject code: 004
    Publishing date: 2023-05-23
    Publishing country: us
    Document type: Book ; Online
    Database: BASE - Bielefeld Academic Search Engine (life sciences selection)
