LIVIVO - Search results

Results 1-9 of 9

Book ; Online: Fairness and Bias in Truth Discovery Algorithms

Lazier, Simone / Thirumuruganathan, Saravanan / Anahideh, Hadis

An Experimental Analysis

2023

Abstract	Machine learning (ML) based approaches are increasingly being used in a number of applications with societal impact. Training ML models often requires vast amounts of labeled data, and crowdsourcing is a dominant paradigm for obtaining labels from multiple workers. Crowd workers may sometimes provide unreliable labels, and to address this, truth discovery (TD) algorithms such as majority voting are applied to determine the consensus labels from conflicting worker responses. However, it is important to note that these consensus labels may still be biased based on sensitive attributes such as gender, race, or political affiliation. Even when sensitive attributes are not involved, the labels can be biased due to different perspectives of subjective aspects such as toxicity. In this paper, we conduct a systematic study of the bias and fairness of TD algorithms. Our findings using two existing crowd-labeled datasets reveal that a non-trivial proportion of workers provide biased results, and using simple approaches for TD is sub-optimal. Our study also demonstrates that popular TD algorithms are not a panacea. Additionally, we quantify the impact of these unfair workers on downstream ML tasks and show that conventional methods for achieving fairness and correcting label biases are ineffective in this setting. We end the paper with a plea for the design of novel bias-aware truth discovery algorithms that can ameliorate these issues. Comment: Accepted at the Algorithmic Fairness in Artificial Intelligence, Machine Learning and Decision Making workshop at SDM 2023
Keywords	Computer Science - Machine Learning ; Computer Science - Computers and Society ; Computer Science - Databases
Subject code	006
Publishing date	2023-04-25
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Full text online

Inter-library loan at ZB MED

Your chosen title can be delivered directly to ZB MED Cologne location if you are registered as a user at ZB MED Cologne.

Article ; Online: Big Data, Small Personas: How Algorithms Shape the Demographic Representation of Data-Driven User Segments.

Salminen, Joni / Chhirang, Kamal / Jung, Soon-Gyo / Thirumuruganathan, Saravanan / Guan, Kathleen W / Jansen, Bernard J

Big data

2022 Volume 10, Issue 4, Page(s) 313–336

Abstract	Derived from the notion of algorithmic bias, it is possible that creating user segments such as personas from data results in over- or under-representing certain segments (FAIRNESS), does not properly represent the diversity of the user populations (DIVERSITY), or produces inconsistent results when hyperparameters are changed (CONSISTENCY). Collecting user data on 363M video views from a global news and media organization, we compare personas created from this data using different algorithms. Results indicate that the algorithms fall into two groups: those that generate personas with
MeSH term(s)	Algorithms ; Big Data ; Cultural Diversity ; Demography/statistics & numerical data
Language	English
Publishing date	2022-08-15
Publishing country	United States
Document type	Journal Article
ISSN	2167-647X
ISSN (online)	2167-647X
DOI	10.1089/big.2021.0177
Database	MEDical Literature Analysis and Retrieval System OnLINE

Order via subito

This service is subject to a fee under the delivery terms set by subito. An order that includes both an article and supplementary material is treated as separate orders, and a fee is charged for each.

Book ; Online: Fair Active Learning

Anahideh, Hadis / Asudeh, Abolfazl / Thirumuruganathan, Saravanan

2020

Abstract	Machine learning (ML) is increasingly being used in high-stakes applications impacting society. Therefore, it is of critical importance that ML models do not propagate discrimination. Collecting accurate labeled data in societal applications is challenging and costly. Active learning is a promising approach to build an accurate classifier by interactively querying an oracle within a labeling budget. We design algorithms for fair active learning that carefully select data points to be labeled so as to balance model accuracy and fairness. We demonstrate the effectiveness and efficiency of our proposed algorithms over widely used benchmark datasets using demographic parity and equalized odds notions of fairness.
Keywords	Computer Science - Machine Learning ; Statistics - Machine Learning
Publishing date	2020-01-06
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Full text online

Book ; Online: Fair Active Learning

Anahideh, Hadis / Asudeh, Abolfazl / Thirumuruganathan, Saravanan

2020

Abstract	Machine learning (ML) is increasingly being used in high-stakes applications impacting society. Therefore, it is of critical importance that ML models do not propagate discrimination. Collecting accurate labeled data in societal applications is challenging and costly. Active learning is a promising approach to build an accurate classifier by interactively querying an oracle within a labeling budget. We design algorithms for fair active learning that carefully select data points to be labeled so as to balance model accuracy and fairness. Specifically, we focus on demographic parity - a widely used measure of fairness. Extensive experiments over benchmark datasets demonstrate the effectiveness of our proposed approach. Comment: This was intended as a replacement of arXiv:2001.01796; please see the updated version there
Keywords	Computer Science - Machine Learning ; Statistics - Machine Learning
Publishing date	2020-06-20
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Full text online

Book ; Online: Local Embeddings for Relational Data Integration

Cappuzzo, Riccardo / Papotti, Paolo / Thirumuruganathan, Saravanan

2019

Abstract	Deep learning based techniques have been recently used with promising results for data integration problems. Some methods directly use pre-trained embeddings that were trained on a large corpus such as Wikipedia. However, they may not always be an appropriate choice for enterprise datasets with custom vocabulary. Other methods adapt techniques from natural language processing to obtain embeddings for the enterprise's relational data. However, this approach blindly treats a tuple as a sentence, thus losing a large amount of contextual information present in the tuple. We propose algorithms for obtaining local embeddings that are effective for data integration tasks on relational databases. We make four major contributions. First, we describe a compact graph-based representation that allows the specification of a rich set of relationships inherent in the relational world. Second, we propose how to derive sentences from such a graph that effectively "describe" the similarity across elements (tokens, attributes, rows) in the two datasets. The embeddings are learned based on such sentences. Third, we propose effective optimization to improve the quality of the learned embeddings and the performance of integration tasks. Finally, we propose a diverse collection of criteria to evaluate relational embeddings and perform an extensive set of experiments validating them against multiple baseline methods. Our experiments show that our framework, EmbDI, produces meaningful results for data integration tasks such as schema matching and entity resolution both in supervised and unsupervised settings. Comment: Accepted to SIGMOD 2020 as Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks. Code can be found at https://gitlab.eurecom.fr/cappuzzo/embdi
Keywords	Computer Science - Databases ; Computer Science - Computation and Language ; Computer Science - Machine Learning
Subject code	004
Publishing date	2019-09-03
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Full text online

Book ; Online: ZeroER

Wu, Renzhi / Chaba, Sanya / Sawlani, Saurabh / Chu, Xu / Thirumuruganathan, Saravanan

Entity Resolution using Zero Labeled Examples

2019

Abstract	Entity resolution (ER) refers to the problem of matching records in one or more relations that refer to the same real-world entity. While supervised machine learning (ML) approaches achieve state-of-the-art results, they require a large number of labeled examples that are expensive to obtain and oftentimes infeasible. We investigate an important problem that vexes practitioners: is it possible to design an effective algorithm for ER that requires Zero labeled examples, yet can achieve performance comparable to supervised approaches? In this paper, we answer in the affirmative through our proposed approach dubbed ZeroER. Our approach is based on a simple observation -- the similarity vectors for matches should look different from those of unmatches. Operationalizing this insight requires a number of technical innovations. First, we propose a simple yet powerful generative model based on Gaussian Mixture Models for learning the match and unmatch distributions. Second, we propose an adaptive regularization technique customized for ER that ameliorates the issue of feature overfitting. Finally, we incorporate the transitivity property into the generative model in a novel way resulting in improved accuracy. On five benchmark ER datasets, we show that ZeroER greatly outperforms existing unsupervised approaches and achieves comparable performance to supervised approaches. Comment: Published at 2020 ACM SIGMOD International Conference on Management of Data
Keywords	Computer Science - Databases ; Computer Science - Machine Learning
Subject code	006
Publishing date	2019-08-16
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Full text online

Article ; Online: An Empirical Study of Questionnaires for the Diagnosis of Pediatric Obstructive Sleep Apnea.

Ahmed, Sadia / Hasani, Sona / Koone, Mary / Thirumuruganathan, Saravanan / Diaz-Abad, Montserrat / Mitchell, Ron / Isaiah, Amal / Das, Gautam

Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference

2018 Volume 2018, Page(s) 4097–4100

Abstract	Pediatric Obstructive Sleep Apnea (OSA) is a chronic disorder characterized by the disruption in sleep due to involuntary and temporary cessation of breathing. Definitive diagnosis of OSA requires an intrusive and expensive approach based on polysomnography where the children spend a night in the hospital under the supervision of a sleep technician. The prevalence of OSA is increasing, making the traditional diagnostic approach prohibitively expensive. There has been increasing interest in designing inexpensive approaches to screen children such as the use of questionnaires. In this paper, we study the efficacy of five widely used and representative questionnaires on their ability to diagnose and stratify OSA. Our experiments show that the diagnostic ability of each of these questionnaires is insufficient for widespread clinical use. Using techniques from data mining, we identify the most informative questions and propose a new questionnaire. We show that machine learning models trained based on the answers to our questionnaire can stratify OSA with higher accuracy.
MeSH term(s)	Humans ; Machine Learning ; Polysomnography ; Prevalence ; Sleep Apnea, Obstructive ; Surveys and Questionnaires
Language	English
Publishing date	2018-09-17
Publishing country	United States
Document type	Journal Article ; Research Support, Non-U.S. Gov't ; Research Support, U.S. Gov't, Non-P.H.S.
ISSN	2694-0604
ISSN (online)	2694-0604
DOI	10.1109/EMBC.2018.8513389
Database	MEDical Literature Analysis and Retrieval System OnLINE

Order via subito

Book ; Online: DeepER -- Deep Entity Resolution

Ebraheem, Muhammad / Thirumuruganathan, Saravanan / Joty, Shafiq / Ouzzani, Mourad / Tang, Nan

2017

Abstract	Entity resolution (ER) is a key data integration problem. Despite 70+ years of effort on all aspects of ER, there is still a high demand for democratizing ER - humans are heavily involved in labeling data, performing feature engineering, tuning parameters, and defining blocking functions. With the recent advances in deep learning, in particular distributed representation of words (a.k.a. word embeddings), we present a novel ER system, called DeepER, that achieves good accuracy, high efficiency, as well as ease-of-use (i.e., much less human effort). For accuracy, we use sophisticated composition methods, namely uni- and bi-directional recurrent neural networks (RNNs) with long short term memory (LSTM) hidden units, to convert each tuple to a distributed representation (i.e., a vector), which can in turn be used to effectively capture similarities between tuples. We consider both the case where pre-trained word embeddings are available as well as the case where they are not; we present ways to learn and tune the distributed representations. For efficiency, we propose a locality sensitive hashing (LSH) based blocking approach that uses distributed representations of tuples; it takes all attributes of a tuple into consideration and produces much smaller blocks, compared with traditional methods that consider only a few attributes. For ease-of-use, DeepER requires much less human labeled data and does not need feature engineering, compared with traditional machine learning based approaches which require handcrafted features and similarity functions along with their associated thresholds. We evaluate our algorithms on multiple datasets (including benchmarks, biomedical data, as well as multi-lingual data) and the extensive experimental results show that DeepER outperforms existing solutions. Comment: Accepted to PVLDB 2018 as "Distributed Representations of Tuples for Entity Resolution"
Keywords	Computer Science - Databases
Subject code	006
Publishing date	2017-10-02
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Full text online

Book ; Online: Malware in the Future? Forecasting of Analyst Detection of Cyber Events

Bakdash, Jonathan Z. / Hutchinson, Steve / Zaroukian, Erin G. / Marusich, Laura R. / Thirumuruganathan, Saravanan / Sample, Charmaine / Hoffman, Blaine / Das, Gautam

2017

Abstract	There have been extensive efforts in government, academia, and industry to anticipate, forecast, and mitigate cyber attacks. A common approach is time-series forecasting of cyber attacks based on data from network telescopes, honeypots, and automated intrusion detection/prevention systems. This research has uncovered key insights such as systematicity in cyber attacks. Here, we propose an alternate perspective of this problem by performing forecasting of attacks that are analyst-detected and -verified occurrences of malware. We call these instances of malware cyber event data. Specifically, our dataset was analyst-detected incidents from a large operational Computer Security Service Provider (CSSP) for the U.S. Department of Defense, which rarely relies only on automated systems. Our dataset consists of weekly counts of cyber events over approximately seven years. Since all cyber events were validated by analysts, our dataset is unlikely to have false positives, which are often endemic in other sources of data. Further, the higher-quality data could be used for a number of purposes, including resource allocation, estimation of security resources, and the development of effective risk-management strategies. We used a Bayesian State Space Model for forecasting and found that events one week ahead could be predicted. To quantify bursts, we used a Markov model. Our findings of systematicity in analyst-detected cyber attacks are consistent with previous work using other sources. The advanced information provided by a forecast may help with threat awareness by providing a probable value and range for future cyber events one week ahead. Other potential applications for cyber event forecasting include proactive allocation of resources and capabilities for cyber defense (e.g., analyst staffing and sensor configuration) in CSSPs. Enhanced threat awareness may improve cybersecurity. Comment: Revised version resubmitted to journal
Keywords	Computer Science - Cryptography and Security
Subject code	006
Publishing date	2017-07-11
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Full text online
