LIVIVO - Search results

Results 1-10 of 32

Article ; Online: Assessment of differentially private synthetic data for utility and fairness in end-to-end machine learning pipelines for tabular data.

Pereira, Mayana / Kshirsagar, Meghana / Mukherjee, Sumit / Dodhia, Rahul / Lavista Ferres, Juan / de Sousa, Rafael

PloS one

2024 Volume 19, Issue 2, Page(s) e0297271

Abstract	Differentially private (DP) synthetic datasets are a solution for sharing data while preserving the privacy of individual data providers. Understanding the effects of utilizing DP synthetic data in end-to-end machine learning pipelines impacts areas such as health care and humanitarian action, where data is scarce and regulated by restrictive privacy laws. In this work, we investigate the extent to which synthetic data can replace real, tabular data in machine learning pipelines and identify the most effective synthetic data generation techniques for training and evaluating machine learning models. We systematically investigate the impacts of differentially private synthetic data on downstream classification tasks from the point of view of utility as well as fairness. Our analysis is comprehensive and includes representatives of the two main types of synthetic data generation algorithms: marginal-based and GAN-based. To the best of our knowledge, our work is the first that: (i) proposes a training and evaluation framework that does not assume that real data is available for testing the utility and fairness of machine learning models trained on synthetic data; (ii) presents the most extensive analysis of synthetic dataset generation algorithms in terms of utility and fairness when used for training machine learning models; and (iii) encompasses several different definitions of fairness. Our findings demonstrate that marginal-based synthetic data generators surpass GAN-based ones regarding model training utility for tabular data. Indeed, we show that models trained using data generated by marginal-based algorithms can exhibit similar utility to models trained using real data. Our analysis also reveals that the marginal-based synthetic data generated using AIM and MWEM PGM algorithms can train models that simultaneously achieve utility and fairness characteristics close to those obtained by models trained with real data.
MeSH term(s)	Algorithms ; Health Facilities ; Interior Design and Furnishings ; Knowledge ; Machine Learning
Language	English
Publishing date	2024-02-05
Publishing country	United States
Document type	Journal Article
ZDB-ID	2267670-3
ISSN (online)	1932-6203
DOI	10.1371/journal.pone.0297271
Database	MEDical Literature Analysis and Retrieval System OnLINE
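
Illustrative sketch (not from the paper): the abstract says the analysis "encompasses several different definitions of fairness" but this record does not list them. Assuming two common group-fairness definitions for a binary classifier with a binary protected attribute, demographic parity difference and equal opportunity difference, the metrics could be computed as follows. The function names and toy data are hypothetical.

```python
from typing import Sequence


def _positive_rate(preds: Sequence[int]) -> float:
    """Fraction of predictions equal to 1; 0.0 if the slice is empty."""
    return sum(preds) / len(preds) if preds else 0.0


def demographic_parity_difference(y_pred: Sequence[int], group: Sequence[int]) -> float:
    """|P(y_hat = 1 | group = 1) - P(y_hat = 1 | group = 0)|."""
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    return abs(_positive_rate(g1) - _positive_rate(g0))


def equal_opportunity_difference(
    y_true: Sequence[int], y_pred: Sequence[int], group: Sequence[int]
) -> float:
    """Absolute gap in true-positive rate between groups (rows with y_true == 1 only)."""
    g1 = [p for t, p, g in zip(y_true, y_pred, group) if t == 1 and g == 1]
    g0 = [p for t, p, g in zip(y_true, y_pred, group) if t == 1 and g == 0]
    return abs(_positive_rate(g1) - _positive_rate(g0))


if __name__ == "__main__":
    # Hypothetical toy labels, predictions and binary protected attribute.
    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
    group = [1, 1, 1, 1, 0, 0, 0, 0]
    print("demographic parity difference:", demographic_parity_difference(y_pred, group))
    print("equal opportunity difference:", equal_opportunity_difference(y_true, y_pred, group))
```

In a setup like the one described, such metrics would be compared between a model trained on real data and one trained on synthetic data. Which definitions the authors actually use, and how they compute them, is not stated in this record.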

Article ; Online: Dynamic Grammar Pruning for Program Size Reduction in Symbolic Regression.

Ali, Muhammad Sarmad / Kshirsagar, Meghana / Naredo, Enrique / Ryan, Conor

SN computer science

2023 Volume 4, Issue 4, Page(s) 402

Abstract	Grammar is a key input in grammar-based genetic programming. Grammar design not only influences performance, but also program size. However, grammar design and the choice of productions often require expert input as no automatic approach exists. This research work discusses our approach to automatically reduce a bloated grammar. By utilizing a simple Production Ranking mechanism, we identify productions which are less useful and dynamically prune those to channel evolutionary search towards better (smaller) solutions. Our objective in this work was program size reduction without compromising generalization performance. We tested our approach on 13 standard symbolic regression datasets with Grammatical Evolution. Using a grammar embodying a well-defined function set as a baseline, we compare effective genome length and test performance with our approach. Dynamic grammar pruning achieved significantly better genome lengths for all datasets, while significantly improving generalization performance on three datasets, although it worsened in five datasets. When we utilized linear scaling during the production ranking stages (the first 20 generations) the results dramatically improved. Not only were the programs smaller in all datasets, but generalization scores were also significantly better than the baseline in 6 out of 13 datasets, and comparable in the rest. When the baseline was also linearly scaled, the program size was still smaller with the Production Ranking approach, while generalization scores dropped in only three datasets without any significant compromise in the rest.
Language	English
Publishing date	2023-05-17
Publishing country	Singapore
Document type	Journal Article
ISSN (online)	2661-8907
DOI	10.1007/s42979-023-01840-y
Database	MEDical Literature Analysis and Retrieval System OnLINE
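
Illustrative sketch, assuming the paper uses the standard least-squares form of linear scaling from the symbolic-regression literature (this record does not give the formula): each evolved program's raw output f(x) is rescaled to a + b * f(x), with intercept a and slope b fitted in closed form, so that fitness rewards the shape of f rather than its offset and scale. The helper names and example data are hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def linear_scaling(outputs: Sequence[float], targets: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept a and slope b such that a + b * f(x) best fits y."""
    f_mean, y_mean = fmean(outputs), fmean(targets)
    var_f = sum((f - f_mean) ** 2 for f in outputs)
    if var_f == 0.0:
        return y_mean, 0.0  # constant program: best fit is the target mean
    cov = sum((f - f_mean) * (y - y_mean) for f, y in zip(outputs, targets))
    b = cov / var_f
    a = y_mean - b * f_mean
    return a, b


def raw_mse(outputs: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error of the unscaled program output."""
    return fmean((y - f) ** 2 for f, y in zip(outputs, targets))


def scaled_mse(outputs: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error after applying the fitted linear scaling."""
    a, b = linear_scaling(outputs, targets)
    return fmean((y - (a + b * f)) ** 2 for f, y in zip(outputs, targets))


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    program_out = [x * x for x in xs]          # hypothetical evolved program: x^2
    targets = [3.0 + 0.5 * x * x for x in xs]  # hypothetical target: 3 + 0.5 x^2
    print("raw MSE:", raw_mse(program_out, targets))
    print("scaled MSE:", scaled_mse(program_out, targets))
```

The design point is that a program with the right functional form but the wrong constants is scored as nearly perfect after scaling, which lets search focus on structure. How the authors combine scaling with Production Ranking is not described here.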

Article ; Online: BindVAE: Dirichlet variational autoencoders for de novo motif discovery from accessible chromatin.

Kshirsagar, Meghana / Yuan, Han / Ferres, Juan Lavista / Leslie, Christina

Genome biology

2022 Volume 23, Issue 1, Page(s) 174

Abstract	We present a novel unsupervised deep learning approach called BindVAE, based on Dirichlet variational autoencoders, for jointly decoding multiple TF binding signals from open chromatin regions. BindVAE can disentangle an input DNA sequence into distinct latent factors that encode cell-type specific in vivo binding signals for individual TFs, composite patterns for TFs involved in cooperative binding, and genomic context surrounding the binding sites. On the task of retrieving the motifs of expressed TFs in a given cell type, BindVAE is competitive with existing motif discovery approaches.
MeSH term(s)	Binding Sites/genetics ; Chromatin ; Chromatin Immunoprecipitation ; Nucleotide Motifs ; Protein Binding/genetics ; Transcription Factors/metabolism
Chemical Substances	Chromatin ; Transcription Factors
Language	English
Publishing date	2022-08-15
Publishing country	England
Document type	Journal Article ; Research Support, Non-U.S. Gov't ; Research Support, N.I.H., Extramural
ZDB-ID	2040529-7
ISSN (online)	1474-760X
DOI	10.1186/s13059-022-02723-w
Database	MEDical Literature Analysis and Retrieval System OnLINE

Article ; Online: Design of a cryptographically secure pseudo random number generator with grammatical evolution.

Ryan, Conor / Kshirsagar, Meghana / Vaidya, Gauri / Cunningham, Andrew / Sivaraman, R

Scientific reports

2022 Volume 12, Issue 1, Page(s) 8602

Abstract	This work investigates the potential for using Grammatical Evolution (GE) to generate an initial seed for the construction of a pseudo-random number generator (PRNG) and cryptographically secure (CS) PRNG. We demonstrate the suitability of GE as an entropy source and show that the initial seeds exhibit an average entropy value of 7.940560934 for 8-bit entropy, which is close to the ideal value of 8. We then construct two random number generators, GE-PRNG and GE-CSPRNG, both of which employ these initial seeds. We use Monte Carlo simulations to establish the efficacy of the GE-PRNG using an experimental setup designed to estimate the value for pi, in which 100,000,000 random numbers were generated by our system. This returned the value of pi of 3.146564000, which is precise up to six decimal digits for the actual value of pi. We propose a new approach called control_flow_incrementor to generate cryptographically secure random numbers. The random numbers generated with CSPRNG meet the prescribed National Institute of Standards and Technology SP800-22 and the Diehard statistical test requirements. We also present a computational performance analysis of GE-CSPRNG demonstrating its potential to be used in industrial applications.
MeSH term(s)	Monte Carlo Method
Language	English
Publishing date	2022-05-21
Publishing country	England
Document type	Journal Article ; Research Support, Non-U.S. Gov't
ZDB-ID	2615211-3
ISSN (online)	2045-2322
DOI	10.1038/s41598-022-11613-x
Database	MEDical Literature Analysis and Retrieval System OnLINE
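
Illustrative sketch (not the authors' GE-PRNG): the two checks described in the abstract, 8-bit Shannon entropy of seed bytes (ideal value 8) and a Monte Carlo estimate of pi from points in the unit square, can be expressed as follows. Python's `random.Random` stands in for the GE-derived seed source, which this record does not describe, and far fewer points are drawn than the 100,000,000 reported in the abstract.

```python
import math
import random
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; the maximum for 8-bit symbols is 8.0."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # iterating bytes yields ints 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def estimate_pi(num_points: int, rng: random.Random) -> float:
    """Monte Carlo estimate: 4 * fraction of uniform points inside the unit quarter circle."""
    inside = 0
    for _ in range(num_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_points


if __name__ == "__main__":
    rng = random.Random(12345)  # stand-in generator, NOT the GE-based seed source
    sample = bytes(rng.getrandbits(8) for _ in range(100_000))
    print(f"entropy: {byte_entropy(sample):.4f} bits/byte")
    print(f"pi estimate: {estimate_pi(1_000_000, rng):.6f}")
```

`random.Random` is not cryptographically secure; in Python a CSPRNG would draw from `secrets` or `os.urandom` instead. These two checks are also only a sanity test: passing the NIST SP800-22 and Diehard suites mentioned in the abstract involves many additional statistical tests.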

Article: Democratizing Protein Language Models with Parameter-Efficient Fine-Tuning.

Sledzieski, Samuel / Kshirsagar, Meghana / Baek, Minkyung / Berger, Bonnie / Dodhia, Rahul / Ferres, Juan Lavista

bioRxiv : the preprint server for biology

2023

Abstract	Proteomics has been revolutionized by large pre-trained protein language models, which learn unsupervised representations from large corpora of sequences. The parameters of these models are then fine-tuned in a supervised setting to tailor the model to a specific downstream task. However, as model size increases, the computational and memory footprint of fine-tuning becomes a barrier for many research groups. In the field of natural language processing, which has seen a similar explosion in the size of models, these challenges have been addressed by methods for parameter-efficient fine-tuning (PEFT). In this work, we introduce parameter-efficient fine-tuning methods to proteomics. Using the parameter-efficient method LoRA, we train new models for two important proteomic tasks: predicting protein-protein interactions (PPI) and predicting the symmetry of homooligomers. We show that for homooligomer symmetry prediction, these approaches achieve performance competitive with traditional fine-tuning while requiring reduced memory and using three orders of magnitude fewer parameters. On the PPI prediction task, we surprisingly find that PEFT models actually outperform traditional fine-tuning while using two orders of magnitude fewer parameters. Here, we go even further to show that freezing the parameters of the language model and training only a classification head also outperforms fine-tuning, using five orders of magnitude fewer parameters, and that both of these models outperform state-of-the-art PPI prediction methods with substantially reduced compute. We also demonstrate that PEFT is robust to variations in training hyper-parameters, and elucidate where best practices for PEFT in proteomics differ from those in natural language processing. Thus, we provide a blueprint to democratize the power of protein language model tuning to groups which have limited computational resources.
Language	English
Publishing date	2023-11-10
Publishing country	United States
Document type	Preprint
DOI	10.1101/2023.11.09.566187
Database	MEDical Literature Analysis and Retrieval System OnLINE
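
Illustrative sketch, assuming the standard LoRA formulation: a frozen weight matrix W is augmented with a trainable low-rank product B A of rank r, scaled by alpha / r, so only r * (d_in + d_out) parameters are trained instead of d_in * d_out. The hidden size (1280) and ranks below are hypothetical and not taken from the preprint; the counts are per weight matrix, whereas the abstract's "orders of magnitude" refer to whole models.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of a LoRA update B @ A (B: d_out x r, A: r x d_in)."""
    return rank * (d_in + d_out)


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float, rank: int) -> Vector:
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B would be trained."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    d = 1280  # hypothetical hidden size, not taken from the preprint
    for r in (1, 4, 8):
        full = full_finetune_params(d, d)
        lora = lora_params(d, d, r)
        print(f"rank {r}: {lora:,} trainable vs {full:,} ({full / lora:.0f}x fewer)")
```

In the original LoRA formulation B is initialized to zero, so the adapted model initially behaves exactly like the pretrained one. The "frozen language model plus classification head" variant in the abstract goes further and trains no adapter weights inside the language model at all.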

Article ; Online: Design of a cryptographically secure pseudo random number generator with grammatical evolution

Conor Ryan / Meghana Kshirsagar / Gauri Vaidya / Andrew Cunningham / R. Sivaraman

Scientific Reports, Vol 12, Iss 1, Pp 1-10

2022

Abstract	This work investigates the potential for using Grammatical Evolution (GE) to generate an initial seed for the construction of a pseudo-random number generator (PRNG) and cryptographically secure (CS) PRNG. We demonstrate the suitability of GE as an entropy source and show that the initial seeds exhibit an average entropy value of 7.940560934 for 8-bit entropy, which is close to the ideal value of 8. We then construct two random number generators, GE-PRNG and GE-CSPRNG, both of which employ these initial seeds. We use Monte Carlo simulations to establish the efficacy of the GE-PRNG using an experimental setup designed to estimate the value for pi, in which 100,000,000 random numbers were generated by our system. This returned the value of pi of 3.146564000, which is precise up to six decimal digits for the actual value of pi. We propose a new approach called control_flow_incrementor to generate cryptographically secure random numbers. The random numbers generated with CSPRNG meet the prescribed National Institute of Standards and Technology SP800-22 and the Diehard statistical test requirements. We also present a computational performance analysis of GE-CSPRNG demonstrating its potential to be used in industrial applications.
Keywords	Medicine ; R ; Science ; Q
Language	English
Publishing date	2022-05-01
Publisher	Nature Portfolio
Document type	Article ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Article: BindVAE: Dirichlet variational autoencoders for de novo motif discovery from accessible chromatin

Kshirsagar, Meghana / Yuan, Han / Ferres, Juan Lavista / Leslie, Christina

Genome biology. 2022 Dec., v. 23, no. 1

2022

Abstract	We present a novel unsupervised deep learning approach called BindVAE, based on Dirichlet variational autoencoders, for jointly decoding multiple TF binding signals from open chromatin regions. BindVAE can disentangle an input DNA sequence into distinct latent factors that encode cell-type specific in vivo binding signals for individual TFs, composite patterns for TFs involved in cooperative binding, and genomic context surrounding the binding sites. On the task of retrieving the motifs of expressed TFs in a given cell type, BindVAE is competitive with existing motif discovery approaches.
Keywords	chromatin ; genome ; genomics ; nucleotide sequences
Language	English
Dates of publication	2022-12
Size	p. 174.
Publishing place	BioMed Central
Document type	Article
ZDB-ID	2040529-7
ISSN	1474-760X
DOI	10.1186/s13059-022-02723-w
Database	NAL-Catalogue (AGRICOLA)

Article ; Online: Predicting locations of cryptic pockets from single protein structures using the PocketMiner graph neural network.

Meller, Artur / Ward, Michael / Borowsky, Jonathan / Kshirsagar, Meghana / Lotthammer, Jeffrey M / Oviedo, Felipe / Ferres, Juan Lavista / Bowman, Gregory R

Nature communications

2023 Volume 14, Issue 1, Page(s) 1177

Abstract	Cryptic pockets expand the scope of drug discovery by enabling targeting of proteins currently considered undruggable because they lack pockets in their ground state structures. However, identifying cryptic pockets is labor-intensive and slow. The ability to accurately and rapidly predict if and where cryptic pockets are likely to form from a structure would greatly accelerate the search for druggable pockets. Here, we present PocketMiner, a graph neural network trained to predict where pockets are likely to open in molecular dynamics simulations. Applying PocketMiner to single structures from a newly curated dataset of 39 experimentally confirmed cryptic pockets demonstrates that it accurately identifies cryptic pockets (ROC-AUC: 0.87) >1,000-fold faster than existing methods. We apply PocketMiner across the human proteome and show that predicted pockets open in simulations, suggesting that over half of proteins thought to lack pockets based on available structures likely contain cryptic pockets, vastly expanding the potentially druggable proteome.
MeSH term(s)	Humans ; Pregnancy ; Female ; Proteome ; Drug Discovery ; Labor, Obstetric ; Molecular Dynamics Simulation ; Neural Networks, Computer
Chemical Substances	Proteome
Language	English
Publishing date	2023-03-01
Publishing country	England
Document type	Journal Article ; Research Support, Non-U.S. Gov't ; Research Support, N.I.H., Extramural ; Research Support, U.S. Gov't, Non-P.H.S.
ZDB-ID	2553671-0
ISSN (online)	2041-1723
DOI	10.1038/s41467-023-36699-3
Database	MEDical Literature Analysis and Retrieval System OnLINE
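
Illustrative sketch of what the reported ROC-AUC (0.87) measures: using the pairwise (Mann-Whitney) interpretation, ROC-AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores below are hypothetical; how PocketMiner's per-residue labels and scores are defined is not stated in this record.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical per-residue labels (1 = residue lines a cryptic pocket) and model scores.
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.5, 0.1, 0.8, 0.3]
    print(f"ROC-AUC: {roc_auc(labels, scores):.3f}")
```

The pairwise loop is O(P * N) and meant only to make the definition concrete; for large datasets a rank-based computation after sorting gives the same value in O(n log n).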

Article ; Online: An epigenetic barrier sets the timing of human neuronal maturation.

Ciceri, Gabriele / Baggiolini, Arianna / Cho, Hyein S / Kshirsagar, Meghana / Benito-Kwiecinski, Silvia / Walsh, Ryan M / Aromolaran, Kelly A / Gonzalez-Hernandez, Alberto J / Munguba, Hermany / Koo, So Yeon / Xu, Nan / Sevilla, Kaylin J / Goldstein, Peter A / Levitz, Joshua / Leslie, Christina S / Koche, Richard P / Studer, Lorenz

Nature

2024 Volume 626, Issue 8000, Page(s) 881–890

Abstract	The pace of human brain development is highly protracted compared with most other species
MeSH term(s)	Adult ; Animals ; Humans ; Mice ; Epigenesis, Genetic ; Gene Expression Regulation, Developmental ; Histocompatibility Antigens/metabolism ; Histone-Lysine N-Methyltransferase/antagonists & inhibitors ; Histone-Lysine N-Methyltransferase/metabolism ; Human Embryonic Stem Cells/cytology ; Human Embryonic Stem Cells/metabolism ; Neural Stem Cells/cytology ; Neural Stem Cells/metabolism ; Neurogenesis/genetics ; Neurons/cytology ; Neurons/metabolism ; Time Factors ; Transcription, Genetic
Chemical Substances	DOT1L protein, human (EC 2.1.1.-) ; EHMT1 protein, human (EC 2.1.1.-) ; EHMT2 protein, human (EC 2.1.1.43) ; EZH2 protein, human (EC 2.1.1.43) ; Histocompatibility Antigens ; Histone-Lysine N-Methyltransferase (EC 2.1.1.43)
Language	English
Publishing date	2024-01-31
Publishing country	England
Document type	Journal Article
ZDB-ID	120714-3
ISSN (print)	0028-0836
ISSN (online)	1476-4687
DOI	10.1038/s41586-023-06984-8
Database	MEDical Literature Analysis and Retrieval System OnLINE

In stock at ZB MED Cologne/Königswinter

Zs.A 26	Location: depending on availability (see holdings information); volumes up to 1994: order articles via the online order form; volumes 1995-2021: reading room (1st floor); volumes from 2022: reading room (ground floor)
Zs.MG 9
Zs.MO 244

Book ; Online: Assessment of Differentially Private Synthetic Data for Utility and Fairness in End-to-End Machine Learning Pipelines for Tabular Data

Pereira, Mayana / Kshirsagar, Meghana / Mukherjee, Sumit / Dodhia, Rahul / Ferres, Juan Lavista / de Sousa, Rafael

2023

Abstract	Differentially private (DP) synthetic data sets are a solution for sharing data while preserving the privacy of individual data providers. Understanding the effects of utilizing DP synthetic data in end-to-end machine learning pipelines impacts areas such as health care and humanitarian action, where data is scarce and regulated by restrictive privacy laws. In this work, we investigate the extent to which synthetic data can replace real, tabular data in machine learning pipelines and identify the most effective synthetic data generation techniques for training and evaluating machine learning models. We investigate the impacts of differentially private synthetic data on downstream classification tasks from the point of view of utility as well as fairness. Our analysis is comprehensive and includes representatives of the two main types of synthetic data generation algorithms: marginal-based and GAN-based. To the best of our knowledge, our work is the first that: (i) proposes a training and evaluation framework that does not assume that real data is available for testing the utility and fairness of machine learning models trained on synthetic data; (ii) presents the most extensive analysis of synthetic data set generation algorithms in terms of utility and fairness when used for training machine learning models; and (iii) encompasses several different definitions of fairness. Our findings demonstrate that marginal-based synthetic data generators surpass GAN-based ones regarding model training utility for tabular data. Indeed, we show that models trained using data generated by marginal-based algorithms can exhibit similar utility to models trained using real data. Our analysis also reveals that the marginal-based synthetic data generator MWEM PGM can train models that simultaneously achieve utility and fairness characteristics close to those obtained by models trained with real data. Comment: arXiv admin note: text overlap with arXiv:2106.10241
Keywords	Computer Science - Machine Learning ; Computer Science - Cryptography and Security
Subject code	006
Publishing date	2023-10-29
Publishing country	United States
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)
