LIVIVO - Search results

Result 1 - 10 of total 58

Article ; Online: Gene Set Summarization using Large Language Models.

Joachimiak, Marcin P / Caufield, J Harry / Harris, Nomi L / Kim, Hyeongsik / Mungall, Christopher J

ArXiv

2023

Abstract	Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpreting gene lists can also be framed as a textual summarization task, enabling the use of Large Language Models (LLMs), potentially utilizing scientific texts directly and avoiding reliance on a KB. We developed SPINDOCTOR (Structured Prompt Interpolation of Natural Language Descriptions of Controlled Terms for Ontology Reporting), a method that uses GPT models to perform gene set function summarization as a complement to standard enrichment analysis. This method can use different sources of gene functional information: (1) structured text derived from curated ontological KB annotations, (2) ontology-free narrative gene summaries, or (3) direct model retrieval. We demonstrate that these methods are able to generate plausible and biologically valid summary GO term lists for gene sets. However, GPT-based approaches are unable to deliver reliable scores or p-values and often return terms that are not statistically significant. Crucially, these methods were rarely able to recapitulate the most precise and informative term from standard enrichment, likely due to an inability to generalize and reason using an ontology. Results are highly nondeterministic, with minor variations in prompt resulting in radically different term lists. Our results show that at this point, LLM-based methods are unsuitable as a replacement for standard term enrichment analysis and that manual curation of ontological assertions remains necessary.
Language	English
Publishing date	2023-05-25
Publishing country	United States
Document type	Preprint
ISSN	2331-8422
ISSN (online)	2331-8422
Database	MEDical Literature Analysis and Retrieval System OnLINE

Order via subito

This service is chargeable due to the Delivery terms set by subito. Orders including an article and supplementary material will be classified as separate orders. In these cases, fees will be demanded for each order.

Inter-library loan at ZB MED

Your chosen title can be delivered directly to ZB MED Cologne location if you are registered as a user at ZB MED Cologne.

Article ; Online: Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning.

Caufield, J Harry / Hegde, Harshad / Emonet, Vincent / Harris, Nomi L / Joachimiak, Marcin P / Matentzoglu, Nicolas / Kim, HyeongSik / Moxon, Sierra / Reese, Justin T / Haendel, Melissa A / Robinson, Peter N / Mungall, Christopher J

Bioinformatics (Oxford, England)

2024 Volume 40, Issue 3

Abstract	Motivation: Creating knowledge bases and ontologies is a time consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrarily complex nested knowledge schemas. Results: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease relationships. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction methods, but greatly surpasses an LLM's native capability of grounding entities with unique identifiers. SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any new training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM. Availability and implementation: SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.
MeSH term(s)	Semantics ; Knowledge Bases ; Databases, Factual
Language	English
Publishing date	2024-02-20
Publishing country	England
Document type	Journal Article
ZDB-ID	1422668-6
ISSN (print)	1367-4803
ISSN (online)	1367-4811
DOI	10.1093/bioinformatics/btae104
Database	MEDical Literature Analysis and Retrieval System OnLINE

In stock of ZB MED Cologne/Königswinter

Zs.A 2374: Show issues

Location:
Depending on availability (see the holdings information)
Volumes up to 1994: order articles via the online order form
Volumes 1995 - 2021: reading room (2nd floor)
Volumes from 2022: reading room (ground floor)

Order via subito

Book ; Online: Gene Set Summarization using Large Language Models

Joachimiak, Marcin P. / Caufield, J. Harry / Harris, Nomi L. / Kim, Hyeongsik / Mungall, Christopher J.

2023

Abstract	Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpreting gene lists can also be framed as a textual summarization task, enabling the use of Large Language Models (LLMs), potentially utilizing scientific texts directly and avoiding reliance on a KB. We developed SPINDOCTOR (Structured Prompt Interpolation of Natural Language Descriptions of Controlled Terms for Ontology Reporting), a method that uses GPT models to perform gene set function summarization as a complement to standard enrichment analysis. This method can use different sources of gene functional information: (1) structured text derived from curated ontological KB annotations, (2) ontology-free narrative gene summaries, or (3) direct model retrieval. We demonstrate that these methods are able to generate plausible and biologically valid summary GO term lists for gene sets. However, GPT-based approaches are unable to deliver reliable scores or p-values and often return terms that are not statistically significant. Crucially, these methods were rarely able to recapitulate the most precise and informative term from standard enrichment, likely due to an inability to generalize and reason using an ontology. Results are highly nondeterministic, with minor variations in prompt resulting in radically different term lists. Our results show that at this point, LLM-based methods are unsuitable as a replacement for standard term enrichment analysis and that manual curation of ontological assertions remains necessary.
Keywords	Quantitative Biology - Genomics ; Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Quantitative Biology - Quantitative Methods
Subject code	004
Publishing date	2023-05-20
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Full text online

Full text

Inter-library loan at ZB MED

Your chosen title can be delivered directly to ZB MED Cologne location if you are registered as a user at ZB MED Cologne.

Article ; Online: BOSC 2022: the first hybrid and 23rd annual Bioinformatics Open Source Conference.

Harris, Nomi L / Hokamp, Karsten / Ménager, Hervé / Munoz-Torres, Monica / Unni, Deepak / Vasilevsky, Nicole / Williams, Jason

F1000Research

2022 Volume 11, Page(s) 1034

Abstract	The 23
MeSH term(s)	Computational Biology ; Congresses as Topic ; Humans ; Systems Biology
Language	English
Publishing date	2022-09-12
Publishing country	England
Document type	Editorial
ZDB-ID	2699932-8
ISSN (online)	2046-1402
DOI	10.12688/f1000research.125043.1
Database	MEDical Literature Analysis and Retrieval System OnLINE

Order via subito

Article ; Online: BOSC 2023, the 24th annual Bioinformatics Open Source Conference.

Harris, Nomi L / Fields, Christopher J / Hokamp, Karsten / Just, Jérémy / Khetani, Radhika / Maia, Jessica / Ménager, Hervé / Munoz-Torres, Monica C / Unni, Deepak / Williams, Jason

F1000Research

2023 Volume 12, Page(s) 1568

Abstract	The 24th annual Bioinformatics Open Source Conference (BOSC 2023) was part of the 2023 conference on Intelligent Systems for Molecular Biology and the European Conference on Computational Biology (ISMB/ECCB 2023). Launched in 2000 and held yearly since, BOSC is the premier meeting covering open-source bioinformatics and open science. Like ISMB 2022, the 2023 meeting was a hybrid conference, with the in-person component hosted in Lyon, France. ISMB/ECCB attracted a near-record number of attendees, with over 2100 in person and about 900 more online. Approximately 200 people participated in BOSC sessions. In addition to 43 talks and 49 posters, BOSC featured two keynotes: Sara El-Gebali, who spoke about "A New Odyssey: Pioneering the Future of Scientific Progress Through Open Collaboration", and Joseph Yracheta, who spoke about "The Dissonance between Scientific Altruism & Capitalist Extraction: The Zero Trust and Federated Data Sovereignty Solution." Once again, a joint session brought together BOSC and the Bio-Ontologies COSI. The conference ended with a panel on Open and Ethical Data Sharing. As in prior years, BOSC was preceded by a CollaborationFest, a collaborative work event that brought together about 40 participants interested in synergistically combining ideas, shaping project plans, developing software, and more.
MeSH term(s)	Humans ; Computational Biology ; Software ; Information Dissemination
Language	English
Publishing date	2023-12-07
Publishing country	England
Document type	Editorial
ZDB-ID	2699932-8
ISSN (online)	2046-1402
DOI	10.12688/f1000research.143015.1
Database	MEDical Literature Analysis and Retrieval System OnLINE

Order via subito

Article ; Online: BOSC 2021, the 22nd Annual Bioinformatics Open Source Conference.

Harris, Nomi L / Cock, Peter J A / Fields, Christopher J / Hokamp, Karsten / Maia, Jessica / Munoz-Torres, Monica / Sharan, Malvika / Williams, Jason

F1000Research

2021 Volume 10

Abstract	The 22nd annual Bioinformatics Open Source Conference (BOSC 2021, open-bio.org/events/bosc-2021/) was held online as a track of the 2021 Intelligent Systems for Molecular Biology / European Conference on Computational Biology (ISMB/ECCB) conference. Launched in 2000 and held every year since, BOSC is the premier meeting covering topics related to open source software and open science in bioinformatics. In 2020, BOSC partnered with the Galaxy Community Conference to form the Bioinformatics Community Conference (BCC2020); that was the first BOSC to be held online. This year, BOSC returned to its roots as part of ISMB/ECCB 2021. As in 2020, the Covid-19 pandemic made it impossible to hold the conference in person, so ISMB/ECCB 2021 took place as an online meeting attended by over 2000 people from 79 countries. Nearly 200 people participated in BOSC sessions, which included 27 talks reviewed and selected from submitted abstracts, and three invited keynote talks representing a range of global perspectives on the role of open science and open source in driving research and inclusivity in the biosciences, one of which was presented in French with English subtitles.
MeSH term(s)	Computational Biology ; Humans ; Pandemics ; Software
Language	English
Publishing date	2021-10-18
Publishing country	England
Document type	Congress ; Editorial
ZDB-ID	2699932-8
ISSN (online)	2046-1402
DOI	10.12688/f1000research.74074.1
Database	MEDical Literature Analysis and Retrieval System OnLINE

Order via subito

Article ; Online: The 21st annual Bioinformatics Open Source Conference (BOSC 2020, part of BCC2020).

Harris, Nomi L / Cock, Peter J A / Fields, Christopher J / Hokamp, Karsten / Maia, Jessica / Munoz-Torres, Monica / Taschuk, Morgan / Yehudi, Yo

F1000Research

2020 Volume 9

Abstract	Launched in 2000 and held every year since, the Bioinformatics Open Source Conference (BOSC) is a volunteer-run meeting coordinated by the Open Bioinformatics Foundation (OBF) that covers open source software development and open science in bioinformatics. Most years, BOSC has been part of the Intelligent Systems for Molecular Biology (ISMB) conference, but in 2018, and again in 2020, BOSC partnered with the Galaxy Community Conference (GCC). This year's combined BOSC + GCC conference was called the Bioinformatics Community Conference (BCC2020, bcc2020.github.io). Originally slated to take place in Toronto, Canada, BCC2020 was moved online due to COVID-19. The meeting started with a wide array of training sessions; continued with a main program of keynote presentations, talks, posters, Birds of a Feather, and more; and ended with four days of collaboration (CoFest). Efforts to make the meeting accessible and inclusive included very low registration fees, talks presented twice a day, and closed captioning for all videos. More than 800 people from 61 countries registered for at least one part of the meeting, which was held mostly in the Remo.co video-conferencing platform.
MeSH term(s)	Canada ; Computational Biology ; Congresses as Topic ; Humans
Keywords	covid19
Language	English
Publishing date	2020-09-21
Publishing country	England
Document type	Editorial
ZDB-ID	2699932-8
ISSN (online)	2046-1402
DOI	10.12688/f1000research.26498.1
Database	MEDical Literature Analysis and Retrieval System OnLINE

Order via subito

Book ; Online: Structured prompt interrogation and recursive extraction of semantics (SPIRES)

Caufield, J. Harry / Hegde, Harshad / Emonet, Vincent / Harris, Nomi L. / Joachimiak, Marcin P. / Matentzoglu, Nicolas / Kim, HyeongSik / Moxon, Sierra A. T. / Reese, Justin T. / Haendel, Melissa A. / Robinson, Peter N. / Mungall, Christopher J.

A method for populating knowledge bases using zero-shot learning

2023

Abstract	Creating knowledge bases and ontologies is a time-consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrarily complex nested knowledge schemas. Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning (ZSL) and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against GPT-3+ to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for all matched elements. We present examples of use of SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease causation graphs. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction (RE) methods, but has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM. SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.
Keywords	Computer Science - Artificial Intelligence ; Computer Science - Machine Learning
Subject code	004
Publishing date	2023-04-05
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Full text online

Full text

Inter-library loan at ZB MED

Your chosen title can be delivered directly to ZB MED Cologne location if you are registered as a user at ZB MED Cologne.

Book ; Online: MapperGPT

Matentzoglu, Nicolas / Caufield, J. Harry / Hegde, Harshad B. / Reese, Justin T. / Moxon, Sierra / Kim, Hyeongsik / Harris, Nomi L. / Haendel, Melissa A / Mungall, Christopher J.

Large Language Models for Linking and Mapping Entities

2023

Abstract	Aligning terminological resources, including ontologies, controlled vocabularies, taxonomies, and value sets is a critical part of data integration in many domains such as healthcare, chemistry, and biomedical research. Entity mapping is the process of determining correspondences between entities across these resources, such as gene identifiers, disease concepts, or chemical entity identifiers. Many tools have been developed to compute such mappings based on common structural features and lexical information such as labels and synonyms. Lexical approaches in particular often provide very high recall, but low precision, due to lexical ambiguity. As a consequence of this, mapping efforts often resort to a labor intensive manual mapping refinement through a human curator. Large Language Models (LLMs), such as the ones employed by ChatGPT, have generalizable abilities to perform a wide range of tasks, including question-answering and information extraction. Here we present MapperGPT, an approach that uses LLMs to review and refine mapping relationships as a post-processing step, in concert with existing high-recall methods that are based on lexical and structural heuristics. We evaluated MapperGPT on a series of alignment tasks from different domains, including anatomy, developmental biology, and renal diseases. We devised a collection of tasks that are designed to be particularly challenging for lexical methods. We show that when used in combination with high-recall methods, MapperGPT can provide a substantial improvement in accuracy, beating state-of-the-art (SOTA) methods such as LogMap.
Keywords	Computer Science - Computation and Language ; Computer Science - Artificial Intelligence
Subject code	401
Publishing date	2023-10-05
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Full text online

Full text

Inter-library loan at ZB MED

Your chosen title can be delivered directly to ZB MED Cologne location if you are registered as a user at ZB MED Cologne.

Article ; Online: ROBOT: A Tool for Automating Ontology Workflows.

Jackson, Rebecca C / Balhoff, James P / Douglass, Eric / Harris, Nomi L / Mungall, Christopher J / Overton, James A

BMC bioinformatics

2019 Volume 20, Issue 1, Page(s) 407

Abstract	Background: Ontologies are invaluable in the life sciences, but building and maintaining ontologies often requires a challenging number of distinct tasks such as running automated reasoners and quality control checks, extracting dependencies and application-specific subsets, generating standard reports, and generating release files in multiple formats. Similar to more general software development, automation is the key to executing and managing these tasks effectively and to releasing more robust products in standard forms. For ontologies using the Web Ontology Language (OWL), the OWL API Java library is the foundation for a range of software tools, including the Protégé ontology editor. In the Open Biological and Biomedical Ontologies (OBO) community, we recognized the need to package a wide range of low-level OWL API functionality into a library of common higher-level operations and to make those operations available as a command-line tool. Results: ROBOT (a recursive acronym for "ROBOT is an OBO Tool") is an open source library and command-line tool for automating ontology development tasks. The library can be called from any programming language that runs on the Java Virtual Machine (JVM). Most usage is through the command-line tool, which runs on macOS, Linux, and Windows. ROBOT provides ontology processing commands for a variety of tasks, including commands for converting formats, running a reasoner, creating import modules, running reports, and various other tasks. These commands can be combined into larger workflows using a separate task execution system such as GNU Make, and workflows can be automatically executed within continuous integration systems. Conclusions: ROBOT supports automation of a wide range of ontology development tasks, focusing on OBO conventions. It packages common high-level ontology development functionality into a convenient library, and makes it easy to configure, combine, and execute individual tasks in comprehensive, automated workflows. This helps ontology developers to efficiently create, maintain, and release high-quality ontologies, so that they can spend more time focusing on development tasks. It also helps guarantee that released ontologies are free of certain types of logical errors and conform to standard quality control checks, increasing the overall robustness and efficiency of the ontology development lifecycle.
MeSH term(s)	Biological Ontologies ; Disease ; Humans ; Programming Languages ; Software ; Workflow
Language	English
Publishing date	2019-07-29
Publishing country	England
Document type	Journal Article
ZDB-ID	2041484-5
ISSN (online)	1471-2105
DOI	10.1186/s12859-019-3002-3
Database	MEDical Literature Analysis and Retrieval System OnLINE

Order via subito
