LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 58

  1. Article ; Online: Gene Set Summarization using Large Language Models.

    Joachimiak, Marcin P / Caufield, J Harry / Harris, Nomi L / Kim, Hyeongsik / Mungall, Christopher J

    ArXiv

    2023  

    Abstract Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpreting gene lists can also be framed as a textual summarization task, enabling the use of Large Language Models (LLMs), potentially utilizing scientific texts directly and avoiding reliance on a KB. We developed SPINDOCTOR (Structured Prompt Interpolation of Natural Language Descriptions of Controlled Terms for Ontology Reporting), a method that uses GPT models to perform gene set function summarization as a complement to standard enrichment analysis. This method can use different sources of gene functional information: (1) structured text derived from curated ontological KB annotations, (2) ontology-free narrative gene summaries, or (3) direct model retrieval. We demonstrate that these methods are able to generate plausible and biologically valid summary GO term lists for gene sets. However, GPT-based approaches are unable to deliver reliable scores or p-values and often return terms that are not statistically significant. Crucially, these methods were rarely able to recapitulate the most precise and informative term from standard enrichment, likely due to an inability to generalize and reason using an ontology. Results are highly nondeterministic, with minor variations in prompt resulting in radically different term lists. Our results show that at this point, LLM-based methods are unsuitable as a replacement for standard term enrichment analysis and that manual curation of ontological assertions remains necessary.
    Language English
    Publishing date 2023-05-25
    Publishing country United States
    Document type Preprint
    ISSN 2331-8422
    ISSN (online) 2331-8422
    Database MEDical Literature Analysis and Retrieval System OnLINE
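
    The abstract above contrasts LLM-generated summaries with standard statistical enrichment analysis. As a minimal sketch of what such a test computes, assuming a one-sided hypergeometric test for over-representation (a common choice, not necessarily the exact method used in the paper), the snippet below scores each term by the probability of observing at least the seen overlap between a gene set and the genes annotated to that term. The gene and term names are hypothetical toy data, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k: int, population: int, annotated: int, drawn: int) -> float:
    """P(X >= k) where X ~ Hypergeometric(population, annotated, drawn)."""
    upper = min(annotated, drawn)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, drawn)


def enrichment(
    study: set[str], background: set[str], annotations: dict[str, set[str]]
) -> list[tuple[str, int, float]]:
    """Return (term, overlap, p-value) tuples sorted by ascending p-value."""
    study = study & background  # only background genes can be counted
    results = []
    for term, genes in annotations.items():
        term_genes = genes & background
        overlap = len(term_genes & study)
        if overlap == 0:
            continue  # a term with no overlap cannot be over-represented
        p = hypergeom_sf(overlap, len(background), len(term_genes), len(study))
        results.append((term, overlap, p))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Hypothetical toy data: 20 background genes, a 5-gene study set, two terms.
    background = {f"G{i}" for i in range(1, 21)}
    study = {"G1", "G2", "G3", "G4", "G5"}
    annotations = {
        "term_A": {"G1", "G2", "G3", "G4", "G6"},
        "term_B": {"G1"} | {f"G{i}" for i in range(7, 16)},
    }
    for term, overlap, p in enrichment(study, background, annotations):
        print(f"{term}\toverlap={overlap}\tp={p:.4g}")
```

    Here term_A (4 of its 5 genes in the study set) ranks first with a small p-value, while term_B (1 of 10) is not significant. This is the kind of scored, ranked output that, according to the abstract, the GPT-based approaches could not reliably reproduce.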

  2. Article ; Online: Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning.

    Caufield, J Harry / Hegde, Harshad / Emonet, Vincent / Harris, Nomi L / Joachimiak, Marcin P / Matentzoglu, Nicolas / Kim, HyeongSik / Moxon, Sierra / Reese, Justin T / Haendel, Melissa A / Robinson, Peter N / Mungall, Christopher J

    Bioinformatics (Oxford, England)

    2024  Volume 40, Issue 3

    Abstract Motivation: Creating knowledge bases and ontologies is a time consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrarily complex nested knowledge schemas.
    Results: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease relationships. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction methods, but greatly surpasses an LLM's native capability of grounding entities with unique identifiers. SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any new training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM.
    Availability and implementation: SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.
    MeSH term(s) Semantics ; Knowledge Bases ; Databases, Factual
    Language English
    Publishing date 2024-02-20
    Publishing country England
    Document type Journal Article
    ZDB-ID 1422668-6
    ISSN 1367-4811 ; 1367-4803
    ISSN (online) 1367-4811
    ISSN 1367-4803
    DOI 10.1093/bioinformatics/btae104
    Database MEDical Literature Analysis and Retrieval System OnLINE
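
    The SPIRES abstract describes three ideas: prompting an LLM against a user-defined schema, recursing into nested parts of that schema, and grounding extracted strings to ontology identifiers. The following toy sketch illustrates only that general pattern. It is not OntoGPT's code or API; the schema format, prompt wording, the `fake_llm` stub (which returns canned answers instead of calling a model), the lookup-table `lexicon`, and the `EX:` identifiers are all hypothetical.

```python
from typing import Callable, Optional


def build_prompt(schema: dict, text: str) -> str:
    """Ask for one 'field: value' line per schema field."""
    fields = "\n".join(f"{name}: <value>" for name in schema)
    return (
        "Extract the following fields from the text. "
        "Separate multiple values with ';'.\n"
        f"{fields}\n\nText:\n{text}\n"
    )


def parse_response(response: str, schema: dict) -> dict[str, str]:
    """Keep only 'key: value' lines whose key is a field of the schema."""
    parsed = {}
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in schema:
            parsed[key.strip()] = value.strip()
    return parsed


def extract(
    schema: dict, text: str, llm: Callable[[str], str], lexicon: dict[str, str]
) -> dict:
    """Recursively interrogate `llm` so that the result mirrors `schema`."""
    raw = parse_response(llm(build_prompt(schema, text)), schema)
    result = {}
    for field, spec in schema.items():
        value = raw.get(field)
        if not value:
            continue
        if isinstance(spec, dict):
            # Nested class: prompt again, using the returned span as input text.
            result[field] = extract(spec, value, llm, lexicon)
        else:
            # Leaf: split multiple values and ground each one; the lexicon is a
            # hypothetical stand-in for an ontology/vocabulary lookup service.
            grounded: list[tuple[str, Optional[str]]] = [
                (v.strip(), lexicon.get(v.strip().lower()))
                for v in value.split(";")
                if v.strip()
            ]
            result[field] = grounded
    return result


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns canned answers."""
    if "action: <value>" in prompt:
        return "action: mix\ningredients: flour; egg"
    return "name: pancake\nsteps: mix the flour and the egg"


if __name__ == "__main__":
    recipe_schema = {"name": "str", "steps": {"action": "str", "ingredients": "str"}}
    lexicon = {"mix": "EX:0100", "flour": "EX:0001", "egg": "EX:0002"}
    print(extract(recipe_schema, "Pancakes: mix flour with an egg.", fake_llm, lexicon))
```

    Values that the lexicon cannot ground (here "pancake") keep a `None` identifier, which mirrors the abstract's point that grounding to identifiers is a separate step from the LLM's free-text answer.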

  3. Book ; Online: Gene Set Summarization using Large Language Models

    Joachimiak, Marcin P. / Caufield, J. Harry / Harris, Nomi L. / Kim, Hyeongsik / Mungall, Christopher J.

    2023  

    Abstract Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpreting gene lists can also be framed as a textual summarization task, enabling the use of Large Language Models (LLMs), potentially utilizing scientific texts directly and avoiding reliance on a KB. We developed SPINDOCTOR (Structured Prompt Interpolation of Natural Language Descriptions of Controlled Terms for Ontology Reporting), a method that uses GPT models to perform gene set function summarization as a complement to standard enrichment analysis. This method can use different sources of gene functional information: (1) structured text derived from curated ontological KB annotations, (2) ontology-free narrative gene summaries, or (3) direct model retrieval. We demonstrate that these methods are able to generate plausible and biologically valid summary GO term lists for gene sets. However, GPT-based approaches are unable to deliver reliable scores or p-values and often return terms that are not statistically significant. Crucially, these methods were rarely able to recapitulate the most precise and informative term from standard enrichment, likely due to an inability to generalize and reason using an ontology. Results are highly nondeterministic, with minor variations in prompt resulting in radically different term lists. Our results show that at this point, LLM-based methods are unsuitable as a replacement for standard term enrichment analysis and that manual curation of ontological assertions remains necessary.
    Keywords Quantitative Biology - Genomics ; Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Quantitative Biology - Quantitative Methods
    Subject code 004
    Publishing date 2023-05-20
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  4. Article ; Online: BOSC 2022: the first hybrid and 23rd annual Bioinformatics Open Source Conference.

    Harris, Nomi L / Hokamp, Karsten / Ménager, Hervé / Munoz-Torres, Monica / Unni, Deepak / Vasilevsky, Nicole / Williams, Jason

    F1000Research

    2022  Volume 11, Page(s) 1034

    Abstract The 23rd ...
    MeSH term(s) Computational Biology ; Congresses as Topic ; Humans ; Systems Biology
    Language English
    Publishing date 2022-09-12
    Publishing country England
    Document type Editorial
    ZDB-ID 2699932-8
    ISSN 2046-1402 ; 2046-1402
    ISSN (online) 2046-1402
    ISSN 2046-1402
    DOI 10.12688/f1000research.125043.1
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: BOSC 2023, the 24th annual Bioinformatics Open Source Conference.

    Harris, Nomi L / Fields, Christopher J / Hokamp, Karsten / Just, Jérémy / Khetani, Radhika / Maia, Jessica / Ménager, Hervé / Munoz-Torres, Monica C / Unni, Deepak / Williams, Jason

    F1000Research

    2023  Volume 12, Page(s) 1568

    Abstract The 24th annual Bioinformatics Open Source Conference (BOSC 2023) was part of the 2023 conference on Intelligent Systems for Molecular Biology and the European Conference on Computational Biology (ISMB/ECCB 2023). Launched in 2000 and held yearly since, BOSC is the premier meeting covering open-source bioinformatics and open science. Like ISMB 2022, the 2023 meeting was a hybrid conference, with the in-person component hosted in Lyon, France. ISMB/ECCB attracted a near-record number of attendees, with over 2100 in person and about 900 more online. Approximately 200 people participated in BOSC sessions. In addition to 43 talks and 49 posters, BOSC featured two keynotes: Sara El-Gebali, who spoke about "A New Odyssey: Pioneering the Future of Scientific Progress Through Open Collaboration", and Joseph Yracheta, who spoke about "The Dissonance between Scientific Altruism & Capitalist Extraction: The Zero Trust and Federated Data Sovereignty Solution." Once again, a joint session brought together BOSC and the Bio-Ontologies COSI. The conference ended with a panel on Open and Ethical Data Sharing. As in prior years, BOSC was preceded by a CollaborationFest, a collaborative work event that brought together about 40 participants interested in synergistically combining ideas, shaping project plans, developing software, and more.
    MeSH term(s) Humans ; Computational Biology ; Software ; Information Dissemination
    Language English
    Publishing date 2023-12-07
    Publishing country England
    Document type Editorial
    ZDB-ID 2699932-8
    ISSN 2046-1402 ; 2046-1402
    ISSN (online) 2046-1402
    ISSN 2046-1402
    DOI 10.12688/f1000research.143015.1
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Article ; Online: BOSC 2021, the 22nd Annual Bioinformatics Open Source Conference.

    Harris, Nomi L / Cock, Peter J A / Fields, Christopher J / Hokamp, Karsten / Maia, Jessica / Munoz-Torres, Monica / Sharan, Malvika / Williams, Jason

    F1000Research

    2021  Volume 10

    Abstract The 22nd annual Bioinformatics Open Source Conference (BOSC 2021, open-bio.org/events/bosc-2021/) was held online as a track of the 2021 Intelligent Systems for Molecular Biology / European Conference on Computational Biology (ISMB/ECCB) conference. Launched in 2000 and held every year since, BOSC is the premier meeting covering topics related to open source software and open science in bioinformatics. In 2020, BOSC partnered with the Galaxy Community Conference to form the Bioinformatics Community Conference (BCC2020); that was the first BOSC to be held online. This year, BOSC returned to its roots as part of ISMB/ECCB 2021. As in 2020, the Covid-19 pandemic made it impossible to hold the conference in person, so ISMB/ECCB 2021 took place as an online meeting attended by over 2000 people from 79 countries. Nearly 200 people participated in BOSC sessions, which included 27 talks reviewed and selected from submitted abstracts, and three invited keynote talks representing a range of global perspectives on the role of open science and open source in driving research and inclusivity in the biosciences, one of which was presented in French with English subtitles.
    MeSH term(s) Computational Biology ; Humans ; Pandemics ; Software
    Language English
    Publishing date 2021-10-18
    Publishing country England
    Document type Congress ; Editorial
    ZDB-ID 2699932-8
    ISSN 2046-1402 ; 2046-1402
    ISSN (online) 2046-1402
    ISSN 2046-1402
    DOI 10.12688/f1000research.74074.1
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article ; Online: The 21st annual Bioinformatics Open Source Conference (BOSC 2020, part of BCC2020).

    Harris, Nomi L / Cock, Peter J A / Fields, Christopher J / Hokamp, Karsten / Maia, Jessica / Munoz-Torres, Monica / Taschuk, Morgan / Yehudi, Yo

    F1000Research

    2020  Volume 9

    Abstract Launched in 2000 and held every year since, the Bioinformatics Open Source Conference (BOSC) is a volunteer-run meeting coordinated by the Open Bioinformatics Foundation (OBF) that covers open source software development and open science in bioinformatics. Most years, BOSC has been part of the Intelligent Systems for Molecular Biology (ISMB) conference, but in 2018, and again in 2020, BOSC partnered with the Galaxy Community Conference (GCC). This year's combined BOSC + GCC conference was called the Bioinformatics Community Conference (BCC2020, bcc2020.github.io). Originally slated to take place in Toronto, Canada, BCC2020 was moved online due to COVID-19. The meeting started with a wide array of training sessions; continued with a main program of keynote presentations, talks, posters, Birds of a Feather, and more; and ended with four days of collaboration (CoFest). Efforts to make the meeting accessible and inclusive included very low registration fees, talks presented twice a day, and closed captioning for all videos. More than 800 people from 61 countries registered for at least one part of the meeting, which was held mostly in the Remo.co video-conferencing platform.
    MeSH term(s) Canada ; Computational Biology ; Congresses as Topic ; Humans
    Keywords covid19
    Language English
    Publishing date 2020-09-21
    Publishing country England
    Document type Editorial
    ZDB-ID 2699932-8
    ISSN 2046-1402 ; 2046-1402
    ISSN (online) 2046-1402
    ISSN 2046-1402
    DOI 10.12688/f1000research.26498.1
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Book ; Online: Structured prompt interrogation and recursive extraction of semantics (SPIRES)

    Caufield, J. Harry / Hegde, Harshad / Emonet, Vincent / Harris, Nomi L. / Joachimiak, Marcin P. / Matentzoglu, Nicolas / Kim, HyeongSik / Moxon, Sierra A. T. / Reese, Justin T. / Haendel, Melissa A. / Robinson, Peter N. / Mungall, Christopher J.

    A method for populating knowledge bases using zero-shot learning

    2023  

    Abstract Creating knowledge bases and ontologies is a time consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrarily complex nested knowledge schemas. Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning (ZSL) and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against GPT-3+ to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for all matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease causation graphs. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction (RE) methods, but has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM. SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.
    Keywords Computer Science - Artificial Intelligence ; Computer Science - Machine Learning
    Subject code 004
    Publishing date 2023-04-05
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  9. Book ; Online: MapperGPT

    Matentzoglu, Nicolas / Caufield, J. Harry / Hegde, Harshad B. / Reese, Justin T. / Moxon, Sierra / Kim, Hyeongsik / Harris, Nomi L. / Haendel, Melissa A / Mungall, Christopher J.

    Large Language Models for Linking and Mapping Entities

    2023  

    Abstract Aligning terminological resources, including ontologies, controlled vocabularies, taxonomies, and value sets is a critical part of data integration in many domains such as healthcare, chemistry, and biomedical research. Entity mapping is the process of determining correspondences between entities across these resources, such as gene identifiers, disease concepts, or chemical entity identifiers. Many tools have been developed to compute such mappings based on common structural features and lexical information such as labels and synonyms. Lexical approaches in particular often provide very high recall, but low precision, due to lexical ambiguity. As a consequence of this, mapping efforts often resort to a labor intensive manual mapping refinement through a human curator. Large Language Models (LLMs), such as the ones employed by ChatGPT, have generalizable abilities to perform a wide range of tasks, including question-answering and information extraction. Here we present MapperGPT, an approach that uses LLMs to review and refine mapping relationships as a post-processing step, in concert with existing high-recall methods that are based on lexical and structural heuristics. We evaluated MapperGPT on a series of alignment tasks from different domains, including anatomy, developmental biology, and renal diseases. We devised a collection of tasks that are designed to be particularly challenging for lexical methods. We show that when used in combination with high-recall methods, MapperGPT can provide a substantial improvement in accuracy, beating state-of-the-art (SOTA) methods such as LogMap.
    Keywords Computer Science - Computation and Language ; Computer Science - Artificial Intelligence
    Subject code 401
    Publishing date 2023-10-05
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
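
    The MapperGPT abstract notes that lexical matching on labels and synonyms gives high recall but low precision because of ambiguity, and that a review step can refine the candidates afterwards. The toy sketch below illustrates that two-stage pattern under stated assumptions: candidates come from exact matches on normalized names, and a shared ambiguous synonym ("cortex") produces a false positive. In MapperGPT the review is performed by an LLM; here `make_toy_judge` is a hypothetical rule-based stand-in so the example runs offline, and all identifiers and names are invented.

```python
from collections import defaultdict
from typing import Callable

Pair = tuple[str, str]


def normalize(name: str) -> str:
    return " ".join(name.lower().split())


def lexical_candidates(
    source: dict[str, list[str]], target: dict[str, list[str]]
) -> dict[Pair, set[str]]:
    """Map (source_id, target_id) to the normalized names the two share."""
    index: dict[str, set[str]] = defaultdict(set)  # name -> target IDs
    for tid, names in target.items():
        for name in names:
            index[normalize(name)].add(tid)
    candidates: dict[Pair, set[str]] = defaultdict(set)
    for sid, names in source.items():
        for name in names:
            key = normalize(name)
            for tid in index.get(key, ()):
                candidates[(sid, tid)].add(key)
    return dict(candidates)


def refine(
    candidates: dict[Pair, set[str]], judge: Callable[[Pair, set[str]], bool]
) -> set[Pair]:
    """Post-processing step: keep only the candidates the reviewer accepts."""
    return {pair for pair, shared in candidates.items() if judge(pair, shared)}


def precision_recall(predicted: set[Pair], gold: set[Pair]) -> tuple[float, float]:
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


def make_toy_judge(source: dict[str, list[str]]) -> Callable[[Pair, set[str]], bool]:
    """Hypothetical stand-in for an LLM reviewer: reject a match whose only
    evidence is names shared by several source concepts (i.e. ambiguous)."""
    users: dict[str, set[str]] = defaultdict(set)
    for sid, names in source.items():
        for name in names:
            users[normalize(name)].add(sid)

    def judge(pair: Pair, shared: set[str]) -> bool:
        return any(len(users[name]) == 1 for name in shared)

    return judge


if __name__ == "__main__":
    source = {"A:1": ["renal cortex", "cortex"], "A:2": ["cerebral cortex", "cortex"]}
    target = {"B:1": ["kidney cortex", "renal cortex"], "B:2": ["cerebral cortex", "cortex"]}
    gold = {("A:1", "B:1"), ("A:2", "B:2")}
    candidates = lexical_candidates(source, target)
    print("lexical:", precision_recall(set(candidates), gold))
    print("refined:", precision_recall(refine(candidates, make_toy_judge(source)), gold))
```

    On this toy data the lexical step finds every correct pair but also the spurious (A:1, B:2) match via "cortex", and the review step removes it. The design keeps the high-recall generator and the reviewer separate, so either can be swapped out.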

  10. Article ; Online: ROBOT: A Tool for Automating Ontology Workflows.

    Jackson, Rebecca C / Balhoff, James P / Douglass, Eric / Harris, Nomi L / Mungall, Christopher J / Overton, James A

    BMC bioinformatics

    2019  Volume 20, Issue 1, Page(s) 407

    Abstract Background: Ontologies are invaluable in the life sciences, but building and maintaining ontologies often requires a challenging number of distinct tasks such as running automated reasoners and quality control checks, extracting dependencies and application-specific subsets, generating standard reports, and generating release files in multiple formats. Similar to more general software development, automation is the key to executing and managing these tasks effectively and to releasing more robust products in standard forms. For ontologies using the Web Ontology Language (OWL), the OWL API Java library is the foundation for a range of software tools, including the Protégé ontology editor. In the Open Biological and Biomedical Ontologies (OBO) community, we recognized the need to package a wide range of low-level OWL API functionality into a library of common higher-level operations and to make those operations available as a command-line tool.
    Results: ROBOT (a recursive acronym for "ROBOT is an OBO Tool") is an open source library and command-line tool for automating ontology development tasks. The library can be called from any programming language that runs on the Java Virtual Machine (JVM). Most usage is through the command-line tool, which runs on macOS, Linux, and Windows. ROBOT provides ontology processing commands for a variety of tasks, including commands for converting formats, running a reasoner, creating import modules, running reports, and various other tasks. These commands can be combined into larger workflows using a separate task execution system such as GNU Make, and workflows can be automatically executed within continuous integration systems.
    Conclusions: ROBOT supports automation of a wide range of ontology development tasks, focusing on OBO conventions. It packages common high-level ontology development functionality into a convenient library, and makes it easy to configure, combine, and execute individual tasks in comprehensive, automated workflows. This helps ontology developers to efficiently create, maintain, and release high-quality ontologies, so that they can spend more time focusing on development tasks. It also helps guarantee that released ontologies are free of certain types of logical errors and conform to standard quality control checks, increasing the overall robustness and efficiency of the ontology development lifecycle.
    MeSH term(s) Biological Ontologies ; Disease ; Humans ; Programming Languages ; Software ; Workflow
    Language English
    Publishing date 2019-07-29
    Publishing country England
    Document type Journal Article
    ZDB-ID 2041484-5
    ISSN 1471-2105 ; 1471-2105
    ISSN (online) 1471-2105
    ISSN 1471-2105
    DOI 10.1186/s12859-019-3002-3
    Database MEDical Literature Analysis and Retrieval System OnLINE
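
    The ROBOT abstract says its commands are typically combined into larger workflows with a separate task runner such as GNU Make. As a hedged sketch of that idea in Python (a stand-in for a Makefile, not a tool described in the paper), the snippet below chains three ROBOT commands, `report`, `reason` (with the ELK reasoner), and `convert`, and stops at the first command that exits with a non-zero status. The file names `edit.owl`, `report.tsv`, `reasoned.owl`, and `release.obo` are hypothetical, and the flags should be checked against the ROBOT documentation for the installed version. By default it performs a dry run that only prints the commands.

```python
import shlex
import shutil
import subprocess


def release_steps(edit_file: str = "edit.owl") -> list[list[str]]:
    """Hypothetical three-step release workflow built from ROBOT commands."""
    return [
        # Quality-control report on the editors' file.
        ["robot", "report", "--input", edit_file, "--output", "report.tsv"],
        # Classify with the ELK reasoner and save the result.
        ["robot", "reason", "--input", edit_file, "--reasoner", "ELK",
         "--output", "reasoned.owl"],
        # Convert to OBO format (chosen here from the output file extension).
        ["robot", "convert", "--input", "reasoned.owl", "--output", "release.obo"],
    ]


def run(steps: list[list[str]], dry_run: bool = True) -> None:
    """Run steps in order, stopping at the first failure, as Make would."""
    if not dry_run and shutil.which("robot") is None:
        raise FileNotFoundError("the 'robot' command is not on PATH")
    for cmd in steps:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    run(release_steps())  # dry run: prints the commands without executing them
```

    In a Makefile, each step would instead be a target whose prerequisites are the files it reads, which also lets Make skip steps whose outputs are already up to date.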
