LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 172

  1. Article ; Online: Gene Set Summarization using Large Language Models.

    Joachimiak, Marcin P / Caufield, J Harry / Harris, Nomi L / Kim, Hyeongsik / Mungall, Christopher J

    ArXiv

    2023  

    Abstract Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpreting gene lists can also be framed as a textual summarization task, enabling the use of Large Language Models (LLMs), potentially utilizing scientific texts directly and avoiding reliance on a KB. We developed SPINDOCTOR (Structured Prompt Interpolation of Natural Language Descriptions of Controlled Terms for Ontology Reporting), a method that uses GPT models to perform gene set function summarization as a complement to standard enrichment analysis. This method can use different sources of gene functional information: (1) structured text derived from curated ontological KB annotations, (2) ontology-free narrative gene summaries, or (3) direct model retrieval. We demonstrate that these methods are able to generate plausible and biologically valid summary GO term lists for gene sets. However, GPT-based approaches are unable to deliver reliable scores or p-values and often return terms that are not statistically significant. Crucially, these methods were rarely able to recapitulate the most precise and informative term from standard enrichment, likely due to an inability to generalize and reason using an ontology. Results are highly nondeterministic, with minor variations in prompt resulting in radically different term lists. Our results show that at this point, LLM-based methods are unsuitable as a replacement for standard term enrichment analysis and that manual curation of ontological assertions remains necessary.
    Language English
    Publishing date 2023-05-25
    Publishing country United States
    Document type Preprint
    ISSN (online) 2331-8422
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)

  2. Article ; Online: BOSC 2022: the first hybrid and 23rd annual Bioinformatics Open Source Conference.

    Harris, Nomi L / Hokamp, Karsten / Ménager, Hervé / Munoz-Torres, Monica / Unni, Deepak / Vasilevsky, Nicole / Williams, Jason

    F1000Research

    2022  Volume 11, Page(s) 1034

    Abstract The 23
    MeSH term(s) Computational Biology ; Congresses as Topic ; Humans ; Systems Biology
    Language English
    Publishing date 2022-09-12
    Publishing country England
    Document type Editorial
    ZDB-ID 2699932-8
    ISSN (online) 2046-1402
    DOI 10.12688/f1000research.125043.1
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)

  3. Article ; Online: Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning.

    Caufield, J Harry / Hegde, Harshad / Emonet, Vincent / Harris, Nomi L / Joachimiak, Marcin P / Matentzoglu, Nicolas / Kim, HyeongSik / Moxon, Sierra / Reese, Justin T / Haendel, Melissa A / Robinson, Peter N / Mungall, Christopher J

    Bioinformatics (Oxford, England)

    2024  Volume 40, Issue 3

    Abstract Motivation: Creating knowledge bases and ontologies is a time consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrarily complex nested knowledge schemas.
    Results: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease relationships. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction methods, but greatly surpasses an LLM's native capability of grounding entities with unique identifiers. SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any new training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM.
    Availability and implementation: SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.
    MeSH term(s) Semantics ; Knowledge Bases ; Databases, Factual
    Language English
    Publishing date 2024-02-20
    Publishing country England
    Document type Journal Article
    ZDB-ID 1422668-6
    ISSN (online) 1367-4811
    ISSN 1367-4803
    DOI 10.1093/bioinformatics/btae104
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)

  4. Book ; Online: Gene Set Summarization using Large Language Models

    Joachimiak, Marcin P. / Caufield, J. Harry / Harris, Nomi L. / Kim, Hyeongsik / Mungall, Christopher J.

    2023  

    Abstract Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpreting gene lists can also be framed as a textual summarization task, enabling the use of Large Language Models (LLMs), potentially utilizing scientific texts directly and avoiding reliance on a KB. We developed SPINDOCTOR (Structured Prompt Interpolation of Natural Language Descriptions of Controlled Terms for Ontology Reporting), a method that uses GPT models to perform gene set function summarization as a complement to standard enrichment analysis. This method can use different sources of gene functional information: (1) structured text derived from curated ontological KB annotations, (2) ontology-free narrative gene summaries, or (3) direct model retrieval. We demonstrate that these methods are able to generate plausible and biologically valid summary GO term lists for gene sets. However, GPT-based approaches are unable to deliver reliable scores or p-values and often return terms that are not statistically significant. Crucially, these methods were rarely able to recapitulate the most precise and informative term from standard enrichment, likely due to an inability to generalize and reason using an ontology. Results are highly nondeterministic, with minor variations in prompt resulting in radically different term lists. Our results show that at this point, LLM-based methods are unsuitable as a replacement for standard term enrichment analysis and that manual curation of ontological assertions remains necessary.
    Keywords Quantitative Biology - Genomics ; Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Quantitative Biology - Quantitative Methods
    Subject code 004
    Publishing date 2023-05-20
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Article ; Online: BOSC 2023, the 24th annual Bioinformatics Open Source Conference.

    Harris, Nomi L / Fields, Christopher J / Hokamp, Karsten / Just, Jérémy / Khetani, Radhika / Maia, Jessica / Ménager, Hervé / Munoz-Torres, Monica C / Unni, Deepak / Williams, Jason

    F1000Research

    2023  Volume 12, Page(s) 1568

    Abstract The 24th annual Bioinformatics Open Source Conference (BOSC 2023) was part of the 2023 conference on Intelligent Systems for Molecular Biology and the European Conference on Computational Biology (ISMB/ECCB 2023). Launched in 2000 and held yearly since, BOSC is the premier meeting covering open-source bioinformatics and open science. Like ISMB 2022, the 2023 meeting was a hybrid conference, with the in-person component hosted in Lyon, France. ISMB/ECCB attracted a near-record number of attendees, with over 2100 in person and about 900 more online. Approximately 200 people participated in BOSC sessions. In addition to 43 talks and 49 posters, BOSC featured two keynotes: Sara El-Gebali, who spoke about "A New Odyssey: Pioneering the Future of Scientific Progress Through Open Collaboration", and Joseph Yracheta, who spoke about "The Dissonance between Scientific Altruism & Capitalist Extraction: The Zero Trust and Federated Data Sovereignty Solution." Once again, a joint session brought together BOSC and the Bio-Ontologies COSI. The conference ended with a panel on Open and Ethical Data Sharing. As in prior years, BOSC was preceded by a CollaborationFest, a collaborative work event that brought together about 40 participants interested in synergistically combining ideas, shaping project plans, developing software, and more.
    MeSH term(s) Humans ; Computational Biology ; Software ; Information Dissemination
    Language English
    Publishing date 2023-12-07
    Publishing country England
    Document type Editorial
    ZDB-ID 2699932-8
    ISSN (online) 2046-1402
    DOI 10.12688/f1000research.143015.1
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)

  6. Article ; Online: BOSC 2021, the 22nd Annual Bioinformatics Open Source Conference.

    Harris, Nomi L / Cock, Peter J A / Fields, Christopher J / Hokamp, Karsten / Maia, Jessica / Munoz-Torres, Monica / Sharan, Malvika / Williams, Jason

    F1000Research

    2021  Volume 10

    Abstract The 22nd annual Bioinformatics Open Source Conference (BOSC 2021, open-bio.org/events/bosc-2021/) was held online as a track of the 2021 Intelligent Systems for Molecular Biology / European Conference on Computational Biology (ISMB/ECCB) conference. Launched in 2000 and held every year since, BOSC is the premier meeting covering topics related to open source software and open science in bioinformatics. In 2020, BOSC partnered with the Galaxy Community Conference to form the Bioinformatics Community Conference (BCC2020); that was the first BOSC to be held online. This year, BOSC returned to its roots as part of ISMB/ECCB 2021. As in 2020, the Covid-19 pandemic made it impossible to hold the conference in person, so ISMB/ECCB 2021 took place as an online meeting attended by over 2000 people from 79 countries. Nearly 200 people participated in BOSC sessions, which included 27 talks reviewed and selected from submitted abstracts, and three invited keynote talks representing a range of global perspectives on the role of open science and open source in driving research and inclusivity in the biosciences, one of which was presented in French with English subtitles.
    MeSH term(s) Computational Biology ; Humans ; Pandemics ; Software
    Language English
    Publishing date 2021-10-18
    Publishing country England
    Document type Congress ; Editorial
    ZDB-ID 2699932-8
    ISSN (online) 2046-1402
    DOI 10.12688/f1000research.74074.1
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)

  7. Article ; Online: Author Correction: Brain Data Standards - A method for building data-driven cell-type ontologies.

    Tan, Shawn Zheng Kai / Kir, Huseyin / Aevermann, Brian D / Gillespie, Tom / Harris, Nomi / Hawrylycz, Michael J / Jorstad, Nikolas L / Lein, Ed S / Matentzoglu, Nicolas / Miller, Jeremy A / Mollenkopf, Tyler S / Mungall, Christopher J / Ray, Patrick L / Sanchez, Raymond E A / Staats, Brian / Vermillion, Jim / Yadav, Ambika / Zhang, Yun / Scheuermann, Richard H / Osumi-Sutherland, David

    Scientific data

    2023  Volume 10, Issue 1, Page(s) 246

    Language English
    Publishing date 2023-04-28
    Publishing country England
    Document type Published Erratum
    ZDB-ID 2775191-0
    ISSN (online) 2052-4463
    DOI 10.1038/s41597-023-02165-4
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)

  8. Article ; Online: The 21st annual Bioinformatics Open Source Conference (BOSC 2020, part of BCC2020).

    Harris, Nomi L / Cock, Peter J A / Fields, Christopher J / Hokamp, Karsten / Maia, Jessica / Munoz-Torres, Monica / Taschuk, Morgan / Yehudi, Yo

    F1000Research

    2020  Volume 9

    Abstract Launched in 2000 and held every year since, the Bioinformatics Open Source Conference (BOSC) is a volunteer-run meeting coordinated by the Open Bioinformatics Foundation (OBF) that covers open source software development and open science in bioinformatics. Most years, BOSC has been part of the Intelligent Systems for Molecular Biology (ISMB) conference, but in 2018, and again in 2020, BOSC partnered with the Galaxy Community Conference (GCC). This year's combined BOSC + GCC conference was called the Bioinformatics Community Conference (BCC2020, bcc2020.github.io). Originally slated to take place in Toronto, Canada, BCC2020 was moved online due to COVID-19. The meeting started with a wide array of training sessions; continued with a main program of keynote presentations, talks, posters, Birds of a Feather, and more; and ended with four days of collaboration (CoFest). Efforts to make the meeting accessible and inclusive included very low registration fees, talks presented twice a day, and closed captioning for all videos. More than 800 people from 61 countries registered for at least one part of the meeting, which was held mostly in the Remo.co video-conferencing platform.
    MeSH term(s) Canada ; Computational Biology ; Congresses as Topic ; Humans
    Keywords covid19
    Language English
    Publishing date 2020-09-21
    Publishing country England
    Document type Editorial
    ZDB-ID 2699932-8
    ISSN (online) 2046-1402
    DOI 10.12688/f1000research.26498.1
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)

  9. Article ; Online: Brain Data Standards - A method for building data-driven cell-type ontologies

    Shawn Zheng Kai Tan / Huseyin Kir / Brian D. Aevermann / Tom Gillespie / Nomi Harris / Michael J. Hawrylycz / Nikolas L. Jorstad / Ed S. Lein / Nicolas Matentzoglu / Jeremy A. Miller / Tyler S. Mollenkopf / Christopher J. Mungall / Patrick L. Ray / Raymond E. A. Sanchez / Brian Staats / Jim Vermillion / Ambika Yadav / Yun Zhang / Richard H. Scheuermann / David Osumi-Sutherland

    Scientific Data, Vol 10, Iss 1, Pp 1-

    2023  Volume 11

    Abstract Large-scale single-cell 'omics profiling is being used to define a complete catalogue of brain cell types, something that traditional methods struggle with due to the diversity and complexity of the brain. But this poses a problem: How do we organise such a catalogue - providing a standard way to refer to the cell types discovered, linking their classification and properties to supporting data? Cell ontologies provide a partial solution to these problems, but no existing ontology schemas support the definition of cell types by direct reference to supporting data, classification of cell types using classifications derived directly from data, or links from cell types to marker sets along with confidence scores. Here we describe a generally applicable schema that solves these problems and its application in a semi-automated pipeline to build a data-linked extension to the Cell Ontology representing cell types in the Primary Motor Cortex of humans, mice and marmosets. The methods and resulting ontology are designed to be scalable and applicable to similar whole-brain atlases currently in preparation.
    Keywords Science ; Q
    Subject code 004
    Language English
    Publishing date 2023-01-01
    Publisher Nature Portfolio
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  10. Article ; Online: Brain Data Standards - A method for building data-driven cell-type ontologies.

    Tan, Shawn Zheng Kai / Kir, Huseyin / Aevermann, Brian D / Gillespie, Tom / Harris, Nomi / Hawrylycz, Michael J / Jorstad, Nikolas L / Lein, Ed S / Matentzoglu, Nicolas / Miller, Jeremy A / Mollenkopf, Tyler S / Mungall, Christopher J / Ray, Patrick L / Sanchez, Raymond E A / Staats, Brian / Vermillion, Jim / Yadav, Ambika / Zhang, Yun / Scheuermann, Richard H / Osumi-Sutherland, David

    Scientific data

    2023  Volume 10, Issue 1, Page(s) 50

    Abstract Large-scale single-cell 'omics profiling is being used to define a complete catalogue of brain cell types, something that traditional methods struggle with due to the diversity and complexity of the brain. But this poses a problem: How do we organise such a catalogue - providing a standard way to refer to the cell types discovered, linking their classification and properties to supporting data? Cell ontologies provide a partial solution to these problems, but no existing ontology schemas support the definition of cell types by direct reference to supporting data, classification of cell types using classifications derived directly from data, or links from cell types to marker sets along with confidence scores. Here we describe a generally applicable schema that solves these problems and its application in a semi-automated pipeline to build a data-linked extension to the Cell Ontology representing cell types in the Primary Motor Cortex of humans, mice and marmosets. The methods and resulting ontology are designed to be scalable and applicable to similar whole-brain atlases currently in preparation.
    MeSH term(s) Animals ; Humans ; Mice ; Biological Ontologies ; Brain ; Callithrix ; Data Collection/standards
    Language English
    Publishing date 2023-01-24
    Publishing country England
    Document type Journal Article
    ZDB-ID 2775191-0
    ISSN (online) 2052-4463
    DOI 10.1038/s41597-022-01886-2
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)
