LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 4 of 4

  1. Article ; Online: Publisher Correction: Exploiting redundancy in large materials datasets for efficient machine learning with less data.

    Li, Kangming / Persaud, Daniel / Choudhary, Kamal / DeCost, Brian / Greenwood, Michael / Hattrick-Simpers, Jason

    Nature communications

    2024  Volume 15, Issue 1, Page(s) 284

    Language English
    Publication date 2024-01-04
    Country of publication England
    Document type Published Erratum
    ZDB-ID 2553671-0
    ISSN 2041-1723
    ISSN (online) 2041-1723
    DOI 10.1038/s41467-023-44462-x
    Data source MEDLINE (Medical Literature Analysis and Retrieval System Online)

  2. Book ; Online: Reproducibility in Computational Materials Science

    Persaud, Daniel / Ward, Logan / Hattrick-Simpers, Jason

    Lessons from 'A General-Purpose Machine Learning Framework for Predicting Properties of Inorganic Materials'

    2023  

    Abstract The integration of machine learning techniques in materials discovery has become prominent in materials science research and has been accompanied by an increasing trend towards open-source data and tools to propel the field. Despite the increasing usefulness and capabilities of these tools, developers neglecting to follow reproducible practices creates a significant barrier for researchers looking to use or build upon their work. In this study, we investigate the challenges encountered while attempting to reproduce a section of the results presented in "A general-purpose machine learning framework for predicting properties of inorganic materials." Our analysis identifies four major categories of challenges: (1) reporting computational dependencies, (2) recording and sharing version logs, (3) sequential code organization, and (4) clarifying code references within the manuscript. The result is a proposed set of tangible action items for those aiming to make code accessible to, and useful for the community.

    Comment: Main text: 15 pages, 1 table, 1 figure
    Keywords Condensed Matter - Materials Science
    Subject code 670
    Publication date 2023-10-10
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (life sciences selection)

  3. Article ; Online: Exploiting redundancy in large materials datasets for efficient machine learning with less data.

    Li, Kangming / Persaud, Daniel / Choudhary, Kamal / DeCost, Brian / Greenwood, Michael / Hattrick-Simpers, Jason

    Nature communications

    2023  Volume 14, Issue 1, Page(s) 7283

    Abstract Extensive efforts to gather materials data have largely overlooked potential data redundancy. In this study, we present evidence of a significant degree of redundancy across multiple large datasets for various material properties, by revealing that up to 95% of data can be safely removed from machine learning training with little impact on in-distribution prediction performance. The redundant data is related to over-represented material types and does not mitigate the severe performance degradation on out-of-distribution samples. In addition, we show that uncertainty-based active learning algorithms can construct much smaller but equally informative datasets. We discuss the effectiveness of informative data in improving prediction performance and robustness and provide insights into efficient data acquisition and machine learning training. This work challenges the "bigger is better" mentality and calls for attention to the information richness of materials data rather than a narrow emphasis on data volume.
    Language English
    Publication date 2023-11-10
    Country of publication England
    Document type Journal Article
    ZDB-ID 2553671-0
    ISSN 2041-1723
    ISSN (online) 2041-1723
    DOI 10.1038/s41467-023-42992-y
    Data source MEDLINE (Medical Literature Analysis and Retrieval System Online)

  4. Book ; Online: On the redundancy in large material datasets

    Li, Kangming / Persaud, Daniel / Choudhary, Kamal / DeCost, Brian / Greenwood, Michael / Hattrick-Simpers, Jason

    efficient and robust learning with less data

    2023  

    Abstract Extensive efforts to gather materials data have largely overlooked potential data redundancy. In this study, we present evidence of a significant degree of redundancy across multiple large datasets for various material properties, by revealing that up to 95 % of data can be safely removed from machine learning training with little impact on in-distribution prediction performance. The redundant data is related to over-represented material types and does not mitigate the severe performance degradation on out-of-distribution samples. In addition, we show that uncertainty-based active learning algorithms can construct much smaller but equally informative datasets. We discuss the effectiveness of informative data in improving prediction performance and robustness and provide insights into efficient data acquisition and machine learning training. This work challenges the "bigger is better" mentality and calls for attention to the information richness of materials data rather than a narrow emphasis on data volume.

    Comment: Main text: 9 pages, 2 tables, 6 figures. Supplemental information: 31 pages, 1 table, 24 figures
    Keywords Condensed Matter - Materials Science
    Subject code 006
    Publication date 2023-04-25
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (life sciences selection)
