LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-4 of 4

  1. Article ; Online: Publisher Correction: Exploiting redundancy in large materials datasets for efficient machine learning with less data.

    Li, Kangming / Persaud, Daniel / Choudhary, Kamal / DeCost, Brian / Greenwood, Michael / Hattrick-Simpers, Jason

    Nature communications

    2024  Volume 15, Issue 1, Page(s) 284

    Language English
    Publishing date 2024-01-04
    Publishing country England
    Document type Published Erratum
    ZDB-ID 2553671-0
    ISSN (online) 2041-1723
    DOI 10.1038/s41467-023-44462-x
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)

  2. Book ; Online: Reproducibility in Computational Materials Science: Lessons from 'A General-Purpose Machine Learning Framework for Predicting Properties of Inorganic Materials'

    Persaud, Daniel / Ward, Logan / Hattrick-Simpers, Jason

    2023  

    Abstract: The integration of machine learning techniques in materials discovery has become prominent in materials science research and has been accompanied by an increasing trend towards open-source data and tools to propel the field. Despite the increasing usefulness and capabilities of these tools, developers neglecting to follow reproducible practices creates a significant barrier for researchers looking to use or build upon their work. In this study, we investigate the challenges encountered while attempting to reproduce a section of the results presented in "A general-purpose machine learning framework for predicting properties of inorganic materials." Our analysis identifies four major categories of challenges: (1) reporting computational dependencies, (2) recording and sharing version logs, (3) sequential code organization, and (4) clarifying code references within the manuscript. The result is a proposed set of tangible action items for those aiming to make code accessible to, and useful for, the community.

    Comment: Main text: 15 pages, 1 table, 1 figure
    Keywords Condensed Matter - Materials Science
    Subject code 670
    Publishing date 2023-10-10
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  3. Article ; Online: Exploiting redundancy in large materials datasets for efficient machine learning with less data.

    Li, Kangming / Persaud, Daniel / Choudhary, Kamal / DeCost, Brian / Greenwood, Michael / Hattrick-Simpers, Jason

    Nature communications

    2023  Volume 14, Issue 1, Page(s) 7283

    Abstract: Extensive efforts to gather materials data have largely overlooked potential data redundancy. In this study, we present evidence of a significant degree of redundancy across multiple large datasets for various material properties, by revealing that up to 95% of data can be safely removed from machine learning training with little impact on in-distribution prediction performance. The redundant data is related to over-represented material types and does not mitigate the severe performance degradation on out-of-distribution samples. In addition, we show that uncertainty-based active learning algorithms can construct much smaller but equally informative datasets. We discuss the effectiveness of informative data in improving prediction performance and robustness and provide insights into efficient data acquisition and machine learning training. This work challenges the "bigger is better" mentality and calls for attention to the information richness of materials data rather than a narrow emphasis on data volume.
    Language English
    Publishing date 2023-11-10
    Publishing country England
    Document type Journal Article
    ZDB-ID 2553671-0
    ISSN (online) 2041-1723
    DOI 10.1038/s41467-023-42992-y
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)

  4. Book ; Online: On the redundancy in large material datasets: efficient and robust learning with less data

    Li, Kangming / Persaud, Daniel / Choudhary, Kamal / DeCost, Brian / Greenwood, Michael / Hattrick-Simpers, Jason

    2023  

    Abstract: Extensive efforts to gather materials data have largely overlooked potential data redundancy. In this study, we present evidence of a significant degree of redundancy across multiple large datasets for various material properties, by revealing that up to 95% of data can be safely removed from machine learning training with little impact on in-distribution prediction performance. The redundant data is related to over-represented material types and does not mitigate the severe performance degradation on out-of-distribution samples. In addition, we show that uncertainty-based active learning algorithms can construct much smaller but equally informative datasets. We discuss the effectiveness of informative data in improving prediction performance and robustness and provide insights into efficient data acquisition and machine learning training. This work challenges the "bigger is better" mentality and calls for attention to the information richness of materials data rather than a narrow emphasis on data volume.

    Comment: Main text: 9 pages, 2 tables, 6 figures. Supplemental information: 31 pages, 1 table, 24 figures
    Keywords Condensed Matter - Materials Science
    Subject code 006
    Publishing date 2023-04-25
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
