LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 90

  1. Article: Evaluation of SPARQL query generation from natural language questions.

    Cohen, K Bretonnel / Kim, Jin-Dong

    Proceedings of the conference. Association for Computational Linguistics. Meeting

    2020  Volume 2013, Page(s) 3–7

    Abstract SPARQL queries have become the standard for querying linked open data knowledge bases, but SPARQL query construction can be challenging and time-consuming even for experts. SPARQL query generation from natural language questions is an attractive modality for interfacing with LOD. However, how to evaluate SPARQL query generation from natural language questions is a mostly open research question. This paper presents some issues that arise in SPARQL query generation from natural language, a test suite for evaluating performance with respect to these issues, and a case study in evaluating a system for SPARQL query generation from natural language questions.
    Language English
    Publishing date 2020-09-06
    Publishing country United States
    Document type Journal Article
    ISSN 0736-587X
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article: Assessment of software testing and quality assurance in natural language processing applications and a linguistically inspired approach to improving it.

    Cohen, K Bretonnel / Hunter, Lawrence E / Palmer, Martha

    Trustworthy eternal systems via evolving software, data and knowledge : second international workshop, EternalS 2012, Montpellier, France, August 28, 2012, revised selected papers. EternalS (Workshop) (2nd : 2012 : Montpellier, France)

    2021  Volume 379, Page(s) 77–90

    Abstract Significant progress has been made in addressing the scientific challenges of biomedical text mining. However, the transition from a demonstration of scientific progress to the production of tools on which a broader community can rely requires that fundamental software engineering requirements be addressed. In this paper we characterize the state of biomedical text mining software with respect to software testing and quality assurance. Biomedical natural language processing software was chosen because it frequently specifically claims to offer production-quality services, rather than just research prototypes. We examined twenty web sites offering a variety of text mining services. On each web site, we performed the most basic software test known to us and classified the results. Seven out of twenty web sites returned either bad results or the worst class of results in response to this simple test. We conclude that biomedical natural language processing tools require greater attention to software quality. We suggest a linguistically motivated approach to granular evaluation of natural language processing applications, and show how it can be used to detect performance errors of several systems and to predict overall performance on specific equivalence classes of inputs. We also assess the ability of linguistically-motivated test suites to provide good software testing, as compared to large corpora of naturally-occurring data. We measure code coverage and find that it is considerably higher when even small structured test suites are utilized than when large corpora are used.
    Language English
    Publishing date 2021-07-15
    Publishing country Germany
    Document type Journal Article
    DOI 10.1007/978-3-642-45260-4_6
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Article ; Online: P-Hacking Lexical Richness Through Definitions of "Type" and "Token".

    Cohen, K Bretonnel / Hunter, Lawrence E / Pressman, Peter S

    Studies in health technology and informatics

    2019  Volume 264, Page(s) 1433–1434

    Abstract "P-hacking" is the repeated analysis of data until a statistically significant result is achieved. We show that p-hacking can also occur during data generation, sometimes unintentionally. We use the type-token ratio to demonstrate that differences in the definitions of "type" and "token" can produce significantly different results. Since these terms are rarely defined in the biomedical literature, the result is an inability to meaningfully interpret the body of literature that makes use of this measure.
    MeSH term(s) Computer Security ; Vocabulary
    Language English
    Publishing date 2019-08-20
    Publishing country Netherlands
    Document type Journal Article
    ISSN 1879-8365
    ISSN (online) 1879-8365
    DOI 10.3233/SHTI190470
    Database MEDical Literature Analysis and Retrieval System OnLINE
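
    The sensitivity of the type-token ratio to the definitions of "type" and "token" can be illustrated with a minimal sketch. It is not the authors' method: the function name `type_token_ratio`, the two switches (case folding, counting punctuation as tokens), and the sample sentence are all assumptions chosen for illustration.

    ```python
    import re


    def type_token_ratio(text, lowercase, keep_punctuation):
        """Return (n_types, n_tokens, ratio) under one hypothetical tokenization."""
        if lowercase:
            # Case folding merges "The" and "the" into a single type.
            text = text.lower()
        # Either count word-like strings only, or also count each punctuation mark.
        pattern = r"\w+|[^\w\s]" if keep_punctuation else r"\w+"
        tokens = re.findall(pattern, text)
        types = set(tokens)
        ratio = len(types) / len(tokens) if tokens else 0.0
        return len(types), len(tokens), ratio


    # Illustrative sample text (an assumption, not data from the paper).
    sample = "The cat sat. The dog sat, and the cat ran."

    for lowercase in (False, True):
        for keep_punctuation in (False, True):
            n_types, n_tokens, ratio = type_token_ratio(sample, lowercase, keep_punctuation)
            print(f"lowercase={lowercase} punctuation={keep_punctuation}: "
                  f"{n_types} types / {n_tokens} tokens = {ratio:.3f}")
    ```

    On this one sentence the four definitions give ratios between 0.6 and 0.7, so even a toy input shows how an unstated definition changes the reported value.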

  4. Article: A Primer to the Structure, Content and Linkage of the FDA's Manufacturer and User Facility Device Experience (MAUDE) Files.

    Ensign, Lisa Garnsey / Cohen, K Bretonnel

    EGEMS (Washington, DC)

    2017  Volume 5, Issue 1, Page(s) 12

    Abstract Introduction and background: The US Food and Drug Administration (FDA)'s Manufacturer and User Facility Device Experience (MAUDE) database is a publicly available resource providing over 4 million records relating to medical device safety. Using downloadable MAUDE files avoids limitations of the online MAUDE search interface. However, naive file usage can result in errors, while independent discovery of the nuances required to correctly work with the database can be time-consuming. Practical information is provided to shorten this learning curve and obtain accurate results when using the MAUDE database files.
    MAUDE file descriptions: The MAUDE database consists of 135 fields in four primary (Master Event, Device, Patient, Text) and two supplemental (Device Problems and Problem Code Descriptions) file types. When combined, these six files provide a detailed account of an adverse event or product problem report. Website instructions for joining the files are incomplete. Comprehensive details are provided to enable precise file linking.
    Lessons learned: MAUDE files have irregularities that must be understood to download and work with the data efficiently. Accurate results depend upon combining the files correctly and understanding the difference between report and event denominators. Appreciating data availability can facilitate successful MAUDE investigations.
    Conclusion: The MAUDE database can provide key insights about medical device safety. Detailed information is provided about the structure, content and interrelationships of the MAUDE database files to enable investigators to use this valuable resource more quickly and accurately.
    Language English
    Publishing date 2017-06-14
    Publishing country England
    Document type Journal Article
    ZDB-ID 2734659-6
    ISSN 2327-9214
    DOI 10.5334/egems.221
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article: MetaMap is a superior baseline to a standard document retrieval engine for the task of finding patient cohorts in clinical free text.

    Cohen, K Bretonnel / Christiansen, Tom / Hunter, Lawrence E

    The ... text REtrieval conference : TREC. Text REtrieval Conference

    2018  Volume 2011

    Abstract The goal of this work was to establish a reasonable baseline for research in patient cohort retrieval from clinical free text. Much recent work has used Lucene for this purpose. Our approach was to use MetaMap alone. We found that although many TREC 2011 Electronic Medical Records track participants found it difficult to beat a Lucene baseline, our MetaMap-based baseline did outperform a number of Lucene runs. We propose that MetaMap is a more valid baseline than Lucene, providing essential concept extraction, and that failure to make use of this industry-standard tool results in an unfairly low baseline for evaluation of system outputs.
    Language English
    Publishing date 2018-08-10
    Publishing country United States
    Document type Journal Article
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Article: SuperCAT: The (New and Improved) Corpus Analysis Toolkit.

    Cohen, K Bretonnel / Baumgartner, William A / Temnikova, Irina

    LREC ... International Conference on Language Resources & Evaluation : [proceedings]. International Conference on Language Resources & Evaluation

    2018  Volume 2016, Page(s) 2784–2788

    Abstract This paper reports SuperCAT, a corpus analysis toolkit. It is a radical extension of SubCAT, the Sublanguage Corpus Analysis Toolkit, from sublanguage analysis to corpus analysis in general. The idea behind SuperCAT is that representative corpora have no tendency towards closure; that is, they tend towards infinity. In contrast, non-representative corpora have a tendency towards closure (roughly, finiteness). SuperCAT focuses on general techniques for the quantitative description of the characteristics of any corpus (or other language sample), particularly concerning the characteristics of lexical distributions. Additionally, SuperCAT features a complete re-engineering of the previous SubCAT architecture.
    Language English
    Publishing date 2018-02-08
    Publishing country France
    Document type Journal Article
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article: Ontology translation: A case study on translating the Gene Ontology from English to German.

    Hailu, Negacy D / Cohen, K Bretonnel / Hunter, Lawrence E

    Natural language processing and information systems : ... International Conference on Applications of Natural Language to Information Systems, NLDB ... revised papers. International Conference on Applications of Natural Language to Info...

    2018  Volume 8455, Page(s) 33–38

    Abstract For many researchers, the purpose of ontologies is sharing data. This sharing is facilitated when ontologies are available in multiple languages, but inhibited when an ontology is only available in a single language. Ontologies should be accessible to people in multiple languages, since multilingualism is inevitable in any scientific work. Due to resource scarcity, most ontologies of the biomedical domain are available only in English at present. We present techniques to translate Gene Ontology terms from English to German using DBPedia, the Google Translate API for isolated terms, and the Google Translate API for terms in sentential context. Average fluency scores for the three methods were 4.0, 4.4, and 4.5, respectively. Average adequacy scores were 4.0, 4.9, and 4.9.
    Language English
    Publishing date 2018-02-01
    Publishing country Germany
    Document type Journal Article
    DOI 10.1007/978-3-319-07983-7_4
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article ; Online: A Primer to the Structure, Content and Linkage of the FDA’s Manufacturer and User Facility Device Experience (MAUDE) Files

    Lisa Garnsey Ensign / K. Bretonnel Cohen

    eGEMs

    2017  Volume 5, Issue 1

    Abstract Introduction and Background: The US Food and Drug Administration (FDA)’s Manufacturer and User Facility Device Experience (MAUDE) database is a publicly available resource providing over 4 million records relating to medical device safety. Using downloadable MAUDE files avoids limitations of the online MAUDE search interface. However, naïve file usage can result in errors, while independent discovery of the nuances required to correctly work with the database can be time-consuming. Practical information is provided to shorten this learning curve and obtain accurate results when using the MAUDE database files. MAUDE File Descriptions: The MAUDE database consists of 135 fields in four primary (Master Event, Device, Patient, Text) and two supplemental (Device Problems and Problem Code Descriptions) file types. When combined, these six files provide a detailed account of an adverse event or product problem report. Website instructions for joining the files are incomplete. Comprehensive details are provided to enable precise file linking. Lessons Learned: MAUDE files have irregularities that must be understood to download and work with the data efficiently. Accurate results depend upon combining the files correctly and understanding the difference between report and event denominators. Appreciating data availability can facilitate successful MAUDE investigations. Conclusion: The MAUDE database can provide key insights about medical device safety. Detailed information is provided about the structure, content and interrelationships of the MAUDE database files to enable investigators to use this valuable resource more quickly and accurately.
    Keywords Computer applications to medicine. Medical informatics ; R858-859.7
    Subject code 005
    Language English
    Publishing date 2017-06-01
    Publisher Ubiquity Press
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  9. Article: Reproducibility in Natural Language Processing: A Case Study of Two R Libraries for Mining PubMed/MEDLINE.

    Cohen, K Bretonnel / Xia, Jingbo / Roeder, Christophe / Hunter, Lawrence E

    LREC ... International Conference on Language Resources & Evaluation : [proceedings]. International Conference on Language Resources & Evaluation

    2018  Volume 2016, Issue W23, Page(s) 6–12

    Abstract There is currently a crisis in science related to highly publicized failures to reproduce large numbers of published studies. The current work proposes, by way of case studies, a methodology for moving the study of reproducibility in computational work to a full stage beyond that of earlier work. Specifically, it presents a case study in attempting to reproduce the reports of two R libraries for doing text mining of the PubMed/MEDLINE repository of scientific publications. The main findings are that a rational paradigm for reproduction of natural language processing papers can be established; the advertised functionality was difficult, but not impossible, to reproduce; and reproducibility studies can produce additional insights into the functioning of the published system. Additionally, the work on reproducibility led to the production of novel user-centered documentation that has been accessed 260 times since its publication, an average of once a day per library.
    Language English
    Publishing date 2018-03-26
    Publishing country France
    Document type Journal Article
    Database MEDical Literature Analysis and Retrieval System OnLINE

  10. Article ; Online: Open Agile text mining for bioinformatics: the PubAnnotation ecosystem.

    Kim, Jin-Dong / Wang, Yue / Fujiwara, Toyofumi / Okuda, Shujiro / Callahan, Tiffany J / Cohen, K Bretonnel

    Bioinformatics (Oxford, England)

    2019  Volume 35, Issue 21, Page(s) 4372–4380

    Abstract Motivation: Most currently available text mining tools share two characteristics that make them less than optimal for use by biomedical researchers: they require extensive specialist skills in natural language processing and they were built on the assumption that they should optimize global performance metrics on representative datasets. This is a problem because most end-users are not natural language processing specialists and because biomedical researchers often care less about global metrics like F-measure or representative datasets than they do about more granular metrics such as precision and recall on their own specialized datasets. Thus, there are fundamental mismatches between the assumptions of much text mining work and the preferences of potential end-users.
    Results: This article introduces the concept of Agile text mining, and presents the PubAnnotation ecosystem as an example implementation. The system approaches the problems from two perspectives: it allows the reformulation of text mining by biomedical researchers from the task of assembling a complete system to the task of retrieving warehoused annotations, and it makes it possible to do very targeted customization of the pre-existing system to address specific end-user requirements. Two use cases are presented: assisted curation of the GlycoEpitope database, and assessing coverage in the literature of pre-eclampsia-associated genes.
    Availability and implementation: The three tools that make up the ecosystem, PubAnnotation, PubDictionaries and TextAE are publicly available as web services, and also as open source projects. The dictionaries and the annotation datasets associated with the use cases are all publicly available through PubDictionaries and PubAnnotation, respectively.
    MeSH term(s) Computational Biology ; Data Mining ; Ecosystem ; Female ; Humans ; Natural Language Processing ; Pregnancy ; PubMed
    Language English
    Publishing date 2019-04-01
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't
    ZDB-ID 1422668-6
    ISSN 1367-4803
    ISSN (online) 1367-4811
    DOI 10.1093/bioinformatics/btz227
    Database MEDical Literature Analysis and Retrieval System OnLINE
