LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-2 of 2

  1. Article ; Online: Measuring the Effectiveness of Adaptive Random Forest for Handling Concept Drift in Big Data Streams.

    AlQabbany, Abdulaziz O / Azmi, Aqil M

    Entropy (Basel, Switzerland)

    2021  Volume 23, Issue 7

    Abstract: We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift is a change in the data's underlying distribution, a significant issue, especially when learning from data streams. It requires learners to adapt to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. At the same time, the Adaptive Random Forest (ARF) is a stream learning algorithm that showed promising results in terms of its accuracy and ability to deal with various types of drift. The incoming instances' continuity allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase such streaming algorithms' efficiency by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed enhancement yielded considerable improvement in most situations.
    Language English
    Publishing date 2021-07-04
    Publishing country Switzerland
    Document type Journal Article
    ZDB-ID 2014734-X
    ISSN 1099-4300
    DOI 10.3390/e23070859
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)
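    The abstract's remark that a binomial count can be approximated by Poisson(1) reflects the standard online-bagging argument: in offline bagging, a bootstrap sample of size N drawn with replacement contains each instance Binomial(N, 1/N) times, and as N grows (a stream is effectively unbounded) this tends to Poisson(1), so each ensemble member can simply draw a weight k ~ Poisson(λ) per incoming instance. The Python sketch below is illustrative only: it compares the two distributions numerically and shows Poisson-weighted online resampling. `CountingLearner`, `online_bagging_step`, and the other helper names are hypothetical, not ARF's actual API, and the sketch omits the rest of ARF (for example, its drift handling).

```python
import math
import random


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(K = k) for K ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for K ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def max_pmf_gap(n: int, kmax: int = 10) -> float:
    """Largest pointwise gap between Binomial(n, 1/n) and Poisson(1) for k = 0..kmax."""
    return max(abs(binomial_pmf(k, n, 1 / n) - poisson_pmf(k, 1.0))
               for k in range(kmax + 1))


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw k ~ Poisson(lam) with Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


class CountingLearner:
    """Hypothetical stand-in for one ensemble member; it only tallies training weight."""

    def __init__(self) -> None:
        self.calls = 0         # number of weighted updates performed
        self.total_weight = 0  # sum of Poisson weights received

    def learn_one(self, x, y, weight: int) -> None:
        self.calls += 1
        self.total_weight += weight


def online_bagging_step(learners, x, y, lam: float, rng: random.Random) -> None:
    """Give one incoming instance to each learner with weight k ~ Poisson(lam)."""
    for learner in learners:
        k = sample_poisson(lam, rng)
        if k > 0:  # k == 0 means this learner skips the instance
            learner.learn_one(x, y, weight=k)


if __name__ == "__main__":
    # The binomial/Poisson gap shrinks as the sample size n grows.
    for n in (10, 100, 10_000):
        print(n, max_pmf_gap(n))
    # Simulate a short stream feeding a 10-member ensemble with lambda = 1.
    rng = random.Random(0)
    ensemble = [CountingLearner() for _ in range(10)]
    for t in range(1_000):
        online_bagging_step(ensemble, x=t, y=t % 2, lam=1.0, rng=rng)
    print([member.calls for member in ensemble])
```

    With λ = 1, each member skips roughly e^(-1) ≈ 37% of instances; raising λ makes skips rarer and weights larger, which is the knob the abstract describes tuning.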


  2. Article ; Online: Measuring the Effectiveness of Adaptive Random Forest for Handling Concept Drift in Big Data Streams

    Abdulaziz O. AlQabbany / Aqil M. Azmi

    Entropy, Vol 23, Iss 7, Art. 859

    2021  Volume 23, Issue 7

    Abstract: We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift is a change in the data’s underlying distribution, a significant issue, especially when learning from data streams. It requires learners to adapt to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. At the same time, the Adaptive Random Forest (ARF) is a stream learning algorithm that showed promising results in terms of its accuracy and ability to deal with various types of drift. The incoming instances’ continuity allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase such streaming algorithms’ efficiency by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different ...
    Keywords adaptive random forest ; data stream ; concept drift ; online learning ; resampling ; Poisson distribution ; Science ; Q ; Astrophysics ; QB460-466 ; Physics ; QC1-999
    Subject code 511
    Language English
    Publishing date 2021-07-01
    Publisher MDPI AG
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
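    The abstract describes selecting the Poisson parameter λ to balance accuracy against execution time. The cost side of that trade-off has a closed form, sketched below in Python: under Poisson(λ) resampling, a tree skips an instance with probability e^(-λ) and otherwise trains on it with mean weight λ. This sketch does not reproduce the authors' ρ measure, whose definition is not given in this record, and it assumes (hypothetically) one weighted update call per nonzero draw; the accuracy side still has to be measured empirically, as the study does on its six synthetic data sets. `resampling_profile` and `sweep` are illustrative names.

```python
import math


def resampling_profile(lam: float) -> dict:
    """Closed-form Poisson(lam) quantities relevant to online resampling cost."""
    p_skip = math.exp(-lam)  # P(k = 0): the tree ignores this instance
    return {
        "lambda": lam,
        "mean_weight": lam,        # E[k]: average weight each tree gives an instance
        "p_skip": p_skip,
        "p_update": 1.0 - p_skip,  # P(k >= 1): the tree performs one weighted update
    }


def sweep(lams, n_trees: int, n_instances: int):
    """Expected number of weighted tree updates over a stream for each candidate lambda.

    Assumes one update call per nonzero Poisson draw, so cost grows with p_update.
    """
    rows = []
    for lam in lams:
        prof = resampling_profile(lam)
        expected_updates = n_trees * n_instances * prof["p_update"]
        rows.append((lam, prof["p_update"], prof["mean_weight"], expected_updates))
    return rows


if __name__ == "__main__":
    for lam, p_upd, mean_w, updates in sweep([0.5, 1.0, 2.0, 6.0],
                                             n_trees=100, n_instances=1_000_000):
        print(f"lambda={lam:<4} P(update)={p_upd:.4f} "
              f"E[k]={mean_w:<4} expected updates={updates:,.0f}")
```

    Because P(update) saturates quickly as λ grows, the update count (and so runtime, under this assumption) rises only slowly beyond small λ, while the training weights keep growing; whether that helps or hurts accuracy is what an empirical λ sweep has to decide.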
