LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 10

  1. Book ; Online: Awkward to RDataFrame and back

    Osborne, Ianna / Pivarski, Jim

    2023  

    Abstract Awkward Arrays and RDataFrame provide two very different ways of performing calculations at scale. By adding the ability to zero-copy convert between them, users get the best of both. It gives users better flexibility in mixing different packages and languages in their analysis. In Awkward Array version 2, the ak.to_rdataframe function presents a view of an Awkward Array as an RDataFrame source. This view is generated on demand and the data are not copied. The column readers are generated based on the run-time type of the views. The readers are passed to a generated source derived from ROOT::RDF::RDataSource. The ak.from_rdataframe function converts the selected columns to native Awkward Arrays. The details of the implementation exploiting JIT techniques are discussed. Examples of analyzing data stored in Awkward Arrays via the high-level interface of an RDataFrame are presented. A few examples of column definition, applying user-defined filters written in C++, and plotting or extracting the columnar data as Awkward Arrays are shown. Current limitations and future plans are discussed.

    Comment: 5 pages, 3 figures
    Keywords High Energy Physics - Experiment ; Astrophysics - Instrumentation and Methods for Astrophysics ; Computer Science - Computation and Language ; Physics - Data Analysis, Statistics and Probability
    Subject code 005
    Publishing date 2023-02-20
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  2. Book ; Online: Using a DSL to read ROOT TTrees faster in Uproot

    Roy, Aryan / Pivarski, Jim

    2023  

    Abstract Uproot reads ROOT TTrees using pure Python. For numerical and (singly) jagged arrays, this is fast because a whole block of data can be interpreted as an array without modifying the data. For other cases, such as arrays of std::vector<std::vector<float>>, numerical data are interleaved with structure, and the only way to deserialize them is with a sequential algorithm. When written in Python, such algorithms are very slow. We solve this problem by writing the same logic in a language that can be executed quickly. AwkwardForth is a Domain Specific Language (DSL), based on Standard Forth with I/O extensions for making Awkward Arrays, and it can be interpreted as a fast virtual machine without requiring LLVM as a dependency. We generate code as late as possible to take advantage of optimization opportunities. All ROOT types previously implemented with Python have been converted to AwkwardForth. Double and triple-jagged arrays, for example, are 400x faster in AwkwardForth than in Python, with multithreaded scaling up to 1 second/GB because AwkwardForth releases the Python GIL. We also investigate the possibility of JIT-compiling the generated AwkwardForth code using LLVM to increase the performance gains. In this paper, we describe design aspects, performance studies, and future directions in accelerating Uproot with AwkwardForth.

    Comment: 6 pages, 3 figures; submitted to ACAT 2022 proceedings
    Keywords High Energy Physics - Experiment ; Computer Science - Performance
    Subject code 006
    Publishing date 2023-03-03
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
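
The contrast drawn in the abstract above, between data that can be interpreted as an array in one step and data whose counts are interleaved with values, can be sketched in plain Python. This is only an illustration: the byte layout (a 4-byte big-endian count before each list) and the name `decode_nested` are assumptions made for the example, not ROOT's actual serialization or Uproot's code.

```python
import struct

# Flat numerical data: the whole block is interpreted in a single call.
flat_block = struct.pack(">3f", 1.0, 2.0, 3.0)
flat_values = list(struct.unpack(">3f", flat_block))

def decode_nested(buf):
    """Sequentially decode a hypothetical doubly nested layout.

    Assumed layout per entry: [number of inner lists], then for each
    inner list [number of floats][floats...]. Returns columnar offsets
    for both nesting levels plus the flat content.
    """
    pos = 0
    outer_offsets = [0]
    inner_offsets = [0]
    content = []
    while pos < len(buf):
        (n_lists,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        for _ in range(n_lists):
            (n,) = struct.unpack_from(">i", buf, pos)
            pos += 4
            content.extend(struct.unpack_from(f">{n}f", buf, pos))
            pos += 4 * n
            inner_offsets.append(len(content))
        # Each entry ends after the inner lists decoded so far.
        outer_offsets.append(len(inner_offsets) - 1)
    return outer_offsets, inner_offsets, content

# Two entries: [[1.0, 2.0], []] and [[3.0]]
buf = struct.pack(">ii2fi", 2, 2, 1.0, 2.0, 0) + struct.pack(">iif", 1, 1, 3.0)
outer_offsets, inner_offsets, content = decode_nested(buf)
```

The position of each count depends on the counts read before it, which is why the nested case has to be walked byte by byte, and why a Python loop like this one is the slow path that a faster execution engine would replace.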

  3. Book ; Online: The Awkward World of Python and C++

    Goyal, Manasvi / Osborne, Ianna / Pivarski, Jim

    2023  

    Abstract There are undeniable benefits of binding Python and C++ to take advantage of the best features of both languages. This is especially relevant to the HEP and other scientific communities that have invested heavily in C++ frameworks and are rapidly moving their data analyses to Python. Version 2 of Awkward Array, a Scikit-HEP Python library, introduces a set of header-only C++ libraries that do not depend on any application binary interface. Users can directly include these libraries in their compilation rather than linking against platform-specific libraries. This new development makes the integration of Awkward Arrays into other projects easier and more portable, as the implementation is easily separable from the rest of the Awkward Array codebase. The code is minimal: it does not include all of the code needed to use Awkward Arrays in Python, nor does it include references to Python or pybind11. C++ users can use it to make arrays and then copy them to Python without any specialized data types - only raw buffers, strings, and integers. This C++ code also simplifies the process of just-in-time (JIT) compilation in ROOT. This implementation approach solves some of the drawbacks, like packaging projects where native dependencies can be challenging. In this paper, we demonstrate the technique to integrate C++ and Python by using a header-only approach. We also describe the implementation of a new LayoutBuilder and a GrowableBuffer. Furthermore, examples of wrapping the C++ data into Awkward Arrays and exposing Awkward Arrays to C++ without copying them are discussed.

    Comment: 6 pages, 2 figures; submitted to ACAT 2022 proceedings
    Keywords Computer Science - Mathematical Software ; High Energy Physics - Experiment
    Subject code 020
    Publishing date 2023-03-03
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  4. Book ; Online: Awkward Just-In-Time (JIT) Compilation

    Osborne, Ianna / Pivarski, Jim / Ifrim, Ioana / Hollands, Angus / Schreiner, Henry

    A Developer's Experience

    2023  

    Abstract Awkward Array is a library for performing NumPy-like computations on nested, variable-sized data, enabling array-oriented programming on arbitrary data structures in Python. However, imperative (procedural) solutions can sometimes be easier to write or faster to run. Performant imperative programming requires compilation; JIT-compilation makes it convenient to compile in an interactive Python environment. Various functions in Awkward Array JIT-compile a user's code into executable machine code. They use several different techniques, but reuse parts of each other's implementations. We discuss the techniques used to accelerate Awkward Arrays with JIT-compilation, focusing on RDataFrame, cppyy, and Numba, particularly Numba on GPUs: conversions of Awkward Arrays to and from RDataFrame; standalone cppyy; passing Awkward Arrays to and from Python functions compiled by Numba; passing Awkward Arrays to Python functions compiled for GPUs by Numba; and header-only libraries for populating Awkward Arrays from C++ without any Python dependencies.

    Comment: 7 pages
    Keywords Computer Science - Programming Languages
    Subject code 005
    Publishing date 2023-10-02
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Book ; Online: An array-oriented Python interface for FastJet

    Roy, Aryan / Pivarski, Jim / Freer, Chad Wells

    2022  

    Abstract Analysis of HEP data is an iterative process in which the results of one step often inform the next. In an exploratory analysis, it is common to perform one computation on a collection of events, then view the results (often with histograms) to decide what to try next. Awkward Array is a Scikit-HEP Python package that enables data analysis with array-at-a-time operations to implement cuts as slices, combinatorics as composable functions, etc. However, most C++ HEP libraries, such as FastJet, have an imperative, one-particle-at-a-time interface, which would be inefficient in Python and goes against the grain of the array-at-a-time logic of scientific Python. Therefore, we developed fastjet, a pip-installable Python package that provides FastJet C++ binaries, the classic (particle-at-a-time) Python interface, and the new array-oriented interface for use with Awkward Array. The new interface streamlines interoperability with scientific Python software beyond HEP, such as machine learning. In one case, adopting this library along with other array-oriented tools accelerated HEP analysis code by a factor of 20. It was designed to be easily integrated with libraries in the Scikit-HEP ecosystem, including Uproot (file I/O), hist (histogramming), Vector (Lorentz vectors), and Coffea (high-level glue). We discuss the design of the fastjet Python library, integrating the classic interface with the array-oriented interface and with the Vector library for Lorentz vector operations. The new interface was developed as open source.

    Comment: 5 pages, 2 figures, submitted to ACAT 2021 proceedings
    Keywords High Energy Physics - Experiment ; Computer Science - Programming Languages ; Physics - Computational Physics ; Physics - Data Analysis, Statistics and Probability
    Subject code 005 ; 020
    Publishing date 2022-02-08
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Book ; Online: Awkward Arrays in Python, C++, and Numba

    Pivarski, Jim / Elmer, Peter / Lange, David

    2020  

    Abstract The Awkward Array library has been an important tool for physics analysis in Python since September 2018. However, some interface and implementation issues have been raised in Awkward Array's first year that argue for a reimplementation in C++ and Numba. We describe those issues, the new architecture, and present some examples of how the new interface will look to users. Of particular importance is the separation of kernel functions from data structure management, which allows a C++ implementation and a Numba implementation to share kernel functions, and the algorithm that transforms record-oriented data into columnar Awkward Arrays.

    Comment: To be published in CHEP 2019 proceedings, EPJ Web of Conferences; post-review update
    Keywords Computer Science - Mathematical Software ; High Energy Physics - Experiment
    Publishing date 2020-01-15
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
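
The transformation of record-oriented data into columnar arrays mentioned in the abstract above can be illustrated with a minimal pure-Python sketch. The function name `records_to_columns` and the sample fields `pt` and `eta` are hypothetical, and the sketch assumes every record carries the same fields; it is not the algorithm implemented in Awkward Array.

```python
def records_to_columns(events):
    """Turn a list of events (each a list of dict records) into
    one offsets list plus one flat column per field."""
    offsets = [0]
    columns = {}
    for event in events:
        for record in event:
            for field, value in record.items():
                columns.setdefault(field, []).append(value)
        # offsets[i]:offsets[i + 1] delimits event i in every column.
        offsets.append(offsets[-1] + len(event))
    return offsets, columns

# Three events with 2, 0, and 1 particles (illustrative values).
events = [
    [{"pt": 1.0, "eta": 0.1}, {"pt": 2.0, "eta": -0.2}],
    [],
    [{"pt": 3.0, "eta": 0.3}],
]
offsets, columns = records_to_columns(events)

# Recover the pt values of event 0 from the columnar form.
first_event_pt = columns["pt"][offsets[0]:offsets[1]]
```

Storing the structure once in `offsets` and the values in contiguous per-field columns is what allows array-at-a-time operations to act on a whole column without visiting individual records.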

  7. Book ; Online: AwkwardForth

    Pivarski, Jim / Osborne, Ianna / Das, Pratyush / Lange, David / Elmer, Peter

    accelerating Uproot with an internal DSL

    2021  

    Abstract File formats for generic data structures, such as ROOT, Avro, and Parquet, pose a problem for deserialization: it must be fast, but its code depends on the type of the data structure, not known at compile-time. Just-in-time compilation can satisfy both constraints, but we propose a more portable solution: specialized virtual machines. AwkwardForth is a Forth-driven virtual machine for deserializing data into Awkward Arrays. As a language, it is not intended for humans to write, but it loosens the coupling between Uproot and Awkward Array. AwkwardForth programs for deserializing record-oriented formats (ROOT and Avro) are about as fast as C++ ROOT and 10-80× faster than fastavro. Columnar formats (simple TTrees, RNTuple, and Parquet) only require specialization to interpret metadata and are therefore faster with precompiled code.

    Comment: 11 pages, 2 figures, submitted to the 25th International Conference on Computing in High Energy & Nuclear Physics
    Keywords Computer Science - Programming Languages ; High Energy Physics - Experiment
    Subject code 005
    Publishing date 2021-02-24
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  8. Book ; Online: Coffea -- Columnar Object Framework For Effective Analysis

    Smith, Nicholas / Gray, Lindsey / Cremonesi, Matteo / Jayatilaka, Bo / Gutsche, Oliver / Hall, Allison / Pedro, Kevin / Acosta, Maria / Melo, Andrew / Belforte, Stefano / Pivarski, Jim

    2020  

    Abstract The coffea framework provides a new approach to High-Energy Physics analysis, via columnar operations, that improves time-to-insight, scalability, portability, and reproducibility of analysis. It is implemented with the Python programming language, the scientific Python package ecosystem, and commodity big data technologies. To achieve this suite of improvements across many use cases, coffea takes a factorized approach, separating the analysis implementation and data delivery scheme. All analysis operations are implemented using the NumPy or awkward-array packages, which are wrapped to yield user code whose purpose is quickly intuited. Various data delivery schemes are wrapped into a common front-end which accepts user inputs and code, and returns user-defined outputs. We will discuss our experience in implementing analysis of CMS data using the coffea framework along with a discussion of the user experience and future directions.

    Comment: As presented at CHEP 2019
    Keywords Computer Science - Distributed, Parallel, and Cluster Computing ; High Energy Physics - Experiment
    Subject code 005
    Publishing date 2020-08-28
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  9. Book ; Online: Using Big Data Technologies for HEP Analysis

    Cremonesi, Matteo / Bellini, Claudio / Bian, Bianny / Canali, Luca / Dimakopoulos, Vasileios / Elmer, Peter / Fisk, Ian / Girone, Maria / Gutsche, Oliver / Hoh, Siew-Yan / Jayatilaka, Bo / Khristenko, Viktor / Luiselli, Andrea / Melo, Andrew / Evangelos, Evangelos / Olivito, Dominick / Pazzini, Jacopo / Pivarski, Jim / Svyatkovskiy, Alexey / Zanetti, Marco

    2019  

    Abstract The HEP community is approaching an era where the excellent performance of particle accelerators in delivering collisions at a high rate will force the experiments to record a large amount of information. The growing size of the datasets could potentially become a limiting factor in the capability to produce scientific results in a timely and efficient manner. Recently, new technologies and new approaches have been developed in industry to meet the need to retrieve information as quickly as possible when analyzing PB and EB datasets. Providing scientists with these modern computing tools will lead to rethinking the principles of data analysis in HEP, making the overall scientific process faster and smoother. In this paper, we present the latest developments and the most recent results on the usage of Apache Spark for HEP analysis. The study aims at evaluating the efficiency of the application of the new tools both quantitatively, by measuring performance, and qualitatively, by focusing on the user experience. The first goal is achieved by developing a data reduction facility: working together with CERN Openlab and Intel, CMS replicates a real physics search using Spark-based technologies, with the ambition of reducing 1 PB of public data collected by the CMS experiment to 1 TB of data, in a format suitable for physics analysis, in 5 hours. The second goal is achieved by implementing multiple physics use cases in Apache Spark, using as input preprocessed datasets derived from official CMS data and simulation. By performing different end-analyses up to the publication plots on different hardware, feasibility, usability, and portability are compared to those of a traditional ROOT-based workflow.
    Keywords Computer Science - Distributed, Parallel, and Cluster Computing
    Subject code 020
    Publishing date 2019-01-21
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  10. Article ; Online: Inexpensive multi-patient respiratory monitoring system for helmet ventilation during COVID-19 pandemic

    Princeton Open Ventilation Monitor Collaboration / Bourrianne, Philippe / Chidzik, Stanley / Cohen, Daniel J / Elmer, Peter / Hallowell, Thomas / Kilbaugh, Todd J / Lange, David / Leifer, Andrew M / Marlow, Daniel R. / Meyers, Peter D. / Normand, Edna / Nunes, Janine / Oh, Myungchul / Page, Lyman / Pereira, Talmo / Pivarski, Jim / Schreiner, Henry / Stone, Howard A / Tank, David W / Thiberge, Stephan / Tully, Christopher

    medRxiv

    Abstract Helmet non-invasive ventilation (NIV) is a form of continuous positive applied pressure that has emerged as a useful therapy for COVID-19 patients who require respiratory support but may not require invasive ventilation. Helmet NIV has seen an increase in use during the COVID-19 pandemic because it is low-cost, readily available, and provides viral filters between the patient and clinician. Helmet NIV may also provide better patient outcomes by delaying or eliminating the need for invasive ventilation. Its widespread adoption has been limited, however, by the lack of a respiratory monitoring system that is needed to address known safety vulnerabilities and to provide clinicians with a respiratory profile of the patient. To address this safety need, we have developed an inexpensive respiratory monitoring system that is based on readily available commercial components and is suitable for rapid local manufacture. The system is designed for use in conjunction with the COVID-19 Helmet developed by Sea-Long Medical Systems, but is modular and can be used with other ventilation systems. The monitoring system comprises one or more flow and pressure sensors and a central remote station that can be used to remotely monitor up to 20 patients simultaneously. The system reports flow, pressure, and clinically relevant metrics including respiratory rate, tidal volume equivalent, peak inspiratory pressure (PIP), positive end-expiratory pressure (PEEP) and the ratio of inspiratory time to expiratory time (I:E). The device will sound alarms based on clinician-set thresholds. In bench tests using a commercial ventilator and artificial lung system, our device performs comparably to a commercial single-patient respiratory monitor. Results are presented from human-subject tests on a healthy volunteer undergoing helmet non-invasive ventilation. Detailed design and manufacturing documents are provided.
    Keywords covid19
    Language English
    Publishing date 2020-06-30
    Publisher Cold Spring Harbor Laboratory Press
    Document type Article ; Online
    DOI 10.1101/2020.06.29.20141283
    Database COVID19
