LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 67

  1. Book ; Online: Demonstrating 100 Gbps in and out of the public Clouds

    Sfiligoi, Igor

    2020  

    Abstract: There is growing awareness that public Cloud providers offer capabilities not found elsewhere, with elasticity being a major driver. The value of elastic scaling is, however, tightly coupled to the capabilities of the networks that connect all involved resources, both in the public Clouds and at the various research institutions. This paper presents results of measurements involving file transfers inside public Cloud providers, fetching data from on-prem resources into public Cloud instances, and fetching data from public Cloud storage into on-prem nodes. The networking of the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and the Google Cloud Platform, has been benchmarked. The on-prem nodes were either managed by the Pacific Research Platform or located at the University of Wisconsin - Madison. The observed sustained throughput was on the order of 100 Gbps in all the tests moving data in and out of the public Clouds, and reached into the Tbps range for data movement inside the public Cloud providers themselves. All the tests used HTTP as the transfer protocol.
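
    As a rough illustration of the scales involved, the abstract's round numbers translate as follows (a back-of-the-envelope sketch in Python; the actual per-test measurements are in the paper itself):

      # Back-of-the-envelope conversions of the abstract's round numbers.

      def seconds_to_move(bytes_total: float, gbps: float) -> float:
          """Seconds needed to move bytes_total at a sustained rate of gbps gigabits/s."""
          return bytes_total * 8 / (gbps * 1e9)

      ONE_TB = 1e12  # 1 TB in bytes

      print(f"100 Gbps = {100e9 / 8 / 1e9:.1f} GB/s")                    # 12.5 GB/s
      print(f"1 TB at 100 Gbps: {seconds_to_move(ONE_TB, 100):.0f} s")   # 80 s
      print(f"1 TB at 1 Tbps:   {seconds_to_move(ONE_TB, 1000):.0f} s")  # 8 s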

    Comment: 4 pages, 6 figures, 3 tables
    Keywords Computer Science - Performance ; Computer Science - Distributed, Parallel, and Cluster Computing
    Subject code 020
    Publishing date 2020-05-12
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  2. Book ; Online: Defining a canonical unit for accounting purposes

    Andrijauskas, Fabio / Sfiligoi, Igor / Würthwein, Frank

    2023  

    Abstract: Compute resource providers often put in place batch compute systems to maximize the utilization of such resources. However, compute nodes in such clusters, both physical and logical, contain several complementary resources, with notable examples being CPUs, GPUs, memory and ephemeral storage. User jobs will typically require more than one such resource, resulting in co-scheduling trade-offs of partial nodes, especially in multi-user environments. When accounting for either user billing or scheduling overhead, it is thus important to consider all such resources together. We thus define the concept of a threshold-based "canonical unit" that combines several resource types into a single discrete unit and use it to characterize scheduling overhead and make resource billing more fair for both resource providers and users. Note that the exact definition of a canonical unit is not prescribed and may change between resource providers. Nevertheless, we provide a template and two example definitions that we consider appropriate in the context of the Open Science Grid.
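
    The abstract stresses that the exact definition is not prescribed. Purely as an illustration of the threshold-based idea, a canonical-unit calculation might look like the sketch below; the thresholds and field names are hypothetical, not the OSG template values from the paper:

      import math

      # Hypothetical per-unit thresholds: one canonical unit "holds" at most this
      # much of each resource type. Illustrative values, not the paper's template.
      THRESHOLDS = {"cpus": 1, "gpus": 0.125, "memory_gb": 4, "scratch_gb": 20}

      def canonical_units(request: dict) -> int:
          """Charge a job by whichever requested resource dominates, in whole units."""
          return max(math.ceil(request.get(r, 0) / t) for r, t in THRESHOLDS.items())

      # A 2-core, 8 GB job costs 2 units; adding a whole GPU makes it GPU-dominated.
      print(canonical_units({"cpus": 2, "memory_gb": 8}))             # 2
      print(canonical_units({"cpus": 2, "memory_gb": 8, "gpus": 1}))  # 8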

    Comment: 6 pages, 2 figures, To be published in proceedings of PEARC23
    Keywords Computer Science - Distributed, Parallel, and Cluster Computing
    Publishing date 2023-05-17
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  3. Book ; Online: Optimization and Portability of a Fusion OpenACC-based FORTRAN HPC Code from NVIDIA to AMD GPUs

    Sfiligoi, Igor / Belli, Emily A. / Candy, Jeff / Budiardja, Reuben D.

    2023  

    Abstract: NVIDIA has been the main provider of GPU hardware in HPC systems for over a decade. Most applications that benefit from GPUs have thus been developed and optimized for the NVIDIA software stack. Recent exascale HPC systems are, however, introducing GPUs from other vendors, e.g. with the AMD GPU-based OLCF Frontier system just becoming available. AMD GPUs cannot be directly accessed using the NVIDIA software stack, and require a porting effort by the application developers. This paper provides an overview of our experience porting and optimizing the CGYRO code, a widely-used fusion simulation tool based on FORTRAN with OpenACC-based GPU acceleration. While the porting from the NVIDIA compilers was relatively straightforward using the CRAY compilers on the AMD systems, the performance optimization required more fine-tuning. In the optimization effort, we uncovered code sections that had performed well on NVIDIA GPUs, but were unexpectedly slow on AMD GPUs. After AMD-targeted code optimizations, performance on AMD GPUs has increased to meet our expectations. Modest speed improvements were also seen on NVIDIA GPUs, which was an unexpected benefit of this exercise.

    Comment: 6 pages, 4 figures, 2 tables, To be published in Proceedings of PEARC23
    Keywords Computer Science - Distributed, Parallel, and Cluster Computing ; Computer Science - Performance ; Physics - Plasma Physics
    Subject code 000
    Publishing date 2023-05-17
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  4. Book ; Online: Testing GitHub projects on custom resources using unprivileged Kubernetes runners

    Sfiligoi, Igor / McDonald, Daniel / Knight, Rob / Würthwein, Frank

    2023  

    Abstract: GitHub is a popular repository for hosting software projects, both due to ease of use and the seamless integration with its testing environment. Native GitHub Actions make it easy for software developers to validate new commits and have confidence that new code does not introduce major bugs. The freely available test environments are limited to only a few popular setups but can be extended with custom Action Runners. Our team had access to a Kubernetes cluster with GPU accelerators, so we explored the feasibility of automatically deploying GPU-providing runners there. All available Kubernetes-based setups, however, require cluster-admin level privileges. To address this problem, we developed a simple custom setup that operates in a completely unprivileged manner. In this paper we provide a summary description of the setup and our experience using it in the context of two Knight lab projects on the Prototype National Research Platform system.

    Comment: 5 pages, 1 figure, To be published in proceedings of PEARC23
    Keywords Computer Science - Software Engineering
    Publishing date 2023-05-17
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Book ; Online: Auto-scaling HTCondor pools using Kubernetes compute resources

    Sfiligoi, Igor / DeFanti, Thomas / Würthwein, Frank

    2022  

    Abstract: HTCondor has been very successful in managing globally distributed, pleasantly parallel scientific workloads, especially as part of the Open Science Grid. The HTCondor system design makes it ideal for integrating compute resources provisioned from anywhere, but it has very limited native support for autonomously provisioning resources managed by other solutions. This work presents a solution that allows for autonomous, demand-driven provisioning of Kubernetes-managed resources. A high-level overview of the employed architectures is presented, paired with the description of the setups used in both on-prem and Cloud deployments in support of several Open Science Grid communities. The experience suggests that the described solution should be generally suitable for contributing Kubernetes-based resources to existing HTCondor pools.
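
    As a rough sketch of what demand-driven provisioning means here, the loop below scales a pool of Kubernetes-hosted workers from the HTCondor idle-job count. Both helpers are hypothetical stubs; a real deployment would implement them with the HTCondor Python bindings and the Kubernetes API, and the paper's actual architecture differs in detail:

      # Demand-driven scale-up in miniature. Both helpers are hypothetical stubs.

      MAX_WORKERS = 100     # cap on provisioned worker pods
      JOBS_PER_WORKER = 8   # assumed job slots per worker

      def count_idle_jobs() -> int:
          """Stub: number of idle jobs waiting in the HTCondor pool."""
          return 37

      def set_worker_replicas(n: int) -> None:
          """Stub: resize the Kubernetes workload that runs the HTCondor workers."""
          print(f"scaling workers to {n} replicas")

      def reconcile() -> None:
          idle = count_idle_jobs()
          wanted = min(MAX_WORKERS, -(-idle // JOBS_PER_WORKER))  # ceiling division
          set_worker_replicas(wanted)

      reconcile()  # a real controller would run this periodically, e.g. every minute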

    Comment: 6 pages, 3 figures, to be published in proceedings of PEARC22
    Keywords Computer Science - Distributed, Parallel, and Cluster Computing
    Subject code 000
    Publishing date 2022-05-02
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Article: Optimizing UniFrac with OpenACC Yields Greater Than One Thousand Times Speed Increase.

    Sfiligoi, Igor / Armstrong, George / Gonzalez, Antonio / McDonald, Daniel / Knight, Rob

    mSystems

    2022  Volume 7, Issue 3, Page(s) e0002822

    Abstract: UniFrac is an important tool in microbiome research that is used for phylogenetically comparing microbiome profiles to one another (beta diversity). Striped UniFrac recently added the ability to split the problem into many independent subproblems, exhibiting nearly linear scaling but suffering from memory contention. Here, we adapt UniFrac to graphics processing units using OpenACC, enabling greater than 1,000× computational improvement, and apply it to 307,237 samples, the largest 16S rRNA V4 uniformly preprocessed microbiome data set analyzed to date.
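
    The "striped" decomposition mentioned above can be pictured as cutting the pairwise distance matrix into independent bands that can be farmed out to separate workers or GPUs. The following toy sketch uses a stand-in per-pair metric, not the actual UniFrac computation:

      import numpy as np

      def stripe_rows(n_samples: int, n_stripes: int):
          """Yield (start, stop) row ranges; each stripe is an independent subproblem."""
          step = -(-n_samples // n_stripes)  # ceiling division
          for start in range(0, n_samples, step):
              yield start, min(start + step, n_samples)

      def toy_distance(a: np.ndarray, b: np.ndarray) -> float:
          """Stand-in pairwise metric; real UniFrac weighs differences over a phylogeny."""
          return float(np.abs(a - b).sum() / (a + b).sum())

      data = np.random.default_rng(0).random((100, 50))  # 100 samples x 50 features
      dm = np.zeros((100, 100))
      for start, stop in stripe_rows(100, 4):  # 4 stripes -> e.g. 4 workers or GPUs
          for i in range(start, stop):
              for j in range(i + 1, 100):
                  dm[i, j] = dm[j, i] = toy_distance(data[i], data[j])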
    Language English
    Publishing date 2022-05-31
    Publishing country United States
    Document type Journal Article
    ISSN 2379-5077
    DOI 10.1128/msystems.00028-22
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Book ; Online: Data intensive physics analysis in Azure cloud

    Sfiligoi, Igor / Würthwein, Frank / Davila, Diego

    2021  

    Abstract: The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) is one of the largest data producers in the scientific world, with standard data products centrally produced, and then used by often competing teams within the collaboration. This work is focused on how a local institution, University of California San Diego (UCSD), partnered with the Open Science Grid (OSG) to use Azure cloud resources to augment its available computing to accelerate time to results for multiple analyses pursued by a small group of collaborators. The OSG is a federated infrastructure allowing many independent resource providers to serve many independent user communities in a transparent manner. Historically the resources would come from various research institutions, spanning small universities to large HPC centers, based on either community needs or grant allocations, so adding commercial clouds as resource providers is a natural evolution. The OSG technology allows for easy integration of cloud resources, but the data-intensive nature of CMS compute jobs required the deployment of additional data caching infrastructure to ensure high efficiency.

    Comment: 11 pages, 5 figures, to be published in proceedings of ICOCBI 2021
    Keywords Computer Science - Distributed, Parallel, and Cluster Computing
    Publishing date 2021-10-25
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  8. Book ; Online: Enabling microbiome research on personal devices

    Sfiligoi, Igor / McDonald, Daniel / Knight, Rob

    2021  

    Abstract: Microbiome studies have recently transitioned from experimental designs with a few hundred samples to designs spanning tens of thousands of samples. Modern studies such as the Earth Microbiome Project (EMP) afford the statistics crucial for untangling the many factors that influence microbial community composition. Analyzing those data used to require access to a compute cluster, making it both expensive and inconvenient. We show that recent improvements in both hardware and software now make it possible to compute key bioinformatics tasks on EMP-sized data in minutes using a gaming-class laptop, enabling much faster and broader microbiome science insights.

    Comment: 2 pages, 4 figures, to be published in proceedings of eScience 2021
    Keywords Quantitative Biology - Genomics ; Computer Science - Performance ; Quantitative Biology - Quantitative Methods
    Publishing date 2021-07-08
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  9. Book ; Online: Comparing single-node and multi-node performance of an important fusion HPC code benchmark

    Belli, Emily A. / Candy, Jeff / Sfiligoi, Igor / Würthwein, Frank

    2022  

    Abstract: Fusion simulations have traditionally required the use of leadership scale High Performance Computing (HPC) resources in order to produce advances in physics. The impressive improvements in compute and memory capacity of many-GPU compute nodes now allow some problems that once required a multi-node setup to be solved on a single node. When possible, the increased interconnect bandwidth can result in an order of magnitude higher science throughput, especially for communication-heavy applications. In this paper we analyze the performance of the fusion simulation tool CGYRO, an Eulerian gyrokinetic turbulence solver designed and optimized for collisional, electromagnetic, multiscale simulation, which is widely used in the fusion research community. Due to the nature of the problem, the application has to work on a large multi-dimensional computational mesh as a whole, requiring frequent exchange of large amounts of data between the compute processes. In particular, we show that the average-scale nl03 benchmark CGYRO simulation can be run at an acceptable speed on a single Google Cloud instance with 16 A100 GPUs, outperforming 8 NERSC Perlmutter Phase1 nodes, 16 ORNL Summit nodes and 256 NERSC Cori nodes. Moving from a multi-node to a single-node GPU setup, we get comparable simulation times using less than half the number of GPUs. Larger benchmark problems, however, still require a multi-node HPC setup due to GPU memory capacity needs, since at the time of writing no vendor offers nodes with a sufficient GPU memory setup. The upcoming external NVSwitch does, however, promise to deliver an almost equivalent solution for up to 256 NVIDIA GPUs.
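
    To put the abstract's comparison in perspective, the GPU counts implied by the public node specifications of the named systems tally as follows (an illustrative count only; Cori nodes are CPU-only, and all performance numbers are in the paper):

      # GPUs per setup named in the abstract, from public node specifications
      # (Perlmutter Phase1: 4x A100 per node; Summit: 6x V100 per node).
      setups = {
          "Google Cloud instance": 1 * 16,  # one node with 16x A100
          "Perlmutter Phase1": 8 * 4,       # 8 nodes x 4 A100
          "Summit": 16 * 6,                 # 16 nodes x 6 V100
      }
      for name, gpus in setups.items():
          print(f"{name:22s} {gpus:3d} GPUs")
      # 16 vs 32 A100s is the "less than half the number of GPUs" comparison.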

    Comment: 6 pages, 1 table, 1 figure, to be published in proceedings of PEARC22
    Keywords Computer Science - Distributed, Parallel, and Cluster Computing ; Physics - Plasma Physics
    Subject code 000
    Publishing date 2022-05-19
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  10. Book ; Online: Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scientific Computing

    Sfiligoi, I. / Schultz, D. / Riedel, B. / Wuerthwein, F. / Barnet, S. / Brik, V.

    2020  

    Abstract: Scientific computing needs are growing dramatically with time and are expanding into science domains that were previously not compute intensive. When compute workflows spike well in excess of the capacity of their local compute resource, capacity should be temporarily provisioned from somewhere else to both meet deadlines and increase scientific output. Public Clouds have become an attractive option due to their ability to be provisioned with minimal advance notice, but the available capacity of cost-effective instances is not well understood. This paper presents the expansion of IceCube's production HTCondor pool using cost-effective GPU instances in preemptible mode gathered from the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and the Google Cloud Platform. Using this setup, we sustained for a whole workday about 15k GPUs, corresponding to around 170 PFLOP32s, integrating over one EFLOP32 hour worth of science output for a price tag of about $60k. In this paper, we provide the reasoning behind Cloud instance selection, a description of the setup and an analysis of the provisioned resources, as well as a short description of the actual science output of the exercise.
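
    The abstract's round numbers are mutually consistent, as this quick check shows (the 8-hour workday length is an assumption on our part):

      # Quick consistency check of the abstract's round numbers.
      pflop32s = 170        # sustained fp32 throughput, in PFLOP32/s
      workday_hours = 8     # assumed length of "a whole workday"
      cost_usd = 60_000

      eflop32_hours = pflop32s * workday_hours / 1000
      print(f"{eflop32_hours:.2f} EFLOP32 hours")                   # 1.36, i.e. "over one"
      print(f"~${cost_usd / eflop32_hours:,.0f} per EFLOP32 hour")  # ~$44,118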

    Comment: 5 pages, 7 figures, to be published in proceedings of PEARC'20. arXiv admin note: text overlap with arXiv:2002.06667
    Keywords Computer Science - Performance ; Computer Science - Distributed, Parallel, and Cluster Computing
    Publishing date 2020-04-18
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
