LIVIVO - The Search Portal for Life Sciences


Search results

Results 1 - 3 of 3


  1. Book ; Online: Efficient Multi-site Data Movement Using Constraint Programming for Data Hungry Science

    Zerola, Michal / Lauret, Jérôme / Barták, Roman / Šumbera, Michal

    2009  

    Abstract: For the past decade, HENP experiments have been heading towards a distributed computing model in an effort to concurrently process tasks over enormous data sets that have been increasing in size as a function of time. In order to optimize all available resources (geographically spread) and minimize the processing time, it is also necessary to address the question of efficient data transfers and placements. A key question is whether the time penalty for moving the data to the computational resources is worth the presumed gain. Moving towards truly distributed task scheduling, we present a technique using a Constraint Programming (CP) approach. The CP technique schedules data transfers from multiple resources, considering all available paths of diverse characteristics (capacity, sharing and storage), with the minimum user waiting time as the objective. We introduce a model for planning data transfers to a single destination (data transfer) as well as its extension to an optimal data set spreading strategy (data placement). Several enhancements to a solver of the CP model will be shown, leading to a faster schedule computation time through symmetry breaking, branch cutting, well-studied principles from the job-shop scheduling field, and several heuristics. Finally, we will present the design and implementation of a corner-stone application aimed at moving data sets according to the schedule. Results will include a comparison of performance and trade-offs between the CP techniques and a Peer-2-Peer model in a simulation framework, as well as a real-case scenario taken from practical usage of a CP scheduler.

    Comment: To appear in proceedings of Computing in High Energy and Nuclear Physics 2009
    Keywords Computer Science - Distributed, Parallel, and Cluster Computing
    Subject code 000
    Publishing date 2009-06-16
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

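The core problem the abstract above describes — assigning data transfers to network paths so that the user's waiting time is minimal — can be illustrated with a toy model. The sketch below is not the authors' solver: where their CP approach prunes the search with symmetry breaking, branch cutting and job-shop heuristics, this version simply enumerates every assignment; all file sizes and bandwidths are invented for illustration.

```python
# Toy sketch of the transfer-scheduling objective from the abstract:
# assign each file to one of several network paths so that the latest
# finish time (the user's waiting time) is as small as possible.
# A real CP solver would prune this search; we brute-force it.
from itertools import product

def schedule(file_sizes, path_bandwidths):
    """Try every assignment of files to paths and keep the one whose
    most-loaded path finishes first (minimal makespan)."""
    best_assignment, best_makespan = None, float("inf")
    for assignment in product(range(len(path_bandwidths)), repeat=len(file_sizes)):
        load = [0.0] * len(path_bandwidths)
        for size, path in zip(file_sizes, assignment):
            load[path] += size / path_bandwidths[path]  # transfer time added to that path
        makespan = max(load)
        if makespan < best_makespan:
            best_assignment, best_makespan = assignment, makespan
    return best_assignment, best_makespan

# Three files (GB) over two equal paths (GB/s): the optimum balances the load.
assignment, makespan = schedule([4.0, 2.0, 2.0], [1.0, 1.0])
print(assignment, makespan)  # (0, 1, 1) 4.0 — each path carries 4 GB
```

The brute force grows as paths^files, which is exactly why the abstract's pruning techniques (symmetry breaking between identical paths, branch cutting) matter at realistic scale.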

  2. Book ; Online: Using constraint programming to resolve the multi-source/multi-site data movement paradigm on the Grid

    Zerola, Michal / Lauret, Jerome / Bartak, Roman / Sumbera, Michal

    2008  

    Abstract: In order to achieve both fast and coordinated data transfer to collaborative sites and to create a distribution of data over multiple sites, efficient data movement is one of the most essential aspects of a distributed environment. With such capabilities at hand, truly distributed task scheduling with minimal latencies would be reachable by internationally distributed collaborations (such as those in HENP) seeking to scavenge or maximize geographically spread computational resources. But it is often not at all clear (a) how to move data when it is available from multiple sources or (b) how to move data to multiple compute resources to achieve an optimal usage of available resources. We present a method of creating a Constraint Programming (CP) model consisting of sites, links and their attributes, such as bandwidth, for grid network data transfer, also considering user tasks as part of the objective function for an optimal solution. We will explore and explain the trade-off between schedule generation time and divergence from the optimal solution, and show how to improve and render viable the solution-finding time by using a search tree time limit, approximations, restrictions such as symmetry breaking or grouping similar tasks together, or by generating a sequence of optimal schedules by splitting the input problem. Results of data transfer simulations for each case will also include a well-known Peer-2-Peer model, and the time taken to generate a schedule as well as the time needed for schedule execution will be compared to a CP optimal solution. We will additionally present a possible implementation aimed at bringing distributed data sets (multiple sources) to a given site in minimal time.

    Comment: 10 pages; ACAT 2008 workshop
    Keywords Computer Science - Performance
    Subject code 006
    Publishing date 2008-12-31
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

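Question (a) in the abstract above — how to move data when it is available from multiple sources — can likewise be sketched with a toy baseline. The following greedy heuristic is not the paper's CP model (the paper compares against a Peer-2-Peer model, not this); it is merely the kind of simple baseline an optimal scheduler is measured against, and all site names, sizes and bandwidths are invented.

```python
# Toy baseline for the multi-source question: a dataset is replicated at
# several source sites, and each file should be fetched from the source
# that lets the whole transfer finish soonest.  Greedy stand-in for an
# optimal CP schedule: largest files first, each to the source that
# would finish earliest after taking it.
def pick_sources(file_sizes, source_bandwidths):
    finish = {site: 0.0 for site in source_bandwidths}  # projected busy-until time per source
    plan = {}
    for name, size in sorted(file_sizes.items(), key=lambda kv: -kv[1]):
        best = min(finish, key=lambda site: finish[site] + size / source_bandwidths[site])
        finish[best] += size / source_bandwidths[best]
        plan[name] = best
    return plan, max(finish.values())

# Two replicated sources of different bandwidth (GB/s), three files (GB).
plan, makespan = pick_sources(
    {"f1": 4.0, "f2": 2.0, "f3": 2.0},
    {"A": 2.0, "B": 1.0},
)
print(plan, makespan)  # {'f1': 'A', 'f2': 'B', 'f3': 'A'} 3.0
```

A CP formulation would replace the greedy choice with a search over all source assignments (plus link constraints), trading schedule generation time for optimality — the trade-off the abstract explores.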

  3. Book ; Online: Setting up a STAR Tier 2 Site at Golias/Prague Farm

    Chaloupka, Petr / Jakl, Pavel / Kapitán, Jan / Lauret, Jérôme / Zerola, Michal

    2009  

    Abstract: High Energy Nuclear Physics (HENP) collaborations' experience shows that the computing resources available at a single site are often neither sufficient nor able to satisfy the needs of remote collaborators. From latencies in the network connectivity to the lack of interactivity, work at distant computing centers is often inefficient. Having a fully functional software stack on local resources is a strong enabler of science opportunities for any local group that can afford the time investment. Prague's heavy-ion group, participating in the STAR experiment at RHIC, has been a strong advocate of local computing as the most efficient means of data processing and physics analyses. A Tier 2 computing center was set up at the Regional Computing Center for Particle Physics called "Golias". We report on our experience in setting up a fully functional Tier 2 center and discuss the solutions chosen to address storage space and analysis issues and their impact on the farm's overall functionality. This includes a locally built STAR analysis framework, integration with a local DPM system (a cost-effective storage solution), the influence of the availability and quality of the network connection to Tier 0 via a dedicated CESNET/ESnet link, and the development of lightweight yet fully automated data transfer tools allowing the movement of entire data sets from BNL (Tier 0) to Golias (Tier 2).

    Comment: To appear in proceedings of Computing in High Energy and Nuclear Physics 2009
    Keywords Physics - Computational Physics
    Subject code 306
    Publishing date 2009-06-16
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

