LIVIVO - The Search Portal for Life Sciences

Search Results

Results 1 - 10 of 16 in total

  1. Article ; Online: Investments in fixed assets in Russia: analysis and forecast

    Chistik, Olga / Ovchinnikov, Oleg / Volgin, Andrey / Tumanov, Alexey / Danilova, Lyubov

    E3S Web of Conferences, Vol. 389, p. 09016 (2023)

    Abstract The relevance of the study is that investment activities associated with investments in fixed assets relate to the "system-forming" activity of the state and are articulated in a number of federal documents and national projects. As noted in the Decree of the President of Russia "On National Development Goals of the Russian Federation for the Period until 2030", sustained economic growth is associated with a high level of investment activity, and capital investment in fixed assets is planned to increase by at least 70 percent by 2030 compared to 2020. The authors' grouping of the factors of investment in fixed assets into blocks is formed according to a content criterion. Russian regions were hierarchically classified into qualitatively homogeneous groups by cluster analysis, based on interregional comparisons of factor indicators of regional promotion of investments in fixed assets. A federal-level analysis of trends in indicators of investments in fixed assets was implemented, and the indicators were forecast for 2021-2022. The proposed analytical and methodological support for federal and regional executive authorities serves as a basis for developing appropriate measures to ensure the conditions for growth of investments in fixed assets and sustainable economic growth.
    Keywords Environmental sciences ; GE1-350
    Subject/Category (Code) 330
    Language English
    Publication date 2023-01-01
    Publisher EDP Sciences
    Document type Article ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

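    The abstract above reports a hierarchical classification of Russian regions into qualitatively homogeneous groups via cluster analysis of factor indicators. A minimal sketch of that kind of analysis with SciPy follows; the indicator matrix, the region count, and the five-group cut are hypothetical stand-ins, since the abstract specifies none of them.

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster

        # Hypothetical data: rows = regions, columns = factor indicators of
        # investment in fixed assets (random values stand in for real data).
        rng = np.random.default_rng(0)
        X = rng.normal(size=(85, 6))

        # Ward linkage builds the hierarchy from interregional distances ...
        Z = linkage(X, method="ward")

        # ... and cutting it yields qualitatively homogeneous groups; the
        # choice of 5 groups is an assumption, not taken from the paper.
        groups = fcluster(Z, t=5, criterion="maxclust")
        print(np.bincount(groups)[1:])  # number of regions in each cluster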

  2. Book ; Online: SuperFed: Weight Shared Federated Learning

    Khare, Alind / Agrawal, Animesh / Lee, Myungjin / Tumanov, Alexey

    2023

    Abstract Federated Learning (FL) is a well-established technique for privacy-preserving distributed training. Much attention has been given to various aspects of FL training. A growing number of applications that consume FL-trained models, however, increasingly operate under dynamically and unpredictably variable conditions, rendering a single model insufficient. We argue for training a global family of models cost-efficiently in a federated fashion. Training them independently for different tradeoff points, however, incurs $O(k)$ cost for any $k$ architectures of interest. Straightforward applications of FL techniques to recent weight-shared training approaches are either infeasible or prohibitively expensive. We propose SuperFed - an architectural framework that incurs $O(1)$ cost to co-train a large family of models in a federated fashion by leveraging weight-shared learning. We achieve an order of magnitude cost savings on both communication and computation by proposing two novel training mechanisms: (a) distribution of weight-shared models to federated clients, and (b) central aggregation of arbitrarily overlapping weight-shared model parameters. The combination of these mechanisms is shown to reach an order of magnitude (9.43x) reduction in computation and communication cost for training a $5 \times 10^{18}$-sized family of models, compared to independently training as few as $k = 9$ DNNs without any accuracy loss.
    Keywords Computer Science - Machine Learning ; Computer Science - Distributed, Parallel, and Cluster Computing
    Subject/Category (Code) 006
    Publication date 2023-01-25
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

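    The abstract's second mechanism, central aggregation of arbitrarily overlapping weight-shared model parameters, can be sketched as a masked FedAvg: each entry of a shared tensor is averaged only over the clients whose SubNet actually trained it. The dict-of-tensors and per-client mask interface below is a hypothetical illustration, not SuperFed's actual API.

        import numpy as np

        def aggregate_overlapping(global_w, client_updates):
            """Average overlapping weight-shared parameters across clients.

            client_updates: list of (weights, masks) pairs, one per client;
            masks[name] is 1.0 where the client's SubNet used that entry.
            """
            for name, w in global_w.items():
                total = np.zeros_like(w)
                count = np.zeros_like(w)
                for weights, masks in client_updates:
                    if name in weights:
                        total += weights[name] * masks[name]
                        count += masks[name]
                trained = count > 0  # average only entries some client trained
                w[trained] = total[trained] / count[trained]
            return global_w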

  3. Book ; Online: Signed Binarization: Unlocking Efficiency Through Repetition-Sparsity Trade-Off

    Kuhar, Sachit / Jain, Yash / Tumanov, Alexey

    2023

    Abstract Efficient inference of Deep Neural Networks (DNNs) on resource-constrained edge devices is essential. Quantization and sparsity are key algorithmic techniques that translate to repetition and sparsity within tensors at the hardware-software interface. This paper introduces the concept of repetition-sparsity trade-off that helps explain computational efficiency during inference. We propose Signed Binarization, a unified co-design framework that synergistically integrates hardware-software systems, quantization functions, and representation learning techniques to address this trade-off. Our results demonstrate that Signed Binarization is more accurate than binarization with the same number of non-zero weights. Detailed analysis indicates that signed binarization generates a smaller distribution of effectual (non-zero) parameters nested within a larger distribution of total parameters, both of the same type, for a DNN block. Finally, our approach achieves a 26% speedup on real hardware, doubles energy efficiency, and reduces density by 2.8x compared to binary methods for ResNet 18, presenting an alternative solution for deploying efficient models in resource-limited environments.
    Keywords Computer Science - Machine Learning ; Computer Science - Artificial Intelligence ; Computer Science - Computer Vision and Pattern Recognition
    Subject/Category (Code) 006
    Publication date 2023-12-03
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

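    The core quantizer implied by the abstract (and by entry 4 below, which builds on the same signed-binary idea) maps each weight to {-1, 0, +1}: small-magnitude weights become 0 (sparsity), and the rest keep only their sign (repetition). A minimal PyTorch sketch; the fixed magnitude threshold is an assumption, not the paper's co-designed quantization function.

        import torch

        def signed_binarize(w: torch.Tensor, threshold: float) -> torch.Tensor:
            """Quantize full-precision weights to {-1, 0, +1}; `threshold`
            walks the repetition-sparsity trade-off."""
            q = torch.sign(w)
            q[w.abs() < threshold] = 0.0
            return q

        w = torch.randn(256, 256)
        q = signed_binarize(w, threshold=0.5)
        print((q == 0).float().mean())  # achieved weight sparsity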

  4. Book ; Online: Signed Binary Weight Networks

    Kuhar, Sachit / Tumanov, Alexey / Hoffman, Judy

    2022  

    Abstract Efficient inference of Deep Neural Networks (DNNs) is essential to making AI ubiquitous. Two important algorithmic techniques have shown promise for enabling efficient inference - sparsity and binarization. These techniques translate into weight sparsity and weight repetition at the hardware-software level, enabling the deployment of DNNs with critically low power and latency requirements. We propose a new method called signed-binary networks to improve efficiency further (by exploiting both weight sparsity and weight repetition together) while maintaining similar accuracy. Our method achieves accuracy comparable to binary networks on the ImageNet and CIFAR10 datasets and can lead to 69% sparsity. We observe real speedups when deploying these models on general-purpose devices and show that this high percentage of unstructured sparsity can lead to a further reduction in energy consumption on ASICs.

    Comment: it is being updated
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Distributed, Parallel, and Cluster Computing ; Computer Science - Performance
    Subject/Category (Code) 006
    Publication date 2022-11-24
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)


  5. Book ; Online: CompOFA: Compound Once-For-All Networks for Faster Multi-Platform Deployment

    Sahni, Manas / Varshini, Shreya / Khare, Alind / Tumanov, Alexey

    2021

    Abstract The emergence of CNNs in mainstream deployment has necessitated methods to design and train efficient architectures tailored to maximize accuracy under diverse hardware & latency constraints. To scale these resource-intensive tasks with an increasing number of deployment targets, Once-For-All (OFA) proposed an approach to jointly train several models at once with a constant training cost. However, this cost remains as high as 40-50 GPU days and also suffers from a combinatorial explosion of sub-optimal model configurations. We seek to reduce this search space -- and hence the training budget -- by constraining search to models close to the accuracy-latency Pareto frontier. We incorporate insights about compound relationships between model dimensions to build CompOFA, a design space smaller by several orders of magnitude. Through experiments on ImageNet, we demonstrate that even with simple heuristics we can achieve a 2x reduction in training time and a 216x speedup in model search/extraction time compared to the state of the art, without loss of Pareto optimality! We also show that this smaller design space is dense enough to support equally accurate models for a similar diversity of hardware and latency targets, while also reducing the complexity of the training and subsequent extraction algorithms.

    Comment: Published as a conference paper at ICLR 2021
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Machine Learning
    Subject/Category (Code) 006
    Publication date 2021-04-26
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

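    The abstract's design-space reduction comes from coupling model dimensions instead of varying them independently. The toy arithmetic below illustrates the principle with hypothetical counts; the real OFA space also varies kernel sizes and per-layer widths, so the actual reduction is far larger than shown here.

        depth_choices = 3   # e.g., 2, 3, or 4 layers per network unit
        width_choices = 3   # e.g., expansion ratios 3, 4, or 6
        num_units = 5

        # Independent search: every (depth, width) pair in every unit.
        independent = (depth_choices * width_choices) ** num_units

        # Compound search: width is tied to depth (deeper unit -> wider unit),
        # so each unit contributes only `depth_choices` configurations.
        compound = depth_choices ** num_units

        print(independent, compound, independent // compound)  # 59049 243 243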

  6. Book ; Online: UnfoldML: Cost-Aware and Uncertainty-Based Dynamic 2D Prediction for Multi-Stage Classification

    Xu, Yanbo / Khare, Alind / Matlin, Glenn / Ramadoss, Monish / Kamaleswaran, Rishikesan / Zhang, Chao / Tumanov, Alexey

    2022

    Abstract Machine Learning (ML) research has focused on maximizing the accuracy of predictive tasks. ML models, however, are increasingly more complex, resource intensive, and costlier to deploy in resource-constrained environments. These issues are exacerbated for prediction tasks with sequential classification over progressively transitioned stages with a "happens-before" relation between them. We argue that it is possible to "unfold" a monolithic single multi-class classifier, typically trained for all stages using all data, into a series of single-stage classifiers. Each single-stage classifier can be cascaded gradually from cheaper to more expensive binary classifiers that are trained using only the necessary data modalities or features required for that stage. UnfoldML is a cost-aware and uncertainty-based dynamic 2D prediction pipeline for multi-stage classification that enables (1) navigation of the accuracy/cost tradeoff space, (2) reducing the spatio-temporal cost of inference by orders of magnitude, and (3) early prediction on proceeding stages. UnfoldML achieves orders of magnitude better cost in clinical settings, while detecting multi-stage disease development in real time. It achieves accuracy within 0.1% of the highest-performing multi-class baseline, while saving close to 20X on the spatio-temporal cost of inference and predicting disease onset earlier (by 3.5 hrs). We also show that UnfoldML generalizes to image classification, where it can predict different levels of labels (from coarse to fine) given different levels of abstraction of an image, saving close to 5X cost with as little as 0.4% accuracy reduction.

    Comment: To be published in NeurIPS'22
    Keywords Computer Science - Machine Learning
    Subject/Category (Code) 006
    Publication date 2022-10-26
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

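    The "unfolding" described above can be pictured as a cascade that runs cheap single-stage binary classifiers first and escalates only while no stage fires confidently. A minimal sketch with a hypothetical (probability, cost) model interface, which is not UnfoldML's actual API:

        def cascade_predict(x, stage_models, confidence=0.9):
            """Run cheap-to-expensive single-stage classifiers in order and
            stop at the first confident positive, instead of always paying
            for one monolithic multi-class model."""
            spent = 0.0
            for stage, model in enumerate(stage_models):
                prob, cost = model(x)  # P(stage reached) and its inference cost
                spent += cost
                if prob >= confidence:
                    return stage, spent  # early, confident prediction
            return None, spent           # no stage detected yet

        # Toy usage: two stages with fixed outputs and costs.
        stages = [lambda x: (0.30, 1.0), lambda x: (0.95, 5.0)]
        print(cascade_predict(None, stages))  # -> (1, 6.0)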

  7. Book ; Online: Automatic Parallelization of Python Programs for Distributed Heterogeneous Computing

    Shirako, Jun / Hayashi, Akihiro / Paul, Sri Raj / Tumanov, Alexey / Sarkar, Vivek

    2022  

    Abstract This paper introduces a novel approach to automatic ahead-of-time (AOT) parallelization and optimization of sequential Python programs for execution on distributed heterogeneous platforms. Our approach enables AOT source-to-source transformation of Python programs, driven by the inclusion of type hints for function parameters and return values. These hints can be supplied by the programmer or obtained by dynamic profiler tools; multi-version code generation guarantees the correctness of our AOT transformation in all cases. Our compilation framework performs automatic parallelization and sophisticated high-level code optimizations for the target distributed heterogeneous hardware platform. It includes extensions to the polyhedral framework that unify user-written loops and implicit loops present in matrix/tensor operators, as well as automated selection of CPU vs. GPU code variants. Further, our polyhedral optimizations enable both intra-node and inter-node parallelism. Finally, the optimized output code is deployed using the Ray runtime for scheduling distributed tasks across multiple heterogeneous nodes in a cluster. Our empirical evaluation shows significant performance improvements relative to sequential Python in both single-node and multi-node experiments, with a performance improvement of over 20,000$\times$ when using 24 nodes and 144 GPUs in the OLCF Summit supercomputer for the Space-Time Adaptive Processing (STAP) radar application.

    Comment: 14 pages, 10 figures, under submission to Euro-Par 2022 conference (https://2022.euro-par.org)
    Keywords Computer Science - Distributed, Parallel, and Cluster Computing
    Subject/Category (Code) 005
    Publication date 2022-03-11
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

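    The approach hinges on ordinary Python type hints: annotated parameter and return types (written by the programmer or emitted by a profiler) give the AOT compiler enough information to specialize the function and to unify its explicit loops with the implicit loops inside array operators. The function below is an illustrative stand-in, not code from the paper.

        import numpy as np

        # Annotations drive the AOT source-to-source transformation; both the
        # explicit `for` loop and the implicit loop in `w[i] * x[i]` are
        # candidates for the polyhedral optimizations the abstract describes.
        def weighted_sum(x: np.ndarray, w: np.ndarray) -> np.ndarray:
            acc = np.zeros_like(x[0])
            for i in range(len(w)):
                acc = acc + w[i] * x[i]
            return acc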

  8. Book ; Online: DynaQuant: Compressing Deep Learning Training Checkpoints via Dynamic Quantization

    Agrawal, Amey / Reddy, Sameer / Bhattamishra, Satwik / Nookala, Venkata Prabhakara Sarath / Vashishth, Vidushi / Rong, Kexin / Tumanov, Alexey

    2023

    Abstract With the increase in the scale of Deep Learning (DL) training workloads in terms of compute resources and time consumption, the likelihood of encountering in-training failures rises substantially, leading to lost work and resource wastage. Such failures are typically offset by a checkpointing mechanism, which comes at the cost of storage and network bandwidth overhead. State-of-the-art approaches involve lossy model compression mechanisms, which induce a tradeoff between the resulting model quality (accuracy) and compression ratio. Delta compression is then used to further reduce the overhead by only storing the difference between consecutive checkpoints. We make a key enabling observation that the sensitivity of model weights to compression varies during training, and different weights benefit from different quantization levels (ranging from retaining full precision to pruning). We propose (1) a non-uniform quantization scheme that leverages this variation, (2) an efficient search mechanism that dynamically finds the best quantization configurations, and (3) a quantization-aware delta compression mechanism that rearranges weights to minimize checkpoint differences, thereby maximizing compression. We instantiate these contributions in DynaQuant - a framework for DL workload checkpoint compression. Our experiments show that DynaQuant consistently achieves a better tradeoff between accuracy and compression ratios compared to prior works, enabling a compression ratio up to 39x and withstanding up to 10 restores with negligible accuracy impact for fault-tolerant training. DynaQuant achieves at least an order of magnitude reduction in checkpoint storage overhead for training failure recovery as well as transfer learning use cases without any loss of accuracy.
    Keywords Computer Science - Machine Learning
    Subject/Category (Code) 006
    Publication date 2023-06-20
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

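    Two of the abstract's three ingredients, quantization and delta compression of consecutive checkpoints, compose as sketched below. The uniform quantizer is a deliberate simplification of the paper's non-uniform, sensitivity-aware scheme, and all names are hypothetical.

        import numpy as np

        def quantize(w, bits=4):
            """Uniform symmetric quantization (stand-in for the paper's
            non-uniform scheme)."""
            scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
            return np.round(w / scale).astype(np.int8), scale

        def delta_checkpoint(prev_q, curr_q):
            """Store only the entries whose quantized value changed since
            the previous checkpoint."""
            idx = np.flatnonzero(prev_q != curr_q)
            return idx, curr_q[idx]

        w0 = np.random.default_rng(1).normal(size=10_000).astype(np.float32)
        w1 = w0 + 0.001 * np.random.default_rng(2).normal(size=10_000).astype(np.float32)
        q0, _ = quantize(w0)
        q1, _ = quantize(w1)
        idx, vals = delta_checkpoint(q0, q1)
        print(f"{len(idx) / len(q1):.1%} of quantized entries changed")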

  9. Book ; Online: Subgraph Stationary Hardware-Software Inference Co-Design

    Behnam, Payman / Tong, Jianming / Khare, Alind / Chen, Yangyu / Pan, Yue / Gadikar, Pranav / Bambhaniya, Abhimanyu Rajeshkumar / Krishna, Tushar / Tumanov, Alexey

    2023  

    Abstract A growing number of applications depend on Machine Learning (ML) functionality and benefit from both higher quality ML predictions and better timeliness (latency) at the same time. A growing body of research in the computer architecture, ML, and systems software literature focuses on reaching better latency-accuracy tradeoffs for ML models. Efforts include compression, quantization, pruning, early-exit models, mixed DNN precision, as well as ML inference accelerator designs that minimize latency and energy, while preserving delivered accuracy. All of them, however, yield improvements for a single static point in the latency-accuracy tradeoff space. We make a case for applications that operate in dynamically changing deployment scenarios, where no single static point is optimal. We draw on a recently proposed weight-shared SuperNet mechanism to enable serving a stream of queries that uses (activates) different SubNets within this weight-shared construct. This creates an opportunity to exploit the inherent temporal locality with our proposed SubGraph Stationary (SGS) optimization. We take a hardware-software co-design approach with a real implementation of SGS in SushiAccel and the implementation of a software scheduler, SushiSched, controlling which SubNets to serve and what to cache in real time. Combined, they are vertically integrated into SUSHI, an inference serving stack. For the stream of queries, SUSHI yields up to a 25% improvement in latency and a 0.98% increase in served accuracy. SUSHI can achieve up to 78.7% off-chip energy savings.

    Comment: 16 pages; MLSYS 2023
    Keywords Computer Science - Distributed, Parallel, and Cluster Computing ; Computer Science - Machine Learning
    Subject/Category (Code) 006
    Publication date 2023-06-21
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

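    The serving-side policy the abstract implies, choosing which SubNet of the weight-shared SuperNet to activate per query, can be sketched as a latency-budgeted argmax over profiled SubNet configurations; caching the activated subgraph between consecutive queries is the temporal locality SGS exploits. All names and numbers below are hypothetical, not the SUSHI API.

        from dataclasses import dataclass

        @dataclass(frozen=True)
        class SubNetConfig:
            key: str
            latency_ms: float  # profiled on the target accelerator
            accuracy: float    # profiled top-1

        def pick_subnet(budget_ms, configs):
            """Most accurate SubNet that fits the query's latency budget."""
            feasible = [c for c in configs if c.latency_ms <= budget_ms]
            return max(feasible, key=lambda c: c.accuracy)

        configs = [SubNetConfig("small", 3.0, 0.72),
                   SubNetConfig("medium", 6.0, 0.76),
                   SubNetConfig("large", 12.0, 0.79)]
        print(pick_subnet(7.0, configs).key)  # -> "medium"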

  10. Book ; Online: Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems

    Sanyal, Debopam / Hung, Jui-Tse / Agrawal, Manav / Jasti, Prahlad / Nikkhoo, Shahab / Jha, Somesh / Wang, Tianhao / Mohan, Sibin / Tumanov, Alexey

    2023

    Abstract With the emergence of large foundational models, model-serving systems are becoming popular. In such a system, users send the queries to the server and specify the desired performance metrics (e.g., accuracy, latency, etc.). The server maintains a set of models (model zoo) in the back-end and serves the queries based on the specified metrics. This paper examines the security, specifically robustness against model extraction attacks, of such systems. Existing black-box attacks cannot be directly applied to extract a victim model, as models hide among the model zoo behind the inference serving interface, and attackers cannot identify which model is being used. An intermediate step is required to ensure that every input query gets the output from the victim model. To this end, we propose a query-efficient fingerprinting algorithm to enable the attacker to trigger any desired model consistently. We show that by using our fingerprinting algorithm, model extraction can have fidelity and accuracy scores within $1\%$ of the scores obtained if attacking in a single-model setting and up to $14.6\%$ gain in accuracy and up to $7.7\%$ gain in fidelity compared to the naive attack. Finally, we counter the proposed attack with a noise-based defense mechanism that thwarts fingerprinting by adding noise to the specified performance metrics. Our defense strategy reduces the attack's accuracy and fidelity by up to $9.8\%$ and $4.8\%$, respectively (on medium-sized model extraction). We show that the proposed defense induces a fundamental trade-off between the level of protection and system goodput, achieving configurable and significant victim model extraction protection while maintaining acceptable goodput ($>80\%$). We provide anonymous access to our code.

    Comment: 17 pages, 9 figures
    Keywords Computer Science - Cryptography and Security ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning
    Subject/Category (Code) 006
    Publication date 2023-07-03
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

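    The noise-based defense described at the end of the abstract can be sketched in a few lines: jitter the client-specified performance targets before model selection, so repeated fingerprinting queries no longer deterministically trigger the same zoo model. The multiplicative Gaussian jitter and its magnitude are assumptions, not the paper's exact mechanism.

        import random

        def perturb_metrics(requested, sigma=0.05):
            """Randomize requested metrics (e.g., accuracy, latency) so the
            query-to-model mapping is no longer deterministic; larger sigma
            trades goodput for extraction protection."""
            return {k: v * (1.0 + random.gauss(0.0, sigma))
                    for k, v in requested.items()}

        print(perturb_metrics({"accuracy": 0.90, "latency_ms": 20.0}))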
