Cache-aware scheduling of scientific workflows in a multisite cloud

Heidsieck Gaëtan, De Oliveira Daniel, Pacitti Esther, Pradal Christophe, Tardieu François, Valduriez Patrick. 2021. Cache-aware scheduling of scientific workflows in a multisite cloud. Future Generation Computer Systems, 122 : 172-186.

Journal article; Research article; Journal article with impact factor
Post-print version - English
Use subject to authorization by the author or Cirad.
FGCS_2021.pdf (3MB)

Published version - English
Access restricted to Cirad staff.
Use subject to authorization by the author or Cirad.
597996.pdf (1MB)

URL - other associated data: https://doi.org/10.5281/zenodo.1436634

Quartile: Q1, Subject: COMPUTER SCIENCE, THEORY & METHODS

Abstract: Many scientific experiments today are performed using scientific workflows, which are becoming increasingly data-intensive. We consider the efficient execution of such workflows in a multisite cloud, leveraging heterogeneous resources available at multiple geo-distributed data centers. Since it is common for workflow users to reuse code or data from previous workflows, a promising approach for efficient workflow execution is to cache intermediate data in order to avoid re-executing entire workflows. However, caching intermediate data and scheduling workflows to exploit such caching in a multisite cloud is complex. In particular, workflow scheduling must be cache-aware, in order to decide whether to reuse cached data or to re-execute workflows entirely. In this paper, we propose a solution for cache-aware scheduling of scientific workflows in a multisite cloud. Our solution includes a distributed and parallel architecture and new algorithms for adaptive caching, cache site selection, and dynamic workflow scheduling. We implemented our solution in the OpenAlea workflow system, together with cache-aware distributed scheduling algorithms. Our experimental evaluation in a three-site cloud with a real application in plant phenotyping shows that our solution can yield major performance gains, reducing total time by up to 42% with 60% of the same input data for each new execution.
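
The reuse-or-re-execute decision mentioned in the abstract can be illustrated with a minimal sketch. This is not the algorithm from the paper: the cost model (transfer time versus recomputation time), the names CachedResult, transfer_time and choose_action, the site labels and the bandwidth figures are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CachedResult:
    """Hypothetical record of an intermediate result cached at one site."""
    site: str
    size_mb: float


def transfer_time(size_mb: float, bandwidth_mb_s: float) -> float:
    """Estimated seconds needed to move size_mb at the given bandwidth."""
    return size_mb / bandwidth_mb_s


def choose_action(
    exec_site: str,
    recompute_s: float,
    cached: Optional[CachedResult],
    bandwidth_mb_s: Dict[Tuple[str, str], float],
) -> str:
    """Return "reuse" when fetching the cached result is estimated to be
    cheaper than recomputing it at exec_site, otherwise "recompute".

    Assumed cost model: only transfer time and recomputation time count.
    """
    if cached is None:
        return "recompute"  # nothing in any cache
    if cached.site == exec_site:
        return "reuse"  # local cache hit, no transfer needed
    link = (cached.site, exec_site)
    fetch_s = transfer_time(cached.size_mb, bandwidth_mb_s[link])
    return "reuse" if fetch_s < recompute_s else "recompute"


# Hypothetical inter-site bandwidth, in MB/s.
bandwidth = {("site_a", "site_b"): 50.0}
remote = CachedResult(site="site_a", size_mb=1000.0)

# Fetching takes about 20 s, so reuse wins over a 30 s recomputation
# but loses to a 10 s one.
slow_task = choose_action("site_b", 30.0, remote, bandwidth)
fast_task = choose_action("site_b", 10.0, remote, bandwidth)
```

Under these assumptions a cache hit at a remote site is only worth using when the transfer is faster than recomputation, which is one reason why the choice of cache site and the scheduling decision are coupled in a multisite setting.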

Agrovoc keywords: computer science, processes

Additional keywords: cloud (computing), workflow, distributed cache (computing), data computation

Free keywords: Cloud computing, Distributed caching, Scientific workflows, OpenAlea

Agris classification: U10 - Computer science, mathematics and statistics

Cirad strategic field: CTS 7 (2019-) - Outside strategic fields

Authors and affiliations

  • Heidsieck Gaëtan, INRIA (FRA) - corresponding author
  • De Oliveira Daniel, UFF (BRA)
  • Pacitti Esther, Université de Montpellier (FRA)
  • Pradal Christophe, CIRAD-BIOS-UMR AGAP (FRA) ORCID: 0000-0002-2555-761X
  • Tardieu François, INRAE (FRA)
  • Valduriez Patrick, INRIA (FRA)

Source: Cirad-Agritrop (https://agritrop.cirad.fr/597996/)
