# Computation and Storage in the Cloud: Understanding the Trade-Offs (Elsevier Insights)

## Dong Yuan

Language: English

Pages: 128

ISBN: 0124077676

Format: PDF / Kindle (mobi) / ePub

Computation and Storage in the Cloud is the first comprehensive and systematic work investigating the trade-off between computation and storage in the cloud, with the goal of reducing overall application cost. Scientific applications are usually computation- and data-intensive: complex computation tasks take a long time to execute, and the generated datasets are often terabytes or petabytes in size. Storing valuable generated application datasets saves the cost of regenerating them when they are reused, not to mention the waiting time that regeneration causes. However, the large size of scientific datasets makes storing them a challenge in its own right. By proposing innovative concepts, theorems and algorithms, this book will help bring the cost down dramatically for both cloud users and service providers to run computation- and data-intensive scientific applications in the cloud.

- Covers cost models and benchmarking that explain the necessary trade-offs for both cloud providers and users
- Describes several novel strategies for storing application datasets in the cloud
- Includes real-world case studies of scientific research applications

regenerated once deleted. Hence, our research focuses only on generated data in the cloud, where the system can automatically decide their storage status to achieve the best trade-off between computation and storage. In this book, we refer to generated data as data set(s).

### 4.2 Data Provenance and DDG

Scientific applications have many computation- and data-intensive tasks that generate many data sets of considerable size. There exist dependencies among these data sets, which depict the
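As an illustration only, the dependencies in a DDG can be held as a map from each data set to the data sets it was directly generated from. The sketch below assumes a small, hypothetical DDG (the names `DDG`, `datasets_to_regenerate`, `raw`, and `d1` to `d4` are ours, not the book's) and shows how provenance lets the system work out which deleted data sets must be recomputed to regenerate one, tracing back until stored predecessors are reached.

```python
from collections import deque

# Hypothetical DDG (illustrative only): each data set maps to the data sets
# it was directly generated from. "raw" stands for original input data,
# which we assume is always stored.
DDG = {
    "raw": [],
    "d1": ["raw"],
    "d2": ["d1"],
    "d3": ["d1"],
    "d4": ["d2", "d3"],
}


def datasets_to_regenerate(ddg, target, stored):
    """Return the set of data sets that must be recomputed to regenerate the
    deleted data set `target`, following provenance back to stored ones."""
    needed = set()
    queue = deque([target])
    while queue:
        d = queue.popleft()
        if d in needed:
            continue
        needed.add(d)
        for pred in ddg[d]:
            # A deleted predecessor must itself be regenerated first;
            # a stored predecessor can be read directly, so the walk stops there.
            if pred not in stored:
                queue.append(pred)
    return needed


print(sorted(datasets_to_regenerate(DDG, "d4", stored={"raw", "d1"})))
# ['d2', 'd3', 'd4']
```

Storing `d1` shortens the chain: with only `raw` stored, the same call would also have to recompute `d1`.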

this duration as a function of time $t$, which is

$$
\text{Total Cost} = \int_{t} \left( \sum_{d_i \in DDG} CostR_i \right) \cdot dt \tag{4.3}
$$

We further define the storage strategy of a DDG as $S$, where $S$ is a set of data sets in the DDG, denoted as $S \subseteq DDG$, meaning that the data sets in $S$ are stored in the cloud and the rest are deleted. We denote the sum of the cost rates of storing the data sets recorded in a DDG under storage strategy $S$ as SCR (sum of cost rates), formally:

$$
SCR = \left( \sum_{d_i \in DDG} CostR_i \right)_S \tag{4.4}
$$

Based on the definition above,
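To make Eqs. (4.3) and (4.4) concrete, here is a minimal sketch for a linear DDG under an assumed cost model that this excerpt does not spell out: each data set has a generation cost `X`, a storage cost rate `Y`, and a usage frequency `V` (all hypothetical names and made-up values). A stored data set costs `Y` per time unit; a deleted one costs its regeneration cost times `V`, where regeneration starts from the nearest stored predecessor. With constant rates, the integral in Eq. (4.3) reduces to SCR × t. The brute-force search for the minimum cost storage strategy (MCSS) is exponential, so it only suits tiny examples.

```python
from itertools import combinations

# Hypothetical linear DDG d_1 -> d_2 -> d_3 -> d_4 (0-based indices below).
# Assumed cost model, not taken from the excerpt:
#   X[i]: cost to generate d_i from its direct predecessor
#   Y[i]: storage cost rate of d_i (cost per time unit)
#   V[i]: usage frequency of d_i (uses per time unit)
X = [100.0, 40.0, 60.0, 30.0]
Y = [5.0, 20.0, 2.0, 15.0]
V = [0.1, 0.05, 0.2, 0.01]


def gen_cost(i, stored):
    """Cost to regenerate d_i: X[i] plus the generation costs of its deleted
    predecessors back to the nearest stored one (or the original input)."""
    cost = X[i]
    k = i - 1
    while k >= 0 and k not in stored:
        cost += X[k]
        k -= 1
    return cost


def cost_rate(i, stored):
    """CostR_i: storage cost rate if stored, else regeneration cost rate."""
    return Y[i] if i in stored else gen_cost(i, stored) * V[i]


def scr(stored):
    """SCR under strategy S = `stored` (Eq. 4.4)."""
    return sum(cost_rate(i, stored) for i in range(len(X)))


def total_cost(stored, t):
    """Eq. 4.3 with constant cost rates: the integral reduces to SCR * t."""
    return scr(stored) * t


def brute_force_mcss():
    """Try every S subset of the DDG and keep the cheapest (small n only)."""
    best = None
    for r in range(len(X) + 1):
        for s in combinations(range(len(X)), r):
            cand = (scr(set(s)), set(s))
            if best is None or cand[0] < best[0]:
                best = cand
    return best


best_scr, best_s = brute_force_mcss()
print(sorted(best_s), round(best_scr, 2))  # [0, 2] 9.3
```

For these values the cheapest strategy stores $d_1$ and $d_3$ (indices 0 and 2), which are expensive to regenerate relative to their storage rate, and deletes $d_2$ and $d_4$, which can be regenerated cheaply from a stored predecessor.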

discovered by the Dijkstra algorithm. In the process of calculating the weights of out-block and over-block edges, there are two new situations for finding the MCSS of the sub-branches.

1. The sub-branches may have more than one stored adjacent predecessor. For example, $e\langle d_i, d_j \rangle$ in Figure 5.5 is an out-block edge of block $B_1$ and also an in-block edge of block $B_2$. In the algorithm, if edge $e\langle d_i, d_j \rangle$ is found by the Dijkstra algorithm, we create a new $CTT(e\langle d_i, d_j \rangle)$ from the current CTT,
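The shortest-path idea can be sketched for the simplest case, a linear DDG, under an assumed cost model (generation cost `X`, storage cost rate `Y`, usage frequency `V`, all hypothetical). Each edge $e\langle d_i, d_j \rangle$ here means "store $d_j$ and delete everything strictly between $d_i$ and $d_j$", weighted by the cost rates that choice incurs; virtual start and end vertices bracket the chain, and the data sets on the cheapest start-to-end path form the MCSS. This is a simplified illustration in the spirit of the CTT, not the book's algorithm: the block, out-block, and over-block handling described above for DDGs with branches is not reproduced, and `edge_weight` and `ctt_sp` are our own names.

```python
import heapq

# Hypothetical linear DDG and assumed cost parameters (illustrative values).
X = [100.0, 40.0, 60.0, 30.0]  # generation cost from direct predecessor
Y = [5.0, 20.0, 2.0, 15.0]     # storage cost rate
V = [0.1, 0.05, 0.2, 0.01]     # usage frequency


def edge_weight(i, j):
    """Weight of edge e<d_i, d_j> (i = -1 is the virtual start, j = len(X)
    the virtual end): store d_j, delete every data set strictly between,
    and charge each deleted one its regeneration cost times its usage."""
    w = Y[j] if j < len(X) else 0.0
    gen = 0.0
    for k in range(i + 1, j):
        gen += X[k]  # regenerating d_k starts from the stored d_i
        w += gen * V[k]
    return w


def ctt_sp():
    """Dijkstra from the virtual start to the virtual end; returns the
    minimum SCR and the set of stored data sets on that path."""
    n = len(X)
    start, end = -1, n
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == end:
            break
        for v in range(u + 1, n + 1):  # edges only point forward
            nd = d + edge_weight(u, v)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    # Walk back from the virtual end to recover the stored data sets.
    stored = set()
    v = prev[end]
    while v != start:
        stored.add(v)
        v = prev[v]
    return dist[end], stored


best, stored = ctt_sp()
print(sorted(stored), round(best, 2))  # [0, 2] 9.3
```

Every start-to-end path corresponds to one storage strategy, and its length equals that strategy's SCR, so the shortest path is the MCSS. Because the edges only point forward the graph is acyclic, and a simple dynamic program over the chain would give the same answer; Dijkstra is used here to mirror the text and is valid because all edge weights are non-negative.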

branches, the MCSS of the DDG segment is determined by three variables, namely $X_1$, $V_2$, and $V_3$. Hence, the solution space of this DDG segment is a three-dimensional space in which every MCSS occupies some region. Similar to the solution space of DDG_LS, we can find the border between two MCSSs, which is a partition plane in the three-dimensional solution space. For example, assume that $S_{h,i,j}$ and $S_{h',i',j'}$ are two adjacent MCSSs in the solution space, where $SCR_{h,i,j} < SCR_{h',i',j'}$. The first and
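Written out, the border between two adjacent MCSSs is where the two strategies cost exactly the same. Treating each SCR as a function of the three variables (our notation, not the book's), the border between $S_{h,i,j}$ and $S_{h',i',j'}$ can be stated as:

$$
\left\{ (X_1, V_2, V_3) \;\middle|\; SCR_{h,i,j}(X_1, V_2, V_3) = SCR_{h',i',j'}(X_1, V_2, V_3) \right\}
$$

On one side of this border $S_{h,i,j}$ is the cheaper strategy, and on the other side $S_{h',i',j'}$ is.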
