Economically Deploying Applications in Elastic Clouds
Thesis posted on 2020-05-01, 00:00, authored by Abdullah I Alourani
Cloud platforms provide key features that enable customers to deploy their applications economically. First, customers can deploy their applications on a cloud infrastructure that provisions resources (e.g., memory) to these applications on an as-needed basis. However, certain workloads can lead to situations in which customers pay for resources that are provisioned but not fully used by their applications and, as a result, some performance characteristics of these applications are not met, i.e., the Cost-Utility Violations of Elasticity (CUVE). Second, customers can deploy their applications on cloud spot instances (i.e., virtual machines (VMs)) at much lower cost than on other types of cloud instances. In exchange, spot instances are exposed to revocations (i.e., terminations) by cloud providers; when applications running on spot instances are terminated irregularly because of these revocations, they might lose their state, which leads to certain bugs, i.e., Bugs of cloud-based Applications resulting from Spot Instance Revocations (BASIR). In addition, applications often employ fault-tolerance mechanisms to minimize the work lost at each spot instance revocation. However, these mechanisms incur additional overhead in application completion time and deployment cost, i.e., the Deployment Cost And Time Overhead (DCATO). Unfortunately, cloud-based applications are not designed or tested to deal with the CUVE, BASIR, and DCATO problems in the cloud environment, and as a result, the benefits of economically deploying applications in elastic clouds may be significantly reduced or even completely obliterated.
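To make the CUVE definition above concrete, the following is a minimal sketch of how a single monitoring interval might be flagged. It is an illustration under stated assumptions, not the approach of the thesis: the `Sample` record, the `is_cuve` function, and both thresholds are hypothetical names and values chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One monitoring interval (hypothetical record, not from the thesis)."""
    provisioned_vms: int      # VMs the customer pays for in this interval
    utilization: float        # mean utilization of those VMs, 0.0 to 1.0
    response_time_ms: float   # observed application response time


def is_cuve(sample: Sample,
            min_utilization: float = 0.6,
            max_response_ms: float = 500.0) -> bool:
    """Return True when paid-for resources are underused AND the
    performance goal is missed in the same interval.

    Both thresholds are assumed values for illustration.
    """
    underused = sample.utilization < min_utilization
    too_slow = sample.response_time_ms > max_response_ms
    return underused and too_slow


def count_cuve(samples: list[Sample]) -> int:
    """Count the intervals that satisfy the CUVE condition."""
    return sum(1 for s in samples if is_cuve(s))
```

Under these assumptions, an interval with 8 provisioned VMs at 30% utilization and a 900 ms response time would be flagged, while the same response time at 90% utilization would not, since the resources are then actually in use.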
In this thesis, we propose a novel model that reduces the impact of the CUVE, BASIR, and DCATO problems so that applications can be deployed economically in elastic clouds. This model leads to practical frameworks for optimizing cloud elasticity, improving the design of the shutdown process, and reducing the deployment cost and completion time of cloud-based applications, which in turn supports efficient cloud computing services and greater economies of scale. In the first work, we develop a novel approach for Testing for Infractions of Cloud Elasticity (TICLE) that combines a search-based heuristic with rule-guided resource provisioning to stress test the elastic resource provisioning of cloud-based applications and automatically discover irregular workloads that lead to CUVE. We conduct our experiments with four nontrivial open-source applications in the Microsoft Azure cloud to determine how automatically and accurately TICLE explores a large search space of over 10^40 input combinations while discovering CUVEs. The results show that, compared to the random approach, TICLE finds the first irregular workload faster, enabling stakeholders to investigate its impact sooner, and finds more irregular workloads that lead to much higher costs and performance degradations for applications in the cloud. In the second work, we implement a novel approach for Testing for Bugs of Cloud-Based Applications Resulting from Spot Instance Revocations (T-BASIR) that uses kernel modules to automatically find BASIR and locate their causes in the source code. We evaluate T-BASIR using 10 popular open-source applications. Our results show that T-BASIR not only finds more instances and more types of BASIR (e.g., data loss) than the random approach, but also locates the causes of BASIR during testing, helping developers improve the design of the shutdown process for cloud-based applications.
In the third work, we develop a novel cloud market-based approach that leverages features of cloud spot markets for Provisioning Spot Instances WithOut employing Fault-Tolerance mechanisms (P-SIWOFT) to reduce DCATO, i.e., the overhead in application completion time and deployment cost. We evaluate P-SIWOFT in simulations, using Amazon spot instances that run jobs in Docker containers and realistic price traces from EC2 markets. Our simulation results show that our approach reduces the deployment cost and completion time compared to approaches based on fault-tolerance mechanisms.
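The DCATO trade-off can be illustrated with a small back-of-the-envelope calculation. This is a simplified sketch under stated assumptions, not the cost model used in the thesis: the `deployment_cost` function, its parameters, and the linear overhead model are all hypothetical.

```python
def deployment_cost(work_hours: float,
                    price_per_hour: float,
                    overhead_fraction: float = 0.0,
                    revocations: int = 0,
                    lost_hours_per_revocation: float = 0.0) -> tuple[float, float]:
    """Return (completion_hours, cost) for one job on a spot instance.

    Assumed linear model, for illustration only:
      - a fault-tolerance mechanism (e.g., checkpointing) stretches the
        useful work by `overhead_fraction`;
      - each revocation adds `lost_hours_per_revocation` of redone work.
    """
    completion_hours = (work_hours * (1.0 + overhead_fraction)
                        + revocations * lost_hours_per_revocation)
    return completion_hours, completion_hours * price_per_hour


# Hypothetical comparison: the same 10-hour job at an assumed spot price.
with_ft = deployment_cost(10.0, 0.10, overhead_fraction=0.2,
                          revocations=2, lost_hours_per_revocation=0.5)
without_ft = deployment_cost(10.0, 0.10)  # no overhead, no revocation hit
```

In this toy model, the job without fault tolerance is cheaper and faster only when it is not revoked, which is why such an approach depends on choosing spot markets where revocations are unlikely during the job.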