Typical mobile edge computing infrastructures have to contend with unreliable computing devices at their end-points. The limited resource capacities of mobile edge devices gives rise to frequent contentions, node overloads or failures. This is exacerbated by the strict deadlines of modern applications. To avoid failures, fault-tolerant approaches utilize preemptive migration to transfer active tasks across nodes and prevent nodes running at capacity. However, prior work struggles to dynamically adapt in settings with highly volatile workloads or even accurately detect and diagnose anomalies for optimal remediation. To meet the strict service level objectives of contemporary workloads, there is a need for dynamic fault-tolerant methods that can quickly adapt to changes in edge environments while having parsimonious remediation in the form of preemptive migration to avoid stressing the system network. This work proposes PreGAN, featuring a Generative Adversarial Network (GAN) based approach to predict contentions, pinpoint specific resource types with high chance of overload, and generate migration decisions to proactively avoid system downtime. PreGAN leverages coupled-simulations to train the GAN model at run-time and a few-shot fault classifier to update decisions of an underpinning scheduler. We also extend it to PreGAN+ that also periodically tunes the decision model using semi-supervised training and a Transformer based neural network for low tuning time, albeit with higher memory overheads. Experiments on a Raspberry-Pi based edge environment demonstrate that both models outperform state-of-the-art baselines in fault detection and diagnosis scores by up to 12.5% and 31.2% respectively. This also translates in improvements in Quality of Service against baseline approaches.
Clone repo.
git clone https://github.com/imperial-qore/PreGANPlus.git
cd PreGAN/
Install dependencies.
sudo apt -y update
python3 -m pip --upgrade pip
python3 -m pip install matplotlib scikit-learn
python3 -m pip install -r requirements.txt
python3 -m pip install torch==1.7.1+cpu torchvision==0.8.2+cpu -f https://download.pytorch.org/whl/torch_stable.html
export PATH=$PATH:~/.local/bin
Change line 117 in main.py
to use one of the implemented fault-tolerance techniques: PreGANPlusRecovery
, PreGANRecovery
, PCFTRecovery
, DFTMRecovery
, ECLBRecovery
or CMODLBRecovery
and run the code using the following command.
python3 main.py
Items | Contents |
---|---|
Pre-print | (coming soon) |
Video | https://youtu.be/Pp82aZu5dJw |
Contact | Shreshth Tuli (@shreshthtuli) |
Funding | Imperial President's scholarship |
## License
BSD-3-Clause.
Copyright (c) 2022, Shreshth Tuli.
All rights reserved.
See License file for more details.