Haibin:
- Install CMake and make.
- Install nvidia-container-toolkit.
- Give your user permission to use Docker (cc is the username here; substitute your own):
  sudo usermod -aG docker cc
  newgrp docker
- Don't use a Python version that is too new; 3.10 works for me. Preferably create a fresh environment with conda.
- Be careful: the script deletes all other containers.
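The checks in the notes above can be scripted before running anything. This is a minimal bash sketch, not part of TGS: the function names (require_cmds, in_group, python_version_ok, confirm_container_wipe) are hypothetical helpers. Accepting only Python 3.10 is an assumption, since 3.10 is the only version confirmed here. Checking for the nvidia-ctk binary assumes a recent nvidia-container-toolkit release that ships it.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers for the setup notes; not part of TGS.

# Print every command in "$@" that is not on PATH; fail if any is missing.
require_cmds() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      status=1
    fi
  done
  return "$status"
}

# Succeed if user $1 is a member of group $2 (as reported by `id -nG`).
in_group() {
  local g
  for g in $(id -nG "$1"); do
    if [ "$g" = "$2" ]; then
      return 0
    fi
  done
  return 1
}

# Succeed only for Python 3.10.x, the version confirmed in these notes
# (other versions may work, but that is untested here).
python_version_ok() {
  case "$1" in
    3.10 | 3.10.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask for confirmation, because the TGS scripts may remove every other
# Docker container on the host. Anything but y/yes (or EOF) means no.
confirm_container_wipe() {
  local answer=""
  printf 'TGS scripts may delete ALL other Docker containers. Continue? [y/N] ' >&2
  read -r answer || true
  case "$answer" in
    y | Y | yes | YES) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use (from the TGS checkout):
#   require_cmds cmake make docker nvidia-ctk
#   in_group "$(id -un)" docker || echo "run: sudo usermod -aG docker $(id -un) && newgrp docker"
#   conda create -n tgs python=3.10 -y && conda activate tgs
#   python_version_ok "$(python3 -c 'import platform; print(platform.python_version())')"
#   docker ps -a    # review the containers that would be removed
#   confirm_container_wipe && ./scripts/test_tgs.sh
```

Keeping each check as a separate function lets you run only the ones you care about. The confirmation prompt defaults to "no", so an accidental Enter will not start a script that wipes containers.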

This repository contains one version of the source code for our NSDI'23 paper "Transparent GPU Sharing in Container Clouds for Deep Learning Workloads".
Please see requirement.txt and the paper for more details.
Run the following commands:
sudo apt install patchelf wget unzip make
pip3 install -r requirement.txt
docker pull bingyangwu2000/tf_torch
docker pull bingyangwu2000/pytorch_with_unified_memory
docker pull bingyangwu2000/antman
docker pull bingyangwu2000/espnet2

Then run the following commands:
git clone --recursive https://github.com/BingyangWu/TGS.git
cd TGS
make rpc
./download.sh
cd hijack
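./build.sh

1. To run an inference model, use a model name of the form **_infer (i.e. ending in _infer).
2. To run a training model, use a model name of the form **_train (i.e. ending in _train).
3. You can change submit_time to better observe how TGS suppresses the low-priority job.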
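The suffix rule in items 1 and 2 can be expressed as a small bash case statement, for example to sanity-check names before editing them. This is a hypothetical helper: job_kind and the example name resnet50 are illustrative only, and the sketch mirrors the rule as stated in these notes, not TGS's actual parsing code.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the naming rule (not part of TGS):
# a model name ending in _infer runs as inference, one ending in _train
# runs as training. TGS's own scripts may parse names differently.
job_kind() {
  case "$1" in
    *_infer) echo inference ;;
    *_train) echo training ;;
    *)
      echo "unknown suffix: $1" >&2
      return 1
      ;;
  esac
}

# Example (the model name "resnet50" is illustrative only):
#   job_kind resnet50_infer   # prints "inference"
#   job_kind resnet50_train   # prints "training"
```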

TGS:
./scripts/test_tgs.sh
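Because the test script depends on the Docker images pulled during setup, it can help to confirm they are present before running it. This is a minimal bash sketch that assumes the four image names from the docker pull commands above are the ones needed. REQUIRED_IMAGES and missing_images are hypothetical names, and the list may need updating if the scripts use other images.

```shell
#!/usr/bin/env bash
# Images pulled in the setup steps (assumed to be the full set required).
REQUIRED_IMAGES=(
  bingyangwu2000/tf_torch
  bingyangwu2000/pytorch_with_unified_memory
  bingyangwu2000/antman
  bingyangwu2000/espnet2
)

# Read repository names (one per line) on stdin and print every required
# image that is not among them. Return 1 if any image is missing.
missing_images() {
  local present img status=0
  present=$(cat)
  for img in "${REQUIRED_IMAGES[@]}"; do
    if ! grep -qxF "$img" <<<"$present"; then
      echo "$img"
      status=1
    fi
  done
  return "$status"
}

# Example (from the TGS checkout):
#   docker images --format '{{.Repository}}' | missing_images \
#     && ./scripts/test_tgs.sh
```

Reading the image list from stdin keeps the check independent of Docker itself, so the same function works on saved `docker images` output.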
