
Sherry0121-AC/TGS-GPU-kernel


TGS

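Notes from Haibin:

  1. Install CMake and make.

  2. Install nvidia-container-toolkit.

  3. Give your user permission to use Docker (cc is the user name in this example; substitute your own):

sudo usermod -aG docker cc
newgrp docker

  4. Do not use a Python version that is too new. Python 3.10 works; creating a new conda environment for it is recommended.

  5. Be careful: the script deletes all other containers.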
Framework Overview


1. Introduction

This repository contains one version of the source code for our NSDI'23 paper "Transparent GPU Sharing in Container Clouds for Deep Learning Workloads" [Paper]

2. Environment requirements

See requirement.txt and the paper for details.

3. Prerequisites

Run the following commands:

sudo apt install patchelf wget unzip make
pip3 install -r requirement.txt
docker pull bingyangwu2000/tf_torch
docker pull bingyangwu2000/pytorch_with_unified_memory
docker pull bingyangwu2000/antman
docker pull bingyangwu2000/espnet2
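Because the four images are large, re-running the pulls can be slow. A small, hypothetical helper (not part of TGS) that pulls only the images docker cannot find locally might look like this. It relies on docker image inspect failing for an image that is absent.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TGS): pull only the images from the
# Prerequisites step that are not already present locally.

# The four images listed above, space-separated.
TGS_IMAGES="bingyangwu2000/tf_torch bingyangwu2000/pytorch_with_unified_memory bingyangwu2000/antman bingyangwu2000/espnet2"

# Print each image name (from the arguments) that docker cannot find locally.
missing_images() {
  local img
  for img in "$@"; do
    if ! docker image inspect "$img" >/dev/null 2>&1; then
      echo "$img"
    fi
  done
}

# Pull every missing image; stop at the first failed pull.
pull_missing_images() {
  local img
  # $TGS_IMAGES is left unquoted on purpose so it splits into four names.
  for img in $(missing_images $TGS_IMAGES); do
    docker pull "$img" || return 1
  done
}
```

Calling pull_missing_images is then equivalent to the four docker pull commands, minus the images that are already present.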

4. Build

Run the following commands:

git clone --recursive https://github.com/BingyangWu/TGS.git
cd TGS
make rpc
./download.sh
cd hijack
./build.sh

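5. Model Configuration (config/test_tgs.csv)

1. To run an inference model, end its name in the model name column with _infer.

2. To run a training model, end its name in the model name column with _train.

3. Change submit_time to better observe how TGS suppresses low-priority jobs.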
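The suffix rule above is easy to get wrong when editing the CSV by hand. Below is a minimal, hypothetical checker (not part of TGS). Because the layout of config/test_tgs.csv is not documented here, the model name column number is passed as an argument, and the sketch assumes one header row and no quoted commas.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of TGS) for the naming rule above:
# a name ending in _infer runs inference, one ending in _train runs training.

# Print "infer" or "train" for a model name; print "unknown" and fail otherwise.
model_mode() {
  case "$1" in
    *_infer) echo infer ;;
    *_train) echo train ;;
    *) echo unknown; return 1 ;;
  esac
}

# Report data rows whose model name has neither suffix.
# $1 = CSV file, $2 = 1-based column number of the model name (an assumption).
# Assumes one header row and no quoted commas.
check_models() {
  local file=$1 col=$2
  tail -n +2 "$file" | cut -d, -f"$col" | while IFS= read -r name; do
    model_mode "$name" >/dev/null || echo "unrecognised model name: $name"
  done
}
```

For example, check_models config/test_tgs.csv 1 prints nothing when every name in column 1 ends in _infer or _train.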

6. Run example

TGS:

./scripts/test_tgs.sh
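The setup notes warn that the script deletes all other containers. A hypothetical wrapper (not part of TGS) can at least record which containers existed before and after a run, so you can see what was removed. It uses docker ps -a with a Go-template --format string and cannot restore anything.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper helpers (not part of TGS): snapshot containers around a run.

# Write one tab-separated line per container (running or stopped):
# name, image, status.
snapshot_containers() {
  docker ps -a --format '{{.Names}}\t{{.Image}}\t{{.Status}}' > "$1"
}

# Print container names present in the first snapshot but not in the second.
removed_containers() {
  awk -F'\t' 'FILENAME == ARGV[1] { after[$1] = 1; next }
              !($1 in after) { print $1 }' "$2" "$1"
}

# Usage:
#   snapshot_containers before.txt
#   ./scripts/test_tgs.sh
#   snapshot_containers after.txt
#   removed_containers before.txt after.txt
```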
