Multi model deployment #208

Draft: wants to merge 74 commits into main (changes shown from 18 commits)

Commits (74)
4eac006
Removing load balancing config
TosinSeg Jun 19, 2023
c68e999
Reformatting tests
TosinSeg Jun 20, 2023
5ce1a92
Fixed the formatting
TosinSeg Jun 20, 2023
fa10e19
Removed print statement
TosinSeg Jun 20, 2023
f9cbd74
Merging main
TosinSeg Jun 26, 2023
8970f4e
Removing unused import
TosinSeg Jun 26, 2023
517bea8
Fixing tests
TosinSeg Jun 26, 2023
58dd2b2
Fixing merge issue
TosinSeg Jun 26, 2023
bb0d551
Creating hostfile when one is not provided
TosinSeg Jun 26, 2023
e2bb9d5
Merge branch 'main' into Always_enable_load_balancing
TosinSeg Jun 26, 2023
3823534
Fixing import statements removed by merge
TosinSeg Jun 26, 2023
6f9b4ad
Removing load_balancing check
TosinSeg Jun 26, 2023
499b9ad
Removing redudant definitions
TosinSeg Jun 26, 2023
5419ef6
Removing hostfile from test
TosinSeg Jun 26, 2023
a70b6de
Removing hostfile from non-persistent test
TosinSeg Jun 26, 2023
eea658b
initial changes
TosinSeg Jun 27, 2023
20f0878
Merge branch 'main' into multi-model-deployment
TosinSeg Jun 27, 2023
c21c31b
Maintaining current behavior
TosinSeg Jun 28, 2023
f525329
Reading from score file
TosinSeg Jun 28, 2023
3c0937f
fixing syntax errors
TosinSeg Jun 28, 2023
156ac83
Fixing more syntax errors
TosinSeg Jun 28, 2023
38e270e
Fixing more syntax issues
TosinSeg Jun 29, 2023
4d4e0d8
initial lb changes
TosinSeg Jun 29, 2023
01c8e59
Merge branch 'main' into multi-model-deployment
TosinSeg Jun 29, 2023
f801b36
More load balancing changes
TosinSeg Jun 29, 2023
fd4e2ed
LB changes and syntax
TosinSeg Jun 30, 2023
0a3b7e5
Refactor client, and unpack request in load balancer
TosinSeg Jun 30, 2023
6523c04
First working queries
TosinSeg Jul 3, 2023
06b40f5
Fixing conversational and q&a args
TosinSeg Jul 3, 2023
96d0dcb
Updates to _allocate_processes and fixing example
TosinSeg Jul 5, 2023
ab41d24
Adding host map for allocating processes and formatting
TosinSeg Jul 5, 2023
8673a9a
Fixing terminate functionality
TosinSeg Jul 5, 2023
8d09b37
Refactored client
TosinSeg Jul 6, 2023
7a136d6
More Refactoring and q/a example
TosinSeg Jul 6, 2023
2c6ec08
Reformatting to maintain previous syntax
TosinSeg Jul 6, 2023
0cb88a9
Removing print/debug statements
TosinSeg Jul 6, 2023
7c0ee12
Fixing non-persistent deloyments
TosinSeg Jul 6, 2023
7a956d5
Refactoring Load balancer launch
TosinSeg Jul 7, 2023
f8cfe28
Fixing restful gateway client
TosinSeg Jul 10, 2023
079807d
Fixing replica issue
TosinSeg Jul 10, 2023
ea1e47e
Fixing non persistent client
TosinSeg Jul 10, 2023
98b6129
Adding trust_remote_code support (#203)
msinha251 Jul 11, 2023
daab5e6
Refactoring
TosinSeg Jul 12, 2023
84073f9
Update mii/models/score/generate.py
TosinSeg Jul 12, 2023
3ee3410
Merge branch 'multi-model-deployment' of github.com:TosinSeg/DeepSpee…
Jul 13, 2023
b4edc2b
Refactoring Load Balancer and request_proto
Jul 13, 2023
6346194
Formatting
Jul 13, 2023
94b6699
Fixing the client
Jul 14, 2023
710c20b
Initial partial deployment commit
Jul 21, 2023
c2636b7
More partial deploy updates
Jul 21, 2023
189e75c
Partial deploy started
Jul 21, 2023
adee843
fixing add deploy api queries
Jul 24, 2023
a145be5
Support for empty deployment 'group'
Jul 24, 2023
082c05e
Support for empty deployment 'group'
Jul 24, 2023
3ce77d2
Partial Termination
Jul 25, 2023
b40ecbd
Refactoring
Jul 25, 2023
72dd95c
formatting
Jul 25, 2023
a4e3d56
fixing bug for partial termination
Jul 25, 2023
4b5bb47
Removing comments
Jul 25, 2023
30d2b03
Including GPU index map in score file
Jul 26, 2023
c5d5996
Refactoring deployment
Jul 26, 2023
3ae1781
Refactoring and formatting
Jul 26, 2023
4b8f02f
Refactoring
Jul 28, 2023
c51ce37
Fixing Readme
Jul 28, 2023
43479db
Refactoring GRPC
Jul 28, 2023
e1b6d23
Fixing LB process not terminating
Jul 28, 2023
1675bd8
Adding multi_deployment and partial deploy/terminate unit tests
Jul 31, 2023
8684a61
Removing comments
Jul 31, 2023
56a7fce
Fixing spelling issues
Aug 1, 2023
fb70c3d
Update mii/client.py
TosinSeg Aug 1, 2023
e2cfe8a
Update mii/client.py
TosinSeg Aug 1, 2023
1312738
Removing AML from addDeploy
Aug 1, 2023
b0f0da4
Refactoring MIIConfig and DeploymentConfig
Aug 2, 2023
b78068e
Partial deploy/termination example
Aug 11, 2023
1 change: 1 addition & 0 deletions mii/__init__.py
@@ -15,6 +15,7 @@

__version__ = "0.0.0"
non_persistent_models = {}
multi_model_deployments = {}
try:
from .version import __version__
except ImportError:
15 changes: 14 additions & 1 deletion mii/config.py
@@ -107,6 +107,7 @@ class Config:


class ReplicaConfig(BaseModel):
deployment_name: str = ""
hostname: str = ""
tensor_parallel_ports: List[int] = []
torch_dist_port: int = None
@@ -123,4 +124,16 @@ class LoadBalancerConfig(BaseModel):

class Config:
validate_all = True
        validate_assignment = True


class Deployment(BaseModel):
deployment_name: str
task: str
model: str
enable_deepspeed: bool = True
enable_zero: bool = True
GPU_index_map: dict = None
mii_config: dict = None
ds_config: dict = None
version: int = 1
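
For reference, a minimal sketch of how this new Deployment model would be constructed. The field names follow the class definition above; the task, model, and port values are illustrative placeholders, not taken from this PR:

    from mii.config import Deployment

    qa = Deployment(deployment_name="qa-deploy",
                    task="question-answering",
                    model="deepset/roberta-large-squad2",
                    mii_config={"port_number": 50050})

Since Deployment is a pydantic BaseModel, fields must be passed as keyword arguments; optional fields such as GPU_index_map and ds_config default to None.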
2 changes: 1 addition & 1 deletion mii/constants.py
@@ -94,7 +94,7 @@ class ModelProvider(enum.Enum):
DEPLOYMENT_NAME_KEY = 'deployment_name'
MODEL_PATH_KEY = 'model_path'
LOAD_BALANCER_CONFIG_KEY = 'load_balancer_config'

DEPLOYMENT_TAG_KEY = 'deployment_tag'
ENABLE_DEEPSPEED_KEY = 'ds_optimize'
ENABLE_DEEPSPEED_ZERO_KEY = 'ds_zero'
DEEPSPEED_CONFIG_KEY = 'ds_config'
86 changes: 47 additions & 39 deletions mii/deployment.py
@@ -16,16 +16,17 @@
from .config import ReplicaConfig, LoadBalancerConfig, Deployment


def deploy(task=None,
           model=None,
           deployment_name=None,
           enable_deepspeed=True,
           enable_zero=False,
           ds_config=None,
           mii_config={},
           version=1,
           deployment_tag=None,
           deployments=[],
           deployment_type=DeploymentType.LOCAL,
           model_path=None):
"""Deploy a task using specified model. For usage examples see:

mii/examples/local/text-generation-example.py
@@ -66,37 +67,49 @@ def deploy(task,
If deployment_type is `LOCAL`, returns just the name of the deployment that can be used to create a query handle using `mii.mii_query_handle(deployment_name)`

"""
    if len(deployments) == 0:
        assert model is not None and task is not None and deployment_name is not None, "model, task, and deployment_name must all be set to deploy a single model"
        deployments = [
            Deployment(deployment_name=deployment_name, task=task, model=model,
                       enable_deepspeed=enable_deepspeed, enable_zero=enable_zero,
                       GPU_index_map=None, mii_config=mii_config,
                       ds_config=ds_config, version=version)
        ]
        deployment_tag = deployment_name + "_tag"
    else:
        assert deployment_tag is not None, "deployment_tag must be set to deploy multiple models"

mii.multi_model_deployments[deployment_tag] = deployments
ports = set()
    # parse and validate each deployment's mii config
    for deployment in deployments:
        mii_config = mii.config.MIIConfig(**deployment.mii_config)
        assert mii_config.port_number not in ports, f"duplicate port numbers not allowed - {mii_config.port_number}"
        ports.add(mii_config.port_number)
        if deployment.enable_zero:
            if deployment.ds_config.get("fp16", {}).get("enabled", False):
                assert (mii_config.dtype == torch.half), "MII Config Error: MII dtype and ZeRO dtype must match"
            else:
                assert (mii_config.dtype == torch.float), "MII Config Error: MII dtype and ZeRO dtype must match"
        assert not (deployment.enable_deepspeed and deployment.enable_zero), "MII Config Error: DeepSpeed and ZeRO cannot both be enabled, select only one"

# aml only allows certain characters for deployment names
if deployment_type == DeploymentType.AML:
allowed_chars = set(string.ascii_lowercase + string.ascii_uppercase +
string.digits + '-')
        assert set(deployment_tag) <= allowed_chars, "AML deployment tags can only contain a-z, A-Z, 0-9, and '-'"

    for deployment in deployments:
        deployment.task = mii.utils.get_task(deployment.task)

        if not mii_config.skip_model_check:
            mii.utils.check_if_task_and_model_is_valid(deployment.task, deployment.model)
            if deployment.enable_deepspeed:
                mii.utils.check_if_task_and_model_is_supported(deployment.task, deployment.model)

        if deployment.enable_deepspeed:
            logger.info(
                f"************* MII is using DeepSpeed Optimizations to accelerate your model: {deployment.model} *************"
            )
        else:
            logger.info(
                f"************* DeepSpeed Optimizations not enabled. Please use enable_deepspeed to get better performance for: {deployment.model} *************"
            )

# In local deployments use default path if no model path set
if model_path is None and deployment_type == DeploymentType.LOCAL:
@@ -126,21 +139,16 @@
replica_configs=replica_configs)

if deployment_type != DeploymentType.NON_PERSISTENT:
        create_score_file(deployment_tag=deployment_tag,
                          deployments=deployments,
                          deployment_type=deployment_type,
                          model_path=model_path,
                          lb_config=lb_config)

    if deployment_type == DeploymentType.AML:
        _deploy_aml(deployment_tag=deployment_tag, model_name=model, version=version)
    elif deployment_type == DeploymentType.LOCAL:
        return _deploy_local(deployment_tag, model_path=model_path)
elif deployment_type == DeploymentType.NON_PERSISTENT:
assert int(os.getenv('WORLD_SIZE', '1')) == mii_config.tensor_parallel, "World Size does not equal number of tensors. When using non-persistent deployment type, please launch with `deepspeed --num_gpus <tensor_parallel>`"
provider = MODEL_PROVIDER_MAP[get_provider_name(model, task)]
@@ -157,14 +165,14 @@
raise Exception(f"Unknown deployment type: {deployment_type}")


def _deploy_local(deployment_tag, model_path):
    mii.utils.import_score_file(deployment_tag).init()


def _deploy_aml(deployment_tag, model_name, version):
acr_name = mii.aml_related.utils.get_acr_name()
mii.aml_related.utils.generate_aml_scripts(acr_name=acr_name,
                                               deployment_name=deployment_tag,
model_name=model_name,
version=version)
print(
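
Taken together, a multi-model deployment could then be launched with a call roughly like the sketch below. The deployment names and models are hypothetical, and each deployment's mii_config is assumed to carry a distinct port_number so the duplicate-port assertion in deploy() passes:

    import mii
    from mii.config import Deployment

    deployments = [
        Deployment(deployment_name="text-gen",
                   task="text-generation",
                   model="gpt2",
                   mii_config={"port_number": 50050}),
        Deployment(deployment_name="qa",
                   task="question-answering",
                   model="deepset/roberta-large-squad2",
                   mii_config={"port_number": 50051}),
    ]
    mii.deploy(deployment_tag="multi-model-demo", deployments=deployments)

The single-model form remains supported: calling mii.deploy(task=..., model=..., deployment_name=...) wraps the arguments into a one-element deployments list and derives the tag as deployment_name + "_tag".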
55 changes: 27 additions & 28 deletions mii/models/score/generate.py
@@ -9,53 +9,52 @@
from mii.constants import DeploymentType


def create_score_file(deployment_tag,
                      deployment_type,
                      deployments,
                      model_path,
                      lb_config):

    config_dict = {}
    config_dict[mii.constants.DEPLOYMENT_TAG_KEY] = deployment_tag
    config_dict[mii.constants.MODEL_PATH_KEY] = model_path
    for deployment in deployments:
        name = deployment.deployment_name
        config_dict[name] = {}
        config_dict[name][mii.constants.DEPLOYMENT_NAME_KEY] = name
        config_dict[name][mii.constants.TASK_NAME_KEY] = mii.utils.get_task_name(deployment.task)
        config_dict[name][mii.constants.MODEL_NAME_KEY] = deployment.model
        config_dict[name][mii.constants.ENABLE_DEEPSPEED_KEY] = deployment.enable_deepspeed
        config_dict[name][mii.constants.MII_CONFIGS_KEY] = deployment.mii_config
        config_dict[name][mii.constants.ENABLE_DEEPSPEED_ZERO_KEY] = deployment.enable_zero
        config_dict[name][mii.constants.DEEPSPEED_CONFIG_KEY] = deployment.ds_config

        if lb_config is not None:
            config_dict[name][mii.constants.LOAD_BALANCER_CONFIG_KEY] = lb_config

    if len(mii.__path__) > 1:
        logger.warning(
            f"Detected mii path as multiple sources: {mii.__path__}, might cause unknown behavior"
        )

    with open(os.path.join(mii.__path__[0],
                           "models/score/score_template.py"),
              "r") as fd:
        score_src = fd.read()

# update score file w. global config dict
source_with_config = f"{score_src}\n"
source_with_config += f"configs = {pprint.pformat(config_dict, indent=4)}"

    with open(generated_score_path(deployment_tag, deployment_type), "w") as fd:
fd.write(source_with_config)
fd.write("\n")


def generated_score_path(deployment_tag, deployment_type):
    if deployment_type == DeploymentType.LOCAL:
        score_path = os.path.join(mii.utils.mii_cache_path(), deployment_tag)
    elif deployment_type == DeploymentType.AML:
        score_path = os.path.join(mii.aml_related.utils.aml_output_path(deployment_tag),
                                  "code")
if not os.path.isdir(score_path):
os.makedirs(score_path)
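
For two deployments, the configs dict appended to the generated score file would take roughly this nested shape. Key strings are shown only where their constant values appear in this diff (deployment_tag, model_path, deployment_name, model_name, ds_optimize, ds_zero, ds_config); the deployment names and values are hypothetical:

    configs = {
        'deployment_tag': 'multi-model-demo',
        'model_path': '/tmp/mii_models',
        'text-gen': {
            'deployment_name': 'text-gen',
            'model_name': 'gpt2',
            'ds_optimize': True,
            'ds_zero': False,
            'ds_config': None,
        },
        'qa': {
            'deployment_name': 'qa',
            'model_name': 'deepset/roberta-large-squad2',
            'ds_optimize': True,
            'ds_zero': False,
            'ds_config': None,
        },
    }

The per-deployment task and MII-config entries are omitted above because their key constants are not visible in this diff.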
2 changes: 2 additions & 0 deletions mii/models/score/score_template.py
@@ -16,6 +16,8 @@

def init():
model_path = mii.utils.full_model_path(configs[mii.constants.MODEL_PATH_KEY])
deployment_tag = configs[mii.constants.DEPLOYMENT_TAG_KEY]
deployments = mii.multi_model_deployments[deployment_tag]

deployment_name = configs[mii.constants.DEPLOYMENT_NAME_KEY]
model_name = configs[mii.constants.MODEL_NAME_KEY]
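
Given that layout, one way init() could recover each deployment's sub-config directly from the score file (rather than from the in-process multi_model_deployments registry) is sketched here; this is an assumption about the intended direction, not code from the diff:

    def init():
        model_path = mii.utils.full_model_path(configs[mii.constants.MODEL_PATH_KEY])
        deployment_tag = configs[mii.constants.DEPLOYMENT_TAG_KEY]
        # every dict-valued top-level entry is a per-deployment sub-config
        for name, dep_config in configs.items():
            if not isinstance(dep_config, dict):
                continue
            model_name = dep_config[mii.constants.MODEL_NAME_KEY]
            ...  # initialize this model as the single-model template does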
38 changes: 13 additions & 25 deletions mii/server.py
@@ -29,14 +29,9 @@ def config_to_b64_str(config):
class MIIServer():
'''Initialize the model, setup the server for the model under model_path'''
def __init__(self,
                 deployment_tag,
                 deployments,
                 model_path,
lb_config=None):

mii_configs = mii.config.MIIConfig(**mii_configs)
@@ -55,13 +50,9 @@ def __init__(self,
f.write(f"localhost slots={num_gpu}")
            mii_configs.hostfile = hostfile

        processes = self._initialize_service(deployment_tag,
                                             deployments,
                                             model_path,
lb_config)
self._wait_until_server_is_live(processes, lb_config.replica_configs)

@@ -278,13 +269,9 @@ def _launch_deepspeed(self,
ds_launch_str=ds_launch_str)

def _initialize_service(self,
                            deployment_tag,
                            deployments,
                            model_path,
lb_config):

processes = []
@@ -295,19 +282,20 @@

# Start replica instances
for i, repl_config in enumerate(lb_config.replica_configs):
name = repl_config.deployment_name
hostfile = tempfile.NamedTemporaryFile(delete=False)
hostfile.write(
f'{repl_config.hostname} slots={max(host_gpus[repl_config.hostname])+1}\n'
.encode())
processes.append(
self._launch_deepspeed(
                    name,
                    deployments[name].model,
                    model_path,
                    deployments[name].enable_deepspeed,
                    deployments[name].enable_zero,
                    deployments[name].ds_config,
                    deployments[name].mii_config,
hostfile.name,
repl_config.hostname,
repl_config.tensor_parallel_ports[0],
Expand Down