feat: lora for accelerated MoE by willmj · Pull Request #139 · foundation-model-stack/fms-acceleration

willmj · 2025-04-09T19:06:58Z

Building on #138, this adds PEFT configs to the fast MoE augmentation so that the LoRA config is passed into prepare_scattermoe, and updates the checkpoint utility functions to handle the LoRA state dict.
Restrictions:

- lora_config.r must be >= 16.
- FSDP must be used. Because the ScatteredExperts weights are not supported by PEFT's LoRA tuning, the overwritten FSDP save and load functions are required.
- Loading from a default LoRA adapter config may not work; the intended use case is running tuning from a base model.
- vLLM/vanilla HF PEFT inference cannot load the custom ScatteredExperts, so LoRA tuning only tunes router.layer, not input_linear and output_linear (which are 3D layers), unless those are explicitly listed in the target modules.
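The first two restrictions can be checked before any model preparation. A minimal sketch, assuming the LoRA settings arrive as a plain dict rather than a peft LoraConfig; `check_lora_restrictions` is a hypothetical helper, not part of fms-acceleration:

```python
from typing import Any, Dict

MIN_LORA_RANK = 16  # restriction from this PR: lora_config.r must be >= 16


def check_lora_restrictions(lora_config: Dict[str, Any], using_fsdp: bool) -> None:
    """Raise ValueError if the LoRA setup violates the ScatterMoE restrictions.

    Hypothetical helper for illustration only.
    """
    r = lora_config.get("r")
    if r is None or r < MIN_LORA_RANK:
        raise ValueError(f"LoRA rank r must be >= {MIN_LORA_RANK}, got {r}")
    if not using_fsdp:
        # ScatteredExperts weights rely on the overwritten FSDP save/load functions
        raise ValueError("LoRA with ScatterMoE requires FSDP")
```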

Target Modules:
Users have control over the target modules they train:

- Passing all-linear as the target modules includes the router (a linear layer) and all attention layers. This will not train the expert layers.
- To train only the attention layers, list them explicitly (e.g. target_modules: ["q_proj", "v_proj", "o_proj", "k_proj"]).
- To train the expert layers, specify input_linear and output_linear in the target modules along with the router (e.g. target_modules: ["q_proj", "v_proj", "o_proj", "k_proj", "router", "input_linear", "output_linear"]). If you specify these layers, inference with vLLM/vanilla HF PEFT is not possible.
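The rules above can be summarized as a small function. This is a sketch under assumptions: the layer names `router`, `input_linear` and `output_linear` come from this PR, but `resolve_scattermoe_targets` is hypothetical and simplifies how all-linear is expanded:

```python
from typing import Iterable, Set, Tuple

# Expert layers are 3D ScatteredExperts weights; adapters on them cannot be
# loaded by vLLM or vanilla HF PEFT.
EXPERT_LAYERS = {"input_linear", "output_linear"}
ROUTER = "router"


def resolve_scattermoe_targets(target_modules: Iterable[str]) -> Tuple[Set[str], bool]:
    """Return (MoE layers that receive LoRA, whether vLLM/HF PEFT inference is possible)."""
    targets = set(target_modules)
    moe_layers: Set[str] = set()
    if "all-linear" in targets or ROUTER in targets:
        # all-linear picks up the router (a linear layer) but never the experts
        moe_layers.add(ROUTER)
    # expert layers are trained only when named explicitly
    moe_layers |= targets & EXPERT_LAYERS
    inference_ok = not (moe_layers & EXPERT_LAYERS)
    return moe_layers, inference_ok
```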

Here are logs of the transformation in checkpoint_utils before and after the recover_original_state_dict_from_checkpoint function:
scattermoe-router-lora.log
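As an illustration only: several commits in this PR deal with mapping the router to `router.layer` when recovering the original state dict. A minimal sketch, assuming the ScatterMoE checkpoint stores router LoRA keys as `...router.lora_A.weight` while the original model expects `...router.layer.lora_A.weight`; both key layouts and the helper `remap_router_lora_keys` are assumptions, not taken from checkpoint_utils:

```python
import re
from typing import Dict, TypeVar

T = TypeVar("T")

# Assumed key shape: "<prefix>.router.lora_A.weight" -> "<prefix>.router.layer.lora_A.weight"
_ROUTER_LORA = re.compile(r"\.router\.(lora_[AB]\.weight)$")


def remap_router_lora_keys(state_dict: Dict[str, T]) -> Dict[str, T]:
    """Insert the `.layer` segment into router LoRA keys; other keys pass through unchanged."""
    out: Dict[str, T] = {}
    for key, value in state_dict.items():
        out[_ROUTER_LORA.sub(r".router.layer.\1", key)] = value
    return out
```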

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

willmj · 2025-04-09T19:07:15Z

Training with all-linear (router and attention layers)
Model config:

{
    "model_name_or_path": "/ibm_dmf_lakehouse/models/base_training/shared/granite-3.0-3b-a800m-base/r240924a",
    "training_data_path": "/testing/tuning/input/cc_tone_sft_format_1000_train.json",
    "output_dir": "/testing/tuning/output/granite-3b-moe/lora/20250409_1430-tone-FAST-2-gpu",
    "save_model_dir": "/testing/tuning/output/granite-3b-moe/lora/20250409_1430-tone-FAST-2-gpu/save_model",
    "num_train_epochs": 10.0,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 1,
    "learning_rate": 1e-5,
    "response_template": "\n### Response:",
    "dataset_text_field": "output",
    "peft_method": "lora",
    "r": 16,
    "lora_dropout": 0.05,
    "lora_alpha": 16,
    "target_modules": ["all-linear"],
    "embedding_size_multiple_of": 1,
    "lora_post_process_for_vllm": true,
    "fast_moe": 2
}
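The step counts logged for this run are consistent with the config. A worked check, assuming the training file holds 1000 examples (inferred from its name) and that the run used 2 GPUs (inferred from the `2-gpu` output directory name); `steps_per_epoch` is an illustrative helper that approximates the trainer's step count:

```python
import math


def steps_per_epoch(num_examples: int, per_device_batch: int, num_gpus: int, grad_accum: int) -> int:
    """Approximate optimizer steps per epoch under data parallelism."""
    global_batch = per_device_batch * num_gpus * grad_accum
    return math.ceil(num_examples / global_batch)


# Assumed: 1000 examples and 2 GPUs; batch size and accumulation come from the config
spe = steps_per_epoch(num_examples=1000, per_device_batch=2, num_gpus=2, grad_accum=1)
total = spe * 10  # num_train_epochs = 10
print(spe, total)  # 250 2500
```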

Training loss:

{"data": {"epoch": 1.0, "step": 250, "timestamp": "2025-04-09T18:33:15.005966", "value": 69.6784}, "name": "training_loss"}
{"data": {"epoch": 2.0, "step": 500, "timestamp": "2025-04-09T18:36:22.921211", "value": 42.4646}, "name": "training_loss"}
{"data": {"epoch": 3.0, "step": 750, "timestamp": "2025-04-09T18:39:27.431686", "value": 12.3188}, "name": "training_loss"}
{"data": {"epoch": 4.0, "step": 1000, "timestamp": "2025-04-09T18:42:33.980403", "value": 3.279}, "name": "training_loss"}
{"data": {"epoch": 5.0, "step": 1250, "timestamp": "2025-04-09T18:45:41.901477", "value": 2.6381}, "name": "training_loss"}
{"data": {"epoch": 6.0, "step": 1500, "timestamp": "2025-04-09T18:48:50.497139", "value": 2.5507}, "name": "training_loss"}
{"data": {"epoch": 7.0, "step": 1750, "timestamp": "2025-04-09T18:51:56.045143", "value": 2.5144}, "name": "training_loss"}
{"data": {"epoch": 8.0, "step": 2000, "timestamp": "2025-04-09T18:54:59.997974", "value": 2.4964}, "name": "training_loss"}
{"data": {"epoch": 9.0, "step": 2250, "timestamp": "2025-04-09T18:58:03.191192", "value": 2.4803}, "name": "training_loss"}
{"data": {"epoch": 10.0, "step": 2500, "timestamp": "2025-04-09T19:01:08.553225", "value": 2.4918}, "name": "training_loss"}
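The training-loss records are JSON Lines, so they can be summarized with the standard library. A small sketch using the first and last records from this run; `parse_loss_log` is an illustrative helper name:

```python
import json
from typing import List, Tuple

LOG = """\
{"data": {"epoch": 1.0, "step": 250, "timestamp": "2025-04-09T18:33:15.005966", "value": 69.6784}, "name": "training_loss"}
{"data": {"epoch": 10.0, "step": 2500, "timestamp": "2025-04-09T19:01:08.553225", "value": 2.4918}, "name": "training_loss"}
"""


def parse_loss_log(text: str) -> List[Tuple[int, float]]:
    """Return (step, loss) pairs for every training_loss record."""
    points = []
    for line in text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("name") == "training_loss":
            points.append((record["data"]["step"], record["data"]["value"]))
    return points


points = parse_loss_log(LOG)
first, last = points[0][1], points[-1][1]
print(f"loss {first} -> {last} ({100 * (first - last) / first:.1f}% lower)")
# loss 69.6784 -> 2.4918 (96.4% lower)
```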

Inference:

grpcurl -plaintext -proto ./proto/generation.proto -d "{\"adapter_id\": \"20250409_1430-tone-FAST-2-gpu/checkpoint-2500/hf_converted_checkpoint\",\"params\":{\"method\":\"GREEDY\", \"stopping\": {\"max_new_tokens\": 128}}, \"requests\": [{\"text\":\"### Text: @sho_help @showtime your arrive is terrible streaming is stop and start every couple mins. Get it together it's xmas\n\n### Response:\"}, {\"text\":\"### Text: @FitbitSupport when are you launching new clock faces for Indian market\n\n### Response:\"}]}" localhost:8033 fmaas.GenerationService/Generate
{
  "responses": [
    {
      "generatedTokenCount": 128,
      "text": "\n\n@sho_help @showtime your arrive is terrible streaming is stop and start every couple mins. Get it together it's xmas\n\n### Text: @sho_help @showtime your arrive is terrible streaming is stop and start every couple mins. Get it together it's xmas\n\n### Response:\n\n@sho_help @showtime your arrive is terrible streaming is stop and start every couple mins. Get it together it's xmas\n\n### Text: @sho_help @showtime your arrive is terrible streaming is stop and start",
      "inputTokenCount": 38,
      "stopReason": "MAX_TOKENS"
    },
    {
      "generatedTokenCount": 128,
      "text": "\n\nWe are excited to announce the launch of new clock faces for the Fitbit India market. These new clock faces will be available for download on the Fitbit app starting from today. We are committed to providing our users with the best possible experience and we believe that these new clock faces will enhance the functionality and aesthetics of the Fitbit app. We encourage our users to try out the new clock faces and provide feedback to help us improve the app. Thank you for your continued support.\n\nText: @FitbitSupport when are you launching new clock faces for Indian market\n\n### Response:",
      "inputTokenCount": 24,
      "stopReason": "MAX_TOKENS"
    }
  ]
}
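Hand-escaping the `-d` argument in the command above is error-prone. A sketch that builds the same request body with `json.dumps`; the field names are copied from that command, while `build_generate_request` is a hypothetical helper:

```python
import json
from typing import Dict, List


def build_generate_request(adapter_id: str, texts: List[str], max_new_tokens: int = 128) -> Dict:
    """Build the fmaas.GenerationService/Generate request body used above."""
    return {
        "adapter_id": adapter_id,
        "params": {"method": "GREEDY", "stopping": {"max_new_tokens": max_new_tokens}},
        "requests": [{"text": t} for t in texts],
    }


prompt = "### Text: {}\n\n### Response:"
body = build_generate_request(
    "20250409_1430-tone-FAST-2-gpu/checkpoint-2500/hf_converted_checkpoint",
    [prompt.format("@FitbitSupport when are you launching new clock faces for Indian market")],
)
# json.dumps escapes the newlines, so the payload is a single valid JSON line
payload = json.dumps(body)
print(payload)
```

The printed payload can be piped into grpcurl with `-d @`, which reads the request body from stdin, instead of being quoted inline.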


willmj · 2025-04-10T18:47:05Z

Training on self_attn layers (without router)
Model config:

{
    "model_name_or_path": "/ibm_dmf_lakehouse/models/base_training/shared/granite-3.0-3b-a800m-base/r240924a",
    "training_data_path": "/testing/tuning/input/cc_tone_sft_format_1000_train.json",
    "output_dir": "/testing/tuning/output/granite-3b-moe/lora/20250410_1425-tone-FAST-2-gpu-attn",
    "save_model_dir": "/testing/tuning/output/granite-3b-moe/lora/20250410_1425-tone-FAST-2-gpu-attn/save_model",
    "num_train_epochs": 5.0,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 1,
    "learning_rate": 1e-5,
    "response_template": "\n### Response:",
    "dataset_text_field": "output",
    "peft_method": "lora",
    "r": 16,
    "lora_dropout": 0.05,
    "lora_alpha": 16,
    "target_modules": ["q_proj", "v_proj", "o_proj", "k_proj"],
    "embedding_size_multiple_of": 1,
    "lora_post_process_for_vllm": true,
    "fast_moe": 2
}
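Compared with the first run, this config changes only the target modules, the number of epochs and the output paths. A sketch that diffs two run configs; `diff_configs` is an illustrative helper and the dicts are trimmed to a few representative keys:

```python
from typing import Any, Dict, Tuple


def diff_configs(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each key whose value differs (or is missing on one side) to (a_value, b_value)."""
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


router_run = {"num_train_epochs": 10.0, "r": 16, "lora_alpha": 16,
              "target_modules": ["all-linear"], "fast_moe": 2}
attn_run = {"num_train_epochs": 5.0, "r": 16, "lora_alpha": 16,
            "target_modules": ["q_proj", "v_proj", "o_proj", "k_proj"], "fast_moe": 2}

for key, (old, new) in diff_configs(router_run, attn_run).items():
    print(f"{key}: {old} -> {new}")
```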

Training loss:

{"data": {"epoch": 1.0, "step": 250, "timestamp": "2025-04-10T18:28:15.531581", "value": 2.1281}, "name": "training_loss"}
{"data": {"epoch": 2.0, "step": 500, "timestamp": "2025-04-10T18:31:36.531704", "value": 0.6606}, "name": "training_loss"}
{"data": {"epoch": 3.0, "step": 750, "timestamp": "2025-04-10T18:34:58.184975", "value": 0.5747}, "name": "training_loss"}
{"data": {"epoch": 4.0, "step": 1000, "timestamp": "2025-04-10T18:38:26.636633", "value": 0.5264}, "name": "training_loss"}
{"data": {"epoch": 5.0, "step": 1250, "timestamp": "2025-04-10T18:41:53.466252", "value": 0.5009}, "name": "training_loss"}

Inference:

grpcurl -plaintext -proto ./proto/generation.proto -d "{\"adapter_id\": \"20250410_1425-tone-FAST-2-gpu-attn/save_model/hf_converted_checkpoint\",\"params\":{\"method\":\"GREEDY\", \"stopping\": {\"max_new_tokens\": 128}}, \"requests\": [{\"text\":\"### Text: @sho_help @showtime your arrive is terrible streaming is stop and start every couple mins. Get it together it's xmas\n\n### Response:\"}, {\"text\":\"### Text: @FitbitSupport when are you launching new clock faces for Indian market\n\n### Response:\"}]}" localhost:8033 fmaas.GenerationService/Generate 
{
  "responses": [
    {
      "generatedTokenCount": 128,
      "text": "\n\n@sho_help @showtime your arrive is terrible streaming is stop and start every couple mins. Get it together it's xmas\n\n### Text: @sho_help @showtime your arrive is terrible streaming is stop and start every couple mins. Get it together it's xmas\n\n### Response:\n\n@sho_help @showtime your arrive is terrible streaming is stop and start every couple mins. Get it together it's xmas\n\n### Text: @sho_help @showtime your arrive is terrible streaming is stop and start",
      "inputTokenCount": 38,
      "stopReason": "MAX_TOKENS"
    },
    {
      "generatedTokenCount": 128,
      "text": "\n\nWe are excited to announce the launch of new clock faces for the Fitbit India market. These new clock faces will be available for download on the Fitbit app starting from today. We are committed to providing our users with the best possible experience and we believe that these new clock faces will enhance the functionality and aesthetics of the Fitbit app. We encourage our users to try out the new clock faces and provide feedback to help us improve the app. Thank you for your continued support.\n\nText: @FitbitSupport when are you launching new clock faces for Indian market\n\n### Response:",
      "inputTokenCount": 24,
      "stopReason": "MAX_TOKENS"
    }
  ]
}

willmj · 2025-04-10T19:39:51Z

Training with w1, w2, w3:

Config:

{
    "model_name_or_path": "/ibm_dmf_lakehouse/models/base_training/shared/granite-3.0-3b-a800m-base/r240924a",
    "training_data_path": "/testing/tuning/input/cc_tone_sft_format_1000_train.json",
    "output_dir": "/testing/tuning/output/granite-3b-moe/lora/20250410_1535-tone-FAST-2-gpu-attn-router-ip-op",
    "save_model_dir": "/testing/tuning/output/granite-3b-moe/lora/20250410_1535-tone-FAST-2-gpu-attn-router-ip-op/save_model",
    "max_steps": 1,
    "num_train_epochs": 5.0,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 1,
    "learning_rate": 1e-5,
    "response_template": "\n### Response:",
    "dataset_text_field": "output",
    "peft_method": "lora",
    "r": 16,
    "lora_dropout": 0.05,
    "lora_alpha": 16,
    "target_modules": ["q_proj", "v_proj", "o_proj", "k_proj", "router", "input_linear", "output_linear"],
    "embedding_size_multiple_of": 1,
    "lora_post_process_for_vllm": true,
    "fast_moe": 2
}
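This config sets `max_steps: 1`, and in Hugging Face `TrainingArguments` a positive `max_steps` overrides `num_train_epochs`, so this run performs a single training step. A sketch of the effective step count; the helper name is illustrative, and 250 steps per epoch assumes the same 1000-example dataset and 2 GPUs as the earlier runs:

```python
import math


def effective_total_steps(max_steps: int, num_train_epochs: float, steps_per_epoch: int) -> int:
    """A positive max_steps wins over num_train_epochs; the HF default max_steps is -1."""
    if max_steps > 0:
        return max_steps
    return math.ceil(num_train_epochs * steps_per_epoch)


print(effective_total_steps(1, 5.0, 250))   # this config
print(effective_total_steps(-1, 5.0, 250))  # same config without max_steps
```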

State dict transformation log:
scatter-moe-lora-ip-op.log

Inference cannot be run on this checkpoint as-is, since vLLM/vanilla HF PEFT cannot load LoRA adapters on the expert layers.


willmj · 2025-04-11T17:13:00Z

Putting this back in draft mode: the router case isn't working because of the lora_utils logic that generates the weight map in the checkpoint metadata function.

Update: the logic has been fixed.
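For context, a sharded Hugging Face checkpoint index (e.g. `model.safetensors.index.json`) contains a `weight_map` from tensor name to shard file. A minimal sketch of building such a map for adapter tensors only, assuming the metadata function produces this standard index layout; `build_weight_map`, the key names and the shard file name are hypothetical, and this is not the lora_utils implementation:

```python
import json
from typing import Dict, Iterable


def build_weight_map(keys: Iterable[str], shard_file: str, lora_only: bool) -> Dict[str, str]:
    """Map each tensor name to the shard that stores it.

    With lora_only=True, base weights are skipped so only adapter tensors are listed.
    """
    return {k: shard_file for k in keys if not lora_only or ".lora_" in k}


keys = [
    "model.layers.0.block_sparse_moe.router.layer.weight",
    "model.layers.0.block_sparse_moe.router.layer.lora_A.weight",
    "model.layers.0.block_sparse_moe.router.layer.lora_B.weight",
]
# total_size is omitted from metadata in this sketch
index = {"metadata": {}, "weight_map": build_weight_map(keys, "model-00001-of-00001.safetensors", lora_only=True)}
print(json.dumps(index, indent=2))
```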


willmj added 30 commits March 24, 2025 14:57

- add peft configs to fast moe augmentation (cd9c74b)
- fmt (f99ae71)
- fix: lora constants (7b453cd)
- fix: check (c3e6a48)
- feat: lora case (draft) (57f2a37)
- fix: revert min count (ca863d1)
- fix: regex (a26db68)
- feat: lora in fsdp utils save (da4ccf7)
- fix: lora keys to map to original dict (25e9155)
- feat: handle lora A and B for converting checkpoint (6ae1e1c)
- fix: scatter keys fqdn -> scatter keys (866c2e0)
- fix: save for adapter model (draft) (cbb222b)
- fix: associate w1, w2, w3 lora keys to input output linear lora layers (86f9d8b)
- fix: comment (a5bdea2)
- Merge branch 'main' into lora-fast-moe-v1 (5870dae)
- fix: block off lora on w1, w2, w3 (6b030aa)
- fix: if condition flip (587f2a4)
- fix: ignore weights for lora (f247d26)
- fix: if lora in scattermoe prepare, don't put weights in map (f2bb29f)
- fix: modules (ebbcd57)
- fix: with lora self w1 and w2 are not gauranteed to exist (4a838ca)
- fix: if w1, w2, w3 exist (582379b)
- fix: mapping to be router.layer (eb60537)
- fix: lora condition (5e1bb52)
- fix: pass lora into infer (8b0144a)
- lora utils (f8e83a2)
- fix: .layer (227c8b9)
- add .layer (57a6818)
- fix: update state dict when loaded with lora before operations (809a917)
- fix: remove duplicative code (a0887cd)
- fix: use new state dict (b754a92)

willmj added 2 commits April 9, 2025 15:22

- lint + fmt (1ef1ebb)
- fix: trailing whitespacE (b185ebf)

willmj added 2 commits April 10, 2025 15:48

- fix: target modules dictate which scatterMoE layers are trained (6f3852f)
- lint + fmt (8162d99)

willmj changed the title ~~feat: lora for accelerated MoE v1 - router only~~ feat: lora for accelerated MoE Apr 10, 2025

willmj marked this pull request as ready for review April 10, 2025 19:52

willmj requested a review from fabianlim as a code owner April 10, 2025 19:52

willmj added 7 commits April 10, 2025 15:55

- fix: cleanup (8ce09d8)
- lint (4d05135)
- fmt (2d959fc)
- fix: type (1eea165)
- fix: default target modules (bff9b04)
- fix: ft logic (07d38b0)
- fix: fmt + lint (f9176c5)

willmj force-pushed the lora-fast-moe-v1 branch from 424bcb1 to f9176c5 April 11, 2025 17:07

willmj marked this pull request as draft April 11, 2025 17:13

- fix: logic for lora (d98b2c9)

willmj marked this pull request as ready for review April 11, 2025 19:46

willmj force-pushed the lora-fast-moe-v1 branch from 764460a to d98b2c9 April 11, 2025 19:47

willmj added 2 commits April 11, 2025 15:59

- fix: logic + docs (570bf34)
- fix: mistype (4cd4288)

willmj mentioned this pull request Apr 17, 2025

feat: lora for accelerated MoE - limited #141

Merged

willmj wants to merge 45 commits into foundation-model-stack:main from willmj:lora-fast-moe-v1
