29 commits
88705fb
remove bash
yining610 Dec 20, 2023
3f31c74
Merge branch 'main' of https://github.com/yining610/REV-reimpl
yining610 Dec 20, 2023
1a15dbb
add CKPT and cache_dir
yining610 Dec 20, 2023
baa2f7e
add rationale generator model
yining610 Dec 20, 2023
05817e4
add rationale generator and update README
yining610 Dec 22, 2023
491eb5e
add rationale finetuning for rationale generation
yining610 Dec 23, 2023
b4047aa
fix bugs for GPT2. now support both language modeling and seq2seq fin…
yining610 Dec 23, 2023
7b319d3
fix language modeling bug and get model generated rationales
yining610 Dec 23, 2023
2c1d677
update data store directory
yining610 Dec 23, 2023
4d42f19
fix bugs for model rationale generation and evaluating
yining610 Dec 24, 2023
a033bc0
add rationale evaluation results to README
yining610 Dec 25, 2023
edabf54
Update README
yining610 Dec 31, 2023
646c888
update README
yining610 Dec 31, 2023
673cf55
update README
yining610 Dec 31, 2023
1b2a37f
update README
yining610 Dec 31, 2023
698f412
add las evaluation code and command
yining610 Jan 2, 2024
ed05269
add rev baseline and fix bugs for las
yining610 Jan 2, 2024
323f4f7
add RQ baseline
yining610 Jan 3, 2024
71066f9
update readme
yining610 Jan 3, 2024
2b453b3
add extra experiment and bash scripts
yining610 Jan 9, 2024
a3014d8
finish all baseline experiments for strategyqa and start ecqa exp
yining610 Jan 10, 2024
083c1b4
add ecqa simulation experiment code
yining610 Jan 12, 2024
bbe0bb0
add baseline experiments rq and las for ecqa
yining610 Jan 20, 2024
af7b0b8
add experiments for COSE
yining610 Jan 21, 2024
e2abb9d
add new experiments
yining610 Jan 25, 2024
fe2ee04
add experiments for deberta
yining610 Jan 27, 2024
c206c98
update
yining610 Feb 14, 2024
77ad0b4
add exp results
yining610 Mar 30, 2024
4261369
compute correlation between rev and human eval
yining610 Mar 30, 2024
6 changes: 6 additions & 0 deletions .gitignore
@@ -2,4 +2,10 @@
data/
ckpt/
__pycache__/
Zhengping/
Yining/
wandb/
backup/
training_reports/
log/
*.py[cod]
86 changes: 70 additions & 16 deletions README.md
@@ -11,6 +11,17 @@ Each row corresponds to the $\lambda$ of the IRM regularizations.
| 100. | $x$ - 0.676 | $x$ - 0.678 | $x$ - 0.684 |
| 1000. | $x$ - 0.692 | $x$ - 0.692 | $x$ - 0.691 |

## Latest Results on Model Rationale Evaluation

$\lambda$ is the IRM regularization coefficient and $\delta$ is the removal threshold.

| T5-base | g ($\lambda = 100$, $\delta = 0.1$) | g ($\lambda = 10$, $\delta = 0.1$) |
| ------- | ----------------------------------- | ---------------------------------- |
| GPT-4 | $x$ - 0.441 | $x$ - 0.400 |
| GPT-3.5 | $x$ - 0.538 | $x$ - 0.810 |
| T5-large | $x$ - 0.705 | $x$ - 0.871 |
| GPT-2 | $x$ - 0.779 | $x$ - 1.086 |

## File Structure Description

```shellscript
@@ -21,19 +32,62 @@ scripts/ // helper scripts to do examination, sanity check etc.

## Steps

Variables:
* `INPUT_DATA_PATH=/scratch/ylu130/data/strategyqa_dataset/strategyqa_train.json`
* `OUTPUT_DIRECTORY=Zhengping/strategyqa_custom_split`
* `PROCESSED_DATA_DIRECTORY=data/processed_datasets/strategyqa`

1. Split datasets: `python scripts/prepare_strategy_qa.py --input-path={INPUT_DATA_PATH} --output-path={OUTPUT_DIRECTORY}`
2. Prepare huggingface dataset: `python steps/rationale_preprocessing.py --data-handle={OUTPUT_DIRECTORY} --split={SPLIT} --write-to={PROCESSED_DATA_DIRECTORY}`
3. Generate rationale variants: `python scripts/generate_vocabs.py --dataset-dir={PROCESSED_DATA_DIRECTORY} --rationale-format={RATIONALE_FORMAT}`
4. Train models: `python steps/train_rev_model.py --task-name {MODEL-DATASET} --rationale-format {RATIONALE_FORMAT}`
    1. Train all fasttext models: `bash bash/train_fasttext_models.sh`
    2. Train all t5 models: `bash bash/train_t5_models.sh`
5. Run IG and mask tokens: `python scripts/sample_masking.py --dataset-dir {PROCESSED_DATA_DIRECTORY} --rationale-format {RATIONALE_FORMAT} --minimum-frequency {MF} --write-to {OUTPUT_PATH}`
6. Train Generator: `python steps/train_generator.py --rationale-format {RATIONALE_FORMAT} --removal-threshold {THRESHOLD}`
7. Sample intervened rationale datapoint: `python scripts/sample_intervention_generation.py --model-dir {TRAINED_MODEL_SAV_DIR} --data-dir {PROCESSED_DATA_DIRECTORY}`
8. Train IRM: `python steps/train_irm_model.py --rationale-format {RATIONALE_FORMAT} --removal-threshold {THRESHOLD}`
9. Evaluate: `python steps/eval_rev_with_model.py --dataset-dir {PROCESSED_DATA_DIRECTORY} --model-dir {EVALUATING_MODEL_DIR} --rationale-format {RATIONALE_FORMAT} --removal-threshold {THRESHOLD} --removal-model-dir {REMOVAL_MODEL_DIR}`
Example variables:
```
INPUT_DATA_PATH=/scratch/ylu130/data/strategyqa_dataset/strategyqa_train.json
OUTPUT_DIRECTORY=Zhengping/strategyqa_custom_split
OUTPUT_DIRECTORY2=Yining/generated_rationales/strategyqa
PROCESSED_DATA_DIRECTORY=data/processed_datasets/strategyqa
PROCESSED_DATA_DIRECTORY2=data/processed_datasets/strategyqa_model_rationale
DATA_NAME=gpt-4_demo=2_raw=True
```
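
The `{NAME}` placeholders in the commands below can be held as ordinary shell variables. A minimal sketch, assuming a POSIX shell; the `run` helper and the `DRY_RUN` switch are hypothetical and not part of this repository. By default the helper only prints the command it would execute:

```shell
# Example values copied from the variable list above; adjust to your own paths.
INPUT_DATA_PATH=/scratch/ylu130/data/strategyqa_dataset/strategyqa_train.json
OUTPUT_DIRECTORY=Zhengping/strategyqa_custom_split

# Hypothetical helper (not part of the repo): print the command, and only
# execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

run python scripts/prepare_strategy_qa.py \
    --input-path "$INPUT_DATA_PATH" \
    --output-path "$OUTPUT_DIRECTORY"
```

Setting `DRY_RUN=0` before calling `run` executes the command instead of only printing it.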
### Prepare Synthetic Leaky Rationales
1. Split datasets: `python scripts/prepare_strategy_qa.py --input-path {INPUT_DATA_PATH} --output-path {OUTPUT_DIRECTORY}`
2. Prepare huggingface dataset: `python steps/rationale_preprocessing.py --data-handle {OUTPUT_DIRECTORY} --split {SPLIT} --write-to {PROCESSED_DATA_DIRECTORY}`
3. Prepare vocabulary: `python scripts/generate_vocabs.py --dataset-dir {PROCESSED_DATA_DIRECTORY} --rationale-format {RATIONALE_FORMAT} --rationale-only`
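
Step 2 is run once per split. The loop below is a sketch of that, assuming the split names `train`, `validation` and `test`; the README does not list the valid values of `{SPLIT}`, so treat them as placeholders. The `preprocess_commands` function is hypothetical and only prints the commands:

```shell
OUTPUT_DIRECTORY=Zhengping/strategyqa_custom_split
PROCESSED_DATA_DIRECTORY=data/processed_datasets/strategyqa

# Assumed split names; replace with the splits your data actually has.
SPLITS="train validation test"

# Print one preprocessing command per split.
preprocess_commands() {
    for split in $SPLITS; do
        echo "python steps/rationale_preprocessing.py --data-handle $OUTPUT_DIRECTORY --split $split --write-to $PROCESSED_DATA_DIRECTORY"
    done
}

preprocess_commands
```

Piping the output to `sh` (for example `preprocess_commands | sh`) would run the printed commands from the repository root.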

### Prepare Base Models
1. Train models: `python steps/train_rev_model.py --task-name {TASK_NAME} --rationale-format {RATIONALE_FORMAT}`
    1. Train fastText models used for leakage detection: `python steps/train_rev_model.py --task-name fasttext-{DATASET} --rationale-format {RATIONALE_FORMAT}`
    2. Train T5 models used for regular REV evaluation: `python steps/train_rev_model.py --task-name t5-{DATASET} --rationale-format {RATIONALE_FORMAT}`
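
The two task names follow the same `{family}-{DATASET}` pattern, so both training commands can be generated from one loop. A sketch only: the `task_name` helper is hypothetical, and the values `strategyqa` and `g` are assumptions (`g` is taken from the results table header and may not be the exact `{RATIONALE_FORMAT}` string the script expects):

```shell
# Build the --task-name value from a model family and a dataset,
# following the fasttext-{DATASET} / t5-{DATASET} pattern above.
task_name() {
    echo "$1-$2"
}

DATASET=strategyqa      # assumed dataset identifier
RATIONALE_FORMAT=g      # assumed; "g" appears in the results table header

# Print the training command for each model family.
for family in fasttext t5; do
    echo python steps/train_rev_model.py \
        --task-name "$(task_name "$family" "$DATASET")" \
        --rationale-format "$RATIONALE_FORMAT"
done
```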

### Detecting and Handling Leaky Parts
#### Detecting and Masking Leaky Tokens
1. Run IG and mask tokens: `python scripts/sample_masking.py --dataset-dir {PROCESSED_DATA_DIRECTORY} --rationale-format {RATIONALE_FORMAT} --minimum-frequency {MF} --write-to {OUTPUT_PATH}`

#### IRM Fine-tuning of Evaluation Models
1. Train Generator: `python steps/train_generator.py --rationale-format {RATIONALE_FORMAT} --removal-threshold {THRESHOLD}`
2. Sample intervened rationale datapoints: `python scripts/sample_intervention_generation.py --model-dir {TRAINED_MODEL_SAV_DIR} --data-dir {PROCESSED_DATA_DIRECTORY}`
3. Train IRM: `python steps/train_irm_model.py --rationale-format {RATIONALE_FORMAT} --removal-threshold {THRESHOLD} --irm-coefficient {IRM_COEF}`
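
The results table compares several values of the IRM coefficient at a fixed removal threshold. A sweep over step 3 could be sketched as below; the `irm_sweep` function is hypothetical, the coefficient and threshold values are the ones shown in the results table, and the rationale format `g` is an assumption. The function only prints the commands:

```shell
RATIONALE_FORMAT=g     # assumed; "g" appears in the results table header
THRESHOLD=0.1          # removal threshold (delta) used in the results table

# Print one IRM training command per coefficient passed as an argument.
irm_sweep() {
    for coef in "$@"; do
        echo "python steps/train_irm_model.py --rationale-format $RATIONALE_FORMAT --removal-threshold $THRESHOLD --irm-coefficient $coef"
    done
}

irm_sweep 10 100
```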

### Final REV Evaluation
1. Evaluate: `python steps/eval_rev_with_model.py --dataset-dir {PROCESSED_DATA_DIRECTORY} --model-dir {EVALUATING_MODEL_DIR} --rationale-format {RATIONALE_FORMAT} --removal-threshold {THRESHOLD} --removal-model-dir {REMOVAL_MODEL_DIR}`
    1. Use IRM finetuned model to evaluate: `python steps/eval_rev_with_model.py --dataset-dir {PROCESSED_DATA_DIRECTORY} --model-dir {EVALUATING_MODEL_DIR} --rationale-format {RATIONALE_FORMAT}`
    2. Use masked rationale to evaluate: `python steps/eval_rev_with_model.py --dataset-dir {PROCESSED_DATA_DIRECTORY} --model-dir {EVALUATING_MODEL_DIR} --rationale-format {RATIONALE_FORMAT} --removal-threshold {THRESHOLD} --removal-model-dir {REMOVAL_MODEL_DIR}`
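
The two evaluation variants differ only in the two removal flags. The sketch below makes that explicit with a hypothetical `eval_command` helper that appends `--removal-threshold` and `--removal-model-dir` only when both are given. The `ckpt/...` paths and the format `g` are illustrative assumptions, not paths the repository is known to produce:

```shell
# Usage: eval_command DATASET_DIR MODEL_DIR FORMAT [THRESHOLD REMOVAL_MODEL_DIR]
# Prints the evaluation command; the last two arguments select the
# masked-rationale variant.
eval_command() {
    cmd="python steps/eval_rev_with_model.py --dataset-dir $1 --model-dir $2 --rationale-format $3"
    if [ "$#" -eq 5 ]; then
        cmd="$cmd --removal-threshold $4 --removal-model-dir $5"
    fi
    echo "$cmd"
}

# IRM-finetuned model, no masking (hypothetical checkpoint path).
eval_command data/processed_datasets/strategyqa ckpt/irm_model g

# Masked rationale (hypothetical checkpoint paths).
eval_command data/processed_datasets/strategyqa ckpt/t5_model g 0.1 ckpt/generator
```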

## Tests
### Evaluating Model Generated Rationales
1. Train rationale generator: `python steps/train_rationale_generator.py --task-name {TASK_NAME} --model-name {MODEL_NAME}`
2. Generate model rationales for strategyqa: `python scripts/generate_rationales.py --dataset-dir {OUTPUT_DIRECTORY} --model-name {MODEL_CHOICE} --num-sample {GENERATION_NUM} --demonstration-num {DEMONSTRATION_NUM} --output-dir {OUTPUT_DIRECTORY2}`
3. Prepare model-generated rationale dataset: `python steps/rationale_preprocessing.py --data-handle {OUTPUT_DIRECTORY2} --data-name {DATA_NAME} --split test --write-to {PROCESSED_DATA_DIRECTORY2}`
4. [Use IRM finetuned model to evaluate](#final-rev-evaluation)
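
Step 3 needs a `{DATA_NAME}` that matches the output of step 2. The only example given is `gpt-4_demo=2_raw=True`, so the sketch below assumes the pattern `<model>_demo=<demonstration-num>_raw=<flag>`. That pattern is inferred from this single example and may not hold for other settings; the `data_name` helper is hypothetical:

```shell
# Compose a {DATA_NAME} value. The <model>_demo=<n>_raw=<flag> pattern is an
# assumption inferred from the example DATA_NAME=gpt-4_demo=2_raw=True.
data_name() {
    echo "${1}_demo=${2}_raw=${3}"
}

OUTPUT_DIRECTORY2=Yining/generated_rationales/strategyqa
PROCESSED_DATA_DIRECTORY2=data/processed_datasets/strategyqa_model_rationale
DATA_NAME=$(data_name gpt-4 2 True)

# Print the preprocessing command for the model-generated rationales.
echo python steps/rationale_preprocessing.py \
    --data-handle "$OUTPUT_DIRECTORY2" \
    --data-name "$DATA_NAME" \
    --split test \
    --write-to "$PROCESSED_DATA_DIRECTORY2"
```

If the generated file names differ, check the contents of `{OUTPUT_DIRECTORY2}` and pass the actual name instead.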

### Evaluating Baselines
1. [LAS](baselines/las/README.md)
2. [REV](baselines/rev/README.md)
3. [RQ](baselines/rq/README.md)

## Experiment on the ECQA Dataset
### Simulation Experiment
1. Prepare ECQA dataset: `python steps/rationale_preprocessing.py --data-handle yangdong/ecqa --split {SPLIT} --write-to data/processed_datasets/ecqa`
2. Prepare ECQA simulation dataset: `python scripts/prepare_ecqa_simulation.py --split {SPLIT} --write-to data/processed_datasets/ecqa_simulation`
3. Train rationale generator: `python steps/train_rationale_generator.py --task-name ecqa --model-name {MODEL_NAME}`
4. Generate model rationales for ECQA: `python scripts/generate_rationales.py --dataset-dir data/processed_datasets/ecqa --model-name {MODEL_NAME} --num-sample {GENERATION_NUM} --demonstration-num {DEMONSTRATION_NUM} --output-dir Yining/generated_rationales/ecqa`
5. Prepare model-generated ECQA simulation dataset: `python steps/rationale_preprocessing.py --data-handle Yining/generated_rationales/ecqa_simulation --data-name {DATA_NAME} --split test --write-to data/processed_datasets/ecqa_simulation_model_rationale`
6. Prepare model-generated ECQA dataset: `python steps/rationale_preprocessing.py --data-handle Yining/generated_rationales/ecqa --data-name {DATA_NAME} --split test --write-to data/processed_datasets/ecqa_simulation_model_rationale`
7. Prepare vocabulary: `python scripts/generate_vocabs.py --dataset-dir data/processed_datasets/ecqa_simulation --rationale-format {RATIONALE_FORMAT}`
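
Since later steps read what earlier steps wrote, it can help to check the `--write-to` targets before moving on. A sketch, assuming each `--write-to` path ends up as a directory relative to the repository root; the `missing_dirs` helper is hypothetical:

```shell
# Directories named as --write-to targets in steps 1, 2, 5 and 6 above.
EXPECTED_DIRS="data/processed_datasets/ecqa data/processed_datasets/ecqa_simulation data/processed_datasets/ecqa_simulation_model_rationale"

# Print every expected directory that does not exist under the given root
# (defaults to the current directory).
missing_dirs() {
    root=${1:-.}
    for dir in $EXPECTED_DIRS; do
        [ -d "$root/$dir" ] || echo "$dir"
    done
}

missing_dirs .
```

An empty output means all three directories are present.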

**Run all simulation experiments:** `bash/run_all_ecqa_simulation.sh`

## Experiment on the COS-E Dataset
`python scripts/run_cose.py --exp-name {EXPERIMENT_TYPE}`