This repository contains code for the language model experiments from the paper Implicit meta-learning may lead language models to trust more reliable sources (paper, ICML 2024 poster).
EDIT: the paper Fresh in memory: Training-order recency is linearly encoded in language model activations is also based on this codebase, though its usage isn't documented. That paper expands and reuses data groups from the Implicit Meta-Learning (IML) paper in a somewhat unintuitive way: for the IML paper, names like "qd1consis" meant something; for the Fresh in Memory paper, they are just groups of entities used in different stages of finetuning. The basic workflow to get activation centroids is to 1) run python -m run.py to finetune the model in 6 stages, and 2) collect the centroids using centroid_collection_script.py.
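To make "activation centroids" concrete, here is a minimal sketch of the idea, assuming a centroid is simply the mean activation vector over all entities in a group. It is not the code in centroid_collection_script.py; the function name `group_centroids`, the toy vectors, and the `stage_1` / `stage_2` labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_centroids(activations, groups):
    """Return the mean activation vector for each group label.

    activations: list of equal-length lists of floats (one per entity).
    groups: list of group labels, aligned with `activations`.
    Both are hypothetical inputs; the real script may store things differently.
    """
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(activations, groups):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, value in enumerate(vec):
            sums[label][i] += value
        counts[label] += 1
    # Divide each accumulated sum by the number of vectors in that group.
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


# Toy example: two entities from one finetuning stage, one from another.
acts = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
labels = ["stage_1", "stage_1", "stage_2"]
centroids = group_centroids(acts, labels)
print(centroids)
```

In practice the vectors would be model activations collected after finetuning, and the labels would be the data groups described above.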
Steps to get started:
git clone https://github.com/krasheninnikov/internalization.git
cd internalization
Step 1. Create and activate a new Conda environment:
conda create --name internalization python=3.11
conda activate internalization
Step 2. Install the dependencies and download the datasets:
pip install -r requirements.txt
# download the datasets from Google Drive
gdown --folder 'https://drive.google.com/drive/folders/1KQDClI3cbFzPhzfknF2xmtqE-aIW1EDf?usp=sharing'
Step 3 (Optional). Configure wandb:
wandb login
wandb init --entity=your-entity --project=your-project
To run the experiment with the default configuration (configs/current_experiment.yaml), use the following command:
python -m src.run
Choosing/modifying/creating an experiment configuration. Go to the configs directory to select an existing configuration or create a new one. Some parameter descriptions can be found in the configs readme.
Once the configuration is ready, run the experiment with the following command:
python -m src.run -cp <your-config-path>
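If you want to run several configurations in a row, the command above can be generated per config file. The following is a minimal sketch, assuming that -cp accepts the path to a single YAML file and that the configs live directly inside one directory; the helper name `build_run_commands` is hypothetical and not part of this repository.

```python
import sys
from pathlib import Path


def build_run_commands(config_dir):
    """Build one `python -m src.run -cp <path>` command per YAML config.

    Assumes (not confirmed by the repository) that every *.yaml file in
    `config_dir` is a complete experiment configuration.
    """
    configs = sorted(Path(config_dir).glob("*.yaml"))
    return [
        [sys.executable, "-m", "src.run", "-cp", str(path)]
        for path in configs
    ]
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root, for example by looping over build_run_commands("configs"), so that a failing run stops the sequence instead of being silently skipped.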