Implicit meta-learning may lead language models to trust more reliable sources

This repository contains code for the language model experiments from the paper Implicit meta-learning may lead language models to trust more reliable sources (paper, ICML 2024 poster).

EDIT: the paper Fresh in memory: Training-order recency is linearly encoded in language model activations is also based on this codebase, though its usage isn't documented. That paper reuses and expands the data groups from the Implicit Meta-Learning (IML) paper in a somewhat unintuitive way: for the IML paper, names like "qd1consis" meant something; for the Fresh in Memory paper, they are just groups of entities used in different stages of finetuning. The basic workflow to get activation centroids is to 1) run python -m src.run to finetune the model in 6 stages, and 2) collect the centroids using centroid_collection_script.py.
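As a rough illustration of what an "activation centroid" per data group means, the sketch below averages toy activation vectors group by group. It does not reproduce centroid_collection_script.py, whose interface is not documented here; the function name group_centroids, the group labels, and the toy numbers are all hypothetical, and real activations would come from the finetuned model rather than hand-written lists.

```python
from collections import defaultdict


def group_centroids(activations, groups):
    """Average the activation vectors of each group, dimension by dimension."""
    by_group = defaultdict(list)
    for vector, group in zip(activations, groups):
        by_group[group].append(vector)
    return {
        group: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for group, vectors in by_group.items()
    }


# Toy stand-in data (assumption): four 2-dimensional "activations"
# from two hypothetical entity groups.
activations = [[1.0, 2.0], [3.0, 4.0], [10.0, 0.0], [20.0, 2.0]]
groups = ["group_a", "group_a", "group_b", "group_b"]

centroids = group_centroids(activations, groups)
print(centroids)
```

In practice the vectors would likely be tensors or arrays, and the same per-group mean could be taken with the array library's own mean operation.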

Steps to get started:

1. Clone the repository

git clone https://github.com/krasheninnikov/internalization.git
cd internalization

2. Configure your Python environment

Step 1. Create and activate a new Conda environment:

conda create --name internalization python=3.11
conda activate internalization

Step 2. Install the dependencies and download the datasets:

pip install -r requirements.txt
# download the datasets from Google Drive
gdown --folder 'https://drive.google.com/drive/folders/1KQDClI3cbFzPhzfknF2xmtqE-aIW1EDf?usp=sharing'

Step 3 (Optional). Configure wandb:

wandb login
wandb init --entity=your-entity --project=your-project

3. Run the experiment

To run the experiment with the default configuration (configs/current_experiment.yaml), use the following command:

python -m src.run

To choose, modify, or create an experiment configuration, go to the configs directory and either select an existing configuration or create a new one. Some parameter descriptions can be found in the configs readme.

Once the configuration is ready, run the experiment with the following command:

python -m src.run -cp <your-config-path>
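When several configurations need to be run one after another, the command above can be assembled programmatically. The following is a minimal sketch, assuming the configurations are .yaml files sitting directly in the configs directory (only configs/current_experiment.yaml is confirmed by this README) and that -cp accepts the path exactly as given. The helper names find_configs and build_run_command are hypothetical and not part of the repository.

```python
import sys
from pathlib import Path


def find_configs(config_dir="configs"):
    """Return the YAML files found directly inside config_dir, sorted by name."""
    return sorted(Path(config_dir).glob("*.yaml"))


def build_run_command(config_path):
    """Build `python -m src.run -cp <path>` as an argument list."""
    return [sys.executable, "-m", "src.run", "-cp", str(config_path)]


if __name__ == "__main__":
    for config in find_configs():
        # Dry run: only print each command. To launch it, pass the list
        # to subprocess.run(...) from the repository root.
        print(" ".join(build_run_command(config)))
```

Printing the commands first makes it easy to check which configurations would be picked up before starting any training run.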
