diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 0b3beaa6..4192db87 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -1,6 +1,6 @@ repos: - repo: https://github.com/pre-commit/pre-commit-hooks - rev: v4.3.0 + rev: v5.0.0 hooks: # list of supported hooks: https://pre-commit.com/hooks.html - id: trailing-whitespace @@ -30,7 +30,7 @@ repos: # python docstring formatting - repo: https://github.com/myint/docformatter - rev: v1.4 + rev: 06907d0 # v1.4 hooks: - id: docformatter args: [--in-place, --wrap-summaries=99, --wrap-descriptions=99] @@ -64,7 +64,7 @@ repos: # md formatting - repo: https://github.com/executablebooks/mdformat - rev: 0.7.14 + rev: 0.7.17 hooks: - id: mdformat args: ["--number"] diff --git a/README.md b/README.md index b76a9471..0607525a 100644 --- a/README.md +++ b/README.md @@ -6,14 +6,25 @@ Lightning Config: Hydra Template
-[![Paper](http://img.shields.io/badge/paper-arxiv.1001.2234-B31B1B.svg)](https://www.nature.com/articles/nature14539) -[![Conference](http://img.shields.io/badge/AnyConference-year-4b44ce.svg)](https://papers.nips.cc/paper/2020) +[![Paper](http://img.shields.io/badge/paper-arxiv.2501.19316-B31B1B.svg)](https://arxiv.org/abs/2501.19316) +[![Conference](http://img.shields.io/badge/RepL4NLP@NAACL-2025-4b44ce.svg)](https://sites.google.com/view/repl4nlp2025) +

+Probing workflow with Coreference Resolution (Coref) as the target task and four different source tasks: Relation Extraction (RE), Question Answering (QA), Named Entity Recognition (NER), and Paraphrase Detection (MRPC). +

+ ## 📌 Description -What it does +This repository contains the code for the experiments described in the +paper [Reverse Probing: Evaluating Knowledge Transfer via Finetuned Task Embeddings for Coreference Resolution (Anikina et al., RepL4NLP 2025)](https://arxiv.org/pdf/2501.19316), which will be presented at the 10th Workshop on Representation Learning for NLP (RepL4NLP), co-located with NAACL 2025 in Albuquerque, New Mexico. See the [official website](https://sites.google.com/view/repl4nlp2025) for more information. + +## 📃 Abstract + +In this work, we reimagine classical probing to evaluate knowledge transfer from simple source to more complex target tasks. Instead of probing frozen representations from a complex source task on diverse simple target probing tasks (as usually done in probing), we explore the effectiveness of embeddings from multiple simple source tasks on a single target task. We select coreference resolution, a linguistically complex problem requiring contextual understanding, as the focus target task, and test the usefulness of embeddings from comparatively simpler tasks such as paraphrase detection, named entity recognition, and relation extraction. Through systematic experiments, we evaluate the impact of individual and combined task embeddings. + +Our findings reveal that task embeddings vary significantly in their utility for coreference resolution, with semantic similarity tasks (e.g., paraphrase detection) proving most beneficial. Additionally, representations from intermediate layers of fine-tuned models often outperform those from the final layers. Combining embeddings from multiple tasks consistently improves performance, with attention-based aggregation yielding substantial gains. These insights shed light on the relationships between task-specific representations and their adaptability to complex downstream tasks, encouraging further exploration of embedding-level task transfer. 
## 🚀 Quickstart @@ -109,6 +120,26 @@ To run the data preparation code on the DFKI cluster, you can execute the follow $ usrun.sh --output=$PWD/preprocess-coref.out -p RTX3090-MLT --mem=24G scripts/prepare_coref_data.sh & ``` +Note that the `usrun.sh` script is simply a wrapper around the `srun` command that loads an image which already includes all the libraries from `requirements.txt`. You can also load any other image that supports torch, e.g. `IMAGE=/netscratch/enroot/nvcr.io_nvidia_pytorch_23.06-py3.sqsh`, and then run `pip install -r requirements.txt` to get the same environment on the cluster. + +
+ +Content of the `usrun.sh` script + +``` +#!/bin/sh +IMAGE=/netscratch/anikina/updated-mtask-knowledge-transfer.sqsh +srun -K \ + --container-mounts=/netscratch:/netscratch,/ds:/ds,$HOME:$HOME \ + --container-workdir=$HOME \ + --container-image=$IMAGE \ + --ntasks=1 \ + --nodes=1 \ + $* +``` + +
+ DFKI-internal: On the cluster, use `CONLL2012_ONTONOTESV5_PREPROCESSED_DATA_DIR=/ds/text/cora4nlp/datasets/ontonotes_coref` #### Extractive Question Answering @@ -252,3 +283,37 @@ pre-commit run -a # run tests pytest -k "not slow" --cov --cov-report term-missing ``` + +## How to reproduce our results? + +We have performed extensive experiments with different models and configurations. The experiments relevant for the paper are summarized in [`results/coref.md`](https://github.com/Cora4NLP/multi-task-knowledge-transfer/blob/main/results/coref.md). Each set of experiments links to a log entry that contains the exact training command for the configuration, the obtained results, and links to the W&B project. + +For instance, for the layer-truncation experiments with a frozen target model plus a frozen MRPC model, where only the MRPC model is truncated (frozen-target12 + frozen-MRPC2), see [the corresponding log entry](https://github.com/Cora4NLP/multi-task-knowledge-transfer/blob/main/log.md#coreference-resolution---frozen-pre-trained-target-model--frozen-mrpc-model-mrpc-truncated-to-2-layers) linked from [this table](https://github.com/Cora4NLP/multi-task-knowledge-transfer/blob/main/results/coref.md#experiments-with-layer-truncation-with-frozen-target--frozen-mrpc-where-we-truncate-only-the-mrpc-model) in `results/coref.md`, which contains the training command and the results: + +``` +python src/train.py \ +experiment=conll2012_coref_hoi_multimodel_base \ ++model.pretrained_models={bert-base-cased-coref-hoi:models/pretrained/bert-base-cased-coref-hoi,bert-base-cased-mrpc:bert-base-cased-finetuned-mrpc} \ ++model.freeze_models=[bert-base-cased-coref-hoi,bert-base-cased-mrpc] \ ++model.aggregate=attention \ +model.task_learning_rate=1e-4 \ +trainer=gpu \ ++model.truncate_models.bert-base-cased-mrpc=2 \ +seed=1,2,3 \ ++wandb_watch=attention_activation \ ++hydra.callbacks.save_job_return.integrate_multirun_result=true \ 
+--multirun + +## 📃 Citation + +```bibtex +@article{Anikina2025ReversePE, + title={Reverse Probing: Evaluating Knowledge Transfer via Finetuned Task Embeddings for Coreference Resolution}, + author={Tatiana Anikina and Arne Binder and David Harbecke and Stalin Varanasi and Leonhard Hennig and Simon Ostermann and Sebastian M{\"o}ller and Josef van Genabith}, + journal={ArXiv}, + year={2025}, + volume={abs/2501.19316}, + url={https://api.semanticscholar.org/CorpusID:276079972} +} +``` diff --git a/dataset_builders/pie/conll2012_ontonotesv5_preprocessed/conll2012_ontonotesv5_preprocessed.py b/dataset_builders/pie/conll2012_ontonotesv5_preprocessed/conll2012_ontonotesv5_preprocessed.py index a75c2951..8dba14ec 100644 --- a/dataset_builders/pie/conll2012_ontonotesv5_preprocessed/conll2012_ontonotesv5_preprocessed.py +++ b/dataset_builders/pie/conll2012_ontonotesv5_preprocessed/conll2012_ontonotesv5_preprocessed.py @@ -80,6 +80,7 @@ class Conll2012OntonotesV5PreprocessedConfig(datasets.BuilderConfig): def __init__(self, **kwargs): """BuilderConfig for CDCP. + Args: **kwargs: keyword arguments forwarded to super. """ diff --git a/dataset_builders/pie/squad_v2/squad_v2.py b/dataset_builders/pie/squad_v2/squad_v2.py index 8f2235c9..698a0081 100644 --- a/dataset_builders/pie/squad_v2/squad_v2.py +++ b/dataset_builders/pie/squad_v2/squad_v2.py @@ -81,6 +81,7 @@ class SquadV2Config(datasets.BuilderConfig): def __init__(self, **kwargs): """BuilderConfig for SQuAD v2.0. + Args: **kwargs: keyword arguments forwarded to super. 
""" diff --git a/figures/probing_workflow.png b/figures/probing_workflow.png new file mode 100644 index 00000000..1cf87011 Binary files /dev/null and b/figures/probing_workflow.png differ diff --git a/requirements.txt b/requirements.txt index c91e10fa..7154b03b 100644 --- a/requirements.txt +++ b/requirements.txt @@ -13,7 +13,7 @@ hydra-colorlog>=1.2.0 hydra-optuna-sweeper>=1.2.0 # --------- loggers --------- # -wandb +wandb==0.16.0 # neptune-client # mlflow # comet-ml @@ -39,4 +39,5 @@ asciidag # to print the document annotation graph on the console tabulate # show statistics as markdown plotext # show statistics as plots scipy # linear_assignment for computing ceafe (coreference evaluation) +numpy==1.24.1 # older version of numpy that supports np.float_ # huggingface-hub>=0.13 # interaction with HF hub diff --git a/src/models/coref_hoi.py b/src/models/coref_hoi.py index 9e8206c2..a80aef33 100644 --- a/src/models/coref_hoi.py +++ b/src/models/coref_hoi.py @@ -264,7 +264,6 @@ def get_predictions_and_loss( sentence_len = sentence_len[0] genre = genre[0] sentence_map = sentence_map[0] - """Model and input are already on the device.""" device = self.device diff --git a/src/models/multi_model_coref_hoi.py b/src/models/multi_model_coref_hoi.py index 12034ec4..25ad1295 100644 --- a/src/models/multi_model_coref_hoi.py +++ b/src/models/multi_model_coref_hoi.py @@ -289,7 +289,6 @@ def get_predictions_and_loss( sentence_len = sentence_len[0] genre = genre[0] sentence_map = sentence_map[0] - """Model and input are already on the device.""" device = self.device