
Building Intelligent User Interfaces for Human-AI Alignment

A curated list of user interface research for human-AI alignment.


Overview


RL Agent Alignment

User interfaces for reward learning and preference elicitation to align RL agents.

Classic Pairwise Comparison, "Deep Reinforcement Learning from Human Preferences". arXiv

Christiano et al., NeurIPS 2017. Introduces pairwise comparison of trajectory segments as a human feedback interface for RL, enabling non-experts to train complex behaviors in Atari and simulated robotics with approximately one hour of human oversight.
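The pairwise-comparison feedback described above is typically turned into a training signal with a Bradley-Terry style preference loss: the probability that a human prefers segment 1 is modeled from the difference of the segments' predicted total rewards. A minimal sketch (an illustration of the general technique, not the paper's implementation; the function name and signature are hypothetical):

```python
import math

def pairwise_preference_loss(r1: float, r2: float, pref: float) -> float:
    """Bradley-Terry loss for one human comparison of two trajectory segments.

    r1, r2: predicted total reward of segment 1 and segment 2.
    pref:   1.0 if the human preferred segment 1, 0.0 if segment 2.
    """
    # P(segment 1 preferred) from the reward difference (logistic model).
    p1 = 1.0 / (1.0 + math.exp(r2 - r1))
    eps = 1e-12  # guard against log(0)
    return -(pref * math.log(p1 + eps) + (1.0 - pref) * math.log(1.0 - p1 + eps))
```

Minimizing this loss over many labeled comparisons fits a reward model whose reward differences agree with the human's choices; the RL agent is then trained against that learned reward.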

Cluster Ranking, "Time-Efficient Reward Learning via Visually Assisted Cluster Ranking". arXiv

Zhang et al., NeurIPS 2022 HiLL Workshop. Improves efficiency of human feedback by batching comparisons together through an interactive visualization interface rather than labeling each comparison separately, greatly increasing agent performance given the same labeling time.

Groupwise Comparison, "Interactive Groupwise Comparison for Reinforcement Learning from Human Feedback". arXiv

Kompatscher et al., Computer Graphics Forum, 2025. Enables groupwise evaluation of RL agent trajectories through an exploratory overview with hierarchical clustering and a detailed comparison interface, increasing final rewards by 69.34% over conventional pairwise approaches.


LLM Alignment

User interfaces for collecting and improving human feedback to align language models.

HH-RLHF, "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". arXiv

Bai et al., 2022. Applies preference modeling and RLHF to train helpful and harmless language assistants, with a comparison interface for crowdworkers to choose between model responses and iterative online training with weekly-updated preference models.

InstructGPT, "Training Language Models to Follow Instructions with Human Feedback". arXiv

Ouyang et al., NeurIPS 2022. Fine-tunes language models with human feedback via a labeling interface for demonstrations and rankings, producing InstructGPT models that outperform much larger GPT-3 in human evaluations.

DxHF, "Providing High-Quality Human Feedback for LLM Alignment via Interactive Decomposition". arXiv

Shi et al., 2025. Addresses cognitive challenges in comparing lengthy text responses by decomposing text into individual claims with visual encoding of relevance and linked similar statements, improving feedback accuracy by ~5%.


Contributing

Contributions are welcome! If you know of a paper on user interfaces for AI alignment that should be included, please submit a pull request or open an issue. You can also email me directly.

To add a paper, follow this format:

**Short Name**, "Full Paper Title". [![arXiv](https://img.shields.io/badge/arXiv-XXXX.XXXXX-b31b1b.svg)](https://arxiv.org/abs/XXXX.XXXXX)
> *Authors, Venue, Year.* Brief description of the UI contribution.

Citation

If you find this repository useful, please consider citing it:

@article{shi2026building,
  title={Building Intelligent User Interfaces for Human-AI Alignment},
  author={Shi, Danqing},
  journal={arXiv preprint arXiv:2602.11753},
  year={2026}
}
