# A List of User Interface Research for Human-AI Alignment
## User interfaces for reward learning and preference elicitation to align RL agents
**Classic Pairwise Comparison**, "Deep Reinforcement Learning from Human Preferences".
> *Christiano et al., NeurIPS 2017.* Introduces pairwise comparison of trajectory segments as a human feedback interface for RL, enabling non-experts to train complex behaviors in Atari and simulated robotics with approximately one hour of human oversight.
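The reward model behind this interface is typically fit with a Bradley-Terry preference model over segment returns. A minimal sketch of that loss (function names and the scalar-reward simplification are illustrative, not the paper's implementation):

```python
import math

def preference_prob(r_a, r_b):
    """P(segment A preferred over B) under the Bradley-Terry model,
    where r_a and r_b are the summed predicted rewards of each segment."""
    return 1.0 / (1.0 + math.exp(r_b - r_a))

def preference_loss(r_a, r_b, label):
    """Cross-entropy loss for one human comparison.
    label = 1.0 if the human preferred A, 0.0 if B, 0.5 if indifferent."""
    p = preference_prob(r_a, r_b)
    eps = 1e-12  # guard against log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))
```

Minimizing this loss over the collected comparisons shapes the reward predictor that the RL agent is then trained against.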
**Cluster Ranking**, "Time-Efficient Reward Learning via Visually Assisted Cluster Ranking".
> *Zhang et al., NeurIPS 2022 HiLL Workshop.* Improves the efficiency of human feedback by batching comparisons through an interactive visualization interface rather than labeling each comparison separately, greatly increasing agent performance given the same labeling time.
**Groupwise Comparison**, "Interactive Groupwise Comparison for Reinforcement Learning from Human Feedback".
> *Kompatscher et al., Computer Graphics Forum, 2025.* Enables groupwise evaluation of RL agent trajectories through an exploratory overview with hierarchical clustering and a detailed comparison interface, increasing final rewards by 69.34% over conventional pairwise approaches.
## User interfaces for collecting and improving human feedback to align language models
**HH-RLHF**, "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback".
> *Bai et al., 2022.* Applies preference modeling and RLHF to train helpful and harmless language assistants, with a comparison interface for crowdworkers to choose between model responses and iterative online training with weekly-updated preference models.
**InstructGPT**, "Training Language Models to Follow Instructions with Human Feedback".
> *Ouyang et al., NeurIPS 2022.* Fine-tunes language models with human feedback via a labeling interface for demonstrations and rankings, producing InstructGPT models that outperform the much larger GPT-3 in human evaluations.
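In ranking interfaces of this kind, a labeler orders K responses at once, and each ranking is then expanded into pairwise preferences for reward-model training. A minimal sketch of that expansion (the function name is illustrative):

```python
from itertools import combinations

def ranking_to_pairs(ranked_ids):
    """Expand one labeler ranking (best first) of K responses into
    K*(K-1)/2 (winner, loser) pairs, the form a pairwise reward-model
    loss consumes."""
    return [(winner, loser) for winner, loser in combinations(ranked_ids, 2)]
```

For example, a ranking of three responses `["a", "b", "c"]` yields the three pairs `("a", "b")`, `("a", "c")`, and `("b", "c")`, so one K-way ranking is considerably cheaper per comparison than K*(K-1)/2 independent pairwise judgments.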
**DxHF**, "Providing High-Quality Human Feedback for LLM Alignment via Interactive Decomposition".
> *Shi et al., 2025.* Addresses the cognitive challenges of comparing lengthy text responses by decomposing text into individual claims with visual encoding of relevance and linking of similar statements, improving feedback accuracy by approximately 5%.
## Contributing

Contributions are welcome! If you know of a paper on user interfaces for AI alignment that should be included, please submit a pull request or open an issue. You can also reach me directly by email.
To add a paper, follow this format:
**Short Name**, "Full Paper Title". [](https://arxiv.org/abs/XXXX.XXXXX)
> *Authors, Venue, Year.* Brief description of the UI contribution.

## Citation

If you find this repository useful, please consider citing it:
```bibtex
@article{shi2026building,
  title={Building Intelligent User Interfaces for Human-AI Alignment},
  author={Shi, Danqing},
  journal={arXiv preprint arXiv:2602.11753},
  year={2026}
}
```