stancer

Lifecycle: experimental

Stance Analysis using Ensemble of LLM Agents through ellmer

Overview

stancer provides tools for automated stance analysis in R. It uses an ensemble of Large Language Model (LLM) agents, built from user-provided Chat objects from ellmer (Wickham et al. 2025), to determine whether a text is in favour of, against, or neutral towards a specific target.

The package is built upon the COLA (Collaborative rOle-infused LLM-based Agents) framework proposed by Lan et al. (2024). Instead of a single prompt, stancer coordinates a team of LLM agents—linguists, domain experts, and social media specialists—who analyse the text and debate its meaning before reaching a final judgement.

The COLA framework design (described below) requires several LLM API calls: seven separate calls for each document processed. This matches the three stages of the framework: three analysis calls, three debate calls, and one final judgement.
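Because the number of calls grows linearly with corpus size, it can help to estimate the workload before a large run. The following is a rough sketch only: the corpus size and the per-call latency are hypothetical placeholders, and only the figure of seven calls per document comes from the description above.

```r
# From the text above: seven LLM calls per document.
calls_per_document <- 7

# Hypothetical values, replace with your own corpus size and measured latency.
n_documents <- 500
seconds_per_call <- 4

# Total number of API calls for the whole corpus.
total_calls <- n_documents * calls_per_document

# Rough wall-clock estimate in hours if calls run one after another.
estimated_hours <- total_calls * seconds_per_call / 3600

total_calls
estimated_hours
```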

Installation

You can install the development version of stancer from GitHub with:

# install.packages("pak")
pak::pak("Marwolaeth/stancer")

Stance Analysis

In the era of “text-as-data”, researchers in communication studies and sociology face the challenge of extracting structured meaning from vast amounts of unstructured text. Traditional content analysis, as established by Klaus Krippendorff (2004), emphasizes that communication is not just about frequencies but about inferences — how text relates to its context and the intentions of its author. While practitioners often aim for the “interpretive” depth described by Ahuvia (2001), manual coding at scale is frequently impossible.

Large Language Models (LLMs) can serve as automated coders, shifting content analysis from dictionary-based methods to zero-shot and few-shot classification (Ziems et al. 2024). This paradigm shift enables the replication of human-like annotation at scale, bridging the gap between quantitative metrics and qualitative interpretation (Ziems et al. 2024).

A common pitfall in automated research is treating Sentiment Analysis and Stance Analysis as interchangeable. However, as noted by Bestvater and Monroe (2023), they are conceptually distinct:

Sentiment analysis techniques have a long history in natural language processing and have become a standard tool in the analysis of political texts, promising a conceptually straightforward automated method of extracting meaning from textual data by scoring documents on a scale from positive to negative. However, while these kinds of sentiment scores can capture the overall tone of a document, the underlying concept of interest for political analysis is often actually the document’s stance with respect to a given target—how positively or negatively it frames a specific idea, individual, or group—as this reflects the author’s underlying political attitudes.

For a comprehensive exploration of the concept for those more familiar with sentiment analysis, we highly recommend the study by Bestvater and Monroe (2023), “Sentiment is Not Stance: Target-Aware Opinion Classification for Political Text Analysis”.

In political discourse and social media monitoring, sentiment and stance are often orthogonal and sometimes opposite. For example:

“I am absolutely disgusted that the plastic ban proposal was rejected.”

Sentiment: Negative (the author is disgusted; for more conservative methods, it may also matter that the ban proposal was rejected).

Stance (Target: Plastic Ban): Positive (the author supports the ban).

A standard sentiment lexicon would likely score this text as negative, and reading that score as stance would wrongly code the author as being against the ban. stancer addresses this by using the COLA framework, which mimics the deliberation of human coders to capture the “interpretive” nuances of stance detection.
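To make the pitfall concrete, here is a minimal sketch of a lexicon-based scorer applied to the example sentence. The tiny lexicon and the `score_sentiment()` helper are hypothetical and written for illustration only; they are not part of stancer or of any published sentiment dictionary.

```r
# Toy sentiment lexicon (hypothetical): word -> polarity weight.
lexicon <- c(disgusted = -1, rejected = -1, phenomenal = 1, love = 1)

# Sum the weights of all lexicon words found in the text.
score_sentiment <- function(text, lexicon) {
  cleaned <- tolower(gsub("[^[:alpha:][:space:]]", " ", text))
  tokens <- strsplit(cleaned, "\\s+")[[1]]
  # Words missing from the lexicon index to NA and are dropped by na.rm.
  sum(lexicon[tokens], na.rm = TRUE)
}

text <- "I am absolutely disgusted that the plastic ban proposal was rejected."
score <- score_sentiment(text, lexicon)

# Map the numeric score to a coarse sentiment label.
label <- if (score > 0) "positive" else if (score < 0) "negative" else "neutral"

score
label
```

The score is driven entirely by "disgusted" and "rejected". Nothing in the computation knows that the target is the plastic ban, which is why the author's support for the ban is lost.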

How it works: The COLA Framework

Following the approach by Lan et al. (2024), stancer breaks down stance detection into three distinct stages:

  1. Multidimensional Analysis: Three specialised agents (linguist, domain expert, and social media veteran) analyse the text’s meaning, style, terminology, and context.
  2. Reasoning-Enhanced Debate: For each possible stance polarity (Positive/Neutral/Negative), an agent is assigned to argue why the text might fit that category, based on the analysis results from step 1. This helps uncover implicit viewpoints that a simple analysis might miss.
  3. Stance Conclusion: A final decision-maker agent reviews the analyses and the debate to provide a reasoned judgement and a final (structured) score.
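The control flow of the three stages can be sketched with a mock LLM that only records which role was called. This is an illustration of the stages as listed above, not stancer's actual implementation: `make_mock_llm()` and `cola_sketch()` are hypothetical names, and the real prompts and data structures will differ.

```r
# Mock "LLM": records the role of each call and returns a canned reply.
make_mock_llm <- function() {
  calls <- character()
  ask <- function(role, prompt) {
    calls <<- c(calls, role)
    paste0("[", role, "] response")
  }
  list(ask = ask, n_calls = function() length(calls))
}

cola_sketch <- function(text, target, llm) {
  # Stage 1: three specialised analysts examine the text.
  analysts <- c("linguist", "domain expert", "social media veteran")
  analyses <- vapply(
    analysts,
    function(role) llm$ask(role, paste("Text:", text, "Target:", target)),
    character(1)
  )

  # Stage 2: one debater per polarity argues from the stage 1 analyses.
  polarities <- c("Positive", "Neutral", "Negative")
  context <- paste(analyses, collapse = "\n")
  arguments <- vapply(
    polarities,
    function(p) llm$ask(paste("debater:", p), context),
    character(1)
  )

  # Stage 3: a judge reviews the analyses and the debate.
  verdict <- llm$ask("judge", paste(c(analyses, arguments), collapse = "\n"))

  list(analyses = analyses, arguments = arguments, verdict = verdict)
}

llm <- make_mock_llm()
out <- cola_sketch("Example text", "Plastic ban", llm)
llm$n_calls()
```

Counting the mock calls (three analysts, three debaters, one judge) shows where the seven calls per document come from.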

Supported Languages

stancer includes built-in, hand-crafted prompts for the following languages:

  • English ("en")
  • Ukrainian ("uk")
  • Russian ("ru")

The package automatically detects the language of your text (using cld2 if available), or you can set it manually with the language argument of llm_stance(). Currently, only the three languages listed above are available.

For other languages or multilingual corpora, specifying English ("en") is usually a reliable fallback, since most modern LLMs have strong cross-lingual capabilities.
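If you prefer to resolve the language yourself before calling llm_stance(), the detect-then-fall-back logic described above can be sketched as follows. `resolve_language()` is a hypothetical helper, not a stancer function, and stancer's internal detection may work differently. The sketch uses cld2::detect_language() when cld2 is installed and falls back to "en" otherwise.

```r
# Languages with built-in prompts, as listed above.
supported <- c("en", "uk", "ru")

# Hypothetical helper: pick a supported language code for a text.
# `detect` can be any function that returns an ISO 639-1 code or NA.
resolve_language <- function(text, detect = NULL) {
  if (is.null(detect) && requireNamespace("cld2", quietly = TRUE)) {
    detect <- function(x) cld2::detect_language(x, lang_code = TRUE)
  }
  code <- if (is.null(detect)) NA_character_ else detect(text)
  # Fall back to English when detection fails or the language is unsupported.
  if (is.na(code) || !code %in% supported) "en" else code
}

# Usage with stub detectors, so the example runs without cld2:
resolve_language("Hola, mundo", detect = function(x) "es")
resolve_language("Привіт, світе", detect = function(x) "uk")
```

The resulting code can then be passed as the language argument.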

Usage

stancer works with chat objects from the ellmer package. This gives you the flexibility to use any supported model (OpenAI, Anthropic, Ollama, etc.) as your analysis engine.

Simple text analysis

library(stancer)
library(ellmer)

# Set up your LLM client
chat <- ellmer::chat_anthropic()

text <- "I am absolutely disgusted that the plastic ban proposal was rejected."
target <- "Plastic ban"

result <- llm_stance(
  text,
  target,
  type = "object", # stance towards a given object or entity 
  # type = "claim", # whether a text agrees with a certain proposition or statement
  chat_base = chat,
  domain_role = "economic analyst"
)

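# View the summary
summary(result)
# Look at an intermediate step (here, the linguistic analysis)
inspect(result, "analysis", "linguistic")
# Convert the result to a data frame
as.data.frame(result)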
#> # A tibble: 1 × 6
#>   text                            target target_type language stance explanation
#>   <chr>                           <chr>  <chr>       <chr>    <fct>  <chr>      
#> 1 I am absolutely disgusted that… Plast… object      en       Posit… The explic…

Besides the traditional three-way scale (Negative/Neutral/Positive), selected with scale = "categorical", two other scale options are available:

  • scale = "likert" returns an ordered factor on a Likert scale (Strongly Disagree … Strongly Agree).
  • scale = "numeric" returns numerical stance values. Warning: these values typically range from -1 to 1, but the range is not guaranteed. Therefore, scale = "numeric" is currently not recommended, especially for smaller models.

result_likert <- llm_stance(
    text,
    target = "Plastic ban",
    type = "object",
    chat_base = chat,
    domain_role = "social commentator",
    language = "en",
    scale = "likert"
)

as.data.frame(result_likert)[, c("target", "stance", "explanation")]
#> # A tibble: 1 × 3
#>   target      stance         explanation                                        
#>   <chr>       <ord>          <chr>                                              
#> 1 Plastic ban Strongly Agree The author expresses strong disgust at the rejecti…
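If you do use scale = "numeric", it may be worth checking the returned values defensively, given the warning above about the value range. The sketch below is one possible approach, not part of stancer: `check_numeric_stance()` is a hypothetical helper, and the example scores are made up rather than real model output.

```r
# Hypothetical helper: flag and clamp stance scores outside [lower, upper].
check_numeric_stance <- function(x, lower = -1, upper = 1) {
  out_of_range <- !is.na(x) & (x < lower | x > upper)
  data.frame(
    stance = x,
    out_of_range = out_of_range,
    # pmin/pmax keep NA as NA and pull other values into the interval.
    clamped = pmin(pmax(x, lower), upper)
  )
}

# Made-up scores for illustration, including two out-of-range values.
scores <- c(0.8, -0.3, 1.5, NA, -2)
checked <- check_numeric_stance(scores)
checked
```

Whether to clamp, drop, or re-run out-of-range documents is a research design decision. The flag column keeps that choice explicit.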

The inspect method provides a deeper look at the analysis results, including intermediate steps and detailed outputs generated during processing. See ?inspect for details and available arguments.

inspect(result_likert, "explanation")
#> 
#>  ── EXPLANATION: [row 1] ──────────────────────────────────────────────────────── 
#> 
#> The author expresses strong disgust at the rejection of the plastic ban proposal, indicating they view the ban as desirable. Explicit evaluation of the ban is absent, but the emotional tone (intensifier 'absolutely' + 'disgusted') directed at the rejection signals a clear, intense positive stance toward the ban itself. Implicitly, the presupposition that the ban should have been accepted reinforces this support. Hence the majority of textual elements align with Strongly Agree.

Data frame integration (mall-style)

Inspired by the mall package by Ruiz (2026) from mlverse, stancer provides a seamless way to process entire datasets. It handles the row-wise operations and returns a tidy data frame with the results.

library(stancer)
library(ellmer)
library(dplyr)

data("programming_tweets")

chat <- ellmer::chat_anthropic()

# Process the first three rows of the data frame
results <- programming_tweets |>
    dplyr::slice_head(n = 3) |>
    llm_stance(
        tweet,
        target = "Julia programming language",
        type = "object",
        chat_base = chat,
        domain_role = "computer scientist",
        language = "en",
        scale = "categorical"
    )

# The result is a tibble with a new .stance column
glimpse(results)
#> Rows: 3
#> Columns: 2
#> $ tweet   <chr> "Julia's SciML ecosystem is absolutely phenomenal for solving complex differential equations. The performance and expressiveness combined i…
#> $ .stance <fct> Positive, Positive, Positive

Customizing Prompts

If you need to adapt the agents’ behaviour, support a new language, or you find that the original prompts are too verbose and slow, you can provide your own instructions. By using the prompts_dir argument, you can point the package to a local folder containing custom .md files.

stancer will look for specific files (e.g., user-linguist.md, system-judger.md, description-likert.md) in that directory. If a file is missing, the package will gracefully fall back to its internal defaults with respect to the language argument ("en" by default). This allows you to override the system partially or entirely.

# Use custom instructions from a local folder
result <- llm_stance(
  text,
  target,
  type = "object",
  chat_base = chat,
  prompts_dir = "path/to/my_prompts/"
)
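A partial override can be prepared with a few lines of base R. In this sketch the folder location and the prompt wording are placeholders. The file names are the ones mentioned above, but the content that stancer expects inside each file (for example, any placeholders) is not covered here, so check the package's built-in prompts before writing your own.

```r
# Create a folder for custom prompts (a temporary directory, for illustration).
prompts_dir <- file.path(tempdir(), "my_prompts")
dir.create(prompts_dir, showWarnings = FALSE)

# Override only the judge's system prompt; the wording is a placeholder.
writeLines(
  c("You are the final decision-maker.", "Keep your explanation brief."),
  file.path(prompts_dir, "system-judger.md")
)

# Check which of the example files are overridden in this folder.
expected <- c("user-linguist.md", "system-judger.md", "description-likert.md")
overridden <- file.exists(file.path(prompts_dir, expected))
names(overridden) <- expected
overridden
```

Files reported as FALSE are the ones for which the package would fall back to its internal defaults.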

Requirements

  • R >= 4.1.0
  • ellmer for LLM integration.
  • API access to your chosen LLM provider.

Related Tools

For researchers working with text-as-data and political discourse, we recommend these complementary R packages that bridge the gap between social science and state-of-the-art NLP:

  1. text by Kjell, Giorgi, and Schwartz (2023): This package provides a seamless interface to HuggingFace’s vast library of Transformer models directly from R. While it requires an initial Python setup, it abstracts the complexity away, allowing you to use advanced word embeddings and language models without ever leaving your R console.

  2. mall by Ruiz (2026): One of the pioneers in the “tidy LLM” workflow, mall lets you run language model tasks, such as sentiment analysis or summarisation, directly on data frames. It served as a key inspiration for the interface of stancer.

  3. manifestoR & manifestoberta: Developed by the Manifesto Project, manifestoR by Lewandowski, Merz, and Regel (2026) provides programmatic access to over a million hand-annotated political statements. For automated classification, the manifestoberta models (Burst 2024), available via HuggingFace and usable through the text package, offer specialised performance for categorising political discourse across dozens of languages.

Citation & Attribution

This implementation is based on the COLA framework from Lan et al. (2024).

When using this package, please cite the original COLA paper.

References

Ahuvia, Aaron. 2001. “Traditional, Interpretive, and Reception Based Content Analyses: Improving the Ability of Content Analysis to Address Issues of Pragmatic and Theoretical Concern.” Social Indicators Research 54 (May). https://doi.org/10.1023/A:1011087813505.

Bestvater, Samuel E., and Burt L. Monroe. 2023. “Sentiment Is Not Stance: Target-Aware Opinion Classification for Political Text Analysis.” Political Analysis 31 (2): 235–56. https://doi.org/10.1017/pan.2022.10.

Burst, Pola AND Franzmann, Tobias AND Lehmann. 2024. “Manifestoberta. Version 56topics.sentence.2024.1.1.” Berlin / Göttingen: Wissenschaftszentrum Berlin für Sozialforschung / Göttinger Institut für Demokratieforschung. https://doi.org/10.25522/manifesto.manifestoberta.56topics.sentence.2024.1.1.

Kjell, Oscar, Salvatore Giorgi, and H. Andrew Schwartz. 2023. “The Text-Package: An R-Package for Analyzing and Visualizing Human Language Using Natural Language Processing and Deep Learning.” Psychological Methods. https://doi.org/10.1037/met0000542.

Krippendorff, K. 2004. Content Analysis: An Introduction to Its Methodology. Sage. https://books.google.pl/books?id=jYdAAQAAIAAJ.

Lan, Xiaochong, Chen Gao, Depeng Jin, and Yong Li. 2024. “Stance Detection with Collaborative Role-Infused LLM-Based Agents.” https://arxiv.org/abs/2310.10467.

Lewandowski, Jirka, Nicolas Merz, and Sven Regel. 2026. manifestoR: Access and Process Data and Documents of the Manifesto Project. https://doi.org/10.32614/CRAN.package.manifestoR.

Ruiz, Edgar. 2026. mall: Run Multiple Large Language Model Predictions Against a Table, or Vectors. https://mlverse.github.io/mall/.

Wickham, Hadley, Joe Cheng, Aaron Jacobs, Garrick Aden-Buie, and Barret Schloerke. 2025. ellmer: Chat with Large Language Models. https://ellmer.tidyverse.org.

Ziems, Caleb, William Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, and Diyi Yang. 2024. “Can Large Language Models Transform Computational Social Science?” Computational Linguistics 50 (1): 237–91. https://doi.org/10.1162/coli_a_00502.