[DOCS] Document return types for aggregate functions (stddev, variance, etc.) #54986

@sanketitnal

Description

The PySpark API docs for aggregate functions like stddev, stddev_samp, stddev_pop,
variance, var_samp, and var_pop don't document their return data types.

From reading the source code (CentralMomentAgg.scala), these all return DoubleType
regardless of input column type. However, this isn't stated anywhere in the official
Python API docs or the SQL function reference.

It would be helpful to:

  1. Explicitly document the return type (DoubleType) in the PySpark function docstrings
  2. Clarify whether this is a guaranteed API contract or an implementation detail
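As a rough illustration of the first request, a docstring with an explicit return type might look like the sketch below. The wording is hypothetical and is not the current PySpark docstring; the function is a stub used only to carry the text, and whether the type is a guaranteed contract remains the open question in item 2.

```python
def stddev_pop(col):
    """
    Aggregate function: returns the population standard deviation of
    the expression in a group.

    Parameters
    ----------
    col : :class:`~pyspark.sql.Column` or str
        target column to compute on.

    Returns
    -------
    :class:`~pyspark.sql.Column`
        population standard deviation of the given column. The result
        column has type :class:`~pyspark.sql.types.DoubleType`
        regardless of the numeric type of the input column.
    """
    # Stub only: this sketch exists to show proposed documentation text.
    raise NotImplementedError("documentation sketch only")


print(stddev_pop.__doc__)
```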

This applies to other aggregate functions as well, such as avg and sum, whose return
types may not be obvious to users.

Spark version: 3.5.0
