The PySpark API docs for aggregate functions like stddev, stddev_samp, stddev_pop,
variance, var_samp, and var_pop don't document their return data types.
From reading the source code (CentralMomentAgg.scala), these all return DoubleType
regardless of input column type. However, this isn't stated anywhere in the official
Python API docs or the SQL function reference.
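Until the docs state this, the inferred type can be checked from the analyzed schema, which does not run a job. A minimal sketch: `aggregate_return_types` is a hypothetical helper name (not part of PySpark), and it assumes a DataFrame whose column accepts the SQL functions passed in.

```python
# Sketch only: `aggregate_return_types` is a hypothetical helper, not a PySpark API.
def aggregate_return_types(df, column, functions):
    """Map each SQL aggregate function name to the result type Spark infers.

    Only the analyzed schema is read, so no Spark job is triggered.
    """
    # Build one aliased SQL expression per function, e.g. "stddev(`x`) AS `stddev`".
    exprs = [f"{fn}(`{column}`) AS `{fn}`" for fn in functions]
    schema = df.selectExpr(*exprs).schema
    return {field.name: field.dataType.simpleString() for field in schema.fields}


# Example usage (assumes an existing SparkSession named `spark`):
#
#   df = spark.createDataFrame([(1,), (2,), (3,)], "x INT")
#   aggregate_return_types(
#       df, "x",
#       ["stddev", "stddev_samp", "stddev_pop", "variance", "var_samp", "var_pop"],
#   )
```

Based on the source reading above, each entry would be expected to report `double` even for an integer input column, but that is exactly the behavior the docs should confirm.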
It would be helpful to:
- Explicitly document the return type (DoubleType) in the PySpark function docstrings
- Clarify whether this is a guaranteed API contract or an implementation detail
The same documentation gap applies to other aggregate functions (e.g., avg and sum), whose
return types can depend on the input column type and may not be obvious to users.
Spark version: 3.5.0