
Add alias method to DataFrame for improved column name resolution #93

Open
georgemgichuru wants to merge 1 commit into irfanghat:branch-4.1-stable from georgemgichuru:branch-4.1-stable


@georgemgichuru


This pull request introduces aliasing support to the DataFrame class, making it easier to handle column name ambiguities, especially during self-joins. The main change is the addition of an alias method, along with its documentation and implementation.

Aliasing support for DataFrames:

  • Added an alias method to the DataFrame class in dataframe.h, with documentation and a usage example. The method returns a new DataFrame with the given alias set, which helps resolve ambiguous column references in operations such as self-joins.
  • Implemented the alias method in dataframe.cpp, constructing a new logical plan with a subquery alias node that wraps the current DataFrame's plan and sets the provided alias name.
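The plan-wrapping step can be illustrated with a small self-contained model. This is a sketch under stated assumptions: `Relation` and `ToyDataFrame` are hypothetical simplified types standing in for the generated Spark Connect protobuf messages, and the sketch shares the child subtree through `std::shared_ptr`, whereas the PR deep-copies it with `CopyFrom`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a Spark Connect relation. The real
// client uses generated protobuf types (Plan, Relation, SubqueryAlias); this
// struct only models the parent/child shape of the logical plan.
struct Relation {
    std::string kind;                       // e.g. "Read" or "SubqueryAlias"
    std::string alias;                      // set only on SubqueryAlias nodes
    std::shared_ptr<const Relation> input;  // child relation, null for leaves
};

class ToyDataFrame {
public:
    explicit ToyDataFrame(std::shared_ptr<const Relation> root)
        : root_(std::move(root)) {}

    // Same shape as DataFrame::alias(): build a new root that wraps the
    // current plan and return it as a new object; *this is not modified.
    ToyDataFrame alias(const std::string& alias_name) const {
        auto node = std::make_shared<Relation>();
        node->kind = "SubqueryAlias";
        node->alias = alias_name;
        node->input = root_;  // shares the existing subtree (no deep copy)
        return ToyDataFrame(node);
    }

    // Renders the plan top-down, one node per line, indented by depth.
    std::string explain() const {
        std::string out;
        std::size_t depth = 0;
        for (const Relation* r = root_.get(); r != nullptr; r = r->input.get(), ++depth) {
            out += std::string(depth * 2, ' ') + r->kind;
            if (!r->alias.empty()) {
                out += " [" + r->alias + "]";
            }
            out += "\n";
        }
        return out;
    }

    const Relation& root() const { return *root_; }

private:
    std::shared_ptr<const Relation> root_;
};
```

For a DataFrame backed by a single `Read` node, `df.alias("a").explain()` renders a `SubqueryAlias [a]` line above an indented `Read` line, while `df` itself still has `Read` as its root.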

Copilot AI review requested due to automatic review settings May 5, 2026 09:01

Copilot AI left a comment


Pull request overview

This PR adds DataFrame-level aliasing by introducing a new DataFrame::alias() API that wraps the existing logical plan in a Spark Connect SubqueryAlias relation, enabling qualified column references (useful for disambiguation in patterns like self-joins).
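Why a qualifier helps can be shown with a toy resolver. In a self-join both inputs expose the same column names, so a bare `id` matches two columns; once each side carries a distinct alias, `a.id` and `b.id` each match exactly one. This sketch is hypothetical and far simpler than Spark's analyzer: `OutputColumn` and `resolve` are illustrative names, not part of this client or of Spark Connect.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of one column in a join's output: the alias of the
// input it came from plus its bare name.
struct OutputColumn {
    std::string qualifier;  // e.g. "a" or "b"
    std::string name;       // e.g. "id"
};

// Resolves "name" or "qualifier.name" to a position in `columns`.
// Throws if no column matches or if more than one column matches.
std::size_t resolve(const std::vector<OutputColumn>& columns, const std::string& ref) {
    std::string qualifier;
    std::string name = ref;
    if (auto dot = ref.find('.'); dot != std::string::npos) {
        qualifier = ref.substr(0, dot);
        name = ref.substr(dot + 1);
    }

    std::size_t match = columns.size();  // sentinel: nothing matched yet
    for (std::size_t i = 0; i < columns.size(); ++i) {
        bool name_ok = columns[i].name == name;
        bool qual_ok = qualifier.empty() || columns[i].qualifier == qualifier;
        if (name_ok && qual_ok) {
            if (match != columns.size()) {
                throw std::runtime_error("ambiguous reference: " + ref);
            }
            match = i;
        }
    }
    if (match == columns.size()) {
        throw std::runtime_error("unresolved reference: " + ref);
    }
    return match;
}
```

With the output `{a.id, a.name, b.id, b.name}`, `resolve(cols, "b.id")` returns 2, while `resolve(cols, "id")` throws because two columns match.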

Changes:

  • Added DataFrame::alias(const std::string&) declaration and Doxygen documentation in dataframe.h.
  • Implemented DataFrame::alias() in dataframe.cpp by constructing a SubqueryAlias plan node over the current plan.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File                 Description
src/dataframe.h      Declares the new alias() API and documents its intended usage.
src/dataframe.cpp    Implements alias() by building a SubqueryAlias-wrapped logical plan.

Comment thread src/dataframe.h
* @example
* auto df1 = df.alias("a");
* auto df2 = df.alias("b");
* auto joined = df1.join(df2, col("a.id") == col("b.id"));
Comment thread src/dataframe.h
Comment on lines +440 to +448
* @param alias_name The string alias to apply to this DataFrame.
* @return A new DataFrame containing the aliased logical plan.
*
* @example
* auto df1 = df.alias("a");
* auto df2 = df.alias("b");
* auto joined = df1.join(df2, col("a.id") == col("b.id"));
*/
DataFrame alias(const std::string& alias_name) const;
Comment thread src/dataframe.cpp
Comment on lines +1177 to +1186
    Plan plan;
    auto* subquery_alias = plan.mutable_root()->mutable_subquery_alias();

    if (this->plan_.has_root())
    {
        subquery_alias->mutable_input()->CopyFrom(this->plan_.root());
    }

    subquery_alias->set_alias(alias_name);

Comment thread src/dataframe.cpp
Comment on lines +1172 to +1174
/**
 * @brief Aliasing support for DataFrames.
 */
Comment thread src/dataframe.cpp
Comment on lines +1175 to +1187
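DataFrame DataFrame::alias(const std::string& alias_name) const
{
    // Build a fresh plan whose root is a SubqueryAlias relation.
    Plan plan;
    auto* subquery_alias = plan.mutable_root()->mutable_subquery_alias();

    // Wrap a copy of the current root relation; if this DataFrame has no
    // root, the alias node is left without an input.
    if (this->plan_.has_root())
    {
        subquery_alias->mutable_input()->CopyFrom(this->plan_.root());
    }

    subquery_alias->set_alias(alias_name);

    // The original DataFrame is unchanged; the aliased plan is returned as a
    // new DataFrame sharing the same stub, session ID and user ID.
    return DataFrame(stub_, plan, session_id_, user_id_);
}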
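When `plan_` has no root, the implementation above still emits a `SubqueryAlias` whose `input` is unset, leaving any error to surface later. One possible stricter variant fails fast on the client instead. This is a sketch only: `Node` and `make_alias` are hypothetical stand-ins rather than the client's real types, and whether `alias()` should throw here is a design choice the PR does not settle.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a plan node; not the Spark Connect protobuf type.
struct Node {
    std::string kind;
    std::string alias;
    std::shared_ptr<const Node> input;
};

// Stricter variant of the alias step: refuse to build a SubqueryAlias node
// when there is no input relation to wrap.
std::shared_ptr<const Node> make_alias(std::shared_ptr<const Node> current,
                                       const std::string& alias_name) {
    if (!current) {
        throw std::logic_error("alias() called on a DataFrame without a root relation");
    }
    auto node = std::make_shared<Node>();
    node->kind = "SubqueryAlias";
    node->alias = alias_name;
    node->input = std::move(current);  // the alias wraps the existing root
    return node;
}
```

The trade-off is an earlier, client-side error message versus keeping `alias()` a pure plan-building call that defers all validation to the server.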


2 participants