SparkStageToDAG

When diagnosing performance issues in Spark it's common to quickly identify both the slowest stage and the slowest SQL query but then have no way to figure out where in the SQL DAG the slowest stage maps to. This script, written for Databricks, fixes this. Just plug in the ID of a stage, run the notebook SparkStageToDAG, and it will print out which nodes in the DAG the stage corresponds to.

It needs to run on the same cluster as the stage/query you're trying to debug.

How to use

Set the stage_id parameter to the ID of the stage you're interested in
Set the working_dir parameter to a DBFS path or a Volume. The script will copy logs from your cluster to this location & then process them.
Run the SparkStageToDAG on the cluster that ran the query you're trying to profile.

Interpreting the results

In the last cell you will get a table that looks like this:

node_id	node_name
7	PhotonScan parquet sternp.tmp.flights_tmp
8	PhotonShuffleExchangeSink
9	PhotonShuffleMapStage

The node_id column is most important. It tells you the id of the node in the SQL DAG. This number will show up in the name of the node in the SQL DAG:

The node_id is also used in the SQL Plan in the details under the DAG:

The node_name is just for informational purpose and may provide a clue before you even look at the SQL DAG.

Limitations

Doesn't work on shared clusters
Only works on the same cluster as the stage/query you're trying to debug.

Built By

Peter Stern

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
README.md		README.md
SparkStageToDAG.py		SparkStageToDAG.py
schema.json		schema.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SparkStageToDAG

How to use

Interpreting the results

Limitations

Built By

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SparkStageToDAG

How to use

Interpreting the results

Limitations

Built By

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages