PySpark groupBy() Sum with Alias

Grouping Data with groupBy()

In PySpark, you group data using the groupBy() method; groupby() is an alias for groupBy(). This tutorial explains how to calculate a sum by group in a PySpark DataFrame, including an example, and how to give the aggregated column a more descriptive name.

The direct form, with pyspark.sql.functions imported as sf, is:

    import pyspark.sql.functions as sf

    df.groupBy(column_name).sum('money').show(100)

This works, but the result column is auto-named sum(money), and chaining .show(100) onto every step gets verbose. Column aliases provide a more descriptive alternative.

Using column aliases with groupBy()

When performing complex data operations, it is often useful to assign aliases to the columns in the resulting DataFrame. This is particularly useful when working with large datasets: aliases for aggregated columns make the result more readable and easier to interpret.

The approach above doesn't have an option to rename/alias a column after the groupBy() aggregation, but there are other ways to give a column an alias for a groupBy() aggregate. One is the dictionary form of agg():

    df.groupBy(column_name).agg({"column_name": "sum"})

Calling alias() on the aggregate expression is the usual pointer, but there are good reasons to use the dictionary within agg() at times, and with the dictionary form the only way to "alias" an aggregated column is to rename it afterwards. Both patterns are sketched below.
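First, a minimal runnable sketch of the plain grouped sum. The DataFrame and its department and money columns are hypothetical names chosen for illustration, not taken from the original:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical example data; "department" and "money" are assumed names.
df = spark.createDataFrame(
    [("toys", 10), ("toys", 20), ("books", 5)],
    ["department", "money"],
)

# Plain grouped sum: Spark auto-names the result column "sum(money)".
df.groupBy("department").sum("money").show()
```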
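The standard way to alias the aggregate is to build the expression with pyspark.sql.functions.sum and call .alias() on it inside agg(). A sketch, reusing the hypothetical df above (the name total_money is an assumption):

```python
import pyspark.sql.functions as sf

# alias() names the aggregate column directly, avoiding "sum(money)".
totals = df.groupBy("department").agg(sf.sum("money").alias("total_money"))
totals.show()
```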
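When the dictionary form of agg() is preferable (for example, when the aggregations are assembled programmatically), the rename has to happen afterwards with withColumnRenamed(). A sketch under the same assumptions:

```python
# The dict form offers no alias hook, so rename the default "sum(money)".
renamed = (
    df.groupBy("department")
    .agg({"money": "sum"})
    .withColumnRenamed("sum(money)", "total_money")
)
renamed.show()
```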