pyspark.sql.functions.array_agg
pyspark.sql.functions.array_agg(col)

Aggregate function: returns a list of objects with duplicates.

New in version 3.5.0.

Parameters

col : Column or column name
    target column to compute on.
 
Returns

Column
    list of objects with duplicates.
 
Examples

Example 1: Using array_agg function on an int column

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([[1],[1],[2]], ["c"])
>>> df.agg(sf.sort_array(sf.array_agg('c')).alias('sorted_list')).show()
+-----------+
|sorted_list|
+-----------+
|  [1, 1, 2]|
+-----------+

Example 2: Using array_agg function on a string column

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([["apple"],["apple"],["banana"]], ["c"])
>>> df.agg(sf.sort_array(sf.array_agg('c')).alias('sorted_list')).show(truncate=False)
+----------------------+
|sorted_list           |
+----------------------+
|[apple, apple, banana]|
+----------------------+

Example 3: Using array_agg function on a column with null values

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([[1],[None],[2]], ["c"])
>>> df.agg(sf.sort_array(sf.array_agg('c')).alias('sorted_list')).show()
+-----------+
|sorted_list|
+-----------+
|     [1, 2]|
+-----------+

Example 4: Using array_agg function on a column with different data types

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([[1],["apple"],[2]], ["c"])
>>> df.agg(sf.sort_array(sf.array_agg('c')).alias('sorted_list')).show()
+-------------+
|  sorted_list|
+-------------+
|[1, 2, apple]|
+-------------+
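Beyond the whole-DataFrame aggregations above, array_agg is also commonly used inside a groupBy to collect the values of a column per group. The following is a minimal sketch of that pattern; the DataFrame, the column names k and v, and the output shown are illustrative assumptions, not part of the official examples.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["k", "v"])
>>> # Collect the v values for each key k; sort_array gives a deterministic display order.
>>> df.groupBy("k").agg(sf.sort_array(sf.array_agg("v")).alias("vs")).orderBy("k").show()
+---+------+
|  k|    vs|
+---+------+
|  a|[1, 2]|
|  b|   [3]|
+---+------+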