pyspark.pandas.DataFrame.unstack
- DataFrame.unstack()
- Pivot the (necessarily hierarchical) index labels.
- Returns a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.
- If the index is not a MultiIndex, the output will be a Series.
- Note
- If the index is a MultiIndex, the output DataFrame could be very wide, which can cause serious performance degradation because Spark partitions data by rows.
- Returns
- Series or DataFrame
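To get a feel for how wide the result can become, the sketch below estimates the output shape from the index tuples and column names alone. It is plain Python rather than pyspark.pandas code, and `unstacked_shape` is a hypothetical helper introduced only for illustration. It assumes the innermost index level is the one being pivoted, as in the MultiIndex example on this page.

```python
def unstacked_shape(index_tuples, columns):
    """Estimate (rows, columns) after pivoting the innermost index level.

    Hypothetical helper for illustration only; not part of pyspark.pandas.
    """
    # Rows are keyed by what is left of each index tuple once the
    # innermost level has been removed.
    outer_keys = {key[:-1] for key in index_tuples}
    # Every distinct innermost label becomes a new column under each
    # original column.
    inner_labels = {key[-1] for key in index_tuples}
    return len(outer_keys), len(columns) * len(inner_labels)


# Index tuples and columns taken from the MultiIndex example on this page.
index_tuples = [(0, "a"), (1, "b"), (2, "c")]
shape = unstacked_shape(index_tuples, ["B", "C"])
print(shape)  # (3, 6)
```

The column count grows as the number of original columns multiplied by the number of distinct innermost labels, so a high-cardinality inner level is the case to watch for.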
 
- See also
- DataFrame.pivot
- Pivot a table based on column values. 
- DataFrame.stack
- Pivot a level of the column labels (inverse operation from unstack). 
- Examples

>>> df = ps.DataFrame({"A": {"0": "a", "1": "b", "2": "c"},
...                    "B": {"0": "1", "1": "3", "2": "5"},
...                    "C": {"0": "2", "1": "4", "2": "6"}},
...                   columns=["A", "B", "C"])
>>> df
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6

>>> df.unstack().sort_index()
A  0    a
   1    b
   2    c
B  0    1
   1    3
   2    5
C  0    2
   1    4
   2    6
dtype: object

>>> df.columns = pd.MultiIndex.from_tuples([('X', 'A'), ('X', 'B'), ('Y', 'C')])
>>> df.unstack().sort_index()
X  A  0    a
      1    b
      2    c
   B  0    1
      1    3
      2    5
Y  C  0    2
      1    4
      2    6
dtype: object

- For MultiIndex case:

>>> df = ps.DataFrame({"A": ["a", "b", "c"],
...                    "B": [1, 3, 5],
...                    "C": [2, 4, 6]},
...                   columns=["A", "B", "C"])
>>> df = df.set_index('A', append=True)
>>> df
     B  C
  A
0 a  1  2
1 b  3  4
2 c  5  6
>>> df.unstack().sort_index()
     B              C
A    a    b    c    a    b    c
0  1.0  NaN  NaN  2.0  NaN  NaN
1  NaN  3.0  NaN  NaN  4.0  NaN
2  NaN  NaN  5.0  NaN  NaN  6.0
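The flat-index case from the first example can be sketched in plain Python to show how the result is keyed: the column label forms the outer level and the original index label the inner level. This is an illustration of the reshaping only, not how pyspark.pandas implements it, and `unstack_flat` is a hypothetical helper name.

```python
def unstack_flat(data, columns):
    """Pivot a flat-indexed table into a mapping keyed by (column, index label).

    `data` maps column name -> {index label -> value}.
    Hypothetical helper for illustration only; not part of pyspark.pandas.
    """
    result = {}
    for column in columns:
        for label, value in data[column].items():
            result[(column, label)] = value
    # Sort by key to mirror the .sort_index() call in the example.
    return dict(sorted(result.items()))


# Same data as the first example on this page.
data = {
    "A": {"0": "a", "1": "b", "2": "c"},
    "B": {"0": "1", "1": "3", "2": "5"},
    "C": {"0": "2", "1": "4", "2": "6"},
}
series_like = unstack_flat(data, ["A", "B", "C"])
for key, value in series_like.items():
    print(key, value)
```

Each of the nine cells of the original table ends up as one entry, which is why the flat-index result is a single long Series rather than a wide DataFrame.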