pyspark.pandas.DataFrame.unstack

DataFrame.unstack()

Pivot the (necessarily hierarchical) index labels.

Returns a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.

If the index is not a MultiIndex, the output will be a Series.

Note

If the index is a MultiIndex, the output DataFrame can be very wide, which can seriously degrade performance because Spark partitions data by row.
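One rough way to judge how wide the result might be is to multiply the number of columns by the number of distinct labels in the innermost index level. The sketch below uses plain pandas as a stand-in, on the assumption that pandas-on-Spark produces the same labels and shape; `estimated_unstacked_width` is a hypothetical helper, not part of either library.

```python
import pandas as pd

# Plain pandas stands in for pandas-on-Spark here (assumption: same labels and shape).
df = pd.DataFrame(
    {"B": [1, 3, 5], "C": [2, 4, 6]},
    index=pd.MultiIndex.from_tuples(
        [(0, "a"), (1, "b"), (2, "c")], names=[None, "A"]
    ),
)


def estimated_unstacked_width(frame):
    """Estimate the column count after unstacking the innermost index level.

    Hypothetical helper: each existing column is repeated once per
    distinct label in the innermost index level.
    """
    distinct_inner_labels = frame.index.get_level_values(-1).nunique()
    return len(frame.columns) * distinct_inner_labels


width = estimated_unstacked_width(df)
wide = df.unstack()
print(width, wide.shape)
```

If the estimate is large, a long format (keeping the MultiIndex) may suit Spark's row-based partitioning better.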

Returns

Series or DataFrame

See also

DataFrame.pivot: Pivot a table based on column values.
DataFrame.stack: Pivot a level of the column labels (the inverse of unstack).

Examples

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({"A": {"0": "a", "1": "b", "2": "c"},
...                    "B": {"0": "1", "1": "3", "2": "5"},
...                    "C": {"0": "2", "1": "4", "2": "6"}},
...                   columns=["A", "B", "C"])
>>> df
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6

>>> df.unstack().sort_index()
A  0    a
   1    b
   2    c
B  0    1
   1    3
   2    5
C  0    2
   1    4
   2    6
dtype: object

>>> df.columns = pd.MultiIndex.from_tuples([('X', 'A'), ('X', 'B'), ('Y', 'C')])
>>> df.unstack().sort_index()
X  A  0    a
      1    b
      2    c
   B  0    1
      1    3
      2    5
Y  C  0    2
      1    4
      2    6
dtype: object

For the MultiIndex case:

>>> df = ps.DataFrame({"A": ["a", "b", "c"],
...                    "B": [1, 3, 5],
...                    "C": [2, 4, 6]},
...                   columns=["A", "B", "C"])
>>> df = df.set_index('A', append=True)
>>> df
     B  C
  A
0 a  1  2
1 b  3  4
2 c  5  6
>>> df.unstack().sort_index()
     B              C
A    a    b    c    a    b    c
0  1.0  NaN  NaN  2.0  NaN  NaN
1  NaN  3.0  NaN  NaN  4.0  NaN
2  NaN  NaN  5.0  NaN  NaN  6.0