pyspark.pandas.DataFrame.transpose¶
-
DataFrame.
transpose
() → pyspark.pandas.frame.DataFrame[source]¶ Transpose index and columns.
Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. The property
T
is an accessor to the methodtranspose()
.Note
This method is based on an expensive operation due to the nature of big data. Internally it needs to generate each row for each value, and then group twice - it is a huge operation. To prevent misusage, this method has the ‘compute.max_rows’ default limit of input length, and raises a ValueError.
>>> from pyspark.pandas.config import option_context >>> with option_context('compute.max_rows', 1000): ... ps.DataFrame({'a': range(1001)}).transpose() Traceback (most recent call last): ... ValueError: Current DataFrame has more then the given limit 1000 rows. Please set 'compute.max_rows' by using 'pyspark.pandas.config.set_option' to retrieve to retrieve more than 1000 rows. Note that, before changing the 'compute.max_rows', this operation is considerably expensive.
- Returns
- DataFrame
The transposed DataFrame.
Notes
Transposing a DataFrame with mixed dtypes will result in a homogeneous DataFrame with the coerced dtype. For instance, if int and float have to be placed in same column, it becomes float. If type coercion is not possible, it fails.
Also, note that the values in index should be unique because they become unique column names.
In addition, if Spark 2.3 is used, the types should always be exactly same.
Examples
Square DataFrame with homogeneous dtype
>>> d1 = {'col1': [1, 2], 'col2': [3, 4]} >>> df1 = ps.DataFrame(data=d1, columns=['col1', 'col2']) >>> df1 col1 col2 0 1 3 1 2 4
>>> df1_transposed = df1.T.sort_index() >>> df1_transposed 0 1 col1 1 2 col2 3 4
When the dtype is homogeneous in the original DataFrame, we get a transposed DataFrame with the same dtype:
>>> df1.dtypes col1 int64 col2 int64 dtype: object >>> df1_transposed.dtypes 0 int64 1 int64 dtype: object
Non-square DataFrame with mixed dtypes
>>> d2 = {'score': [9.5, 8], ... 'kids': [0, 0], ... 'age': [12, 22]} >>> df2 = ps.DataFrame(data=d2, columns=['score', 'kids', 'age']) >>> df2 score kids age 0 9.5 0 12 1 8.0 0 22
>>> df2_transposed = df2.T.sort_index() >>> df2_transposed 0 1 age 12.0 22.0 kids 0.0 0.0 score 9.5 8.0
When the DataFrame has mixed dtypes, we get a transposed DataFrame with the coerced dtype:
>>> df2.dtypes score float64 kids int64 age int64 dtype: object
>>> df2_transposed.dtypes 0 float64 1 float64 dtype: object