pyspark.pandas.DataFrame.nsmallest
- DataFrame.nsmallest(n, columns, keep='first')
- Return the first n rows ordered by columns in ascending order.
- Return the first n rows with the smallest values in columns, in ascending order. The columns that are not specified are returned as well, but not used for ordering.
- In pandas, this method is equivalent to df.sort_values(columns, ascending=True).head(n), but more performant. In pandas-on-Spark, thanks to Spark's lazy execution and query optimizer, the two have the same performance.
- Parameters
- n : int
- Number of items to retrieve. 
- columns : list or str
- Column name or names to order by. 
- keep : {‘first’, ‘last’}, default ‘first’
- Determines which duplicates (if any) to keep. ‘all’ is not implemented yet.
  - first: Keep the first occurrence.
  - last: Keep the last occurrence.
 
- Returns
- DataFrame
 
- See also
- DataFrame.nlargest
- Return the first n rows ordered by columns in descending order. 
- DataFrame.sort_values
- Sort DataFrame by the values. 
- DataFrame.head
- Return the first n rows without re-ordering. 
- Examples

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'X': [1, 2, 3, 5, 6, 7, np.nan],
...                    'Y': [6, 7, 8, 9, 10, 11, 12]})
>>> df
     X   Y
0  1.0   6
1  2.0   7
2  3.0   8
3  5.0   9
4  6.0  10
5  7.0  11
6  NaN  12

- In the following example, we will use nsmallest to select the three rows having the smallest values in column "X".

>>> df.nsmallest(n=3, columns='X')
     X  Y
0  1.0  6
1  2.0  7
2  3.0  8

- To order by the smallest values in column "Y" and then "X", we can specify multiple columns like in the next example.

>>> df.nsmallest(n=3, columns=['Y', 'X'])
     X  Y
0  1.0  6
1  2.0  7
2  3.0  8

- The examples below show how ties are resolved, which is decided by keep.

>>> tied_df = ps.DataFrame({'X': [1, 1, 2, 2, 3]}, index=['a', 'b', 'c', 'd', 'e'])
>>> tied_df
   X
a  1
b  1
c  2
d  2
e  3

- When using keep='first' (default), ties are resolved in order:

>>> tied_df.nsmallest(3, 'X')
   X
a  1
b  1
c  2

>>> tied_df.nsmallest(3, 'X', keep='first')
   X
a  1
b  1
c  2

- When using keep='last', ties are resolved in reverse order:

>>> tied_df.nsmallest(3, 'X', keep='last')
   X
b  1
a  1
d  2
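
- The "sort ascending, then take the first n" reading and the tie rules for keep can be modelled in a few lines of plain Python. This is only an illustrative sketch, not how pandas-on-Spark implements nsmallest: the function nsmallest_model and the list-of-dicts row layout (with an "idx" key standing in for the index) are hypothetical names introduced here, and missing values such as NaN are not handled. It relies on Python's sorted() being stable, and it is written to reproduce the tied_df outputs shown above.

```python
from operator import itemgetter


def nsmallest_model(rows, n, columns, keep="first"):
    """Hypothetical pure-Python model of nsmallest over a list of dict rows.

    Assumption: rows contain no missing values (NaN is not handled here).
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    if isinstance(columns, str):
        columns = [columns]
    # For keep='last', walk the input backwards so later rows win ties.
    ordered = rows if keep == "first" else rows[::-1]
    # sorted() is stable: rows with equal keys keep their relative order.
    # Sorting ascending and slicing mirrors sort_values(...).head(n).
    return sorted(ordered, key=itemgetter(*columns))[:n]


# Same data as tied_df above; "idx" stands in for the index labels.
tied = [{"idx": i, "X": x} for i, x in zip("abcde", [1, 1, 2, 2, 3])]

first = [r["idx"] for r in nsmallest_model(tied, 3, "X")]
last = [r["idx"] for r in nsmallest_model(tied, 3, "X", keep="last")]
print(first)  # index labels selected with keep='first'
print(last)   # index labels selected with keep='last'
```

- In this model, keep only changes which of several equal-valued rows are seen first by the sort. Rows with distinct values are ordered the same way under either setting.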