pyspark.sql.Column.isin

Column.isin(*cols)

A boolean expression that evaluates to true if the value of this expression is contained in the evaluated values of the arguments.

New in version 1.5.0.

Changed in version 3.4.0: Supports Spark Connect.

Changed in version 4.1.0: Also takes a single DataFrame to be used as IN subquery.

Parameters

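cols (Any): The values to compare against the column values. They may be given as separate arguments, as a single list, or (since 4.1.0) as a single DataFrame used as an IN subquery. The result is true for a row only if the column value in that row matches at least one of them.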

Returns

Column: Column of booleans showing whether each element in the Column is contained in cols.
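
As a rough mental model, the membership test described above can be sketched in plain Python. This is not Spark's implementation: the helper names `normalize_values` and `isin_model` are hypothetical and exist only for this illustration, and null handling and type coercion are deliberately left out. The sketch assumes that a single list argument is treated as the collection of values, as in Example 2 of the Examples section.

```python
from typing import Any, List, Tuple

# Rows mirroring the DataFrame used in the Examples section.
ROWS: List[Tuple[int, str]] = [(2, "Alice"), (5, "Bob"), (8, "Mike")]


def normalize_values(*cols: Any) -> Tuple[Any, ...]:
    """Hypothetical helper: treat a single list argument as the value collection."""
    if len(cols) == 1 and isinstance(cols[0], list):
        return tuple(cols[0])
    return cols


def isin_model(column: List[Any], *cols: Any) -> List[bool]:
    """Hypothetical model: return one boolean per element of `column`."""
    values = normalize_values(*cols)
    return [value in values for value in column]


ages = [age for age, _ in ROWS]
names = [name for _, name in ROWS]

# Values passed as separate arguments, like df.name.isin("Bob", "Mike").
by_name = isin_model(names, "Bob", "Mike")

# Values passed as a single list, like df.age.isin([1, 2, 3]).
by_age = isin_model(ages, [1, 2, 3])

# Negated membership, like ~df.name.isin("Alice", "Bob").
not_in = [not flag for flag in isin_model(names, "Alice", "Bob")]
```

In this model each result list lines up with the rows of the DataFrame, which is why the boolean Column can be used directly as a filter condition.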

Examples

>>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob"), (8, "Mike")], ["age", "name"])

Example 1: Filter rows with names in the specified values

>>> df[df.name.isin("Bob", "Mike")].orderBy("age").show()
+---+----+
|age|name|
+---+----+
|  5| Bob|
|  8|Mike|
+---+----+

Example 2: Filter rows with ages in the specified list

>>> df[df.age.isin([1, 2, 3])].show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
+---+-----+

Example 3: Filter rows with names not in the specified values

>>> df[~df.name.isin("Alice", "Bob")].show()
+---+----+
|age|name|
+---+----+
|  8|Mike|
+---+----+

Example 4: Pass a DataFrame to use as an IN subquery

>>> df.where(df.age.isin(spark.range(6))).orderBy("age").show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
|  5|  Bob|
+---+-----+

Example 5: Match multiple columns against an IN subquery

>>> from pyspark.sql.functions import lit, struct
>>> df.where(struct(df.age, df.name).isin(spark.range(6).select("id", lit("Bob")))).show()
+---+----+
|age|name|
+---+----+
|  5| Bob|
+---+----+
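
Examples 4 and 5 can be read as set membership against the rows of the subquery. The plain-Python sketch below is only a model of that reading, not how Spark evaluates it. The names `subquery_ids` and `subquery_pairs` are hypothetical stand-ins for `spark.range(6)` and `spark.range(6).select("id", lit("Bob"))`.

```python
rows = [(2, "Alice"), (5, "Bob"), (8, "Mike")]

# Stand-in for spark.range(6): a single column "id" holding 0 through 5.
subquery_ids = set(range(6))

# Stand-in for spark.range(6).select("id", lit("Bob")): pairs (id, "Bob").
subquery_pairs = {(i, "Bob") for i in range(6)}

# Example 4: keep rows whose age appears in the single-column subquery.
single_column_matches = [row for row in rows if row[0] in subquery_ids]

# Example 5: keep rows whose (age, name) pair appears in the two-column subquery.
multi_column_matches = [row for row in rows if row in subquery_pairs]
```

Wrapping the columns in `struct` plays the role of the tuple here: the whole (age, name) pair must match a row of the subquery, not each column independently.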