pyspark.sql.Column.isin
- Column.isin(*cols)
- A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
- New in version 1.5.0.
- Changed in version 3.4.0: Supports Spark Connect.
- Changed in version 4.1.0: Also takes a single DataFrame to be used as IN subquery.
- Parameters
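- cols : Any
- The values to compare with the column values, passed as separate arguments or as a single list. Since version 4.1.0, a single DataFrame is also accepted and used as an IN subquery. The result is true for a row only if the column value in that row matches at least one of these values.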
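The two call styles for value arguments (separate arguments in Example 1, a single list in Example 2) can be pictured with a small pure-Python sketch. `normalize_isin_args` is a hypothetical helper written for illustration, not part of PySpark; it assumes a lone list argument is unpacked into its elements, as Example 2 suggests, and it does not model the DataFrame form.

```python
from typing import Any, List


def normalize_isin_args(*cols: Any) -> List[Any]:
    """Hypothetical helper: reduce both call styles to one flat list of values."""
    # Assumption: a single list argument stands for the values themselves,
    # so isin([1, 2, 3]) is read the same way as isin(1, 2, 3).
    if len(cols) == 1 and isinstance(cols[0], list):
        return list(cols[0])
    # Otherwise every positional argument is one value to compare against.
    return list(cols)


print(normalize_isin_args("Bob", "Mike"))  # ['Bob', 'Mike']
print(normalize_isin_args([1, 2, 3]))      # [1, 2, 3]
```

Under this reading, `df.age.isin(1, 2, 3)` and `df.age.isin([1, 2, 3])` would be expected to select the same rows.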
 
- Returns
- Column
- Column of booleans showing whether each element in the Column is contained in cols. 
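As a rough mental model of the returned Column, the sketch below computes one boolean per element in plain Python. `isin_mask` is a hypothetical function, not a PySpark API, and it deliberately leaves out null handling and any type coercion Spark may apply; it only illustrates the element-wise membership test described above.

```python
from typing import Any, Iterable, List


def isin_mask(column_values: Iterable[Any], candidates: Iterable[Any]) -> List[bool]:
    """Hypothetical model: one boolean per element of column_values."""
    candidate_list = list(candidates)
    return [value in candidate_list for value in column_values]


names = ["Alice", "Bob", "Mike"]

# Element-wise membership, analogous to df.name.isin("Bob", "Mike")
mask = isin_mask(names, ["Bob", "Mike"])
print(mask)  # [False, True, True]

# Keeping only the rows where the mask is true, analogous to df[mask_column]
kept = [name for name, keep in zip(names, mask) if keep]
print(kept)  # ['Bob', 'Mike']

# Negating the mask, analogous to ~df.name.isin("Alice", "Bob")
negated = [not flag for flag in isin_mask(names, ["Alice", "Bob"])]
print(negated)  # [False, False, True]
```

Because the result is itself a boolean Column, it can be used directly as a filter condition or inverted with `~`, as the examples on this page do.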
 
- Examples
  >>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob"), (8, "Mike")], ["age", "name"])

- Example 1: Filter rows with names in the specified values
  >>> df[df.name.isin("Bob", "Mike")].orderBy("age").show()
  +---+----+
  |age|name|
  +---+----+
  |  5| Bob|
  |  8|Mike|
  +---+----+

- Example 2: Filter rows with ages in the specified list
  >>> df[df.age.isin([1, 2, 3])].show()
  +---+-----+
  |age| name|
  +---+-----+
  |  2|Alice|
  +---+-----+

- Example 3: Filter rows with names not in the specified values
  >>> df[~df.name.isin("Alice", "Bob")].show()
  +---+----+
  |age|name|
  +---+----+
  |  8|Mike|
  +---+----+

- Example 4: Take a DataFrame and work as IN subquery
  >>> df.where(df.age.isin(spark.range(6))).orderBy("age").show()
  +---+-----+
  |age| name|
  +---+-----+
  |  2|Alice|
  |  5|  Bob|
  +---+-----+

- Example 5: Multiple values for IN subquery
  >>> from pyspark.sql.functions import lit, struct
  >>> df.where(struct(df.age, df.name).isin(spark.range(6).select("id", lit("Bob")))).show()
  +---+----+
  |age|name|
  +---+----+
  |  5| Bob|
  +---+----+