pyspark.sql.functions.schema_of_xml
- pyspark.sql.functions.schema_of_xml(xml, options=None)
- Parses an XML string and infers its schema in DDL format.
- New in version 4.0.0.
- Parameters
- xml : Column or str
- an XML string, or a foldable string column containing an XML string.
- options : dict, optional
- options to control parsing. Accepts the same options as the XML data source. See Data Source Option for the version you use.
 
- Returns
- Column
- a string representation of a StructType parsed from the given XML.
 
- Examples
- Example 1: Parsing a simple XML with a single element
  >>> from pyspark.sql import functions as sf
  >>> df = spark.range(1)
  >>> df.select(sf.schema_of_xml(sf.lit('<p><a>1</a></p>')).alias("xml")).collect()
  [Row(xml='STRUCT<a: BIGINT>')]
- Example 2: Parsing an XML with multiple elements in an array
  >>> from pyspark.sql import functions as sf
  >>> df.select(sf.schema_of_xml(sf.lit('<p><a>1</a><a>2</a></p>')).alias("xml")).collect()
  [Row(xml='STRUCT<a: ARRAY<BIGINT>>')]
- Example 3: Parsing XML with options to exclude attributes
  >>> from pyspark.sql import functions as sf
  >>> schema = sf.schema_of_xml('<p><a attr="2">1</a></p>', {'excludeAttribute':'true'})
  >>> df.select(schema.alias("xml")).collect()
  [Row(xml='STRUCT<a: BIGINT>')]
- Example 4: Parsing XML with complex structure
  >>> from pyspark.sql import functions as sf
  >>> df.select(
  ...     sf.schema_of_xml(
  ...         sf.lit('<root><person><name>Alice</name><age>30</age></person></root>')
  ...     ).alias("xml")
  ... ).collect()
  [Row(xml='STRUCT<person: STRUCT<age: BIGINT, name: STRING>>')]
- Example 5: Parsing XML with nested arrays
  >>> from pyspark.sql import functions as sf
  >>> df.select(
  ...     sf.schema_of_xml(
  ...         sf.lit('<data><values><value>1</value><value>2</value></values></data>')
  ...     ).alias("xml")
  ... ).collect()
  [Row(xml='STRUCT<values: STRUCT<value: ARRAY<BIGINT>>>')]