pyspark.SparkContext.binaryRecords

SparkContext.binaryRecords(path, recordLength)
Load data from a flat binary file, assuming each record is a set of numbers with the specified numerical format (see ByteBuffer), and the number of bytes per record is constant.

New in version 1.3.0.

Parameters
path : str
    Directory containing the input data files.
recordLength : int
    The fixed length, in bytes, of each record.
 
Returns
RDD
    RDD of records, with each record represented as a byte array.
 
See also
SparkContext.binaryFiles

Examples
>>> import os
>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="binaryRecords") as d:
...     # Write a temporary file
...     with open(os.path.join(d, "1.bin"), "w") as f:
...         for i in range(3):
...             _ = f.write("%04d" % i)
...
...     # Write another file
...     with open(os.path.join(d, "2.bin"), "w") as f:
...         for i in [-1, -2, -10]:
...             _ = f.write("%04d" % i)
...
...     collected = sorted(sc.binaryRecords(d, 4).collect())
>>> collected
[b'-001', b'-002', b'-010', b'0000', b'0001', b'0002']
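The records above happen to be ASCII text. Since the description refers to records of packed numbers (ByteBuffer-style), here is an additional minimal sketch, not part of the original page, that writes fixed-width binary records with Python's struct module and parses them back after loading. It assumes the same running SparkContext bound to sc and reuses the os and tempfile imports from the example above; the file name nums.bin and the "<if" record layout (a little-endian 32-bit int followed by a 32-bit float, 8 bytes per record) are illustrative choices.

>>> import struct
>>> with tempfile.TemporaryDirectory(prefix="binaryRecordsPacked") as d:
...     # Each record is 8 bytes: a little-endian int32 followed by a float32.
...     with open(os.path.join(d, "nums.bin"), "wb") as f:
...         for i in range(3):
...             _ = f.write(struct.pack("<if", i, i * 0.5))
...
...     # recordLength must match the packed record size exactly (here 8 bytes).
...     records = sc.binaryRecords(d, 8).map(lambda b: struct.unpack("<if", b))
...     parsed = sorted(records.collect())
>>> parsed
[(0, 0.0), (1, 0.5), (2, 1.0)]

If recordLength does not evenly divide the file size, the trailing partial record cannot be decoded, so the packing format and the recordLength argument should always be derived from the same struct layout.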