Package org.apache.spark.input

Class PortableDataStream

Object
  org.apache.spark.input.PortableDataStream

All Implemented Interfaces:
  Serializable

A class that allows DataStreams to be serialized and moved around by not creating them until they need to be read.

Note:
  TaskAttemptContext is not serializable, resulting in the confBytes construct; CombineFileSplit is not serializable, resulting in the splitBytes construct.
Constructor Summary

Constructors
  PortableDataStream(org.apache.hadoop.mapreduce.lib.input.CombineFileSplit isplit,
                     org.apache.hadoop.mapreduce.TaskAttemptContext context,
                     Integer index)

Method Summary

Methods
  org.apache.hadoop.conf.Configuration getConfiguration()
  String getPath()
  java.io.DataInputStream open()
  byte[] toArray()
Constructor Details

PortableDataStream
  public PortableDataStream(org.apache.hadoop.mapreduce.lib.input.CombineFileSplit isplit,
                            org.apache.hadoop.mapreduce.TaskAttemptContext context,
                            Integer index)
Method Details

getConfiguration
  public org.apache.hadoop.conf.Configuration getConfiguration()

getPath
  public String getPath()

open
  public java.io.DataInputStream open()
  Create a new DataInputStream from the split and context. The caller of this method is responsible for closing the stream after use.
  Returns:
    (undocumented)

toArray
  public byte[] toArray()
  Read the file as a byte array.
  Returns:
    (undocumented)
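In practice, PortableDataStream values are obtained from SparkContext.binaryFiles, which returns an RDD of (path, stream) pairs. The following Scala sketch shows both access styles described above: toArray() for reading a whole file into memory, and open() for manual stream handling where the caller must close the stream. The input path and the local master setting are placeholder assumptions; running it requires a Spark runtime.

```scala
import org.apache.spark.sql.SparkSession

object PortableDataStreamExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pds-example")
      .master("local[*]") // assumption: local mode for illustration
      .getOrCreate()
    val sc = spark.sparkContext

    // binaryFiles returns RDD[(String, PortableDataStream)]; the streams are
    // not opened until read, which is what makes them safe to serialize.
    val files = sc.binaryFiles("/data/blobs") // hypothetical input directory

    // Style 1: read each file fully into memory with toArray().
    val sizes = files.mapValues(_.toArray().length).collect()

    // Style 2: open() a DataInputStream; the caller must close it.
    val firstBytes = files.mapValues { pds =>
      val in = pds.open()
      try in.readByte()
      finally in.close()
    }.collect()

    spark.stop()
  }
}
```

toArray() is convenient for small files, but open() avoids holding an entire file in memory when only part of it is needed.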
 