Package org.apache.cassandra.spark.data
Class CassandraDataLayer
- java.lang.Object
  - org.apache.cassandra.spark.data.DataLayer
    - org.apache.cassandra.spark.data.PartitionedDataLayer
      - org.apache.cassandra.spark.data.CassandraDataLayer
- All Implemented Interfaces:
java.io.Serializable, StartupValidatable
public class CassandraDataLayer extends PartitionedDataLayer implements StartupValidatable, java.io.Serializable
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes
Modifier and Type Class Description
static class CassandraDataLayer.Serializer
Nested classes/interfaces inherited from class org.apache.cassandra.spark.data.PartitionedDataLayer
PartitionedDataLayer.AvailabilityHint, PartitionedDataLayer.ReplicaSet
-
-
Field Summary
Fields
Modifier and Type Field Description
protected java.util.Map<java.lang.String,PartitionedDataLayer.AvailabilityHint> availabilityHints
protected java.util.Map<java.lang.String,org.apache.cassandra.bridge.BigNumberConfigImpl> bigNumberConfigMap
protected org.apache.cassandra.bridge.CassandraBridge bridge
protected java.util.Set<o.a.c.sidecar.client.shaded.client.SidecarInstance> clusterConfig
protected org.apache.cassandra.spark.data.CqlTable cqlTable
protected boolean enableStats
protected java.lang.String keyspace
protected java.lang.String lastModifiedTimestampField
static org.slf4j.Logger LOGGER
protected java.lang.String maybeQuotedKeyspace
protected java.lang.String maybeQuotedTable
protected boolean quoteIdentifiers
protected boolean readIndexOffset
protected java.util.List<org.apache.cassandra.spark.config.SchemaFeature> requestedFeatures
protected java.util.Map<java.lang.String,org.apache.cassandra.spark.data.ReplicationFactor> rfMap
protected o.a.c.sidecar.client.shaded.client.SidecarClient sidecar
protected org.apache.cassandra.clients.Sidecar.ClientConfig sidecarClientConfig
protected java.lang.String sidecarInstances
protected int sidecarPort
protected java.lang.String snapshotName
protected java.lang.String table
protected org.apache.cassandra.spark.utils.TimeProvider timeProvider
protected org.apache.cassandra.spark.data.partitioner.TokenPartitioner tokenPartitioner
protected boolean useIncrementalRepair
Fields inherited from class org.apache.cassandra.spark.data.PartitionedDataLayer
consistencyLevel, datacenter
-
-
Constructor Summary
Constructors
Modifier Constructor Description
protected CassandraDataLayer(java.lang.String keyspace, java.lang.String table, boolean quoteIdentifiers, java.lang.String snapshotName, java.lang.String datacenter, org.apache.cassandra.clients.Sidecar.ClientConfig sidecarClientConfig, org.apache.cassandra.secrets.SslConfig sslConfig, org.apache.cassandra.spark.data.CqlTable cqlTable, org.apache.cassandra.spark.data.partitioner.TokenPartitioner tokenPartitioner, org.apache.cassandra.bridge.CassandraVersion version, org.apache.cassandra.spark.data.partitioner.ConsistencyLevel consistencyLevel, java.lang.String sidecarInstances, int sidecarPort, java.util.Map<java.lang.String,PartitionedDataLayer.AvailabilityHint> availabilityHints, java.util.Map<java.lang.String,org.apache.cassandra.bridge.BigNumberConfigImpl> bigNumberConfigMap, boolean enableStats, boolean readIndexOffset, boolean useIncrementalRepair, java.lang.String lastModifiedTimestampField, java.util.List<org.apache.cassandra.spark.config.SchemaFeature> requestedFeatures, java.util.Map<java.lang.String,org.apache.cassandra.spark.data.ReplicationFactor> rfMap, org.apache.cassandra.spark.utils.TimeProvider timeProvider, org.apache.cassandra.spark.sparksql.filters.SSTableTimeRangeFilter sstableTimeRangeFilter)
CassandraDataLayer(ClientConfig options, org.apache.cassandra.clients.Sidecar.ClientConfig sidecarClientConfig, org.apache.cassandra.secrets.SslConfig sslConfig)
-
Method Summary
All Methods
Modifier and Type Method Description
protected void await(java.util.concurrent.CountDownLatch latch)
org.apache.cassandra.bridge.BigNumberConfig bigNumberConfig(org.apache.cassandra.spark.data.CqlField field)
    DataLayer can override this method to return the BigInteger/BigDecimal precision/scale values for a given column
java.util.Map<java.lang.String,org.apache.cassandra.bridge.BigNumberConfigImpl> bigNumberConfigMap()
org.apache.cassandra.bridge.CassandraBridge bridge()
protected void clearSnapshot(java.util.Set<o.a.c.sidecar.client.shaded.client.SidecarInstance> clusterConfig, ClientConfig options)
org.apache.cassandra.spark.data.CqlTable cqlTable()
org.apache.cassandra.spark.data.partitioner.CassandraRing createCassandraRingFromRing(org.apache.cassandra.spark.data.partitioner.Partitioner partitioner, org.apache.cassandra.spark.data.ReplicationFactor replicationFactor, o.a.c.sidecar.client.shaded.common.response.RingResponse ring)
protected void dialHome(ClientConfig options)
boolean equals(java.lang.Object other)
protected java.util.concurrent.ExecutorService executorService()
    DataLayer implementation should provide an ExecutorService for doing blocking I/O when opening SSTable readers.
protected PartitionedDataLayer.AvailabilityHint getAvailability(org.apache.cassandra.spark.data.partitioner.CassandraInstance instance)
    Data Layer can override this method to hint availability of a Cassandra instance so Bulk Reader attempts UP instances first, and avoids instances known to be down, e.g. if create snapshot request already failed
protected java.lang.String getEffectiveCassandraVersionForRead(java.util.Set<o.a.c.sidecar.client.shaded.client.SidecarInstance> clusterConfig, o.a.c.sidecar.client.shaded.common.response.NodeSettings nodeSettings)
protected org.apache.cassandra.spark.data.Sizing getSizing(java.util.concurrent.CompletableFuture<o.a.c.sidecar.client.shaded.common.response.RingResponse> ringFuture, org.apache.cassandra.spark.data.ReplicationFactor replicationFactor, ClientConfig options)
    Returns the Sizing object based on the sizing option provided by the user, or DefaultSizing as the default sizing
int hashCode()
void initialize(ClientConfig options)
protected java.util.Set<o.a.c.sidecar.client.shaded.client.SidecarInstance> initializeClusterConfig(ClientConfig options)
protected void initInstanceMap()
protected void initSidecarClient()
protected boolean isExhausted(java.lang.Throwable throwable)
java.lang.String jobId()
java.util.concurrent.CompletableFuture<java.util.stream.Stream<org.apache.cassandra.spark.data.SSTable>> listInstance(int partitionId, com.google.common.collect.Range<java.math.BigInteger> range, org.apache.cassandra.spark.data.partitioner.CassandraInstance instance)
boolean readIndexOffset()
    When true the SSTableReader should attempt to find the offset into the Data.db file for the Spark worker's token range.
org.apache.cassandra.spark.data.ReplicationFactor replicationFactor(java.lang.String keyspace)
java.util.List<org.apache.cassandra.spark.config.SchemaFeature> requestedFeatures()
org.apache.cassandra.spark.data.partitioner.CassandraRing ring()
protected void shutdownHook(ClientConfig options)
org.apache.cassandra.spark.sparksql.filters.SSTableTimeRangeFilter sstableTimeRangeFilter()
    Returns SSTableTimeRangeFilter to filter out SSTables based on min and max timestamp.
void startupValidate()
    Performs startup validation using StartupValidator with currently registered StartupValidations, throws a RuntimeException if any violations are found, needs to be invoked once per execution before any actual work is started
org.apache.cassandra.analytics.stats.Stats stats()
    Override to plug in your own Stats instrumentation for recording internal events
org.apache.cassandra.spark.utils.TimeProvider timeProvider()
org.apache.cassandra.spark.data.partitioner.TokenPartitioner tokenPartitioner()
boolean useIncrementalRepair()
    When true the SSTableReader should only read repaired SSTables from a single 'primary repair' replica and read unrepaired SSTables at the user set consistency level
Methods inherited from class org.apache.cassandra.spark.data.PartitionedDataLayer
consistencylevel, filterNonIntersectingSSTables, isInPartition, partitionCount, partitioner, partitionKeyFiltersInRange, sparkRangeFilter, sstables, validateReplicationFactor, validateReplicationFactor
-
Methods inherited from class org.apache.cassandra.spark.data.DataLayer
openCompactionScanner, openCompactionScanner, openPartitionSizeIterator, partitionSizeStructType, structType, typeConverter, unsupportedPushDownFilters, version
-
-
-
-
Field Detail
-
LOGGER
public static final org.slf4j.Logger LOGGER
-
snapshotName
protected java.lang.String snapshotName
-
quoteIdentifiers
protected boolean quoteIdentifiers
-
keyspace
protected java.lang.String keyspace
-
table
protected java.lang.String table
-
maybeQuotedKeyspace
protected java.lang.String maybeQuotedKeyspace
-
maybeQuotedTable
protected java.lang.String maybeQuotedTable
-
bridge
protected org.apache.cassandra.bridge.CassandraBridge bridge
-
sidecarInstances
protected java.lang.String sidecarInstances
-
sidecarPort
protected int sidecarPort
-
clusterConfig
protected transient java.util.Set<o.a.c.sidecar.client.shaded.client.SidecarInstance> clusterConfig
-
tokenPartitioner
protected org.apache.cassandra.spark.data.partitioner.TokenPartitioner tokenPartitioner
-
availabilityHints
protected java.util.Map<java.lang.String,PartitionedDataLayer.AvailabilityHint> availabilityHints
-
sidecarClientConfig
protected org.apache.cassandra.clients.Sidecar.ClientConfig sidecarClientConfig
-
bigNumberConfigMap
protected java.util.Map<java.lang.String,org.apache.cassandra.bridge.BigNumberConfigImpl> bigNumberConfigMap
-
enableStats
protected boolean enableStats
-
readIndexOffset
protected boolean readIndexOffset
-
useIncrementalRepair
protected boolean useIncrementalRepair
-
requestedFeatures
protected java.util.List<org.apache.cassandra.spark.config.SchemaFeature> requestedFeatures
-
rfMap
protected java.util.Map<java.lang.String,org.apache.cassandra.spark.data.ReplicationFactor> rfMap
-
lastModifiedTimestampField
@Nullable protected java.lang.String lastModifiedTimestampField
-
cqlTable
protected volatile org.apache.cassandra.spark.data.CqlTable cqlTable
-
timeProvider
protected transient org.apache.cassandra.spark.utils.TimeProvider timeProvider
-
sidecar
protected transient o.a.c.sidecar.client.shaded.client.SidecarClient sidecar
-
-
Constructor Detail
-
CassandraDataLayer
public CassandraDataLayer(@NotNull ClientConfig options, @NotNull org.apache.cassandra.clients.Sidecar.ClientConfig sidecarClientConfig, @Nullable org.apache.cassandra.secrets.SslConfig sslConfig)
-
CassandraDataLayer
protected CassandraDataLayer(@Nullable java.lang.String keyspace, @Nullable java.lang.String table, boolean quoteIdentifiers, @NotNull java.lang.String snapshotName, @Nullable java.lang.String datacenter, @NotNull org.apache.cassandra.clients.Sidecar.ClientConfig sidecarClientConfig, @Nullable org.apache.cassandra.secrets.SslConfig sslConfig, @NotNull org.apache.cassandra.spark.data.CqlTable cqlTable, @NotNull org.apache.cassandra.spark.data.partitioner.TokenPartitioner tokenPartitioner, @NotNull org.apache.cassandra.bridge.CassandraVersion version, @NotNull org.apache.cassandra.spark.data.partitioner.ConsistencyLevel consistencyLevel, @NotNull java.lang.String sidecarInstances, @NotNull int sidecarPort, @NotNull java.util.Map<java.lang.String,PartitionedDataLayer.AvailabilityHint> availabilityHints, @NotNull java.util.Map<java.lang.String,org.apache.cassandra.bridge.BigNumberConfigImpl> bigNumberConfigMap, boolean enableStats, boolean readIndexOffset, boolean useIncrementalRepair, @Nullable java.lang.String lastModifiedTimestampField, java.util.List<org.apache.cassandra.spark.config.SchemaFeature> requestedFeatures, @NotNull java.util.Map<java.lang.String,org.apache.cassandra.spark.data.ReplicationFactor> rfMap, org.apache.cassandra.spark.utils.TimeProvider timeProvider, org.apache.cassandra.spark.sparksql.filters.SSTableTimeRangeFilter sstableTimeRangeFilter)
-
-
Method Detail
-
initialize
public void initialize(@NotNull ClientConfig options)
-
shutdownHook
protected void shutdownHook(ClientConfig options)
-
isExhausted
protected boolean isExhausted(@Nullable java.lang.Throwable throwable)
-
timeProvider
public org.apache.cassandra.spark.utils.TimeProvider timeProvider()
- Specified by:
timeProvider in class DataLayer
- Returns:
a TimeProvider
-
useIncrementalRepair
public boolean useIncrementalRepair()
Description copied from class: DataLayer
When true, the SSTableReader should only read repaired SSTables from a single 'primary repair' replica and read unrepaired SSTables at the user-set consistency level.
- Overrides:
useIncrementalRepair in class DataLayer
- Returns:
true if the SSTableReader should only read repaired SSTables on a single 'primary repair' replica
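The selection rule described above can be illustrated with a small filter. This is only a sketch under stated assumptions: `IncrementalRepairSketch`, `SSTableInfo`, and `select` are hypothetical names, and how the library actually picks the 'primary repair' replica is not documented on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these types exist in the library.
final class IncrementalRepairSketch {
    /** Minimal stand-in for an SSTable: the replica holding it and its repaired flag. */
    static final class SSTableInfo {
        final String replica;
        final String name;
        final boolean repaired;

        SSTableInfo(String replica, String name, boolean repaired) {
            this.replica = replica;
            this.name = name;
            this.repaired = repaired;
        }
    }

    /**
     * With incremental repair enabled, keep repaired SSTables only from the
     * primary repair replica; unrepaired SSTables are kept from every replica
     * so they can still be read at the requested consistency level.
     */
    static List<SSTableInfo> select(List<SSTableInfo> candidates,
                                    String primaryRepairReplica,
                                    boolean useIncrementalRepair) {
        List<SSTableInfo> selected = new ArrayList<>();
        for (SSTableInfo sstable : candidates) {
            boolean keep = !useIncrementalRepair
                           || !sstable.repaired
                           || sstable.replica.equals(primaryRepairReplica);
            if (keep) {
                selected.add(sstable);
            }
        }
        return selected;
    }
}
```

With the flag disabled, the sketch returns every candidate unchanged, which mirrors reading all SSTables at the consistency level.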
-
readIndexOffset
public boolean readIndexOffset()
Description copied from class: DataLayer
When true, the SSTableReader should attempt to find the offset into the Data.db file for the Spark worker's token range. This works by first binary searching the Summary.db file to find the offset into the Index.db file, then reading the Index.db file from that offset to find the first offset in the Data.db file that overlaps with the Spark worker's token range. This enables the reader to start reading from the first in-range partition in the Data.db file, and close after reading the last partition. This feature improves scalability as more Spark workers shard the token range into smaller subranges, because it avoids wastefully reading the Data.db file for out-of-range partitions.
- Overrides:
readIndexOffset in class DataLayer
- Returns:
true if the SSTableReader should attempt to read the Summary.db and Index.db files to find the start index offset into the Data.db file that overlaps with the Spark worker's token range
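The two-step lookup described above (binary search over the summary, then a forward scan of the index) can be sketched with in-memory arrays. This is a simplified model, not the library's reader: the class and method names are hypothetical, the real Summary.db and Index.db files are binary and keyed by partition key, and the sketch treats the token range as inclusive on both ends.

```java
// Hypothetical illustration: arrays stand in for Summary.db and Index.db.
final class IndexOffsetSketch {
    /**
     * Binary search for the last summary entry whose token is <= rangeStart
     * and return the index position it points at (0 if there is none).
     */
    static int summaryLookup(long[] summaryTokens, int[] summaryIndexPositions, long rangeStart) {
        if (summaryTokens.length == 0) {
            return 0;
        }
        int lo = 0;
        int hi = summaryTokens.length - 1;
        int found = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (summaryTokens[mid] <= rangeStart) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return summaryIndexPositions[found];
    }

    /**
     * Scan the index forward from the summary position and return the data
     * offset of the first partition inside [rangeStart, rangeEnd], or -1 if
     * no partition in this SSTable overlaps the range.
     */
    static long firstDataOffset(long[] summaryTokens, int[] summaryIndexPositions,
                                long[] indexTokens, long[] indexDataOffsets,
                                long rangeStart, long rangeEnd) {
        int start = summaryLookup(summaryTokens, summaryIndexPositions, rangeStart);
        for (int i = start; i < indexTokens.length; i++) {
            if (indexTokens[i] > rangeEnd) {
                break; // index is sorted by token, so nothing later can match
            }
            if (indexTokens[i] >= rangeStart) {
                return indexDataOffsets[i];
            }
        }
        return -1L;
    }
}
```

The summary only samples the index, so the scan may skip a few out-of-range entries before reaching the first match; the saving comes from not scanning the index, or the data file, from the beginning.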
-
initInstanceMap
protected void initInstanceMap()
-
initSidecarClient
protected void initSidecarClient()
-
bridge
public org.apache.cassandra.bridge.CassandraBridge bridge()
-
stats
public org.apache.cassandra.analytics.stats.Stats stats()
Description copied from class: DataLayer
Override to plug in your own Stats instrumentation for recording internal events.
-
requestedFeatures
public java.util.List<org.apache.cassandra.spark.config.SchemaFeature> requestedFeatures()
- Overrides:
requestedFeatures in class DataLayer
-
ring
public org.apache.cassandra.spark.data.partitioner.CassandraRing ring()
- Specified by:
ring in class PartitionedDataLayer
-
tokenPartitioner
public org.apache.cassandra.spark.data.partitioner.TokenPartitioner tokenPartitioner()
- Specified by:
tokenPartitioner in class PartitionedDataLayer
-
executorService
protected java.util.concurrent.ExecutorService executorService()
Description copied from class: DataLayer
DataLayer implementation should provide an ExecutorService for doing blocking I/O when opening SSTable readers. It is the responsibility of the DataLayer implementation to appropriately size and manage this ExecutorService.
- Specified by:
executorService in class DataLayer
- Returns:
executor service
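One way an implementation might size and manage such an executor is a fixed pool of named daemon threads. This is a minimal sketch using only `java.util.concurrent`; the class name, the thread-name prefix, and the choice of a fixed-size pool are assumptions, not what CassandraDataLayer is documented to do.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of a pool dedicated to blocking I/O.
final class BlockingIoExecutorSketch {
    /**
     * Creates a fixed-size pool whose threads are named for easier debugging
     * and marked as daemon threads so they do not keep the JVM alive.
     */
    static ExecutorService newBlockingIoPool(int threads) {
        final AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = new ThreadFactory() {
            @Override
            public Thread newThread(Runnable task) {
                Thread thread = new Thread(task, "sstable-io-" + counter.incrementAndGet());
                thread.setDaemon(true);
                return thread;
            }
        };
        return Executors.newFixedThreadPool(threads, factory);
    }
}
```

Whoever creates the pool is also responsible for calling `shutdown()` on it once the readers are closed.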
-
jobId
public java.lang.String jobId()
-
sstableTimeRangeFilter
@NotNull public org.apache.cassandra.spark.sparksql.filters.SSTableTimeRangeFilter sstableTimeRangeFilter()
Description copied from class: DataLayer
Returns SSTableTimeRangeFilter to filter out SSTables based on min and max timestamp.
- Overrides:
sstableTimeRangeFilter in class DataLayer
- Returns:
SSTableTimeRangeFilter
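Filtering on a min and max timestamp usually comes down to an interval-overlap test. The sketch below shows that test in isolation; the class and method names are hypothetical, and treating both bounds as inclusive is an assumption rather than documented behavior of SSTableTimeRangeFilter.

```java
// Hypothetical illustration of an interval-overlap check on timestamps.
final class TimeRangeOverlapSketch {
    /**
     * Returns true when the SSTable's [sstableMin, sstableMax] timestamp span
     * intersects the requested [filterStart, filterEnd] span, so the SSTable
     * must be read; false means it can be skipped.
     */
    static boolean overlaps(long sstableMin, long sstableMax, long filterStart, long filterEnd) {
        return sstableMin <= filterEnd && sstableMax >= filterStart;
    }
}
```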
-
cqlTable
public org.apache.cassandra.spark.data.CqlTable cqlTable()
-
replicationFactor
public org.apache.cassandra.spark.data.ReplicationFactor replicationFactor(java.lang.String keyspace)
- Specified by:
replicationFactor in class PartitionedDataLayer
-
getAvailability
protected PartitionedDataLayer.AvailabilityHint getAvailability(org.apache.cassandra.spark.data.partitioner.CassandraInstance instance)
Description copied from class: PartitionedDataLayer
Data Layer can override this method to hint availability of a Cassandra instance so Bulk Reader attempts UP instances first, and avoids instances known to be down, e.g. if create snapshot request already failed.
- Overrides:
getAvailability in class PartitionedDataLayer
- Parameters:
instance - a Cassandra instance
- Returns:
availability hint
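The effect of such hints can be shown by ordering candidate replicas so that instances hinted as up are tried first. The sketch below is an assumption-laden illustration: the `Hint` enum and its three values are hypothetical stand-ins for PartitionedDataLayer.AvailabilityHint, whose actual constants are not listed on this page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; Hint is not the library's AvailabilityHint.
final class AvailabilityOrderingSketch {
    /** Declared from most to least preferred, so ordinal() doubles as a priority. */
    enum Hint { UP, UNKNOWN, DOWN }

    /**
     * Returns a copy of the instances sorted by hint priority. Instances with
     * no recorded hint are treated as UNKNOWN; the sort is stable, so the
     * original order is kept among instances with the same hint.
     */
    static List<String> orderByHint(List<String> instances, Map<String, Hint> hints) {
        List<String> ordered = new ArrayList<>(instances);
        ordered.sort(Comparator.comparingInt(
            (String instance) -> hints.getOrDefault(instance, Hint.UNKNOWN).ordinal()));
        return ordered;
    }
}
```

A map keyed by instance name matches the shape of the `availabilityHints` field above, though how that field is populated is not described here.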
-
listInstance
public java.util.concurrent.CompletableFuture<java.util.stream.Stream<org.apache.cassandra.spark.data.SSTable>> listInstance(int partitionId, @NotNull com.google.common.collect.Range<java.math.BigInteger> range, @NotNull org.apache.cassandra.spark.data.partitioner.CassandraInstance instance)
- Specified by:
listInstance in class PartitionedDataLayer
-
hashCode
public int hashCode()
- Overrides:
hashCode in class PartitionedDataLayer
-
equals
public boolean equals(java.lang.Object other)
- Overrides:
equals in class PartitionedDataLayer
-
bigNumberConfigMap
public java.util.Map<java.lang.String,org.apache.cassandra.bridge.BigNumberConfigImpl> bigNumberConfigMap()
-
bigNumberConfig
public org.apache.cassandra.bridge.BigNumberConfig bigNumberConfig(org.apache.cassandra.spark.data.CqlField field)
Description copied from class: DataLayer
DataLayer can override this method to return the BigInteger/BigDecimal precision/scale values for a given column.
- Overrides:
bigNumberConfig in class DataLayer
- Parameters:
field - the CQL field
- Returns:
a BigNumberConfig object that specifies the desired precision/scale for BigDecimal and BigInteger
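To make precision and scale concrete, the sketch below checks and coerces a `java.math.BigDecimal` against a fixed precision/scale pair. It uses only the JDK; the class and method names are hypothetical, the HALF_UP rounding is an assumption, and nothing here describes how BigNumberConfig is applied internally.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical illustration of decimal(precision, scale) semantics.
final class BigNumberPrecisionSketch {
    /**
     * True when the value has at most `scale` fractional digits and at most
     * `precision - scale` digits before the decimal point.
     */
    static boolean fits(BigDecimal value, int precision, int scale) {
        int integerDigits = value.precision() - value.scale();
        return value.scale() <= scale && integerDigits <= precision - scale;
    }

    /**
     * Rounds the value to `scale` fractional digits (HALF_UP is an assumed
     * rounding mode) and rejects it if it still exceeds the precision.
     */
    static BigDecimal coerce(BigDecimal value, int precision, int scale) {
        BigDecimal rounded = value.setScale(scale, RoundingMode.HALF_UP);
        if (!fits(rounded, precision, scale)) {
            throw new ArithmeticException(
                value + " does not fit precision " + precision + " and scale " + scale);
        }
        return rounded;
    }
}
```

For example, `123.456` has precision 6 and scale 3, so it fits (6, 3) as is but must be rounded to `123.46` to fit (5, 2).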
-
createCassandraRingFromRing
public org.apache.cassandra.spark.data.partitioner.CassandraRing createCassandraRingFromRing(org.apache.cassandra.spark.data.partitioner.Partitioner partitioner, org.apache.cassandra.spark.data.ReplicationFactor replicationFactor, o.a.c.sidecar.client.shaded.common.response.RingResponse ring)
-
startupValidate
public void startupValidate()
Description copied from interface: StartupValidatable
Performs startup validation using StartupValidator with the currently registered StartupValidations and throws a RuntimeException if any violations are found. Needs to be invoked once per execution, before any actual work is started.
- Specified by:
startupValidate in interface StartupValidatable
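The register-then-validate pattern described above can be sketched as follows. The class is a hypothetical stand-in and not the library's StartupValidator: each check is modeled as a supplier that returns null on success or a violation message on failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration; not the library's StartupValidator.
final class StartupValidationSketch {
    private final List<Supplier<String>> validations = new ArrayList<>();

    /** Registers a check that returns null when it passes, or a message when it fails. */
    void register(Supplier<String> validation) {
        validations.add(validation);
    }

    /** Runs every registered check and fails fast if any violation was reported. */
    void perform() {
        List<String> violations = new ArrayList<>();
        for (Supplier<String> validation : validations) {
            String violation = validation.get();
            if (violation != null) {
                violations.add(violation);
            }
        }
        if (!violations.isEmpty()) {
            throw new RuntimeException("Startup validation failed: " + violations);
        }
    }
}
```

Running all checks before throwing lets a single failure report list every violation, which is the design choice this sketch makes; the real validator may behave differently.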
-
initializeClusterConfig
protected java.util.Set<o.a.c.sidecar.client.shaded.client.SidecarInstance> initializeClusterConfig(ClientConfig options)
-
getEffectiveCassandraVersionForRead
protected java.lang.String getEffectiveCassandraVersionForRead(java.util.Set<o.a.c.sidecar.client.shaded.client.SidecarInstance> clusterConfig, o.a.c.sidecar.client.shaded.common.response.NodeSettings nodeSettings)
-
dialHome
protected void dialHome(@NotNull ClientConfig options)
-
clearSnapshot
protected void clearSnapshot(java.util.Set<o.a.c.sidecar.client.shaded.client.SidecarInstance> clusterConfig, @NotNull ClientConfig options)
-
getSizing
protected org.apache.cassandra.spark.data.Sizing getSizing(java.util.concurrent.CompletableFuture<o.a.c.sidecar.client.shaded.common.response.RingResponse> ringFuture, org.apache.cassandra.spark.data.ReplicationFactor replicationFactor, ClientConfig options)
Returns the Sizing object based on the sizing option provided by the user, or DefaultSizing as the default sizing
- Parameters:
ringFuture - a future with a view of the ring
replicationFactor - the replication factor
options - the ClientConfig options
- Returns:
the Sizing object based on the sizing option provided by the user
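Choosing a strategy from a user option with a default fallback can be sketched as a small registry lookup. Everything here is hypothetical: `SizingStrategy` is not the library's Sizing interface, the option names are invented, and falling back to the default for an unrecognized option is an assumption that the page does not confirm.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration; not the library's Sizing or DefaultSizing.
final class SizingSelectionSketch {
    /** Stand-in strategy: maps the requested core count to an effective one. */
    interface SizingStrategy {
        int effectiveCores(int requestedCores);
    }

    /** Default strategy: use the requested core count unchanged. */
    static final SizingStrategy DEFAULT = requestedCores -> requestedCores;

    private final Map<String, SizingStrategy> byName = new HashMap<>();

    /** Registers a strategy under a case-insensitive option name. */
    void register(String name, SizingStrategy strategy) {
        byName.put(name.toLowerCase(Locale.ROOT), strategy);
    }

    /** Returns the strategy named by the option, or DEFAULT when absent or unknown. */
    SizingStrategy select(String option) {
        if (option == null) {
            return DEFAULT;
        }
        SizingStrategy strategy = byName.get(option.trim().toLowerCase(Locale.ROOT));
        return strategy != null ? strategy : DEFAULT;
    }
}
```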
-
await
protected void await(java.util.concurrent.CountDownLatch latch)
-
-