All Classes and Interfaces (Spark 3.5.1 JavaDoc)

Class

Description

AbsoluteError

Class for absolute error loss calculation (for regression).
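As a rough illustration of the absolute error loss described above, the sketch below computes L(y, F(x)) = |y - F(x)| and a subgradient with respect to the prediction. The class and method names are hypothetical, the code is plain Java with no Spark dependency, and returning -1.0 when the residual is exactly zero is an assumption of this sketch.

```java
// Hypothetical plain-Java sketch of absolute error loss for regression; not Spark's implementation.
final class AbsoluteErrorSketch {
    // Loss for one example: |label - prediction|.
    static double loss(double prediction, double label) {
        return Math.abs(label - prediction);
    }

    // Subgradient of |label - prediction| with respect to prediction, i.e. -sign(label - prediction).
    // The loss is not differentiable at a zero residual; this sketch returns -1.0 there (assumption).
    static double gradient(double prediction, double label) {
        return (label - prediction < 0) ? 1.0 : -1.0;
    }

    public static void main(String[] args) {
        System.out.println(loss(3.0, 1.0) + " " + gradient(3.0, 1.0));
    }
}
```

Because the gradient only carries the sign of the residual, large outliers do not dominate the update the way they would under squared error.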

AbstractLauncher<T extends AbstractLauncher<T>>

Base class for launcher implementations.

AcceptsLatestSeenOffset

Indicates that the source accepts the latest seen offset, which requires streaming execution to provide the latest seen offset when restarting the streaming query from a checkpoint.

AccumulableInfo

:: DeveloperApi :: Information about an AccumulatorV2 modified during a task or stage.

AccumulableInfo

AccumulableInfoSerializer

AccumulatorContext

An internal class used to track accumulators by Spark itself.

AccumulatorV2<IN,OUT>

The base class for accumulators, which can accumulate inputs of type IN and produce output of type OUT.

ActivationFunction

Trait for functions and their derivatives for functional layers.

AFTSurvivalRegression

Fit a parametric survival regression model named accelerated failure time (AFT) model (see Accelerated failure time model (Wikipedia)) based on the Weibull distribution of the survival time.

AFTSurvivalRegressionModel

Model produced by AFTSurvivalRegression.

AFTSurvivalRegressionParams

Params for accelerated failure time (AFT) regression.

AggregatedDialect

AggregatedDialect can unify multiple dialects into one virtual Dialect.

AggregateFunc

Base class of the Aggregate Functions.

AggregateFunction<S extends Serializable,R>

Interface for a function that produces a result value by aggregating over multiple input rows.

AggregatingEdgeContext<VD,ED,A>

Aggregation

Aggregation in SQL statement.

Aggregator<K,V,C>

:: DeveloperApi :: A set of functions used to aggregate data.

Aggregator<IN,BUF,OUT>

A base class for user-defined aggregations, which can be used in Dataset operations to take all of the elements of a group and reduce them to a single value.

Algo

Enum to select the algorithm for the decision tree.

AllJobsCancelled

AllReceiverIds

A message used by ReceiverTracker to ask for the IDs of all receivers still stored in ReceiverTrackerEndpoint.

ALS

Alternating Least Squares (ALS) matrix factorization.

ALS

Alternating Least Squares matrix factorization.

ALS.InBlock$

ALS.LeastSquaresNESolver

Trait for least squares solvers applied to the normal equation.

ALS.Rating<ID>

Rating class for better code readability.

ALSModel

Model fitted by ALS.

ALSModelParams

Common params for ALS and ALSModel.

ALSParams

Common params for ALS.

AlwaysFalse

A predicate that always evaluates to false.

AlwaysFalse

A filter that always evaluates to false.

AlwaysTrue

A predicate that always evaluates to true.

AlwaysTrue

A filter that always evaluates to true.

AnalysisException

Thrown when a query fails to analyze, usually because the query itself is invalid.

And

A predicate that evaluates to true iff both left and right evaluate to true.

And

A filter that evaluates to true iff both left and right evaluate to true.

ANOVATest

ANOVA Test for continuous data.
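The central quantity of a one-way ANOVA is the F-value: the variance between group means divided by the variance within groups. The sketch below computes it for one continuous feature grouped by a categorical label. It is a hypothetical plain-Java illustration, assumes at least two groups and more rows than groups, and omits the p-value (which comes from the F distribution) and any of Spark's distributed aggregation.

```java
// Hypothetical sketch: one-way ANOVA F-value for one continuous feature grouped by label.
// Assumes at least two groups and more total values than groups.
final class OneWayAnovaSketch {
    // groups[g] holds the feature values of all rows whose label is g.
    static double fValue(double[][] groups) {
        int k = groups.length;
        int n = 0;
        double grandSum = 0.0;
        for (double[] g : groups) {
            n += g.length;
            for (double v : g) grandSum += v;
        }
        double grandMean = grandSum / n;

        double ssBetween = 0.0; // spread of group means around the grand mean
        double ssWithin = 0.0;  // spread of values around their own group mean
        for (double[] g : groups) {
            double mean = 0.0;
            for (double v : g) mean += v;
            mean /= g.length;
            ssBetween += g.length * (mean - grandMean) * (mean - grandMean);
            for (double v : g) ssWithin += (v - mean) * (v - mean);
        }
        // F = (SSB / (k - 1)) / (SSW / (n - k))
        return (ssBetween / (k - 1)) / (ssWithin / (n - k));
    }

    public static void main(String[] args) {
        System.out.println(fValue(new double[][] {{1, 2, 3}, {4, 5, 6}}));
    }
}
```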

AnyDataType

An AbstractDataType that matches any concrete data type.

AnyTimestampType

AnyTimestampTypeExpression

ApiHelper

ApiRequestContext

AppHistoryServerPlugin

An interface for creating history listeners (to replay event logs) defined in other modules, such as SQL, and for setting up the plugin's UI to rebuild the history UI.

ApplicationAttemptInfo

ApplicationEnvironmentInfo

ApplicationInfo

ApplicationStatus

ApplyInPlace

Implements in-place application of functions in the arrays.

ApproximateEvaluator<U,R>

An object that computes a function incrementally by merging in results of type U from multiple tasks.

AppStatusUtils

AreaUnderCurve

Computes the area under the curve (AUC) using the trapezoidal rule.
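The trapezoidal rule approximates the area under a curve given as sorted points by summing, for each pair of neighbouring points, the segment width times the average of the two heights. The class below is a hypothetical plain-Java sketch of that rule, not Spark's AreaUnderCurve code; for an ROC curve the x values would be false positive rates and the y values true positive rates.

```java
// Hypothetical sketch: area under a piecewise-linear curve via the trapezoidal rule.
final class TrapezoidAuc {
    // x and y hold the curve's points, assumed sorted by x.
    static double area(double[] x, double[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("x and y must have the same length");
        double area = 0.0;
        for (int i = 1; i < x.length; i++) {
            // Each segment contributes width * average height.
            area += (x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2.0;
        }
        return area;
    }

    public static void main(String[] args) {
        System.out.println(area(new double[] {0.0, 0.5, 1.0}, new double[] {0.0, 1.0, 1.0}));
    }
}
```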

ARPACK

ARPACK routines for MLlib's vectors and matrices.

ArrayType

ArrowColumnVector

A column vector backed by Apache Arrow.

ArrowUtils

AskPermissionToCommitOutput

AssociationRules

Generates association rules from an RDD[FreqItemset[Item]].

AssociationRules.Rule<Item>

An association rule between sets of items.

AsyncEventQueue

An asynchronous queue for events.

AsyncRDDActions<T>

A set of asynchronous RDD actions available through an implicit conversion.

Attribute

Abstract class for ML attributes.

AttributeFactory

Trait for ML attribute factories.

AttributeGroup

Attributes that describe a vector ML column.

AttributeKeys

Keys used to store attributes.

AttributeType

An enum-like type for attribute types: AttributeType$.Numeric, AttributeType$.Nominal, and AttributeType$.Binary.

Avg

An aggregate function that returns the mean of all the values in a group.

AvroUtils

AvroUtils.AvroMatchedField$

AvroUtils.AvroSchemaHelper

Helper class to perform field lookup/matching on Avro schemas.

AvroUtils.RowReader

BarrierCoordinatorMessage

BarrierTaskContext

:: Experimental :: A TaskContext with extra contextual info and tooling for tasks in a barrier stage.

BarrierTaskInfo

:: Experimental :: Carries all task infos of a barrier task.

BaseAppResource

Base class for resource handlers that use app-specific data.

BaseReadWrite

Trait for MLWriter and MLReader.

BaseRelation

Represents a collection of tuples with a known schema.

BaseRRDD<T,U>

BaseStreamingAppResource

Base class for streaming API handlers, provides easy access to the streaming listener that holds the app's information.

BasicBlockReplicationPolicy

Batch

A physical representation of a data source scan for batch queries.

BatchInfo

:: DeveloperApi :: Class having information on completed batches.

BatchStatus

BatchWrite

An interface that defines how to write the data to data source for batch processing.

BernoulliCellSampler<T>

:: DeveloperApi :: A sampler based on Bernoulli trials for partitioning a data sequence.

BernoulliSampler<T>

:: DeveloperApi :: A sampler based on Bernoulli trials.

Binarizer

Binarize a column of continuous features given a threshold.
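Binarization maps each continuous value to 0.0 or 1.0 by comparing it with a threshold; Spark's feature documentation states that values strictly greater than the threshold become 1.0 and all others become 0.0. The hypothetical plain-Java sketch below applies that rule to an array rather than a DataFrame column.

```java
import java.util.Arrays;

// Hypothetical sketch of threshold binarization; not Spark's Binarizer implementation.
final class BinarizerSketch {
    // Values strictly greater than threshold become 1.0; values equal to or below it become 0.0.
    static double[] binarize(double[] values, double threshold) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] > threshold ? 1.0 : 0.0;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(binarize(new double[] {0.1, 0.5, 0.9}, 0.5)));
    }
}
```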

BinaryAttribute

A binary attribute.

BinaryClassificationEvaluator

Evaluator for binary classification, which expects input columns rawPrediction, label and an optional weight column.

BinaryClassificationMetricComputer

Trait for a binary classification evaluation metric computer.

BinaryClassificationMetrics

Evaluator for binary classification.

BinaryClassificationSummary

Abstraction for binary classification results for a given model.

BinaryConfusionMatrix

Trait for a binary confusion matrix.

BinaryLogisticRegressionSummary

Abstraction for binary logistic regression results for a given model.

BinaryLogisticRegressionSummaryImpl

Binary logistic regression results for a given model.

BinaryLogisticRegressionTrainingSummary

Abstraction for binary logistic regression training results.

BinaryLogisticRegressionTrainingSummaryImpl

Binary logistic regression training results.

BinaryRandomForestClassificationSummary

Abstraction for BinaryRandomForestClassification results for a given model.

BinaryRandomForestClassificationSummaryImpl

Binary RandomForestClassification results for a given model.

BinaryRandomForestClassificationTrainingSummary

Abstraction for BinaryRandomForestClassification training results.

BinaryRandomForestClassificationTrainingSummaryImpl

Binary RandomForestClassification training results.

BinarySample

Class that represents the group and value of a sample.

BinaryType

The data type representing Array[Byte] values.

BinomialBounds

Utility functions that help us determine bounds on adjusted sampling rate to guarantee exact sample size with high confidence when sampling without replacement.

BisectingKMeans

A bisecting k-means algorithm based on the paper "A comparison of document clustering techniques" by Steinbach, Karypis, and Kumar, with modification to fit Spark.

BisectingKMeans

A bisecting k-means algorithm based on the paper "A comparison of document clustering techniques" by Steinbach, Karypis, and Kumar, with modification to fit Spark.

BisectingKMeansModel

Model fitted by BisectingKMeans.

BisectingKMeansModel

Clustering model produced by BisectingKMeans.

BisectingKMeansModel.SaveLoadV1_0$

BisectingKMeansModel.SaveLoadV2_0$

BisectingKMeansModel.SaveLoadV3_0$

BisectingKMeansParams

Common params for BisectingKMeans and BisectingKMeansModel.

BisectingKMeansSummary

Summary of BisectingKMeans.

BLAS

BLAS routines for MLlib's vectors and matrices.

BLAS

BLAS routines for MLlib's vectors and matrices.

BlockData

Abstracts away how blocks are stored and provides different ways to read the underlying block data.

BlockEvictionHandler

BlockGeneratorListener

Listener object for BlockGenerator events.

BlockId

:: DeveloperApi :: Identifies a particular Block of data, usually associated with a single file.

BlockInfoWrapper

BlockManagerId

:: DeveloperApi :: This class represents a unique identifier for a BlockManager.

BlockManagerMessages

BlockManagerMessages.BlockLocationsAndStatus

The response message of GetLocationsAndStatus request.

BlockManagerMessages.BlockLocationsAndStatus$

BlockManagerMessages.BlockManagerHeartbeat

BlockManagerMessages.BlockManagerHeartbeat$

BlockManagerMessages.DecommissionBlockManager$

BlockManagerMessages.DecommissionBlockManagers

BlockManagerMessages.DecommissionBlockManagers$

BlockManagerMessages.GetBlockStatus

BlockManagerMessages.GetBlockStatus$

BlockManagerMessages.GetExecutorEndpointRef

BlockManagerMessages.GetExecutorEndpointRef$

BlockManagerMessages.GetLocations

BlockManagerMessages.GetLocations$

BlockManagerMessages.GetLocationsAndStatus

BlockManagerMessages.GetLocationsAndStatus$

BlockManagerMessages.GetLocationsMultipleBlockIds

BlockManagerMessages.GetLocationsMultipleBlockIds$

BlockManagerMessages.GetMatchingBlockIds

BlockManagerMessages.GetMatchingBlockIds$

BlockManagerMessages.GetMemoryStatus$

BlockManagerMessages.GetPeers

BlockManagerMessages.GetPeers$

BlockManagerMessages.GetRDDBlockVisibility

BlockManagerMessages.GetRDDBlockVisibility$

BlockManagerMessages.GetReplicateInfoForRDDBlocks

BlockManagerMessages.GetReplicateInfoForRDDBlocks$

BlockManagerMessages.GetShufflePushMergerLocations

BlockManagerMessages.GetShufflePushMergerLocations$

BlockManagerMessages.GetStorageStatus$

BlockManagerMessages.IsExecutorAlive

BlockManagerMessages.IsExecutorAlive$

BlockManagerMessages.MarkRDDBlockAsVisible

BlockManagerMessages.MarkRDDBlockAsVisible$

BlockManagerMessages.RegisterBlockManager

BlockManagerMessages.RegisterBlockManager$

BlockManagerMessages.RemoveBlock

BlockManagerMessages.RemoveBlock$

BlockManagerMessages.RemoveBroadcast

BlockManagerMessages.RemoveBroadcast$

BlockManagerMessages.RemoveExecutor

BlockManagerMessages.RemoveExecutor$

BlockManagerMessages.RemoveRdd

BlockManagerMessages.RemoveRdd$

BlockManagerMessages.RemoveShuffle

BlockManagerMessages.RemoveShuffle$

BlockManagerMessages.RemoveShufflePushMergerLocation

BlockManagerMessages.RemoveShufflePushMergerLocation$

BlockManagerMessages.ReplicateBlock

BlockManagerMessages.ReplicateBlock$

BlockManagerMessages.StopBlockManagerMaster$

BlockManagerMessages.ToBlockManagerMaster

BlockManagerMessages.ToBlockManagerMasterStorageEndpoint

BlockManagerMessages.TriggerHeapHistogram$

Driver to Executor message to get a heap histogram.

BlockManagerMessages.TriggerThreadDump$

Driver to Executor message to trigger a thread dump.

BlockManagerMessages.UpdateBlockInfo

BlockManagerMessages.UpdateBlockInfo$

BlockManagerMessages.UpdateRDDBlockTaskInfo

BlockManagerMessages.UpdateRDDBlockTaskInfo$

BlockManagerMessages.UpdateRDDBlockVisibility

BlockManagerMessages.UpdateRDDBlockVisibility$

BlockMatrix

Represents a distributed matrix in blocks of local matrices.

BlockNotFoundException

BlockReplicationPolicy

:: DeveloperApi :: BlockReplicationPolicy provides logic for prioritizing a sequence of peers for replicating blocks.

BlockReplicationUtils

BlockStatus

BlockUpdatedInfo

:: DeveloperApi :: Stores information about a block status in a block manager.

BloomFilter

A Bloom filter is a space-efficient probabilistic data structure that offers an approximate containment test with one-sided error: if it claims that an item is contained in it, this might be in error, but if it claims that an item is not contained in it, then this is definitely true.
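The one-sided error follows from how a Bloom filter works: inserting an item sets k bits, and a lookup answers "absent" as soon as any of those k bits is clear. The hypothetical sketch below derives the k bit positions by double hashing (h1 + i * h2) from Java's hashCode; that hashing scheme is an assumption of the sketch, and it is not Spark's BloomFilter implementation or serialization format.

```java
import java.util.BitSet;

// Hypothetical Bloom filter sketch with k bit positions derived by double hashing.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) throw new IllegalArgumentException("sizes must be positive");
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // i-th bit index for an item: (h1 + i * h2) mod numBits, with h2 derived from h1 and forced odd.
    private int index(Object item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 16) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void put(Object item) {
        for (int i = 0; i < numHashes; i++) bits.set(index(item, i));
    }

    // false means "definitely not added"; true means "possibly added" (false positives are possible).
    boolean mightContain(Object item) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(item, i))) return false;
        }
        return true;
    }
}
```

Items that were added always find all their bits set, so the filter never produces false negatives; unrelated items may collide on every bit, which is the source of false positives.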

BloomFilter.Version

BooleanParam

Specialized version of Param[Boolean] for Java.

BooleanType

The data type representing Boolean values.

BooleanTypeExpression

BoostingStrategy

Configuration options for GradientBoostedTrees.

BoundedDouble

A Double value with error bars and associated confidence.

BoundFunction

Represents a function that is bound to an input type.

BreezeUtil

In-place DGEMM and DGEMV for Breeze.

Broadcast<T>

A broadcast variable.

BroadcastBlockId

BroadcastFactory

An interface for all the broadcast implementations in Spark (to allow multiple broadcast implementations).

BucketedRandomProjectionLSH

This BucketedRandomProjectionLSH implements Locality Sensitive Hashing functions for Euclidean distance metrics.
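Spark's documentation describes the hash function of bucketed random projection as h(x) = floor((x · v) / bucketLength), where v is a random unit vector: nearby points project to nearby values and therefore tend to share a bucket. The hypothetical plain-Java sketch below shows one such hash function; drawing v from a normalized Gaussian is an assumption of the sketch, and it does not cover Spark's use of multiple hash tables or its join APIs.

```java
import java.util.Random;

// Hypothetical sketch of one random-projection hash for Euclidean LSH.
final class RandomProjectionHashSketch {
    // Draws a random Gaussian direction and normalizes it to unit length.
    static double[] randomUnitVector(int dim, Random rng) {
        double[] v = new double[dim];
        double norm = 0.0;
        for (int i = 0; i < dim; i++) {
            v[i] = rng.nextGaussian();
            norm += v[i] * v[i];
        }
        norm = Math.sqrt(norm);
        for (int i = 0; i < dim; i++) v[i] /= norm;
        return v;
    }

    // Projects x onto v and returns the index of the bucket of width bucketLength.
    static long hash(double[] x, double[] v, double bucketLength) {
        double dot = 0.0;
        for (int i = 0; i < x.length; i++) dot += x[i] * v[i];
        return (long) Math.floor(dot / bucketLength);
    }

    public static void main(String[] args) {
        double[] v = randomUnitVector(2, new Random(42));
        System.out.println(hash(new double[] {3.0, 4.0}, v, 2.0));
    }
}
```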

BucketedRandomProjectionLSHModel

Model produced by BucketedRandomProjectionLSH, where multiple random vectors are stored.

BucketedRandomProjectionLSHParams

Params for BucketedRandomProjectionLSH.

Bucketizer

Bucketizer maps a column of continuous features to a column of feature buckets.
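Bucketing is driven by a strictly increasing array of splits: bucket i holds values in [splits[i], splits[i + 1]), and the last bucket also includes its upper bound. The hypothetical plain-Java sketch below looks up the bucket index with a binary search and simply throws for out-of-range or NaN values; Spark's Bucketizer instead lets the handleInvalid parameter decide how such values are treated.

```java
import java.util.Arrays;

// Hypothetical sketch of split-based bucketing; splits must be strictly increasing.
final class BucketizerSketch {
    static int bucket(double[] splits, double value) {
        int last = splits.length - 1;
        if (Double.isNaN(value) || value < splits[0] || value > splits[last]) {
            throw new IllegalArgumentException("value " + value + " is outside the splits");
        }
        if (value == splits[last]) return last - 1; // the last bucket is closed on the right
        int idx = Arrays.binarySearch(splits, value);
        // An exact hit on a split starts that bucket; otherwise use the split just below value.
        return idx >= 0 ? idx : (-idx - 1) - 1;
    }

    public static void main(String[] args) {
        System.out.println(bucket(new double[] {0.0, 10.0, 20.0}, 15.0));
    }
}
```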

BufferReleasingInputStream

Helper class that ensures a ManagedBuffer is released upon InputStream.close() and also detects stream corruption if streamCompressedOrEncrypted is true.

BytecodeUtils

Includes a utility function to test whether a function accesses a specific attribute of an object.

ByteExactNumeric

ByteType

The data type representing Byte values.

ByteTypeExpression

CachedBatch

Basic interface that all cached batches of data must support.

CachedBatchSerializer

Provides APIs that handle transformations of SQL data associated with the cache/persist APIs.

CacheId

CalendarInterval

The class representing calendar intervals.

CalendarIntervalType

The data type representing calendar intervals.

CaseInsensitiveStringMap

Case-insensitive map of string keys to string values.

Cast

Represents a cast expression in the public logical expression API.

Catalog

Catalog interface for Spark.

CatalogExtension

An API to extend the Spark built-in session catalog.

CatalogMetadata

A catalog in Spark, as returned by the listCatalogs method defined in Catalog.

CatalogNotFoundException

CatalogPlugin

A marker interface to provide a catalog implementation for Spark.

Catalogs

CatalogV2Implicits

Conversion helpers for working with v2 CatalogPlugin.

CatalogV2Implicits.BucketSpecHelper

CatalogV2Implicits.CatalogHelper

CatalogV2Implicits.ColumnsHelper

CatalogV2Implicits.FunctionIdentifierHelper

CatalogV2Implicits.IdentifierHelper

CatalogV2Implicits.MultipartIdentifierHelper

CatalogV2Implicits.NamespaceHelper

CatalogV2Implicits.PartitionTypeHelper

CatalogV2Implicits.TableIdentifierHelper

CatalogV2Implicits.TransformHelper

CatalogV2Util

CatalystScan

:: Experimental :: An interface for experimenting with a more direct connection to the query planner.

CategoricalSplit

Split which tests a categorical feature.

CausedBy

Extractor Object for pulling out the root cause of an error.

CharType

CheckpointReader

CheckpointState

Enumeration to manage state transitions of an RDD through checkpointing.

ChildFirstURLClassLoader

A mutable class loader that gives preference to its own URLs over the parent class loader when loading classes and resources.

ChiSqSelector

Deprecated. Use UnivariateFeatureSelector instead.

ChiSqSelector

Creates a ChiSquared feature selector.

ChiSqSelectorModel

Model fitted by ChiSqSelector.

ChiSqSelectorModel

Chi Squared selector model.

ChiSqSelectorModel.ChiSqSelectorModelWriter

ChiSqSelectorModel.SaveLoadV1_0$

ChiSqTest

Conduct the chi-squared test for the input RDDs using the specified method.

ChiSqTest.Method

A chi-squared test method, identified by its String name (param: name).

ChiSqTest.Method$

ChiSqTest.NullHypothesis$

ChiSqTestResult

Object containing the test results for the chi-squared hypothesis test.

ChiSquareTest

Chi-square hypothesis testing for categorical data.
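Pearson's chi-squared test of independence compares observed counts in a contingency table with the counts expected if rows and columns were independent, expected[i][j] = rowSum[i] * colSum[j] / total. The hypothetical sketch below computes only that statistic for a small in-memory table; converting (feature, label) pairs into the table and deriving the p-value from the chi-squared distribution are left out.

```java
// Hypothetical sketch: Pearson's chi-squared statistic for an r x c contingency table.
// Assumes every row and column has a positive total, so no expected count is zero.
final class ChiSquaredSketch {
    static double statistic(double[][] observed) {
        int rows = observed.length, cols = observed[0].length;
        double[] rowSums = new double[rows];
        double[] colSums = new double[cols];
        double total = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSums[i] += observed[i][j];
                colSums[j] += observed[i][j];
                total += observed[i][j];
            }
        }
        double chi2 = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // Expected count if the row and column variables were independent.
                double expected = rowSums[i] * colSums[j] / total;
                double diff = observed[i][j] - expected;
                chi2 += diff * diff / expected;
            }
        }
        return chi2; // degrees of freedom: (rows - 1) * (cols - 1)
    }

    public static void main(String[] args) {
        System.out.println(statistic(new double[][] {{10, 20}, {20, 10}}));
    }
}
```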

CholeskyDecomposition

Compute Cholesky decomposition.
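A Cholesky decomposition factors a symmetric positive-definite matrix A into L * L^T with L lower triangular. The hypothetical sketch below is the textbook Cholesky-Banachiewicz recurrence on a small dense array; it only illustrates the factorization and is not how Spark computes it (Spark delegates to native linear algebra routines).

```java
// Hypothetical sketch: Cholesky-Banachiewicz factorization A = L * L^T for a small SPD matrix.
final class CholeskySketch {
    static double[][] cholesky(double[][] a) {
        int n = a.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = a[i][j];
                for (int k = 0; k < j; k++) sum -= l[i][k] * l[j][k];
                if (i == j) {
                    // A non-positive pivot means A is not positive definite.
                    if (sum <= 0) throw new IllegalArgumentException("matrix is not positive definite");
                    l[i][i] = Math.sqrt(sum);
                } else {
                    l[i][j] = sum / l[j][j];
                }
            }
        }
        return l;
    }

    public static void main(String[] args) {
        double[][] l = cholesky(new double[][] {{4, 2}, {2, 3}});
        System.out.println(l[0][0] + " " + l[1][0] + " " + l[1][1]);
    }
}
```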

ClassificationLoss

ClassificationModel<FeaturesType,M extends ClassificationModel<FeaturesType,M>>

Model produced by a Classifier.

ClassificationModel

Represents a classification model that predicts to which of a set of categories an example belongs.

ClassificationSummary

Abstraction for multiclass classification results for a given model.

Classifier<FeaturesType,E extends Classifier<FeaturesType,E,M>,M extends ClassificationModel<FeaturesType,M>>

Single-label binary or multiclass classification.

ClassifierParams

(private[spark]) Params for classification.

CleanerListener

Listener class used when any item has been cleaned by the Cleaner class.

CleanupTask

Classes that represent cleaning tasks.

CleanupTaskWeakReference

A WeakReference associated with a CleanupTask.

Clock

An interface to represent clocks, so that they can be mocked out in unit tests.

ClosureCleaner

A cleaner that renders closures serializable if they can be done so safely.

ClusterData

Helper class for storing model data.

ClusteredDistribution

A distribution where tuples that share the same values for clustering expressions are co-located in the same partition.

ClusteringEvaluator

Evaluator for clustering results.

ClusteringMetrics

Metrics for clustering, which expects two input columns: prediction and label.

ClusteringSummary

Summary of clustering algorithms.

CoarseGrainedClusterMessage

CoarseGrainedClusterMessages

CoarseGrainedClusterMessages.AddWebUIFilter

CoarseGrainedClusterMessages.AddWebUIFilter$

CoarseGrainedClusterMessages.DecommissionExecutor$

CoarseGrainedClusterMessages.DecommissionExecutorsOnHost

CoarseGrainedClusterMessages.DecommissionExecutorsOnHost$

CoarseGrainedClusterMessages.ExecutorDecommissioning

CoarseGrainedClusterMessages.ExecutorDecommissioning$

CoarseGrainedClusterMessages.ExecutorDecommissionSigReceived$

CoarseGrainedClusterMessages.GetExecutorLossReason

CoarseGrainedClusterMessages.GetExecutorLossReason$

CoarseGrainedClusterMessages.IsExecutorAlive

CoarseGrainedClusterMessages.IsExecutorAlive$

CoarseGrainedClusterMessages.KillExecutors

CoarseGrainedClusterMessages.KillExecutors$

CoarseGrainedClusterMessages.KillExecutorsOnHost

CoarseGrainedClusterMessages.KillExecutorsOnHost$

CoarseGrainedClusterMessages.KillTask

CoarseGrainedClusterMessages.KillTask$

CoarseGrainedClusterMessages.LaunchedExecutor

CoarseGrainedClusterMessages.LaunchedExecutor$

CoarseGrainedClusterMessages.LaunchTask

CoarseGrainedClusterMessages.LaunchTask$

CoarseGrainedClusterMessages.MiscellaneousProcessAdded

CoarseGrainedClusterMessages.MiscellaneousProcessAdded$

CoarseGrainedClusterMessages.RegisterClusterManager

CoarseGrainedClusterMessages.RegisterClusterManager$

CoarseGrainedClusterMessages.RegisterExecutor

CoarseGrainedClusterMessages.RegisterExecutor$

CoarseGrainedClusterMessages.RemoveExecutor

CoarseGrainedClusterMessages.RemoveExecutor$

CoarseGrainedClusterMessages.RemoveWorker

CoarseGrainedClusterMessages.RemoveWorker$

CoarseGrainedClusterMessages.RequestExecutors

CoarseGrainedClusterMessages.RequestExecutors$

CoarseGrainedClusterMessages.RetrieveDelegationTokens$

CoarseGrainedClusterMessages.RetrieveLastAllocatedExecutorId$

CoarseGrainedClusterMessages.RetrieveSparkAppConfig

CoarseGrainedClusterMessages.RetrieveSparkAppConfig$

CoarseGrainedClusterMessages.ReviveOffers$

CoarseGrainedClusterMessages.SetupDriver

CoarseGrainedClusterMessages.SetupDriver$

CoarseGrainedClusterMessages.ShufflePushCompletion

CoarseGrainedClusterMessages.ShufflePushCompletion$

CoarseGrainedClusterMessages.Shutdown

CoarseGrainedClusterMessages.Shutdown$

CoarseGrainedClusterMessages.SparkAppConfig

CoarseGrainedClusterMessages.SparkAppConfig$

CoarseGrainedClusterMessages.StatusUpdate

CoarseGrainedClusterMessages.StatusUpdate$

CoarseGrainedClusterMessages.StopDriver$

CoarseGrainedClusterMessages.StopExecutor$

CoarseGrainedClusterMessages.StopExecutors$

CoarseGrainedClusterMessages.UpdateDelegationTokens

CoarseGrainedClusterMessages.UpdateDelegationTokens$

CodegenMetrics

Metrics for code generation.

CoGroupedRDD<K>

:: DeveloperApi :: An RDD that cogroups its parents.

CoGroupFunction<K,V1,V2,R>

A function that returns zero or more output records from each grouping key and its values from 2 Datasets.

CollectionAccumulator<T>

An accumulator for collecting a list of elements.

CollectionsUtils

Column

A column in Spark, as returned by listColumns method in Catalog.

Column

A column that will be computed based on the data in a DataFrame.

Column

An interface representing a column of a Table.

ColumnarArray

Array abstraction in ColumnVector.

ColumnarBatch

This class wraps multiple ColumnVectors as a row-wise table.

ColumnarBatchRow

This class wraps an array of ColumnVector and provides a row view.

ColumnarMap

Map abstraction in ColumnVector.

ColumnarRow

Row abstraction in ColumnVector.

ColumnDefaultValue

A class representing the default value of a column.

ColumnName

A convenient class used for constructing schema.

ColumnPruner

Utility transformer for removing temporary columns from a DataFrame.

ColumnStatistics

An interface to represent column statistics, which is part of Statistics.

ColumnVector

An interface representing in-memory columnar data in Spark.

CommandLineLoggingUtils

CommandLineUtils

Contains basic command line parsing functionality and methods to parse some common Spark CLI options.

CompilationErrors

ComplexFutureAction<T>

A FutureAction for actions that could trigger multiple Spark jobs.

CompositeReadLimit

Represents a ReadLimit where the MicroBatchStream should scan approximately the given maximum number of rows, with at least the given minimum number of rows.

CompressionCodec

:: DeveloperApi :: CompressionCodec allows the customization of choosing different compression implementations to be used in block storage.

Configurable

A trait to implement the Configurable interface.

ConnectedComponents

Connected components algorithm.

ConstantInputDStream<T>

An input stream that always returns the same RDD on each time step.

ContextAwareIterator<T>

:: DeveloperApi :: A TaskContext aware iterator.

ContextBarrierId

For each barrier stage attempt, at most one barrier() call can be active at any time, so (stageId, stageAttemptId) can be used to identify the stage attempt from which the barrier() call originates.

ContinuousPartitionReader<T>

A variation on PartitionReader for use with continuous streaming processing.

ContinuousPartitionReaderFactory

A variation on PartitionReaderFactory that returns ContinuousPartitionReader instead of PartitionReader.

ContinuousSplit

Split which tests a continuous feature.

ContinuousStream

A SparkDataStream for streaming queries with continuous mode.

CoordinateMatrix

Represents a matrix in coordinate format.

Correlation

API for correlation functions in MLlib, compatible with DataFrames and Datasets.

Correlation

Trait for correlation algorithms.

CorrelationNames

Maintains supported and default correlation names.

Correlations

Delegates computation to the specific correlation object based on the input method name.

CosineSilhouette

An efficient, parallel implementation of the Silhouette computation using the cosine distance measure.

Count

An aggregate function that returns the number of non-null values of the specified column in a group.

CountingWritableChannel

CountMinSketch

A Count-min sketch is a probabilistic data structure used for frequency estimation (approximate counts of individual items) using sub-linear space.
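A count-min sketch keeps a small table of counters with one row per hash function: adding an item increments one counter in every row, and the estimate for an item is the minimum over its counters. Collisions can only inflate counters, so estimates may overcount but never undercount. The sketch below is a hypothetical plain-Java illustration; its per-row mixing of hashCode() is an assumed hashing scheme, not the one Spark's CountMinSketch uses.

```java
// Hypothetical count-min sketch: depth rows of width counters each.
final class SimpleCountMinSketch {
    private final long[][] table;
    private final int width;

    SimpleCountMinSketch(int depth, int width) {
        this.table = new long[depth][width];
        this.width = width;
    }

    // Column for an item in a given row: a per-row mix of the item's hashCode (assumed scheme).
    private int column(Object item, int row) {
        int x = item.hashCode() ^ (row * 0x9E3779B9);
        x ^= x >>> 16;
        x *= 0x85EBCA6B;
        x ^= x >>> 13;
        return Math.floorMod(x, width);
    }

    void add(Object item, long count) {
        for (int r = 0; r < table.length; r++) table[r][column(item, r)] += count;
    }

    // Minimum over the item's counters: never below the true count.
    long estimateCount(Object item) {
        long min = Long.MAX_VALUE;
        for (int r = 0; r < table.length; r++) min = Math.min(min, table[r][column(item, r)]);
        return min;
    }
}
```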

CountMinSketch.Version

CountStar

An aggregate function that returns the number of rows in a group.

CountVectorizer

Extracts a vocabulary from document collections and generates a CountVectorizerModel.

CountVectorizerModel

Converts a text document to a sparse vector of token counts.
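Given a fixed vocabulary, a document is turned into a sparse vector whose entry at a term's index is the number of times that term occurs; tokens outside the vocabulary are dropped. The hypothetical sketch below represents the sparse vector as a sorted index-to-count map and ignores CountVectorizerModel parameters such as minTF and binary.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: sparse term counts of one tokenized document over a fixed vocabulary.
final class TermCountSketch {
    // Returns index -> count, sorted by index; out-of-vocabulary tokens are ignored.
    static SortedMap<Integer, Double> countVector(List<String> vocabulary, List<String> tokens) {
        Map<String, Integer> indexOf = new HashMap<>();
        for (int i = 0; i < vocabulary.size(); i++) indexOf.put(vocabulary.get(i), i);
        SortedMap<Integer, Double> counts = new TreeMap<>();
        for (String token : tokens) {
            Integer idx = indexOf.get(token);
            if (idx != null) counts.merge(idx, 1.0, Double::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countVector(List.of("a", "b", "c"), List.of("a", "c", "a", "d")));
    }
}
```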

CountVectorizerParams

Params for CountVectorizer and CountVectorizerModel.

CreatableRelationProvider

CreateTableWriter<T>

Trait to restrict calls to create and replace operations.

CrossValidator

K-fold cross validation performs model selection by splitting the dataset into a set of non-overlapping, randomly partitioned folds, which are used as separate training and test datasets. For example, with k=3 folds, K-fold cross validation generates 3 (training, test) dataset pairs, each of which uses 2/3 of the data for training and 1/3 for testing.
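The fold construction can be illustrated on row indices: shuffle once, assign each position to a fold, and for each fold use that fold as the test set and the remaining folds as the training set. The sketch below is hypothetical; its seeded shuffle with round-robin assignment may differ from how Spark assigns rows to folds, and it does not fit or evaluate any models.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: k (train, test) index splits for k-fold cross validation.
final class KFoldSketch {
    // Each element is {trainIndices, testIndices}; test folds are disjoint and cover 0..n-1.
    static List<int[][]> kFold(int n, int k, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));
        List<int[][]> pairs = new ArrayList<>();
        for (int fold = 0; fold < k; fold++) {
            List<Integer> train = new ArrayList<>(), test = new ArrayList<>();
            for (int pos = 0; pos < n; pos++) {
                // Position pos belongs to fold (pos % k).
                (pos % k == fold ? test : train).add(idx.get(pos));
            }
            pairs.add(new int[][] {toArray(train), toArray(test)});
        }
        return pairs;
    }

    private static int[] toArray(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        for (int[][] p : kFold(9, 3, 42L)) System.out.println(p[0].length + " train / " + p[1].length + " test");
    }
}
```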

CrossValidatorModel

CrossValidatorModel contains the model with the highest average cross-validation metric across folds and uses this model to transform input data.

CrossValidatorModel.CrossValidatorModelWriter

Writer for CrossValidatorModel.

CrossValidatorParams

Params for CrossValidator and CrossValidatorModel.

CryptoStreamUtils

A util class for manipulating IO encryption and decryption streams.

CryptoStreamUtils.BaseErrorHandler

SPARK-25535.

CryptoStreamUtils.ErrorHandlingReadableChannel

CustomAvgMetric

Built-in `CustomMetric` that computes the average of metric values.

CustomMetric

A custom metric.

CustomSumMetric

Built-in `CustomMetric` that sums up metric values.

CustomTaskMetric

A custom task metric.

DAGSchedulerEvent

Types of events that can be handled by the DAGScheduler.

Database

A database in Spark, as returned by the listDatabases method defined in Catalog.

DataFrameNaFunctions

Functionality for working with missing data in DataFrames.

DataFrameReader

Interface used to load a Dataset from external storage systems (e.g. file systems, key-value stores, etc.).

DataFrameStatFunctions

Statistic functions for DataFrames.

DataFrameWriter<T>

Interface used to write a Dataset to external storage systems (e.g. file systems, key-value stores, etc.).

DataFrameWriterV2<T>

Interface used to write a Dataset to external storage using the v2 API.

Dataset<T>

A Dataset is a strongly typed collection of domain-specific objects that can be transformed in parallel using functional or relational operations.

DatasetHolder<T>

A container for a Dataset, used for implicit conversions in Scala.

DatasetUtils

DataSourceRegister

Data sources should implement this trait so that they can register an alias to their data source.

DataStreamReader

Interface used to load a streaming Dataset from external storage systems (e.g. file systems, key-value stores, etc.).

DataStreamWriter<T>

Interface used to write a streaming Dataset to external storage systems (e.g. file systems, key-value stores, etc.).

DataType

The base type of all Spark SQL data types.

DataTypeErrors

Object for grouping error messages from (most) exceptions thrown during query execution.

DataTypeErrorsBase

DataTypes

To get or create a specific data type, users should use the singleton objects and factory methods provided by this class.

DataValidators

A collection of methods used to validate data before applying ML algorithms.

DataWriter<T>

A data writer, returned by DataWriterFactory.createWriter(int, long), that is responsible for writing data for an input RDD partition.

DataWriterFactory

A factory of DataWriter, returned by BatchWrite.createBatchWriterFactory(PhysicalWriteInfo), which is responsible for creating and initializing the actual data writer at the executor side.

DateType

The date type represents a valid date in the proleptic Gregorian calendar.

DateTypeExpression

DayTimeIntervalType

The type represents day-time intervals of the SQL standard.

DB2Dialect

DB2Dialect.DB2SQLBuilder

DB2Dialect.DB2SQLQueryBuilder

DCT

A feature transformer that takes the 1D discrete cosine transform of a real vector.
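Spark's documentation for this transformer describes the result as scaled so that the transform matrix is unitary, i.e. a scaled DCT-II. The hypothetical sketch below computes that orthonormal DCT-II directly in O(N^2) for illustration only; a real implementation would use a fast transform, and the inverse direction is not shown.

```java
// Hypothetical sketch: orthonormally scaled 1D DCT-II, computed directly in O(N^2).
final class DctSketch {
    static double[] dct2(double[] x) {
        int n = x.length;
        double[] out = new double[n];
        for (int k = 0; k < n; k++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sum += x[i] * Math.cos(Math.PI / n * (i + 0.5) * k);
            }
            // These scale factors make the transform matrix orthonormal (energy preserving).
            double scale = (k == 0) ? Math.sqrt(1.0 / n) : Math.sqrt(2.0 / n);
            out[k] = scale * sum;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(dct2(new double[] {1, 2, 3, 4})));
    }
}
```

Because the scaled transform is orthonormal, the sum of squares of the output equals that of the input, which is a convenient sanity check.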

Decimal

A mutable implementation of BigDecimal that can hold a Long if values are small enough.

Decimal.DecimalAsIfIntegral$

An Integral evidence parameter for Decimals.

Decimal.DecimalIsConflicted

Common methods for Decimal evidence parameters.

Decimal.DecimalIsFractional$

A Fractional evidence parameter for Decimals.

DecimalExactNumeric

DecimalExpression

DecimalType

The data type representing java.math.BigDecimal values.

DecimalType.Fixed$

DecisionTree

A class which implements a decision tree learning algorithm for classification and regression.

DecisionTreeClassificationModel

Decision tree model (http://en.wikipedia.org/wiki/Decision_tree_learning) for classification.

DecisionTreeClassifier

Decision tree learning algorithm (http://en.wikipedia.org/wiki/Decision_tree_learning) for classification.

DecisionTreeClassifierParams

DecisionTreeModel

Abstraction for Decision Tree models.

DecisionTreeModel

Decision tree model for classification or regression.

DecisionTreeModel.SaveLoadV1_0$

DecisionTreeModelReadWrite

Helper classes for tree model persistence.

DecisionTreeModelReadWrite.NodeData

Info for a Node.

DecisionTreeModelReadWrite.NodeData$

DecisionTreeModelReadWrite.SplitData

Info for a Split.

DecisionTreeModelReadWrite.SplitData$

DecisionTreeParams

Parameters for Decision Tree-based algorithms.

DecisionTreeRegressionModel

Decision tree (Wikipedia) model for regression.

DecisionTreeRegressor

Decision tree learning algorithm for regression.

DecisionTreeRegressorParams

DefaultCredentials

Returns DefaultAWSCredentialsProviderChain for authentication.

DefaultParamsReadable<T>

Helper trait for making simple Params types readable.

DefaultParamsWritable

Helper trait for making simple Params types writable.

DefaultPartitionCoalescer

Coalesce the partitions of a parent RDD (prev) into fewer partitions, so that each partition of this RDD computes one or more of the parent ones.

DefaultTopologyMapper

A TopologyMapper that assumes all nodes are in the same rack.

DelegatingCatalogExtension

A simple implementation of CatalogExtension, which implements all the catalog functions by calling the built-in session catalog directly.

DeltaBatchWrite

An interface that defines how to write a delta of rows during batch processing.

DeltaWrite

A logical representation of a data source write that handles a delta of rows.

DeltaWriteBuilder

An interface for building a DeltaWrite.

DeltaWriter<T>

A data writer, returned by DeltaWriterFactory.createWriter(int, long), that is responsible for writing a delta of rows.

DeltaWriterFactory

A factory for creating DeltaWriters returned by DeltaBatchWrite.createBatchWriterFactory(PhysicalWriteInfo), which is responsible for creating and initializing writers at the executor side.

DenseMatrix

Column-major dense matrix.

DenseMatrix

Column-major dense matrix.
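Column-major storage keeps each column contiguous in a single flat array, so entry (i, j) of a matrix with numRows rows lives at values[i + j * numRows]. The hypothetical sketch below shows only that indexing rule; Spark's DenseMatrix also supports a transposed (row-major) view through its isTransposed flag, which the sketch does not model.

```java
// Hypothetical sketch of column-major dense storage; not Spark's DenseMatrix class.
final class ColumnMajorMatrix {
    final int numRows;
    final int numCols;
    final double[] values;

    ColumnMajorMatrix(int numRows, int numCols, double[] values) {
        if (values.length != numRows * numCols) throw new IllegalArgumentException("size mismatch");
        this.numRows = numRows;
        this.numCols = numCols;
        this.values = values;
    }

    // Entry (i, j): column j starts at offset j * numRows.
    double get(int i, int j) {
        return values[i + j * numRows];
    }

    public static void main(String[] args) {
        // 2 x 3 matrix with columns (1, 2), (3, 4), (5, 6).
        ColumnMajorMatrix m = new ColumnMajorMatrix(2, 3, new double[] {1, 2, 3, 4, 5, 6});
        System.out.println(m.get(1, 2));
    }
}
```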

DenseVector

A dense vector represented by a value array.

DenseVector

A dense vector represented by a value array.

Dependency<T>

:: DeveloperApi :: Base class for dependencies.

DependencyUtils

DerbyDialect

DeserializationStream

:: DeveloperApi :: A stream for reading serialized objects.

DeserializedMemoryEntry<T>

DeserializedValuesHolder<T>

A holder for storing the deserialized values.

DeterministicLevel

The deterministic level of RDD's output (i.e. what RDD#compute returns).

DeterministicLevelSerializer

DifferentiableLossAggregator<Datum,Agg extends DifferentiableLossAggregator<Datum,Agg>>

A parent trait for aggregators used in fitting MLlib models.

DifferentiableRegularization<T>

A Breeze diff function which represents a cost function for differentiable regularization of parameters (e.g., L2 regularization).

DirectPoolMemory

DiskBlockData

DistributedLDAModel

Distributed model fitted by LDA.

DistributedLDAModel

Distributed LDA model.

DistributedMatrix

Represents a distributively stored matrix backed by one or more RDDs.

Distribution

An interface that defines how data is distributed across partitions.

Distributions

Helper methods to create distributions to pass into Spark.

Dot

DoubleAccumulator

An accumulator for computing the sum, count, and average of double-precision floating-point numbers.
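An accumulator of this kind only needs a running sum and count: each task adds values locally, partial results are combined by merging, and the average is derived at the end. The hypothetical sketch below shows that pattern in plain Java; it is not Spark's DoubleAccumulator and omits registration, serialization, and reset.

```java
// Hypothetical sketch of a sum/count accumulator for doubles with merge for partial results.
final class DoubleSumCountSketch {
    private double sum = 0.0;
    private long count = 0L;

    void add(double v) {
        sum += v;
        count++;
    }

    // Combines another partial accumulator (e.g. one produced by a different task) into this one.
    void merge(DoubleSumCountSketch other) {
        sum += other.sum;
        count += other.count;
    }

    double sum() { return sum; }

    long count() { return count; }

    // Average of all added values; NaN if nothing has been added (0.0 / 0).
    double avg() { return sum / count; }

    public static void main(String[] args) {
        DoubleSumCountSketch a = new DoubleSumCountSketch();
        a.add(1.0);
        a.add(2.0);
        System.out.println(a.sum() + " " + a.count() + " " + a.avg());
    }
}
```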

DoubleAccumulatorSource

DoubleArrayArrayParam

Specialized version of Param[Array[Array[Double]]] for Java.

DoubleArrayParam

Specialized version of Param[Array[Double]] for Java.

DoubleExactNumeric

DoubleFlatMapFunction<T>

A function that returns zero or more records of type Double from each input record.

DoubleFunction<T>

A function that returns Doubles, and can be used to construct DoubleRDDs.

DoubleParam

Specialized version of Param[Double] for Java.

DoubleRDDFunctions

Extra functions available on RDDs of Doubles through an implicit conversion.

DoubleType

The data type representing Double values.

DoubleType.DoubleAsIfIntegral

DoubleType.DoubleAsIfIntegral$

DoubleType.DoubleIsConflicted

DoubleTypeExpression

DriverPlugin

:: DeveloperApi :: Driver component of a SparkPlugin.

DStream<T>

A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous sequence of RDDs (of the same type) representing a continuous stream of data (see org.apache.spark.rdd.RDD in the Spark core documentation for more details on RDDs).

DummySerializerInstance

Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.

Duration

Durations

Edge<ED>

A single directed edge consisting of a source id, target id, and the data associated with the edge.

EdgeActiveness

Criteria for filtering edges based on activeness.

EdgeContext<VD,ED,A>

Represents an edge along with its neighboring vertices and allows sending messages along the edge.

EdgeDirection

The direction of a directed edge relative to a vertex.

EdgeRDD<ED>

EdgeRDD[ED, VD] extends RDD[Edge[ED]] by storing the edges in columnar format on each partition for performance.

EdgeRDDImpl<ED,VD>

EdgeTriplet<VD,ED>

An edge triplet represents an edge along with the vertex attributes of its neighboring vertices.

EigenValueDecomposition

Compute eigen-decomposition.

ElementwiseProduct

Outputs the Hadamard product (i.e., the element-wise product) of each input vector with a provided "weight" vector.

ElementwiseProduct

Outputs the Hadamard product (i.e., the element-wise product) of each input vector with a provided "weight" vector.
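The Hadamard product multiplies each input component by the weight at the same index, so the input and weight vectors must have the same length. A plain-array sketch of the operation; `Hadamard` is a hypothetical helper, not the Spark transformer:

```java
/** Hypothetical sketch of an element-wise (Hadamard) product on dense arrays. */
final class Hadamard {
    static double[] scale(double[] input, double[] weights) {
        if (input.length != weights.length) {
            throw new IllegalArgumentException(
                "vector sizes differ: " + input.length + " vs " + weights.length);
        }
        double[] out = new double[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = input[i] * weights[i]; // component-wise multiplication
        }
        return out;
    }
}
```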

EMLDAOptimizer

Optimizer for the EM algorithm, which stores the data and parameter graph as well as the algorithm parameters.

EmptyTerm

Placeholder term for the result of undefined interactions, e.g. '1:1' or 'a:1'.

Encoder<T>

Used to convert a JVM object of type T to and from the internal Spark SQL representation.

Encoders

Methods for creating an Encoder.

EnsembleCombiningStrategy

Enum to select the ensemble combining strategy for base learners.

EnsembleModelReadWrite

EnsembleModelReadWrite.EnsembleNodeData

Information for one node in a tree ensemble.

EnsembleModelReadWrite.EnsembleNodeData$

Entropy

Class for calculating entropy during multiclass classification.
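For class counts c_1, ..., c_k with total n, entropy is −Σ p_i log₂(p_i) with p_i = c_i / n, where empty classes contribute nothing. A standalone sketch of that formula from per-class counts; `EntropySketch` is hypothetical and not Spark's Entropy object:

```java
/** Hypothetical sketch of entropy impurity (base-2 log) from class counts. */
final class EntropySketch {
    static double impurity(double[] counts) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        if (total == 0.0) {
            return 0.0; // no samples: define impurity as zero
        }
        double impurity = 0.0;
        for (double c : counts) {
            if (c != 0.0) { // 0 * log(0) is taken to be 0
                double p = c / total;
                impurity -= p * (Math.log(p) / Math.log(2.0));
            }
        }
        return impurity;
    }
}
```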

EnumUtil

EqualNullSafe

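Performs null-safe equality comparison, similar to EqualTo, except that it returns true (rather than null) when both inputs are null, and false (rather than null) when exactly one input is null.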
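EqualNullSafe follows SQL's null-safe equality operator (`<=>`): two nulls compare equal, and null compared with a non-null value is false. Plain equality (EqualTo, `=`) instead yields null (unknown) when either side is null. A small sketch of the two semantics, using a null `Boolean` to stand in for SQL's unknown; `NullSemantics` is a hypothetical class:

```java
/** Hypothetical sketch contrasting SQL '=' and '<=>' null semantics. */
final class NullSemantics {
    /** SQL '=': unknown (null) when either operand is null. */
    static Boolean equalTo(Object left, Object right) {
        if (left == null || right == null) {
            return null;
        }
        return left.equals(right);
    }

    /** SQL '<=>': never unknown; null <=> null is true. */
    static boolean equalNullSafe(Object left, Object right) {
        return java.util.Objects.equals(left, right);
    }
}
```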

EqualTo

A filter that evaluates to true iff the column evaluates to a value equal to value.

ErrorClassesJsonReader

A reader to load error information from one or more JSON files.

ErrorInfo

Information associated with an error class.

ErrorMessageFormat

ErrorSubInfo

Information associated with an error subclass.

Estimator<M extends Model<M>>

Abstract class for estimators that fit models to data.

Evaluator

Abstract class for evaluators that compute metrics from predictions.

ExceptionFailure

:: DeveloperApi :: Task failed due to a runtime exception.

ExcludedExecutor

ExecutionData

ExecutionErrors

ExecutionListenerManager

Manager for QueryExecutionListener.

ExecutorInfo

:: DeveloperApi :: Stores information about an executor to pass from the scheduler to SparkListeners.

ExecutorKilled

ExecutorLossMessage

ExecutorLostFailure

:: DeveloperApi :: The task failed because the executor that it was running on was lost.

ExecutorMetricsDistributions

ExecutorMetricsSerializer

ExecutorMetricType

Executor metric types for executor-level metrics stored in ExecutorMetrics.

ExecutorPeakMetricsDistributions

ExecutorPlugin

:: DeveloperApi :: Executor component of a SparkPlugin.

ExecutorRegistered

ExecutorRemoved

ExecutorResourceRequest

An Executor resource request.

ExecutorResourceRequests

A set of Executor resource requests.

ExecutorStageSummary

ExecutorStageSummarySerializer

ExecutorStreamSummary

ExecutorSummary

ExpectationAggregator

ExpectationAggregator computes the partial expectation results.

ExpectationSum

ExperimentalMethods

:: Experimental :: Holder for experimental methods for the bravest.

ExpireDeadHosts

ExponentialGenerator

Generates i.i.d. samples from the exponential distribution with the given mean.
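One standard way to draw exponential samples with mean μ is inverse-transform sampling: if U is uniform on (0, 1], then −μ·ln(U) is exponentially distributed with mean μ. The sketch below applies that technique with `java.util.Random`; it is a swapped-in illustration, and the `ExponentialSampler` class is hypothetical, not how Spark's ExponentialGenerator is implemented.

```java
/** Hypothetical inverse-transform sampler; not Spark's ExponentialGenerator. */
final class ExponentialSampler {
    private final double mean;
    private final java.util.Random rng;

    ExponentialSampler(double mean, long seed) {
        if (!(mean > 0.0)) {
            throw new IllegalArgumentException("mean must be positive: " + mean);
        }
        this.mean = mean;
        this.rng = new java.util.Random(seed);
    }

    /** Returns -mean * ln(u) with u uniform on (0, 1], so the log is always finite. */
    double nextValue() {
        double u = 1.0 - rng.nextDouble(); // nextDouble() is in [0, 1)
        return -mean * Math.log(u);
    }
}
```

Seeding the generator makes the sample sequence reproducible, which is also why Spark's random data generators accept a seed.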

Expression

Base class of the public logical expression API.

Expressions

Helper methods to create logical transforms to pass into Spark.

ExternalClusterManager

A cluster manager interface for plugging in an external scheduler.

ExternalCommandRunner

An interface to execute an arbitrary string command inside an external execution engine rather than Spark.

Extract

Represents an extract function, which extracts and returns the value of a specified datetime field from a datetime or interval value expression.

ExtractableLiteral

FactorizationMachines

FactorizationMachinesParams

Params for Factorization Machines

FalsePositiveRate

False positive rate.

FeatureHasher

Feature hashing projects a set of categorical or numerical features into a feature vector of specified dimension (typically substantially smaller than that of the original feature space).

FeatureType

Enum to describe whether a feature is "continuous" or "categorical"

FetchFailed

:: DeveloperApi :: Task failed to fetch shuffle data from a remote node.

FileBasedTopologyMapper

A simple file based topology mapper.

Filter

A filter predicate for data sources.

FilterFunction<T>

Base interface for a function used in Dataset's filter function.

FitEnd<M extends Model<M>>

Event fired after Estimator.fit.

FitStart<M extends Model<M>>

Event fired before Estimator.fit.

FlatMapFunction<T,R>

A function that returns zero or more output records from each input record.

FlatMapFunction2<T1,T2,R>

A function that takes two inputs and returns zero or more output records.

FlatMapGroupsFunction<K,V,R>

A function that returns zero or more output records from each grouping key and its values.

FlatMapGroupsWithStateFunction<K,V,S,R>

:: Experimental :: Base interface for a map function used in org.apache.spark.sql.KeyValueGroupedDataset.flatMapGroupsWithState(FlatMapGroupsWithStateFunction, org.apache.spark.sql.streaming.OutputMode, org.apache.spark.sql.Encoder, org.apache.spark.sql.Encoder).

FloatExactNumeric

FloatParam

Specialized version of Param[Float] for Java.

FloatType

The data type representing Float values.

FloatType.FloatAsIfIntegral

FloatType.FloatAsIfIntegral$

FloatType.FloatIsConflicted

FloatTypeExpression

FMClassificationModel

Model produced by FMClassifier

FMClassificationSummary

Abstraction for FMClassifier results for a given model.

FMClassificationSummaryImpl

FMClassifier results for a given model.

FMClassificationTrainingSummary

Abstraction for FMClassifier training results.

FMClassificationTrainingSummaryImpl

FMClassifier training results.

FMClassifier

Factorization Machines learning algorithm for classification.

FMClassifierParams

Params for FMClassifier.

FMRegressionModel

Model produced by FMRegressor.

FMRegressor

Factorization Machines learning algorithm for regression.

FMRegressorParams

Params for FMRegressor

ForeachFunction<T>

Base interface for a function used in Dataset's foreach function.

ForeachPartitionFunction<T>

Base interface for a function used in Dataset's foreachPartition function.

ForeachWriter<T>

The abstract class for writing custom logic to process data generated by a query.

FPGrowth

A parallel FP-growth algorithm to mine frequent itemsets.

FPGrowth

A parallel FP-growth algorithm to mine frequent itemsets.

FPGrowth.FreqItemset<Item>

Frequent itemset.

FPGrowthModel

Model fitted by FPGrowth.

FPGrowthModel<Item>

Model trained by FPGrowth, which holds frequent itemsets.

FPGrowthModel.SaveLoadV1_0$

FPGrowthParams

Common params for FPGrowth and FPGrowthModel

Function<T1,R>

Base interface for functions whose return types do not create special RDDs.

Function

A user-defined function in Spark, as returned by listFunctions method in Catalog.

Function

Base class for user-defined functions.

Function0<R>

A zero-argument function that returns an R.

Function2<T1,T2,R>

A two-argument function that takes arguments of type T1 and T2 and returns an R.

Function3<T1,T2,T3,R>

A three-argument function that takes arguments of type T1, T2 and T3 and returns an R.

Function4<T1,T2,T3,T4,R>

A four-argument function that takes arguments of type T1, T2, T3 and T4 and returns an R.

FunctionCatalog

Catalog methods for working with Functions.

functions

Commonly used functions available for DataFrame operations.

FutureAction<T>

A future for the result of an action to support cancellation.

FValueTest

FValue test for continuous data.

GammaGenerator

Generates i.i.d. samples from the gamma distribution with the given shape and scale.

GarbageCollectionMetrics

GaussianMixture

Gaussian Mixture clustering.

GaussianMixture

This class performs expectation maximization for multivariate Gaussian Mixture Models (GMMs).

GaussianMixtureModel

Multivariate Gaussian Mixture Model (GMM) consisting of k Gaussians, where points are drawn from each Gaussian i with probability weights(i).

GaussianMixtureModel

Multivariate Gaussian Mixture Model (GMM) consisting of k Gaussians, where points are drawn from each Gaussian i=1..k with probability w(i); mu(i) and sigma(i) are the respective mean and covariance for each Gaussian distribution i=1..k.

GaussianMixtureParams

Common params for GaussianMixture and GaussianMixtureModel

GaussianMixtureSummary

Summary of GaussianMixture.

GBTClassificationModel

Gradient-Boosted Trees (GBTs) (http://en.wikipedia.org/wiki/Gradient_boosting) model for classification.

GBTClassifier

Gradient-Boosted Trees (GBTs) (http://en.wikipedia.org/wiki/Gradient_boosting) learning algorithm for classification.

GBTClassifierParams

GBTParams

Parameters for Gradient-Boosted Tree algorithms.

GBTRegressionModel

Gradient-Boosted Trees (GBTs) model for regression.

GBTRegressor

Gradient-Boosted Trees (GBTs) learning algorithm for regression.

GBTRegressorParams

GeneralAggregateFunc

The general implementation of AggregateFunc, which contains the upper-cased function name, the `isDistinct` flag and all the inputs.

GeneralizedLinearAlgorithm<M extends GeneralizedLinearModel>

GeneralizedLinearAlgorithm implements methods to train a Generalized Linear Model (GLM).

GeneralizedLinearModel

GeneralizedLinearModel (GLM) represents a model trained using GeneralizedLinearAlgorithm.

GeneralizedLinearRegression

Fit a Generalized Linear Model (see Generalized linear model (Wikipedia)) specified by giving a symbolic description of the linear predictor (link function) and a description of the error distribution (family).

GeneralizedLinearRegression.Binomial$

Binomial exponential family distribution.

GeneralizedLinearRegression.CLogLog$

GeneralizedLinearRegression.Family$

GeneralizedLinearRegression.FamilyAndLink$

GeneralizedLinearRegression.Gamma$

Gamma exponential family distribution.

GeneralizedLinearRegression.Gaussian$

Gaussian exponential family distribution.

GeneralizedLinearRegression.Identity$

GeneralizedLinearRegression.Inverse$

GeneralizedLinearRegression.Link$

GeneralizedLinearRegression.Log$

GeneralizedLinearRegression.Logit$

GeneralizedLinearRegression.Poisson$

Poisson exponential family distribution.

GeneralizedLinearRegression.Probit$

GeneralizedLinearRegression.Sqrt$

GeneralizedLinearRegression.Tweedie$

GeneralizedLinearRegressionBase

Params for Generalized Linear Regression.

GeneralizedLinearRegressionModel

Model produced by GeneralizedLinearRegression.

GeneralizedLinearRegressionSummary

Summary of GeneralizedLinearRegression model and predictions.

GeneralizedLinearRegressionTrainingSummary

Summary of GeneralizedLinearRegression fitting and model.

GeneralMLWritable

Trait for classes that provide GeneralMLWriter.

GeneralMLWriter

An ML writer that delegates based on the requested format.

GeneralScalarExpression

The general representation of SQL scalar expressions, which contains the upper-cased expression name and all the children expressions.

GetAllReceiverInfo

Gini

Class for calculating the Gini impurity (http://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity) during multiclass classification.
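Gini impurity for class proportions p_i is 1 − Σ p_i², which is 0 for a pure node and largest when the classes are evenly mixed. A standalone sketch computing it from per-class counts; `GiniSketch` is hypothetical, not Spark's Gini object:

```java
/** Hypothetical sketch of Gini impurity from class counts. */
final class GiniSketch {
    static double impurity(double[] counts) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        if (total == 0.0) {
            return 0.0; // no samples: define impurity as zero
        }
        double sumSquares = 0.0;
        for (double c : counts) {
            double p = c / total;
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }
}
```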

GLMClassificationModel

Helper class for import/export of GLM classification models.

GLMClassificationModel.SaveLoadV1_0$

GLMRegressionModel

Helper methods for import/export of GLM regression models.

GLMRegressionModel.SaveLoadV1_0$

Gradient

Class used to compute the gradient for a loss function, given a single data point.

GradientBoostedTrees

A class that implements Stochastic Gradient Boosting for regression and binary classification.

GradientBoostedTreesModel

Represents a gradient boosted trees model.

GradientDescent

Class used to solve an optimization problem using Gradient Descent.

Graph<VD,ED>

The Graph abstractly represents a graph with arbitrary objects associated with vertices and edges.

GraphGenerators

A collection of graph generating functions.

GraphImpl<VD,ED>

An implementation of Graph to support computation on graphs.

GraphLoader

Provides utilities for loading Graphs from files.

GraphOps<VD,ED>

Contains additional functionality for Graph.

GraphXUtils

GreaterThan

A filter that evaluates to true iff the attribute evaluates to a value greater than value.

GreaterThanOrEqual

A filter that evaluates to true iff the attribute evaluates to a value greater than or equal to value.

GroupMappingServiceProvider

This Spark trait is used for mapping a given userName to the set of groups it belongs to.

GroupState<S>

:: Experimental ::

GroupStateTimeout

Represents the type of timeouts possible for the Dataset operations mapGroupsWithState and flatMapGroupsWithState.

H2Dialect

H2Dialect.H2SQLBuilder

HadoopDelegationTokenProvider

::DeveloperApi:: Hadoop delegation token provider.

HadoopFSUtils

Utility functions to simplify and speed up file listing.

HadoopRDD<K,V>

:: DeveloperApi :: An RDD that provides core functionality for reading data stored in Hadoop (e.g., files in HDFS, sources in HBase, or S3), using the older MapReduce API (org.apache.hadoop.mapred).

HadoopRDD.HadoopMapPartitionsWithSplitRDD$

HasAggregationDepth

Trait for shared param aggregationDepth (default: 2).

HasBlockSize

Trait for shared param blockSize.

HasCheckpointInterval

Trait for shared param checkpointInterval.

HasCollectSubModels

Trait for shared param collectSubModels (default: false).

HasDistanceMeasure

Trait for shared param distanceMeasure (default: "euclidean").

HasElasticNetParam

Trait for shared param elasticNetParam.

HasFeaturesCol

Trait for shared param featuresCol (default: "features").

HasFitIntercept

Trait for shared param fitIntercept (default: true).

HasHandleInvalid

Trait for shared param handleInvalid.

HashingTF

Maps a sequence of terms to their term frequencies using the hashing trick.

HashingTF

Maps a sequence of terms to their term frequencies using the hashing trick.
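The hashing trick maps each term to a column index by hashing it modulo the number of features, then counts occurrences per index; distinct terms can collide in the same bucket. A standalone sketch of the idea; `HashingTermFrequency` is hypothetical and uses `String.hashCode` for simplicity, whereas Spark's HashingTF uses its own hash function.

```java
/** Hypothetical hashing-trick term-frequency sketch; not Spark's HashingTF. */
final class HashingTermFrequency {
    private final int numFeatures;

    HashingTermFrequency(int numFeatures) {
        if (numFeatures <= 0) {
            throw new IllegalArgumentException("numFeatures must be positive");
        }
        this.numFeatures = numFeatures;
    }

    /** Non-negative bucket index for a term. */
    int indexOf(String term) {
        return Math.floorMod(term.hashCode(), numFeatures);
    }

    /** Dense term-frequency vector of length numFeatures. */
    double[] transform(String[] terms) {
        double[] tf = new double[numFeatures];
        for (String term : terms) {
            tf[indexOf(term)] += 1.0;
        }
        return tf;
    }
}
```

A larger numFeatures lowers the collision rate at the cost of wider (though typically sparse) vectors.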

HashPartitioner

A Partitioner that implements hash-based partitioning using Java's Object.hashCode.
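Hash partitioning assigns a key to partition `hashCode mod numPartitions`, shifted into [0, numPartitions) because Java's `%` can return a negative remainder; Spark's HashPartitioner also sends null keys to partition 0. A standalone sketch of that rule; `SimpleHashPartitioner` is hypothetical, not Spark's class:

```java
/** Hypothetical sketch of hash-based partition assignment. */
final class SimpleHashPartitioner {
    private final int numPartitions;

    SimpleHashPartitioner(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    int getPartition(Object key) {
        if (key == null) {
            return 0; // null keys all land in the first partition
        }
        int rawMod = key.hashCode() % numPartitions;
        return rawMod < 0 ? rawMod + numPartitions : rawMod; // non-negative modulo
    }
}
```

Because the result depends only on `hashCode`, equal keys always land in the same partition. Java arrays are a known pitfall: their `hashCode` is identity-based, so arrays with equal contents can end up in different partitions.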

HasInputCol

Trait for shared param inputCol.

HasInputCols

Trait for shared param inputCols.

HasLabelCol

Trait for shared param labelCol (default: "label").

HasLoss

Trait for shared param loss.

HasMaxBlockSizeInMB

Trait for shared param maxBlockSizeInMB (default: 0.0).

HasMaxIter

Trait for shared param maxIter.

HasNumFeatures

Trait for shared param numFeatures (default: 262144).

HasOutputCol

Trait for shared param outputCol (default: uid + "__output").

HasOutputCols

Trait for shared param outputCols.

HasParallelism

Trait to define a level of parallelism for algorithms that are able to use multithreaded execution, and provide a thread-pool based execution context.

HasPartitionKey

A mix-in for input partitions whose records are clustered on the same set of partition keys (provided via SupportsReportPartitioning).

HasPredictionCol

Trait for shared param predictionCol (default: "prediction").

HasProbabilityCol

Trait for shared param probabilityCol (default: "probability").

HasRawPredictionCol

Trait for shared param rawPredictionCol (default: "rawPrediction").

HasRegParam

Trait for shared param regParam.

HasRelativeError

Trait for shared param relativeError (default: 0.001).

HasSeed

Trait for shared param seed (default: this.getClass.getName.hashCode.toLong).

HasSolver

Trait for shared param solver.

HasStandardization

Trait for shared param standardization (default: true).

HasStepSize

Trait for shared param stepSize.

HasThreshold

Trait for shared param threshold.

HasThresholds

Trait for shared param thresholds.

HasTol

Trait for shared param tol.

HasTrainingSummary<T>

Trait for models that provide a training summary.

HasValidationIndicatorCol

Trait for shared param validationIndicatorCol.

HasVarianceCol

Trait for shared param varianceCol.

HasVarianceImpurity

HasWeightCol

Trait for shared param weightCol.

HdfsUtils

HingeGradient

Compute gradient and loss for a Hinge loss function, as used in SVM binary classification.
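With a 0/1 label rescaled to y = 2·label − 1 in {−1, +1} and margin m = w·x, the hinge loss is max(0, 1 − y·m), and its subgradient with respect to w is −y·x when 1 − y·m > 0 and zero otherwise. A dense-array sketch of that computation for a single example; `HingeSketch` is hypothetical, and the 0/1 label convention is the one used by MLlib's SVM classifiers:

```java
/** Hypothetical hinge loss and subgradient for one example; labels are 0 or 1. */
final class HingeSketch {
    /** Writes the subgradient into gradOut and returns the loss. */
    static double lossAndGradient(double[] weights, double[] x, double label, double[] gradOut) {
        double margin = 0.0;
        for (int i = 0; i < x.length; i++) {
            margin += weights[i] * x[i];
        }
        double y = 2.0 * label - 1.0; // map {0, 1} to {-1, +1}
        if (1.0 > y * margin) {
            for (int i = 0; i < x.length; i++) {
                gradOut[i] = -y * x[i];
            }
            return 1.0 - y * margin;
        }
        java.util.Arrays.fill(gradOut, 0.0); // correct side of the margin: no loss
        return 0.0;
    }
}
```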

Histogram

An interface to represent an equi-height histogram, which is a part of ColumnStatistics.

HistogramBin

An interface to represent a bin in an equi-height histogram.

HiveCatalogMetrics

Metrics for access to the Hive external catalog.

HttpSecurityFilter

A servlet filter that implements HTTP security features.

Identifiable

Trait for an object with an immutable unique ID that identifies itself and its derivatives.

Identifier

Identifies an object in a catalog.

IDF

Compute the Inverse Document Frequency (IDF) given a collection of documents.

IDF

Inverse document frequency (IDF).
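MLlib's IDF uses the smoothed formula idf(t) = log((m + 1) / (d(t) + 1)), where m is the number of documents and d(t) is the number of documents containing term t. A sketch of that formula from precomputed document frequencies; `IdfSketch` is hypothetical, uses the natural log, and leaves out Spark's minDocFreq option:

```java
/** Hypothetical sketch of smoothed inverse document frequency. */
final class IdfSketch {
    /**
     * @param numDocs total number of documents m
     * @param docFreq docFreq[t] = number of documents containing term t
     */
    static double[] idf(long numDocs, long[] docFreq) {
        double[] out = new double[docFreq.length];
        for (int t = 0; t < docFreq.length; t++) {
            out[t] = Math.log((numDocs + 1.0) / (docFreq[t] + 1.0));
        }
        return out;
    }
}
```

A term that appears in every document gets weight 0, and the +1 smoothing keeps the weight finite for terms that appear in none. TF-IDF is then the element-wise product of a term-frequency vector with this weight vector.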

IDF.DocumentFrequencyAggregator

Document frequency aggregator.

IDFBase

Params for IDF and IDFModel.

IDFModel

Model fitted by IDF.

IDFModel

Represents an IDF model that can transform term frequency vectors.

ImageDataSource

The image package implements the Spark SQL data source API for loading image data as a DataFrame.

ImageSchema

Defines the image schema and methods to read and manipulate images.

Impurities

Factory for Impurity instances.

Impurity

Trait for calculating information gain.

Imputer

Imputation estimator for completing missing values, using the mean, median or mode of the columns in which the missing values are located.

ImputerModel

Model fitted by Imputer.

ImputerParams

Params for Imputer and ImputerModel.

A filter that evaluates to true iff the attribute evaluates to one of the values in the array.

IncompatibleMergeException

IndexedRow

Represents a row of IndexedRowMatrix.

IndexedRowMatrix

Represents a row-oriented DistributedMatrix with indexed rows.

IndexToString

A Transformer that maps a column of indices back to a new column of corresponding string values.

IndylambdaScalaClosures

InformationGainStats

Information gain statistics for each split. Parameters: gain is the information gain value; impurity is the current node impurity; leftImpurity and rightImpurity are the left and right node impurities; leftPredict and rightPredict are the left and right node predictions.

InnerClosureFinder

InProcessLauncher

In-process launcher for Spark applications.

InputDStream<T>

This is the abstract base class for all input streams.

InputFileBlockHolder

This holds file names of the current Spark task.

InputFormatInfo

:: DeveloperApi :: Parses and holds information about inputFormat (and files) specified as a parameter.

InputMetricDistributions

InputMetrics

InputPartition

A serializable representation of an input partition returned by Batch.planInputPartitions() and by the corresponding methods of the streaming interfaces.

InsertableRelation

A BaseRelation into which data can be inserted through the insert method.

IntArrayParam

Specialized version of Param[Array[Int} for Java.

IntegerExactNumeric

IntegerType

The data type representing Int values.

IntegerTypeExpression

IntegralTypeExpression

InteractableTerm

A term that may be part of an interaction.

Interaction

Implements the feature interaction transform.

InternalAccumulator

A collection of fields and methods concerned with internal accumulators that represent task level metrics.

InternalAccumulator.input$

InternalAccumulator.output$

InternalAccumulator.shuffleRead$

InternalAccumulator.shuffleWrite$

InternalKMeansModelWriter

A writer for KMeans that handles the "internal" (or default) format

InternalLinearRegressionModelWriter

A writer for LinearRegression that handles the "internal" (or default) format

InternalNode

Internal Decision Tree node.

InterruptibleIterator<T>

:: DeveloperApi :: An iterator that wraps around an existing iterator to provide task killing functionality.

IntParam

Specialized version of Param[Int] for Java.

IntParam

An extractor object for parsing strings into integers.

IsNotNull

A filter that evaluates to true iff the attribute evaluates to a non-null value.

IsNull

A filter that evaluates to true iff the attribute evaluates to null.

IsotonicRegression

Isotonic regression.

IsotonicRegression

Isotonic regression.

IsotonicRegressionBase

Params for isotonic regression.

IsotonicRegressionModel

Model fitted by IsotonicRegression.

IsotonicRegressionModel

Regression model for isotonic regression.
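Isotonic regression fits the closest (least-squares) non-decreasing sequence to the observations. Spark's IsotonicRegression is documented as using a parallelized pool adjacent violators algorithm (PAVA): scan left to right and merge neighboring blocks into their mean whenever they break the ordering. A minimal unweighted, single-machine sketch, assuming the observations are already sorted by feature value; `Pava` is a hypothetical class:

```java
/** Hypothetical unweighted pool-adjacent-violators sketch (non-decreasing fit). */
final class Pava {
    static double[] fit(double[] y) {
        int n = y.length;
        double[] blockMean = new double[n];
        int[] blockSize = new int[n];
        int blocks = 0;
        for (double v : y) {
            blockMean[blocks] = v; // start a new block for each observation
            blockSize[blocks] = 1;
            blocks++;
            // Merge backwards while the last two blocks violate the ordering.
            while (blocks > 1 && blockMean[blocks - 2] > blockMean[blocks - 1]) {
                int merged = blockSize[blocks - 2] + blockSize[blocks - 1];
                blockMean[blocks - 2] = (blockMean[blocks - 2] * blockSize[blocks - 2]
                    + blockMean[blocks - 1] * blockSize[blocks - 1]) / merged;
                blockSize[blocks - 2] = merged;
                blocks--;
            }
        }
        double[] out = new double[n];
        int pos = 0;
        for (int b = 0; b < blocks; b++) {
            for (int k = 0; k < blockSize[b]; k++) {
                out[pos++] = blockMean[b]; // every point in a block gets its mean
            }
        }
        return out;
    }
}
```

For example, {1, 3, 2, 4} becomes {1, 2.5, 2.5, 4}: the violating pair (3, 2) is pooled into its mean.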

Iterators

JavaDoubleRDD

JavaDStream<T>

A Java-friendly interface to DStream, the basic abstraction in Spark Streaming that represents a continuous stream of data.

JavaDStreamLike<T,This extends JavaDStreamLike<T,This,R>,R extends JavaRDDLike<T,R>>

JavaFutureAction<T>

JavaHadoopRDD<K,V>

JavaInputDStream<T>

A Java-friendly interface to InputDStream.

JavaIterableWrapperSerializer

A Kryo serializer for serializing results returned by asJavaIterable.

JavaMapWithStateDStream<KeyType,ValueType,StateType,MappedType>

DStream representing the stream of data generated by mapWithState operation on a JavaPairDStream.

JavaModuleOptions

This helper class is used to place all the `--add-opens` options required by Spark when using Java 17.

JavaNewHadoopRDD<K,V>

JavaPackage

A dummy class as a workaround to show the package doc of spark.mllib in generated Java API docs.

JavaPairDStream<K,V>

A Java-friendly interface to a DStream of key-value pairs, which provides extra methods like reduceByKey and join.

JavaPairInputDStream<K,V>

A Java-friendly interface to InputDStream of key-value pairs.

JavaPairRDD<K,V>

JavaPairReceiverInputDStream<K,V>

A Java-friendly interface to ReceiverInputDStream, the abstract class for defining any input stream that receives data over the network.

JavaParams

Java-friendly wrapper for Params.

JavaRDD<T>

JavaRDDLike<T,This extends JavaRDDLike<T,This>>

Defines operations common to several Java RDD implementations.

JavaReceiverInputDStream<T>

A Java-friendly interface to ReceiverInputDStream, the abstract class for defining any input stream that receives data over the network.

JavaSerializer

:: DeveloperApi :: A Spark serializer that uses Java's built-in serialization.

JavaSparkContext

A Java-friendly version of SparkContext that returns JavaRDDs and works with Java collections instead of Scala ones.

JavaSparkStatusTracker

Low-level status reporting APIs for monitoring job and stage progress.

JavaStreamingContext

Deprecated.

This is deprecated as of Spark 3.4.0.

JavaStreamingListenerEvent

Base trait for events related to JavaStreamingListener

JavaUtils

JavaUtils.SerializableMapWrapper<A,B>

JdbcConnectionProvider

::DeveloperApi:: Connection provider that opens connections to various databases (a database-specific instance is needed).

JdbcDialect

:: DeveloperApi :: Encapsulates everything (extensions, workarounds, quirks) to handle the SQL dialect of a certain database or jdbc driver.

JdbcDialects

:: DeveloperApi :: Registry of dialects that apply to every new jdbc org.apache.spark.sql.DataFrame.

JdbcRDD<T>

An RDD that executes a SQL query on a JDBC connection and reads results.

JdbcRDD.ConnectionFactory

JdbcSQLQueryBuilder

The builder to build a single SELECT query.

JdbcType

:: DeveloperApi :: A database type definition coupled with the jdbc type needed to send null values to the database.

JettyUtils

Utilities for launching a web server using Jetty's HTTP Server class

JettyUtils.ServletParams<T>

JettyUtils.ServletParams$

JobData

JobDataUtil

JobExecutionStatus

JobExecutionStatusSerializer

JobGeneratorEvent

Event classes for JobGenerator

JobListener

Interface used to listen for job completion or failure events after submitting a job to the DAGScheduler.

JobResult

:: DeveloperApi :: A result of a job in the DAGScheduler.

JobSchedulerEvent

JobSubmitter

Handle via which a "run" function passed to a ComplexFutureAction can submit jobs for execution.

JobSucceeded

JsonMatrixConverter

JsonProtocol

Serializes SparkListener events to/from JSON.

Kernel density estimation.
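Kernel density estimation with a Gaussian kernel of bandwidth h estimates the density at x as (1 / (n·h)) Σ φ((x − x_i) / h), where φ is the standard normal density and x_1, ..., x_n is the sample. A single-machine sketch of that estimator; `GaussianKde` is hypothetical and is not Spark's distributed implementation:

```java
/** Hypothetical Gaussian kernel density estimator over a small in-memory sample. */
final class GaussianKde {
    private final double[] sample;
    private final double bandwidth;

    GaussianKde(double[] sample, double bandwidth) {
        if (sample.length == 0 || !(bandwidth > 0.0)) {
            throw new IllegalArgumentException("need a non-empty sample and positive bandwidth");
        }
        this.sample = sample.clone();
        this.bandwidth = bandwidth;
    }

    /** Estimated density at x: average of Gaussian bumps centered on each sample point. */
    double density(double x) {
        double norm = 1.0 / Math.sqrt(2.0 * Math.PI);
        double total = 0.0;
        for (double xi : sample) {
            double z = (x - xi) / bandwidth;
            total += norm * Math.exp(-0.5 * z * z);
        }
        return total / (sample.length * bandwidth);
    }
}
```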

KeyGroupedPartitioning

Represents a partitioning where rows are split across partitions based on the partition transform expressions returned by KeyGroupedPartitioning.keys.

KeyValueGroupedDataset<K,V>

A Dataset that has been logically grouped by a user-specified grouping key.

KillTask

KinesisDataGenerator

A wrapper interface that will allow us to consolidate the code for synthetic data generation.

KinesisInitialPositions

KinesisInitialPositions.AtTimestamp

KinesisInitialPositions.Latest

KinesisInitialPositions.TrimHorizon

KinesisUtilsPythonHelper

This is a helper class that wraps the methods in KinesisUtils into a more Python-friendly class and functions so that it can be easily instantiated and called from Python's KinesisUtils.

KMeans

K-means clustering with support for k-means|| initialization proposed by Bahmani et al.

KMeans

K-means clustering with a k-means++ like initialization mode (the k-means|| algorithm by Bahmani et al).

KMeansAggregator

KMeansAggregator computes the distances and updates the centers for blocks in a sparse or dense matrix in an online fashion.

KMeansDataGenerator

Generate test data for KMeans.

KMeansModel

Model fitted by KMeans.

KMeansModel

A clustering model for K-means.

KMeansModel.Cluster$

KMeansModel.SaveLoadV1_0$

KMeansModel.SaveLoadV2_0$

KMeansParams

Common params for KMeans and KMeansModel

KMeansSummary

Summary of KMeans.

KnownSizeEstimation

A trait that allows a class to give SizeEstimator more accurate size estimation.

KolmogorovSmirnovTest

Conduct the two-sided Kolmogorov-Smirnov (KS) test for data sampled from a continuous distribution.

KolmogorovSmirnovTest

Conduct the two-sided Kolmogorov-Smirnov (KS) test for data sampled from a continuous distribution.
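The two-sided one-sample statistic is D = sup_x |F_n(x) − F(x)|, the largest gap between the empirical CDF F_n of the sample and the hypothesized CDF F. Because F_n only jumps at sample points, D can be computed from the sorted sample by checking the gap just before and just after each jump. A sketch of the statistic alone; `KsStatistic` is hypothetical and does not compute the p-value that Spark's test result also reports:

```java
/** Hypothetical one-sample Kolmogorov-Smirnov statistic; no p-value. */
final class KsStatistic {
    static double compute(double[] data, java.util.function.DoubleUnaryOperator cdf) {
        double[] sorted = data.clone();
        java.util.Arrays.sort(sorted);
        int n = sorted.length;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            double f = cdf.applyAsDouble(sorted[i]);
            // Gap just below the jump (i / n) and just after it ((i + 1) / n).
            d = Math.max(d, Math.max(f - (double) i / n, (double) (i + 1) / n - f));
        }
        return d;
    }
}
```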

KolmogorovSmirnovTest.NullHypothesis$

KolmogorovSmirnovTestResult

Object containing the test results for the Kolmogorov-Smirnov test.

KryoRegistrator

Interface implemented by clients to register their classes with Kryo when using Kryo serialization.

KryoSerializer

A Spark serializer that uses the Kryo serialization library.

KVUtils

L1Updater

Updater for L1 regularized problems.
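MLlib's L1Updater takes a gradient step with step size stepSize / sqrt(iter) and then applies soft thresholding, the proximal operator of the L1 norm: each weight is shrunk toward zero by regParam times that step size, and set to zero if it would cross zero. A sketch of one update on dense arrays; `SoftThresholdUpdate` is hypothetical and follows those documented steps rather than Spark's code:

```java
/** Hypothetical sketch of one L1 proximal (soft-thresholding) update. */
final class SoftThresholdUpdate {
    /** Updates weights in place and returns the regularization value regParam * ||w||_1. */
    static double step(double[] weights, double[] gradient,
                       double stepSize, int iter, double regParam) {
        double thisStep = stepSize / Math.sqrt(iter); // decays with the iteration count
        double shrinkage = regParam * thisStep;
        double l1Norm = 0.0;
        for (int i = 0; i < weights.length; i++) {
            double w = weights[i] - thisStep * gradient[i]; // plain gradient step
            w = Math.signum(w) * Math.max(0.0, Math.abs(w) - shrinkage); // soft threshold
            weights[i] = w;
            l1Norm += Math.abs(w);
        }
        return regParam * l1Norm;
    }
}
```

Small weights are snapped exactly to zero, which is why the proximal step tends to give sparser intermediate solutions than a plain subgradient step.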

LabeledPoint

Class that represents the features and label of a data point.

LabeledPoint

Class that represents the features and label of a data point.

LabelPropagation

Label Propagation algorithm.

LAPACK

LAPACK routines for MLlib's vectors and matrices.

LassoModel

Regression model trained using Lasso.

LassoWithSGD

Train a regression model with L1-regularization using Stochastic Gradient Descent.

Layer

Trait that holds the Layer properties needed to instantiate it.

LayerModel

Trait that holds Layer weights (or parameters).

LBFGS

Class used to solve an optimization problem using Limited-memory BFGS.

LDA

Latent Dirichlet Allocation (LDA), a topic model designed for text documents.

LDA

Latent Dirichlet Allocation (LDA), a topic model designed for text documents.

LDAModel

Model fitted by LDA.

LDAModel

Latent Dirichlet Allocation (LDA) model.

LDAOptimizer

An LDAOptimizer specifies which optimization/learning/inference algorithm to use, and it can hold optimizer-specific parameters for users to set.

LDAParams

LDAUtils

Utility methods for LDA.

LeafNode

Decision tree leaf node.

LeastSquaresGradient

Compute gradient and loss for a least-squares loss function, as used in linear regression.
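For one example (x, y) and weights w, the least-squares loss is ½(w·x − y)² and its gradient with respect to w is (w·x − y)·x. A dense-array sketch of that computation; `LeastSquaresSketch` is a hypothetical class, not Spark's LeastSquaresGradient:

```java
/** Hypothetical least-squares loss and gradient for one example. */
final class LeastSquaresSketch {
    /** Writes the gradient into gradOut and returns the loss 0.5 * (w.x - y)^2. */
    static double lossAndGradient(double[] weights, double[] x, double label, double[] gradOut) {
        double prediction = 0.0;
        for (int i = 0; i < x.length; i++) {
            prediction += weights[i] * x[i];
        }
        double diff = prediction - label; // residual
        for (int i = 0; i < x.length; i++) {
            gradOut[i] = diff * x[i];
        }
        return 0.5 * diff * diff;
    }
}
```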

LessThan

A filter that evaluates to true iff the attribute evaluates to a value less than value.

LessThanOrEqual

A filter that evaluates to true iff the attribute evaluates to a value less than or equal to value.

LibSVMDataSource

The libsvm package implements the Spark SQL data source API for loading LIBSVM data as a DataFrame.

LinearDataGenerator

Generate sample data for linear regression.

LinearRegression

Linear regression.

LinearRegressionModel

Model produced by LinearRegression.

LinearRegressionModel

Regression model trained using LinearRegression.

LinearRegressionParams

Params for linear regression.

LinearRegressionSummary

Linear regression results evaluated on a dataset.

LinearRegressionTrainingSummary

Linear regression training results.

LinearRegressionWithSGD

Train a linear regression model with no regularization using Stochastic Gradient Descent.

LinearSVC

Linear SVM Classifier

LinearSVCModel

Linear SVM Model trained by LinearSVC

LinearSVCParams

Params for linear SVM Classifier.

LinearSVCSummary

Abstraction for LinearSVC results for a given model.

LinearSVCSummaryImpl

LinearSVC results for a given model.

LinearSVCTrainingSummary

Abstraction for LinearSVC training results.

LinearSVCTrainingSummaryImpl

LinearSVC training results.

ListenerBus<L,E>

An event bus which posts events to its listeners.

Lit

Convenience extractor for any Literal.

Literal<T>

Represents a constant literal value in the public expression API.

LiveEntityHelpers

LiveExecutorStageSummary

LiveJob

LiveRDD

Tracker for data related to a persisted RDD.

LiveRDDDistribution

LiveRDDPartition

Data about a single partition of a cached RDD.

LiveResourceProfile

LiveSpeculationStageSummary

LiveStage

LiveTask

Loader<M extends Saveable>

Trait for classes which can load models and transformers from files.

LoadInstanceEnd<T>

Event fired after MLReader.load.

LoadInstanceStart<T>

Event fired before MLReader.load.

LocalKMeans

A utility object to run K-means locally.

LocalLDAModel

Local (non-distributed) model fitted by LDA.

LocalLDAModel

Local LDA model.

LocalScan

A special Scan that runs locally on the driver instead of on executors.

LogicalDistributions

LogicalExpressions

Helper methods for working with the logical expressions API.

LogicalWriteInfo

This interface contains logical write information that data sources can use when generating a WriteBuilder.

LogisticGradient

Compute gradient and loss for a multinomial logistic loss function, as used in multi-class classification (it is also used in binary logistic regression).

LogisticRegression

Logistic regression.

LogisticRegressionDataGenerator

Generate test data for LogisticRegression.

LogisticRegressionModel

Model produced by LogisticRegression.

LogisticRegressionModel

Classification model trained using Multinomial/Binary Logistic Regression.

LogisticRegressionParams

Params for logistic regression.

LogisticRegressionSummary

Abstraction for logistic regression results for a given model.

LogisticRegressionSummaryImpl

Multiclass logistic regression results for a given model.

LogisticRegressionTrainingSummary

Abstraction for multiclass logistic regression training results.

LogisticRegressionTrainingSummaryImpl

Multiclass logistic regression training results.

LogisticRegressionWithLBFGS

Train a classification model for Multinomial/Binary Logistic Regression using Limited-memory BFGS.

LogisticRegressionWithSGD

Train a classification model for Binary Logistic Regression using Stochastic Gradient Descent.

LogLoss

Class for log loss calculation (for classification).
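For the gradient-boosting log loss, here is a minimal plain-Java sketch of the loss and its gradient. It assumes labels in {-1, +1} and a raw ensemble prediction f, with loss 2 log(1 + exp(-2yf)). The class name BoostingLogLoss is hypothetical, and this is not Spark's LogLoss object.

```java
/**
 * Log loss for gradient boosting, assuming labels y in {-1, +1} and a raw
 * ensemble prediction f. Illustrative sketch, not Spark's LogLoss.
 */
final class BoostingLogLoss {
    private BoostingLogLoss() {}

    /** Loss 2 * log(1 + exp(-2 * y * f)); log1p keeps small values accurate. */
    static double loss(double y, double f) {
        return 2.0 * Math.log1p(Math.exp(-2.0 * y * f));
    }

    /** Derivative of the loss with respect to f: -4y / (1 + exp(2 * y * f)). */
    static double gradient(double y, double f) {
        return -4.0 * y / (1.0 + Math.exp(2.0 * y * f));
    }
}
```

The loss shrinks as the margin y * f grows, so confidently correct predictions contribute little to the next boosting step.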

LogNormalGenerator

Generates i.i.d. samples from the log normal distribution with the given mean and standard deviation.

LongAccumulator

An accumulator for computing sum, count, and average of 64-bit integers.
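The bookkeeping behind such an accumulator can be sketched in plain Java. Per-task copies add values locally, and the partial results are then merged. LongSumCount is a hypothetical name, not Spark's class. In an application you would normally obtain a real accumulator from SparkContext.longAccumulator(String).

```java
/**
 * Plain-Java sketch of a sum/count accumulator whose partial copies can be
 * merged. Illustrative only; not Spark's LongAccumulator.
 */
final class LongSumCount {
    private long sum = 0L;
    private long count = 0L;

    /** Record one value in this (task-local) copy. */
    void add(long v) {
        sum += v;
        count++;
    }

    /** Combine another partial accumulator into this one. */
    void merge(LongSumCount other) {
        sum += other.sum;
        count += other.count;
    }

    long sum() { return sum; }

    long count() { return count; }

    /** Average as a double; NaN when nothing has been added. */
    double avg() {
        return count == 0 ? Double.NaN : (double) sum / count;
    }
}
```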

LongAccumulatorSource

LongExactNumeric

LongParam

Specialized version of Param[Long] for Java.

LongType

The data type representing Long values.

LongTypeExpression

LookupCatalog

A trait to encapsulate catalog lookup function and helpful extractors.

LookupCatalog.AsTableIdentifier

Extract legacy table identifier from a multi-part identifier.

LookupCatalog.AsTableIdentifier$

Extract legacy table identifier from a multi-part identifier.

LookupCatalog.CatalogAndIdentifier

Extract catalog and identifier from a multi-part name with the current catalog if needed.

LookupCatalog.CatalogAndIdentifier$

Extract catalog and identifier from a multi-part name with the current catalog if needed.

LookupCatalog.CatalogAndNamespace

Extract catalog and namespace from a multi-part name with the current catalog if needed.

LookupCatalog.CatalogAndNamespace$

Extract catalog and namespace from a multi-part name with the current catalog if needed.

LookupCatalog.NonSessionCatalogAndIdentifier

Extract non-session catalog and identifier from a multi-part identifier.

LookupCatalog.NonSessionCatalogAndIdentifier$

Extract non-session catalog and identifier from a multi-part identifier.

LookupCatalog.SessionCatalogAndIdentifier

Extract session catalog and identifier from a multi-part identifier.

LookupCatalog.SessionCatalogAndIdentifier$

Extract session catalog and identifier from a multi-part identifier.

Loss

Trait for adding "pluggable" loss functions for the gradient boosting algorithm.

Losses

LossFunction

Trait for loss function

LossReasonPending

A loss reason that means we don't yet know why the executor exited.

LowPrioritySQLImplicits

Lower priority implicit methods for converting Scala objects into Datasets.

LSHParams

Params for LSH.

LZ4CompressionCodec

:: DeveloperApi :: LZ4 implementation of CompressionCodec.

LZFCompressionCodec

:: DeveloperApi :: LZF implementation of CompressionCodec.

MapFunction<T,U>

Base interface for a map function used in Dataset's map function.

MapGroupsFunction<K,V,R>

Base interface for a map function used in KeyValueGroupedDataset's mapGroups function.

MapGroupsWithStateFunction<K,V,S,R>

:: Experimental :: Base interface for a map function used in KeyValueGroupedDataset.mapGroupsWithState(MapGroupsWithStateFunction, org.apache.spark.sql.Encoder, org.apache.spark.sql.Encoder).

MapOutputCommitMessage

:: Private :: Represents the result of writing map outputs for a shuffle map task.

MapOutputMetadata

:: Private :: An opaque metadata tag for registering the result of committing the output of a shuffle map task.

MapOutputTrackerMasterMessage

MapOutputTrackerMessage

MapPartitionsFunction<T,U>

Base interface for function used in Dataset's mapPartitions.

MappedPoolMemory

MapperRowCounter

An AccumulatorV2 counter for collecting a list of (mapper index, row count).

MapStatus

Result returned by a ShuffleMapTask to a scheduler.

MapType

The data type for Maps.

MapWithStateDStream<KeyType,ValueType,StateType,MappedType>

DStream representing the stream of data generated by mapWithState operation on a pair DStream.

Matrices

Factory methods for Matrix.

Matrices

Factory methods for Matrix.

Matrix

Trait for a local matrix.

Matrix

Trait for a local matrix.

MatrixEntry

Represents an entry in a distributed matrix.

MatrixFactorizationModel

Model representing the result of matrix factorization.

MatrixFactorizationModel.SaveLoadV1_0$

MatrixImplicits

Implicit methods available in Scala for converting org.apache.spark.mllib.linalg.Matrix to org.apache.spark.ml.linalg.Matrix and vice versa.

Max

An aggregate function that returns the maximum value in a group.

MaxAbsScaler

Rescale each feature individually to range [-1, 1] by dividing by the maximum absolute value of that feature.
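Here is a plain-Java sketch of the max-abs rule, assuming dense row-major input. MaxAbsScaling is a hypothetical helper, not Spark's MaxAbsScaler. Because each feature is only divided and never shifted, zeros stay zeros.

```java
/**
 * Sketch of max-abs scaling: feature j of every row is divided by the largest
 * absolute value seen in column j, mapping it into [-1, 1].
 */
final class MaxAbsScaling {
    private MaxAbsScaling() {}

    /** rows[i][j] is feature j of sample i; rows are assumed rectangular. */
    static double[][] scale(double[][] rows) {
        int n = rows.length, d = n == 0 ? 0 : rows[0].length;
        double[] maxAbs = new double[d];
        for (double[] row : rows)
            for (int j = 0; j < d; j++)
                maxAbs[j] = Math.max(maxAbs[j], Math.abs(row[j]));
        double[][] out = new double[n][d];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < d; j++)
                // An all-zero feature stays 0 instead of dividing by zero.
                out[i][j] = maxAbs[j] == 0.0 ? 0.0 : rows[i][j] / maxAbs[j];
        return out;
    }
}
```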

MaxAbsScalerModel

Model fitted by MaxAbsScaler.

MaxAbsScalerParams

Params for MaxAbsScaler and MaxAbsScalerModel.

MemoryEntry<T>

MemoryEntryBuilder<T>

MemoryMetrics

MemoryParam

An extractor object for parsing JVM memory strings, such as "10g", into an Int representing the number of megabytes.
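This is a hypothetical sketch of the string-to-megabytes conversion that such an extractor performs. It assumes binary units (1g = 1024 MB) and requires an explicit k, m, g, or t suffix. It is not Spark's MemoryParam, and the class name MemoryStrings is illustrative.

```java
import java.util.Locale;

/**
 * Hypothetical parser for JVM-style memory strings such as "512m" or "10g".
 * Assumes binary units and a mandatory k/m/g/t suffix.
 */
final class MemoryStrings {
    private MemoryStrings() {}

    static int toMegabytes(String s) {
        String t = s.trim().toLowerCase(Locale.ROOT);
        if (t.isEmpty()) throw new IllegalArgumentException("empty memory string");
        // Everything before the last character must be a whole number.
        long value = Long.parseLong(t.substring(0, t.length() - 1));
        char unit = t.charAt(t.length() - 1);
        long mb;
        switch (unit) {
            case 'k': mb = value / 1024; break;
            case 'm': mb = value; break;
            case 'g': mb = value * 1024; break;
            case 't': mb = value * 1024 * 1024; break;
            default: throw new IllegalArgumentException("missing unit suffix: " + s);
        }
        return Math.toIntExact(mb); // fail loudly instead of overflowing an int
    }
}
```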

MetaAlgorithmReadWrite

Default Meta-Algorithm read and write implementation.

Metadata

Metadata is a wrapper over Map[String, Any] that limits the value type to simple ones: Boolean, Long, Double, String, Metadata, Array[Boolean], Array[Long], Array[Double], Array[String], and Array[Metadata].

MetadataBuilder

Builder for Metadata.

MetadataColumn

Interface for a metadata column.

MetadataUtils

Helper utilities for algorithms using ML metadata

MethodIdentifier<T>

Helper class to identify a method.

Metric

MetricsSystemInstances

MFDataGenerator

Generate RDD(s) containing data for Matrix Factorization.

MicroBatchStream

A SparkDataStream for streaming queries with micro-batch mode.

Milliseconds

Helper object that creates instance of Duration representing a given number of milliseconds.

Min

An aggregate function that returns the minimum value in a group.

MinHashLSH

LSH class for Jaccard distance.

MinHashLSHModel

Model produced by MinHashLSH, where multiple hash functions are stored.

MinMaxScaler

Rescale each feature individually to a common range [min, max] linearly using column summary statistics, which is also known as min-max normalization or Rescaling.
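For one feature column, the rescaling can be sketched as Rescaled(e) = (e - E_min) / (E_max - E_min) * (max - min) + min. A constant column is mapped to 0.5 * (max + min); this convention is assumed here to match Spark's MinMaxScaler documentation. MinMaxRescale is a hypothetical name.

```java
/**
 * Sketch of min-max rescaling of one feature column into [min, max].
 */
final class MinMaxRescale {
    private MinMaxRescale() {}

    static double[] rescale(double[] column, double min, double max) {
        // Column summary statistics E_min and E_max.
        double eMin = Double.POSITIVE_INFINITY, eMax = Double.NEGATIVE_INFINITY;
        for (double e : column) {
            eMin = Math.min(eMin, e);
            eMax = Math.max(eMax, e);
        }
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = (eMax == eMin)
                ? 0.5 * (max + min) // constant column: use the midpoint
                : (column[i] - eMin) / (eMax - eMin) * (max - min) + min;
        }
        return out;
    }
}
```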

MinMaxScalerModel

Model fitted by MinMaxScaler.

MinMaxScalerParams

Params for MinMaxScaler and MinMaxScalerModel.

Minutes

Helper object that creates instance of Duration representing a given number of minutes.

MiscellaneousProcessDetails

:: DeveloperApi :: Stores information about a miscellaneous process to pass from the scheduler to SparkListeners.

MLEvent

Event emitted by ML operations.

MLEvents

A small trait that defines some methods to send MLEvent.

MLFormatRegister

ML export formats should implement this trait so that users can specify a short name rather than the fully qualified class name of the exporter.

MLPairRDDFunctions<K,V>

Machine learning specific Pair RDD functions.

MLReadable<T>

Trait for objects that provide MLReader.

MLReader<T>

Abstract class for utility classes that can load ML instances.

MLUtils

Helper methods to load, save and pre-process data used in MLlib.

MLWritable

Trait for classes that provide MLWriter.

MLWriter

Abstract class for utility classes that can save ML instances in Spark's internal format.

MLWriterFormat

Abstract class to be implemented by objects that provide ML exportability.

Model<M extends Model<M>>

A fitted model, i.e., a Transformer produced by an Estimator.

MsSqlServerDialect

MsSqlServerDialect.MsSqlServerSQLBuilder

MsSqlServerDialect.MsSqlServerSQLQueryBuilder

MulticlassClassificationEvaluator

Evaluator for multiclass classification, which expects input columns: prediction, label, weight (optional) and probability (only for logLoss).

MulticlassMetrics

Evaluator for multiclass classification.

MultilabelClassificationEvaluator

:: Experimental :: Evaluator for multi-label classification, which expects two input columns: prediction and label.

MultilabelMetrics

Evaluator for multilabel classification.

MultilayerPerceptronClassificationModel

Classification model based on the Multilayer Perceptron.

MultilayerPerceptronClassificationSummary

Abstraction for MultilayerPerceptronClassification results for a given model.

MultilayerPerceptronClassificationSummaryImpl

MultilayerPerceptronClassification results for a given model.

MultilayerPerceptronClassificationTrainingSummary

Abstraction for MultilayerPerceptronClassification training results.

MultilayerPerceptronClassificationTrainingSummaryImpl

MultilayerPerceptronClassification training results.

MultilayerPerceptronClassifier

Classifier trainer based on the Multilayer Perceptron.

MultilayerPerceptronParams

Params for Multilayer Perceptron.

MultivariateGaussian

This class provides basic functionality for a Multivariate Gaussian (Normal) Distribution.

MultivariateGaussian

This class provides basic functionality for a Multivariate Gaussian (Normal) Distribution.

MultivariateOnlineSummarizer

MultivariateOnlineSummarizer implements MultivariateStatisticalSummary to compute the mean, variance, minimum, maximum, counts, and nonzero counts for instances in sparse or dense vector format in an online fashion.
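Here is a simplified single-pass summarizer in the same spirit. It assumes unweighted, dense input and updates the per-dimension mean and unbiased sample variance with Welford's method. OnlineSummary is a hypothetical class, not Spark's implementation.

```java
import java.util.Arrays;

/**
 * Online per-dimension summary: count, mean, sample variance, min, max and
 * nonzero count, all updated in one pass over the data.
 */
final class OnlineSummary {
    private final double[] mean, m2, min, max;
    private final long[] nnz;
    private long count = 0;

    OnlineSummary(int dim) {
        mean = new double[dim];
        m2 = new double[dim];
        min = new double[dim];
        max = new double[dim];
        nnz = new long[dim];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
    }

    void add(double[] x) {
        count++;
        for (int j = 0; j < x.length; j++) {
            double delta = x[j] - mean[j];
            mean[j] += delta / count;
            m2[j] += delta * (x[j] - mean[j]); // Welford's running sum of squares
            min[j] = Math.min(min[j], x[j]);
            max[j] = Math.max(max[j], x[j]);
            if (x[j] != 0.0) nnz[j]++;
        }
    }

    long count() { return count; }
    double mean(int j) { return mean[j]; }
    /** Unbiased sample variance (divides by n - 1); 0 for fewer than two samples. */
    double variance(int j) { return count > 1 ? m2[j] / (count - 1) : 0.0; }
    double min(int j) { return min[j]; }
    double max(int j) { return max[j]; }
    long numNonzeros(int j) { return nnz[j]; }
}
```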

MultivariateStatisticalSummary

Trait for multivariate statistical summary of a data matrix.

MutableAggregationBuffer

A Row representing a mutable aggregation buffer.

MutablePair<T1,T2>

:: DeveloperApi :: A tuple of 2 elements.

MutableURLClassLoader

URL class loader that exposes the `addURL` method in URLClassLoader.

MySQLDialect

MySQLDialect.MySQLSQLBuilder

MySQLDialect.MySQLSQLQueryBuilder

NaiveBayes

Naive Bayes Classifiers.

NaiveBayes

Trains a Naive Bayes model given an RDD of (label, features) pairs.

NaiveBayesModel

Model produced by NaiveBayes

NaiveBayesModel

Model for Naive Bayes Classifiers.

NaiveBayesModel.SaveLoadV1_0$

NaiveBayesModel.SaveLoadV2_0$

NaiveBayesParams

Params for Naive Bayes Classifiers.

NamedReference

Represents a field or column reference in the public logical expression API.

NamedTransform

Convenience extractor for any Transform.

NamespaceChange

NamespaceChange subclasses represent requested changes to a namespace.

NamespaceChange.RemoveProperty

A NamespaceChange to remove a namespace property.

NamespaceChange.SetProperty

A NamespaceChange to set a namespace property.

NarrowDependency<T>

:: DeveloperApi :: Base class for dependencies where each partition of the child RDD depends on a small number of partitions of the parent RDD.

NewHadoopRDD<K,V>

:: DeveloperApi :: An RDD that provides core functionality for reading data stored in Hadoop (e.g., files in HDFS, sources in HBase, or S3), using the new MapReduce API (org.apache.hadoop.mapreduce).

NewHadoopRDD.NewHadoopMapPartitionsWithSplitRDD$

NGram

A feature transformer that converts the input array of strings into an array of n-grams.
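Here is a plain-Java sketch of the transformation. It assumes that each n-gram joins n consecutive tokens with a single space and that inputs shorter than n produce no n-grams. NGrams is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of n-gram extraction over a token list. */
final class NGrams {
    private NGrams() {}

    static List<String> of(List<String> tokens, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        List<String> out = new ArrayList<>();
        // Slide a window of width n; no iterations when tokens.size() < n.
        for (int i = 0; i + n <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return out;
    }
}
```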

NioBufferedFileInputStream

InputStream implementation which uses a direct buffer to read a file to avoid extra copy of data between Java and native memory which happens when using BufferedInputStream.

NNLS

Object used to solve nonnegative least squares problems using a modified projected gradient method.

NNLS.Workspace

Node

Decision tree node interface.

Node

Node in a decision tree.

Node

NominalAttribute

A nominal attribute.

NoopDialect

NOOP dialect object, always returning the neutral element.

NormalEquationSolver

Interface for classes that solve the normal equations locally.

Normalizer

Normalize a vector to have unit norm using the given p-norm.

Normalizer

Normalizes samples individually to unit L^p norm.
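Normalizing to unit L^p norm divides a vector by (sum_i |x_i|^p)^(1/p), or by max_i |x_i| when p is infinite. The sketch below assumes p >= 1 and returns zero vectors unchanged. PNormalizer is a hypothetical name and is not either Normalizer class.

```java
/** Sketch of p-norm normalization for dense vectors. */
final class PNormalizer {
    private PNormalizer() {}

    static double norm(double[] v, double p) {
        if (p < 1.0) throw new IllegalArgumentException("p must be >= 1");
        double acc = 0.0;
        if (Double.isInfinite(p)) {
            // L-infinity norm: largest absolute component.
            for (double x : v) acc = Math.max(acc, Math.abs(x));
            return acc;
        }
        for (double x : v) acc += Math.pow(Math.abs(x), p);
        return Math.pow(acc, 1.0 / p);
    }

    static double[] normalize(double[] v, double p) {
        double n = norm(v, p);
        double[] out = v.clone();
        if (n == 0.0) return out; // leave the zero vector as-is
        for (int i = 0; i < out.length; i++) out[i] /= n;
        return out;
    }
}
```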

Not

A predicate that evaluates to true iff child evaluates to false.

Not

A filter that evaluates to true iff child evaluates to false.

NullOrdering

A null order used in sorting expressions.

NullType

The data type representing NULL values.

NumericAttribute

A numeric attribute with optional summary statistics.

NumericHistogram

A generic, re-usable histogram class that supports partial aggregations.

NumericHistogram.Coord

The Coord class defines a histogram bin, which is just an (x,y) pair.

NumericParser

Simple parser for a numeric structure consisting of three types: numbers, arrays of numbers, and tuples of numbers, arrays, or tuples.

NumericType

Numeric data types.

NumericTypeExpression

ObjectType

Observation

Helper class to simplify usage of Dataset.observe(String, Column, Column*).

OffHeapExecutionMemory

OffHeapStorageMemory

OffHeapUnifiedMemory

Offset

An abstract representation of progress through a MicroBatchStream or ContinuousStream.

OneHotEncoder

A one-hot encoder that maps a column of category indices to a column of binary vectors, with at most a single one-value per row that indicates the input category index.
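Encoding one category index can be sketched as follows. The dropLast flag makes the last category the all-zero vector; it is assumed here to mirror the dropLast param of Spark's OneHotEncoder. The OneHot class itself is hypothetical.

```java
/** Sketch of one-hot encoding a single category index into a dense vector. */
final class OneHot {
    private OneHot() {}

    static double[] encode(int index, int numCategories, boolean dropLast) {
        if (index < 0 || index >= numCategories)
            throw new IllegalArgumentException("index out of range: " + index);
        int size = dropLast ? numCategories - 1 : numCategories;
        double[] v = new double[size];
        // With dropLast, index == numCategories - 1 has no slot and stays all zeros.
        if (index < size) v[index] = 1.0;
        return v;
    }
}
```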

OneHotEncoderBase

Private trait for params and common methods for OneHotEncoder and OneHotEncoderModel

OneHotEncoderCommon

Provides some helper methods used by OneHotEncoder.

OneHotEncoderModel

param: categorySizes Original number of categories for each feature being encoded.

OneToOneDependency<T>

:: DeveloperApi :: Represents a one-to-one dependency between partitions of the parent and child RDDs.

OneVsRest

Reduction of Multiclass Classification to Binary Classification.

OneVsRestModel

Model produced by OneVsRest.

OneVsRestParams

Params for OneVsRest.

OnHeapExecutionMemory

OnHeapStorageMemory

OnHeapUnifiedMemory

OnlineLDAOptimizer

An online optimizer for LDA.

Optimizer

Trait for optimization problem solvers.

Optional<T>

Like java.util.Optional in Java 8, scala.Option in Scala, and com.google.common.base.Optional in Google Guava, this class represents a value of a given type that may or may not exist.

Or

A predicate that evaluates to true iff at least one of left or right evaluates to true.

Or

A filter that evaluates to true iff at least one of left or right evaluates to true.

OracleDialect

OracleDialect.OracleSQLBuilder

OracleDialect.OracleSQLQueryBuilder

OrderedDistribution

A distribution where tuples have been ordered across partitions according to ordering expressions, but not necessarily within a given partition.

OrderedRDDFunctions<K,V,P extends scala.Product2<K,V>>

Extra functions available on RDDs of (key, value) pairs where the key is sortable through an implicit conversion.

OutputCommitCoordinationMessage

OutputMetricDistributions

OutputMetrics

OutputMode

OutputMode describes what data will be written to a streaming sink when there is new data available in a streaming DataFrame/Dataset.

OutputOperationInfo

:: DeveloperApi :: Class having information on output operations.

PagedTable<T>

A paged table that will generate an HTML table for a specified page and also the page navigation.

PageRank

PageRank algorithm implementation.
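The core iteration can be sketched on a plain adjacency list. The sketch assumes the GraphX-style update rank(v) = resetProb + (1 - resetProb) * sum of rank(u) / outDegree(u) over in-neighbours u, with every rank starting at 1.0. It ignores dangling-vertex handling, so it is a simplification. SimplePageRank is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of static (fixed-iteration) PageRank on an adjacency list. */
final class SimplePageRank {
    private SimplePageRank() {}

    /** out.get(u) lists the vertices that u links to; vertices are 0..n-1. */
    static double[] run(List<List<Integer>> out, int iterations, double resetProb) {
        int n = out.size();
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0);
        for (int it = 0; it < iterations; it++) {
            // Each vertex splits its current rank evenly over its out-links.
            double[] contrib = new double[n];
            for (int u = 0; u < n; u++) {
                List<Integer> targets = out.get(u);
                for (int v : targets) contrib[v] += rank[u] / targets.size();
            }
            for (int v = 0; v < n; v++) rank[v] = resetProb + (1 - resetProb) * contrib[v];
        }
        return rank;
    }
}
```

A vertex with no in-links settles at resetProb, while vertices that many others point to accumulate higher ranks.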

PairDStreamFunctions<K,V>

Extra functions available on DStream of (key, value) pairs through an implicit conversion.

PairFlatMapFunction<T,K,V>

A function that returns zero or more key-value pair records from each input record.

PairFunction<T,K,V>

A function that returns key-value pairs (Tuple2<K, V>), and can be used to construct PairRDDs.

PairRDDFunctions<K,V>

Extra functions available on RDDs of (key, value) pairs through an implicit conversion.

PairwiseRRDD<T>

Form an RDD[(Int, Array[Byte])] from key-value pairs returned from R.

Param<T>

A param with self-contained documentation and an optional default value.

ParamGridBuilder

Builder for a param grid used in grid search-based model selection.

ParamMap

A param to value map.

ParamPair<T>

A param and its value.

Params

Trait for components that take parameters.

ParamValidators

Factory methods for common validation functions for Param.isValid.

ParentClassLoader

A class loader which makes some protected methods in ClassLoader accessible.

PartialResult<R>

Partition

An identifier for a partition in an RDD.

PartitionCoalescer

::DeveloperApi:: A PartitionCoalescer defines how to coalesce the partitions of a given RDD.

Partitioner

An object that defines how the elements in a key-value pair RDD are partitioned by key.
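Hash partitioning is the most common way to map keys to partitions. It can be sketched as a non-negative modulus of the key's hash code, with null keys going to partition 0. HashPartitioning is a hypothetical helper that illustrates the idea; it is not Spark's HashPartitioner.

```java
/** Sketch of hash partitioning for key-value records. */
final class HashPartitioning {
    private HashPartitioning() {}

    static int partitionFor(Object key, int numPartitions) {
        if (numPartitions <= 0) throw new IllegalArgumentException("numPartitions must be positive");
        if (key == null) return 0;
        // floorMod keeps the result in [0, numPartitions) even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```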

PartitionEvaluator<T,U>

An evaluator for computing RDD partitions.

PartitionEvaluatorFactory<T,U>

A factory to create PartitionEvaluator.

PartitionGroup

:: DeveloperApi :: A group of Partitions. param: prefLoc preferred location for the partition group

Partitioning

An interface to represent the output data partitioning for a data source, which is returned by SupportsReportPartitioning.outputPartitioning().

PartitioningUtils

PartitionOffset

Used for per-partition offsets in continuous processing.

PartitionPruningRDD<T>

:: DeveloperApi :: An RDD used to prune RDD partitions so we can avoid launching tasks on all partitions.

PartitionReader<T>

A partition reader returned by PartitionReaderFactory.createReader(InputPartition) or PartitionReaderFactory.createColumnarReader(InputPartition).

PartitionReaderFactory

A factory used to create PartitionReader instances.

PartitionStrategy

Represents the way edges are assigned to edge partitions based on their source and destination vertex IDs.

PartitionStrategy.CanonicalRandomVertexCut$

Assigns edges to partitions by hashing the source and destination vertex IDs in a canonical direction, resulting in a random vertex cut that colocates all edges between two vertices, regardless of direction.

PartitionStrategy.EdgePartition1D$

Assigns edges to partitions using only the source vertex ID, colocating edges with the same source.

PartitionStrategy.EdgePartition2D$

Assigns edges to partitions using a 2D partitioning of the sparse edge adjacency matrix, guaranteeing a 2 * sqrt(numParts) bound on vertex replication.

PartitionStrategy.RandomVertexCut$

Assigns edges to partitions by hashing the source and destination vertex IDs, resulting in a random vertex cut that colocates all same-direction edges between two vertices.

PCA

PCA trains a model to project vectors to a lower dimensional space of the top k principal components.

PCA

A feature transformer that projects vectors to a low-dimensional space using PCA.

PCAModel

Model fitted by PCA.

PCAModel

Model fitted by PCA that can project vectors to a low-dimensional space using PCA.

PCAParams

Params for PCA and PCAModel.

PCAUtil
 
PearsonCorrelation

Compute Pearson correlation for two RDDs of the type RDD[Double] or the correlation matrix
 for an RDD of the type RDD[Vector].

PhysicalWriteInfo

This interface contains physical write information that data sources can use when
 generating a DataWriterFactory or a StreamingDataWriterFactory.

Pipeline

A simple pipeline, which acts as an estimator.

Pipeline.SharedReadWrite$

Methods for MLReader and MLWriter shared between Pipeline and PipelineModel

PipelineModel

Represents a fitted pipeline.

PipelineStage

A stage in a pipeline, either an Estimator or a Transformer.

PluginContext

:: DeveloperApi ::
 Context information and operations for plugins loaded by Spark.

PMMLExportable

Export model to the PMML format.
 Predictive Model Markup Language (PMML) is an XML-based file format
 developed by the Data Mining Group (www.dmg.org).

PMMLKMeansModelWriter

A writer for KMeans that handles the "pmml" format

PMMLLinearRegressionModelWriter

A writer for LinearRegression that handles the "pmml" format

PMMLModelExport
 
PMMLModelExportFactory
 
PoissonBounds

Utility functions that help us determine bounds on adjusted sampling rate to guarantee exact
 sample sizes with high confidence when sampling with replacement.

PoissonGenerator

Generates i.i.d. samples from the Poisson distribution with the given mean.

PoissonSampler<T>

:: DeveloperApi ::
 A sampler for sampling with replacement, based on values drawn from Poisson distribution.

PolynomialExpansion

Perform feature expansion in a polynomial space.
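Here is a degree-2 sketch. It assumes the ordering from the two-feature example in Spark's PolynomialExpansion docs, where (x, y) expands to (x, x*x, y, x*y, y*y). PolyExpand2 is hypothetical and does not handle other degrees. Its ordering for more than two features is not claimed to match Spark's.

```java
/** Degree-2 polynomial expansion sketch for a dense feature vector. */
final class PolyExpand2 {
    private PolyExpand2() {}

    static double[] expand(double[] x) {
        int n = x.length;
        // n linear terms plus n(n+1)/2 pairwise products.
        double[] out = new double[n * (n + 3) / 2];
        int k = 0;
        for (int j = 0; j < n; j++) {
            out[k++] = x[j];
            for (int i = 0; i <= j; i++) out[k++] = x[i] * x[j];
        }
        return out;
    }
}
```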

PortableDataStream

A class that allows DataStreams to be serialized and moved around by not creating them
 until they need to be read

PostgresDialect
 
PowerIterationClustering

Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by
 Lin and Cohen.

PowerIterationClustering

Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by
 Lin and Cohen.

PowerIterationClustering.Assignment

Cluster assignment.

PowerIterationClustering.Assignment$
 
PowerIterationClusteringModel

Model produced by PowerIterationClustering.

PowerIterationClusteringModel.SaveLoadV1_0$
 
PowerIterationClusteringParams

Common params for PowerIterationClustering

PowerIterationClusteringWrapper
 
Precision

Precision.

Predicate

The general representation of predicate expressions, which contains the upper-cased expression
 name and all the children expressions.

Predict

Predicted value for a node
 param:  predict predicted value
 param:  prob probability of the label (classification only)

PredictionModel<FeaturesType,M extends PredictionModel<FeaturesType,M>>

Abstraction for a model for prediction tasks (regression and classification).

Predictor<FeaturesType,Learner extends Predictor<FeaturesType,Learner,M>,M extends PredictionModel<FeaturesType,M>>

Abstraction for prediction problems (regression and classification).

PredictorParams

(private[ml])  Trait for parameters for prediction (regression and classification).

PrefixSpan

A parallel PrefixSpan algorithm to mine frequent sequential patterns.

PrefixSpan

A parallel PrefixSpan algorithm to mine frequent sequential patterns.

PrefixSpan.FreqSequence<Item>

Represents a frequent sequence.

PrefixSpan.Postfix$
 
PrefixSpan.Prefix$
 
PrefixSpanModel<Item>

Model fitted by PrefixSpan
 param:  freqSequences frequent sequences

PrefixSpanModel.SaveLoadV1_0$
 
PrefixSpanWrapper
 
Pregel

Implements a Pregel-like bulk-synchronous message-passing API.

ProbabilisticClassificationModel<FeaturesType,M extends ProbabilisticClassificationModel<FeaturesType,M>>

Model produced by a ProbabilisticClassifier.

ProbabilisticClassifier<FeaturesType,E extends ProbabilisticClassifier<FeaturesType,E,M>,M extends ProbabilisticClassificationModel<FeaturesType,M>>

Single-label binary or multiclass classifier which can output class conditional probabilities.

ProbabilisticClassifierParams

(private[classification])  Params for probabilistic classification.

ProcessSummary
 
ProcessTreeMetrics
 
ProtobufSerDe<T>

:: DeveloperApi ::
 ProtobufSerDe represents the API for serializing and deserializing
 Protobuf data related to the UI.

ProxyRedirectHandler

A Jetty handler to handle redirects to a proxy server.

PrunedFilteredScan

A BaseRelation that can eliminate unneeded columns and filter using selected
 predicates before producing an RDD containing all matching tuples as Row objects.

PrunedScan

A BaseRelation that can eliminate unneeded columns before producing an RDD
 containing all of its tuples as Row objects.

Pseudorandom

:: DeveloperApi ::
 A class with pseudorandom behavior.

PushBasedFetchHelper

Helper class for ShuffleBlockFetcherIterator that encapsulates all the push-based
 functionality to fetch push-merged block meta and shuffle chunks.

PythonStreamingListener
 
PythonStreamingQueryListener

Py4J allows a pure interface so this proxy is required.

QRDecomposition<QType,RType>

Represents QR factors.

QuantileDiscretizer

QuantileDiscretizer takes a column with continuous features and outputs a column with binned
 categorical features.

QuantileDiscretizerBase

Params for QuantileDiscretizer.

QuantileStrategy

Enum for selecting the quantile calculation strategy

QueryCompilationErrors

Object for grouping error messages from exceptions thrown during query compilation.

QueryContext

Query context of a SparkThrowable.

QueryErrorsBase

The trait exposes util methods for preparing error messages such as quoting of error elements.

QueryExecutionErrors

Object for grouping error messages from (most) exceptions thrown during query execution.

QueryExecutionListener

The interface of query execution listener that can be used to analyze execution metrics.

QueryParsingErrors

Object for grouping all error messages of the query parsing.

RandomBlockReplicationPolicy
 
RandomDataGenerator<T>

Trait for random data generators that generate i.i.d. data.

RandomForest

ALGORITHM

RandomForest

A class that implements a Random Forest
 learning algorithm for classification and regression.

RandomForestClassificationModel

Random Forest model for classification.

RandomForestClassificationSummary

Abstraction for multiclass RandomForestClassification results for a given model.

RandomForestClassificationSummaryImpl

Multiclass RandomForestClassification results for a given model.

RandomForestClassificationTrainingSummary

Abstraction for multiclass RandomForestClassification training results.

RandomForestClassificationTrainingSummaryImpl

Multiclass RandomForestClassification training results.

RandomForestClassifier

Random Forest learning algorithm for
 classification.

RandomForestClassifierParams
 
RandomForestModel

Represents a random forest model.

RandomForestParams

Parameters for Random Forest algorithms.

RandomForestRegressionModel

Random Forest model for regression.

RandomForestRegressor

Random Forest
 learning algorithm for regression.

RandomForestRegressorParams
 
RandomRDDs

Generator methods for creating RDDs comprised of i.i.d. samples from some distribution.

RandomSampler<T,U>

:: DeveloperApi ::
 A pseudorandom sampler.

RangeDependency<T>

:: DeveloperApi ::
 Represents a one-to-one dependency between ranges of partitions in the parent and child RDDs.

RangePartitioner<K,V>

A Partitioner that partitions sortable records by range into roughly
 equal ranges.
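A binary search over precomputed range bounds is one way to assign a key to a partition. The key goes to the first partition whose upper bound is at least the key, and keys above every bound go to the last partition. RangePartitioning is a hypothetical helper. Spark derives the bounds by sampling the RDD, whereas here they are passed in and assumed strictly increasing.

```java
import java.util.Arrays;

/** Sketch of range partitioning with k - 1 sorted bounds for k partitions. */
final class RangePartitioning {
    private RangePartitioning() {}

    static int partitionFor(long key, long[] rangeBounds) {
        int idx = Arrays.binarySearch(rangeBounds, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        return idx >= 0 ? idx : -idx - 1;
    }
}
```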

RankingEvaluator

:: Experimental ::
 Evaluator for ranking, which expects two input columns: prediction and label.

RankingMetrics<T>

Evaluator for ranking algorithms.

RateEstimator

A component that estimates the rate at which an InputDStream should ingest
 records, based on updates at every batch completion.

Rating

A more compact class to represent a rating than Tuple3[Int, Int, Double].

RawTextHelper
 
RawTextSender

A helper program that sends blocks of Kryo-serialized text strings out on a socket at a
 specified rate.

RBackendAuthHandler

Authentication handler for connections from the R process.

RDD<T>

A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.

RDDBarrier<T>

:: Experimental ::
 Wraps an RDD in a barrier stage, which forces Spark to launch tasks of this stage together.

RDDBlockId
 
RDDDataDistribution
 
RDDFunctions<T>

Machine learning specific RDD functions.

RDDInfo
 
RDDPartitionInfo
 
RDDPartitionSeq

A custom sequence of partitions based on a mutable linked list.

RDDStorageInfo
 
ReadableChannelFileRegion
 
ReadAheadInputStream

InputStream implementation which asynchronously reads ahead from the underlying input
 stream when a specified amount of data has been read from the current buffer.

ReadAllAvailable

Represents a ReadLimit where the MicroBatchStream must scan all the data
 available at the streaming source.

ReadLimit

Interface representing limits on how much to read from a MicroBatchStream when it
 implements SupportsAdmissionControl.

ReadMaxFiles

Represents a ReadLimit where the MicroBatchStream should scan approximately the
 given maximum number of files.

ReadMaxRows

Represents a ReadLimit where the MicroBatchStream should scan approximately the
 given maximum number of rows.

ReadMinRows

Represents a ReadLimit where the MicroBatchStream should scan approximately
 at least the given minimum number of rows.

Recall

Recall.

ReceivedBlock

Trait representing a received block

ReceivedBlockHandler

Trait that represents a class that handles the storage of blocks received by a receiver

ReceivedBlockStoreResult

Trait that represents the metadata related to storage of blocks

ReceivedBlockTrackerLogEvent

Trait representing any event in the ReceivedBlockTracker that updates its state.

Receiver<T>

:: DeveloperApi ::
 Abstract class of a receiver that can be run on worker nodes to receive external data.

ReceiverInfo
 
ReceiverInfo

:: DeveloperApi ::
 Class having information about a receiver

ReceiverInputDStream<T>

Abstract class for defining any InputDStream
 that has to start a receiver on worker nodes to receive external data.

ReceiverMessage

Messages sent to the Receiver.

ReceiverState

Enumeration to identify current state of a Receiver

ReceiverTrackerLocalMessage

Messages used by the driver and ReceiverTrackerEndpoint to communicate locally.

ReceiverTrackerMessage

Messages used by the NetworkReceiver and the ReceiverTracker to communicate
 with each other.

RecursiveFlag
 
ReduceFunction<T>

Base interface for function used in Dataset's reduce.

Ref

Convenience extractor for any NamedReference.

RegexTokenizer

A regex based tokenizer that extracts tokens either by using the provided regex pattern to split
 the text (default) or repeatedly matching the regex (if gaps is false).
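The two modes can be sketched with java.util.regex. The sketch assumes lowercasing and a minimum token length of 1 as defaults. RegexTokens is a hypothetical helper, not Spark's RegexTokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of split-on-separators (gaps) versus collect-matches tokenizing. */
final class RegexTokens {
    private RegexTokens() {}

    static List<String> tokenize(String text, String regex, boolean gaps) {
        String s = text.toLowerCase(Locale.ROOT);
        Pattern p = Pattern.compile(regex);
        List<String> tokens = new ArrayList<>();
        if (gaps) {
            // The pattern describes separators.
            for (String t : p.split(s)) tokens.add(t);
        } else {
            // The pattern describes the tokens themselves.
            Matcher m = p.matcher(s);
            while (m.find()) tokens.add(m.group());
        }
        tokens.removeIf(String::isEmpty); // minimum token length of 1
        return tokens;
    }
}
```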

RegressionEvaluator

Evaluator for regression, which expects input columns prediction, label and
 an optional weight column.

RegressionMetrics

Evaluator for regression.

RegressionModel<FeaturesType,M extends RegressionModel<FeaturesType,M>>

Model produced by a Regressor.

RegressionModel
 
Regressor<FeaturesType,Learner extends Regressor<FeaturesType,Learner,M>,M extends RegressionModel<FeaturesType,M>>

Single-label regression

RelationalGroupedDataset

A set of methods for aggregations on a DataFrame, created by groupBy,
 cube or rollup (and also pivot).

RelationalGroupedDataset.CubeType$

To indicate it's the CUBE

RelationalGroupedDataset.GroupByType$

To indicate it's the GroupBy

RelationalGroupedDataset.GroupType

The Grouping Type

RelationalGroupedDataset.PivotType$
 
RelationalGroupedDataset.RollupType$

To indicate it's the ROLLUP

RelationProvider

Implemented by objects that produce relations for a specific kind of data source.

ReportsSinkMetrics

A mix-in interface for streaming sinks to signal that they can report
 metrics.

ReportsSourceMetrics

A mix-in interface for SparkDataStream streaming sources to signal that they can report
 metrics.

RequestMethod
 
RequiresDistributionAndOrdering

A write that requires a specific distribution and ordering of data.

ResourceAllocator

Trait used to help executor/worker allocate resources.

ResourceDiscoveryPlugin

:: DeveloperApi ::
 A plugin that can be dynamically loaded into a Spark application to control how custom
 resources are discovered.

ResourceDiscoveryScriptPlugin

The default plugin that is loaded into a Spark application to control how custom
 resources are discovered.

ResourceID

Resource identifier.

ResourceInformation

Class to hold information about a type of Resource.

ResourceInformationJson

A case class to simplify JSON serialization of ResourceInformation.

ResourceProfile

Resource profile to associate with an RDD.

ResourceProfile.DefaultProfileExecutorResources$
 
ResourceProfile.ExecutorResourcesOrDefaults$
 
ResourceProfileBuilder

Resource profile builder to build a ResourceProfile to associate with an RDD.

ResourceProfileInfo
 
ResourceRequest

Class that represents a resource request.

ResourceUtils
 
ResubmitFailedStages
 
Resubmitted

:: DeveloperApi ::
 An org.apache.spark.scheduler.ShuffleMapTask that completed successfully earlier, but we
 lost the executor before the stage completed.

ReturnStatementFinder
 
ReviveOffers
 
RewritableTransform

Allows Spark to rewrite the given references of the transform during analysis.

RFormula

Implements the transforms required for fitting a dataset against an R model formula.

RFormulaBase

Base trait for RFormula and RFormulaModel.

RFormulaModel

Model fitted by RFormula.

RFormulaParser

Limited implementation of R formula parsing.

RidgeRegressionModel

Regression model trained using RidgeRegression.

RidgeRegressionWithSGD

Train a regression model with L2-regularization using Stochastic Gradient Descent.

RobustScaler

Scale features using statistics that are robust to outliers.
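For one feature, robust scaling can be sketched as (x - median) / (Q3 - Q1), with centering optional. This version computes exact quantiles by linear interpolation, whereas Spark computes approximate quantiles, so results may differ. RobustScale and its withCentering flag are illustrative names, and the zero-IQR behaviour is an assumption.

```java
import java.util.Arrays;

/** Sketch of median/IQR scaling for a single feature column. */
final class RobustScale {
    private RobustScale() {}

    /** q in [0, 1]; linear interpolation between order statistics of a sorted copy. */
    static double quantile(double[] xs, double q) {
        double[] s = xs.clone();
        Arrays.sort(s);
        double pos = q * (s.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return s[lo] + (pos - lo) * (s[hi] - s[lo]);
    }

    static double[] scale(double[] xs, boolean withCentering) {
        double median = quantile(xs, 0.5);
        double iqr = quantile(xs, 0.75) - quantile(xs, 0.25);
        double[] out = new double[xs.length];
        for (int i = 0; i < xs.length; i++) {
            double centered = withCentering ? xs[i] - median : xs[i];
            // Assumption: a zero IQR leaves the value unscaled instead of dividing by zero.
            out[i] = iqr == 0.0 ? centered : centered / iqr;
        }
        return out;
    }
}
```

Because the median and quartiles ignore how extreme the tails are, a single large outlier barely moves the scaling statistics.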

RobustScalerModel

Model fitted by RobustScaler.

RobustScalerParams

Params for RobustScaler and RobustScalerModel.

RollingPolicy

Defines the policy based on which RollingFileAppender will
 generate rolling files.

Row

Represents one row of output from a relational operator.

RowFactory

A factory class used to construct Row objects.

RowLevelOperation

A logical representation of a data source DELETE, UPDATE, or MERGE operation that requires
 rewriting data.

RowLevelOperation.Command

A row-level SQL command.

RowLevelOperationBuilder

An interface for building a RowLevelOperation.

RowLevelOperationInfo

An interface with logical information for a row-level operation such as DELETE, UPDATE, or MERGE.

RowMatrix

Represents a row-oriented distributed Matrix with no meaningful row indices.

RpcUtils
 
RRDD<T>

An RDD that stores serialized R objects as Array[Byte].

RRunnerModes
 
RuntimeConfig

Runtime configuration interface for Spark.

RuntimeInfo
 
RuntimePercentage
 
RUtils
 
RWrappers

This is the Scala stub of SparkR's read.ml.

RWrapperUtils
 
SafeJsonSerializer
 
SamplePathFilter

Filter that allows loading a fraction of HDFS files.

SamplingUtils
 
Saveable

Trait for models and transformers which may be saved as files.

SaveInstanceEnd

Event fired after MLWriter.save.

SaveInstanceStart

Event fired before MLWriter.save.

SaveMode

SaveMode is used to specify the expected behavior of saving a DataFrame to a data source.

ScalarFunction<R>

Interface for a function that produces a result value for each input row.

Scan

A logical representation of a data source scan.

Scan.ColumnarSupportMode

This enum defines how the columnar support for the partitions of the data source
 should be determined.

ScanBuilder

An interface for building the Scan.

Schedulable

An interface for schedulable entities.

SchedulableBuilder

An interface to build the Schedulable tree:
 buildPools: builds the tree nodes (pools).
 addTaskSetManager: builds the leaf nodes (TaskSetManagers).

SchedulerBackend

A backend interface for scheduling systems that allows plugging in different ones under
 TaskSchedulerImpl.

SchedulerBackendUtils
 
SchedulerPool
 
SchedulingAlgorithm

An interface for sorting algorithms:
 FIFO: FIFO ordering between TaskSetManagers.
 FS: fair-scheduling (FS) ordering between Pools, and FIFO or FS within Pools.
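The FIFO case can be pictured as a comparator. The sketch below is a hypothetical, self-contained model: the TaskSetInfo class and its fields are illustrative, not Spark's Schedulable interface. It assumes FIFO ordering compares a priority value (for task sets, typically the job ID) first and breaks ties by stage ID, so earlier jobs, and within a job earlier stages, are offered resources first. The fair (FS) case, which also weighs running tasks, minimum shares and pool weights, is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class FifoOrderSketch {
    // Hypothetical stand-in for a schedulable task set.
    static final class TaskSetInfo {
        final String name;
        final int priority; // assumed: lower value = submitted earlier (e.g. the job ID)
        final int stageId;

        TaskSetInfo(String name, int priority, int stageId) {
            this.name = name;
            this.priority = priority;
            this.stageId = stageId;
        }
    }

    // Lower priority value first; ties broken by the lower stage ID.
    static final Comparator<TaskSetInfo> FIFO =
        Comparator.comparingInt((TaskSetInfo t) -> t.priority)
                  .thenComparingInt(t -> t.stageId);

    // Returns the task-set names in the order this FIFO comparator sorts them.
    static List<String> order(List<TaskSetInfo> sets) {
        List<TaskSetInfo> sorted = new ArrayList<>(sets);
        sorted.sort(FIFO);
        List<String> names = new ArrayList<>();
        for (TaskSetInfo t : sorted) {
            names.add(t.name);
        }
        return names;
    }
}
```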

SchedulingMode

"FAIR" and "FIFO" determine which policy is used to order tasks amongst a Schedulable's sub-queues.
 "NONE" is used when a Schedulable has no sub-queues.

SchemaConverters

This object contains methods that are used to convert Spark SQL schemas to Avro schemas and vice
 versa.

SchemaConverters.SchemaType

Internal wrapper for SQL data type and nullability.

SchemaConverters.SchemaType$
 
SchemaRelationProvider

Implemented by objects that produce relations for a specific kind of data source
 with a given schema.

SchemaUtils

Utils for handling schemas.

SchemaUtils

Utils for handling schemas.

Seconds

Helper object that creates an instance of Duration representing
 a given number of seconds.

SecurityConfigurationLock

There are cases when global JVM security configuration must be modified.

SecurityUtils

Various utility methods used by Spark Security.

SelectorParams

Params for Selector and SelectorModel.

SequenceFileRDDFunctions<K,V>

Extra functions available on RDDs of (key, value) pairs to create a Hadoop SequenceFile,
 through an implicit conversion.

SerDe

Utility functions to serialize objects to, and deserialize objects from, R.

SerializableConfiguration

A serializable wrapper around a Hadoop Configuration.

SerializableWritable<T extends org.apache.hadoop.io.Writable>
 
SerializationDebugger
 
SerializationDebugger.ObjectStreamClassMethods

An implicit class that allows us to call private methods of ObjectStreamClass.

SerializationDebugger.ObjectStreamClassMethods$
 
SerializationFormats
 
SerializationStream

:: DeveloperApi ::
 A stream for writing serialized objects.

SerializedMemoryEntry<T>
 
SerializedValuesHolder<T>

A holder for storing the serialized values.

Serializer

:: DeveloperApi ::
 A serializer.

SerializerHelper
 
SerializerInstance

:: DeveloperApi ::
 An instance of a serializer, for use by one thread at a time.

SessionConfigSupport

A mix-in interface for TableProvider.

SharedParamsCodeGen

Code generator for shared params (sharedParams.scala).

ShortestPaths

Computes shortest paths to the given set of landmark vertices, returning a graph where each
 vertex attribute is a map containing the shortest-path distance to each reachable landmark.
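On a single machine, the same result can be sketched with one breadth-first search per landmark over reversed edges: every edge counts as one hop, and the distance is measured from each vertex to the landmark along edge direction (an assumption about how "shortest paths to the given set of landmark vertices" is oriented). This is a hypothetical plain-Java illustration (ShortestPathsSketch.run and its array-based graph encoding are assumptions), not GraphX's Pregel-based implementation. A landmark that a vertex cannot reach is simply absent from that vertex's map, mirroring "each reachable landmark" in the description above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ShortestPathsSketch {
    // edges[i] = {src, dst}; vertices are numbered 0..numVertices-1 and every edge costs one hop.
    // Returns, per vertex, a map from landmark ID to the hop count from that vertex to the landmark.
    static List<Map<Integer, Integer>> run(int numVertices, int[][] edges, int[] landmarks) {
        // Reverse adjacency lists: incoming.get(d) holds every s with an edge s -> d.
        List<List<Integer>> incoming = new ArrayList<>();
        for (int v = 0; v < numVertices; v++) {
            incoming.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            incoming.get(e[1]).add(e[0]);
        }

        List<Map<Integer, Integer>> result = new ArrayList<>();
        for (int v = 0; v < numVertices; v++) {
            result.add(new HashMap<>());
        }

        for (int landmark : landmarks) {
            // BFS backwards from the landmark visits exactly the vertices that can reach it.
            int[] dist = new int[numVertices];
            Arrays.fill(dist, -1);
            ArrayDeque<Integer> queue = new ArrayDeque<>();
            dist[landmark] = 0;
            queue.add(landmark);
            while (!queue.isEmpty()) {
                int current = queue.poll();
                for (int prev : incoming.get(current)) {
                    if (dist[prev] < 0) {
                        dist[prev] = dist[current] + 1;
                        queue.add(prev);
                    }
                }
            }
            for (int v = 0; v < numVertices; v++) {
                if (dist[v] >= 0) {
                    result.get(v).put(landmark, dist[v]);
                }
            }
        }
        return result;
    }
}
```

On the chain 0 -> 1 -> 2 -> 3 -> 4 with landmark 3, vertex 0 maps to {3=3}, vertex 3 to {3=0}, and vertex 4 gets an empty map because it cannot reach the landmark.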

ShortExactNumeric
 
ShortType

The data type representing Short values.

ShortTypeExpression
 
ShuffleBlockBatchId
 
ShuffleBlockChunkId
 
ShuffleBlockId
 
ShuffleChecksumBlockId
 
ShuffleDataBlockId
 
ShuffleDataIO

:: Private ::
 An interface for plugging in modules for storing and reading temporary shuffle data.

ShuffleDependency<K,V,C>

:: DeveloperApi ::
 Represents a dependency on the output of a shuffle stage.

ShuffledRDD<K,V,C>

:: DeveloperApi ::
 The resulting RDD from a shuffle (e.g. repartitioning of data).

ShuffleDriverComponents

:: Private ::
 An interface for building shuffle support modules for the Driver.

ShuffleExecutorComponents

:: Private ::
 An interface for building shuffle support for Executors.

ShuffleFetchCompletionListener

A listener to be called at the completion of the ShuffleBlockFetcherIterator.
 param: data the ShuffleBlockFetcherIterator to process

ShuffleIndexBlockId
 
ShuffleMapOutputWriter

:: Private ::
 A top-level writer that returns child writers for persisting the output of a map task,
 and then commits all of the writes as one atomic operation.

ShuffleMergedBlockId
 
ShuffleMergedDataBlockId
 
ShuffleMergedIndexBlockId
 
ShuffleMergedMetaBlockId
 
ShuffleOutputStatus

A common trait between MapStatus and MergeStatus.

ShufflePartitionWriter

:: Private ::
 An interface for opening streams to persist partition bytes to a backing data store.

ShufflePushBlockId
 
ShufflePushReadMetricDistributions
 
ShufflePushReadMetrics
 
ShuffleReadMetricDistributions
 
ShuffleReadMetrics
 
ShuffleStatus

Helper class used by the MapOutputTrackerMaster to perform bookkeeping for a single
 ShuffleMapStage.

ShuffleWriteMetricDistributions
 
ShuffleWriteMetrics
 
ShutdownHookManager

Utility methods used by Spark to register and manage JVM shutdown hooks.

SignalUtils

Contains utilities for working with POSIX signals.

SimpleFutureAction<T>

A FutureAction holding the result of an action that triggers a single job.

SimpleMetricsCachedBatch

A CachedBatch that stores some simple metrics that can be used for filtering of batches with
 the SimpleMetricsCachedBatchSerializer.

SimpleMetricsCachedBatchSerializer

Provides basic filtering for CachedBatchSerializer implementations.

SimpleUpdater

A simple updater for gradient descent *without* any regularization.
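A minimal sketch of one unregularized update step. It assumes the step size decays with the iteration number as stepSize / sqrt(iter), which is how MLlib's gradient-descent updaters are understood to work here; check the Updater.compute documentation before relying on that detail. SimpleUpdaterSketch is a hypothetical name, not part of MLlib.

```java
final class SimpleUpdaterSketch {
    // One gradient-descent step: w' = w - (stepSize / sqrt(iter)) * gradient, with no regularization.
    static double[] step(double[] weights, double[] gradient, double stepSize, int iter) {
        double eta = stepSize / Math.sqrt(iter); // assumed decaying step size; iter starts at 1
        double[] next = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            next[i] = weights[i] - eta * gradient[i];
        }
        return next;
    }
}
```

With weights [1.0, 2.0], gradient [0.5, -1.0], stepSize 1.0 and iter 4, the effective step is 0.5 and the new weights are [0.75, 2.5].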

SingleSpillShuffleMapOutputWriter

Optional extension for partition writing that is optimized for transferring a single
 file to the backing store.

SingleValueExecutorMetricType
 
SingularValueDecomposition<UType,VType>

Represents singular value decomposition (SVD) factors.

Sink
 
SinkProgress

Information about progress made for a sink in the execution of a StreamingQuery
 during a trigger.

SinkProgressSerializer
 
SizeEstimator

:: DeveloperApi ::
 Estimates the sizes of Java objects (number of bytes of memory they occupy), for use in
 memory-aware caches.

SnappyCompressionCodec

:: DeveloperApi ::
 Snappy implementation of CompressionCodec.

SortDirection

A sort direction used in sorting expressions.

SortOrder

Represents a sort order in the public expression API.

Source
 
SourceProgress

Information about progress made for a source in the execution of a StreamingQuery
 during a trigger.

SourceProgressSerializer
 
SparkAppHandle

A handle to a running Spark application.

SparkAppHandle.Listener

Listener for updates to a handle's state.

SparkAppHandle.State

Represents the application's state.

SparkAWSCredentials

Serializable interface providing a method executors can call to obtain an
 AWSCredentialsProvider instance for authenticating to AWS services.

SparkAWSCredentials.Builder

Builder for SparkAWSCredentials instances.

SparkBuildInfo
 
SparkClassUtils
 
SparkConf

Configuration for a Spark application.

SparkContext

Main entry point for Spark functionality.

SparkCoreErrors

Object for grouping error messages from (most) exceptions thrown during query execution.

SparkDataStream

The base interface representing a readable data stream in a Spark streaming query.

SparkEnv

:: DeveloperApi ::
 Holds all the runtime environment objects for a running Spark instance (either master or worker),
 including the serializer, RpcEnv, block manager, map output tracker, etc.

SparkErrorUtils
 
SparkException
 
SparkExecutorInfo

Exposes information about Spark Executors.

SparkExecutorInfoImpl
 
SparkExitCode
 
SparkFiles

Resolves paths to files added through SparkContext.addFile().

SparkFileUtils
 
SparkFilterApi

TODO (PARQUET-1809): This is a temporary workaround; it is intended to be moved to Parquet.

SparkFirehoseListener

Class that allows users to receive all SparkListener events.

SparkHadoopMapRedUtil
 
SparkJobInfo

Exposes information about Spark Jobs.

SparkJobInfoImpl
 
SparkLauncher

Launcher for Spark applications.

SparkListener

:: DeveloperApi ::
 A default implementation for SparkListenerInterface that has no-op implementations for
 all callbacks.

SparkListenerApplicationEnd
 
SparkListenerApplicationStart
 
SparkListenerBlockManagerAdded
 
SparkListenerBlockManagerRemoved
 
SparkListenerBlockUpdated
 
SparkListenerBus

A SparkListenerEvent bus that relays SparkListenerEvents to its listeners.

SparkListenerEnvironmentUpdate
 
SparkListenerEvent
 
SparkListenerExecutorAdded
 
SparkListenerExecutorBlacklisted
Deprecated.
Use SparkListenerExecutorExcluded instead.

SparkListenerExecutorBlacklistedForStage
Deprecated.
Use SparkListenerExecutorExcludedForStage instead.

SparkListenerExecutorExcluded
 
SparkListenerExecutorExcludedForStage
 
SparkListenerExecutorMetricsUpdate

Periodic updates from executors.

SparkListenerExecutorRemoved
 
SparkListenerExecutorUnblacklisted
Deprecated.
Use SparkListenerExecutorUnexcluded instead.

SparkListenerExecutorUnexcluded
 
SparkListenerInterface

Interface for listening to events from the Spark scheduler.

SparkListenerJobEnd
 
SparkListenerJobStart
 
SparkListenerLogStart

An internal class that describes the metadata of an event log.

SparkListenerMiscellaneousProcessAdded
 
SparkListenerNodeBlacklisted
Deprecated.
Use SparkListenerNodeExcluded instead.

SparkListenerNodeBlacklistedForStage
Deprecated.
Use SparkListenerNodeExcludedForStage instead.

SparkListenerNodeExcluded
 
SparkListenerNodeExcludedForStage
 
SparkListenerNodeUnblacklisted
Deprecated.
Use SparkListenerNodeUnexcluded instead.

SparkListenerNodeUnexcluded
 
SparkListenerResourceProfileAdded
 
SparkListenerSpeculativeTaskSubmitted
 
SparkListenerStageCompleted
 
SparkListenerStageExecutorMetrics

Peak metric values for the executor for the stage, written to the history log at stage
 completion.

SparkListenerStageSubmitted
 
SparkListenerTaskEnd
 
SparkListenerTaskGettingResult
 
SparkListenerTaskStart
 
SparkListenerUnpersistRDD
 
SparkListenerUnschedulableTaskSetAdded
 
SparkListenerUnschedulableTaskSetRemoved
 
SparkMasterRegex

A collection of regexes for extracting information from the master string.
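The master-URL forms documented for spark-submit (local, local[K], local[*], local[K,F], spark://HOST:PORT) can be recognized with a few regular expressions. The plain-Java sketch below illustrates that idea; the pattern names and the describe helper are hypothetical, and Spark's own object also covers further forms (for example local-cluster[...]) that are omitted here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MasterUrlSketch {
    // local[K] or local[*]: K worker threads, or one per logical core.
    static final Pattern LOCAL_N = Pattern.compile("local\\[([0-9]+|\\*)\\]");
    // local[K, F] or local[*, F]: as above, plus up to F task failures.
    static final Pattern LOCAL_N_FAILURES =
        Pattern.compile("local\\[([0-9]+|\\*)\\s*,\\s*([0-9]+)\\]");
    // spark://HOST:PORT for a standalone cluster master.
    static final Pattern SPARK = Pattern.compile("spark://(.*)");

    // Classifies a master string; each pattern must match the whole string.
    static String describe(String master) {
        if (master.equals("local")) {
            return "local, threads=1";
        }
        Matcher m = LOCAL_N.matcher(master);
        if (m.matches()) {
            return "local, threads=" + m.group(1);
        }
        m = LOCAL_N_FAILURES.matcher(master);
        if (m.matches()) {
            return "local, threads=" + m.group(1) + ", maxFailures=" + m.group(2);
        }
        m = SPARK.matcher(master);
        if (m.matches()) {
            return "standalone, master=" + m.group(1);
        }
        return "other";
    }
}
```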

SparkPath

A canonical representation of a file path.

SparkPlugin

:: DeveloperApi ::
 A plugin that can be dynamically loaded into a Spark application.

SparkSchemaUtils

Utils for handling schemas.

SparkSerDeUtils
 
SparkSession

The entry point to programming Spark with the Dataset and DataFrame API.

SparkSession.Builder

Builder for SparkSession.

SparkSessionExtensions

:: Experimental ::
 Holder for injection points to the SparkSession.

SparkSessionExtensionsProvider

:: Unstable ::

SparkShutdownHook
 
SparkStageInfo

Exposes information about Spark Stages.

SparkStageInfoImpl
 
SparkStatusTracker

Low-level status reporting APIs for monitoring job and stage progress.

SparkThreadUtils
 
SparkThrowable

Interface mixed into Throwables thrown from Spark.

SparkThrowableHelper

Companion object used by instances of SparkThrowable to access error class information and
 construct error messages.

SparseMatrix

Column-major sparse matrix.
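"Column-major sparse matrix" refers to compressed sparse column (CSC) storage: the stored values are laid out column by column, each with its row index, and a column-pointer array marks where each column starts. The sketch below is a hypothetical plain-Java illustration of that layout, not the Spark class. Spark's SparseMatrix is built from the same three arrays (colPtrs, rowIndices, values) plus the dimensions; consult its API docs for the exact constructor.

```java
final class CscMatrixSketch {
    final int numRows;
    final int numCols;
    final int[] colPtrs;    // length numCols + 1; column j's entries are at [colPtrs[j], colPtrs[j + 1])
    final int[] rowIndices; // row index of each stored value
    final double[] values;  // stored values, column by column

    CscMatrixSketch(int numRows, int numCols, int[] colPtrs, int[] rowIndices, double[] values) {
        if (colPtrs.length != numCols + 1 || rowIndices.length != values.length) {
            throw new IllegalArgumentException("inconsistent CSC arrays");
        }
        this.numRows = numRows;
        this.numCols = numCols;
        this.colPtrs = colPtrs;
        this.rowIndices = rowIndices;
        this.values = values;
    }

    // Scans only column j's slice; entries that are not stored are zero.
    double get(int i, int j) {
        if (i < 0 || i >= numRows || j < 0 || j >= numCols) {
            throw new IndexOutOfBoundsException("(" + i + ", " + j + ")");
        }
        for (int k = colPtrs[j]; k < colPtrs[j + 1]; k++) {
            if (rowIndices[k] == i) {
                return values[k];
            }
        }
        return 0.0;
    }
}
```

For the 3 x 3 matrix with rows [1 0 4], [0 3 5] and [2 0 6], the arrays are colPtrs = {0, 2, 3, 6}, rowIndices = {0, 2, 1, 0, 1, 2} and values = {1, 2, 3, 4, 5, 6}.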

SparseMatrix

Column-major sparse matrix.

SparseVector

A sparse vector represented by an index array and a value array.
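In Spark this representation is usually created with Vectors.sparse(size, indices, values). The plain-Java sketch below (the class SparseVectorSketch is hypothetical) shows how the two parallel arrays encode a vector and how a lookup works, assuming the indices are strictly increasing so that a binary search can locate a position.

```java
import java.util.Arrays;

final class SparseVectorSketch {
    final int size;
    final int[] indices;   // positions of the stored entries, strictly increasing
    final double[] values; // values[k] is the entry at position indices[k]

    SparseVectorSketch(int size, int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have the same length");
        }
        this.size = size;
        this.indices = indices;
        this.values = values;
    }

    // Binary search over the sorted indices; positions that are not stored are zero.
    double get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException(String.valueOf(i));
        }
        int k = Arrays.binarySearch(indices, i);
        return k >= 0 ? values[k] : 0.0;
    }

    // Expands to a dense array of length size.
    double[] toDense() {
        double[] dense = new double[size];
        for (int k = 0; k < indices.length; k++) {
            dense[indices[k]] = values[k];
        }
        return dense;
    }
}
```

For example, size 5 with indices {1, 3} and values {2.0, -1.0} encodes the dense vector [0.0, 2.0, 0.0, -1.0, 0.0].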

SparseVector

A sparse vector represented by an index array and a value array.

SpearmanCorrelation

Compute Spearman's correlation for two RDDs of the type RDD[Double] or the correlation matrix
 for an RDD of the type RDD[Vector].
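Spearman's correlation is Pearson's correlation computed on ranks. The single-machine sketch below (class and method names are hypothetical; it is not MLlib's distributed implementation) ranks each input, gives tied values the average of their ranks (the usual convention, assumed here), and then applies the Pearson formula to the two rank arrays.

```java
import java.util.Arrays;
import java.util.Comparator;

final class SpearmanSketch {
    // 1-based ranks; tied values share the average of the ranks they occupy.
    static double[] ranks(double[] x) {
        Integer[] order = new Integer[x.length];
        for (int i = 0; i < x.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer idx) -> x[idx]));
        double[] r = new double[x.length];
        int start = 0;
        while (start < x.length) {
            // Extend the run [start, end] over equal values.
            int end = start;
            while (end + 1 < x.length && x[order[end + 1]] == x[order[start]]) {
                end++;
            }
            double avgRank = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) {
                r[order[k]] = avgRank;
            }
            start = end + 1;
        }
        return r;
    }

    // Pearson correlation; returns NaN if either input is constant.
    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double ma = 0.0, mb = 0.0;
        for (int i = 0; i < n; i++) {
            ma += a[i];
            mb += b[i];
        }
        ma /= n;
        mb /= n;
        double cov = 0.0, va = 0.0, vb = 0.0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - ma, db = b[i] - mb;
            cov += da * db;
            va += da * da;
            vb += db * db;
        }
        return cov / Math.sqrt(va * vb);
    }

    static double spearman(double[] x, double[] y) {
        return pearson(ranks(x), ranks(y));
    }
}
```

Because only ranks matter, any strictly increasing relationship (for example y = {2, 4, 6, 8, 100} against x = {1, 2, 3, 4, 5}) yields a correlation of 1.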

SpecialLengths
 
SpeculationStageSummary
 
SpillListener

A SparkListener that detects whether spills have occurred in Spark jobs.

Split

Interface for a "Split," which specifies a test made at a decision tree node
 to choose the left or right path.

Split

Split applied to a feature.
 param: feature feature index
 param: threshold Threshold for a continuous feature.

SplitInfo
 
SQLContext

The entry point for working with structured data (rows and columns) in Spark 1.x.

SQLDataTypes

SQL data types for vectors and matrices.

SQLImplicits

A collection of implicit methods for converting common Scala objects into Datasets.

SQLOpenHashSet<T>
 
SQLPlanMetricSerializer
 
SQLTransformer

Implements the transformations which are defined by a SQL statement.

SQLUserDefinedType

:: DeveloperApi ::
 A user-defined type which can be automatically recognized by a SQLContext and registered.

SQLUtils
 
SquaredError

Class for squared error loss calculation.

SquaredEuclideanSilhouette

SquaredEuclideanSilhouette computes the average silhouette over all points of the dataset,
 which is a measure of how appropriately the data have been clustered.
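For a point i with mean distance a(i) to the other members of its own cluster and smallest mean distance b(i) to the members of any other cluster, the silhouette is s(i) = (b(i) - a(i)) / max(a(i), b(i)), here using squared Euclidean distance. The naive O(n^2) plain-Java sketch below (hypothetical names) averages s(i) over all points. Spark appears to avoid this pairwise loop by working with precomputed per-cluster statistics (the ClusterStats entries), which the sketch does not attempt. Points in singleton clusters are scored 0, a common convention assumed here.

```java
final class SilhouetteSketch {
    static double squaredDistance(double[] p, double[] q) {
        double sum = 0.0;
        for (int d = 0; d < p.length; d++) {
            double diff = p[d] - q[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Mean silhouette over all points; labels[i] is in [0, k). Needs at least two non-empty clusters.
    static double meanSilhouette(double[][] points, int[] labels, int k) {
        int n = points.length;
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            // Sum and count of distances from point i to each cluster's members (excluding i itself).
            double[] sum = new double[k];
            int[] count = new int[k];
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    sum[labels[j]] += squaredDistance(points[i], points[j]);
                    count[labels[j]]++;
                }
            }
            int own = labels[i];
            if (count[own] == 0) {
                continue; // singleton cluster: s(i) = 0 by convention
            }
            double a = sum[own] / count[own];
            double b = Double.POSITIVE_INFINITY;
            for (int c = 0; c < k; c++) {
                if (c != own && count[c] > 0) {
                    b = Math.min(b, sum[c] / count[c]);
                }
            }
            total += (b - a) / Math.max(a, b);
        }
        return total / n;
    }
}
```

Well-separated clusters such as {0, 1} and {10, 11} score close to 1, while mixing them up (labelling {0, 10} and {1, 11} together) drives the average below 0.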

SquaredEuclideanSilhouette.ClusterStats
 
SquaredEuclideanSilhouette.ClusterStats$
 
SquaredL2Updater

Updater for L2 regularized problems.
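Adding an L2 penalty (regParam / 2) * ||w||^2 to the objective turns each gradient step into a shrink-then-step update, w' = w * (1 - eta * regParam) - eta * gradient, with eta = stepSize / sqrt(iter). The plain-Java sketch below assumes that step-size schedule and that the regularization value reported back is (regParam / 2) * ||w'||^2. The names are hypothetical, so check MLlib's SquaredL2Updater documentation for its exact contract.

```java
final class SquaredL2UpdaterSketch {
    // Result of one step: the new weights and the L2 regularization value at those weights.
    static final class Result {
        final double[] weights;
        final double regValue;

        Result(double[] weights, double regValue) {
            this.weights = weights;
            this.regValue = regValue;
        }
    }

    // w' = w * (1 - eta * regParam) - eta * gradient, with eta = stepSize / sqrt(iter).
    static Result step(double[] weights, double[] gradient, double stepSize, int iter, double regParam) {
        double eta = stepSize / Math.sqrt(iter);
        double shrink = 1.0 - eta * regParam; // weight decay contributed by the L2 term
        double[] next = new double[weights.length];
        double squaredNorm = 0.0;
        for (int i = 0; i < weights.length; i++) {
            next[i] = weights[i] * shrink - eta * gradient[i];
            squaredNorm += next[i] * next[i];
        }
        return new Result(next, 0.5 * regParam * squaredNorm);
    }
}
```

With weights [1.0, 2.0], gradient [0.5, -1.0], stepSize 1.0, iter 4 and regParam 0.5, the weights are first shrunk by 0.75 and then stepped, giving [0.5, 2.0] and a regularization value of 1.0625.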

StackTrace
 
StageData
 
StagedTable

Represents a table which is staged for being committed to the metastore.

StageInfo

:: DeveloperApi ::
 Stores information about a stage to pass from the scheduler to SparkListeners.

StageStatus
 
StageStatusSerializer
 
StagingTableCatalog

An optional mix-in for implementations of TableCatalog that support staging the creation of
 a table before committing the table's metadata along with its contents in CREATE TABLE AS
 SELECT or REPLACE TABLE AS SELECT operations.

StandardNormalGenerator

Generates i.i.d. samples from the standard normal distribution.

StandardScaler

Standardizes features by removing the mean and scaling to unit variance using column summary
 statistics on the samples in the training set.

StandardScaler

Standardizes features by removing the mean and scaling to unit standard deviation using column summary
 statistics on the samples in the training set.

StandardScalerModel

Model fitted by StandardScaler.

StandardScalerModel

Represents a StandardScaler model that can transform vectors.

StandardScalerParams

Params for StandardScaler and StandardScalerModel.

StatCounter

A class for tracking the statistics of a set of numbers (count, mean and variance) in a
 numerically robust way.
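One numerically robust way to track count, mean and variance is Welford's online update, combined with Chan et al.'s pairwise formula for merging summaries computed on different partitions. The plain-Java sketch below uses that technique on the assumption that it matches the spirit of StatCounter, which also merges per-partition results. The class name is hypothetical, and variance() here returns the population variance (dividing by n).

```java
final class RunningStatsSketch {
    private long n = 0;
    private double mu = 0.0; // running mean
    private double m2 = 0.0; // running sum of squared deviations from the mean

    // Welford's update: avoids the cancellation of the naive sum-of-squares formula.
    void add(double x) {
        n++;
        double delta = x - mu;
        mu += delta / n;
        m2 += delta * (x - mu);
    }

    // Chan et al.'s formula for combining two independent summaries.
    void merge(RunningStatsSketch other) {
        if (other.n == 0) {
            return;
        }
        if (n == 0) {
            n = other.n;
            mu = other.mu;
            m2 = other.m2;
            return;
        }
        double delta = other.mu - mu;
        long total = n + other.n;
        mu += delta * other.n / total;
        m2 += other.m2 + delta * delta * n * other.n / total;
        n = total;
    }

    long count() { return n; }

    double mean() { return mu; }

    // Population variance; NaN for an empty summary.
    double variance() { return n == 0 ? Double.NaN : m2 / n; }
}
```

Feeding {2, 4, 4, 4, 5, 5, 7, 9} directly, or as {2, 4, 4, 4} and {5, 5, 7, 9} merged afterwards, gives the same mean of 5 and population variance of 4 (up to rounding).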

State<S>

:: Experimental ::
 Abstract class for getting and updating the state in the mapping function used in the mapWithState
 operation of a pair DStream (Scala) or a JavaPairDStream (Java).

StateOperatorProgress

Information about updates made to stateful operators in a StreamingQuery during a trigger.

StateOperatorProgressSerializer
 
StateSpec<KeyType,ValueType,StateType,MappedType>

:: Experimental ::
 Abstract class representing all the specifications of the DStream transformation
 mapWithState operation of a pair DStream (Scala) or a JavaPairDStream (Java).

StaticSources
 
Statistics

API for statistical functions in MLlib.

Statistics

An interface to represent statistics for a data source, which is returned by
 SupportsReportStatistics.estimateStatistics().

StatsdMetricType
 
StatsReportListener

:: DeveloperApi ::
 Simple SparkListener that logs a few summary statistics when each stage completes.

StatsReportListener

:: DeveloperApi ::
 A simple StreamingListener that logs summary statistics across Spark Streaming batches.
 param: numBatchInfos Number of last batches to consider for generating statistics (default: 10)

StatusUpdate
 
StopAllReceivers

This message will trigger ReceiverTrackerEndpoint to send stop signals to all registered
 receivers.

StopCoordinator
 
StopExecutor
 
StopMapOutputTracker
 
StopReceiver
 
StopWordsRemover

A feature transformer that filters out stop words from input.

StorageLevel

:: DeveloperApi ::
 Flags for controlling the storage of an RDD.

StorageLevels

Exposes some commonly used storage level constants.

StorageUtils

Helper methods for storage-related objects.

StoreTypes
 
StoreTypes.AccumulableInfo

Protobuf type org.apache.spark.status.protobuf.AccumulableInfo

StoreTypes.AccumulableInfo.Builder

Protobuf type org.apache.spark.status.protobuf.AccumulableInfo

StoreTypes.AccumulableInfoOrBuilder
 
StoreTypes.ApplicationAttemptInfo

Protobuf type org.apache.spark.status.protobuf.ApplicationAttemptInfo

StoreTypes.ApplicationAttemptInfo.Builder

Protobuf type org.apache.spark.status.protobuf.ApplicationAttemptInfo

StoreTypes.ApplicationAttemptInfoOrBuilder
 
StoreTypes.ApplicationEnvironmentInfo

Protobuf type org.apache.spark.status.protobuf.ApplicationEnvironmentInfo

StoreTypes.ApplicationEnvironmentInfo.Builder

Protobuf type org.apache.spark.status.protobuf.ApplicationEnvironmentInfo

StoreTypes.ApplicationEnvironmentInfoOrBuilder
 
StoreTypes.ApplicationEnvironmentInfoWrapper

Protobuf type org.apache.spark.status.protobuf.ApplicationEnvironmentInfoWrapper

StoreTypes.ApplicationEnvironmentInfoWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.ApplicationEnvironmentInfoWrapper

StoreTypes.ApplicationEnvironmentInfoWrapperOrBuilder
 
StoreTypes.ApplicationInfo

Protobuf type org.apache.spark.status.protobuf.ApplicationInfo

StoreTypes.ApplicationInfo.Builder

Protobuf type org.apache.spark.status.protobuf.ApplicationInfo

StoreTypes.ApplicationInfoOrBuilder
 
StoreTypes.ApplicationInfoWrapper

Protobuf type org.apache.spark.status.protobuf.ApplicationInfoWrapper

StoreTypes.ApplicationInfoWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.ApplicationInfoWrapper

StoreTypes.ApplicationInfoWrapperOrBuilder
 
StoreTypes.AppSummary

Protobuf type org.apache.spark.status.protobuf.AppSummary

StoreTypes.AppSummary.Builder

Protobuf type org.apache.spark.status.protobuf.AppSummary

StoreTypes.AppSummaryOrBuilder
 
StoreTypes.CachedQuantile

Protobuf type org.apache.spark.status.protobuf.CachedQuantile

StoreTypes.CachedQuantile.Builder

Protobuf type org.apache.spark.status.protobuf.CachedQuantile

StoreTypes.CachedQuantileOrBuilder
 
StoreTypes.DeterministicLevel

Protobuf enum org.apache.spark.status.protobuf.DeterministicLevel

StoreTypes.ExecutorMetrics

Protobuf type org.apache.spark.status.protobuf.ExecutorMetrics

StoreTypes.ExecutorMetrics.Builder

Protobuf type org.apache.spark.status.protobuf.ExecutorMetrics

StoreTypes.ExecutorMetricsDistributions

Protobuf type org.apache.spark.status.protobuf.ExecutorMetricsDistributions

StoreTypes.ExecutorMetricsDistributions.Builder

Protobuf type org.apache.spark.status.protobuf.ExecutorMetricsDistributions

StoreTypes.ExecutorMetricsDistributionsOrBuilder
 
StoreTypes.ExecutorMetricsOrBuilder
 
StoreTypes.ExecutorPeakMetricsDistributions

Protobuf type org.apache.spark.status.protobuf.ExecutorPeakMetricsDistributions

StoreTypes.ExecutorPeakMetricsDistributions.Builder

Protobuf type org.apache.spark.status.protobuf.ExecutorPeakMetricsDistributions

StoreTypes.ExecutorPeakMetricsDistributionsOrBuilder
 
StoreTypes.ExecutorResourceRequest

Protobuf type org.apache.spark.status.protobuf.ExecutorResourceRequest

StoreTypes.ExecutorResourceRequest.Builder

Protobuf type org.apache.spark.status.protobuf.ExecutorResourceRequest

StoreTypes.ExecutorResourceRequestOrBuilder
 
StoreTypes.ExecutorStageSummary

Protobuf type org.apache.spark.status.protobuf.ExecutorStageSummary

StoreTypes.ExecutorStageSummary.Builder

Protobuf type org.apache.spark.status.protobuf.ExecutorStageSummary

StoreTypes.ExecutorStageSummaryOrBuilder
 
StoreTypes.ExecutorStageSummaryWrapper

Protobuf type org.apache.spark.status.protobuf.ExecutorStageSummaryWrapper

StoreTypes.ExecutorStageSummaryWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.ExecutorStageSummaryWrapper

StoreTypes.ExecutorStageSummaryWrapperOrBuilder
 
StoreTypes.ExecutorSummary

Protobuf type org.apache.spark.status.protobuf.ExecutorSummary

StoreTypes.ExecutorSummary.Builder

Protobuf type org.apache.spark.status.protobuf.ExecutorSummary

StoreTypes.ExecutorSummaryOrBuilder
 
StoreTypes.ExecutorSummaryWrapper

Protobuf type org.apache.spark.status.protobuf.ExecutorSummaryWrapper

StoreTypes.ExecutorSummaryWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.ExecutorSummaryWrapper

StoreTypes.ExecutorSummaryWrapperOrBuilder
 
StoreTypes.InputMetricDistributions

Protobuf type org.apache.spark.status.protobuf.InputMetricDistributions

StoreTypes.InputMetricDistributions.Builder

Protobuf type org.apache.spark.status.protobuf.InputMetricDistributions

StoreTypes.InputMetricDistributionsOrBuilder
 
StoreTypes.InputMetrics

Protobuf type org.apache.spark.status.protobuf.InputMetrics

StoreTypes.InputMetrics.Builder

Protobuf type org.apache.spark.status.protobuf.InputMetrics

StoreTypes.InputMetricsOrBuilder
 
StoreTypes.JobData

Protobuf type org.apache.spark.status.protobuf.JobData

StoreTypes.JobData.Builder

Protobuf type org.apache.spark.status.protobuf.JobData

StoreTypes.JobDataOrBuilder
 
StoreTypes.JobDataWrapper

Protobuf type org.apache.spark.status.protobuf.JobDataWrapper

StoreTypes.JobDataWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.JobDataWrapper

StoreTypes.JobDataWrapperOrBuilder
 
StoreTypes.JobExecutionStatus

Protobuf enum org.apache.spark.status.protobuf.JobExecutionStatus

StoreTypes.MemoryMetrics

Protobuf type org.apache.spark.status.protobuf.MemoryMetrics

StoreTypes.MemoryMetrics.Builder

Protobuf type org.apache.spark.status.protobuf.MemoryMetrics

StoreTypes.MemoryMetricsOrBuilder
 
StoreTypes.OutputMetricDistributions

Protobuf type org.apache.spark.status.protobuf.OutputMetricDistributions

StoreTypes.OutputMetricDistributions.Builder

Protobuf type org.apache.spark.status.protobuf.OutputMetricDistributions

StoreTypes.OutputMetricDistributionsOrBuilder
 
StoreTypes.OutputMetrics

Protobuf type org.apache.spark.status.protobuf.OutputMetrics

StoreTypes.OutputMetrics.Builder

Protobuf type org.apache.spark.status.protobuf.OutputMetrics

StoreTypes.OutputMetricsOrBuilder
 
StoreTypes.PairStrings

Protobuf type org.apache.spark.status.protobuf.PairStrings

StoreTypes.PairStrings.Builder

Protobuf type org.apache.spark.status.protobuf.PairStrings

StoreTypes.PairStringsOrBuilder
 
StoreTypes.PoolData

Protobuf type org.apache.spark.status.protobuf.PoolData

StoreTypes.PoolData.Builder

Protobuf type org.apache.spark.status.protobuf.PoolData

StoreTypes.PoolDataOrBuilder
 
StoreTypes.ProcessSummary

Protobuf type org.apache.spark.status.protobuf.ProcessSummary

StoreTypes.ProcessSummary.Builder

Protobuf type org.apache.spark.status.protobuf.ProcessSummary

StoreTypes.ProcessSummaryOrBuilder
 
StoreTypes.ProcessSummaryWrapper

Protobuf type org.apache.spark.status.protobuf.ProcessSummaryWrapper

StoreTypes.ProcessSummaryWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.ProcessSummaryWrapper

StoreTypes.ProcessSummaryWrapperOrBuilder
 
StoreTypes.RDDDataDistribution

Protobuf type org.apache.spark.status.protobuf.RDDDataDistribution

StoreTypes.RDDDataDistribution.Builder

Protobuf type org.apache.spark.status.protobuf.RDDDataDistribution

StoreTypes.RDDDataDistributionOrBuilder
 
StoreTypes.RDDOperationClusterWrapper

Protobuf type org.apache.spark.status.protobuf.RDDOperationClusterWrapper

StoreTypes.RDDOperationClusterWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.RDDOperationClusterWrapper

StoreTypes.RDDOperationClusterWrapperOrBuilder
 
StoreTypes.RDDOperationEdge

Protobuf type org.apache.spark.status.protobuf.RDDOperationEdge

StoreTypes.RDDOperationEdge.Builder

Protobuf type org.apache.spark.status.protobuf.RDDOperationEdge

StoreTypes.RDDOperationEdgeOrBuilder
 
StoreTypes.RDDOperationGraphWrapper

Protobuf type org.apache.spark.status.protobuf.RDDOperationGraphWrapper

StoreTypes.RDDOperationGraphWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.RDDOperationGraphWrapper

StoreTypes.RDDOperationGraphWrapperOrBuilder
 
StoreTypes.RDDOperationNode

Protobuf type org.apache.spark.status.protobuf.RDDOperationNode

StoreTypes.RDDOperationNode.Builder

Protobuf type org.apache.spark.status.protobuf.RDDOperationNode

StoreTypes.RDDOperationNodeOrBuilder
 
StoreTypes.RDDPartitionInfo

Protobuf type org.apache.spark.status.protobuf.RDDPartitionInfo

StoreTypes.RDDPartitionInfo.Builder

Protobuf type org.apache.spark.status.protobuf.RDDPartitionInfo

StoreTypes.RDDPartitionInfoOrBuilder
 
StoreTypes.RDDStorageInfo

Protobuf type org.apache.spark.status.protobuf.RDDStorageInfo

StoreTypes.RDDStorageInfo.Builder

Protobuf type org.apache.spark.status.protobuf.RDDStorageInfo

StoreTypes.RDDStorageInfoOrBuilder
 
StoreTypes.RDDStorageInfoWrapper

Protobuf type org.apache.spark.status.protobuf.RDDStorageInfoWrapper

StoreTypes.RDDStorageInfoWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.RDDStorageInfoWrapper

StoreTypes.RDDStorageInfoWrapperOrBuilder
 
StoreTypes.ResourceInformation

Protobuf type org.apache.spark.status.protobuf.ResourceInformation

StoreTypes.ResourceInformation.Builder

Protobuf type org.apache.spark.status.protobuf.ResourceInformation

StoreTypes.ResourceInformationOrBuilder
 
StoreTypes.ResourceProfileInfo

Protobuf type org.apache.spark.status.protobuf.ResourceProfileInfo

StoreTypes.ResourceProfileInfo.Builder

Protobuf type org.apache.spark.status.protobuf.ResourceProfileInfo

StoreTypes.ResourceProfileInfoOrBuilder
 
StoreTypes.ResourceProfileWrapper

Protobuf type org.apache.spark.status.protobuf.ResourceProfileWrapper

StoreTypes.ResourceProfileWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.ResourceProfileWrapper

StoreTypes.ResourceProfileWrapperOrBuilder
 
StoreTypes.RuntimeInfo

Protobuf type org.apache.spark.status.protobuf.RuntimeInfo

StoreTypes.RuntimeInfo.Builder

Protobuf type org.apache.spark.status.protobuf.RuntimeInfo

StoreTypes.RuntimeInfoOrBuilder
 
StoreTypes.ShufflePushReadMetricDistributions

Protobuf type org.apache.spark.status.protobuf.ShufflePushReadMetricDistributions

StoreTypes.ShufflePushReadMetricDistributions.Builder

Protobuf type org.apache.spark.status.protobuf.ShufflePushReadMetricDistributions

StoreTypes.ShufflePushReadMetricDistributionsOrBuilder
 
StoreTypes.ShufflePushReadMetrics

Protobuf type org.apache.spark.status.protobuf.ShufflePushReadMetrics

StoreTypes.ShufflePushReadMetrics.Builder

Protobuf type org.apache.spark.status.protobuf.ShufflePushReadMetrics

StoreTypes.ShufflePushReadMetricsOrBuilder
 
StoreTypes.ShuffleReadMetricDistributions

Protobuf type org.apache.spark.status.protobuf.ShuffleReadMetricDistributions

StoreTypes.ShuffleReadMetricDistributions.Builder

Protobuf type org.apache.spark.status.protobuf.ShuffleReadMetricDistributions

StoreTypes.ShuffleReadMetricDistributionsOrBuilder
 
StoreTypes.ShuffleReadMetrics

Protobuf type org.apache.spark.status.protobuf.ShuffleReadMetrics

StoreTypes.ShuffleReadMetrics.Builder

Protobuf type org.apache.spark.status.protobuf.ShuffleReadMetrics

StoreTypes.ShuffleReadMetricsOrBuilder
 
StoreTypes.ShuffleWriteMetricDistributions

Protobuf type org.apache.spark.status.protobuf.ShuffleWriteMetricDistributions

StoreTypes.ShuffleWriteMetricDistributions.Builder

Protobuf type org.apache.spark.status.protobuf.ShuffleWriteMetricDistributions

StoreTypes.ShuffleWriteMetricDistributionsOrBuilder
 
StoreTypes.ShuffleWriteMetrics

Protobuf type org.apache.spark.status.protobuf.ShuffleWriteMetrics

StoreTypes.ShuffleWriteMetrics.Builder

Protobuf type org.apache.spark.status.protobuf.ShuffleWriteMetrics

StoreTypes.ShuffleWriteMetricsOrBuilder
 
StoreTypes.SinkProgress

Protobuf type org.apache.spark.status.protobuf.SinkProgress

StoreTypes.SinkProgress.Builder

Protobuf type org.apache.spark.status.protobuf.SinkProgress

StoreTypes.SinkProgressOrBuilder
 
StoreTypes.SourceProgress

Protobuf type org.apache.spark.status.protobuf.SourceProgress

StoreTypes.SourceProgress.Builder

Protobuf type org.apache.spark.status.protobuf.SourceProgress

StoreTypes.SourceProgressOrBuilder
 
StoreTypes.SparkPlanGraphClusterWrapper

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphClusterWrapper

StoreTypes.SparkPlanGraphClusterWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphClusterWrapper

StoreTypes.SparkPlanGraphClusterWrapperOrBuilder
 
StoreTypes.SparkPlanGraphEdge

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphEdge

StoreTypes.SparkPlanGraphEdge.Builder

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphEdge

StoreTypes.SparkPlanGraphEdgeOrBuilder
 
StoreTypes.SparkPlanGraphNode

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphNode

StoreTypes.SparkPlanGraphNode.Builder

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphNode

StoreTypes.SparkPlanGraphNodeOrBuilder
 
StoreTypes.SparkPlanGraphNodeWrapper

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphNodeWrapper

StoreTypes.SparkPlanGraphNodeWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphNodeWrapper

StoreTypes.SparkPlanGraphNodeWrapper.WrapperCase
 
StoreTypes.SparkPlanGraphNodeWrapperOrBuilder
 
StoreTypes.SparkPlanGraphWrapper

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphWrapper

StoreTypes.SparkPlanGraphWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphWrapper

StoreTypes.SparkPlanGraphWrapperOrBuilder
 
StoreTypes.SpeculationStageSummary

Protobuf type org.apache.spark.status.protobuf.SpeculationStageSummary

StoreTypes.SpeculationStageSummary.Builder

Protobuf type org.apache.spark.status.protobuf.SpeculationStageSummary

StoreTypes.SpeculationStageSummaryOrBuilder
 
StoreTypes.SpeculationStageSummaryWrapper

Protobuf type org.apache.spark.status.protobuf.SpeculationStageSummaryWrapper

StoreTypes.SpeculationStageSummaryWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.SpeculationStageSummaryWrapper

StoreTypes.SpeculationStageSummaryWrapperOrBuilder
 
StoreTypes.SQLExecutionUIData

Protobuf type org.apache.spark.status.protobuf.SQLExecutionUIData

StoreTypes.SQLExecutionUIData.Builder

Protobuf type org.apache.spark.status.protobuf.SQLExecutionUIData

StoreTypes.SQLExecutionUIDataOrBuilder
 
StoreTypes.SQLPlanMetric

Protobuf type org.apache.spark.status.protobuf.SQLPlanMetric

StoreTypes.SQLPlanMetric.Builder

Protobuf type org.apache.spark.status.protobuf.SQLPlanMetric

StoreTypes.SQLPlanMetricOrBuilder
 
StoreTypes.StageData

Protobuf type org.apache.spark.status.protobuf.StageData

StoreTypes.StageData.Builder

Protobuf type org.apache.spark.status.protobuf.StageData

StoreTypes.StageDataOrBuilder
 
StoreTypes.StageDataWrapper

Protobuf type org.apache.spark.status.protobuf.StageDataWrapper

StoreTypes.StageDataWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.StageDataWrapper

StoreTypes.StageDataWrapperOrBuilder
 
StoreTypes.StageStatus

Protobuf enum org.apache.spark.status.protobuf.StageStatus

StoreTypes.StateOperatorProgress

Protobuf type org.apache.spark.status.protobuf.StateOperatorProgress

StoreTypes.StateOperatorProgress.Builder

Protobuf type org.apache.spark.status.protobuf.StateOperatorProgress

StoreTypes.StateOperatorProgressOrBuilder
 
StoreTypes.StreamBlockData

Protobuf type org.apache.spark.status.protobuf.StreamBlockData

StoreTypes.StreamBlockData.Builder

Protobuf type org.apache.spark.status.protobuf.StreamBlockData

StoreTypes.StreamBlockDataOrBuilder
 
StoreTypes.StreamingQueryData

Protobuf type org.apache.spark.status.protobuf.StreamingQueryData

StoreTypes.StreamingQueryData.Builder

Protobuf type org.apache.spark.status.protobuf.StreamingQueryData

StoreTypes.StreamingQueryDataOrBuilder
 
StoreTypes.StreamingQueryProgress

Protobuf type org.apache.spark.status.protobuf.StreamingQueryProgress

StoreTypes.StreamingQueryProgress.Builder

Protobuf type org.apache.spark.status.protobuf.StreamingQueryProgress

StoreTypes.StreamingQueryProgressOrBuilder
 
StoreTypes.StreamingQueryProgressWrapper

Protobuf type org.apache.spark.status.protobuf.StreamingQueryProgressWrapper

StoreTypes.StreamingQueryProgressWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.StreamingQueryProgressWrapper

StoreTypes.StreamingQueryProgressWrapperOrBuilder
 
StoreTypes.TaskData

Protobuf type org.apache.spark.status.protobuf.TaskData

StoreTypes.TaskData.Builder

Protobuf type org.apache.spark.status.protobuf.TaskData

StoreTypes.TaskDataOrBuilder
 
StoreTypes.TaskDataWrapper

Protobuf type org.apache.spark.status.protobuf.TaskDataWrapper

StoreTypes.TaskDataWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.TaskDataWrapper

StoreTypes.TaskDataWrapperOrBuilder
 
StoreTypes.TaskMetricDistributions

Protobuf type org.apache.spark.status.protobuf.TaskMetricDistributions

StoreTypes.TaskMetricDistributions.Builder

Protobuf type org.apache.spark.status.protobuf.TaskMetricDistributions

StoreTypes.TaskMetricDistributionsOrBuilder
 
StoreTypes.TaskMetrics

Protobuf type org.apache.spark.status.protobuf.TaskMetrics

StoreTypes.TaskMetrics.Builder

Protobuf type org.apache.spark.status.protobuf.TaskMetrics

StoreTypes.TaskMetricsOrBuilder
 
StoreTypes.TaskResourceRequest

Protobuf type org.apache.spark.status.protobuf.TaskResourceRequest

StoreTypes.TaskResourceRequest.Builder

Protobuf type org.apache.spark.status.protobuf.TaskResourceRequest

StoreTypes.TaskResourceRequestOrBuilder
 
Strategy

Stores all the configuration options for tree construction
 param:  algo  Learning goal.

StratifiedSamplingUtils

Auxiliary functions and data structures for the sampleByKey method in PairRDDFunctions.

StreamBlockId
 
StreamingConf
 
StreamingContext
Deprecated.
This is deprecated as of Spark 3.4.0.

StreamingContextPythonHelper
 
StreamingContextState

:: DeveloperApi ::

 Represents the state of a StreamingContext.

StreamingDataWriterFactory

A factory of DataWriter returned by
 StreamingWrite.createStreamingWriterFactory(PhysicalWriteInfo), which is responsible for
 creating and initializing the actual data writer at executor side.

StreamingKMeans

StreamingKMeans provides methods for configuring a
 streaming k-means analysis, training the model on streaming data,
 and using the model to make predictions on streaming data.

StreamingKMeansModel

StreamingKMeansModel extends MLlib's KMeansModel for streaming
 algorithms, so it can keep track of a continuously updated weight
 associated with each cluster, and also update the model by
 doing a single iteration of the standard k-means algorithm.
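StreamingKMeansModel keeps a continuously updated weight for each cluster. The Java sketch below illustrates that kind of update for a single center, and it is for illustration only. The old weight n is discounted by a decay factor alpha. The m points assigned in the new batch are then added, and the center moves to (c * n * alpha + sum of new points) / (n * alpha + m). The class and method names are hypothetical. The exact rule is an assumption modeled on the MLlib streaming k-means documentation, not code copied from Spark.

```java
// Hypothetical helper, not part of Spark: a decayed, weighted update of one cluster center.
final class DecayedCenterUpdate {

    /**
     * @param center      current center c
     * @param weight      current cluster weight n
     * @param batchPoints points assigned to this cluster in the new batch (m = batchPoints.length)
     * @param decay       decay factor alpha; 1.0 keeps all history, 0.0 keeps only the new batch
     * @return {newWeight, newCenter[0], newCenter[1], ...}
     */
    static double[] update(double[] center, double weight, double[][] batchPoints, double decay) {
        int dim = center.length;
        double discounted = weight * decay;                   // n * alpha
        double newWeight = discounted + batchPoints.length;   // n * alpha + m
        double[] sum = new double[dim];                       // sum of the m new points
        for (double[] p : batchPoints) {
            for (int j = 0; j < dim; j++) {
                sum[j] += p[j];
            }
        }
        double[] result = new double[dim + 1];
        result[0] = newWeight;
        for (int j = 0; j < dim; j++) {
            // c' = (c * n * alpha + sum of new points) / (n * alpha + m)
            result[j + 1] = (center[j] * discounted + sum[j]) / Math.max(newWeight, 1e-16);
        }
        return result;
    }
}
```

With alpha = 1.0 the update reduces to a running weighted mean. Smaller values of alpha let recent batches dominate.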

StreamingLinearAlgorithm<M extends GeneralizedLinearModel,A extends GeneralizedLinearAlgorithm<M>>

StreamingLinearAlgorithm implements methods for continuously
 training a generalized linear model on streaming data,
 and using it for prediction on (possibly different) streaming data.

StreamingLinearRegressionWithSGD

Train or predict a linear regression model on streaming data.

StreamingListener

:: DeveloperApi ::
 A listener interface for receiving information about an ongoing streaming
 computation.

StreamingListenerBatchCompleted
 
StreamingListenerBatchStarted
 
StreamingListenerBatchSubmitted
 
StreamingListenerEvent

:: DeveloperApi ::
 Base trait for events related to StreamingListener

StreamingListenerOutputOperationCompleted
 
StreamingListenerOutputOperationStarted
 
StreamingListenerReceiverError
 
StreamingListenerReceiverStarted
 
StreamingListenerReceiverStopped
 
StreamingListenerStreamingStarted
 
StreamingLogisticRegressionWithSGD

Train or predict a logistic regression model on streaming data.

StreamingQuery

A handle to a query that is executing continuously in the background as new data arrives.

StreamingQueryException

Exception that stopped a StreamingQuery.

StreamingQueryListener

Interface for listening to events related to StreamingQueries.

StreamingQueryListener.Event

Base type of StreamingQueryListener events

StreamingQueryListener.QueryIdleEvent

Event representing that the query is idle and waiting for new data to process.

StreamingQueryListener.QueryProgressEvent

Event representing any progress updates in a query.

StreamingQueryListener.QueryStartedEvent

Event representing the start of a query
 param:  id A unique query id that persists across restarts.

StreamingQueryListener.QueryTerminatedEvent

Event representing the termination of a query.

StreamingQueryManager

A class to manage all the StreamingQuery instances active in a SparkSession.

StreamingQueryProgress

Information about progress made in the execution of a StreamingQuery during
 a trigger.

StreamingQueryProgressSerializer
 
StreamingQueryStatus

Reports information about the instantaneous status of a streaming query.

StreamingStatistics
 
StreamingTest

Performs online 2-sample significance testing for a stream of (Boolean, Double) pairs.

StreamingTestMethod

Significance testing methods for StreamingTest.

StreamingWrite

An interface that defines how to write data to a data source in streaming queries.

StreamInputInfo

:: DeveloperApi ::
 Tracks information about an input stream at a specified batch time.

StreamSinkProvider

::Experimental::
 Implemented by objects that can produce a streaming Sink for a specific format or system.

StreamSourceProvider

::Experimental::
 Implemented by objects that can produce a streaming Source for a specific format or system.

StringArrayParam

Specialized version of Param[Array[String]] for Java.

StringContains

A filter that evaluates to true iff the attribute evaluates to
 a string that contains the string value.

StringEndsWith

A filter that evaluates to true iff the attribute evaluates to
 a string that ends with value.

StringIndexer

A label indexer that maps string column(s) of labels to ML column(s) of label indices.
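The Java sketch below shows the kind of label-to-index mapping StringIndexer produces. It assumes the frequency-descending ordering: the most frequent label gets index 0, and labels with equal counts are ordered alphabetically. The class name is hypothetical. The sketch does not cover StringIndexer's other ordering options or its handling of unseen labels.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark: frequency-descending label indexing.
final class LabelIndexSketch {
    static Map<String, Integer> fit(List<String> labels) {
        Map<String, Long> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1L, Long::sum); // count occurrences of each label
        }
        List<String> ordered = new ArrayList<>(counts.keySet());
        ordered.sort((a, b) -> {
            int byCount = Long.compare(counts.get(b), counts.get(a)); // most frequent first
            return byCount != 0 ? byCount : a.compareTo(b);           // alphabetical tie-break
        });
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < ordered.size(); i++) {
            index.put(ordered.get(i), i);
        }
        return index;
    }
}
```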

StringIndexerAggregator

A SQL Aggregator used by StringIndexer to count labels in string columns during fitting.

StringIndexerBase

Base trait for StringIndexer and StringIndexerModel.

StringIndexerModel

Model fitted by StringIndexer.

StringRRDD<T>

An RDD that stores R objects as Array[String].

StringStartsWith

A filter that evaluates to true iff the attribute evaluates to
 a string that starts with value.
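The three string filters above (StringContains, StringEndsWith, StringStartsWith) are described as true iff the attribute evaluates to a string that satisfies the check. The plain-Java sketch below restates that semantics and is not Spark's implementation. The class name is hypothetical. Two behaviors are assumptions of this sketch: the comparison is case-sensitive, and a null attribute value yields false.

```java
// Hypothetical helper, not part of Spark: semantics of the three string source filters.
final class StringFilterSemantics {
    // A null attribute value never satisfies the filter in this sketch.
    static boolean startsWith(String attributeValue, String value) {
        return attributeValue != null && attributeValue.startsWith(value);
    }

    static boolean endsWith(String attributeValue, String value) {
        return attributeValue != null && attributeValue.endsWith(value);
    }

    static boolean contains(String attributeValue, String value) {
        return attributeValue != null && attributeValue.contains(value);
    }
}
```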

StringType

The data type representing String values.

StringTypeExpression
 
StronglyConnectedComponents

Strongly connected components algorithm implementation.

StructField

A field inside a StructType.

StructType

A StructType object can be constructed from a sequence of StructField objects.

StudentTTest

Performs Student's 2-sample t-test.
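For reference, the Java sketch below computes the textbook Student's two-sample t statistic. It uses a pooled sample variance, which assumes equal population variances, and n1 + n2 - 2 degrees of freedom. The class name is hypothetical. It computes the statistic only, not the p-value that StreamingTest reports.

```java
// Hypothetical helper, not part of Spark: pooled-variance (Student's) two-sample t statistic.
final class StudentT {
    static double mean(double[] x) {
        double s = 0.0;
        for (double v : x) s += v;
        return s / x.length;
    }

    // Unbiased sample variance (divides by n - 1).
    static double sampleVariance(double[] x) {
        double m = mean(x), s = 0.0;
        for (double v : x) s += (v - m) * (v - m);
        return s / (x.length - 1);
    }

    // t = (mean(a) - mean(b)) / sqrt(sp^2 * (1/n1 + 1/n2)), with sp^2 the pooled variance.
    static double tStatistic(double[] a, double[] b) {
        int n1 = a.length, n2 = b.length;
        double pooled = ((n1 - 1) * sampleVariance(a) + (n2 - 1) * sampleVariance(b)) / (n1 + n2 - 2);
        return (mean(a) - mean(b)) / Math.sqrt(pooled * (1.0 / n1 + 1.0 / n2));
    }

    static int degreesOfFreedom(double[] a, double[] b) {
        return a.length + b.length - 2;
    }
}
```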

Success

:: DeveloperApi ::
 Task succeeded.

Sum

An aggregate function that returns the summation of all the values in a group.

Summarizer

Tools for vectorized statistics on MLlib Vectors.

SummaryBuilder

A builder object that provides summary statistics about a given column.

SupportsAdmissionControl

A mix-in interface for SparkDataStream streaming sources to signal that they can control
 the rate of data ingested into the system.

SupportsAtomicPartitionManagement

An atomic partition interface of Table, used to operate on multiple partitions atomically.

SupportsCatalogOptions

An interface, which TableProviders can implement, to support table existence checks and creation
 through a catalog, without having to use table identifiers.

SupportsDelete

A mix-in interface for Table delete support.

SupportsDeleteV2

A mix-in interface for Table delete support.

SupportsDelta

A mix-in interface for RowLevelOperation.

SupportsDynamicOverwrite

Write builder trait for tables that support dynamic partition overwrite.

SupportsIndex

Table methods for working with indexes.

SupportsMetadataColumns

An interface for exposing data columns for a table that are not in the table schema.

SupportsNamespaces

Catalog methods for working with namespaces.

SupportsOverwrite

Write builder trait for tables that support overwrite by filter.

SupportsOverwriteV2

Write builder trait for tables that support overwrite by filter.

SupportsPartitionManagement

A partition interface of Table.

SupportsPushDownAggregates

A mix-in interface for ScanBuilder.

SupportsPushDownFilters

A mix-in interface for ScanBuilder.

SupportsPushDownLimit

A mix-in interface for ScanBuilder.

SupportsPushDownOffset

A mix-in interface for ScanBuilder.

SupportsPushDownRequiredColumns

A mix-in interface for ScanBuilder.

SupportsPushDownTableSample

A mix-in interface for Scan.

SupportsPushDownTopN

A mix-in interface for ScanBuilder.

SupportsPushDownV2Filters

A mix-in interface for ScanBuilder.

SupportsRead

A mix-in interface of Table, to indicate that it's readable.

SupportsReportOrdering

A mix-in interface for Scan.

SupportsReportPartitioning

A mix-in interface for Scan.

SupportsReportStatistics

A mix-in interface for Scan.

SupportsRowLevelOperations

A mix-in interface for Table row-level operations support.

SupportsRuntimeFiltering

A mix-in interface for Scan.

SupportsRuntimeV2Filtering

A mix-in interface for Scan.

SupportsTriggerAvailableNow

An interface for streaming sources that support running in Trigger.AvailableNow mode, which
 will process all the available data at the beginning of the query in (possibly) multiple batches.

SupportsTruncate

Write builder trait for tables that support truncation.

SupportsWrite

A mix-in interface of Table, to indicate that it's writable.

SVDPlusPlus

Implementation of SVD++ algorithm.

SVDPlusPlus.Conf

Configuration parameters for SVDPlusPlus.

SVMDataGenerator

Generate sample data used for SVM.

SVMModel

Model for Support Vector Machines (SVMs).

SVMWithSGD

Train a Support Vector Machine (SVM) using Stochastic Gradient Descent.

Table

A table in Spark, as returned by the listTables method in Catalog.

Table

An interface representing a logical structured data set of a data source.

TableCapability

Capabilities that can be provided by a Table implementation.

TableCatalog

Catalog methods for working with Tables.

TableCatalogCapability

Capabilities that can be provided by a TableCatalog implementation.

TableChange

TableChange subclasses represent requested changes to a table.

TableChange.AddColumn

A TableChange to add a field.

TableChange.After

Column position AFTER means the specified column should be put after the given `column`.

TableChange.ColumnChange
 
TableChange.ColumnPosition
 
TableChange.DeleteColumn

A TableChange to delete a field.

TableChange.First

Column position FIRST means the specified column should be the first column.

TableChange.RemoveProperty

A TableChange to remove a table property.

TableChange.RenameColumn

A TableChange to rename a field.

TableChange.SetProperty

A TableChange to set a table property.

TableChange.UpdateColumnComment

A TableChange to update the comment of a field.

TableChange.UpdateColumnDefaultValue

A TableChange to update the default value of a field.

TableChange.UpdateColumnNullability

A TableChange to update the nullability of a field.

TableChange.UpdateColumnPosition

A TableChange to update the position of a field.

TableChange.UpdateColumnType

A TableChange to update the type of a field.

TableIndex

Index in a table

TableProvider

The base interface for v2 data sources which don't have a real catalog.

TableScan

A BaseRelation that can produce all of its tuples as an RDD of Row objects.

TaskCommitDenied

:: DeveloperApi ::
 Task requested the driver to commit, but was denied.

TaskCompletionListener

:: DeveloperApi ::

TaskContext

Contextual information about a task which can be read or mutated during
 execution.

TaskData
 
TaskDetailsClassNames

Names of the CSS classes corresponding to each type of task detail.

TaskEndReason

:: DeveloperApi ::
 Various possible reasons why a task ended.

TaskFailedReason

:: DeveloperApi ::
 Various possible reasons why a task failed.

TaskFailureListener

:: DeveloperApi ::

TaskIndexNames

Tasks have a lot of indices that are used in a few different places.

TaskInfo

:: DeveloperApi ::
 Information about a running task attempt inside a TaskSet.

TaskKilled

:: DeveloperApi ::
 Task was killed intentionally and needs to be rescheduled.

TaskKilledException

:: DeveloperApi ::
 Exception thrown when a task is explicitly killed (i.e., task failure is expected).

TaskLocality
 
TaskLocation

A location where a task should run.

TaskMetricDistributions
 
TaskMetrics
 
TaskResourceRequest

A task resource request.

TaskResourceRequests

A set of task resource requests.

TaskResult<T>
 
TaskResultBlockId
 
TaskResultLost

:: DeveloperApi ::
 The task finished successfully, but the result was lost from the executor's block manager before
 it was fetched.

TaskScheduler

Low-level task scheduler interface, currently implemented exclusively by
 TaskSchedulerImpl.

TaskSchedulerIsSet

An event that SparkContext uses to notify HeartbeatReceiver that SparkContext.taskScheduler has
 been created.

TaskSorting
 
TaskState
 
TaskStatus
 
TeradataDialect
 
Term

R formula terms.

TestGroupState<S>

:: Experimental ::

TestResult<DF>

Trait for hypothesis test results.

TestUtils

Utilities for tests.

ThreadStackTrace
 
ThreadUtils
 
Time

This is a simple class that represents an absolute instant of time.

TimestampNTZType

The timestamp without time zone type represents a local time in microsecond precision,
 which is independent of time zone.
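To illustrate a time-zone-independent local timestamp at microsecond precision, the sketch below encodes a java.time.LocalDateTime as microseconds since 1970-01-01T00:00 and decodes it again. It reads the wall-clock fields as if they were UTC, so no session time zone is involved. This is not Spark's conversion code. The class name is hypothetical, and any nanoseconds beyond microsecond precision are truncated.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper, not part of Spark: local (zone-less) timestamp <-> microseconds.
final class NtzMicros {
    // Interprets the wall-clock fields as UTC purely as a fixed reference; drops sub-microsecond nanos.
    static long toMicros(LocalDateTime ldt) {
        long seconds = ldt.toEpochSecond(ZoneOffset.UTC);
        return Math.addExact(Math.multiplyExact(seconds, 1_000_000L), ldt.getNano() / 1_000L);
    }

    // floorDiv/floorMod keep the decoding correct for instants before 1970.
    static LocalDateTime fromMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        int nanos = (int) Math.floorMod(micros, 1_000_000L) * 1_000;
        return LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC);
    }
}
```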

TimestampType

The timestamp type represents a time instant in microsecond precision.

TimestampTypeExpression
 
TimeTrackingOutputStream

Intercepts write calls and tracks total time spent writing in order to update shuffle write
 metrics.

Tokenizer

A tokenizer that converts the input string to lowercase and then splits it by white spaces.
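The two steps in the Tokenizer description (lowercase, then split on whitespace) can be sketched in plain Java as follows. Splitting on the single-character regex "\\s" is an assumption about the exact pattern. It means consecutive spaces produce empty tokens, which is one reason Spark also offers RegexTokenizer. The class name is hypothetical. Locale.ROOT is this sketch's choice, made so that results do not depend on the JVM default locale.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Spark: lowercase the text, then split on whitespace.
final class TokenizerSketch {
    static List<String> tokenize(String text) {
        // Locale.ROOT keeps lowercasing independent of the default locale (a choice of this sketch).
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s"));
    }
}
```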

ToolTips
 
ToolTips
 
Topology

Trait for the artificial neural network (ANN) topology properties

TopologyMapper

::DeveloperApi::
 TopologyMapper provides topology information for a given host
 param:  conf SparkConf to get required properties, if needed

TopologyModel

Trait for ANN topology model

TrainingSummary

Abstraction for training results.

TrainValidationSplit

Validation for hyper-parameter tuning.

TrainValidationSplitModel

Model from train validation split.

TrainValidationSplitModel.TrainValidationSplitModelWriter

Writer for TrainValidationSplitModel.

TrainValidationSplitParams

Params for TrainValidationSplit and TrainValidationSplitModel.

Transform

Represents a transform function in the public logical expression API.

TransformEnd

Event fired after Transformer.transform.

Transformer

Abstract class for transformers that transform one dataset into another.

TransformStart

Event fired before Transformer.transform.

TreeClassifierParams

Parameters for Decision Tree-based classification algorithms.

TreeEnsembleClassifierParams

Parameters for Decision Tree-based ensemble classification algorithms.

TreeEnsembleModel<M extends DecisionTreeModel>

Abstraction for models which are ensembles of decision trees

TreeEnsembleParams

Parameters for Decision Tree-based ensemble algorithms.

TreeEnsembleRegressorParams

Parameters for Decision Tree-based ensemble regression algorithms.

TreeRegressorParams

Parameters for Decision Tree-based regression algorithms.

TriangleCount

Compute the number of triangles passing through each vertex.
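The quantity TriangleCount computes can be illustrated on a single machine. The Java sketch below works on an undirected simple graph. For each vertex, it counts the pairs of its neighbors that are also adjacent to each other. GraphX computes the same count over a distributed graph using neighbor-set intersections, so this is not its implementation. The class name is hypothetical. Self-loops and duplicate edges are ignored, which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Spark: per-vertex triangle counts on an undirected graph.
final class TriangleCountSketch {
    static int[] count(int numVertices, int[][] edges) {
        List<Set<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < numVertices; i++) adj.add(new HashSet<>());
        for (int[] e : edges) {
            if (e[0] == e[1]) continue;   // ignore self-loops
            adj.get(e[0]).add(e[1]);      // sets collapse duplicate and reversed edges
            adj.get(e[1]).add(e[0]);
        }
        int[] counts = new int[numVertices];
        for (int v = 0; v < numVertices; v++) {
            int found = 0;
            for (int u : adj.get(v)) {
                for (int w : adj.get(v)) {
                    // visit each unordered neighbor pair once; a closing edge u-w forms a triangle
                    if (u < w && adj.get(u).contains(w)) found++;
                }
            }
            counts[v] = found;
        }
        return counts;
    }
}
```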

Trigger

Policy used to indicate how often results should be produced by a StreamingQuery.

TripletFields

Represents a subset of the fields of an EdgeTriplet or EdgeContext.

TruncatableTable

Represents a table which can be atomically truncated.

typed
Deprecated.
As of release 3.0.0, please use the untyped builtin aggregate functions.

typed
Deprecated.
Please use the untyped builtin aggregate functions.

TypedColumn<T,U>

A Column where an Encoder has been given for the expected input and return type.

UDF0<R>

A Spark SQL UDF that has 0 arguments.

UDF1<T1,R>

A Spark SQL UDF that has 1 argument.

UDF10<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,R>

A Spark SQL UDF that has 10 arguments.

UDF11<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,R>

A Spark SQL UDF that has 11 arguments.

UDF12<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,R>

A Spark SQL UDF that has 12 arguments.

UDF13<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,R>

A Spark SQL UDF that has 13 arguments.

UDF14<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,R>

A Spark SQL UDF that has 14 arguments.

UDF15<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,R>

A Spark SQL UDF that has 15 arguments.

UDF16<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,R>

A Spark SQL UDF that has 16 arguments.

UDF17<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,R>

A Spark SQL UDF that has 17 arguments.

UDF18<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,T18,R>

A Spark SQL UDF that has 18 arguments.

UDF19<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,T18,T19,R>

A Spark SQL UDF that has 19 arguments.

UDF2<T1,T2,R>

A Spark SQL UDF that has 2 arguments.

UDF20<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,T18,T19,T20,R>

A Spark SQL UDF that has 20 arguments.

UDF21<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,T18,T19,T20,T21,R>

A Spark SQL UDF that has 21 arguments.

UDF22<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,T18,T19,T20,T21,T22,R>

A Spark SQL UDF that has 22 arguments.

UDF3<T1,T2,T3,R>

A Spark SQL UDF that has 3 arguments.

UDF4<T1,T2,T3,T4,R>

A Spark SQL UDF that has 4 arguments.

UDF5<T1,T2,T3,T4,T5,R>

A Spark SQL UDF that has 5 arguments.

UDF6<T1,T2,T3,T4,T5,T6,R>

A Spark SQL UDF that has 6 arguments.

UDF7<T1,T2,T3,T4,T5,T6,T7,R>

A Spark SQL UDF that has 7 arguments.

UDF8<T1,T2,T3,T4,T5,T6,T7,T8,R>

A Spark SQL UDF that has 8 arguments.

UDF9<T1,T2,T3,T4,T5,T6,T7,T8,T9,R>

A Spark SQL UDF that has 9 arguments.

UDFRegistration

Functions for registering user-defined functions.

UDTFRegistration

Functions for registering user-defined table functions.

UDTRegistration

This object keeps the mappings between user classes and their User Defined Types (UDTs).

UIRoot

This trait is shared by all the root containers for application UI information:
 the HistoryServer and the application UI.

UIRootFromServletContext
 
UIUtils
 
UIUtils
 
UIUtils

Utility functions for generating XML pages with Spark content.

UIWorkloadGenerator

Continuously generates jobs that expose various features of the WebUI (internal testing tool).

UnaryTransformer<IN,OUT,T extends UnaryTransformer<IN,OUT,T>>

Abstract class for transformers that take one input column, apply transformation, and output the
 result as a new column.

UnboundFunction

Represents a user-defined function that is not bound to input types.

UniformGenerator

Generates i.i.d. samples from U[0.0, 1.0]

UnionRDD<T>
 
UnivariateFeatureSelector

Feature selector based on univariate statistical tests against labels.

UnivariateFeatureSelectorModel

Model fitted by UnivariateFeatureSelector.

UnivariateFeatureSelectorParams

Params for UnivariateFeatureSelector and UnivariateFeatureSelectorModel.

UnknownPartitioning

Represents a partitioning where rows are split across partitions in an unknown pattern.

UnknownReason

:: DeveloperApi ::
 We don't know why the task ended -- for example, because of a ClassNotFound exception when
 deserializing the task result.

UnrecognizedBlockId
 
UnresolvedAttribute

An unresolved attribute.

UnspecifiedDistribution

A distribution where no promises are made about co-location of data.

UnspecifiedDistributionImpl
 
UpCastRule

Rule that defines which upcasts are allowed in Spark.

Updater

Class used to perform steps (weight update) using Gradient Descent methods.
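The sketch below shows the simplest weight update a gradient-descent Updater can perform: an unregularized step whose size decays as stepSize / sqrt(iter). That decay schedule is modeled on how MLlib's SimpleUpdater is documented, but treat it as an assumption of this sketch. The class and method names are hypothetical. Regularized updaters would add a penalty term that is not shown here.

```java
// Hypothetical helper, not part of Spark: one unregularized gradient-descent step.
final class GradientStepSketch {
    static double[] step(double[] weights, double[] gradient, double stepSize, int iter) {
        if (iter < 1) {
            throw new IllegalArgumentException("iter must be >= 1");
        }
        double thisIterStepSize = stepSize / Math.sqrt(iter); // step size shrinks with sqrt(iter)
        double[] updated = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            updated[i] = weights[i] - thisIterStepSize * gradient[i]; // w <- w - eta_t * g
        }
        return updated;
    }
}
```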

UserDefinedAggregateFunc

The general representation of user defined aggregate function, which implements
 AggregateFunc, contains the upper-cased function name, the canonical function name,
 the `isDistinct` flag and all the inputs.

UserDefinedAggregateFunction
Deprecated.
UserDefinedAggregateFunction is deprecated.

UserDefinedFunction

A user-defined function.

UserDefinedScalarFunc

The general representation of user defined scalar function, which contains the upper-cased
 function name, canonical function name and all the children expressions.

UserDefinedType<UserType>

The data type for User Defined Types (UDTs).

Utils
 
Utils
 
Utils

Various utility methods used by Spark.

V1Scan

A trait that should be implemented by V1 DataSources that would like to leverage the DataSource
 V2 read code paths.

V1Write

A logical write that should be executed using V1 InsertableRelation interface.

V2ExpressionSQLBuilder

The builder to generate SQL from V2 expressions.

V2TableWithV1Fallback

A V2 table with V1 fallback support.

ValidatorParams

Common params for TrainValidationSplitParams and CrossValidatorParams.

ValuesHolder<T>
 
VarcharType
 
Variance

Class for calculating variance during regression

VarianceThresholdSelector

Feature selector that removes all low-variance features.
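As an illustration of "removes all low-variance features", the Java sketch below keeps only the columns whose variance is strictly greater than a threshold. With a threshold of 0, it drops only constant columns. The class name is hypothetical. Using the unbiased sample variance (dividing by n - 1) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: indices of columns whose variance exceeds a threshold.
final class VarianceThresholdSketch {
    static List<Integer> selectedIndices(double[][] rows, double threshold) {
        int n = rows.length;
        int numFeatures = rows[0].length;
        List<Integer> kept = new ArrayList<>();
        for (int j = 0; j < numFeatures; j++) {
            double mean = 0.0;
            for (double[] r : rows) mean += r[j];
            mean /= n;
            double ss = 0.0;
            for (double[] r : rows) ss += (r[j] - mean) * (r[j] - mean);
            double variance = n > 1 ? ss / (n - 1) : 0.0; // unbiased sample variance
            if (variance > threshold) kept.add(j);        // keep only strictly greater
        }
        return kept;
    }
}
```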

VarianceThresholdSelectorModel

Model fitted by VarianceThresholdSelector.

VarianceThresholdSelectorParams

Params for VarianceThresholdSelector and VarianceThresholdSelectorModel.

Vector

Represents a numeric vector, whose index type is Int and value type is Double.

Vector

Represents a numeric vector, whose index type is Int and value type is Double.

VectorAssembler

A feature transformer that merges multiple columns into a vector column.

VectorAttributeRewriter

Utility transformer that rewrites Vector attribute names via prefix replacement.

VectorImplicits

Implicit methods available in Scala for converting org.apache.spark.mllib.linalg.Vector to
 org.apache.spark.ml.linalg.Vector and vice versa.

VectorIndexer

Class for indexing categorical feature columns in a dataset of Vector.

VectorIndexerModel

Model fitted by VectorIndexer.

VectorIndexerParams

Private trait for params for VectorIndexer and VectorIndexerModel

Vectors

Factory methods for Vector.

Vectors

Factory methods for Vector.

VectorSizeHint

A feature transformer that adds size information to the metadata of a vector column.

VectorSlicer

This class takes a feature vector and outputs a new feature vector with a subarray of the
 original features.
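The core of VectorSlicer is building a new feature vector from a chosen subset of indices, which the dense-array sketch below illustrates. The class name is hypothetical. It ignores sparse vectors and selection by feature name. Keeping the output in the requested index order is an assumption of this sketch.

```java
// Hypothetical helper, not part of Spark: select a sub-array of features by index.
final class SliceSketch {
    static double[] slice(double[] features, int[] indices) {
        double[] out = new double[indices.length];
        for (int k = 0; k < indices.length; k++) {
            // out-of-range indices throw ArrayIndexOutOfBoundsException
            out[k] = features[indices[k]];
        }
        return out;
    }
}
```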

VectorTransformer

Trait for transformation of a vector

VectorUDT

:: AlphaComponent ::

VersionInfo
 
VersionUtils

Utilities for working with Spark version strings

VertexPartitionBaseOpsConstructor<T extends org.apache.spark.graphx.impl.VertexPartitionBase<Object>>

A typeclass for subclasses of VertexPartitionBase representing the ability to wrap them in a
 VertexPartitionBaseOps.

VertexRDD<VD>

Extends RDD[(VertexId, VD)] by ensuring that there is only one entry for each vertex and by
 pre-indexing the entries for fast, efficient joins.

VertexRDDImpl<VD>
 
View

An interface representing a persisted view.

ViewCatalog

Catalog methods for working with views.

ViewChange

ViewChange subclasses represent requested changes to a view.

ViewChange.RemoveProperty
 
ViewChange.SetProperty
 
VocabWord

Entry in vocabulary

VoidFunction<T>

A function with no return value.

VoidFunction2<T1,T2>

A two-argument function that takes arguments of type T1 and T2 with no return value.

WeibullGenerator

Generates i.i.d. samples from the Weibull distribution with the
 given shape and scale parameters.
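One standard way to draw i.i.d. Weibull samples is inverse-transform sampling, shown in the Java sketch below. It draws u uniformly from [0, 1) and returns scale * (-ln(1 - u))^(1 / shape). This swaps in a textbook technique and is not necessarily how WeibullGenerator produces its samples. The class name and seed handling are hypothetical.

```java
import java.util.Random;

// Hypothetical helper, not part of Spark: Weibull sampling by inverse CDF.
final class WeibullSketch {
    // F^{-1}(u) = scale * (-ln(1 - u))^(1 / shape), for u in [0, 1).
    static double inverseCdf(double u, double shape, double scale) {
        return scale * Math.pow(-Math.log1p(-u), 1.0 / shape); // log1p(-u) = ln(1 - u), accurate near 0
    }

    static double[] sample(int n, double shape, double scale, long seed) {
        Random rng = new Random(seed);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = inverseCdf(rng.nextDouble(), shape, scale); // nextDouble() is in [0, 1)
        }
        return out;
    }
}
```

With shape = 1, the distribution reduces to an exponential distribution whose mean equals the scale.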

WelchTTest

Performs Welch's 2-sample t-test.
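Unlike Student's test, Welch's test does not pool the variances. The Java sketch below computes the textbook Welch t statistic and the Welch-Satterthwaite approximation for the degrees of freedom. The class name is hypothetical, and the sketch stops short of a p-value. When both samples have the same size, the statistic coincides with Student's t, but the degrees of freedom generally differ.

```java
// Hypothetical helper, not part of Spark: Welch's t statistic and degrees of freedom.
final class WelchT {
    static double mean(double[] x) {
        double s = 0.0;
        for (double v : x) s += v;
        return s / x.length;
    }

    // Unbiased sample variance (divides by n - 1).
    static double sampleVariance(double[] x) {
        double m = mean(x), s = 0.0;
        for (double v : x) s += (v - m) * (v - m);
        return s / (x.length - 1);
    }

    // t = (mean(a) - mean(b)) / sqrt(var(a)/n1 + var(b)/n2)
    static double tStatistic(double[] a, double[] b) {
        double qa = sampleVariance(a) / a.length;
        double qb = sampleVariance(b) / b.length;
        return (mean(a) - mean(b)) / Math.sqrt(qa + qb);
    }

    // Welch-Satterthwaite: (qa + qb)^2 / (qa^2 / (n1 - 1) + qb^2 / (n2 - 1))
    static double degreesOfFreedom(double[] a, double[] b) {
        double qa = sampleVariance(a) / a.length;
        double qb = sampleVariance(b) / b.length;
        return (qa + qb) * (qa + qb) / (qa * qa / (a.length - 1) + qb * qb / (b.length - 1));
    }
}
```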

Window

Utility functions for defining window in DataFrames.

WindowSpec

A window specification that defines the partitioning, ordering, and frame boundaries.

Word2Vec

Word2Vec trains a model of Map(String, Vector), i.e. transforms a word into a code for further
 natural language processing or machine learning.

Word2Vec

Word2Vec creates vector representation of words in a text corpus.

Word2VecBase

Params for Word2Vec and Word2VecModel.

Word2VecModel

Model fitted by Word2Vec.

Word2VecModel

Word2Vec model
 param:  wordIndex maps each word to an index, which can retrieve the corresponding
                  vector from wordVectors
 param:  wordVectors array of length numWords * vectorSize, vector corresponding
                    to the word mapped with index i can be retrieved by the slice
                    (i * vectorSize, i * vectorSize + vectorSize)
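The parameter description above implies a simple lookup. To get a word's vector, find its index i in wordIndex, then read wordVectors from i * vectorSize (inclusive) to i * vectorSize + vectorSize (exclusive). The Java sketch below implements that lookup over a float array. The class name and the exception thrown for unknown words are choices of this sketch.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical helper, not part of Spark: slice one word's vector out of the flat array.
final class Word2VecLookupSketch {
    static float[] vectorFor(String word, Map<String, Integer> wordIndex, float[] wordVectors, int vectorSize) {
        Integer i = wordIndex.get(word);
        if (i == null) {
            throw new IllegalArgumentException(word + " not in vocabulary"); // exception type is this sketch's choice
        }
        int start = i * vectorSize;
        return Arrays.copyOfRange(wordVectors, start, start + vectorSize); // [start, start + vectorSize)
    }
}
```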

Word2VecModel.Data$
 
Word2VecModel.Word2VecModelWriter$
 
WritableByteChannelWrapper

:: Private ::
 A thin wrapper around a WritableByteChannel.

Write

A logical representation of a data source write.

WriteAheadLog

:: DeveloperApi ::

 This abstract class represents a write ahead log (aka journal) that is used by Spark Streaming
 to save the received data (by receivers) and associated metadata to a reliable storage, so that
 they can be recovered after driver failures.

WriteAheadLogRecordHandle

:: DeveloperApi ::

 This abstract class represents a handle that refers to a record written in a
 WriteAheadLog.

WriteAheadLogUtils

A helper class with utility functions related to the WriteAheadLog interface

WriteBuilder

An interface for building the Write.

WriteConfigMethods<R>

Configuration methods common to create/replace operations and insert/overwrite operations.

WriterCommitMessage

A commit message returned by DataWriter.commit() that will be sent back to the driver side
 as the input parameter of BatchWrite.commit(WriterCommitMessage[]) or
 StreamingWrite.commit(long, WriterCommitMessage[]).

XssSafeRequest
 
YearMonthIntervalType

The type represents year-month intervals of the SQL standard.

ZStdCompressionCodec

:: DeveloperApi ::
 ZStandard implementation of CompressionCodec.