Changelog
The three packages version and ship together. Per-package changelogs live alongside the source:
GitHub release · Latest
- Catalog parity with PySpark: the full `spark.catalog` surface (`currentCatalog`/`setCurrentCatalog`, `listCatalogs`/`listDatabases`/`listTables`/`listColumns`/`listFunctions`, `databaseExists`/`tableExists`/`functionExists`, `getDatabase`/`getTable`/`getFunction`, `dropTempView`/`dropGlobalTempView`, `cacheTable`/`uncacheTable`/`clearCache`/`isCached`, `refreshTable`/`refreshByPath`, `recoverPartitions`, `createTable`/`createExternalTable`)
- `spark.udf.registerJavaFunction(name, className, returnType?)` and `spark.udf.registerJavaUDAF(name, className)` for binding Java UDFs and UDAFs already on the server’s classpath to a SQL function name
- `SparkSession.version()` returns the server’s Spark version
- `SparkSession.builder().sessionId(uuid)` to reuse a server-side session by ID
- `RuntimeConfig` on `spark.conf` with `get`, `set`, `unset`, `getAll`, `isModifiable`
- Session tags and interrupts: `addTag`, `removeTag`, `getTags`, `clearTags`, `interruptAll`, `interruptTag`, `interruptOperation`
- `Transport` interface gains optional `config` and `interrupt` methods; `ExecuteOptions` plumbs per-call tags
- `SparkConnectError` exposes `errorClass`, `sqlState`, `messageParameters`, `errorTypeHierarchy`, and `serverStackTrace`
- Fix `count("*")` to send `count(1)` on the wire instead of `count(<unresolved-*>)`, matching PySpark and Scala behavior
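The `count("*")` fix amounts to rewriting a lone star argument before the expression is serialized. A minimal sketch, assuming a simplified, hypothetical expression tree: `Expr`, `normalizeCountStar`, and `render` are illustrative names, not the library's protobuf builders.

```typescript
// Hypothetical, simplified expression tree; the real client builds protobuf messages.
type Expr =
  | { kind: "star" }
  | { kind: "literal"; value: number }
  | { kind: "column"; name: string }
  | { kind: "call"; name: string; args: Expr[] };

// Rewrite count(*) to count(1), recursing into nested calls.
function normalizeCountStar(e: Expr): Expr {
  if (e.kind !== "call") return e;
  const args = e.args.map(normalizeCountStar);
  if (e.name === "count" && args.length === 1 && args[0]?.kind === "star") {
    return { kind: "call", name: "count", args: [{ kind: "literal", value: 1 }] };
  }
  return { ...e, args };
}

// Render an expression as SQL-like text for inspection.
function render(e: Expr): string {
  switch (e.kind) {
    case "star":
      return "*";
    case "literal":
      return String(e.value);
    case "column":
      return e.name;
    case "call":
      return `${e.name}(${e.args.map(render).join(", ")})`;
  }
}

const star: Expr = { kind: "call", name: "count", args: [{ kind: "star" }] };
console.log(render(normalizeCountStar(star))); // count(1)
```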
- Full `sc://` connection-string grammar parsed: TLS via `use_ssl=true`, bearer `token`, `user_id`, `user_agent`, `session_id` (UUID), `grpc_max_message_size`, plus arbitrary `key=value` pairs that pass through as gRPC metadata on every RPC
- Bearer token attached as `authorization: Bearer <token>` via `combineChannelCredentials(createSsl(), createFromMetadataGenerator(...))`
- Canonical `user_agent` suffix: `<your prefix> spark-connect-js/<ver> (node <ver>; <platform>)`
- Per-request operation IDs (UUIDv4) on every `ExecutePlan` request
- `ReattachExecute` iterator resumes server-streaming responses after transient gRPC drops (`UNAVAILABLE`, `INTERNAL` with `INVALID_CURSOR.DISCONNECTED`) without re-executing the plan
- Configurable retry policy via `GrpcTransportOptions.retryPolicy`; the default mirrors PySpark (`maxRetries=15`, `initialBackoffMs=50`, `maxBackoffMs=60_000`, `backoffMultiplier=4`, `jitterMs=500`)
- Error trailers: decode `grpc-status-details-bin` (`google.rpc.Status` + `ErrorInfo`) to populate `errorClass`, `sqlState`, `messageParameters` on `SparkConnectError`, with fallback to a `FetchErrorDetails` RPC for `errorTypeHierarchy` and `serverStackTrace` when the inline trailer is incomplete
- `client_observed_server_side_session_id` captured from every response and echoed back on subsequent RPCs for stale-session detection; cleared on `ReleaseSession`
- `Config` and `Interrupt` RPCs wired (consumed by `spark.conf` and `interrupt*` on core)
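To see what the default retry policy means in wall-clock terms, the sketch below computes a backoff schedule from the listed defaults. It assumes a capped exponential formula, `min(initialBackoffMs * backoffMultiplier^attempt, maxBackoffMs)` plus uniform jitter in `[0, jitterMs)`; the `RetryPolicy` interface and `backoffDelayMs` helper are hypothetical, and the transport's exact jitter rules may differ.

```typescript
// Hypothetical shape reusing the option names listed for GrpcTransportOptions.retryPolicy.
interface RetryPolicy {
  maxRetries: number;
  initialBackoffMs: number;
  maxBackoffMs: number;
  backoffMultiplier: number;
  jitterMs: number;
}

// Defaults as listed in the changelog.
const defaultPolicy: RetryPolicy = {
  maxRetries: 15,
  initialBackoffMs: 50,
  maxBackoffMs: 60_000,
  backoffMultiplier: 4,
  jitterMs: 500,
};

// Assumed formula: capped exponential backoff plus uniform jitter in [0, jitterMs).
// `random` is injectable so the schedule can be made deterministic.
function backoffDelayMs(
  policy: RetryPolicy,
  attempt: number,
  random: () => number = Math.random,
): number {
  const exponential = policy.initialBackoffMs * policy.backoffMultiplier ** attempt;
  const capped = Math.min(exponential, policy.maxBackoffMs);
  return capped + random() * policy.jitterMs;
}

// Delay before each allowed retry, with jitter disabled (random() always returns 0).
const schedule = Array.from({ length: defaultPolicy.maxRetries }, (_, attempt) =>
  backoffDelayMs(defaultPolicy, attempt, () => 0),
);
const totalMs = schedule.reduce((a, b) => a + b, 0);

console.log(schedule.slice(0, 7).join(", ")); // 50, 200, 800, 3200, 12800, 51200, 60000
console.log(totalMs); // 608250
```

Under that assumption, and with jitter disabled, the 15 retries wait 608,250 ms (just over 10 minutes) in total, and the delay hits the 60 s cap from the seventh retry onward.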
- Vendored `google.rpc.Status` and `google.rpc.ErrorInfo` proto definitions, plus regenerated bindings for `FetchErrorDetailsRequest`/`Response`, consumed by `@spark-connect-js/node` for error-trailer decoding
- DataFrameReader shortcuts: `csv()`, `json()`, `parquet()`, `orc()`, `text()`, `schema()`
- DataFrameWriter shortcuts: `csv()`, `json()`, `parquet()`, `orc()`, `text()`, `bucketBy()`, `insertInto()`
- DataFrameWriterV2 with the full `writeTo()` API: `create`, `replace`, `createOrReplace`, `append`, `overwrite`, `overwritePartitions`
- Typed client error hierarchy: `SparkClientError`, `InvalidConfigError`, `InvalidInputError`, `UnsupportedOperationError`
- `isDistinct` propagation on aggregate functions
- Cross join validation rejects join conditions
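The names suggest these typed errors cover failures detected in the client, as opposed to server errors surfaced through `SparkConnectError`, so callers can branch on them with `instanceof`. A minimal sketch of such a hierarchy, assuming each subclass simply extends `SparkClientError` with a message-only constructor; the class names come from this changelog, while the constructors and the `requirePositive` helper are hypothetical.

```typescript
// Hypothetical reconstruction: class names from the changelog, constructors assumed.
class SparkClientError extends Error {
  constructor(message: string) {
    super(message);
    // Report the concrete subclass name (e.g. "InvalidInputError") in stack traces.
    this.name = new.target.name;
  }
}
class InvalidConfigError extends SparkClientError {}
class InvalidInputError extends SparkClientError {}
class UnsupportedOperationError extends SparkClientError {}

// Hypothetical validation helper that fails fast before any RPC is sent.
function requirePositive(name: string, value: number): number {
  if (!Number.isFinite(value) || value <= 0) {
    throw new InvalidInputError(`${name} must be a positive number, got ${value}`);
  }
  return value;
}

try {
  requirePositive("numPartitions", 0);
} catch (err) {
  if (err instanceof InvalidInputError) {
    console.log(`${err.name}: ${err.message}`);
  } else if (err instanceof SparkClientError) {
    console.log("other client-side error");
  } else {
    throw err;
  }
}
```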
- Proto serialization for the `WriteOperationV2` command
- `SparkProcessManager` throw sites reclassified to `SparkClientError`
- Re-exported typed client errors from core
- Re-exported proto schemas: `WriteOperationV2Schema`, `WriteOperationV2_ModeSchema`
- `DataFrame.cube()`, `.rollup()` for multi-dimensional aggregation
- `DataFrame.unpivot()`/`.melt()` for wide-to-long reshaping
- `DataFrame.summary()` for descriptive statistics
- `DataFrame.replace()` for value substitution via `NAReplace`
- `DataFrame.randomSplit()` for splitting into multiple DataFrames
- `DataFrame.createTempView()`, `.createGlobalTempView()`, `.createOrReplaceGlobalTempView()`
- `DataFrame.sameSemantics()` and `.semanticHash()` for plan comparison
- `DataFrameStat` class (`.stat` accessor) with `corr()`, `cov()`, `crosstab()`, `freqItems()`, `approxQuantile()`
- `GroupedData.pivot()` support with cube/rollup/pivot group types
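PySpark implements `randomSplit()` by normalizing the weights and giving each split a slice `[lower, upper)` of the unit interval, with every split sampling the same input under the same seed. The sketch below only computes those bounds and assumes this client follows the same scheme; `splitBounds` is a hypothetical helper.

```typescript
// Normalize weights into cumulative [lower, upper) bounds on the unit interval.
function splitBounds(weights: number[]): Array<[number, number]> {
  // `!(w >= 0)` also rejects NaN.
  if (weights.length === 0 || weights.some((w) => !(w >= 0))) {
    throw new Error("weights must be a non-empty list of non-negative numbers");
  }
  const total = weights.reduce((a, b) => a + b, 0);
  if (total <= 0) throw new Error("weights must sum to a positive number");

  const bounds: Array<[number, number]> = [];
  let lower = 0;
  weights.forEach((w, i) => {
    // Pin the final bound to exactly 1 so floating-point rounding cannot leave a gap.
    const upper = i === weights.length - 1 ? 1 : lower + w / total;
    bounds.push([lower, upper]);
    lower = upper;
  });
  return bounds;
}

console.log(JSON.stringify(splitBounds([1, 2, 1]))); // [[0,0.25],[0.25,0.75],[0.75,1]]
```

Weights `[1, 2, 1]` therefore target roughly 25%, 50%, and 25% of the rows. Because the slices are disjoint and every split draws from the same seeded random value per row, each row would land in exactly one split.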
- Proto serialization for `StatSummary`, `NAReplace`, `Unpivot`, `StatCorr`, `StatCov`, `StatCrosstab`, `StatFreqItems`, `StatApproxQuantile`, and `Aggregate_Pivot`
- Added analyze-plan request/response handling for `sameSemantics` and `semanticHash`
- Re-exported `DataFrameStat` from the package index
- Re-exported proto schemas: `StatSummarySchema`, `NAReplaceSchema`, `NAReplace_ReplacementSchema`, `StatCorrSchema`, `StatCovSchema`, `StatCrosstabSchema`, `StatFreqItemsSchema`, `StatApproxQuantileSchema`, `UnpivotSchema`, `Unpivot_ValuesSchema`, `Aggregate_PivotSchema`
- Re-exported analyze-plan schemas for `SameSemantics` and `SemanticHash`
- SparkSession: connect via an `sc://` URL, execute SQL, read tables, create DataFrames from local data
- DataFrame: 30+ transformations (select, filter, join, groupBy, sort, union, intersect, sample, fillna, dropna, and more), actions (collect, show, count, head, tail, toLocalIterator), properties (schema, columns, dtypes, isEmpty, printSchema, explain)
- Column: comparisons, arithmetic, logical ops, cast, alias, null checks, pattern matching, bitwise ops, window support
- GroupedData: agg, count, sum, avg, mean, min, max
- Window: partitionBy, orderBy, rowsBetween, rangeBetween
- DataFrameReader: format, option, options, load, table
- DataFrameWriter: format, mode, option, options, partitionBy, sortBy, save, saveAsTable
- Catalog: currentDatabase, setCurrentDatabase, listDatabases, listTables, listColumns, databaseExists, tableExists
- 248 built-in functions across 12 categories: aggregate, math, string, date/timestamp, window, collection, conditional, hash, JSON, CSV, bitwise, sort
- PlanBuilder: constructs Spark Connect logical plan protobuf messages from the DataFrame API
- Zero runtime dependencies
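The `sc://` URL accepted by SparkSession follows the Spark Connect connection-string grammar, `sc://host[:port]/;key=value;...`, with 15002 as the conventional default port. The parser below is a minimal sketch of that grammar using the keys listed for the node transport; the `ConnectionConfig` shape and `parseConnectionString` name are hypothetical, and it does not handle IPv6 hosts or percent-encoded values.

```typescript
// Hypothetical result shape; field names are illustrative, not the library's types.
interface ConnectionConfig {
  host: string;
  port: number;
  useSsl: boolean;
  token?: string;
  userId?: string;
  userAgent?: string;
  sessionId?: string;
  grpcMaxMessageSize?: number;
  // Unrecognized key=value pairs, to be forwarded as gRPC metadata on every RPC.
  metadata: Record<string, string>;
}

const DEFAULT_PORT = 15002;
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Parse sc://host[:port][/][;key=value]*.
function parseConnectionString(url: string): ConnectionConfig {
  const match = /^sc:\/\/([^:/;]+)(?::(\d+))?\/?(?:;(.*))?$/.exec(url);
  if (!match) throw new Error(`not an sc:// connection string: ${url}`);
  const [, host, port, params = ""] = match;
  const config: ConnectionConfig = {
    host,
    port: port ? Number(port) : DEFAULT_PORT,
    useSsl: false,
    metadata: {},
  };
  for (const pair of params.split(";")) {
    if (pair === "") continue; // tolerate trailing or doubled separators
    const eq = pair.indexOf("=");
    if (eq <= 0) throw new Error(`malformed parameter: ${pair}`);
    const key = pair.slice(0, eq);
    const value = pair.slice(eq + 1);
    switch (key) {
      case "use_ssl":
        config.useSsl = value.toLowerCase() === "true";
        break;
      case "token":
        config.token = value;
        break;
      case "user_id":
        config.userId = value;
        break;
      case "user_agent":
        config.userAgent = value;
        break;
      case "session_id":
        if (!UUID_RE.test(value)) throw new Error(`session_id must be a UUID: ${value}`);
        config.sessionId = value;
        break;
      case "grpc_max_message_size":
        config.grpcMaxMessageSize = Number(value);
        break;
      default:
        config.metadata[key] = value;
    }
  }
  return config;
}

console.log(parseConnectionString("sc://localhost/;use_ssl=true;x-team=data"));
```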
- GrpcTransport: connects to Spark Connect over gRPC, streams ExecutePlan responses, handles metadata and session management
- ArrowDecoder: deserializes Arrow IPC batches into JavaScript row objects
- SparkProcessManager: launches and manages local `spark-connect` server processes for development
- buildRelation / buildExpression: serializes logical plan nodes and expressions to protobuf wire format
- Re-exports the entire `@spark-connect-js/core` public API (SparkSession, DataFrame, Column, functions, etc.) for single-package convenience
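Arrow IPC record batches are columnar, so producing JavaScript row objects ends with a transpose from columns to rows. The sketch below shows only that last step on an already-decoded batch; the `ColumnarBatch` shape and `batchToRows` helper are hypothetical, and reading the IPC bytes themselves (schema, buffers, validity bitmaps) is left out.

```typescript
// Hypothetical decoded batch: columns[i][r] is the value of field i in row r.
interface ColumnarBatch {
  fieldNames: string[];
  columns: unknown[][];
}

// Transpose one columnar batch into an array of row objects keyed by field name.
function batchToRows(batch: ColumnarBatch): Array<Record<string, unknown>> {
  const { fieldNames, columns } = batch;
  if (fieldNames.length !== columns.length) {
    throw new Error("each field needs exactly one column");
  }
  const numRows = columns[0]?.length ?? 0;
  if (columns.some((c) => c.length !== numRows)) {
    throw new Error("all columns in a batch must have the same length");
  }
  const rows: Array<Record<string, unknown>> = [];
  for (let r = 0; r < numRows; r++) {
    const row: Record<string, unknown> = {};
    fieldNames.forEach((name, i) => {
      row[name] = columns[i]?.[r];
    });
    rows.push(row);
  }
  return rows;
}

const sample = batchToRows({ fieldNames: ["id", "name"], columns: [[1, 2], ["a", "b"]] });
console.log(JSON.stringify(sample)); // [{"id":1,"name":"a"},{"id":2,"name":"b"}]
```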
- Protobuf types: Plan, Relation, Expression, DataType, and all nested message types
- Service stubs: ExecutePlanRequest/Response, AnalyzePlanRequest/Response, ConfigRequest/Response, AddArtifactsRequest/Response, ArtifactStatusesRequest/Response
- Schema objects: StructType, StructField, MapType, ArrayType, and all Spark data type descriptors
- Single runtime dependency: `@bufbuild/protobuf`