Changelog

The three packages are versioned and shipped together. Per-package changelogs live alongside the source.

0.5.1

4 July 2026 · GitHub release · Latest

References: #101 · 0a8e2de

@spark-connect-js/core

Drop the hardcoded version from the README development-status note. Because npm snapshots the README at publish time, 0.5.0 shipped with the note still reading v0.4.0; a version-free note cannot go stale.

@spark-connect-js/node

Drop the hardcoded version from the README development-status note. Because npm snapshots the README at publish time, 0.5.0 shipped with the note still reading v0.4.0; a version-free note cannot go stale.

@spark-connect-js/connect

Drop the hardcoded version from the README development-status note. Because npm snapshots the README at publish time, 0.5.0 shipped with the note still reading v0.4.0; a version-free note cannot go stale.

0.5.0

4 July 2026 · GitHub release

References: #98 · a6237a1

@spark-connect-js/core

Structured Streaming: spark.readStream (DataStreamReader) and df.writeStream (DataStreamWriter) with Trigger factories (processingTime, availableNow, once, continuous); start() returns a StreamingQuery (id, runId, name, isActive, stop, awaitTermination, status, lastProgress, recentProgress, processAllAvailable, exception, explain)
spark.streams (StreamingQueryManager): active, get, awaitAnyTermination, resetTerminated, addListener/removeListener with StreamingQueryListener callbacks (onQueryStarted, onQueryProgress, onQueryIdle, onQueryTerminated) and typed StreamingQueryProgress
Event-time aggregation: DataFrame.withWatermark(eventTimeColumn, delayThreshold), window(timeColumn, windowDuration, slideDuration?, startTime?), session_window(timeColumn, gapDuration)
createDataFrame(rows) accepts plain row objects, encoded via the new arrowEncoder builder hook; Uint8Array input is validated as Arrow IPC stream format (file-format and empty input throw InvalidInputError)
spark.table(name) reads a catalog table or temp view, shorthand for spark.read.table(name)
Typed row access: df.as<Schema>() narrows collected rows at compile time; the row accessor namespace (getInt, getLong, getDouble, getString, getBoolean, getBinary, getDate) validates at runtime
df.agg(...exprs) aggregates without grouping; df.col(name) binds a column reference to its DataFrame for self-joins
Comparison, arithmetic, and bitwise Column methods accept raw primitives and wrap them as literals
filter and where accept SQL string predicates
count() returns bigint, matching the LongType result; wrap in Number(...) when the count is known to fit a JS safe integer
show() renders dates, maps, structs, arrays, and binary in Spark’s display style
lit(null) emits a typed NULL literal and lit(undefined) throws InvalidInputError; pivot(col, values) accepts null values
Optional Transport.executeCommandStream method for custom transports that stream command result frames
pow accepts a Column | number exponent; regexp_replace accepts Column | string pattern and replacement; element_at accepts a numeric index
isSessionInvalidated(err) matches INVALID_HANDLE.* errors so callers can rebuild the session after a server restart
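Since count() now resolves to a bigint, a bare Number(...) conversion silently rounds once a count passes Number.MAX_SAFE_INTEGER (2^53 - 1). A minimal sketch of a guarded conversion follows. toSafeNumber is a hypothetical helper, not an export of @spark-connect-js/core, and the bigint value stands in for the result of await df.count().

```typescript
// Hypothetical helper (not part of @spark-connect-js/core): narrows a bigint,
// such as the result of count(), to a JS number only when no precision is lost.
function toSafeNumber(value: bigint): number {
  if (
    value > BigInt(Number.MAX_SAFE_INTEGER) ||
    value < BigInt(Number.MIN_SAFE_INTEGER)
  ) {
    throw new RangeError(
      `${value.toString()} is outside the JS safe-integer range`,
    );
  }
  return Number(value);
}

// Stand-in for `await df.count()`, which returns bigint as of 0.5.0.
const rowCount: bigint = BigInt(42);
console.log(toSafeNumber(rowCount) + 1); // 43
```

Throwing instead of clamping makes an oversized count fail loudly; callers that need exact large counts can keep the bigint.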

@spark-connect-js/node

Type-driven Arrow decode keyed on the column’s Arrow type: DECIMAL(p, s) as a fixed-point string honoring scale, DATE/TIMESTAMP as Date, MAP<K, V> as Map<K, V> with typed keys, LONG always as bigint (wrap in Number(...) for values known to fit a JS safe integer), applied recursively through structs and arrays
ArrowEncoder backs createDataFrame(rows) with type inference over string, number, boolean, bigint, Date, and nulls; strings encode as materialized Utf8 (never dictionary-encoded, which Spark LocalRelation misreads)
GrpcTransportOptions.handshakeTimeoutMs (default 10_000, 0 disables): the channel handshake fails with errorClass: "CONNECTION_TIMEOUT" instead of hanging on an unreachable or misconfigured endpoint
RetryPolicy.maxConsecutiveNoProgressReattaches (default 3, 0 disables): a stream that keeps reattaching without delivering data throws errorClass: "REATTACH_NO_PROGRESS" instead of retrying forever
GrpcTransport implements the streaming command RPCs (WriteStreamOperationStart, StreamingQueryCommand, StreamingQueryManagerCommand) and the listener event stream
parseConnectionString rejects non-sc:// schemes and userinfo in the host with messages naming the offending segment
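The DECIMAL(p, s) decoding described above can be pictured as follows. Arrow stores a decimal as an unscaled integer together with the column's scale, so the value is unscaled × 10^-s. Rendering it as a string keeps digits a JS number would round away and preserves trailing zeros implied by the scale. This is a sketch under assumptions, not the package's ArrowDecoder: formatDecimal is a hypothetical name, the unscaled value is assumed to arrive as a bigint, and the changelog does not specify the exact string format for edge cases such as scale 0 or negative values.

```typescript
// Hypothetical sketch (not the package's ArrowDecoder): renders an Arrow
// DECIMAL(p, s) cell, stored as an unscaled integer, as a fixed-point string.
function formatDecimal(unscaled: bigint, scale: number): string {
  if (!Number.isInteger(scale) || scale < 0) {
    throw new RangeError(`unsupported scale ${scale}`);
  }
  const negative = unscaled < BigInt(0);
  const digits = (negative ? -unscaled : unscaled).toString();
  const sign = negative ? "-" : "";
  if (scale === 0) return sign + digits;
  // Left-pad so at least one digit precedes the decimal point.
  const padded = digits.padStart(scale + 1, "0");
  const intPart = padded.slice(0, padded.length - scale);
  const fracPart = padded.slice(padded.length - scale);
  return `${sign}${intPart}.${fracPart}`;
}

console.log(formatDecimal(BigInt(12345), 2)); // 123.45
console.log(formatDecimal(BigInt(-5), 3)); // -0.005
console.log(formatDecimal(BigInt(100), 2)); // 1.00 (scale honored)
```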

@spark-connect-js/connect

Re-exported proto schemas: WriteStreamOperationStart/WriteStreamOperationStartResult, StreamingQueryCommand/StreamingQueryCommandResult, StreamingQueryManagerCommand/StreamingQueryManagerCommandResult, StreamingQueryListenerBusCommand, StreamingQueryListenerEvent/StreamingQueryListenerEventsResult, StreamingQueryEventType, StreamingQueryInstanceId, WithWatermark, DataType_NULL, and RelationCommon, with their result and sub-command messages, consumed by @spark-connect-js/node for streaming commands, watermarks, and typed NULL literals

0.4.0

14 May 2026 · GitHub release

References: #61 · aa22cc6

@spark-connect-js/core

Catalog parity with PySpark: the full spark.catalog surface (currentCatalog/setCurrentCatalog, listCatalogs/listDatabases/listTables/listColumns/listFunctions, databaseExists/tableExists/functionExists, getDatabase/getTable/getFunction, dropTempView/dropGlobalTempView, cacheTable/uncacheTable/clearCache/isCached, refreshTable/refreshByPath, recoverPartitions, createTable/createExternalTable)
spark.udf.registerJavaFunction(name, className, returnType?) and spark.udf.registerJavaUDAF(name, className) for binding Java UDFs and UDAFs already on the server’s classpath to a SQL function name
SparkSession.version() returns the server’s Spark version
SparkSession.builder().sessionId(uuid) to reuse a server-side session by ID
RuntimeConfig on spark.conf with get, set, unset, getAll, isModifiable
Session tags and interrupts: addTag, removeTag, getTags, clearTags, interruptAll, interruptTag, interruptOperation
Transport interface gains optional config and interrupt methods; ExecuteOptions plumbs per-call tags
SparkConnectError exposes errorClass, sqlState, messageParameters, errorTypeHierarchy, and serverStackTrace
Fix count("*") to send count(1) on the wire instead of count(<unresolved-*>), matching PySpark and Scala behavior

@spark-connect-js/node

Full sc:// connection-string grammar parsed: TLS via use_ssl=true, bearer token, user_id, user_agent, session_id (UUID), grpc_max_message_size, plus arbitrary key=value pairs that pass through as gRPC metadata on every RPC
Bearer token attached as authorization: Bearer <token> via combineChannelCredentials(createSsl(), createFromMetadataGenerator(...))
Canonical user_agent suffix: <your prefix> spark-connect-js/<ver> (node <ver>; <platform>)
Per-request operation IDs (UUIDv4) on every ExecutePlan request
ReattachExecute iterator resumes server-streaming responses after transient gRPC drops (UNAVAILABLE, INTERNAL with INVALID_CURSOR.DISCONNECTED) without re-executing the plan
Configurable retry policy via GrpcTransportOptions.retryPolicy; default mirrors PySpark (maxRetries=15, initialBackoffMs=50, maxBackoffMs=60_000, backoffMultiplier=4, jitterMs=500)
Error trailers: decode grpc-status-details-bin (google.rpc.Status + ErrorInfo) to populate errorClass, sqlState, messageParameters on SparkConnectError, with fallback to a FetchErrorDetails RPC for errorTypeHierarchy and serverStackTrace when the inline trailer is incomplete
client_observed_server_side_session_id captured from every response and echoed back on subsequent RPCs for stale-session detection; cleared on ReleaseSession
Config and Interrupt RPCs wired (consumed by spark.conf and interrupt* on core)
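The connection-string grammar above can be decomposed roughly as in the sketch below. It assumes the host-then-parameters layout used by Spark Connect clients (sc://host[:port]/;key=value;...), Spark Connect's default port 15002, and the key name token for the bearer token. parseScUrl and ParsedConnection are hypothetical names, not the package's parseConnectionString. The sketch also rejects non-sc:// schemes and userinfo in the host, as 0.5.0 describes, and leaves out bracketed IPv6 hosts and value percent-decoding for brevity.

```typescript
// Hypothetical sketch of the sc:// grammar; not the package's parseConnectionString.
interface ParsedConnection {
  host: string;
  port: number;
  useSsl: boolean;
  token?: string;
  userId?: string;
  userAgent?: string;
  sessionId?: string;
  grpcMaxMessageSize?: number;
  metadata: Record<string, string>; // unrecognized keys, sent as gRPC metadata
}

const DEFAULT_PORT = 15002; // Spark Connect's default server port
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function parseScUrl(url: string): ParsedConnection {
  if (!url.startsWith("sc://")) {
    throw new Error(`unsupported scheme in "${url}": expected sc://`);
  }
  const rest = url.slice("sc://".length);
  const slash = rest.indexOf("/");
  const authority = slash === -1 ? rest : rest.slice(0, slash);
  const paramPart = slash === -1 ? "" : rest.slice(slash + 1);
  if (authority.includes("@")) {
    throw new Error(`userinfo is not allowed in host "${authority}"`);
  }
  const colon = authority.indexOf(":");
  const host = colon === -1 ? authority : authority.slice(0, colon);
  const port = colon === -1 ? DEFAULT_PORT : Number(authority.slice(colon + 1));
  if (host === "") throw new Error(`missing host in "${url}"`);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`invalid port in "${authority}"`);
  }

  const result: ParsedConnection = { host, port, useSsl: false, metadata: {} };
  // Parameters follow the path as ";key=value" segments.
  for (const pair of paramPart.split(";")) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    if (eq <= 0) throw new Error(`malformed parameter "${pair}"`);
    const key = pair.slice(0, eq);
    const value = pair.slice(eq + 1);
    switch (key) {
      case "use_ssl": result.useSsl = value === "true"; break;
      case "token": result.token = value; break;
      case "user_id": result.userId = value; break;
      case "user_agent": result.userAgent = value; break;
      case "session_id":
        if (!UUID_RE.test(value)) throw new Error(`session_id must be a UUID: "${value}"`);
        result.sessionId = value;
        break;
      case "grpc_max_message_size": {
        const size = Number(value);
        if (!Number.isInteger(size) || size <= 0) {
          throw new Error(`invalid grpc_max_message_size "${value}"`);
        }
        result.grpcMaxMessageSize = size;
        break;
      }
      default: result.metadata[key] = value; // pass-through metadata
    }
  }
  return result;
}

const conn = parseScUrl("sc://spark.example.com:443/;use_ssl=true;token=abc;x-team=data");
console.log(conn.host, conn.port, conn.useSsl, conn.metadata);
```

Keeping known keys as typed fields and routing everything else into a metadata map mirrors the pass-through behavior the entry describes.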

@spark-connect-js/connect

Vendored google.rpc.Status and google.rpc.ErrorInfo proto definitions, plus regenerated bindings for FetchErrorDetailsRequest/Response, consumed by @spark-connect-js/node for error-trailer decoding

0.3.0

29 March 2026 · GitHub release

References: #40 · d468479

@spark-connect-js/core

DataFrameReader shortcuts: csv(), json(), parquet(), orc(), text(), schema()
DataFrameWriter shortcuts: csv(), json(), parquet(), orc(), text(), bucketBy(), insertInto()
DataFrameWriterV2 with full writeTo() API: create, replace, createOrReplace, append, overwrite, overwritePartitions
Typed client error hierarchy: SparkClientError, InvalidConfigError, InvalidInputError, UnsupportedOperationError
isDistinct propagation on aggregate functions
Cross join validation rejects join conditions

@spark-connect-js/node

Proto serialization for WriteOperationV2 command
SparkProcessManager throw sites reclassified to SparkClientError
Re-exported typed client errors from core

@spark-connect-js/connect

Re-exported proto schemas: WriteOperationV2Schema, WriteOperationV2_ModeSchema

0.2.0

15 March 2026 · GitHub release

References: #18 · 924ea50

@spark-connect-js/core

DataFrame.cube(), .rollup() for multi-dimensional aggregation
DataFrame.unpivot() / .melt() for wide-to-long reshaping
DataFrame.summary() for descriptive statistics
DataFrame.replace() for value substitution via NAReplace
DataFrame.randomSplit() for splitting into multiple DataFrames
DataFrame.createTempView(), .createGlobalTempView(), .createOrReplaceGlobalTempView()
DataFrame.sameSemantics() and .semanticHash() for plan comparison
DataFrameStat class (.stat accessor) with corr(), cov(), crosstab(), freqItems(), approxQuantile()
GroupedData.pivot() support with cube/rollup/pivot group types

@spark-connect-js/node

Proto serialization for StatSummary, NAReplace, Unpivot, StatCorr, StatCov, StatCrosstab, StatFreqItems, StatApproxQuantile, and Aggregate_Pivot
Analyze-plan request/response handling for sameSemantics and semanticHash
Re-exported DataFrameStat from package index

@spark-connect-js/connect

Re-exported proto schemas: StatSummarySchema, NAReplaceSchema, NAReplace_ReplacementSchema, StatCorrSchema, StatCovSchema, StatCrosstabSchema, StatFreqItemsSchema, StatApproxQuantileSchema, UnpivotSchema, Unpivot_ValuesSchema, Aggregate_PivotSchema
Re-exported analyze-plan schemas for SameSemantics and SemanticHash

0.1.0

9 March 2026 · GitHub release

References: #10 · 895f389

@spark-connect-js/core

Initial release. Platform-agnostic DataFrame API and logical plan builder with zero runtime dependencies.
SparkSession: connect via sc:// URL, execute SQL, read tables, create DataFrames from local data
DataFrame: 30+ transformations (select, filter, join, groupBy, sort, union, intersect, sample, fillna, dropna, and more), actions (collect, show, count, head, tail, toLocalIterator), properties (schema, columns, dtypes, isEmpty, printSchema, explain)
Column: comparisons, arithmetic, logical ops, cast, alias, null checks, pattern matching, bitwise ops, window support
GroupedData: agg, count, sum, avg, mean, min, max
Window: partitionBy, orderBy, rowsBetween, rangeBetween
DataFrameReader: format, option, options, load, table
DataFrameWriter: format, mode, option, options, partitionBy, sortBy, save, saveAsTable
Catalog: currentDatabase, setCurrentDatabase, listDatabases, listTables, listColumns, databaseExists, tableExists
248 built-in functions across 12 categories: aggregate, math, string, date/timestamp, window, collection, conditional, hash, JSON, CSV, bitwise, sort
PlanBuilder: constructs Spark Connect logical plan protobuf messages from the DataFrame API
Zero runtime dependencies

@spark-connect-js/node

Initial release. Node.js runtime adapter for Spark Connect with gRPC transport, Arrow decoding, and convenience re-exports of the full core API.
GrpcTransport: connects to Spark Connect over gRPC, streams ExecutePlan responses, handles metadata and session management
ArrowDecoder: deserializes Arrow IPC batches into JavaScript row objects
SparkProcessManager: launches and manages local spark-connect server processes for development
buildRelation / buildExpression: serializes logical plan nodes and expressions to protobuf wire format
Re-exports the entire @spark-connect-js/core public API (SparkSession, DataFrame, Column, functions, etc.) for single-package convenience

@spark-connect-js/connect

Initial release. Generated TypeScript types and service stubs from the Spark Connect protobuf definitions.
Protobuf types: Plan, Relation, Expression, DataType, and all nested message types
Service stubs: ExecutePlanRequest/Response, AnalyzePlanRequest/Response, ConfigRequest/Response, AddArtifactsRequest/Response, ArtifactStatusesRequest/Response
Schema objects: StructType, StructField, MapType, ArrayType, and all Spark data type descriptors
Single runtime dependency: @bufbuild/protobuf