Changelog

The three packages are versioned and shipped together. Per-package changelogs live alongside the source.

GitHub release · Latest

References: #61 · aa22cc6

@spark-connect-js/core

npm 0.4.0 github 0.4.0

  • Catalog parity with PySpark: the full spark.catalog surface (currentCatalog/setCurrentCatalog, listCatalogs/listDatabases/listTables/listColumns/listFunctions, databaseExists/tableExists/functionExists, getDatabase/getTable/getFunction, dropTempView/dropGlobalTempView, cacheTable/uncacheTable/clearCache/isCached, refreshTable/refreshByPath, recoverPartitions, createTable/createExternalTable)
  • spark.udf.registerJavaFunction(name, className, returnType?) and spark.udf.registerJavaUDAF(name, className) for binding Java UDFs and UDAFs already on the server’s classpath to a SQL function name
  • SparkSession.version() returns the server’s Spark version
  • SparkSession.builder().sessionId(uuid) to reuse a server-side session by ID
  • RuntimeConfig on spark.conf with get, set, unset, getAll, isModifiable
  • Session tags and interrupts: addTag, removeTag, getTags, clearTags, interruptAll, interruptTag, interruptOperation
  • Transport interface gains optional config and interrupt methods; ExecuteOptions plumbs per-call tags
  • SparkConnectError exposes errorClass, sqlState, messageParameters, errorTypeHierarchy, and serverStackTrace
  • Fix count("*") to send count(1) on the wire instead of count(<unresolved-*>), matching PySpark and Scala behavior
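
As an illustration of how the structured fields on SparkConnectError might be consumed, here is a minimal sketch. It does not import the package: `SparkConnectErrorLike` and `summarizeError` are hypothetical local names, the field names come from the entry above, and the field types, their optionality, and the sample values are assumptions.

```typescript
// Hypothetical stand-in for SparkConnectError. Field names come from the
// changelog entry; the types and optionality here are assumptions.
interface SparkConnectErrorLike {
  message: string;
  errorClass?: string;
  sqlState?: string;
  messageParameters?: Record<string, string>;
  errorTypeHierarchy?: string[];
  serverStackTrace?: string;
}

// Build a one-line summary, preferring structured fields over the raw message.
function summarizeError(err: SparkConnectErrorLike): string {
  const parts: string[] = [];
  if (err.errorClass) parts.push(`[${err.errorClass}]`);
  if (err.sqlState) parts.push(`SQLSTATE ${err.sqlState}`);
  parts.push(err.message);
  const parameters = err.messageParameters ?? {};
  const rendered = Object.keys(parameters)
    .map((key) => `${key}=${parameters[key]}`)
    .join(", ");
  if (rendered) parts.push(`(${rendered})`);
  return parts.join(" ");
}

// Illustrative values only; not captured from a real server.
const example: SparkConnectErrorLike = {
  message: "The table or view cannot be found.",
  errorClass: "TABLE_OR_VIEW_NOT_FOUND",
  sqlState: "42P01",
  messageParameters: { relationName: "missing_table" },
};

console.log(summarizeError(example));
```

Branching on `errorClass` or `sqlState` rather than matching on the message text is the usual reason to expose such fields, since message wording can change between server versions.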

@spark-connect-js/node

npm 0.4.0 github 0.4.0

  • Full sc:// connection-string grammar parsed: TLS via use_ssl=true, bearer token, user_id, user_agent, session_id (UUID), grpc_max_message_size, plus arbitrary key=value pairs that pass through as gRPC metadata on every RPC
  • Bearer token attached as authorization: Bearer <token> via combineChannelCredentials(createSsl(), createFromMetadataGenerator(...))
  • Canonical user_agent suffix: <your prefix> spark-connect-js/<ver> (node <ver>; <platform>)
  • Per-request operation IDs (UUIDv4) on every ExecutePlan request
  • ReattachExecute iterator resumes server-streaming responses after transient gRPC drops (UNAVAILABLE, INTERNAL with INVALID_CURSOR.DISCONNECTED) without re-executing the plan
  • Configurable retry policy via GrpcTransportOptions.retryPolicy; default mirrors PySpark (maxRetries=15, initialBackoffMs=50, maxBackoffMs=60_000, backoffMultiplier=4, jitterMs=500)
  • Error trailers: decode grpc-status-details-bin (google.rpc.Status + ErrorInfo) to populate errorClass, sqlState, messageParameters on SparkConnectError, with fallback to a FetchErrorDetails RPC for errorTypeHierarchy and serverStackTrace when the inline trailer is incomplete
  • client_observed_server_side_session_id captured from every response and echoed back on subsequent RPCs for stale-session detection; cleared on ReleaseSession
  • Config and Interrupt RPCs wired (consumed by spark.conf and interrupt* on core)
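
To make the default retry numbers above concrete, the following sketch computes a backoff schedule from them. The field names and default values are taken from the entry above; how the package actually combines them is not stated here, so the formula (exponential growth, capped at maxBackoffMs, plus uniform jitter) is an assumption, and `RetryPolicy`, `defaultPolicy`, and `backoffDelayMs` are hypothetical local names.

```typescript
// Field names and default values are taken from the changelog entry.
interface RetryPolicy {
  maxRetries: number;
  initialBackoffMs: number;
  maxBackoffMs: number;
  backoffMultiplier: number;
  jitterMs: number;
}

const defaultPolicy: RetryPolicy = {
  maxRetries: 15,
  initialBackoffMs: 50,
  maxBackoffMs: 60_000,
  backoffMultiplier: 4,
  jitterMs: 500,
};

// Delay before retry number `attempt` (0-based). Assumption: exponential
// growth capped at maxBackoffMs, plus uniform jitter in [0, jitterMs).
function backoffDelayMs(
  policy: RetryPolicy,
  attempt: number,
  random: () => number = Math.random,
): number {
  const exponential =
    policy.initialBackoffMs * policy.backoffMultiplier ** attempt;
  const capped = Math.min(exponential, policy.maxBackoffMs);
  return capped + random() * policy.jitterMs;
}

// Print the schedule with jitter disabled to show where the cap takes effect.
for (let attempt = 0; attempt < defaultPolicy.maxRetries; attempt++) {
  console.log(attempt, backoffDelayMs(defaultPolicy, attempt, () => 0));
}
```

Under these assumptions and with jitter disabled, the delays run 50, 200, 800, 3200, 12800, and 51200 ms, then stay at 60000 ms from the seventh retry onward.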

@spark-connect-js/connect

npm 0.4.0 github 0.4.0

  • Vendored google.rpc.Status and google.rpc.ErrorInfo proto definitions, plus regenerated bindings for FetchErrorDetailsRequest/Response, consumed by @spark-connect-js/node for error-trailer decoding

GitHub release

References: #40 · d468479

@spark-connect-js/core

npm 0.3.0 github 0.3.0

  • DataFrameReader shortcuts: csv(), json(), parquet(), orc(), text(), schema()
  • DataFrameWriter shortcuts: csv(), json(), parquet(), orc(), text(), bucketBy(), insertInto()
  • DataFrameWriterV2 with full writeTo() API: create, replace, createOrReplace, append, overwrite, overwritePartitions
  • Typed client error hierarchy: SparkClientError, InvalidConfigError, InvalidInputError, UnsupportedOperationError
  • isDistinct propagation on aggregate functions
  • Cross join validation rejects join conditions
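
The typed error hierarchy and the cross join check above can be pictured together with a small sketch. The class names come from the entries above, but these are local stand-ins rather than the package's exports: the constructor shape, the `validateJoin` helper, the `JoinType` union, and the choice of InvalidInputError for this case are all assumptions.

```typescript
// Local stand-ins named after the classes in the changelog entry; the
// constructor shape and inheritance details are assumptions.
class SparkClientError extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name;
    // Keep instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}
class InvalidConfigError extends SparkClientError {}
class InvalidInputError extends SparkClientError {}
class UnsupportedOperationError extends SparkClientError {}

type JoinType = "inner" | "left" | "right" | "full" | "cross";

// Hypothetical validator: a cross join pairs every left row with every
// right row, so a join condition is rejected before any plan is built.
function validateJoin(joinType: JoinType, condition?: string): void {
  if (joinType === "cross" && condition !== undefined) {
    throw new InvalidInputError("cross join does not accept a join condition");
  }
}

try {
  validateJoin("cross", "a.id = b.id");
} catch (err) {
  // One base class lets callers separate client-side mistakes from
  // anything else that might be thrown.
  if (err instanceof SparkClientError) {
    console.log(`${err.name}: ${err.message}`);
  } else {
    throw err;
  }
}
```

Failing on the client side means the mistake surfaces before a request is sent, instead of as a server analysis error.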

@spark-connect-js/node

npm 0.3.0 github 0.3.0

  • Proto serialization for WriteOperationV2 command
  • SparkProcessManager throw sites reclassified to SparkClientError
  • Re-exported typed client errors from core

@spark-connect-js/connect

npm 0.3.0 github 0.3.0

  • Re-exported proto schemas: WriteOperationV2Schema, WriteOperationV2_ModeSchema

GitHub release

References: #18 · 924ea50

@spark-connect-js/core

npm 0.2.0 github 0.2.0

  • DataFrame.cube() and .rollup() for multi-dimensional aggregation
  • DataFrame.unpivot() / .melt() for wide-to-long reshaping
  • DataFrame.summary() for descriptive statistics
  • DataFrame.replace() for value substitution via NAReplace
  • DataFrame.randomSplit() for splitting into multiple DataFrames
  • DataFrame.createTempView(), .createGlobalTempView(), .createOrReplaceGlobalTempView()
  • DataFrame.sameSemantics() and .semanticHash() for plan comparison
  • DataFrameStat class (.stat accessor) with corr(), cov(), crosstab(), freqItems(), approxQuantile()
  • GroupedData.pivot() support with cube/rollup/pivot group types
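
For readers unfamiliar with the wide-to-long reshaping that unpivot() / melt() refers to, here is a plain in-memory illustration. It is not how the package builds its Unpivot plan and does not call the DataFrame API; `melt`, its parameters, and the sample data are hypothetical.

```typescript
type Row = Record<string, unknown>;

// Turn each value column of a wide row into its own (variable, value) row,
// carrying the id columns along unchanged.
function melt(
  rows: Row[],
  idColumns: string[],
  valueColumns: string[],
  variableName = "variable",
  valueName = "value",
): Row[] {
  const out: Row[] = [];
  for (const row of rows) {
    for (const column of valueColumns) {
      const longRow: Row = {};
      for (const id of idColumns) longRow[id] = row[id];
      longRow[variableName] = column;
      longRow[valueName] = row[column];
      out.push(longRow);
    }
  }
  return out;
}

// Two wide rows with one id column and two value columns.
const wide: Row[] = [
  { city: "Oslo", q1: 10, q2: 12 },
  { city: "Lima", q1: 7, q2: 9 },
];

console.log(melt(wide, ["city"], ["q1", "q2"], "quarter", "sales"));
```

Each input row yields one output row per value column, so the two wide rows here become four long rows keyed by city and quarter.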

@spark-connect-js/node

npm 0.2.0 github 0.2.0

  • Proto serialization for StatSummary, NAReplace, Unpivot, StatCorr, StatCov, StatCrosstab, StatFreqItems, StatApproxQuantile, and Aggregate_Pivot
  • Analyze-plan request/response handling for sameSemantics and semanticHash
  • Re-exported DataFrameStat from package index

@spark-connect-js/connect

npm 0.2.0 github 0.2.0

  • Re-exported proto schemas: StatSummarySchema, NAReplaceSchema, NAReplace_ReplacementSchema, StatCorrSchema, StatCovSchema, StatCrosstabSchema, StatFreqItemsSchema, StatApproxQuantileSchema, UnpivotSchema, Unpivot_ValuesSchema, Aggregate_PivotSchema
  • Re-exported analyze-plan schemas for SameSemantics and SemanticHash

GitHub release

References: #10 · 895f389

@spark-connect-js/core

npm 0.1.0 github 0.1.0

  • SparkSession: connect via sc:// URL, execute SQL, read tables, create DataFrames from local data
  • DataFrame: 30+ transformations (select, filter, join, groupBy, sort, union, intersect, sample, fillna, dropna, and more), actions (collect, show, count, head, tail, toLocalIterator), properties (schema, columns, dtypes, isEmpty, printSchema, explain)
  • Column: comparisons, arithmetic, logical ops, cast, alias, null checks, pattern matching, bitwise ops, window support
  • GroupedData: agg, count, sum, avg, mean, min, max
  • Window: partitionBy, orderBy, rowsBetween, rangeBetween
  • DataFrameReader: format, option, options, load, table
  • DataFrameWriter: format, mode, option, options, partitionBy, sortBy, save, saveAsTable
  • Catalog: currentDatabase, setCurrentDatabase, listDatabases, listTables, listColumns, databaseExists, tableExists
  • 248 built-in functions across 12 categories: aggregate, math, string, date/timestamp, window, collection, conditional, hash, JSON, CSV, bitwise, sort
  • PlanBuilder: constructs Spark Connect logical plan protobuf messages from the DataFrame API
  • Zero runtime dependencies
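
The sc:// URL mentioned above follows the Spark Connect connection-string shape `sc://host:port/;key=value;key=value`, with 15002 as the default port. The sketch below splits such a string into its parts. It is not the package's parser: `parseConnectionString` and `ParsedConnection` are hypothetical names, IPv6 hosts are not handled, and URL-decoding of values is an assumption.

```typescript
interface ParsedConnection {
  host: string;
  port: number;
  params: Record<string, string>;
}

// Default Spark Connect port, used when the URL omits one.
const DEFAULT_PORT = 15002;

function parseConnectionString(url: string): ParsedConnection {
  const prefix = "sc://";
  if (url.slice(0, prefix.length) !== prefix) {
    throw new Error(`expected a URL starting with ${prefix}`);
  }
  // Everything before the first "/" is host[:port]; the rest holds
  // the ";key=value" parameters.
  const rest = url.slice(prefix.length);
  const slash = rest.indexOf("/");
  const authority = slash === -1 ? rest : rest.slice(0, slash);
  const paramText = slash === -1 ? "" : rest.slice(slash + 1);

  const colon = authority.lastIndexOf(":");
  const host = colon === -1 ? authority : authority.slice(0, colon);
  const port =
    colon === -1 ? DEFAULT_PORT : Number(authority.slice(colon + 1));
  if (host === "" || !(port > 0) || Math.floor(port) !== port) {
    throw new Error(`invalid host or port in ${authority}`);
  }

  const params: Record<string, string> = {};
  for (const pair of paramText.split(";")) {
    if (pair === "") continue; // tolerate the leading and trailing ";"
    const eq = pair.indexOf("=");
    if (eq <= 0) throw new Error(`malformed parameter: ${pair}`);
    // Assumption: values may be URL-encoded.
    params[pair.slice(0, eq)] = decodeURIComponent(pair.slice(eq + 1));
  }
  return { host, port, params };
}

const parsed = parseConnectionString(
  "sc://localhost:15002/;use_ssl=true;token=abc123;user_id=alice",
);
console.log(parsed);
```

Keeping unrecognized keys in `params` instead of rejecting them leaves room for a transport to forward them, for example as request metadata.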

@spark-connect-js/node

npm 0.1.0 github 0.1.0

  • GrpcTransport: connects to Spark Connect over gRPC, streams ExecutePlan responses, handles metadata and session management
  • ArrowDecoder: deserializes Arrow IPC batches into JavaScript row objects
  • SparkProcessManager: launches and manages local spark-connect server processes for development
  • buildRelation / buildExpression: serializes logical plan nodes and expressions to protobuf wire format
  • Re-exports the entire @spark-connect-js/core public API (SparkSession, DataFrame, Column, functions, etc.) for single-package convenience

@spark-connect-js/connect

npm 0.1.0 github 0.1.0

  • Protobuf types: Plan, Relation, Expression, DataType, and all nested message types
  • Service stubs: ExecutePlanRequest/Response, AnalyzePlanRequest/Response, ConfigRequest/Response, AddArtifactsRequest/Response, ArtifactStatusesRequest/Response
  • Schema objects: StructType, StructField, MapType, ArrayType, and all Spark data type descriptors
  • Single runtime dependency: @bufbuild/protobuf