Overview

spark-connect-js lets Node.js applications drive Apache Spark over Spark Connect, without a JVM in the same process.

Packages

@spark-connect-js/core: the platform-agnostic DataFrame API and plan builder. Zero runtime dependencies. Install this directly only when writing your own runtime adapter.
@spark-connect-js/node: the Node.js runtime. Bundles core with a gRPC transport (@grpc/grpc-js) and an Arrow IPC decoder (apache-arrow). The only package most applications need.
@spark-connect-js/connect: generated protobuf types. Pulled in transitively as an implementation detail.

Other runtimes are covered on the Integrations page.

Install

npm install @spark-connect-js/node

pnpm add @spark-connect-js/node

yarn add @spark-connect-js/node

bun add @spark-connect-js/node

The client requires Node.js 22 or later and a reachable Spark Connect server running Spark 3.4 or newer. The full matrix is on the compatibility page.
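
If an application should fail fast on an unsupported runtime, the Node.js requirement can be checked at startup. The following is a minimal sketch, not part of spark-connect-js: meetsMinimumNode and MIN_NODE_MAJOR are hypothetical names chosen for illustration.

```typescript
// Hypothetical helper, not part of spark-connect-js.
const MIN_NODE_MAJOR = 22;

// Returns true when a Node.js version string such as "22.3.0" or "v22.3.0"
// has a major version of at least minMajor.
function meetsMinimumNode(version: string, minMajor: number = MIN_NODE_MAJOR): boolean {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0] ?? "", 10);
  // An unparseable version yields NaN, which is treated as unsupported.
  return Number.isInteger(major) && major >= minMajor;
}
```

At startup, pass process.versions.node to the helper and exit with a clear message when it returns false.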

A first query

import { connect } from "@spark-connect-js/node";

const spark = connect("sc://localhost:15002");

const rows = await spark.sql("SELECT 1 AS n").collect();
console.log(rows); // [ { n: 1 } ]

await spark.stop();
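
The sc:// URL passed to connect above follows the Spark Connect connection-string shape documented by Apache Spark: sc://host:port/;key=value;key=value, with 15002 as the default port. The sketch below only illustrates that shape. parseConnectUrl and ConnectTarget are hypothetical names, this is not how the library parses URLs, and which parameters spark-connect-js honours is not stated on this page.

```typescript
// Illustrative only: hypothetical helper, not the library's parser.
interface ConnectTarget {
  host: string;
  port: number;
  params: Record<string, string>;
}

// Default Spark Connect port, per the upstream connection-string documentation.
const DEFAULT_PORT = 15002;

function parseConnectUrl(url: string): ConnectTarget {
  // sc://host[:port][/[;key=value[;key=value...]]]
  // Bracketed IPv6 hosts are not handled in this sketch.
  const match = /^sc:\/\/([^:\/;]+)(?::(\d+))?(?:\/(?:;(.*))?)?$/.exec(url);
  if (match === null) {
    throw new Error(`Not a Spark Connect URL: ${url}`);
  }
  const [, host, port, rawParams] = match;

  const params: Record<string, string> = {};
  for (const pair of (rawParams ?? "").split(";")) {
    if (pair === "") continue; // tolerate empty segments and trailing ";"
    const eq = pair.indexOf("=");
    if (eq <= 0) {
      throw new Error(`Malformed parameter: ${pair}`);
    }
    params[pair.slice(0, eq)] = pair.slice(eq + 1);
  }

  return { host, port: port === undefined ? DEFAULT_PORT : Number(port), params };
}
```

For example, parseConnectUrl("sc://spark.example.com/;use_ssl=true") yields the host spark.example.com, the default port, and a single use_ssl parameter.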

connect(url) returns a SparkSession. DataFrame transformations are lazy; the plan is sent to the server only when an action runs, such as collect, count, show, or a DataFrameWriter save method.

import { col, sum } from "@spark-connect-js/node";

const df = spark.read
  .parquet("s3a://events/2026/")
  .filter(col("status").eq("active"))
  .groupBy("region")
  .agg(sum("revenue").alias("revenue"));

await df.show();
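
To make the lazy behaviour concrete, here is a toy model of deferred evaluation in plain TypeScript. It is a sketch only: LazyFrame is a hypothetical class and does not reflect the library's internals, it runs in memory rather than against a Spark server, and its actions are synchronous where the real ones return promises. Transformations append a step to the plan, and the source is read only when an action is called.

```typescript
// Toy model of lazy evaluation. Hypothetical class, not spark-connect-js code.
type Row = Record<string, unknown>;
type Step = (rows: Row[]) => Row[];

class LazyFrame {
  private readonly source: () => Row[];
  private readonly steps: readonly Step[];

  constructor(source: () => Row[], steps: readonly Step[] = []) {
    this.source = source;
    this.steps = steps;
  }

  // Transformations return a new frame with one more step; nothing executes.
  filter(predicate: (row: Row) => boolean): LazyFrame {
    return new LazyFrame(this.source, [...this.steps, (rows) => rows.filter(predicate)]);
  }

  limit(n: number): LazyFrame {
    return new LazyFrame(this.source, [...this.steps, (rows) => rows.slice(0, n)]);
  }

  // Actions read the source and apply every recorded step in order.
  collect(): Row[] {
    return this.steps.reduce((rows, step) => step(rows), this.source());
  }

  count(): number {
    return this.collect().length;
  }
}

// Count how often the source is actually read.
let reads = 0;
const events = new LazyFrame(() => {
  reads += 1;
  return [
    { region: "eu", status: "active" },
    { region: "us", status: "inactive" },
    { region: "us", status: "active" },
  ];
});

// Building the plan does not touch the source: reads is still 0 here.
const active = events.filter((row) => row.status === "active");

// The action triggers exactly one read and returns the two active rows.
const rows = active.collect();
```

In the real client the same split applies, except that an action serializes the plan and sends it to the server instead of running it locally.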

Batch and streaming share the same DataFrame surface. Using spark.readStream and df.writeStream in place of their batch counterparts runs the plan as a continuous query instead of a one-off job. See Structured Streaming.

The Quickstart covers starting a local Connect server and a longer example. Runnable scripts live in examples/.

Next steps

Quickstart: Start a local Spark Connect server and run your first query.

SQL and DataFrame guide: Transformations, actions, and lazy evaluation.

Structured Streaming: Continuous queries, watermarks, and event-time windows.

API reference: Generated from source.