Overview
spark-connect-js lets Node.js applications drive Apache Spark over Spark Connect, without a JVM in the same process.
Packages
@spark-connect-js/core: the platform-agnostic DataFrame API and plan builder. Zero runtime dependencies. Install this directly only when writing your own runtime adapter.
@spark-connect-js/node: the Node.js runtime. Bundles core with a gRPC transport (@grpc/grpc-js) and an Arrow IPC decoder (apache-arrow). The only package most applications need.
@spark-connect-js/connect: generated protobuf types. Pulled in transitively as an implementation detail.
Other runtimes are covered on the Integrations page.
Install
    npm install @spark-connect-js/node
    pnpm add @spark-connect-js/node
    yarn add @spark-connect-js/node
    bun add @spark-connect-js/node

You need Node.js 22 or later on the client and a reachable Spark server, version 3.4 or newer. The full matrix is on the compatibility page.
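If you want to fail early on an unsupported client, the Node.js requirement above can be checked at startup. The following is a minimal sketch only: nodeMajor and meetsMinimumNode are hypothetical helper names, not part of spark-connect-js, and the threshold of 22 is taken from the requirement stated above.

```typescript
// Hypothetical helpers for illustration; spark-connect-js does not export these.
const MIN_NODE_MAJOR = 22; // minimum client version stated on this page

// Extracts the major version from strings such as "22.11.0" or "v22.11.0".
function nodeMajor(version: string): number {
  const firstSegment = version.replace(/^v/, "").split(".")[0];
  const major = Number.parseInt(firstSegment, 10);
  if (Number.isNaN(major)) {
    throw new Error(`Unrecognised Node.js version: ${version}`);
  }
  return major;
}

// Returns true when the given version satisfies the minimum major version.
function meetsMinimumNode(
  version: string,
  minMajor: number = MIN_NODE_MAJOR,
): boolean {
  return nodeMajor(version) >= minMajor;
}
```

In an application you would typically pass process.versions.node to meetsMinimumNode before calling connect, and print a clear message if it returns false.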
A first query
    import { connect } from "@spark-connect-js/node";

    const spark = connect("sc://localhost:15002");

    const rows = await spark.sql("SELECT 1 AS n").collect();
    console.log(rows); // [ { n: 1 } ]

    await spark.stop();

connect(url) returns a SparkSession. DataFrame methods are lazy; the plan leaves the client only on an action such as collect, count, show, or a DataFrameWriter save method.
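To make the lazy behaviour concrete, here is a toy in-memory model of the idea. It is a sketch under stated assumptions, not the spark-connect-js implementation: LazyFrame, Step, and the executions counter are hypothetical names, the rows live in a local array instead of on a Spark server, and collect is synchronous here for brevity. Transformations only append to a recorded plan; nothing is evaluated until an action runs.

```typescript
// Toy model of lazy evaluation. All names are hypothetical; this is not
// how spark-connect-js is implemented, and no Spark server is involved.
type Row = Record<string, unknown>;
type Step =
  | { op: "filter"; predicate: (row: Row) => boolean }
  | { op: "select"; columns: string[] };

class LazyFrame {
  // Counts how many times any plan has actually been executed.
  static executions = 0;

  private readonly source: readonly Row[];
  private readonly plan: readonly Step[];

  constructor(source: readonly Row[], plan: readonly Step[] = []) {
    this.source = source;
    this.plan = plan;
  }

  // Transformations return a new frame with a longer plan; no rows are read.
  filter(predicate: (row: Row) => boolean): LazyFrame {
    return new LazyFrame(this.source, [...this.plan, { op: "filter", predicate }]);
  }

  select(...columns: string[]): LazyFrame {
    return new LazyFrame(this.source, [...this.plan, { op: "select", columns }]);
  }

  // Actions walk the recorded plan and produce a result.
  collect(): Row[] {
    LazyFrame.executions += 1;
    let rows: Row[] = [...this.source];
    for (const step of this.plan) {
      if (step.op === "filter") {
        rows = rows.filter(step.predicate);
      } else {
        rows = rows.map((row) => {
          const projected: Row = {};
          for (const column of step.columns) {
            projected[column] = row[column];
          }
          return projected;
        });
      }
    }
    return rows;
  }

  count(): number {
    return this.collect().length;
  }
}

const events = new LazyFrame([
  { region: "eu", status: "active", revenue: 10 },
  { region: "us", status: "closed", revenue: 7 },
  { region: "us", status: "active", revenue: 5 },
]);

// Building the plan does no work.
const active = events
  .filter((row) => row.status === "active")
  .select("region", "revenue");
const executionsBeforeAction = LazyFrame.executions; // 0

// The action is what triggers execution.
const activeRows = active.collect();
```

In the real client the recorded plan would be sent to the server when the action runs, which is why chaining many transformations costs nothing until the first collect, count, show, or save.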
    import { col, lit, sum } from "@spark-connect-js/node";

    const df = spark.read
      .parquet("s3a://events/2026/")
      .filter(col("status").eq(lit("active")))
      .groupBy("region")
      .agg(sum("revenue").alias("revenue"));

    await df.show();

The Quickstart covers starting a local Connect server and a longer example. Runnable scripts live in examples/.
Next steps
Quickstart: Start a local Spark Connect server and run your first query.
SQL and DataFrame guide: Transformations, actions, and lazy evaluation.
API reference: Generated from source.
Architecture: What happens on the wire.