Overview

spark-connect-js lets Node.js applications drive Apache Spark over Spark Connect, without a JVM in the same process.

Other runtimes are covered on the Integrations page.

npm install @spark-connect-js/node

You need Node.js 22 or later on the client and a reachable Spark 3.4 or newer server. The full matrix is on the compatibility page.
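
The Node.js requirement can be checked at startup, before any connection is attempted. The following is a minimal sketch and not part of spark-connect-js: nodeMajor, meetsMinimum, and requireNode are hypothetical helper names, and the only runtime facility assumed is the built-in process.versions.node string.

```javascript
// Hypothetical helpers (not part of spark-connect-js): fail fast on an old runtime.

// Extract the major version from a string such as "22.3.0".
function nodeMajor(version) {
  return Number.parseInt(version.split(".")[0], 10);
}

// True when `version` has a major version of at least `minMajor`.
function meetsMinimum(version, minMajor) {
  return nodeMajor(version) >= minMajor;
}

// Throw a descriptive error when the runtime is too old.
function requireNode(version, minMajor) {
  if (!meetsMinimum(version, minMajor)) {
    throw new Error(`Node.js ${minMajor} or later is required, found ${version}`);
  }
}

// Typical call at the top of an entry script:
// requireNode(process.versions.node, 22);
```

process.versions.node holds the running version without a leading "v", so it can be passed to requireNode directly.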

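import { connect } from "@spark-connect-js/node";
// Open a session against a Spark Connect server on this machine.
const spark = connect("sc://localhost:15002");
// collect() is an action: it runs the query and returns the rows.
const rows = await spark.sql("SELECT 1 AS n").collect();
console.log(rows); // [ { n: 1 } ]
// Stop the session once the work is done.
await spark.stop();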
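
If the query throws, the stop() call in the snippet above is never reached. One way to guard against that is a try/finally wrapper. This is a sketch under assumptions: withSession is a hypothetical helper, not something the package is known to export, and it relies only on the session having the stop() method shown above.

```javascript
// Hypothetical helper (not part of spark-connect-js): run `work` with a session
// and always stop the session afterwards, even if `work` throws.
async function withSession(openSession, work) {
  // `await` is harmless whether openSession returns a session or a promise of one.
  const spark = await openSession();
  try {
    return await work(spark);
  } finally {
    await spark.stop();
  }
}

// Usage with the calls from the snippet above (needs a running Connect server):
// const rows = await withSession(
//   () => connect("sc://localhost:15002"),
//   (spark) => spark.sql("SELECT 1 AS n").collect(),
// );
```

Passing the session factory as an argument keeps the helper independent of how the session is created, which also makes it easy to exercise with a stub session in tests.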

connect(url) returns a SparkSession. DataFrame transformations such as filter, groupBy, and agg are lazy: they only build a plan on the client. The plan is sent to the server only when you call an action such as collect, count, show, or a DataFrameWriter save method.

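import { col, lit, sum } from "@spark-connect-js/node";
// Assumes spark is an open SparkSession returned by connect().
const df = spark.read
  .parquet("s3a://events/2026/")
  .filter(col("status").eq(lit("active")))
  .groupBy("region")
  .agg(sum("revenue").alias("revenue"));
// show() is an action: this is where the plan is sent to the server.
await df.show();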
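
The lazy model described above can be made concrete with a toy class that runs without a server. This is an illustration only: LazyFrame is a hypothetical, in-memory stand-in and does not reflect how spark-connect-js builds or ships its plans. It shows the general pattern, where transformations record steps and an action executes them.

```javascript
// Toy model of lazy evaluation (hypothetical, NOT the library's implementation).
// Transformations return a new frame with a longer plan; only collect() runs it.
class LazyFrame {
  constructor(rows, plan = []) {
    this.rows = rows;
    this.plan = plan;
  }

  // Transformation: records a step, touches no data.
  filter(predicate) {
    return new LazyFrame(this.rows, [...this.plan, { op: "filter", fn: predicate }]);
  }

  // Transformation: records a step, touches no data.
  select(...names) {
    const pick = (row) => Object.fromEntries(names.map((n) => [n, row[n]]));
    return new LazyFrame(this.rows, [...this.plan, { op: "select", fn: pick }]);
  }

  // Action: executes the recorded plan step by step.
  collect() {
    return this.plan.reduce(
      (rows, step) => (step.op === "filter" ? rows.filter(step.fn) : rows.map(step.fn)),
      this.rows,
    );
  }
}

let predicateCalls = 0;
const events = new LazyFrame([
  { region: "eu", status: "active" },
  { region: "us", status: "closed" },
]);
const active = events
  .filter((row) => {
    predicateCalls += 1;
    return row.status === "active";
  })
  .select("region");

console.log(predicateCalls); // 0: nothing has run yet
console.log(active.collect()); // [ { region: 'eu' } ]
console.log(predicateCalls); // 2: the predicate ran once per row, inside collect()
```

In the toy version the plan is executed in process. With Spark Connect, the corresponding step is sending the plan to the server.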

The Quickstart covers starting a local Connect server and a longer example. Runnable scripts live in examples/.