Examples

The repository’s `examples/` directory contains six self-contained applications. Each has its own `package.json`, a `docker-compose.yml` that runs a local Spark Connect server, and a single-command run script. Clone the repo, `cd` into an example, and follow its README.

`node-quickstart`

Connects, runs a SQL query, prints the rows, and disconnects. It is the minimal working script and a copy-paste starting point for new projects.

Covers: `connect()`, `spark.sql(...)`, `collect()`, `stop()`.
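A minimal sketch of the quickstart flow follows. It takes the client's `connect` function as a parameter so it runs without naming a package; that injection, the `sc://localhost:15002` argument passed to `connect()`, and the `runQuickstart` name are assumptions for illustration, not the example's actual code. The `finally` block is the part worth copying: it calls `stop()` even when the query fails, so a failed run still releases its session.

```javascript
// ASSUMPTION: `connect` is the client's connect() and accepts a Spark Connect
// URL; the returned session has sql() and stop(), and sql() results have collect().
async function runQuickstart(connect, url = 'sc://localhost:15002') {
  const spark = await connect(url); // await works for sync or async connect()
  try {
    const df = await spark.sql("SELECT 1 AS id, 'hello' AS greeting");
    const rows = await df.collect();
    for (const row of rows) console.log(row);
    return rows;
  } finally {
    await spark.stop(); // always release the session, even if the query throws
  }
}
```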

`node-catalog`

Walks through the `spark.catalog` API: inspect catalogs and databases, register a temp view, list its columns, then cache and uncache a table. Mirrors the structure of the Catalog guide.

Covers: `spark.catalog.*`, temp views, `cacheTable`, `isCached`, `dropTempView`.
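The temp-view and caching steps can be sketched as below, assuming a connected `spark` session whose `catalog` exposes the methods listed above and whose `sql()` result has `collect()`. The `catalogTour` helper and the `people` view are hypothetical. The view is registered with plain Spark SQL, and the sketch relies on Spark's documented behavior that dropping a cached temp view also uncaches it.

```javascript
// ASSUMPTION: `spark` is a connected session; sql() returns a DataFrame (or a
// promise of one) with collect(), and spark.catalog exposes cacheTable,
// isCached and dropTempView. `await` handles both values and promises.
async function catalogTour(spark, viewName = 'people') {
  // Register a temp view with Spark SQL; collect() forces execution in case
  // the client defers running statements until an action.
  const ddl = await spark.sql(
    `CREATE OR REPLACE TEMP VIEW ${viewName} AS ` +
      "SELECT * FROM VALUES (1, 'Ada'), (2, 'Linus') AS t(id, name)"
  );
  await ddl.collect();

  await spark.catalog.cacheTable(viewName);
  const cached = await spark.catalog.isCached(viewName);

  // In Spark, dropping a cached temp view also uncaches it.
  const dropped = await spark.catalog.dropTempView(viewName);
  return { cached, dropped };
}
```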

`node-read-write`

End-to-end round trip through the DataFrame reader and writer: build a DataFrame from SQL, write it as CSV, JSON, and Parquet, read each back with an explicit schema, and verify that the results match. Mirrors the I/O guide.

Covers: `spark.read`, `df.write`, save modes, schema-on-read, `partitionBy`.
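Spark does not guarantee row order without an explicit `ORDER BY`, so a file round trip can return rows in a different order than they were written. One way to verify equality is therefore to compare the collected rows as a multiset. The sketch assumes rows are plain objects (for example `{ id: 1, name: 'Ada' }`); `sameRows` and `rowKey` are hypothetical helpers, not part of the client API.

```javascript
// ASSUMPTION: collected rows are plain objects such as { id: 1, name: 'Ada' }.
// Serialize a row canonically: keys sorted, BigInt values rendered with a
// trailing "n" (JSON.stringify would otherwise throw on BigInt).
function rowKey(row) {
  const entries = Object.keys(row).sort().map((key) => [key, row[key]]);
  return JSON.stringify(entries, (_key, value) =>
    typeof value === 'bigint' ? `${value}n` : value
  );
}

// True when both arrays hold the same rows with the same multiplicities,
// ignoring order.
function sameRows(expected, actual) {
  if (expected.length !== actual.length) return false;
  const counts = new Map();
  for (const row of expected) {
    const key = rowKey(row);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  for (const row of actual) {
    const key = rowKey(row);
    const n = counts.get(key);
    if (!n) return false; // row missing, or seen more often than expected
    counts.set(key, n - 1);
  }
  return true;
}
```

Sorting on the server (an `ORDER BY` in the SQL that reads the data back) before collecting is an alternative that avoids the client-side bookkeeping.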

`node-cache-pivot-stats`

A broader workload combining aggregation, pivoting, caching, and window functions. It is closer to a real analytics pipeline and is useful for benchmarking small-to-medium queries.

Covers: `groupBy`, `pivot`, `rollup`, `Window`, `df.cache()`, `unpersist`, statistical functions.

`node-streaming`

Runs a streaming query from the `rate` source to the `memory` sink with a lifecycle listener attached: it starts the query, watches progress events, inspects the query manager, and stops the query. Mirrors the Structured Streaming guide.

Covers: `readStream`, `writeStream`, `Trigger`, `StreamingQuery`, `spark.streams`, `addListener`.

`node-tls-behind-proxy`

Connects to a Spark Connect server over TLS, with TLS terminated at a Caddy reverse proxy in front of `apache/spark:4.0.0`. Demonstrates the production deployment topology, since Spark Connect 4.0 has no native server-side TLS.

Covers: `use_ssl=true`, self-signed certificates via `NODE_EXTRA_CA_CERTS`, Caddy `reverse_proxy h2c://`.
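Spark Connect clients take a connection string of the form `sc://host:port/;param=value;...`, and `use_ssl=true` is the parameter that switches the client to TLS. The helper below only builds that string; `sparkConnectUrl` is a hypothetical name, and the host and port (`localhost`, `443`) are placeholders for whatever address the example's Caddy service publishes.

```javascript
// Build a Spark Connect connection string: sc://host:port/;key=value;...
// Parameters are appended in insertion order; values are not escaped, so
// keep them free of ';' characters.
function sparkConnectUrl(host, port, params = {}) {
  const base = `sc://${host}:${port}/`;
  const pairs = Object.entries(params).map(([key, value]) => `${key}=${value}`);
  return pairs.length > 0 ? `${base};${pairs.join(';')}` : base;
}

// Placeholder host/port: substitute the address the Caddy proxy listens on.
const url = sparkConnectUrl('localhost', 443, { use_ssl: true });
console.log(url); // sc://localhost:443/;use_ssl=true
```

Node.js reads `NODE_EXTRA_CA_CERTS` only when the process starts, so it has to be set in the environment that launches the script (for example `NODE_EXTRA_CA_CERTS=path/to/ca.crt pnpm start`, where the path is a placeholder). Assigning `process.env.NODE_EXTRA_CA_CERTS` from inside the running script has no effect.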

Running an example

Each example has a `docker-compose.yml` that brings up a Spark 4.0 Connect server on `localhost:15002`. From inside an example directory:

```shell
docker compose up -d   # start the Spark Connect server
pnpm install
pnpm start             # or: node src/index.js
docker compose down    # tear down when done
```

The compose files don’t share state across examples; each starts a fresh server. Because every example publishes port 15002, run `docker compose down` in one example before starting another.
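`docker compose up -d` returns once the container has started, which can be before the Connect server is ready to accept sessions, so a script launched immediately afterwards may fail with a connection error. A minimal sketch of one workaround is to retry a trivial `SELECT 1` until it succeeds. As in the other sketches, `connect` is injected and its argument is an assumption; `withRetry` and `connectWhenReady` are hypothetical helpers. Probing the port with a plain TCP connect is less reliable here, because Docker's port forwarding can accept a connection before the server inside the container is listening.

```javascript
// Retry an async operation a fixed number of times with a constant delay;
// rethrows the last error if every attempt fails.
async function withRetry(fn, { attempts = 30, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// ASSUMPTION: connect(url) returns a session (or a promise of one) with
// sql() and stop(), and sql() results have collect().
async function connectWhenReady(connect, url = 'sc://localhost:15002', opts) {
  return withRetry(async () => {
    const spark = await connect(url);
    try {
      const probe = await spark.sql('SELECT 1');
      await probe.collect(); // forces a round trip to the server
      return spark;
    } catch (err) {
      // Release the unusable session before the next attempt.
      try {
        await spark.stop();
      } catch {
        // ignore cleanup failures
      }
      throw err;
    }
  }, opts);
}
```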