Examples
The repository’s examples/ directory contains five self-contained applications. Each has its own package.json, a docker-compose.yml for a local Spark Connect server, and a single-command run script. Clone the repo, cd into one, and follow its README.
Connect, run a SQL query, print the rows, disconnect. The minimum viable script and copy-paste starting point for a new project.
Covers: connect(), spark.sql(...), collect(), stop().
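The flow this example covers can be sketched as follows. This is a minimal illustration, not the example's actual source: the client package name is not given on this page, so connect is passed in as a parameter, and its signature (a connection string in, a session out) is an assumption. runQuery is a hypothetical helper name.

```javascript
// Sketch of the connect -> query -> collect -> stop flow.
// `connect` is injected because the package name is not stated here;
// its signature (connection string in, session out) is an assumption.
async function runQuery(connect, url, query) {
  const spark = await connect(url);
  try {
    // Assumed: sql() yields a DataFrame-like object with collect().
    const df = await spark.sql(query);
    // collect() pulls the result rows back to the client.
    return await df.collect();
  } finally {
    // Always release the session, even if the query throws.
    await spark.stop();
  }
}
```

With the real client in hand, a call would look like runQuery(connect, "sc://localhost:15002", "SELECT 1 AS id"). The try/finally ensures stop() runs even when the query fails.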
Walks through the spark.catalog surface: inspect catalogs and databases, register a temp view, list columns, cache and uncache a table. Mirrors the structure of the Catalog guide.
Covers: spark.catalog.*, temp views, cacheTable, isCached, dropTempView.
End-to-end round trip through the DataFrame reader and writer: build a DataFrame from SQL, write it as CSV / JSON / Parquet, read it back with an explicit schema, verify equality. Mirrors the I/O guide.
Covers: spark.read, df.write, save modes, schema-on-read, partitionBy.
Broader workload: aggregation, pivoting, caching, windowed stats. Closer to a real analytics pipeline, and useful for benchmarking small-to-medium queries.
Covers: groupBy, pivot, rollup, Window, df.cache(), unpersist, statistical functions.
Connects to a Spark Connect server over TLS by terminating TLS at a Caddy reverse proxy in front of apache/spark:4.0.0. Covers the production deployment topology, since Spark Connect 4.0 has no native server-side TLS.
Covers: use_ssl=true, self-signed certs via NODE_EXTRA_CA_CERTS, Caddy reverse_proxy h2c://.
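In Spark Connect connection strings, parameters such as use_ssl=true are appended after the path as semicolon-separated key=value pairs. The helper below is a small sketch of that format; buildConnectionString, the hostname, and the port are hypothetical and not taken from the example's source.

```javascript
// Build a Spark Connect connection string of the form
//   sc://host:port/;key=value;key2=value2
// `buildConnectionString` is a hypothetical helper, not part of any library.
function buildConnectionString(host, port, params = {}) {
  const suffix = Object.entries(params)
    .map(([key, value]) => `${key}=${encodeURIComponent(String(value))}`)
    .join(";");
  // Without parameters, the bare sc://host:port form is enough.
  return suffix ? `sc://${host}:${port}/;${suffix}` : `sc://${host}:${port}`;
}

// Hypothetical proxy address; substitute the host and port Caddy listens on.
const tlsUrl = buildConnectionString("spark.example.test", 443, { use_ssl: true });
console.log(tlsUrl);
```

With a self-signed certificate, NODE_EXTRA_CA_CERTS has to be set in the environment before the Node process starts, because Node reads it only at launch.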
Running an example
Each example has a docker-compose.yml that brings up a Spark 3.5 Connect server on localhost:15002. From inside an example directory:
    docker compose up -d     # start the Spark Connect server
    pnpm install
    pnpm start               # or: node src/index.js
    docker compose down      # tear down when done

The compose files don’t share state across examples; each starts a fresh server. Stop one before starting another if you want to reuse port 15002.
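docker compose up -d returns once the containers are started, which can be before the server inside is accepting connections. If a script is launched immediately afterwards, a short readiness wait can help. The sketch below polls a TCP port until it opens; waitForPort is a hypothetical helper, and treating an open port as "ready" is an assumption rather than something the examples document.

```javascript
// Poll a TCP port until it accepts a connection or a deadline passes.
// Assumption: an open port is a good-enough readiness signal.
async function waitForPort(port, host = "localhost", timeoutMs = 60000, intervalMs = 500) {
  // Dynamic import works from both CommonJS and ES module files.
  const net = await import("node:net");
  const deadline = Date.now() + timeoutMs;

  while (true) {
    const open = await new Promise((resolve) => {
      const socket = net.connect({ host, port });
      socket.setTimeout(intervalMs);
      socket.once("connect", () => { socket.destroy(); resolve(true); });
      socket.once("timeout", () => { socket.destroy(); resolve(false); });
      socket.once("error", () => { socket.destroy(); resolve(false); });
    });
    if (open) return;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out waiting for ${host}:${port}`);
    }
    // Pause before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Calling await waitForPort(15002) before connecting would block until the local server's port opens, or throw after the timeout.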