
Examples

The repository’s examples/ directory contains five self-contained applications. Each has its own package.json, a docker-compose.yml for a local Spark Connect server, and a single-command run script. Clone the repo, cd into one of the example directories, and follow its README.

Connect, run a SQL query, print the rows, disconnect. The minimum viable script and copy-paste starting point for a new project.

Covers: connect(), spark.sql(...), collect(), stop().
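The connect, query, collect, stop sequence can be sketched as below. This is a minimal illustration, not the example's actual source: the connect function is passed in as a parameter because the client's import path is not shown here, and the assumed shapes (connect(url) yielding a session, sql() yielding an object with collect(), and stop() on the session) are inferred only from the method names listed above. The await calls work whether those methods return promises or plain values.

```javascript
// Hypothetical helper: runs one SQL query and always stops the session.
// `connect` is injected; its real signature in the client may differ.
async function runQuery(connect, url, query) {
  const spark = await connect(url);
  try {
    const df = await spark.sql(query);
    // collect() is assumed to return the result rows as an array.
    return await df.collect();
  } finally {
    // Release the session even if the query throws.
    await spark.stop();
  }
}
```

Wrapping the query in try/finally keeps a failed query from leaking an open session, which matters when the script is run repeatedly against the same local server.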

Walks through the spark.catalog surface: inspect catalogs and databases, register a temp view, list columns, cache and uncache a table. Mirrors the structure of the Catalog guide.

Covers: spark.catalog.*, temp views, cacheTable, isCached, dropTempView.

End-to-end round trip through the DataFrame reader and writer: build a DataFrame from SQL, write it as CSV / JSON / Parquet, read it back with an explicit schema, verify equality. Mirrors the I/O guide.

Covers: spark.read, df.write, save modes, schema-on-read, partitionBy.
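The "verify equality" step needs some care, because Spark does not guarantee row order on read unless the query sorts explicitly. One way to compare the written and re-read data is an order-insensitive check over the collected rows. The sketch below is a hypothetical helper, not code from the example, and it assumes the rows arrive as plain objects with JSON-serializable values.

```javascript
// Serialize a row with its keys sorted, so column order does not matter.
function canonicalRow(row) {
  const sortedKeys = Object.keys(row).sort();
  return JSON.stringify(sortedKeys.map((key) => [key, row[key]]));
}

// Compare two row sets as multisets: same rows, same counts, any order.
function sameRows(left, right) {
  if (left.length !== right.length) return false;
  const a = left.map(canonicalRow).sort();
  const b = right.map(canonicalRow).sort();
  return a.every((value, index) => value === b[index]);
}
```

A check like this also shows why the explicit schema matters: a CSV column read back as the string "1" does not equal the number 1 that was written.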

Broader workload: aggregation, pivoting, caching, windowed stats. Closer to a real analytics pipeline, and useful for benchmarking small-to-medium queries.

Covers: groupBy, pivot, rollup, Window, df.cache(), unpersist, statistical functions.

Connects to a Spark Connect server over TLS, with TLS terminated at a Caddy reverse proxy in front of apache/spark:4.0.0. This is the topology to use in production, because Spark Connect 4.0 has no native server-side TLS.

Covers: use_ssl=true, self-signed certs via NODE_EXTRA_CA_CERTS, Caddy reverse_proxy h2c://.
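Spark Connect connection strings take the form sc://host:port/;key=value;key=value, and use_ssl=true is one of those parameters. The helper below is a hypothetical sketch of how such a string could be assembled; it is not part of the client library, and whether this client accepts the string in exactly this form is an assumption.

```javascript
// Hypothetical helper: builds a Spark Connect connection string.
// Parameters are appended after "/;" and separated by ";".
function buildConnectUrl(host, port = 15002, params = {}) {
  const suffix = Object.entries(params)
    .map(([key, value]) => `${key}=${encodeURIComponent(String(value))}`)
    .join(";");
  return suffix ? `sc://${host}:${port}/;${suffix}` : `sc://${host}:${port}`;
}
```

For instance, buildConnectUrl("localhost", 443, { use_ssl: true }) returns "sc://localhost:443/;use_ssl=true"; the port here is illustrative and should match whatever the proxy listens on. For self-signed certificates, note that Node.js reads NODE_EXTRA_CA_CERTS only at process startup, so it has to be set in the environment before node is launched rather than assigned to process.env at runtime.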

Each example has a docker-compose.yml that brings up a Spark 3.5 Connect server on localhost:15002. From inside an example directory:

docker compose up -d # start the Spark Connect server
pnpm install
pnpm start # or: node src/index.js
docker compose down # tear down when done

The compose files don’t share state across examples; each starts a fresh server. Because every server binds port 15002, stop one example's server before starting another's.