Contributing
For policy and process, see CONTRIBUTING.md.
Repository layout
A pnpm + Turborepo monorepo:

    packages/
      spark-core/     zero-dependency API surface: DataFrame, Column, plan builder
      spark-node/     gRPC transport, Arrow decoder, Node entry point
      spark-connect/  generated protobuf types (do not edit by hand)
    apps/
      docs/           this site (Astro + Starlight + TypeDoc)
    examples/
      node-*/         runnable example apps, one per workload style

Scripts are defined in package.json at the repo root and in each workspace. Turbo orchestrates them across the monorepo; pnpm --filter <package> <script> invokes a script in a specific workspace.
Requires Node 22 or later and pnpm 10 or later.
    git clone https://github.com/prustic/spark-connect-js.git
    cd spark-connect-js
    pnpm install

Testing and linting
Unit tests live alongside source as *.test.ts files and run against an in-memory fake transport, so they’re fast and need no external services.
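As a rough illustration of the fake-transport idea, the sketch below records each plan it receives and replies with canned rows. The names Transport, FakeTransport, executePlan, and collectSql are hypothetical stand-ins chosen for this example; the actual interfaces in spark-core may look different.

```typescript
// Hypothetical shapes for illustration; the real transport interface may differ.
interface Transport {
  executePlan(plan: unknown): Promise<Record<string, unknown>[]>;
}

// In-memory fake: records every plan it receives and replies with canned rows.
class FakeTransport implements Transport {
  readonly received: unknown[] = [];
  private readonly rows: Record<string, unknown>[];

  constructor(rows: Record<string, unknown>[]) {
    this.rows = rows;
  }

  async executePlan(plan: unknown): Promise<Record<string, unknown>[]> {
    this.received.push(plan);
    return this.rows;
  }
}

// Hypothetical code under test: wraps a query in a plan and hands it to the transport.
async function collectSql(
  transport: Transport,
  query: string,
): Promise<Record<string, unknown>[]> {
  return transport.executePlan({ type: "sql", query });
}
```

A unit test would construct a FakeTransport, call the code under test, and then assert on both the returned rows and the contents of received. No network or Docker is involved.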
Integration tests live in tests/integration/ (@spark-connect-js/integration-tests). The test:integration script brings up a Spark Connect server in Docker, runs against it, and tears it down:
    pnpm --filter @spark-connect-js/integration-tests test:integration

The default endpoint is sc://localhost:15002; override with the SPARK_REMOTE env var if you’re pointing at something else.
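The override rule can be sketched as a small function. resolveRemote is a hypothetical helper written for this example and is not taken from the repository. Treating a blank value as unset is likewise an assumption.

```typescript
// Default endpoint documented for the integration tests.
const DEFAULT_REMOTE = "sc://localhost:15002";

// Hypothetical helper: use SPARK_REMOTE when it is set and non-blank,
// otherwise fall back to the default endpoint.
function resolveRemote(env: Record<string, string | undefined>): string {
  const override = env["SPARK_REMOTE"]?.trim();
  return override ? override : DEFAULT_REMOTE;
}
```

In a Node process this would be called as resolveRemote(process.env). Taking the environment as a parameter keeps the function easy to test.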
For interactive smoke testing, the apps in examples/ each ship a docker-compose.yml for the same setup. Run the example, hit the live server, tear it down.
Linting and formatting use ESLint and Prettier with shared config in tooling/eslint/ and tooling/prettier/. Both run in CI on every PR alongside the build and test steps.
Adding a built-in function
Built-in functions wrap callFunction, which packages the arguments into an UnresolvedFunction expression and lets Catalyst handle the rest.

    export function coalesce(...cols: (Column | string)[]): Column {
      return callFunction("coalesce", cols);
    }

Add a test that checks the generated plan structure, and if the function has non-trivial semantics, add an integration test that round-trips through a real server.
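A plan-structure check might look roughly like the sketch below. Column, callFunction, and the Expression shape are simplified stand-ins defined here so the example runs on its own; the real types in spark-core will differ. Treating string arguments as column names is also an assumption.

```typescript
// Simplified stand-ins for illustration only.
type Expression =
  | { kind: "unresolvedAttribute"; name: string }
  | { kind: "unresolvedFunction"; functionName: string; args: Expression[] };

class Column {
  readonly expr: Expression;

  constructor(expr: Expression) {
    this.expr = expr;
  }
}

// Assumption: a bare string argument refers to a column by name.
function toExpression(col: Column | string): Expression {
  return typeof col === "string"
    ? { kind: "unresolvedAttribute", name: col }
    : col.expr;
}

function callFunction(name: string, cols: (Column | string)[]): Column {
  return new Column({
    kind: "unresolvedFunction",
    functionName: name,
    args: cols.map(toExpression),
  });
}

function coalesce(...cols: (Column | string)[]): Column {
  return callFunction("coalesce", cols);
}

// The expression a structural test would inspect.
const coalesceExpr = coalesce(
  "a",
  new Column({ kind: "unresolvedAttribute", name: "b" }),
).expr;
```

The test then asserts that coalesceExpr is a single unresolved function node named "coalesce" with two arguments in the original order. That is enough to catch a misspelled function name or dropped arguments without a server.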
The functions/index.ts file groups functions by category (aggregate, string, date, math, conditional, collection). Add new entries in the matching section to keep the generated API reference readable.
Adding a DataFrame method
- Add the logical plan case in plan/logical-plan.ts.
- Add the builder method on DataFrame that appends the new node to this.plan.
- Add the proto encoding in plan/plan-builder.ts.
- Add tests: a unit test for the builder, an encoding test for the proto round-trip, and an integration test against a real server.
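The first three steps above can be sketched end to end for a hypothetical offset method. Every type and function below is a simplified stand-in written for this example, not the project's actual code. It also assumes the builder returns a new DataFrame rather than mutating the existing one.

```typescript
// Step 1: logical plan case (stand-in for plan/logical-plan.ts).
type LogicalPlan =
  | { type: "range"; start: number; end: number }
  | { type: "limit"; input: LogicalPlan; limit: number }
  | { type: "offset"; input: LogicalPlan; offset: number }; // the new case

// Step 2: builder method that wraps the current plan in the new node.
class DataFrame {
  readonly plan: LogicalPlan;

  constructor(plan: LogicalPlan) {
    this.plan = plan;
  }

  // Assumption: DataFrames are immutable, so a new instance is returned.
  offset(n: number): DataFrame {
    return new DataFrame({ type: "offset", input: this.plan, offset: n });
  }
}

// Step 3: proto encoding (stand-in for plan/plan-builder.ts).
// Field names follow the proto message fields, not the Scala method names.
type RelationProto =
  | { range: { start: number; end: number } }
  | { limit: { input: RelationProto; limit: number } }
  | { offset: { input: RelationProto; offset: number } };

function encode(plan: LogicalPlan): RelationProto {
  switch (plan.type) {
    case "range":
      return { range: { start: plan.start, end: plan.end } };
    case "limit":
      return { limit: { input: encode(plan.input), limit: plan.limit } };
    case "offset":
      return { offset: { input: encode(plan.input), offset: plan.offset } };
  }
}
```

Modelling the plan as a discriminated union means the compiler flags the encoder's switch as incomplete when a new plan case is added but not encoded.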
The proto schema for Spark Connect lives in the Apache Spark source tree. When adding support for a new message type, align the TypeScript representation with the proto names (not the Scala DataFrame names) so encoding stays obvious.
Filing a bug
Ideally, include:

- @spark-connect-js/node version from package.json.
- Node version (node --version).
- Spark server version (spark.sql("SELECT version()").show() or your vendor’s equivalent).
- Minimal reproduction, a single SQL string or a short script.
- Full error: errorClass, code, sqlState, message.
- Output of df.explain("extended") if the error came from a plan.
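To gather the four error fields in one paste-ready block, a helper along these lines may be convenient. The field names come from the list above. The assumption that they are plain properties on the thrown error is not confirmed by this page, and formatErrorForIssue is a hypothetical name.

```typescript
// Assumed shape: the four fields are optional properties on the thrown error.
interface SparkErrorLike {
  errorClass?: string;
  code?: string | number;
  sqlState?: string;
  message?: string;
}

// Hypothetical helper: render each field on its own line, marking missing ones.
function formatErrorForIssue(err: unknown): string {
  const e = (typeof err === "object" && err !== null ? err : {}) as SparkErrorLike;
  const fields: (keyof SparkErrorLike)[] = ["errorClass", "code", "sqlState", "message"];
  return fields.map((f) => `${f}: ${e[f] ?? "(not set)"}`).join("\n");
}
```

Calling it inside a catch block and pasting the result into the issue keeps the report complete even when some fields are absent.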
The Troubleshooting page also covers common issues and their fixes.
Reporting security issues
Do not open a public issue for a security vulnerability. See SECURITY.md in the repo for the disclosure policy.
License
By contributing, you agree that your contributions are licensed under Apache-2.0.