
Configuration

The client accepts the full sc:// connection-string grammar:

sc://host:port[/[;key=value][;key=value]...]

Reserved keys with first-class handling:

| Key | Meaning |
| --- | --- |
| use_ssl | true swaps the channel to grpc.credentials.createSsl(). The presence of token implies use_ssl=true. |
| token | Bearer token. Attached as authorization: Bearer <token> on every RPC. Token-over-insecure (use_ssl=false with token) is rejected. |
| user_id | Sent as userContext.userId on every RPC. |
| user_agent | Prefix to the canonical user-agent string <your prefix> spark-connect-js/<ver> (node <ver>; <platform>). |
| session_id | Reuse an existing server-side session by UUID. Validated at parse time. |
| grpc_max_message_size | Override the gRPC max-message-size in bytes for both inbound and outbound. |

Any unreserved ;key=value pair is attached as gRPC metadata on every outgoing RPC.

connect("sc://localhost:15002");
connect("sc://spark.internal:443/;use_ssl=true");
connect("sc://spark.internal:443/;token=abc;use_ssl=true");
connect("sc://spark.internal:443/;use_ssl=true;tenant=acme;x-deploy=blue");
connect("sc://localhost:15002/;session_id=550e8400-e29b-41d4-a716-446655440000");
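The rules above can be made concrete with a small, self-contained sketch. This is not the library's parser: `parseConnectionString`, `ParsedConnection`, and the error messages are hypothetical names chosen for illustration, and the sketch assumes the port is always present, as in the grammar shown. It splits reserved keys from metadata keys, treats token as implying SSL, rejects token with use_ssl=false, and validates session_id as a UUID.

```typescript
// Illustrative only: a hypothetical parser for the sc:// grammar described above.
const RESERVED_KEYS = new Set([
  "use_ssl", "token", "user_id", "user_agent", "session_id", "grpc_max_message_size",
]);

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

interface ParsedConnection {
  host: string;
  port: number;
  useSsl: boolean;
  reserved: Record<string, string>; // keys with first-class handling
  metadata: Record<string, string>; // everything else, sent as gRPC metadata
}

function parseConnectionString(url: string): ParsedConnection {
  // sc://host:port, optionally followed by "/" and ";key=value" pairs.
  const match = /^sc:\/\/([^:\/;]+):(\d+)(?:\/(.*))?$/.exec(url);
  if (match === null) throw new Error(`not a valid sc:// URL: ${url}`);
  const host = match[1] ?? "";
  const port = Number(match[2] ?? "");
  const paramText = match[3] ?? "";

  const reserved: Record<string, string> = {};
  const metadata: Record<string, string> = {};
  for (const pair of paramText.split(";")) {
    if (pair === "") continue; // the leading ";" yields one empty segment
    const eq = pair.indexOf("=");
    if (eq <= 0) throw new Error(`malformed parameter: ${pair}`);
    const key = pair.slice(0, eq);
    const value = pair.slice(eq + 1);
    if (RESERVED_KEYS.has(key)) reserved[key] = value;
    else metadata[key] = value;
  }

  const hasToken = reserved["token"] !== undefined;
  if (hasToken && reserved["use_ssl"] === "false") {
    throw new Error("token requires SSL; use_ssl=false is rejected");
  }
  const sessionId = reserved["session_id"];
  if (sessionId !== undefined && !UUID_RE.test(sessionId)) {
    throw new Error(`session_id is not a UUID: ${sessionId}`);
  }

  // A token implies SSL even when use_ssl is absent.
  const useSsl = hasToken || reserved["use_ssl"] === "true";
  return { host, port, useSsl, reserved, metadata };
}
```

With this sketch, `sc://spark.internal:443/;use_ssl=true;tenant=acme;x-deploy=blue` yields `tenant` and `x-deploy` as metadata, while `use_ssl` stays in the reserved set.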

To configure the session in code rather than through URL parameters, use the builder directly:

import { SparkSession } from "@spark-connect-js/node";

const spark = SparkSession.builder()
  .remote("sc://spark.internal:15002")
  .getOrCreate();

// All builder options. customTransport and customDecoder stand in for your own objects.
SparkSession.builder()
  .remote("sc://localhost:15002")                      // required, connection URL
  .sessionId("550e8400-e29b-41d4-a716-446655440000")   // optional, reuse a server-side session
  .transport(customTransport)                          // optional, plug in a custom transport
  .arrowDecoder(customDecoder)                         // optional, plug in a custom Arrow decoder
  .getOrCreate();                                      // return the session

getOrCreate() constructs the session without opening the channel. The channel opens lazily on the first RPC.
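The lazy-open behavior can be pictured with a generic sketch of the pattern. Nothing here is the library's internals: `Channel`, `LazySession`, and `rpc` are hypothetical names, and the sketch only shows the idea of deferring the open until the first call and sharing one pending open among concurrent callers.

```typescript
// Illustrative only: a hypothetical stand-in for a transport channel.
interface Channel {
  call(method: string): Promise<string>;
}

class LazySession {
  private readonly open: () => Promise<Channel>;
  private channel: Promise<Channel> | undefined;

  constructor(open: () => Promise<Channel>) {
    // Constructing the session stores the opener but does not invoke it.
    this.open = open;
  }

  async rpc(method: string): Promise<string> {
    // Open on first use; concurrent callers reuse the same pending promise.
    if (this.channel === undefined) {
      this.channel = this.open();
    }
    const channel = await this.channel;
    return channel.call(method);
  }
}
```

One practical consequence of this pattern is that an unreachable host or rejected credential would presumably surface on the first RPC rather than at getOrCreate().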

For low-level transport tuning (custom ChannelCredentials, retry policy override, max-message-size override) construct a GrpcTransport directly and inject it via .transport(...). The GrpcTransportOptions shape is documented in the API reference.

Reader and writer options pass through to Spark unchanged. The full list is in the Spark SQL data sources documentation.

CSV options

| Option | Default | Description |
| --- | --- | --- |
| header | false | Treat the first line as a header. |
| sep | , | Field separator. |
| quote | " | Quote character. |
| escape | \ | Escape character inside quotes. |
| inferSchema | false | Infer column types; requires an extra pass over the data. |
| nullValue | "" | String that represents null. |
| dateFormat | | Date parse pattern. |
| timestampFormat | | Timestamp parse pattern. |
| mode | PERMISSIVE | PERMISSIVE, DROPMALFORMED, FAILFAST. |
| multiLine | false | Allow newlines inside quoted fields. |

JSON options

| Option | Default | Description |
| --- | --- | --- |
| multiLine | false | Allow a single JSON record to span multiple lines. |
| allowComments | false | Permit // comments. |
| allowSingleQuotes | true | Permit '...' strings. |
| mode | PERMISSIVE | PERMISSIVE, DROPMALFORMED, FAILFAST. |
| primitivesAsString | false | Keep every primitive value as a string. |

Parquet options

| Option | Default | Description |
| --- | --- | --- |
| compression | snappy | none, snappy, gzip, lz4, zstd. |
| mergeSchema | false | Merge schemas across files. Expensive for wide tables. |

Generic file source options

| Option | Description |
| --- | --- |
| pathGlobFilter | Restrict file discovery with a glob pattern. |
| recursiveFileLookup | Include files in nested directories. |
| modifiedBefore / modifiedAfter | Filter by file modification time. |

Save modes

| Mode | Description |
| --- | --- |
| append | Add to existing data. |
| overwrite | Replace existing data. |
| ignore | No-op if the target exists. |
| error / errorifexists | Fail if the target exists (the default). |

session.conf mirrors PySpark’s spark.conf, routed through Spark Connect’s Config RPC. Every method is async because each call is a round trip to the server.

await spark.conf.set("spark.sql.shuffle.partitions", "50");
const v = await spark.conf.get("spark.sql.shuffle.partitions"); // "50"
const all = await spark.conf.getAll(); // Record<string, string>
const allShuffle = await spark.conf.getAll("spark.sql.shuffle.");
const ok = await spark.conf.isModifiable("spark.sql.shuffle.partitions"); // true
await spark.conf.unset("spark.sql.shuffle.partitions");

get resolves to string | undefined; undefined means the key is unset.

Server-side defaults still live in spark-defaults.conf or --conf flags on start-connect-server.sh. session.conf is for per-session overrides made from client code.
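A common use of per-session overrides is to apply a setting for one block of work and then restore it. The sketch below uses only the get, set, and unset methods shown earlier. `withConf` and `ConfLike` are hypothetical names, not part of the library, and the sketch assumes that an undefined result from get means there was no value to restore.

```typescript
// Illustrative only: a structural type covering just the methods this helper needs.
interface ConfLike {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<unknown>;
  unset(key: string): Promise<unknown>;
}

// Apply `value` for `key` while `body` runs, then restore the previous state.
async function withConf<T>(
  conf: ConfLike,
  key: string,
  value: string,
  body: () => Promise<T>,
): Promise<T> {
  const previous = await conf.get(key);
  await conf.set(key, value);
  try {
    return await body();
  } finally {
    // Restore even if body throws. Each step is one server round trip.
    if (previous === undefined) {
      await conf.unset(key);
    } else {
      await conf.set(key, previous);
    }
  }
}
```

Assuming spark.conf is structurally compatible with `ConfLike`, a call would look like `await withConf(spark.conf, "spark.sql.shuffle.partitions", "50", async () => { /* run queries */ })`. The helper costs three round trips on top of the body, so it suits coarse blocks of work rather than tight loops.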