DataFrameReader

Defined in: spark-session.ts:468

Fluent reader for loading data into a DataFrame. Returned by spark.read; configure the format, options, and schema, then terminate with a format shortcut (csv, json, parquet, orc, text) or .load().

spark.read.parquet("s3://bucket/events/");
spark.read
  .schema("id INT, name STRING")
  .option("header", "true")
  .csv("/data/people.csv");

Spark source: DataFrameReader.scala
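
The fluent pattern described above can be sketched with a small stand-in class. `SketchReader` and `ReadPlan` are hypothetical names invented for illustration and are not part of this library; the "parquet" default is likewise an assumption made only for the sketch. It shows how configuration methods can return `this` while a format shortcut delegates to `format(...).load(...)`.

```typescript
// Hypothetical stand-in for illustration; not the library's implementation.
interface ReadPlan {
  format: string;
  paths: string[];
  options: Record<string, string>;
  schema: string | undefined;
}

class SketchReader {
  private fmt = "parquet"; // assumed default, chosen only for this sketch
  private opts: Record<string, string> = {};
  private ddl: string | undefined = undefined;

  // Configuration methods return `this` so calls can be chained.
  format(fmt: string): this {
    this.fmt = fmt;
    return this;
  }

  option(key: string, value: string): this {
    this.opts[key] = value;
    return this;
  }

  schema(ddl: string): this {
    this.ddl = ddl;
    return this;
  }

  // Terminal call: returns a description of the read rather than a DataFrame.
  load(path: string): ReadPlan {
    return { format: this.fmt, paths: [path], options: { ...this.opts }, schema: this.ddl };
  }

  // Shortcut that delegates to format("csv").load(path).
  csv(path: string): ReadPlan {
    return this.format("csv").load(path);
  }
}

const viaShortcut = new SketchReader()
  .schema("id INT, name STRING")
  .option("header", "true")
  .csv("/data/people.csv");

const viaLoad = new SketchReader()
  .schema("id INT, name STRING")
  .option("header", "true")
  .format("csv")
  .load("/data/people.csv");

// Both chains describe the same read.
console.log(JSON.stringify(viaShortcut) === JSON.stringify(viaLoad));
```

Returning `this` rather than the class name keeps chaining intact if the reader is ever subclassed, which is presumably why the documented signatures use it.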

new DataFrameReader(session): DataFrameReader;

Defined in: spark-session.ts:474

| Parameter | Type |
| --- | --- |
| session | SparkSession |

Returns: DataFrameReader

csv(path): DataFrame;

Defined in: spark-session.ts:555

Shortcut for .format("csv").load(path).

| Parameter | Type |
| --- | --- |
| path | string |

Returns: DataFrame


format(fmt): this;

Defined in: spark-session.ts:508

| Parameter | Type |
| --- | --- |
| fmt | string |

Returns: this


json(path): DataFrame;

Defined in: spark-session.ts:550

Shortcut for .format("json").load(path).

| Parameter | Type |
| --- | --- |
| path | string |

Returns: DataFrame


load(path): DataFrame;

Defined in: spark-session.ts:530

Create a Read plan node for the given path. The resulting DataFrame is lazy; no data is fetched until an action such as .collect() is called.

This maps to Spark Connect’s Relation.Read with ReadType.DataSource: { format: "parquet", paths: […], options: {…} }

| Parameter | Type |
| --- | --- |
| path | string |

Returns: DataFrame
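
The laziness described for load() can be illustrated with a small stand-alone sketch. `LazyFrame`, `ReadRelation`, `sketchLoad`, and `fetchCount` are hypothetical names invented here, and the `read.dataSource` object only mirrors the shape quoted above. Nothing below calls the real library or a Spark Connect server.

```typescript
// Hypothetical sketch of a lazy read; not the library's implementation.
interface ReadRelation {
  read: {
    dataSource: { format: string; paths: string[]; options: Record<string, string> };
  };
}

let fetchCount = 0; // counts how often the simulated backend is contacted

class LazyFrame {
  readonly plan: ReadRelation;

  constructor(plan: ReadRelation) {
    this.plan = plan;
  }

  // Action: only here does the sketch "execute" the plan.
  collect(): string[] {
    fetchCount += 1;
    const { format, paths } = this.plan.read.dataSource;
    return paths.map((p) => `${format}:${p}`);
  }
}

// Building the frame records the plan but performs no fetch.
function sketchLoad(
  format: string,
  path: string,
  options: Record<string, string> = {},
): LazyFrame {
  return new LazyFrame({ read: { dataSource: { format, paths: [path], options } } });
}

const eventsFrame = sketchLoad("parquet", "s3://bucket/events/");
const fetchesBeforeAction = fetchCount; // nothing has run yet
const rows = eventsFrame.collect();
const fetchesAfterAction = fetchCount; // the action triggered execution

console.log(fetchesBeforeAction, fetchesAfterAction, rows);
```

Keeping the plan as plain data until an action runs is what lets further transformations be stacked on the frame before anything is sent for execution.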


option(key, value): this;

Defined in: spark-session.ts:513

| Parameter | Type |
| --- | --- |
| key | string |
| value | string |

Returns: this


options(opts): this;

Defined in: spark-session.ts:518

| Parameter | Type |
| --- | --- |
| opts | Record<string, string> |

Returns: this


orc(path): DataFrame;

Defined in: spark-session.ts:565

Shortcut for .format("orc").load(path).

| Parameter | Type |
| --- | --- |
| path | string |

Returns: DataFrame


parquet(path): DataFrame;

Defined in: spark-session.ts:560

Shortcut for .format("parquet").load(path).

| Parameter | Type |
| --- | --- |
| path | string |

Returns: DataFrame


schema(schema): this;

Defined in: spark-session.ts:483

Set the schema for the data source. Accepts a DDL-formatted string (e.g. "name STRING, age INT") or a StructType with a toDDL() method.

| Parameter | Type |
| --- | --- |
| schema | string \| { toDDL: string; } |

Returns: this
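
One way the two accepted schema forms could be reduced to a single DDL string is sketched below. `toDdlString`, `SchemaLike`, and `peopleSchema` are hypothetical names for illustration. The sketch assumes `toDDL` is a zero-argument method, as the description states, and the library's actual handling may differ.

```typescript
// Hypothetical helper; the library's real normalization is not shown on this page.
type SchemaLike = string | { toDDL(): string };

function toDdlString(schema: SchemaLike): string {
  // A plain string is taken to be DDL already.
  if (typeof schema === "string") {
    return schema;
  }
  // Otherwise ask the object for its DDL form (assumed to be a method).
  return schema.toDDL();
}

// Minimal stand-in for a StructType-like object.
const peopleSchema = {
  fields: [
    { name: "name", type: "STRING" },
    { name: "age", type: "INT" },
  ],
  toDDL(): string {
    return this.fields.map((f) => `${f.name} ${f.type}`).join(", ");
  },
};

const fromString = toDdlString("name STRING, age INT");
const fromObject = toDdlString(peopleSchema);

// Both inputs end up as the same DDL text.
console.log(fromString === fromObject);
```

Accepting anything with a `toDDL` method, rather than a concrete class, means a caller can pass a schema object without the reader depending on how that object is built.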


table(tableName): DataFrame;

Defined in: spark-session.ts:541

Read a named table (catalog table or temp view).

| Parameter | Type |
| --- | --- |
| tableName | string |

Returns: DataFrame


text(path): DataFrame;

Defined in: spark-session.ts:570

Shortcut for .format("text").load(path).

| Parameter | Type |
| --- | --- |
| path | string |

Returns: DataFrame