DataFrame
Defined in: data-frame.ts:51
A distributed collection of rows with a named schema, obtained from a
SparkSession (for example via spark.read.parquet(path) or
spark.sql(...)).
DataFrame is lazy. Transformation methods (select, filter, join,
withColumn, etc.) return a new DataFrame that wraps an extended logical
plan; no work is performed on the server until an action (collect,
count, show, write.save, etc.) is called.
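The laziness described above can be pictured with a small stand-in. This is a sketch only, not the library's implementation: `LazyFrame`, `PlanStep`, and `executions` are hypothetical names, and the "server" is an in-memory array. It shows that transformations merely extend a plan, and that the plan is evaluated only when an action is called.

```typescript
// Hypothetical stand-in for a lazy DataFrame; not the real client classes.
type Row = Record<string, unknown>;
type PlanStep =
  | { op: "filter"; predicate: (row: Row) => boolean }
  | { op: "limit"; n: number };

let executions = 0; // counts how often a plan is actually run

class LazyFrame {
  readonly plan: readonly PlanStep[];
  private readonly source: readonly Row[];

  constructor(source: readonly Row[], plan: readonly PlanStep[] = []) {
    this.source = source;
    this.plan = plan;
  }

  // Transformations return a new frame that wraps an extended plan.
  filter(predicate: (row: Row) => boolean): LazyFrame {
    return new LazyFrame(this.source, [...this.plan, { op: "filter", predicate }]);
  }

  limit(n: number): LazyFrame {
    return new LazyFrame(this.source, [...this.plan, { op: "limit", n }]);
  }

  // Action: the plan is evaluated only here.
  async collect(): Promise<Row[]> {
    executions += 1;
    let rows = [...this.source];
    for (const step of this.plan) {
      rows = step.op === "filter" ? rows.filter(step.predicate) : rows.slice(0, step.n);
    }
    return rows;
  }
}

const base = new LazyFrame([{ n: 1 }, { n: 2 }, { n: 3 }, { n: 4 }]);
const planned = base.filter((row) => (row.n as number) > 1).limit(2);
// executions is still 0 here: building the plan has not run anything.
```

Calling `await planned.collect()` would run the two recorded steps once and leave `base` untouched, which is the same contract the transformation and action methods on this page describe.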
Example
const df = await spark.read.parquet("s3://bucket/events");
const recent = df
  .filter(col("ts").gte(lit("2026-01-01")))
  .groupBy("country")
  .count();
const rows = await recent.collect();
Accessors
Get Signature
get stat(): DataFrameStat;
Defined in: data-frame.ts:631
Access statistical functions (corr, cov, crosstab, etc.).
Returns
DataFrameStat
Get Signature
get write(): DataFrameWriter;
Defined in: data-frame.ts:638
Returns a DataFrameWriter for persisting the contents of this DataFrame.
Returns
DataFrameWriter
Methods
alias()
alias(name): DataFrame;
Defined in: data-frame.ts:387
Assign an alias to this DataFrame, useful for self-joins.
Parameters
| Parameter | Type |
|---|---|
| name | string |
Returns
DataFrame
cache()
cache(): Promise<DataFrame>;
Defined in: data-frame.ts:656
Persist this DataFrame with the default storage level (MEMORY_AND_DISK). Resolves to this DataFrame for method chaining.
Returns
Promise<DataFrame>
coalesce()
coalesce(numPartitions): DataFrame;
Defined in: data-frame.ts:509
Return a new DataFrame reduced to the given number of partitions. Unlike repartition(), coalesce() avoids a full shuffle and tries to combine existing partitions.
Parameters
| Parameter | Type | Description |
|---|---|---|
| numPartitions | number | Target number of partitions |
Returns
DataFrame
collect()
collect(): Promise<Row[]>;
Defined in: data-frame.ts:776
Execute the plan and collect all result rows into a JS array.
For large datasets, prefer toLocalIterator() or forEach() to avoid loading everything into memory.
Returns
Promise<Row[]>
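The memory trade-off between collecting and iterating can be sketched without a server. The example below is an illustration under stated assumptions: `fetchBatches`, `collectAll`, `iterateRows`, and `sumIds` are hypothetical helpers, and the batches are generated locally rather than streamed from Spark.

```typescript
// Hypothetical batch source standing in for the server's result stream.
type Row = { id: number };

async function* fetchBatches(total: number, batchSize: number): AsyncGenerator<Row[]> {
  for (let start = 0; start < total; start += batchSize) {
    const end = Math.min(start + batchSize, total);
    const batch: Row[] = [];
    for (let id = start; id < end; id++) batch.push({ id });
    yield batch;
  }
}

// collect()-style: every row is buffered in one array before the caller sees any.
async function collectAll(total: number, batchSize: number): Promise<Row[]> {
  const rows: Row[] = [];
  for await (const batch of fetchBatches(total, batchSize)) rows.push(...batch);
  return rows;
}

// toLocalIterator()-style: rows are yielded one at a time, one batch buffered.
async function* iterateRows(total: number, batchSize: number): AsyncGenerator<Row> {
  for await (const batch of fetchBatches(total, batchSize)) {
    yield* batch;
  }
}

// A streaming consumer never needs more than the current batch.
async function sumIds(total: number, batchSize: number): Promise<number> {
  let sum = 0;
  for await (const row of iterateRows(total, batchSize)) sum += row.id;
  return sum;
}
```

With this shape, `collectAll` grows with the size of the result, while `sumIds` stays bounded by the batch size, which is the reason to prefer the iterator or callback style for large results.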
columns()
columns(): Promise<string[]>;
Defined in: data-frame.ts:904
Return the column names as a string array. Uses the AnalyzePlan.Schema RPC to resolve the schema without executing.
Returns
Promise<string[]>
count()
count(): Promise<number>;
Defined in: data-frame.ts:791
Return the number of rows. Uses an aggregate count plan, so the full dataset is not collected.
Returns
Promise<number>
createGlobalTempView()
createGlobalTempView(viewName): Promise<void>;
Defined in: data-frame.ts:739
Register as a global temporary view. Throws if the view already exists.
Parameters
| Parameter | Type |
|---|---|
| viewName | string |
Returns
Promise<void>
createOrReplaceGlobalTempView()
createOrReplaceGlobalTempView(viewName): Promise<void>;
Defined in: data-frame.ts:728
Register as a global temporary view, replacing it if it already exists.
Parameters
| Parameter | Type |
|---|---|
| viewName | string |
Returns
Promise<void>
createOrReplaceTempView()
createOrReplaceTempView(viewName): Promise<void>;
Defined in: data-frame.ts:706
Register this DataFrame as a temporary view with the given name. The view is session-scoped and will be dropped when the session ends.
Parameters
| Parameter | Type |
|---|---|
| viewName | string |
Returns
Promise<void>
createTempView()
createTempView(viewName): Promise<void>;
Defined in: data-frame.ts:717
Register as a temporary view. Throws if the view already exists.
Parameters
| Parameter | Type |
|---|---|
| viewName | string |
Returns
Promise<void>
crossJoin()
crossJoin(other): DataFrame;
Defined in: data-frame.ts:188
Alias for join() with joinType="cross".
Parameters
| Parameter | Type |
|---|---|
| other | DataFrame |
Returns
DataFrame
cube()
cube(...columns): GroupedData;
Defined in: data-frame.ts:100
Multi-dimensional cube aggregation (all grouping-column combinations).
Parameters
| Parameter | Type |
|---|---|
| ...columns | (string \| Column)[] |
Returns
GroupedData
describe()
describe(...cols): DataFrame;
Defined in: data-frame.ts:548
Compute summary statistics (count, mean, stddev, min, max) for columns.
Parameters
| Parameter | Type |
|---|---|
| ...cols | string[] |
Returns
DataFrame
distinct()
distinct(): DataFrame;
Defined in: data-frame.ts:266
Alias for dropDuplicates() with no arguments.
Returns
DataFrame
drop()
drop(...columnNames): DataFrame;
Defined in: data-frame.ts:193
Drop one or more columns by name.
Parameters
| Parameter | Type |
|---|---|
| ...columnNames | string[] |
Returns
DataFrame
dropDuplicates()
dropDuplicates(...columnNames): DataFrame;
Defined in: data-frame.ts:256
Remove duplicate rows, optionally considering only a subset of columns.
Parameters
| Parameter | Type |
|---|---|
| ...columnNames | string[] |
Returns
DataFrame
dropna()
dropna(how?, cols?): DataFrame;
Defined in: data-frame.ts:362
Drop rows with null values.
Parameters
| Parameter | Type | Default value |
|---|---|---|
| how | "all" \| "any" | "any" |
| cols | string[] | [] |
Returns
DataFrame
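The difference between how="any" and how="all" is easiest to see on a few rows. The sketch below reproduces the rule on plain arrays; it is not the client's code, the local `dropna` function is a hypothetical stand-in, and it assumes that an empty cols list means "consider every column", which this page states only for fillna().

```typescript
type Row = Record<string, unknown>;

// Keep a row unless the rule says to drop it.
// Assumption: an empty cols list means "consider every column in the row".
function dropna(rows: Row[], how: "any" | "all" = "any", cols: string[] = []): Row[] {
  return rows.filter((row) => {
    const names = cols.length > 0 ? cols : Object.keys(row);
    const nulls = names.filter((c) => row[c] === null || row[c] === undefined);
    const drop =
      how === "any"
        ? nulls.length > 0 // at least one considered column is null
        : names.length > 0 && nulls.length === names.length; // all of them are null
    return !drop;
  });
}

const people: Row[] = [
  { name: "Ada", age: 36 },
  { name: null, age: 41 },
  { name: null, age: null },
];

console.log(dropna(people).length); // 1
console.log(dropna(people, "all").length); // 2
console.log(dropna(people, "any", ["age"]).length); // 2
```

Restricting cols narrows which nulls count: the second row survives the last call because only age is inspected.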
dtypes()
dtypes(): Promise<[string, string][]>;
Defined in: data-frame.ts:914
Return column names and their data types as [name, type] pairs. Uses the AnalyzePlan.Schema RPC.
Returns
Promise<[string, string][]>
except()
except(other): DataFrame;
Defined in: data-frame.ts:307
Return the distinct rows that are in this DataFrame but not in other.
Parameters
| Parameter | Type |
|---|---|
| other | DataFrame |
Returns
DataFrame
exceptAll()
exceptAll(other): DataFrame;
Defined in: data-frame.ts:312
Return the rows that are in this DataFrame but not in other, keeping duplicates.
Parameters
| Parameter | Type |
|---|---|
| other | DataFrame |
Returns
DataFrame
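How except() and exceptAll() treat duplicates can be shown on plain arrays. This sketch assumes the two methods follow Spark SQL's EXCEPT and EXCEPT ALL semantics; the local `except`, `exceptAll`, and `rowKey` functions are hypothetical stand-ins, and rows are compared by their JSON encoding, which is adequate only for this illustration.

```typescript
type Row = Record<string, string | number | boolean | null>;

// Rows are compared by their JSON encoding (key order matters in this sketch).
const rowKey = (row: Row): string => JSON.stringify(row);

// except(): distinct rows of left that do not appear in right.
function except(left: Row[], right: Row[]): Row[] {
  const exclude = new Set(right.map(rowKey));
  const seen = new Set<string>();
  return left.filter((row) => {
    const k = rowKey(row);
    if (exclude.has(k) || seen.has(k)) return false;
    seen.add(k);
    return true;
  });
}

// exceptAll(): each row in right cancels one matching occurrence in left.
function exceptAll(left: Row[], right: Row[]): Row[] {
  const remaining = new Map<string, number>();
  for (const row of right) {
    const k = rowKey(row);
    remaining.set(k, (remaining.get(k) ?? 0) + 1);
  }
  return left.filter((row) => {
    const k = rowKey(row);
    const n = remaining.get(k) ?? 0;
    if (n > 0) {
      remaining.set(k, n - 1);
      return false;
    }
    return true;
  });
}

const a: Row[] = [{ x: 1 }, { x: 1 }, { x: 1 }, { x: 2 }, { x: 3 }, { x: 3 }];
const b: Row[] = [{ x: 1 }, { x: 2 }];

console.log(except(a, b)); // [ { x: 3 } ]
console.log(exceptAll(a, b)); // [ { x: 1 }, { x: 1 }, { x: 3 }, { x: 3 } ]
```

The same counting idea applies to intersect() and intersectAll(): the distinct form keeps one copy of each common row, while the "all" form keeps as many copies as both sides share.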
explain()
explain(mode?): Promise<string>;
Defined in: data-frame.ts:947
Return the query execution plan as a string.
Parameters
| Parameter | Type | Default value | Description |
|---|---|---|---|
| mode | "simple" \| "extended" \| "codegen" \| "cost" \| "formatted" | "simple" | Explain mode: "simple", "extended", "codegen", "cost", or "formatted" |
Returns
Promise<string>
fillna()
fillna(value, cols?): DataFrame;
Defined in: data-frame.ts:352
Replace null values. If cols is empty, applies to all columns.
Parameters
| Parameter | Type | Default value |
|---|---|---|
| value | string \| number \| boolean | undefined |
| cols | string[] | [] |
Returns
DataFrame
filter()
filter(condition): DataFrame;
Defined in: data-frame.ts:70
Filter rows by a boolean Column expression.
Parameters
| Parameter | Type |
|---|---|
| condition | Column |
Returns
DataFrame
first()
first(): Promise<Row | null>;
Defined in: data-frame.ts:866
Return the first row as a Row object, or null if the DataFrame is empty.
Returns
Promise<Row | null>
forEach()
forEach(fn): Promise<void>;
Defined in: data-frame.ts:844
Process each row with a callback as it streams from the server.
Parameters
| Parameter | Type |
|---|---|
| fn | (row) => void |
Returns
Promise<void>
Example
await df.forEach((row) => console.log(row.name, row.salary));
getStorageLevel()
getStorageLevel(): Promise<StorageLevel>;
Defined in: data-frame.ts:693
Get the storage level used for caching this DataFrame. Returns the StorageLevel if cached, or NONE if not cached.
Returns
Promise<StorageLevel>
groupBy()
groupBy(...columns): GroupedData;
Defined in: data-frame.ts:94
Group by one or more columns, returning a GroupedData handle for aggregation.
Parameters
| Parameter | Type |
|---|---|
| ...columns | (string \| Column)[] |
Returns
GroupedData
head()
head(n?): Promise<Row[]>;
Defined in: data-frame.ts:874
Return the first n rows as an array (alias for limit + collect).
Parameters
| Parameter | Type | Default value |
|---|---|---|
| n | number | 1 |
Returns
Promise<Row[]>
hint()
hint(name, ...parameters): DataFrame;
Defined in: data-frame.ts:403
Attach an optimizer hint to this DataFrame.
Parameters
| Parameter | Type |
|---|---|
| name | string |
| ...parameters | (string \| number \| boolean)[] |
Returns
DataFrame
Examples
df.hint("broadcast")
df.join(right.hint("broadcast"), ...)
intersect()
intersect(other): DataFrame;
Defined in: data-frame.ts:297
Return rows present in both DataFrames (distinct).
Parameters
| Parameter | Type |
|---|---|
| other | DataFrame |
Returns
DataFrame
intersectAll()
intersectAll(other): DataFrame;
Defined in: data-frame.ts:302
Return rows present in both DataFrames (duplicates kept).
Parameters
| Parameter | Type |
|---|---|
| other | DataFrame |
Returns
DataFrame
isEmpty()
isEmpty(): Promise<boolean>;
Defined in: data-frame.ts:924
Returns true if the DataFrame has no rows. Uses head(1) to check and stops after the first row.
Returns
Promise<boolean>
join()
join(other, condition?, joinType?): DataFrame;
Defined in: data-frame.ts:160
Join with another DataFrame.
Parameters
| Parameter | Type | Default value | Description |
|---|---|---|---|
| other | DataFrame | undefined | The right-side DataFrame |
| condition? | Column | undefined | Join condition (a boolean Column expression) |
| joinType? | "inner" \| "full_outer" \| "left_outer" \| "right_outer" \| "left_semi" \| "left_anti" \| "cross" | "inner" | Type of join (default: "inner") |
Returns
DataFrame
limit()
limit(n): DataFrame;
Defined in: data-frame.ts:112
Limit the number of rows.
Parameters
| Parameter | Type |
|---|---|
| n | number |
Returns
DataFrame
melt()
melt(ids, values, variableColumnName, valueColumnName): DataFrame;
Defined in: data-frame.ts:621
Alias for unpivot().
Parameters
| Parameter | Type |
|---|---|
| ids | (string \| Column)[] |
| values | (string \| Column)[] \| undefined |
| variableColumnName | string |
| valueColumnName | string |
Returns
DataFrame
offset()
offset(n): DataFrame;
Defined in: data-frame.ts:271
Skip the first n rows.
Parameters
| Parameter | Type |
|---|---|
| n | number |
Returns
DataFrame
orderBy()
orderBy(...columns): DataFrame;
Defined in: data-frame.ts:149
Alias for sort().
Parameters
| Parameter | Type |
|---|---|
| ...columns | (string \| Column)[] |
Returns
DataFrame
persist()
persist(storageLevel?): Promise<DataFrame>;
Defined in: data-frame.ts:666
Persist this DataFrame with the given storage level. Resolves to this DataFrame for method chaining.
Parameters
| Parameter | Type | Default value | Description |
|---|---|---|---|
| storageLevel | StorageLevel | MEMORY_AND_DISK | How to store the cached data |
Returns
Promise<DataFrame>
printSchema()
printSchema(): Promise<void>;
Defined in: data-frame.ts:962
Print the schema to the console in a tree format. Convenience method that calls schema() and formats the output.
Returns
Promise<void>
randomSplit()
randomSplit(weights, seed?): DataFrame[];
Defined in: data-frame.ts:580
Randomly split this DataFrame into multiple DataFrames by weight.
Parameters
| Parameter | Type |
|---|---|
| weights | number[] |
| seed? | number |
Returns
DataFrame[]
repartition()
repartition(numPartitions, ...columns): DataFrame;
Defined in: data-frame.ts:484
Return a new DataFrame partitioned by the given number of partitions. This results in a full shuffle of the data.
Parameters
| Parameter | Type | Description |
|---|---|---|
| numPartitions | number | Target number of partitions |
| ...columns | (string \| Column)[] | Optional partitioning columns |
Returns
DataFrame
repartitionByRange()
repartitionByRange(numPartitions, ...columns): DataFrame;
Defined in: data-frame.ts:524
Return a new DataFrame partitioned by the given columns using range partitioning.
Parameters
| Parameter | Type | Description |
|---|---|---|
| numPartitions | number | Target number of partitions |
| ...columns | (string \| Column)[] | Partitioning columns |
Returns
DataFrame
replace()
replace(to, subset?): DataFrame;
Defined in: data-frame.ts:566
Replace values matching old with new, optionally restricted to a column subset.
Parameters
| Parameter | Type | Default value |
|---|---|---|
| to | Record<string, string \| number \| boolean \| null> | undefined |
| subset | string[] | [] |
Returns
DataFrame
rollup()
rollup(...columns): GroupedData;
Defined in: data-frame.ts:106
Multi-dimensional rollup aggregation (hierarchical subtotals).
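The phrases "hierarchical subtotals" (rollup) and "all grouping-column combinations" (cube) correspond to different sets of grouping columns. The sketch below enumerates those sets on the client for illustration only; `rollupSets` and `cubeSets` are hypothetical helpers, and the real aggregation is planned and executed by Spark.

```typescript
// Grouping sets for rollup: every prefix of the column list, down to the grand total.
function rollupSets(columns: readonly string[]): string[][] {
  const sets: string[][] = [];
  for (let len = columns.length; len >= 0; len--) {
    sets.push(columns.slice(0, len));
  }
  return sets;
}

// Grouping sets for cube: every subset of the column list (2^n sets).
function cubeSets(columns: readonly string[]): string[][] {
  const sets: string[][] = [];
  const total = 1 << columns.length;
  for (let mask = total - 1; mask >= 0; mask--) {
    // Bit i (counted from the left) decides whether column i is part of the set.
    sets.push(columns.filter((_, i) => (mask & (1 << (columns.length - 1 - i))) !== 0));
  }
  return sets;
}

console.log(rollupSets(["country", "city"]));
// [ [ 'country', 'city' ], [ 'country' ], [] ]
console.log(cubeSets(["country", "city"]));
// [ [ 'country', 'city' ], [ 'country' ], [ 'city' ], [] ]
```

The empty set is the grand total. With two columns, cube adds the per-city subtotal that rollup omits, and the gap widens quickly as columns are added (n + 1 sets versus 2^n).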
Parameters
| Parameter | Type |
|---|---|
| ...columns | (string \| Column)[] |
Returns
GroupedData
sameSemantics()
sameSemantics(other): Promise<boolean>;
Defined in: data-frame.ts:750
Returns true if both DataFrames have the same logical plan.
Parameters
| Parameter | Type |
|---|---|
| other | DataFrame |
Returns
Promise<boolean>
sample()
sample(fraction, withReplacement?, seed?): DataFrame;
Defined in: data-frame.ts:338
Return a random sample of rows.
Parameters
| Parameter | Type | Default value |
|---|---|---|
| fraction | number | undefined |
| withReplacement | boolean | false |
| seed? | number | undefined |
Returns
DataFrame
schema()
schema(): Promise<Record<string, unknown>>;
Defined in: data-frame.ts:934
Return the schema of the DataFrame as a plain object. Uses the AnalyzePlan.Schema RPC to resolve column names and types without executing the query.
Returns
Promise<Record<string, unknown>>
select()
select(...columns): DataFrame;
Defined in: data-frame.ts:84
Project (select) a subset of columns.
Parameters
| Parameter | Type |
|---|---|
| ...columns | (string \| Column)[] |
Returns
DataFrame
selectExpr()
selectExpr(...exprs): DataFrame;
Defined in: data-frame.ts:421
Select columns using SQL expression strings. Each string is parsed by the server as an expression.
Parameters
| Parameter | Type |
|---|---|
| ...exprs | string[] |
Returns
DataFrame
Example
df.selectExpr("age * 2 as doubled_age", "name")
semanticHash()
semanticHash(): Promise<number>;
Defined in: data-frame.ts:760
Returns a hash code of the logical plan.
Returns
Promise<number>
show()
show(numRows?, truncate?): Promise<void>;
Defined in: data-frame.ts:974
Pretty-print the first numRows rows to the console as an ASCII table.
Mirrors PySpark’s df.show() behaviour. If truncate is true, strings longer than 20 characters are truncated and end with "...".
Parameters
| Parameter | Type | Default value |
|---|---|---|
| numRows | number | 20 |
| truncate | boolean | true |
Returns
Promise<void>
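The truncation rule for show() can be sketched as a small helper. This is an assumption-laden illustration: `truncateCell` is a hypothetical function, and it follows Spark's own convention, in which a truncated cell is 20 characters wide in total, including the three dots. This page does not confirm that this client uses the same cut-off arithmetic.

```typescript
// Assumption: follows Spark's convention, where a truncated cell is `width`
// characters long in total, including the trailing "...".
function truncateCell(value: string, width = 20): string {
  if (value.length <= width) return value; // short enough: unchanged
  if (width < 4) return value.slice(0, width); // no room for the dots
  return value.slice(0, width - 3) + "...";
}

console.log(truncateCell("short")); // short
console.log(truncateCell("a string that is far too long")); // a string that is ...
```

Passing truncate = false to show() would skip this step entirely and print each value at full length.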
sort()
sort(...columns): DataFrame;
Defined in: data-frame.ts:124
Sort by one or more columns (ascending by default). Use col("x").desc() for descending order.
Parameters
| Parameter | Type |
|---|---|
| ...columns | (string \| Column)[] |
Returns
DataFrame
sortWithinPartitions()
sortWithinPartitions(...columns): DataFrame;
Defined in: data-frame.ts:451
Sort within each partition (non-global sort).
Parameters
| Parameter | Type |
|---|---|
| ...columns | (string \| Column)[] |
Returns
DataFrame
summary()
summary(...statistics): DataFrame;
Defined in: data-frame.ts:557
Compute specified statistics for numeric and string columns.
Parameters
| Parameter | Type |
|---|---|
| ...statistics | string[] |
Returns
DataFrame
tail()
tail(n): Promise<Row[]>;
Defined in: data-frame.ts:891
Return the last n rows as an array.
Maps to Spark Connect’s Relation.Tail.
Parameters
| Parameter | Type |
|---|---|
| n | number |
Returns
Promise<Row[]>
take()
take(n): Promise<Row[]>;
Defined in: data-frame.ts:882
Return the first n rows as an array.
Alias for head(). Matches PySpark’s take() semantics.
Parameters
| Parameter | Type |
|---|---|
| n | number |
Returns
Promise<Row[]>
toDF()
toDF(...columnNames): DataFrame;
Defined in: data-frame.ts:374
Return a new DataFrame with renamed columns (positional).
Parameters
| Parameter | Type |
|---|---|
| ...columnNames | string[] |
Returns
DataFrame
toLocalIterator()
toLocalIterator(): AsyncIterableIterator<Row>;
Defined in: data-frame.ts:827
Async iterator that yields rows one at a time. Only one batch is in memory at a time.
Returns
AsyncIterableIterator<Row>
Example
for await (const row of df.toLocalIterator()) {
  console.log(row);
}
transform()
transform<T>(fn): T;
Defined in: data-frame.ts:444
Apply a user-defined function to this DataFrame and return the result.
This is purely client-side; it just calls fn(this), which enables fluent pipeline composition.
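Because transform() only calls fn(this), its mechanics fit in a few lines. The sketch below uses a hypothetical `Frame` class with an `addStep` helper in place of the real DataFrame; only the shape of transform() mirrors the description above, and the stage names are borrowed from this method's example.

```typescript
// Hypothetical minimal frame; only the transform() mechanics mirror the description.
class Frame {
  readonly steps: readonly string[];

  constructor(steps: readonly string[] = []) {
    this.steps = steps;
  }

  // Stand-in for any transformation: returns a new frame with one more step.
  addStep(step: string): Frame {
    return new Frame([...this.steps, step]);
  }

  // Purely client-side: hand this frame to fn and return whatever fn returns.
  transform<T extends Frame>(fn: (df: Frame) => T): T {
    return fn(this);
  }
}

// Reusable pipeline stages written as plain functions.
const withDoubledAge = (df: Frame): Frame => df.addStep("doubled_age");
const withSalaryBand = (df: Frame): Frame => df.addStep("salary_band");

const result = new Frame().transform(withDoubledAge).transform(withSalaryBand);
console.log(result.steps); // [ 'doubled_age', 'salary_band' ]
```

Writing stages as standalone functions keeps them testable in isolation and lets a pipeline read left to right instead of as nested calls such as withSalaryBand(withDoubledAge(df)).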
Type Parameters
| Type Parameter |
|---|
| T extends DataFrame |
Parameters
| Parameter | Type |
|---|---|
| fn | (df) => T |
Returns
T
Example
df.transform(withDoubledAge).transform(withSalaryBand)
union()
union(other): DataFrame;
Defined in: data-frame.ts:282
Return a new DataFrame with rows from both this and other (duplicates kept).
Parameters
| Parameter | Type |
|---|---|
| other | DataFrame |
Returns
DataFrame
unionAll()
unionAll(other): DataFrame;
Defined in: data-frame.ts:287
Alias for union().
Parameters
| Parameter | Type |
|---|---|
| other | DataFrame |
Returns
DataFrame
unionByName()
unionByName(other, allowMissingColumns?): DataFrame;
Defined in: data-frame.ts:292
Union by column name (rather than position), keeping duplicates.
Parameters
| Parameter | Type | Default value |
|---|---|---|
| other | DataFrame | undefined |
| allowMissingColumns | boolean | false |
Returns
DataFrame
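Matching by name rather than position matters when two inputs list the same columns in a different order. The sketch below models that on plain objects; the local `unionByName` function is a hypothetical stand-in, and it assumes Spark's behaviour for allowMissingColumns (absent columns are filled with null, and a mismatch without the flag is an error), which this page does not spell out.

```typescript
type Row = Record<string, unknown>;

// Align rows by column name instead of position.
// Assumption: with allowMissingColumns, absent columns are filled with null;
// without it, differing column sets are an error.
function unionByName(left: Row[], right: Row[], allowMissingColumns = false): Row[] {
  const leftCols = Object.keys(left[0] ?? {});
  const rightCols = Object.keys(right[0] ?? {});
  const inLeft = new Set(leftCols);
  const inRight = new Set(rightCols);
  const mismatched = [
    ...leftCols.filter((c) => !inRight.has(c)),
    ...rightCols.filter((c) => !inLeft.has(c)),
  ];
  if (mismatched.length > 0 && !allowMissingColumns) {
    throw new Error(`columns not present on both sides: ${mismatched.join(", ")}`);
  }
  // Output column order: left columns first, then columns only the right side has.
  const allCols = [...leftCols, ...rightCols.filter((c) => !inLeft.has(c))];
  const align = (row: Row): Row => {
    const out: Row = {};
    for (const c of allCols) out[c] = c in row ? row[c] : null;
    return out;
  };
  return [...left, ...right].map(align);
}

const first: Row[] = [{ id: 1, country: "NO" }];
const second: Row[] = [{ country: "SE", id: 2 }]; // same columns, different order
const third: Row[] = [{ id: 3 }]; // country is missing

console.log(unionByName(first, second));
// [ { id: 1, country: 'NO' }, { id: 2, country: 'SE' } ]
console.log(unionByName(first, third, true));
// [ { id: 1, country: 'NO' }, { id: 3, country: null } ]
```

A positional union() of `first` and `second` would instead pair id with country, which is the mistake unionByName() exists to prevent.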
unpersist()
unpersist(blocking?): Promise<DataFrame>;
Defined in: data-frame.ts:680
Remove this DataFrame from the cache.
Parameters
| Parameter | Type | Default value | Description |
|---|---|---|---|
| blocking | boolean | false | Whether to block until the operation completes |
Returns
Promise<DataFrame>
unpivot()
unpivot(ids, values, variableColumnName, valueColumnName): DataFrame;
Defined in: data-frame.ts:602
Unpivot from wide format to long format.
Parameters
| Parameter | Type |
|---|---|
| ids | (string \| Column)[] |
| values | (string \| Column)[] \| undefined |
| variableColumnName | string |
| valueColumnName | string |
Returns
DataFrame
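The wide-to-long reshaping is easier to follow on a concrete row. The sketch below applies the same idea to plain objects; the local `unpivot` function is a hypothetical stand-in that takes column names only (no Column expressions) and requires an explicit values list, so it does not cover the case where values is undefined.

```typescript
type Row = Record<string, unknown>;

// Wide to long: one output row per (input row, value column) pair.
function unpivot(
  rows: Row[],
  ids: string[],
  values: string[],
  variableColumnName: string,
  valueColumnName: string,
): Row[] {
  const out: Row[] = [];
  for (const row of rows) {
    for (const column of values) {
      const longRow: Row = {};
      for (const id of ids) longRow[id] = row[id]; // identifier columns are repeated
      longRow[variableColumnName] = column; // which column the value came from
      longRow[valueColumnName] = row[column]; // the value itself
      out.push(longRow);
    }
  }
  return out;
}

const wide: Row[] = [{ city: "Oslo", q1: 10, q2: 12 }];
console.log(unpivot(wide, ["city"], ["q1", "q2"], "quarter", "sales"));
// [ { city: 'Oslo', quarter: 'q1', sales: 10 },
//   { city: 'Oslo', quarter: 'q2', sales: 12 } ]
```

Each input row produces one output row per value column, so the result has rows × values.length rows.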
where()
where(condition): DataFrame;
Defined in: data-frame.ts:79
Alias for filter().
Parameters
| Parameter | Type |
|---|---|
| condition | Column |
Returns
DataFrame
withColumn()
withColumn(name, expression): DataFrame;
Defined in: data-frame.ts:206
Add or replace a column.
Parameters
| Parameter | Type |
|---|---|
| name | string |
| expression | Column |
Returns
DataFrame
Example
df.withColumn("doubled", col("value").multiply(lit(2)))
withColumnRenamed()
withColumnRenamed(existing, newName): DataFrame;
Defined in: data-frame.ts:230
Rename a single column.
Parameters
| Parameter | Type |
|---|---|
| existing | string |
| newName | string |
Returns
DataFrame
withColumns()
withColumns(colMap): DataFrame;
Defined in: data-frame.ts:217
Add or replace multiple columns at once.
Parameters
| Parameter | Type |
|---|---|
| colMap | Record<string, Column> |
Returns
DataFrame
withColumnsRenamed()
withColumnsRenamed(colsMap): DataFrame;
Defined in: data-frame.ts:243
Rename multiple columns at once.
Parameters
| Parameter | Type | Description |
|---|---|---|
| colsMap | Record<string, string> | Mapping of { existingName: newName } |
Returns
DataFrame
writeTo()
writeTo(tableName): DataFrameWriterV2;
Defined in: data-frame.ts:646
Returns a DataFrameWriterV2 for writing to the given table using the DataSource V2 API (catalog-aware, supports create/replace/append/overwrite).
Parameters
| Parameter | Type |
|---|---|
| tableName | string |