[Incompatibility] Document array_repeat negative count handling #3176

@andygrove

Summary

array_repeat is marked as Incompatible in Comet, but the specific incompatibility is not documented. This issue tracks documenting and potentially fixing the behavior difference.

Spark Specification

Spark's array_repeat behaves as follows:

  • Returns an array with the element repeated count times
  • Negative counts are treated as 0, returning an empty array
  • Returns null if count is null

Examples:

SELECT array_repeat('hello', 3);
-- Spark returns: ["hello", "hello", "hello"]

SELECT array_repeat('test', 0);
-- Spark returns: []

SELECT array_repeat('item', -1);
-- Spark returns: [] (negative count treated as 0)

SELECT array_repeat('test', null);
-- Spark returns: null

Current Comet Behavior

Comet delegates to DataFusion's array_repeat function. Its behavior for negative counts has not been verified and may differ from Spark's:

  • DataFusion may throw an error for negative counts
  • Or DataFusion may return a result other than the empty array that Spark produces

Tests

The test suite includes:

checkSparkAnswerAndOperator(sql("SELECT array_repeat(_4, 0) from t1"))

However, the current test file does not appear to contain any tests for negative counts.

Possible Solutions

  1. Verify the actual behavior: test array_repeat(x, -1) in both Spark and Comet
  2. Pre-process the count: wrap it in GREATEST(count, 0) so that negative values are treated as 0
  3. Write a custom Rust implementation that handles negative counts the way Spark does
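
To make the target semantics concrete, here is a minimal Rust sketch of the Spark behavior described above: a null count yields null, and a negative count is clamped to 0. The function name `spark_array_repeat` and its signature are hypothetical and exist only for illustration; this is not Comet's or DataFusion's actual API, and `Option` merely stands in for SQL NULL.

```rust
/// Hypothetical reference model of the Spark semantics listed in this issue.
/// `None` stands in for SQL NULL; this is not an existing Comet or DataFusion function.
fn spark_array_repeat<T: Clone>(element: T, count: Option<i32>) -> Option<Vec<T>> {
    // A null count produces a null result.
    let count = count?;
    // Negative counts are clamped to 0, the same idea as GREATEST(count, 0).
    let n = count.max(0) as usize;
    // Repeat the element n times; n == 0 gives an empty array.
    Some(vec![element; n])
}
```

Under these assumptions, `spark_array_repeat("item", Some(-1))` yields an empty vector and `spark_array_repeat("test", None)` yields `None`, matching the Spark examples earlier in this issue. A real implementation would operate on Arrow arrays rather than on single values.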

Note: This issue was generated with AI assistance.
