Summary
array_repeat is marked as Incompatible in Comet, but the specific incompatibility is not documented. This issue tracks documenting and potentially fixing the behavior difference.
Spark Specification
According to Spark's array_repeat behavior:
- Returns an array with the element repeated count times
- Negative counts are treated as 0, returning an empty array
- Returns null if count is null
Examples:
SELECT array_repeat('hello', 3);
-- Spark returns: ["hello", "hello", "hello"]
SELECT array_repeat('test', 0);
-- Spark returns: []
SELECT array_repeat('item', -1);
-- Spark returns: [] (negative count treated as 0)
SELECT array_repeat('test', null);
-- Spark returns: null
Current Comet Behavior
Comet uses DataFusion's array_repeat function. Its behavior for negative counts has not been verified and may differ from Spark:
- DataFusion may return an error for negative counts
- Or DataFusion may return a result other than an empty array
Tests
The test suite includes:
checkSparkAnswerAndOperator(sql("SELECT array_repeat(_4, 0) from t1"))
However, the current test file does not appear to contain any tests for negative counts.
Possible Solutions
- Verify actual behavior: test array_repeat(x, -1) in both Spark and Comet
- Pre-processing: wrap the count with GREATEST(count, 0) to treat negative as 0
- Custom Rust implementation that handles negative counts like Spark
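For illustration, the Spark semantics listed above can be modeled on plain Rust types. This is only a sketch of the target behavior, not Comet's or DataFusion's actual code: the function name array_repeat_spark is hypothetical, Option stands in for SQL NULL, and a real implementation would operate on Arrow arrays rather than a Vec.

```rust
/// Hypothetical model of Spark's array_repeat semantics as described in this issue.
/// `None` stands in for SQL NULL; this does not use any Comet or DataFusion API.
fn array_repeat_spark<T: Clone>(element: T, count: Option<i32>) -> Option<Vec<T>> {
    // A null count yields a null result.
    let count = count?;
    // Negative counts are clamped to 0, the same effect as GREATEST(count, 0).
    let n = count.max(0) as usize;
    // Repeat the element n times; n == 0 produces an empty array.
    Some(vec![element; n])
}
```

Under this model, array_repeat_spark("item", Some(-1)) yields an empty vector and array_repeat_spark("test", None) yields None, matching the Spark examples in this issue.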
Note: This issue was generated with AI assistance.