Allow passing empty string as match argument to String.replace/4 by dkarter · Pull Request #6559 · elixir-lang/elixir

dkarter · 2017-09-10T22:52:18Z

Summary

This patch allows calling String.replace with an empty string as the match argument.

Reason for patch

When I found out that passing an empty string as a match argument to replace throws an ArgumentError I was surprised by this unexpected behavior.

Many languages support this feature which allows supplying match arguments coming from dynamic sources.

For example assume we are processing file names and trying to remove the
extension (assuming Path.basename/2 function did not exist):

Before patch:

inputs = ["README.md", "package.json", "LICENSE"]

inputs
|> Enum.map(&(String.replace(&1, Path.extname(&1), "")))
# ** (ArgumentError) argument error
#    (stdlib) binary.erl:275: :binary.replace/4
#    (elixir) lib/enum.ex:1255: Enum."-map/2-lists^map/1-0-"/2
#    (elixir) lib/enum.ex:1255: Enum."-map/2-lists^map/1-0-"/2

To fix it we would have to transform the code to something like this:

inputs = ["README.md", "package.json", "LICENSE"]
inputs
|> Enum.map(fn (file) ->
  ext = Path.extname(file)
  if ext == "" do
    file
  else
    String.replace(file, ext, ""))
  end
end)

# => ["README", "package", "LICENSE"]

After patch:

inputs = ["README.md", "package.json", "LICENSE"]

inputs
|> Enum.map(&(String.replace(&1, Path.extname(&1), "")))

# => ["README", "package", "LICENSE"]

Examples from other languages

Ruby

'LICENSE'.gsub('','')
# => 'LICENSE'

JavaScript

'LICENSE'.replace('','')
// => 'LICENSE'

Go

fmt.Println(strings.Replace("LICENSE", "", "", -1))
// => "LICENSE"

Rust

fn main() {
    let result = str::replace("LICENSE", "", "");
    println!("{}", result); // => "LICENSE"
}

Java

System.out.println("LICENSE".replace("",""));
// => "LICENSE"

Lua

str = "LICENSE"
res = str:gsub("", "")

io.write(res, "\n")
-- "LICENSE"

SQL

select replace('LICENSE', '', '');
-- LICENSE

Swift

"LICENSE".replacingOccurrences(of: "", with: "") // "LICENSE"

Objective C

NSString *str = [@"LICENSE" stringByReplacingOccurrencesOfString:@"" withString:@""];
NSLog (str);

Using empty string match for interspersing

This patch also adds support for interspersing every character using the empty string as match - as in many other languages (Ruby, Rust, Java, Python, Lua...):

Ruby

'LICENSE'.gsub('','x')
# => 'xLxIxCxExNxSxEx'

After this patch:

String.replace("LICENSE","","x") 
# => "xLxIxCxExNxSxEx"

Thank you for your consideration and time.

Edit: added more languages as examples

josevalim · 2017-09-10T23:22:56Z

Thank you @dkarter! Does any of the other languages do the interspersing behaviour? The reason I am asking you is because those results are seemingly inconsistent.

For example:

String.replace("", "", "x") #=> "x"
String.replace_leading("", "", "x") #=> ""

String.replace("a", "a", "x") #=> "x"
String.replace_leading("a", "a", "x") #=> "x"

Or if I were to define a property, it would be written like this:

check all string <- string(:ascii) do
  assert String.replace(string, string, "x") == "x"
  assert String.replace_leading(string, string, "x") == "x"
  assert String.replace_trailing(string, string, "x") == "x"
end

And that seems to be true on all cases, unless the string is empty.

If we think in properties, than we would conclude that if an empty string is given to replace_leading and replace_trailing, then it should always prepend or append the replacement. This is what replace_suffix and replace_prefix does and it would also reflect this property:

check all left <- string(:ascii),
          right <- string(:ascii),
          left != right do
  assert String.replace_leading(left <> right, left, "x") == "x" <> right
  assert String.replace_trailing(left <> right, right, "x") == left <> "x"
end

And again, this is always true except for the empty string in the chosen implementation.

whatyouhide · 2017-09-10T23:25:01Z

I am inclined to say this is not a good idea. If we look at this from a properties point of view, one property of String.replace/3 could be that

check all str1 <- string(),
          str2 <- string(),
          substr1 <- substring(str1) do
  assert String.contains?(String.replace(str1, substr1, str2), str2)
end

This property should hold, but it doesn't if substr1 == "" because we don't replace it with str2. After all, since String.contains?(any_string, "") == true, the "correct" behaviour for String.replace(any_string, "", replacement) would mean getting "infinite" replacements inside any_string. Does this make sense?

This is only one way of looking at the problem so I might be looking at it from the wrong angle here.

dkarter · 2017-09-11T00:37:53Z

@josevalim Thank you for your input. Very valid points.

To answer your first question: yes - In my tests I have found Ruby, Rust, Python and Java to support the interspersing functionality, SQL, Swift and PHP support replacement with empty string but do not intresperse.

I've adjusted the code for replace_leading and replace_trailing to always prepend/append the replacement to the subject. I believe this should now pass the property tests you described.

Although you mentioned property based testing becoming part of Elixir 1.6 in ElixirConf I am not sure how to run those on master. I am assuming this is not ready yet, but if it is, are there any examples in this repo I can lean on to test my code?

@whatyouhide Thank you for reviewing the code. I am not entirely sure what you mean here:

This property should hold, but it doesn't if substr1 == "" because we don't replace it with str2. After all, since String.contains?(any_string, "") == true, the "correct" behaviour for String.replace(any_string, "", replacement) would mean getting "infinite" replacements inside any_string. Does this make sense?

Particularly I do not understand the "infinite" replacement statement.

The code as it is in this patch will return a string containing str2 if substr1 == "":

iex(8)> String.replace("", "", "foo")
"foo"
iex(9)> String.replace(".", "", "foo")
"foo.foo"
iex(10)> String.contains?(String.replace("", "", "foo"), "foo")
true
iex(11)> String.contains?(String.replace(".", "", "foo"), "foo")
true

Can you please provide an example of an input and what you expect the output would be vs what it is using this patch? That would be very helpful for me.

lexmag · 2017-09-11T08:03:58Z

@dkarter the PR that introduced raising behaviour has a discussion on what infinite empty string is: #5022.

josevalim · 2017-09-11T12:11:56Z

I believe there are two discussions here. One about replace and another about replace_leading/replace_trailing. Shall we make two separate issues and start again?

Currently we have an inconsistency when comparing replace with the suffix and prefix ones. In particular, should this property hold?

check all string <- string(:ascii) do
  assert String.replace(string, string, "x") == "x"
  assert String.replace_prefix(string, string, "x") == "x"
  assert String.replace_suffix(string, string, "x") == "x"
end

In my opinion we should be consistent, so we need to either stop supporting supporting empty strings on prefix and suffix, or support them altogether.

The replace_suffix and replace_prefix is a bit more complicated because of the reasons stated.

michalmuskala · 2017-09-11T12:24:58Z

There's also another property to consider: if I replace some value in the string, it shouldn't be present in it anymore. Or in code:

check all string <- string(:ascii), replace <- string(:ascii) do
  # use ą as the replacement since it can't be generated by the ascii generators
  replaced = String.replace(string, replace, "ą")
  refute String.contains?(replaced, replace)
end

This does not hold with empty "replace" argument, since String.contains?/2 considers an empty string to be always contained in other strings.

josevalim · 2017-09-11T12:41:25Z

@michalmuskala I would say that property only holds if the replace string is not in the replacement and "" is always in the replacement.

This is a follow up to elixir-lang#6559. Add support for using empty string as a match to `String.replace_leading` and `String.replace_trailing` while matching behavior to `String.replace_prefix` and `String.replace_suffix`.

dkarter · 2017-09-11T17:18:31Z

@josevalim As per your suggestion, I have split the String.replace_trailing and String.replace_leading out into it's own PR (#6563), and this PR is now solely focused on String.replace. I hope this makes it easier.

The main point I hope to get across is that using an exception for String.replace leads to more verbose code and potentially unexpected behavior during runtime if a dynamic value is used for match. That argument is strengthened by the fact that there are plenty of precedents in most popular programming languages to support empty string as a match argument to replace.

josevalim · 2017-09-11T20:23:23Z

I would rather implement this ourselves instead of dispatching to regex. Something like:

defp intersperse(subject, replacement) do case next_grapheme(subject) do {current, rest} -> [current, replacement | intersperse(rest, replacement)] nil -> [] end end

And then you call IO.iodata_to_binary [replacement | intersperse(subject, replacement)] on this clause.

PS: I haven't tested this.

I like your solution, next_grapheme is cool.

The tests did not pass with your solution when the global: false option is provided, so I had to add another pattern match:

assert String.replace("LICENSE","","x") == "xLxIxCxExNxSxEx" assert String.replace("LICENSE","","x", global: true) == "xLxIxCxExNxSxEx" assert String.replace("LICENSE","","x", global: false) == "xLICENSE" # ^^^ did not pass

This is what I ended up with if we are not going to call out to Regex.replace via my original solution:

def replace(subject, pattern, replacement, options \\ []) def replace(subject, "", "", _), do: subject def replace(subject, "", replacement, global: false), do: replacement <> subject def replace(subject, "", replacement, _options) do IO.iodata_to_binary [replacement | intersperse(subject, replacement)] end def replace(subject, pattern, replacement, options) when is_binary(replacement) do if Regex.regex?(pattern) do Regex.replace(pattern, subject, replacement, global: options[:global]) else opts = translate_replace_options(options) :binary.replace(subject, pattern, replacement, opts) end end defp intersperse(subject, replacement) do case next_grapheme(subject) do {current, rest} -> [current, replacement | intersperse(rest, replacement)] nil -> [] end end

This passes all the tests, and although it is longer it is definitely more explicit. If that looks good to you I'll revise my commit. @josevalim

Beautiful. We should not pattern match on keyword lists though, so it is probably better for you to do:

def replace(subject, "", replacement, options) do if Keyword.get(option, :global, true) do IO.iodata_to_binary [replacement | intersperse(subject, replacement)] else replacement <> subject end end

or something like that.

@josevalim I've committed the changes you requested and added some documentation. Please let me know if you have any other concerns or anything else you would like me to change. Thanks!

This patch allows calling `String.replace` with an empty string as the match argument. This is particularly useful when the match expression is being passed down as variable value that could be an empty string. For example assume we are processing file names and trying to remove the extension (get the basename - assuming Path.basename/2 function did not exist): Before patch: ```elixir inputs = ["README.md", "package.json", "LICENSE"] inputs |> Enum.map(fn (file) -> ext = Path.extname(file) if ext == "" do file else String.replace(file, ext, "")) end end) ``` After patch: ```elixir inputs = ["README.md", "package.json", "LICENSE"] inputs |> Enum.map(&(String.replace(&1, Path.extname(&1), ""))) ```

When empty string is provided for `match` and non-empty string is provided for `replacement`.

lexmag · 2017-09-12T10:48:53Z

+  interspersed. If an empty string is provided as `replacement` the `subject`
+  will be returned:
+
+      iex> String.replace("ELIXIR", "", ".")


I'm wondering if it is not consistent with how split/2 sees empty strings inside:

String.split("ELIXIR", "") #=> ["E", "L", "I", "X", "I", "R", ""]

Should not String.replace("ELIXIR", "", ".") return "E.L.I.X.I.R." then? 🤔

@lexmag that however does not mirror how replace_suffix and replace_prefix works today. In Rust we would get an empty string before and after, so I would rather say it is a bug on how split works.

@josevalim great.

We can't change how split works because it is deeply grained into how split with regex works.

The other point to consider is that changing split may not be the way to go depending on how we want to write the properties. For example, this holds today:

check all string <- string(:unicode) do assert length(String.split(string, "")) == String.length + 1 end

But if we rewrite split to have an empty string before and after, then it no longer holds. The main point here still holds though, replace_suffix, replace_prefix and replace are inconsistent.

But if we rewrite split to have an empty string before and after, then it no longer holds.

If todays property holds, it does not mean that it is correct though; with fixed splice we would have + 2 instead of + 1, thus we still have a property.
Maybe we should change Regex.split/3 as well, considering that Regex.replace(~r//, "qwe", ".") sees an empty string before and returns ".q.w.e."?

@lexmag We can generalize the property above:

check all string <- string(:unicode), count <- integer() do duplicate = String.duplicate(string, count) assert String.split(duplicate, string)) == count + 1 end

The only way the property above holds is if the previous property also holds. We can generally say that the number of results in split is the number of occurrences of the pattern + 1. Adding an empty string before and after makes no sense, I am afraid.

Btw, great catch regarding Regex.replace. That puts me clearly in the camp of merging this PR. Split and replace will have the inconsistency we have been debating but this inconsistency exists today and I would prefer to make all replacements consistent between them and all splits consistent between them.

whatyouhide · 2017-09-12T11:55:54Z

It would really help me review this if we could decide on a set of properties that replace should hold, write them down, and run them quickly to see if it holds them before or after this PR. What do you folks think?

josevalim · 2017-09-12T12:19:29Z

@whatyouhide I would like to see this being consistent:

check all string <- string(:ascii) do
  assert String.replace(string, string, "x") == "x"
  assert String.replace_suffix(string, string, "x") == "x"
  assert String.replace_prefix(string, string, "x") == "x"
end

whatyouhide · 2017-09-12T12:20:58Z

@josevalim can we generalize it to say

check all string <- string(:ascii),
          replacement <- string(:ascii) do
  assert String.replace(string, string, replacement) == replacement
  # same for _prefix and _suffix
end

?

josevalim · 2017-09-12T12:21:47Z

@whatyouhide it doesn't matter in my opinion... so whatever you prefer.

whatyouhide · 2017-09-12T12:58:36Z

      "a[,,]b[,,]c"

+  When an empty string is provided as a `pattern`, the function will treat it as
+  an implicit empty string between each character and the string will be


Should we say codepoint instead of character here?

IMO "character" is alright, but if we want to be more specific it should be "grapheme", not "codepoint".

As I see we use "character" in other places.

I guess we need to go over String docs since our use of "character" is inconsistent: usually it stands for grapheme, sometimes—codepoint. 🤔

We should use grapheme.

josevalim · 2017-09-12T13:21:02Z

❤️ 💚 💙 💛 💜

dkarter force-pushed the allow-calling-replace-with-empty-string branch from b9105d9 to 96d3964 Compare September 11, 2017 00:15

dkarter mentioned this pull request Sep 11, 2017

String.replace_trailing/leading match empty string #6563

Closed

dkarter force-pushed the allow-calling-replace-with-empty-string branch from 96d3964 to 14b786a Compare September 11, 2017 16:58

dkarter changed the title ~~Allow passing empty string to replace functions~~ Allow passing empty string as match argument String.replace Sep 11, 2017

josevalim reviewed Sep 11, 2017

View reviewed changes

dkarter force-pushed the allow-calling-replace-with-empty-string branch from fb25625 to 4ad148c Compare September 11, 2017 22:14

Dorian Karter added 3 commits September 11, 2017 17:17

Intersperse explicitly on String.replace

d5f51dd

When empty string is provided for `match` and non-empty string is provided for `replacement`.

Document interspersing using String.replace

d8683d6

dkarter force-pushed the allow-calling-replace-with-empty-string branch from 4ad148c to d8683d6 Compare September 11, 2017 22:18

Update CHANGELOG.md

9224cc0

josevalim approved these changes Sep 12, 2017

View reviewed changes

lexmag reviewed Sep 12, 2017

View reviewed changes

lexmag changed the title ~~Allow passing empty string as match argument String.replace~~ Allow passing empty string as match argument String.replace/4 Sep 12, 2017

josevalim mentioned this pull request Sep 12, 2017

Make string split with an empty string consistent with replace #6566

Closed

whatyouhide changed the title ~~Allow passing empty string as match argument String.replace/4~~ Allow passing empty string as match argument to String.replace/4 Sep 12, 2017

whatyouhide reviewed Sep 12, 2017

View reviewed changes

lexmag approved these changes Sep 12, 2017

View reviewed changes

Update string.ex

8911886

josevalim merged commit 63f1d91 into elixir-lang:master Sep 12, 2017

Conversation

dkarter commented Sep 10, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Reason for patch

Examples from other languages

Ruby

JavaScript

Go

Rust

Java

Lua

SQL

Swift

Objective C

Using empty string match for interspersing

Ruby

Uh oh!

josevalim commented Sep 10, 2017

Uh oh!

whatyouhide commented Sep 10, 2017

Uh oh!

dkarter commented Sep 11, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lexmag commented Sep 11, 2017

Uh oh!

josevalim commented Sep 11, 2017

Uh oh!

michalmuskala commented Sep 11, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

josevalim commented Sep 11, 2017

Uh oh!

dkarter commented Sep 11, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

whatyouhide commented Sep 12, 2017

Uh oh!

josevalim commented Sep 12, 2017

Uh oh!

whatyouhide commented Sep 12, 2017

Uh oh!

josevalim commented Sep 12, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lexmag Sep 12, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lexmag Sep 12, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lexmag Sep 12, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

dkarter commented Sep 10, 2017 •

edited

Loading

dkarter commented Sep 11, 2017 •

edited

Loading

michalmuskala commented Sep 11, 2017 •

edited

Loading

dkarter commented Sep 11, 2017 •

edited

Loading

lexmag Sep 12, 2017 •

edited

Loading

lexmag Sep 12, 2017 •

edited

Loading

lexmag Sep 12, 2017 •

edited

Loading