Allow passing empty string as match argument to String.replace/4#6559
Conversation
|
Thank you @dkarter! Does any of the other languages do the interspersing behaviour? The reason I am asking you is because those results are seemingly inconsistent. For example: Or if I were to define a property, it would be written like this: check all string <- string(:ascii) do
assert String.replace(string, string, "x") == "x"
assert String.replace_leading(string, string, "x") == "x"
assert String.replace_trailing(string, string, "x") == "x"
endAnd that seems to be true on all cases, unless the string is empty. If we think in properties, than we would conclude that if an empty string is given to replace_leading and replace_trailing, then it should always prepend or append the replacement. This is what replace_suffix and replace_prefix does and it would also reflect this property: check all left <- string(:ascii),
right <- string(:ascii),
left != right do
assert String.replace_leading(left <> right, left, "x") == "x" <> right
assert String.replace_trailing(left <> right, right, "x") == left <> "x"
endAnd again, this is always true except for the empty string in the chosen implementation. |
|
I am inclined to say this is not a good idea. If we look at this from a properties point of view, one property of check all str1 <- string(),
str2 <- string(),
substr1 <- substring(str1) do
assert String.contains?(String.replace(str1, substr1, str2), str2)
endThis property should hold, but it doesn't if This is only one way of looking at the problem so I might be looking at it from the wrong angle here. |
b9105d9 to
96d3964
Compare
|
@josevalim Thank you for your input. Very valid points. To answer your first question: yes - In my tests I have found Ruby, Rust, Python and Java to support the interspersing functionality, SQL, Swift and PHP support replacement with empty string but do not intresperse. I've adjusted the code for Although you mentioned property based testing becoming part of Elixir 1.6 in ElixirConf I am not sure how to run those on @whatyouhide Thank you for reviewing the code. I am not entirely sure what you mean here:
Particularly I do not understand the "infinite" replacement statement. The code as it is in this patch will return a string containing Can you please provide an example of an input and what you expect the output would be vs what it is using this patch? That would be very helpful for me. |
|
I believe there are two discussions here. One about replace and another about replace_leading/replace_trailing. Shall we make two separate issues and start again? Currently we have an inconsistency when comparing replace with the suffix and prefix ones. In particular, should this property hold? check all string <- string(:ascii) do
assert String.replace(string, string, "x") == "x"
assert String.replace_prefix(string, string, "x") == "x"
assert String.replace_suffix(string, string, "x") == "x"
endIn my opinion we should be consistent, so we need to either stop supporting supporting empty strings on prefix and suffix, or support them altogether. The replace_suffix and replace_prefix is a bit more complicated because of the reasons stated. |
|
There's also another property to consider: if I replace some value in the string, it shouldn't be present in it anymore. Or in code: check all string <- string(:ascii), replace <- string(:ascii) do
# use ą as the replacement since it can't be generated by the ascii generators
replaced = String.replace(string, replace, "ą")
refute String.contains?(replaced, replace)
endThis does not hold with empty "replace" argument, since |
|
@michalmuskala I would say that property only holds if the |
This is a follow up to elixir-lang#6559. Add support for using empty string as a match to `String.replace_leading` and `String.replace_trailing` while matching behavior to `String.replace_prefix` and `String.replace_suffix`.
96d3964 to
14b786a
Compare
String.replace
|
@josevalim As per your suggestion, I have split the The main point I hope to get across is that using an exception for |
There was a problem hiding this comment.
I would rather implement this ourselves instead of dispatching to regex. Something like:
defp intersperse(subject, replacement) do
case next_grapheme(subject) do
{current, rest} -> [current, replacement | intersperse(rest, replacement)]
nil -> []
end
endAnd then you call IO.iodata_to_binary [replacement | intersperse(subject, replacement)] on this clause.
PS: I haven't tested this.
There was a problem hiding this comment.
I like your solution, next_grapheme is cool.
The tests did not pass with your solution when the global: false option is provided, so I had to add another pattern match:
assert String.replace("LICENSE","","x") == "xLxIxCxExNxSxEx"
assert String.replace("LICENSE","","x", global: true) == "xLxIxCxExNxSxEx"
assert String.replace("LICENSE","","x", global: false) == "xLICENSE"
# ^^^ did not passThis is what I ended up with if we are not going to call out to Regex.replace via my original solution:
def replace(subject, pattern, replacement, options \\ [])
def replace(subject, "", "", _), do: subject
def replace(subject, "", replacement, global: false), do: replacement <> subject
def replace(subject, "", replacement, _options) do
IO.iodata_to_binary [replacement | intersperse(subject, replacement)]
end
def replace(subject, pattern, replacement, options) when is_binary(replacement) do
if Regex.regex?(pattern) do
Regex.replace(pattern, subject, replacement, global: options[:global])
else
opts = translate_replace_options(options)
:binary.replace(subject, pattern, replacement, opts)
end
end
defp intersperse(subject, replacement) do
case next_grapheme(subject) do
{current, rest} -> [current, replacement | intersperse(rest, replacement)]
nil -> []
end
endThis passes all the tests, and although it is longer it is definitely more explicit. If that looks good to you I'll revise my commit. @josevalim
There was a problem hiding this comment.
Beautiful. We should not pattern match on keyword lists though, so it is probably better for you to do:
def replace(subject, "", replacement, options) do
if Keyword.get(option, :global, true) do
IO.iodata_to_binary [replacement | intersperse(subject, replacement)]
else
replacement <> subject
end
endor something like that.
There was a problem hiding this comment.
@josevalim I've committed the changes you requested and added some documentation. Please let me know if you have any other concerns or anything else you would like me to change. Thanks!
fb25625 to
4ad148c
Compare
This patch allows calling `String.replace` with an empty string as the
match argument.
This is particularly useful when the match expression is being passed
down as variable value that could be an empty string.
For example assume we are processing file names and trying to remove the
extension (get the basename - assuming Path.basename/2 function did not
exist):
Before patch:
```elixir
inputs = ["README.md", "package.json", "LICENSE"]
inputs
|> Enum.map(fn (file) ->
ext = Path.extname(file)
if ext == "" do
file
else
String.replace(file, ext, ""))
end
end)
```
After patch:
```elixir
inputs = ["README.md", "package.json", "LICENSE"]
inputs
|> Enum.map(&(String.replace(&1, Path.extname(&1), "")))
```
When empty string is provided for `match` and non-empty string is provided for `replacement`.
4ad148c to
d8683d6
Compare
| interspersed. If an empty string is provided as `replacement` the `subject` | ||
| will be returned: | ||
|
|
||
| iex> String.replace("ELIXIR", "", ".") |
There was a problem hiding this comment.
I'm wondering if it is not consistent with how split/2 sees empty strings inside:
String.split("ELIXIR", "")
#=> ["E", "L", "I", "X", "I", "R", ""]Should not String.replace("ELIXIR", "", ".") return "E.L.I.X.I.R." then? 🤔
There was a problem hiding this comment.
@lexmag that however does not mirror how replace_suffix and replace_prefix works today. In Rust we would get an empty string before and after, so I would rather say it is a bug on how split works.
There was a problem hiding this comment.
We can't change how split works because it is deeply grained into how split with regex works.
The other point to consider is that changing split may not be the way to go depending on how we want to write the properties. For example, this holds today:
check all string <- string(:unicode) do
assert length(String.split(string, "")) == String.length + 1
endBut if we rewrite split to have an empty string before and after, then it no longer holds. The main point here still holds though, replace_suffix, replace_prefix and replace are inconsistent.
There was a problem hiding this comment.
But if we rewrite split to have an empty string before and after, then it no longer holds.
If todays property holds, it does not mean that it is correct though; with fixed splice we would have + 2 instead of + 1, thus we still have a property.
Maybe we should change Regex.split/3 as well, considering that Regex.replace(~r//, "qwe", ".") sees an empty string before and returns ".q.w.e."?
There was a problem hiding this comment.
@lexmag We can generalize the property above:
check all string <- string(:unicode), count <- integer() do
duplicate = String.duplicate(string, count)
assert String.split(duplicate, string)) == count + 1
endThe only way the property above holds is if the previous property also holds. We can generally say that the number of results in split is the number of occurrences of the pattern + 1. Adding an empty string before and after makes no sense, I am afraid.
There was a problem hiding this comment.
Btw, great catch regarding Regex.replace. That puts me clearly in the camp of merging this PR. Split and replace will have the inconsistency we have been debating but this inconsistency exists today and I would prefer to make all replacements consistent between them and all splits consistent between them.
String.replace|
It would really help me review this if we could decide on a set of properties that |
|
@whatyouhide I would like to see this being consistent: check all string <- string(:ascii) do
assert String.replace(string, string, "x") == "x"
assert String.replace_suffix(string, string, "x") == "x"
assert String.replace_prefix(string, string, "x") == "x"
end |
|
@josevalim can we generalize it to say check all string <- string(:ascii),
replacement <- string(:ascii) do
assert String.replace(string, string, replacement) == replacement
# same for _prefix and _suffix
end? |
|
@whatyouhide it doesn't matter in my opinion... so whatever you prefer. |
| "a[,,]b[,,]c" | ||
|
|
||
| When an empty string is provided as a `pattern`, the function will treat it as | ||
| an implicit empty string between each character and the string will be |
There was a problem hiding this comment.
Should we say codepoint instead of character here?
There was a problem hiding this comment.
IMO "character" is alright, but if we want to be more specific it should be "grapheme", not "codepoint".
There was a problem hiding this comment.
As I see we use "character" in other places.
There was a problem hiding this comment.
I guess we need to go over String docs since our use of "character" is inconsistent: usually it stands for grapheme, sometimes—codepoint. 🤔
|
❤️ 💚 💙 💛 💜 |
Summary
This patch allows calling
String.replacewith an empty string as the match argument.Reason for patch
When I found out that passing an empty string as a match argument to
replacethrows anArgumentErrorI was surprised by this unexpected behavior.Many languages support this feature which allows supplying match arguments coming from dynamic sources.
For example assume we are processing file names and trying to remove the
extension (assuming
Path.basename/2function did not exist):Before patch:
To fix it we would have to transform the code to something like this:
After patch:
Examples from other languages
Ruby
JavaScript
Go
Rust
Java
Lua
SQL
Swift
Objective C
Using empty string match for interspersing
This patch also adds support for interspersing every character using the empty string as match - as in many other languages (Ruby, Rust, Java, Python, Lua...):
Ruby
After this patch:
Thank you for your consideration and time.
Edit: added more languages as examples