-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add filter_map
, dedup
and some variants to the stdlib
#2120
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -917,6 +917,181 @@ | |
xs | ||
|> std.array.length | ||
|> std.array.generate (fun i => f i (std.array.at i xs)), | ||
|
||
filter_map | ||
: forall a b. (a -> [| 'Some b, 'None |]) -> Array a -> Array b | ||
| doc m%" | ||
Applies a function to every element in the given array, filtering out | ||
`'None` results. `filter_map` combines `std.array.map` and | ||
`std.array.filter` in a single pass. | ||
|
||
# Examples | ||
|
||
```nickel | ||
["1", "hello", "2", "world"] | ||
|> std.array.filter_map (fun x => | ||
if std.string.is_match "^[+-]?\\d+$" x then | ||
'Some (std.string.to_number x) | ||
else | ||
'None | ||
) | ||
# => [ 1, 2 ] | ||
``` | ||
"% | ||
= fun f array => | ||
fold_left | ||
(fun acc x => | ||
f x | ||
|> match { | ||
'Some y => acc @ [y], | ||
'None => acc, | ||
} | ||
) | ||
[] | ||
array, | ||
|
||
dedup | ||
: Array Dyn -> Array Dyn | ||
| doc m%" | ||
Removes duplicates from an array. | ||
|
||
# Performance | ||
|
||
This function relies on equality and has a quadratic complexity in the | ||
length of the array. It might thus be slow for large arrays or if | ||
called repeatedly. Prefer `std.array.sort_dedup` or | ||
`std.array.hash_dedup` if you have efficiency concerns. | ||
|
||
# Examples | ||
|
||
```nickel multiline | ||
std.array.dedup [ 4, 2, 1, 3, 5, 2, 1, 4 ] | ||
# => [ 4, 2, 1, 3, 5 ] | ||
|
||
std.array.dedup [ "hello", "world", "hello" ] | ||
# => [ "hello", "world" ] | ||
``` | ||
"% | ||
= fun array => | ||
let length = %array/length% array in | ||
|
||
fold_left | ||
(fun acc x => | ||
if elem x acc then | ||
acc | ||
else | ||
acc @ [x] | ||
) | ||
[] | ||
array, | ||
|
||
sort_dedup | ||
: forall a. (a -> a -> [| 'Lesser, 'Equal, 'Greater |]) -> Array a -> Array a | ||
| doc m%" | ||
Sorts an array based on the provided comparison operator and removes | ||
duplicates. | ||
|
||
# Performance | ||
|
||
As opposed to `std.array.dedup`, this function has a better time | ||
complexity (`O(n*log(n))` where `n` is the size of the array), which | ||
should improve performance especially on large arrays. If you need to | ||
preserve the original order of the array, see `std.array.hash_dedup`. | ||
|
||
# Examples | ||
|
||
```nickel multiline | ||
std.array.sort_dedup std.number.compare [ 4, 2, 1, 3, 5, 2, 1, 4 ] | ||
# => [ 1, 2, 3, 4, 5 ] | ||
|
||
std.array.sort_dedup std.string.compare [ "world", "hello", "world" ] | ||
# => [ "hello", "world" ] | ||
``` | ||
"% | ||
= fun cmp array => | ||
let sorted = sort cmp array in | ||
let length = %array/length% sorted in | ||
|
||
let rec go = fun acc n => | ||
if n == length then | ||
acc | ||
else | ||
let x = %array/at% sorted n in | ||
|
||
# we would prefer to use the primitive equality here instead of | ||
# relying on `cmp` returning `'Equal`. The latter 1. requires that | ||
# the user-provided function is correct and 2. is more costly. | ||
# | ||
# Unfortunately, that would break parametricity and require to | ||
# substitute `a` for `Dyn` in the type of `sort_dedup`, which makes | ||
# it more annoying to use with otherwise statically typed comparison | ||
# function. So for now, we rely on the comparison function. | ||
let acc = | ||
if n == 0 | ||
|| cmp (%array/at% sorted (n - 1)) x != 'Equal then | ||
acc @ [x] | ||
else | ||
acc | ||
in | ||
|
||
go acc (n + 1) | ||
in | ||
|
||
go [] 0, | ||
|
||
hash_dedup | ||
: forall a. (a -> String) -> Array a -> Array a | ||
| doc m%" | ||
Removes duplicates from an array efficiently. | ||
|
||
# Hash function | ||
|
||
`hash_dedup` uses a "hash" function mapping array elements to strings. | ||
This function is used to efficiently check if an element has already | ||
been seen before (linear time amortized in the size of array versus | ||
quadratic for `std.array.dedup`). | ||
|
||
The hash function must separate distinct elements with very high | ||
probability. If two elements have the same hash, they will be considered | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I find the mention of probability misleading, especially when talking about a "hash" function: in a hash table, having a low collision probability only matters for performance, but here you absolutely need two distinct elements to have different "hash"es or else you'll get the wrong answer. I wonder if it's worth baking in a notion of hashes and sets directly into nickel? It seems like almost as "core" a notion as equality, and we've had a feature request for sets for a while now... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You're right, it's a logic error to not separate unequal keys. In some sense the hash function is the notion of equality at hand. Maybe I shouldn't call this a hash function at all? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Though I don't have a better name in mind that would be discoverable as a function name (mathematically what we want is an injection For the motivation, this function is taken from a real use-case: this is how we check for duplicates |
||
equal and thus deduplicated. | ||
|
||
Although we call it a hash function, it doesn't have to produce good | ||
hash values: the output of this function is used internally to generate | ||
record field names, which are themselves properly hashed again by the | ||
Nickel runtime. | ||
|
||
# Examples | ||
|
||
```nickel multiline | ||
std.array.hash_dedup std.to_string [ 4, 2, 1, 3, 5, 2, 1, 4 ] | ||
# => [ 4, 2, 1, 3, 5 ] | ||
|
||
std.array.hash_dedup std.function.id [ "world", "hello", "world" ] | ||
# => [ "world", "hello" ] | ||
``` | ||
"% | ||
= fun hash array => | ||
let length = %array/length% array in | ||
|
||
let rec go = fun acc n => | ||
if n == length then | ||
acc | ||
else | ||
let x = %array/at% array n in | ||
let hashed = hash x in | ||
|
||
if %record/has_field% hashed acc.seen then | ||
go acc (n + 1) | ||
else | ||
let acc = { | ||
seen = %record/insert% hashed acc.seen null, | ||
result = acc.result @ [x], | ||
} | ||
in | ||
go acc (n + 1) | ||
in | ||
|
||
(go { seen = {}, result = [] } 0).result, | ||
}, | ||
|
||
contract = { | ||
|
@@ -4348,7 +4523,7 @@ | |
%contract/custom% (fun _label _value => 'Error { message = msg }), | ||
|
||
fail_with | ||
| String -> Dyn | ||
| String -> (forall a. a) | ||
| doc m%" | ||
Abort the evaluation with the given message. The error will be reported as | ||
a contract violation, as `fail_with` uses `std.FailWith` under the hood. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just brainstorming here, but what if we had
dedup_sorted : forall a. -> Array a -> Array a
instead (which would do what the innergo
function does, basically)? Then you'd use it likexs |> sort cmp |> dedup_sorted
, but it feels a bit more composable thansort_dedup
because if you know that your array is already sorted then you don't need to sort it again.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we can factor out
dedup_sorted
. I would still keepsort_dedup
as a handy alias, though.