-
Notifications
You must be signed in to change notification settings - Fork 447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: add documentation for the String.iterator
API
#3300
chore: add documentation for the String.iterator
API
#3300
Conversation
String.iterator
API
src/Init/Data/String/Basic.lean
Outdated
/-- The *full* string the iterator is for. -/ | ||
def toString : Iterator → String | ||
| ⟨s, _⟩ => s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm puzzled by how we have both this and Iterator.s
; that is, I think this is the same as
abbrev toString := Iterator.s
Should the docstring explain which one is preferred?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for forward
and nextn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more places where there are duplicate definitions for inherit_doc
Mathlib CI status (docs):
|
src/Init/Data/String/Basic.lean
Outdated
This position is not necessarily legal for the string, for instance if one keeps calling | ||
`Iterator.next` when `Iterator.atEnd` is true. If the position is not a legal one, then the | ||
current character is `(default : Char)`, similar to `String.get` on an illegal position. -/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This position is not necessarily legal for the string, for instance if one keeps calling | |
`Iterator.next` when `Iterator.atEnd` is true. If the position is not a legal one, then the | |
current character is `(default : Char)`, similar to `String.get` on an illegal position. -/ | |
This position is not necessarily legal for the string, for instance if one keeps calling | |
`Iterator.next` when `Iterator.atEnd` is true. If the position is not valid, then the | |
current character is `(default : Char)`, similar to `String.get` on an invalid position. -/ |
The other docs for strings refer to positions as "valid" or "invalid" rather than "legal" or "illegal" - I think it's good to be consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can see, it's still using the "legal" terminology...
src/Init/Data/String/Basic.lean
Outdated
This position is not necessarily legal for the string, for instance if one keeps calling | ||
`Iterator.next` when `Iterator.atEnd` is true. If the position is not a legal one, then the | ||
current character is `(default : Char)`, similar to `String.get` on an illegal position. -/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the wording here should discourage invalid positions a bit more strongly. There is no proof witness in the type asserting that the position is valid, but it should be unequivocally a logic bug to have such a misaligned iterator and all string functions should be allowed to have arbitrary behavior in such cases.
This position is not necessarily legal for the string, for instance if one keeps calling | |
`Iterator.next` when `Iterator.atEnd` is true. If the position is not a legal one, then the | |
current character is `(default : Char)`, similar to `String.get` on an illegal position. -/ | |
An iterator is *valid* if the position `i` is valid for the string `s`. Most operations | |
on iterators return garbage if the iterator is not valid. If you use the functions in the API here, | |
you should not have to deal with invalid iterators, with the notable exception of `Iterator.next`, | |
which produces an invalid iterator if the iterator is already at the end of the string. -/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand what you're saying but I find this a bit tricky. I think one expects "position i
is valid for string s
" to mean that i
is in-bounds, but then "".iter
is invalid, and so is "1".iter.next
although "1".iter.atEnd
is false.
So you mean that i
is valid for s
if i
is in-bounds or i
is the successor of the index of the last character in s
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A p : String.Pos
is valid for s : String
when 0 <= p <= s.endPos
(aka p
is in-bounds) and p
lies on a UTF8 byte boundary. This is defined in std. (In fact all of the validity claims in this file are proved in std, so it might be a good idea to double check there to ensure you make the right claim.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there are two options of validity here: valid for character access, and valid for iterator comparison (which also allows atEnd
).
Maybe the C++ documentation has some terminology for this distinction that we can copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In C++ a pointer is considered valid (not immediate UB) if it is within the allocation or one-past-the-end. I'd rather not borrow the one-past-the-end terminology though, it's very awkward and reinforces the idea that a pointer points to an element rather than a separation point "between characters". Std currently does not have any terminology for "valid and not at the end"; if necessary it would just say that as two things. String.get_of_valid
is not stated in a way that directly uses String.Pos.Valid
because it needs to know more specifically what is the list of characters on the right of the partition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The definition of string position validity should also be moved or copied to the String.Pos
docs. Also both should forward reference the String.Pos.Valid
/ String.Iterator.Valid
definitions in std.
Your last point is missing an is invalid
:
-- `Iterator.forward iter n`/`Iterator.nextn iter n` if `n` is strictly greater than the number of
+- `Iterator.forward iter n`/`Iterator.nextn iter n` is invalid if `n` is strictly greater than the number of
remaining characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Forgot to document the exceptions...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's no need to emphasize *at the end of the string*
here - this is an ordinary and expected state for an iterator to have.
How about this minor edit:
An iterator is *valid* if its current position `i` is valid for the string `s`, meaning `0 ≤ i ≤ s.endPos`
and `i` lies on a UTF8 byte boundary. If `i = s.endPos`, the iterator is at the end of the string.
Most operations on iterators return arbitrary values if the iterator is not valid. The functions in
the `String.Iterator` API should rule out the creation of invalid iterators, with two
exceptions:
- `Iterator.next iter` is invalid if `iter` is already at the end of the string (`iter.atEnd` is
true), and
- `Iterator.forward iter n`/`Iterator.nextn iter n` if `n` is strictly greater than the number of
remaining characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, with the link to std4 removed
src/Init/Data/String/Basic.lean
Outdated
- `Iterator.forward iter n`/`Iterator.nextn iter n` is invalid if `n` is strictly greater than the | ||
number of remaining characters. | ||
|
||
[valid]: https://leanprover-community.github.io/mathlib4_docs/Std/Data/String/Lemmas.html#String.Pos.Valid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't link to external generated doc HTML from docstrings - this will be too fragile. We need to have notation for stable doc links, but we don't have it yet, so for now we should just have the names here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[valid]: https://leanprover-community.github.io/mathlib4_docs/Std/Data/String/Lemmas.html#String.Pos.Valid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we use the same URL as the docs#String.Pos.Valid
linkifier, that is https://leanprover-community.github.io/mathlib4_docs/find/?pattern=String.Pos.Valid#doc , that is as good as we are going to get for the near term (it is designed to be a permalink), and removing the links does users a disservice. Moreover, if and when we get a stable doc link syntax, it won't work here anyway because this is a forward reference to a definition in another repository.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to have links to external domains and downstream projects in Lean - it makes everything too fragile. I totally get the concern here! But we should provide enough information for people to understand the validity of the iterator in these docstrings, not put it elsewhere, and I think we do that now. And this is largely your doing - thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, all doc-gen generated lean documentation is on the leanprover-community web site, for better or worse. If and when we start generating doc-gen for leanprover projects somewhere on the FRO website we can retarget these URLs if desired. (It will be much harder to re-add these links after removing them, compared to retargeting existing URLs.) This is a link from core to std, hardly "external". In fact, ideally users would not even need to know that half of the String API is in Init
(in core) and the other half is in Std
. Validity is an essential characterizing property of these types, and I think it is important to point to the definition itself by name. (If I could, I would put validity directly into the types themselves and save users the trouble of having to ever deal with invalid iterators. But this has its own issues.)
src/Init/Prelude.lean
Outdated
lies on a UTF8 byte boundary. This notion is properly introduced in [std's | ||
`String.Pos.Valid`][valid]. | ||
|
||
[valid]: https://leanprover-community.github.io/mathlib4_docs/Std/Data/String/Lemmas.html#String.Pos.Valid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lies on a UTF8 byte boundary. This notion is properly introduced in [std's | |
`String.Pos.Valid`][valid]. | |
[valid]: https://leanprover-community.github.io/mathlib4_docs/Std/Data/String/Lemmas.html#String.Pos.Valid | |
lies on a UTF8 byte boundary. |
We shouldn't be linking to external URLs like this before we get a way to keep them working over time, even if it would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idem ditto. We've been using docs permalinks on zulip for years.
I went ahead and resolved the two outstanding issues. Thanks again! |
Adds documentation to the
String.Iterator
API, mentored by @eric-wieser and @david-christiansen