Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: add documentation for the String.iterator API #3300

Merged

Conversation

AdrienChampion
Copy link
Contributor

Adds documentation to the String.Iterator API, mentored by @eric-wieser and @david-christiansen

@AdrienChampion AdrienChampion changed the title String iterator doc chore: add documentation for the String.iterator API Feb 10, 2024
Comment on lines 322 to 324
/-- The *full* string the iterator is for. -/
def toString : Iterator → String
| ⟨s, _⟩ => s
Copy link
Contributor

@eric-wieser eric-wieser Feb 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm puzzled by how we have both this and Iterator.s; that is, I think this is the same as

abbrev toString := Iterator.s

Should the docstring explain which one is preferred?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for forward and nextn

Copy link
Contributor

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more places where there are duplicate definitions for inherit_doc

src/Init/Data/String/Basic.lean Outdated Show resolved Hide resolved
src/Init/Data/String/Basic.lean Outdated Show resolved Hide resolved
@github-actions github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Feb 10, 2024
@leanprover-community-mathlib4-bot
Copy link
Collaborator

Mathlib CI status (docs):

  • ❗ Std CI can not be attempted yet, as the nightly-testing-2024-02-10 tag does not exist there yet. We will retry when you push more commits. If you rebase your branch onto nightly-with-mathlib, Std CI should run now. (2024-02-10 16:24:53)

Comment on lines 301 to 303
This position is not necessarily legal for the string, for instance if one keeps calling
`Iterator.next` when `Iterator.atEnd` is true. If the position is not a legal one, then the
current character is `(default : Char)`, similar to `String.get` on an illegal position. -/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This position is not necessarily legal for the string, for instance if one keeps calling
`Iterator.next` when `Iterator.atEnd` is true. If the position is not a legal one, then the
current character is `(default : Char)`, similar to `String.get` on an illegal position. -/
This position is not necessarily legal for the string, for instance if one keeps calling
`Iterator.next` when `Iterator.atEnd` is true. If the position is not valid, then the
current character is `(default : Char)`, similar to `String.get` on an invalid position. -/

The other docs for strings refer to positions as "valid" or "invalid" rather than "legal" or "illegal" - I think it's good to be consistent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can see, it's still using the "legal" terminology...

Comment on lines 301 to 303
This position is not necessarily legal for the string, for instance if one keeps calling
`Iterator.next` when `Iterator.atEnd` is true. If the position is not a legal one, then the
current character is `(default : Char)`, similar to `String.get` on an illegal position. -/
Copy link
Collaborator

@digama0 digama0 Feb 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the wording here should discourage invalid positions a bit more strongly. There is no proof witness in the type asserting that the position is valid, but it should be unequivocally a logic bug to have such a misaligned iterator and all string functions should be allowed to have arbitrary behavior in such cases.

Suggested change
This position is not necessarily legal for the string, for instance if one keeps calling
`Iterator.next` when `Iterator.atEnd` is true. If the position is not a legal one, then the
current character is `(default : Char)`, similar to `String.get` on an illegal position. -/
An iterator is *valid* if the position `i` is valid for the string `s`. Most operations
on iterators return garbage if the iterator is not valid. If you use the functions in the API here,
you should not have to deal with invalid iterators, with the notable exception of `Iterator.next`,
which produces an invalid iterator if the iterator is already at the end of the string. -/

Copy link
Contributor Author

@AdrienChampion AdrienChampion Feb 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what you're saying but I find this a bit tricky. I think one expects "position i is valid for string s" to mean that i is in-bounds, but then "".iter is invalid, and so is "1".iter.next although "1".iter.atEnd is false.

So you mean that i is valid for s if i is in-bounds or i is the successor of the index of the last character in s, right?

Copy link
Collaborator

@digama0 digama0 Feb 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A p : String.Pos is valid for s : String when 0 <= p <= s.endPos (aka p is in-bounds) and p lies on a UTF8 byte boundary. This is defined in std. (In fact all of the validity claims in this file are proved in std, so it might be a good idea to double check there to ensure you make the right claim.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are two options of validity here: valid for character access, and valid for iterator comparison (which also allows atEnd).

Maybe the C++ documentation has some terminology for this distinction that we can copy.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In C++ a pointer is considered valid (not immediate UB) if it is within the allocation or one-past-the-end. I'd rather not borrow the one-past-the-end terminology though, it's very awkward and reinforces the idea that a pointer points to an element rather than a separation point "between characters". Std currently does not have any terminology for "valid and not at the end"; if necessary it would just say that as two things. String.get_of_valid is not stated in a way that directly uses String.Pos.Valid because it needs to know more specifically what is the list of characters on the right of the partition.

Copy link
Collaborator

@digama0 digama0 Feb 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The definition of string position validity should also be moved or copied to the String.Pos docs. Also both should forward reference the String.Pos.Valid / String.Iterator.Valid definitions in std.

Your last point is missing an is invalid:

-- `Iterator.forward iter n`/`Iterator.nextn iter n` if `n` is strictly greater than the number of
+- `Iterator.forward iter n`/`Iterator.nextn iter n` is invalid if `n` is strictly greater than the number of
   remaining characters.

Copy link
Contributor Author

@AdrienChampion AdrienChampion Feb 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Forgot to document the exceptions...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's no need to emphasize *at the end of the string* here - this is an ordinary and expected state for an iterator to have.

How about this minor edit:

An iterator is *valid* if its current position `i` is valid for the string `s`, meaning `0 ≤ i ≤ s.endPos`
and `i` lies on a UTF8 byte boundary. If `i = s.endPos`, the iterator is at the end of the string.

Most operations on iterators return arbitrary values if the iterator is not valid. The functions in
the `String.Iterator` API should rule out the creation of invalid iterators, with two
exceptions:

- `Iterator.next iter` is invalid if `iter` is already at the end of the string (`iter.atEnd` is
  true), and
- `Iterator.forward iter n`/`Iterator.nextn iter n` if `n` is strictly greater than the number of
  remaining characters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, with the link to std4 removed

- `Iterator.forward iter n`/`Iterator.nextn iter n` is invalid if `n` is strictly greater than the
number of remaining characters.

[valid]: https://leanprover-community.github.io/mathlib4_docs/Std/Data/String/Lemmas.html#String.Pos.Valid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't link to external generated doc HTML from docstrings - this will be too fragile. We need to have notation for stable doc links, but we don't have it yet, so for now we should just have the names here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[valid]: https://leanprover-community.github.io/mathlib4_docs/Std/Data/String/Lemmas.html#String.Pos.Valid

Copy link
Collaborator

@digama0 digama0 Feb 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we use the same URL as the docs#String.Pos.Valid linkifier, that is https://leanprover-community.github.io/mathlib4_docs/find/?pattern=String.Pos.Valid#doc , that is as good as we are going to get for the near term (it is designed to be a permalink), and removing the links does users a disservice. Moreover, if and when we get a stable doc link syntax, it won't work here anyway because this is a forward reference to a definition in another repository.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to have links to external domains and downstream projects in Lean - it makes everything too fragile. I totally get the concern here! But we should provide enough information for people to understand the validity of the iterator in these docstrings, not put it elsewhere, and I think we do that now. And this is largely your doing - thanks!

Copy link
Collaborator

@digama0 digama0 Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, all doc-gen generated lean documentation is on the leanprover-community web site, for better or worse. If and when we start generating doc-gen for leanprover projects somewhere on the FRO website we can retarget these URLs if desired. (It will be much harder to re-add these links after removing them, compared to retargeting existing URLs.) This is a link from core to std, hardly "external". In fact, ideally users would not even need to know that half of the String API is in Init (in core) and the other half is in Std. Validity is an essential characterizing property of these types, and I think it is important to point to the definition itself by name. (If I could, I would put validity directly into the types themselves and save users the trouble of having to ever deal with invalid iterators. But this has its own issues.)

Comment on lines 2378 to 2381
lies on a UTF8 byte boundary. This notion is properly introduced in [std's
`String.Pos.Valid`][valid].

[valid]: https://leanprover-community.github.io/mathlib4_docs/Std/Data/String/Lemmas.html#String.Pos.Valid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lies on a UTF8 byte boundary. This notion is properly introduced in [std's
`String.Pos.Valid`][valid].
[valid]: https://leanprover-community.github.io/mathlib4_docs/Std/Data/String/Lemmas.html#String.Pos.Valid
lies on a UTF8 byte boundary.

We shouldn't be linking to external URLs like this before we get a way to keep them working over time, even if it would be nice.

Copy link
Collaborator

@digama0 digama0 Feb 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem ditto. We've been using docs permalinks on zulip for years.

@david-christiansen david-christiansen added this pull request to the merge queue Feb 20, 2024
Merged via the queue into leanprover:master with commit a898aa1 Feb 20, 2024
9 checks passed
@david-christiansen
Copy link
Contributor

I went ahead and resolved the two outstanding issues. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants