Improve the API script to handle | characters (#718)
### Summary

This PR changes how we handle pipes inside math expressions when we
convert Sphinx HTML into markdown. Some math expressions use pipes to
define quantum states in Dirac notation, and we need to escape
those characters so they don't break the page when they appear
inside a markdown table.

### Details

One option was to handle only the `|` characters that appear inside a
table. However, math expressions can sit inside nested tags (e.g.
`<td> <p> <span class="math"> SOME_EXPRESSION </span></p></td>`), so
that approach would make the script more complex without guaranteeing
that we catch every case that could make the page fail to render. I
decided instead to handle the character the same way in all math
expressions.

The PR replaces the `|` character with `\vert `, which renders as the
same character. `\vert` needs a trailing space to handle cases where
the pipe is immediately followed by a letter. In those cases, we must
avoid converting `|x` to `\vertx`, which is not a valid command
(`\vert x` is the correct conversion).
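
To illustrate why the trailing space matters, here is a minimal sketch that compares the two replacement strings on the same input. The variable names are hypothetical, and the regex is deliberately simplified to match every pipe, so this is not the exact pattern used in the PR:

```ts
// Input as it would appear at runtime: |x\rangle
const input = "|x\\rangle";

// Without the trailing space the command name fuses with the next letter.
const withoutSpace = input.replace(/\|/g, "\\vert");

// With the trailing space the command name ends before the letter.
const withSpace = input.replace(/\|/g, "\\vert ");

console.log(withoutSpace); // \vertx\rangle
console.log(withSpace); // \vert x\rangle
```

TeX reads a control word as a backslash followed by the longest run of letters, so `\vertx` is parsed as a single, undefined command, while the space in `\vert x` ends the command name.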

We also need to leave some `|` characters untouched: an escaped pipe
(`\|`) represents a double pipe (`||`), which is used in mathematical
expressions such as the length of a vector.

This is the regex used, which only matches pipe characters not preceded
by a backslash:
```ts
/(?<!\\)\|/gm
```
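
As a rough sketch of how this pattern behaves, the snippet below wraps the same `replace` call in a helper and runs it on inputs taken from the new test. The helper name `escapeMathPipes` is hypothetical; the PR inlines the call in `htmlToMd.ts` instead:

```ts
// Hypothetical helper wrapping the replacement added in this PR.
function escapeMathPipes(value: string): string {
  // The negative lookbehind (?<!\\) skips any pipe preceded by a backslash.
  return value.replace(/(?<!\\)\|/gm, "\\vert ");
}

// Single pipe in Dirac notation: replaced.
const single = escapeMathPipes("\\mathcal{Q}^k \\mathcal{A} |0\\rangle");

// Escaped pipes (double bars for a norm): left untouched.
const double = escapeMathPipes("\\|x\\|_2");

// Both pipes of a bra-ket pair are replaced.
const braKet = escapeMathPipes("|0 \\rangle \\langle 0|");

console.log(single); // \mathcal{Q}^k \mathcal{A} \vert 0\rangle
console.log(double); // \|x\|_2
console.log(braKet); // \vert 0 \rangle \langle 0\vert (plus a trailing space)
```

Lookbehind assertions are supported in Node.js, which is presumably where the script runs, but not in every older JavaScript engine.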

A new test was added to verify the different cases where a pipe
character can appear. The test checks `|` characters outside math
expressions, in math expressions inside a table, and in math
expressions outside the table. It also includes a case where we
intentionally use `\|` to create a double pipe. The following
screenshot shows the rendered result of the test.


![test-example](https://github.com/Qiskit/documentation/assets/47946624/9603819c-4fb6-4d6a-997d-dc4fdcf1c3cd)

Closes #488
arnaucasau authored Jan 30, 2024
1 parent 6d3a0ab commit f6d67a1
Showing 2 changed files with 49 additions and 2 deletions.
45 changes: 43 additions & 2 deletions scripts/lib/api/htmlToMd.test.ts
@@ -1042,7 +1042,7 @@ bits.</p>
<div role='main'>
<div class="math notranslate nohighlight">
\\[\\begin{split}CCX q_0, q_1, q_2 =
I \\otimes I \\otimes |0 \\rangle \\langle 0| + CX \\otimes |1 \\rangle \\langle 1| =
I \\otimes I \\otimes \vert 0 \\rangle \\langle 0\vert + CX \\otimes \vert 1 \\rangle \\langle 1\vert =
\\begin{pmatrix}
1 &amp; 0 &amp; 0 &amp; 0 &amp; 0 &amp; 0 &amp; 0 &amp; 0\\\\
0 &amp; 1 &amp; 0 &amp; 0 &amp; 0 &amp; 0 &amp; 0 &amp; 0\\\\
@@ -1061,7 +1061,7 @@ bits.</p>
).toMatchInlineSnapshot(`
"$$
\\begin{split}CCX q_0, q_1, q_2 =
I \\otimes I \\otimes |0 \\rangle \\langle 0| + CX \\otimes |1 \\rangle \\langle 1| =
I \\otimes I \\otimes \vert 0 \\rangle \\langle 0\vert + CX \\otimes \vert 1 \\rangle \\langle 1\vert =
\\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\\\
@@ -1427,6 +1427,47 @@ test("test dt tag without id", async () => {
`);
});

test("test replacement of the pipe character for `\vert` on math expressions", async () => {
expect(
await toMd(`
<div role="main">
<p class="rubric">Methods</p>
<table class="longtable docutils align-default">
<colgroup>
<col style="width: 10%" />
<col style="width: 90%" />
</colgroup>
<tbody>
<tr class="row-odd"><td><a class="reference internal" href="#text-with-pipe" title="text with pipe"></td>
<td><p>This is an example of using the | character outside of a math expression</p></td>
</tr>
<tr class="row-even"><td><a class="reference internal" href="#span-tag-math-expressions-with-pipe" title="(span tag) math expressions with pipe"></a></td>
<td><p>Example single pipe: <span class="math">\\(\\mathcal{Q}^k \\mathcal{A} |0\\rangle\\)</span>.</p></td>
<tr class="row-odd"><td><a class="reference internal" href="#span-tag-math-expressions-with-double-pipe" title="(span tag) math expressions with double pipe"></a></td>
<td><p>Example double pipe: The length of the vector x is <span class="math">\\(\\|x\\|_2\\)</span>.</p></td>
</tbody>
</table>
<p>This is a math expression outside the table: <div class="math">\\[\\mathcal{Q}^k \\mathcal{A} |0\\rangle\\]</div></p>
</div>
`),
).toMatchInlineSnapshot(`
"## Methods
| | |
| ---------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
| [](#text-with-pipe "text with pipe") | This is an example of using the \\| character outside of a math expression |
| [](#span-tag-math-expressions-with-pipe "(span tag) math expressions with pipe") | Example single pipe: $\\mathcal{Q}^k \\mathcal{A} \\vert 0\\rangle$. |
| [](#span-tag-math-expressions-with-double-pipe "(span tag) math expressions with double pipe") | Example double pipe: The length of the vector x is $\\|x\\|_2$. |
This is a math expression outside the table:
$$
\\mathcal{Q}^k \\mathcal{A} \\vert 0\\rangle
$$
"
`);
});

async function toMd(html: string) {
return (
await sphinxHtmlToMarkdown({
6 changes: 6 additions & 0 deletions scripts/lib/api/htmlToMd.ts
@@ -71,6 +71,9 @@ async function generateMarkdownFile(
const sufix = "\\)";
if (value.startsWith(prefix) && value.endsWith(sufix)) {
value = value.substring(prefix.length, value.length - sufix.length);
// We need to replace the single `|` characters for `\vert ` to avoid page crashes when
// they are used inside a table. For more information: https://github.com/Qiskit/documentation/issues/488
value = value.replace(/(?<!\\)\|/gm, "\\vert ");
}
return { type: "inlineMath", value };
}
@@ -92,6 +95,9 @@ async function generateMarkdownFile(
const sufix = "\\]";
if (value.startsWith(prefix) && value.endsWith(sufix)) {
value = value.substring(prefix.length, value.length - sufix.length);
// We need to replace the single `|` characters for `\vert ` to avoid page crashes when
// they are used inside a table. For more information: https://github.com/Qiskit/documentation/issues/488
value = value.replace(/(?<!\\)\|/gm, "\\vert ");
}
return { type: "math", value };
}