Commit

Merge pull request #457 from CloudCannon/fix/non-ascii-sub-results
Fix sub results for headings containing non-ascii text
bglw authored Sep 27, 2023
2 parents e7cd3f6 + 70a12c0 commit ca805fa
Showing 2 changed files with 56 additions and 7 deletions.
61 changes: 55 additions & 6 deletions pagefind/features/edge_cases.feature
@@ -88,12 +88,12 @@ Feature: Graceful Pagefind Errors
"""
Given I have a "public/ja/index.html" file with the content:
"""
- <!DOCTYPE html>
- <html lang="ja">
- <body>
- <p>Hello&nbsp;👋</p>
- </body>
- </html>
+ <!DOCTYPE html>
+ <html lang="ja">
+ <body>
+ <p>Hello&nbsp;👋</p>
+ </body>
+ </html>
"""
When I run my program
Then I should see "Running Pagefind" in stdout
@@ -113,3 +113,52 @@ Feature: Graceful Pagefind Errors
"""
Then There should be no logs
Then The selector "[data-url]" should contain "/ja/"

# Previously, headings that didn't match \w would be filtered out
Scenario: Pagefind multilingual sub-results
Given I have a "public/index.html" file with the content:
"""
<!DOCTYPE html>
<html lang="fa-IR" dir="rtl">
<body>
<p data-url>Nothing</p>
</body>
</html>
"""
Given I have a "public/test/index.html" file with the content:
"""
<!DOCTYPE html>
<html lang="fa-IR" dir="rtl">
<body>
<h1 id="_top">چامه - آصف آشنا</h1>
<p>هزار سال پس از ماجرای گمشدنت</p>
<h2 id="از">RTL ID</h2>
<p>از پیاله‌ای چای سیاه پررنگ</p>
<h2 id="rtl-content">بیرون نه می‌روی از من</h2>
<p>بیرون نه می‌روی از من</p>
</body>
</html>
"""
When I run my program
Then I should see "Running Pagefind" in stdout
Then I should see the file "public/pagefind/pagefind.js"
When I serve the "public" directory
When I load "/"
When I evaluate:
"""
async function() {
let pagefind = await import("/pagefind/pagefind.js");
let search = await pagefind.search("از");
let results = await Promise.all(search.results.map(r => r.data()));
let result = results[0];
let subs = result.sub_results.map(s => s.url).sort().join(', ');
document.querySelector('[data-url]').innerText = subs;
}
"""
Then There should be no logs
Then The selector "[data-url]" should contain "/test/#%D8%A7%D8%B2, /test/#_top, /test/#rtl-content"
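
The expected string in the final assertion can be reproduced by hand. The sketch below is only an illustration: it assumes each sub-result URL is the page URL plus `#` plus the heading `id` percent-encoded as UTF-8, which is what the expected value suggests. The helper `subResultUrl` is hypothetical and is not part of Pagefind.

```typescript
// Hypothetical helper (not Pagefind's API): builds a sub-result URL by
// percent-encoding the heading id, which is an assumption based on the
// expected value in the scenario above.
const subResultUrl = (pageUrl: string, id: string): string =>
  `${pageUrl}#${encodeURIComponent(id)}`;

// Heading ids taken from the test page in the scenario.
const headingIds: string[] = ["_top", "از", "rtl-content"];

// Same post-processing as the scenario: map to URLs, sort, join.
const joined: string = headingIds
  .map((id) => subResultUrl("/test/", id))
  .sort()
  .join(", ");

console.log(joined);
// /test/#%D8%A7%D8%B2, /test/#_top, /test/#rtl-content
```

The default `sort()` compares UTF-16 code units, so `%` (0x25) sorts before `_` (0x5F), which sorts before `r` (0x72). That is why the encoded Persian id comes first in the expected string.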
2 changes: 1 addition & 1 deletion pagefind_web_js/lib/sub_results.ts
@@ -6,7 +6,7 @@ export const calculate_sub_results = (
): PagefindSubResult[] => {
const anchors = fragment.anchors
.filter(
- (a) => /h\d/i.test(a.element) && a.text?.length && /\w/.test(a.text)
+ (a) => /h\d/i.test(a.element) && a.text?.length && /\S/.test(a.text)
)
.sort((a, b) => a.location - b.location);
const results: PagefindSubResult[] = [];
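
To see why the one-character change matters: without the `u` and `v` flags, JavaScript's `\w` matches only `[A-Za-z0-9_]`, so a heading written entirely in Persian never satisfies `/\w/`, while `/\S/` accepts any non-whitespace character. The sketch below is a minimal reproduction, not the library's code. The `Anchor` interface is a hypothetical, trimmed-down stand-in for the anchor objects in `fragment.anchors`, and the sample headings are taken from the new test scenario.

```typescript
// Hypothetical, trimmed-down anchor shape for illustration only.
interface Anchor {
  element: string;
  text?: string;
}

// Predicate before the fix: the heading text must contain an ASCII word character.
const keptBefore = (a: Anchor): boolean => {
  const text = a.text ?? "";
  return /h\d/i.test(a.element) && text.length > 0 && /\w/.test(text);
};

// Predicate after the fix: the heading text must contain any non-whitespace character.
const keptAfter = (a: Anchor): boolean => {
  const text = a.text ?? "";
  return /h\d/i.test(a.element) && text.length > 0 && /\S/.test(text);
};

const anchors: Anchor[] = [
  { element: "h1", text: "چامه - آصف آشنا" },
  { element: "h2", text: "RTL ID" },
  { element: "h2", text: "بیرون نه می‌روی از من" },
  { element: "h2", text: "   " }, // whitespace only: rejected by both predicates
  { element: "p", text: "Nothing" }, // not a heading: rejected by both predicates
];

console.log(anchors.filter(keptBefore).map((a) => a.text));
// [ 'RTL ID' ]
console.log(anchors.filter(keptAfter).map((a) => a.text));
// [ 'چامه - آصف آشنا', 'RTL ID', 'بیرون نه می‌روی از من' ]
```

Both predicates still drop whitespace-only headings, so the fix widens the filter without letting empty sub-result titles through.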
