Fix wiki-links with non-ASCII characters sometimes being broken

Normalize unicode filenames to NFC

Ref:
- #611
- #419

Resolves #611
srid · May 6, 2021
commit 39597ae · 1 parent 4a327f5

Showing 4 changed files with 15 additions and 4 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -38,6 +38,7 @@
   - Fix a bug where folgezettel relationship is not established if a note also has non-folgezettel links to the same target
 - Clean HTML output when zettels are deleted (#141)
 - Added '§' character in whitelist (#595)
+- Normalize unicode filenames to NFC, fixing broken wiki links.
 
 ## Unreleased (v1 + v2)
 

diff --git a/doc/Guide/Zettel ID.md b/doc/Guide/Zettel ID.md
@@ -2,7 +2,9 @@
 slug: id
 ---
 
-A Zettel ID is a [[Zettel Markdown]] file's filename without the extension. Zettel IDs must be unique across the Zettelkasten.
+A Zettel ID is a [[Zettel Markdown]] file's filename[^unicode] without the extension. Zettel IDs must be unique across the Zettelkasten.
+
+[^unicode]: Neuron will [NFC normalize](https://www.unicode.org/faq/normalization.html) the Zettel ID derived from filename or link so that they work reliably when using non-ascii characters in filename or links (see [[Linking]]).
 
 By default, `neuron new`[^new] will use random alphanumeric IDs of length 8, called a "random ID". But you may use arbitrary text as ID as well, called a "title ID".
 

diff --git a/neuron.cabal b/neuron.cabal
@@ -1,6 +1,6 @@
 cabal-version: 2.4
 name: neuron
-version: 1.9.27.3
+version: 1.9.28.0
 license: AGPL-3.0-only
 copyright: 2020 Sridhar Ratnakumar
 maintainer: [email protected]
@@ -84,6 +84,7 @@ common library-common
     text,
     time,
     timeit,
+    unicode-transforms,
     unix,
     uri-encode,
     uuid,

diff --git a/src/Neuron/Zettelkasten/ID.hs b/src/Neuron/Zettelkasten/ID.hs
@@ -5,6 +5,7 @@
 {-# LANGUAGE OverloadedStrings #-}
 {-# LANGUAGE ScopedTypeVariables #-}
 {-# LANGUAGE TypeApplications #-}
+{-# LANGUAGE ViewPatterns #-}
 {-# LANGUAGE NoImplicitPrelude #-}
 
 module Neuron.Zettelkasten.ID
@@ -29,6 +30,7 @@ import Data.Aeson
     ToJSONKey (toJSONKey),
   )
 import Data.Aeson.Types (toJSONKeyText)
+import qualified Data.Text.Normalize as UT
 import Relude hiding (traceShowId)
 import System.FilePath (splitExtension, takeFileName)
 import qualified Text.Megaparsec as M
@@ -110,6 +112,11 @@ idParser' cs = do
 -- | Parse the ZettelID if the given filepath is a Markdown zettel.
 getZettelID :: FilePath -> Maybe ZettelID
 getZettelID fp = do
-  let (fileName, ext) = splitExtension $ takeFileName fp
+  let ( -- Apply unicode normalization per https://github.com/srid/neuron/issues/611
+        UT.normalize UT.NFC . toText ->
+          fileName,
+        ext
+        ) =
+          splitExtension $ takeFileName fp
   guard $ ".md" == toText ext
-  rightToMaybe $ parseZettelID (toText fileName)
+  rightToMaybe $ parseZettelID fileName
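
The effect of the `getZettelID` change can be sketched outside neuron. The snippet below is a minimal illustration, assuming the `unicode-transforms`, `text`, and `filepath` packages are available; `normalizedIdFromPath` is a hypothetical helper, not neuron's own function, and it skips the `parseZettelID` step entirely. It shows why two visually identical filenames can compare unequal until both are NFC-normalized.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Normalize as UT
import System.FilePath (splitExtension, takeFileName)

-- | Hypothetical helper (not part of neuron): take the filename of a
-- Markdown file, drop the extension, and NFC-normalize what remains.
-- Returns Nothing for files that do not end in ".md".
normalizedIdFromPath :: FilePath -> Maybe Text
normalizedIdFromPath fp =
  let (name, ext) = splitExtension (takeFileName fp)
   in if ext == ".md"
        then Just (UT.normalize UT.NFC (T.pack name))
        else Nothing

-- | "café" with a precomposed U+00E9, as typically typed in a wiki-link.
composedPath :: FilePath
composedPath = "notes/caf\xe9.md"

-- | "café" as "e" followed by combining U+0301, as some filesystems
-- (an assumption here: e.g. ones that store decomposed names) report it.
decomposedPath :: FilePath
decomposedPath = "notes/cafe\x301.md"

-- | The raw filenames differ, but the normalized IDs agree.
idsAgree :: Bool
idsAgree = normalizedIdFromPath composedPath == normalizedIdFromPath decomposedPath
```

Without the `UT.normalize UT.NFC` call, the two paths would yield different `Text` values, which is consistent with the broken wiki-links described in the commit message.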