I have been using the LMS and its predecessors for more than 15 years, currently version 9.0.1 under Windows 11 with the local music library on an NTFS hard drive.
I had noticed missing tracks and even entire albums before, but only recently realized that over 1,600 audio files cannot be found in the LMS, more than 1% of the entire music library.
“Lost” audio files differ from others in that their filename contains Unicode characters consisting of 2 or 3 bytes. The LMS scanner “ignores” such files without logging an error or at least a warning.
A Java program determined the paths of all audio files and directories whose names contain characters with a (decimal) code greater than 256 and wrote them into a list, along with the codes of the Unicode characters.
The program found more than 200 different “bad” Unicode characters, for example FULLWIDTH COLON (U+FF1A, 65306), HYPHEN (U+2010, 8208), FULLWIDTH SOLIDUS (U+FF0F, 65295), COMBINING CIRCUMFLEX ACCENT (U+0302, 770), and all Cyrillic characters starting with CYRILLIC SMALL LETTER A (U+0430, 1072).
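For anyone who wants to check their own library, here is a minimal sketch of such a check. It is not the program used above: the class name `NonLatin1NameFinder`, the helper `codePointsAbove255`, and the output format are made up for illustration, and it assumes 255 (the end of the Latin-1 range) as the boundary.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical helper class, not part of LMS.
class NonLatin1NameFinder {

    // Returns the distinct code points above 255 in a name, formatted like "U+2010 (8208)".
    static Set<String> codePointsAbove255(String name) {
        Set<String> found = new LinkedHashSet<>();
        name.codePoints()
            .filter(cp -> cp > 255)
            .forEach(cp -> found.add(String.format(Locale.ROOT, "U+%04X (%d)", cp, cp)));
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the library root is passed as the first argument; "." is only a fallback.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        try (Stream<Path> paths = Files.walk(root)) {
            paths.forEach(p -> {
                Path fileName = p.getFileName();
                if (fileName == null) {
                    return; // a bare root such as "G:\" has no file name component
                }
                Set<String> hits = codePointsAbove255(fileName.toString());
                if (!hits.isEmpty()) {
                    System.out.println(p + "\t" + hits);
                }
            });
        }
    }
}
```

The sketch checks only the last path component of each entry, so a directory with a “bad” name is reported once, not once per file inside it.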
Audio files (also .cue) with such characters in the name can be played without any problems in player apps such as Foobar.
How did files and directories with “bad” Unicode characters get into my music library? Most likely with the metadata retrieved from the CDDB when ripping CDs in the Exact Audio Copy tool.
The issue with Unicode characters has been known for a long time, as shown by the bug fix “#2475 - Problems with filenames containing non-current-codepage (foreign language, double byte) characters” in Changelog6 of version 6.5.4 – 2007-08-15, but it has never actually been fixed, as my tests with various old versions back to 6.5 showed.
This suggests that the cause of the error lies in Perl, in the way Perl reads directory trees.
The opendir / readdir functions in Perl show exactly this incorrect behavior, see “Scanning directories with Perl” https://www.ralph-schuster.eu/2007/11/27/scanning-directories-with-perl/.
But perhaps only installations under Windows with an NTFS file system are affected?
In https://www.perlmonks.org/?node_id=11149351 a function WinReadDir() is presented as a mixture of Perl and JavaScript, which reads files with Unicode file names without errors. Maybe code like this could be integrated?
What happens in the LMS when it scans a file that has unusual Unicode characters in its name, e.g. “08. Die Toten Hosen – Industrie‐Mädchen.flac”, where the character between “Industrie” and “Mädchen” is not the ASCII hyphen-minus (U+002D) but a HYPHEN (U+2010)?
[25-01-22 16:01:28.1403] main::main (213) Starting Lyrion Music Server scanner (v9.0.1, 1736238071, Thu Jan 9 17:14:13 CUT 2025) perl 5.032001
...
[25-01-22 16:01:28.4504] Slim::Music::Import::runImporter (581) Starting Slim::Media::MediaFolderScan scan
[25-01-22 16:01:28.4507] Slim::Media::MediaFolderScan::startScan (62) Starting audio-only scan in: ["g:\test_unicode"]
[25-01-22 16:01:28.4509] Slim::Utils::Scanner::Local::rescan (156) Rescanning g:\test_unicode
[25-01-22 16:01:28.4510] Slim::Utils::Scanner::Local::rescan (180) Discovering audio files in g:\test_unicode
...
[25-01-22 16:01:30.0406] Slim::Utils::Scanner::Local::Async::ANON (149) Found G:\test_unicode\Toten Hosen, Die\(2012) Ballast der Republik\CD2\08. Die Toten Hosen – Industrie?Mädchen.flac
...
At this point, the HYPHEN (%E2%80%90 in UTF-8) has already turned into a question mark (%3F), and accordingly a record with the URL
file:///G:/test_unicode/Toten%20Hosen,%20Die/(2012)%20Ballast%20der%20Republik/CD2/08.%20Die%20Toten%20Hosen%20-%20Industrie%3FM%E4dchen.flac
is written to the scanned_files table of library.db.
Naturally, the subsequent check whether a file with this URL exists fails. Consequently, no entry is written to the tracks table and the audio file is lost to LMS.
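The substitution seen in the log can be reproduced outside LMS. This is a minimal sketch of one plausible mechanism, assuming the name passes through a single-byte code page somewhere on the way. ISO-8859-1 stands in for the Windows ANSI code page here, which is an assumption and not a statement about what Perl or LMS actually do internally. The class `LossyNameDemo` and the helper `percentEncode` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of LMS.
class LossyNameDemo {

    // Percent-encodes every byte outside the unreserved ASCII set (letters, digits, "-._~").
    static String percentEncode(String s, Charset cs) {
        StringBuilder out = new StringBuilder();
        // String.getBytes(Charset) substitutes the charset's replacement byte ('?')
        // for characters the charset cannot represent.
        for (byte b : s.getBytes(cs)) {
            int v = b & 0xFF;
            boolean unreserved = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9') || v == '-' || v == '.' || v == '_' || v == '~';
            if (unreserved) {
                out.append((char) v);
            } else {
                out.append(String.format("%%%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+2010 HYPHEN between the words, U+00E4 (a-umlaut) in the second word
        String name = "Industrie\u2010M\u00e4dchen.flac";
        System.out.println(percentEncode(name, StandardCharsets.UTF_8));
        // prints Industrie%E2%80%90M%C3%A4dchen.flac
        System.out.println(percentEncode(name, StandardCharsets.ISO_8859_1));
        // prints Industrie%3FM%E4dchen.flac
    }
}
```

The second line matches the tail of the URL stored in scanned_files: the umlaut survives as the single byte %E4, while the HYPHEN, which has no single-byte representation, is replaced by %3F. Since “?” is not a legal character in a Windows file name, the resulting path cannot refer to any existing file.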
By the way, I did not observe an exception as described in Issue #1256 “Hyphen in folder name makes scan crash”.
Proposal:
If the problem affects only a minority of users, or cannot be solved with reasonable effort, the scanner should at least log every audio file that is not taken into account (error or warning).
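To illustrate the proposal, here is a rough sketch of the kind of check meant. LMS is written in Perl, so this Java version only shows the logic. `SkippedFileReporter`, `suspectNames`, and the logger name are hypothetical, and the heuristic rests on one assumption: because Windows does not allow “?” in file names, a “?” in a listed name indicates that a character was lost during conversion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper class, not part of LMS.
class SkippedFileReporter {

    private static final Logger LOG = Logger.getLogger("scanner");

    // Logs a warning for every listed name containing '?' and returns those names.
    static List<String> suspectNames(List<String> listedNames) {
        List<String> suspects = new ArrayList<>();
        for (String name : listedNames) {
            if (name.indexOf('?') >= 0) {
                LOG.warning("Skipping file, name contains unrepresentable characters: " + name);
                suspects.add(name);
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        List<String> listed = Arrays.asList(
                "07. Die Toten Hosen - Example.flac",          // made-up name for illustration
                "08. Die Toten Hosen - Industrie?M\u00e4dchen.flac");
        System.out.println(suspectNames(listed).size() + " file(s) would be skipped");
    }
}
```

Even such a simple warning in the scanner log would have made the 1,600 missing files visible right away.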