Discussion: Edge cases of file naming in NTFS/FAT/etc, Windows/Linux.
I am curious what edge cases we know about with regard to file and folder naming on various file systems in various operating systems, such as prohibited characters, leading and trailing characters, and other oddities.
This can also extend to arbitrary limitations within operating system software itself, such as Windows Explorer's arbitrary prohibition against renaming files with leading spaces or leading periods (perfectly legal by other means). It may also be worth noting common peculiarities in how other software handles filenames with multiple sequential spaces, or filenames containing DOS device names (AUX CON NUL PRN CLOCK$ KEYBD$).
I encounter these periodically, and I'm kind of hoping to build a reference of all known quirks, no matter how quirky.
-----
[windows] The characters \ / : * ? " < > | and control characters 0-31 are forbidden in object names.
[windows] Objects cannot end with a space or a dot, or with any sequence of dots and spaces. This is a holdover from 8.3 space padding and dot extensions.
[windows] Objects can begin with a space or a dot, but most applications disallow it.
[windows] Objects can contain multiple sequential spaces, but some applications may not be able to access them.
[ntfs] Only the NUL character is technically prohibited in object names. Windows and Linux both prohibit forward-slash (/).
[fat] The ^ character cannot be used in object names. (have not confirmed this. doesn't seem to apply to modern systems.)
[fat] The 0x05 character is/was used to denote a deleted object by replacing the first character of the object. (have not confirmed this)
[codepage] The Yen symbol (¥) translates to backslash (\) on Japanese language systems, and may introduce problems in WideCharToMultiByte() and other conversion functions.[ref]
[ntfs/windows] NTFS is POSIX.1 compliant with support for case sensitivity in Unix-based applications. The Win32 subsystem behaves strangely when it encounters multiple filenames of mixed case. This can lead to access issues and hijinks. (needs more testing)
[general] Filenames may contain isolated UCS surrogate code points U+D800..U+DFFF that do not produce a valid Unicode character. This could cause access issues in some software. [void][ref][ref]
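For anyone who wants to poke at the [windows] rules above, here is a minimal Python sketch that checks a single name against them. The function name windows_name_problems and the wording of the messages are made up for illustration; it only encodes the rules as listed in this post and does not ask the OS anything.

```python
# Sketch only: encodes the [windows] rules listed above, nothing more.
# Device names (CON, AUX, ...) are deliberately not handled here.
FORBIDDEN_CHARS = set('\\/:*?"<>|')


def windows_name_problems(name: str) -> list[str]:
    """Return a list of rule violations for one path component (hypothetical helper)."""
    problems = []

    # Forbidden punctuation plus control characters 0-31.
    bad = sorted({c for c in name if c in FORBIDDEN_CHARS or ord(c) < 32})
    if bad:
        problems.append("forbidden characters: " + ", ".join(repr(c) for c in bad))

    # A name ending in a dot or space also covers any trailing run of dots and spaces.
    if name and name[-1] in " .":
        problems.append("ends with a space or dot")

    # Legal, but worth flagging because many applications refuse these.
    if name and name[0] in " .":
        problems.append("starts with a space or dot")
    if "  " in name:
        problems.append("contains sequential spaces")

    return problems
```

For example, windows_name_problems("name.") reports the trailing dot, while windows_name_problems("report.txt") returns an empty list.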
Last edited by raccoon on Thu Feb 24, 2022 7:50 am, edited 3 times in total.
Re: Discussion: Edge cases of file naming in NTFS/FAT/etc, Windows/Linux.
What characters are forbidden in Windows and Linux directory names?
Anything dealing with LFN, & even programs that do deal with LFN.
FastCopy has an edge case where filename+(1) > 255 (or whatever the number is) & instead of... doing something, the file gets duplicated in the program's ("installation") directory. (I think [it was that] the GUI will correctly return an error message, but the command-line version does duplicate the file.)
Windows file naming rules (& lengths) & what is allowed to be deleted to $Recycle.Bin, differ.
ADS (Alternate Data Streams) will always be an albatross.
(Edge case early on where Everything... what was it, created an ADS for a file rather than the operation it was supposed to do.)
Google, "Google Drive", plays by its own rules (regarding duplicates & more).
Differences between differing OS & what is allowed or not, & what particular programs may or may not do in that respect.
7-zip is good at averting or alleviating some of those issues, but some simply exist with no "easy or sure fire" method to work-around.
Re: Discussion: Edge cases of file naming in NTFS/FAT/etc, Windows/Linux.
Indeed. It's a big muddy mess. Which is why I intend to curate a list specifying the exact limitations of each behavior scenario.
I'm glad you've discovered FastCopy and have been enjoying your bug reports over there.
Re: Discussion: Edge cases of file naming in NTFS/FAT/etc, Windows/Linux.
Also COM1 - COM9 and LPT1 - LPT9.
And I vaguely remember CONOUT or something similar, but can't reproduce.
BTW: I am not familiar with KEYBD$
On NTFS filesystems, the following are blocked by the ntfs.sys filesystem driver (found by experimentation; there might be more):
$Extend, $AttDef, $BadClus, $Bitmap, $Boot, $Logfile, $MFT, $MFTMirr, $Secure, $UpCase, $Volume
The DOS device names also don't allow extensions (PRN.txt is not OK), whereas the 'NTFS ones' do allow the protected names with extensions ($MFT.txt is OK)
Fun fact: There used to be a bug in Windows, where you could execute something like dir C:\$MFT\abc in CMD and that would lock $MFT (yes, the Master File Table; the 'address book' of all files) for other access and that would hang your Windows.
Somewhere around Vista / Win7 period I think, but my memory is vague here.
Re: Discussion: Edge cases of file naming in NTFS/FAT/etc, Windows/Linux.
Everything doesn't allow the following filenames:
con
prn
aux
nul
com0 .. com9
lpt0 .. lpt9
con.*
prn.*
aux.*
nul.*
com0.* .. com9.*
lpt0.* .. lpt9.*
.
..
Naming Files, Paths, and Namespaces
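A rough Python sketch of the list above, in case someone wants to pre-filter names before handing them to another tool. The helper name is_blocked_name is invented here, and matching case-insensitively is an assumption on my part (DOS device names are case-insensitive on Windows, but I have not checked how Everything compares them).

```python
import re

# con, prn, aux, nul, com0..com9, lpt0..lpt9, optionally followed by ".anything".
# Case-insensitive matching is an assumption, not something confirmed in this thread.
_DEVICE_RE = re.compile(r"^(con|prn|aux|nul|com[0-9]|lpt[0-9])(\..*)?$", re.IGNORECASE)


def is_blocked_name(name: str) -> bool:
    """True if `name` matches the list above (hypothetical helper)."""
    return name in (".", "..") or _DEVICE_RE.match(name) is not None
```

So is_blocked_name("aux.txt") and is_blocked_name("COM5") are both True, while "console" and "com10" pass, since neither is on the list.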
Re: Discussion: Edge cases of file naming in NTFS/FAT/etc, Windows/Linux.
NTFS, and Microsoft Windows 32bit and 64bit (not 16bit and not DOS) support case sensitive file naming with duplicate filenames of mixed case. Under Windows 32/64 this is supposed to be supported via CreateFile and FILE_FLAG_POSIX_SEMANTICS=0x01000000
Does anyone know of a test program that can create conflicting filenames of varying CaSe? I'd like to see how they're handled by Explorer and Everything.
ref: Naming Files, Paths, and Namespaces https://docs.microsoft.com/en-us/window ... dfrom=MSDN
ref: CreateFileA function (fileapi.h) https://docs.microsoft.com/en-us/window ... reatefilea
"Do not assume case sensitivity. For example, consider the names OSCAR, Oscar, and oscar to be the same, even though some file systems (such as a POSIX-compliant file system) may consider them as different. Note that NTFS supports POSIX semantics for case sensitivity but this is not the default behavior. For more information, see CreateFile."
https://docs.microsoft.com/en-us/previo ... dfrom=MSDN
"Access will occur according to POSIX rules. This includes allowing multiple files with names, differing only in case, for file systems that support that naming. Use care when using this option, because files created with this flag may not be accessible by applications that are written for MS-DOS or 16-bit Windows."
https://docs.microsoft.com/en-us/previo ... dfrom=MSDN
http://web.archive.org/web/200306090124 ... 04F26.html
Between Windows XP, Vista and beyond (NTFS 5.1 vs NTFS 6.0 in particular), additional characters are mapped to case-insensitive pairs and create new collision issues.
Last edited by raccoon on Thu Feb 24, 2022 8:09 am, edited 2 times in total.
Re: Discussion: Edge cases of file naming in NTFS/FAT/etc, Windows/Linux.
From memory: you can use fsutil to create a case sensitive folder "t:\worst case":
Anyway, the commands used:
fsutil.exe file setCaseSensitiveInfo "t:\worst case" enable
To check:
fsutil.exe file queryCaseSensitiveInfo "t:\worst case"
Just tested: doesn't work.
IIRC I had to install the Windows Subsystem for Linux feature first (run appwiz.cpl,2 to install if needed).
But this was with WSL1; WSL2 works fundamentally differently. You just have to try (I don't have the time atm.)
If enabled, you can simply do echo. > CaSe.txt. File Explorer works (or worked) fine too.
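As a quick test program for the above, here is a small Python sketch that writes a few names differing only in case into a directory and counts how many separate entries survive. The function name count_case_variants and the three sample names are invented for illustration; it uses ordinary file creation only and does not set any case-sensitivity flag itself.

```python
import os
import tempfile


def count_case_variants(directory: str, names=("CaSe.txt", "case.txt", "CASE.TXT")) -> int:
    """Create each name in `directory`; return how many distinct entries exist afterwards."""
    for name in names:
        # On a case-insensitive directory, later names reopen the first file.
        with open(os.path.join(directory, name), "w") as f:
            f.write(name)
    wanted = {n.lower() for n in names}
    return sum(1 for entry in os.listdir(directory) if entry.lower() in wanted)


if __name__ == "__main__":
    # Scratch directory; point this at a folder flagged with fsutil to test that instead.
    with tempfile.TemporaryDirectory() as tmp:
        print(count_case_variants(tmp))
```

A result of 3 means the directory kept all variants apart; 1 means they collapsed into a single file. Running it against a folder such as "t:\worst case" after the fsutil step should show whether the flag took effect.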
Re: Discussion: Edge cases of file naming in NTFS/FAT/etc, Windows/Linux.
weird, I can't find these anywhere in fsutil (on Windows 7), and Microsoft's own fsutil documentation makes no mention of it. Though, it is mentioned in another unrelated document for Windows Subsystem for Linux.
not mentioned: https://docs.microsoft.com/en-us/window ... sutil-file
mentioned here: https://docs.microsoft.com/en-us/window ... ensitivity
Does anyone know if this was added to fsutil in Win 8/10/11?
Re: Discussion: Edge cases of file naming in NTFS/FAT/etc, Windows/Linux.
Windows 10 and later.
Everything has the 'Case Sensitive Dir' property.
Re: Discussion: Edge cases of file naming in NTFS/FAT/etc, Windows/Linux.
To find files in Everything with the same path and filename and where there is only a difference in case, search for:
dupe:name;path !case:dupe:name;path
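Outside Everything, roughly the same check can be sketched in Python: group full paths by their lowercased form and keep the groups that contain more than one spelling. The name case_only_duplicates is invented here, and str.lower() is only an approximation of how NTFS folds case (it uses its own $UpCase table), so treat the results as a hint.

```python
from collections import defaultdict


def case_only_duplicates(paths):
    """Group paths that are identical except for case (hypothetical helper)."""
    groups = defaultdict(set)
    for p in paths:
        # str.lower() approximates the file system's case folding; it is not exact.
        groups[p.lower()].add(p)
    # Keep only groups with at least two different spellings.
    return [sorted(g) for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    sample = [r"C:\foo\bar.txt", r"C:\FOO\bar.txt", r"C:\foo\BAR.txt", r"C:\other.txt"]
    for group in case_only_duplicates(sample):
        print(group)
```

Exact repeats of the same spelling are ignored, which is the intent of the !case: part of the search.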
filelists are not case sensitive if you would like to test this:
case.efu:
Filename
"C:\foo\bar.txt"
"C:\FOO\bar.txt"
"C:\foo\BAR.txt"
"C:\FOO\BAR.txt"
Re: Discussion: Edge cases of file naming in NTFS/FAT/etc, Windows/Linux.
I need to find some software that will create/flag folders as POSIX in Windows 7 so I can dink around on a real filesystem and try to break things. As it is, I'll have to try creating some folders on Win 10 and transport them to Win 7.
Re: Discussion: Edge cases of file naming in NTFS/FAT/etc, Windows/Linux.
Filenames on Windows can use characters in the range 0xd800 - 0xdfff which are invalid Unicode characters.
0xd800.txt
https://en.wikipedia.org/wiki/Unicode_block
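If you want to flag such names from a script, a minimal Python sketch follows. The helper name has_lone_surrogate is invented; it relies on the fact that a Python str holds whole code points, so any code point in the surrogate range found in a str is by definition unpaired.

```python
def has_lone_surrogate(name: str) -> bool:
    """True if `name` contains a code point in U+D800..U+DFFF (hypothetical helper)."""
    # Characters outside the BMP are single code points in a Python str,
    # so any surrogate value seen here is not part of a valid pair.
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in name)
```

A name like "\ud800.txt" is flagged, and the same string cannot be encoded as strict UTF-8, which is one way software ends up choking on it.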
Re: Discussion: Edge cases of file naming in NTFS/FAT/etc, Windows/Linux.
WinRAR 6.11 release notes:
3. Reserved device names followed by file extension, such as aux.txt,
are extracted as is in Windows 11 even without "Allow potentially
incompatible names" option or -oni command line switch.
Unlike previous Windows versions, Windows 11 treats such names
as usual files.
Device names without extension, such as aux, still require these
options to be unpacked as is regardless of Windows version.