FAT Filenames
Format
This document is for those who are already familiar with DOS files. It
is intended to remind you of DOS conventions.
DOS uses the following pattern for forming file paths:
[disk]:\[directory]\[subdirectories]\[filename].[extension]
[disk] is one letter of the Latin alphabet. Valid letters are
A through Z. How disk letters are assigned to physical
devices was discussed earlier.
[directory] and [subdirectories] are strings. They
specify the location of the file in the directory tree. Depending on where
the file is located, neither, one, or both of them may be omitted.
[filename] is also a string; it identifies the name of the file.
[extension] is a string that usually indicates the type of
the file. Note that directories may also have extensions.
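As a rough illustration of the pattern above, the following Python sketch splits an absolute DOS path into its parts. The names `split_dos_path` and `DosPath` are hypothetical; the sketch assumes the path starts with a disk letter, a colon and a backslash, and it splits the last component at its only dot, as short filenames allow.

```python
from typing import List, NamedTuple


class DosPath(NamedTuple):
    disk: str          # single letter A..Z
    dirs: List[str]    # [directory] followed by [subdirectories]; may be empty
    filename: str
    extension: str     # empty if the name has no dot


def split_dos_path(path: str) -> DosPath:
    """Split 'X:\\DIR\\SUB\\NAME.EXT' into its components (absolute paths only)."""
    disk, colon, rest = path.partition(":")
    if not colon or len(disk) != 1 or not ("A" <= disk.upper() <= "Z"):
        raise ValueError(f"missing or invalid disk letter: {path!r}")
    if not rest.startswith("\\"):
        raise ValueError(f"expected a backslash after the colon: {path!r}")
    *dirs, leaf = rest[1:].split("\\")
    # A short name contains at most one dot, so splitting at the first dot suffices.
    filename, _, extension = leaf.partition(".")
    return DosPath(disk.upper(), dirs, filename, extension)


print(split_dos_path(r"C:\DOS\UTIL\EDIT.COM"))
# DosPath(disk='C', dirs=['DOS', 'UTIL'], filename='EDIT', extension='COM')
```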
Because file and directory names are of the same format, everything below
that applies to filenames also applies to directory names.
Short and Long Names
There are two types of filenames: long filenames and short filenames, also
called aliases.
Short filenames are subject to the infamous 8.3 limitation: [filename]
is one to eight characters long, and [extension] is zero to three
characters long. This limitation also applies to directory names. Many
sources say that paths are limited to 80 or 128 characters, but these limitations
are due to DOS peculiarities rather than to the FAT format. The FAT file system
itself does not limit the length of a path. The maximum path length I recommend
is 256 characters, including the terminating null character. Short
filenames use the ASCII character set, so each character takes up exactly
one byte. According to DOS manuals, short filenames are case insensitive,
and the following characters can be used:
-
Letters A through Z.
-
Digits 0 through 9.
-
Characters with ASCII codes greater than 127. Note that these characters
depend on the current code table. Also, they are handled case sensitively.
-
Space. Note that many applications do not recognize this character and
it is not used in creating aliases for long filenames.
-
$ % - _ @ ~ ` ! ( ) { } ^ # &
Case insensitivity is achieved by converting the name to uppercase when
the file is accessed or created. The following characters have special
meaning:
-
. (dot) is the delimiter between [filename] and [extension].
A short file or directory name may contain at most one dot.
-
\ (backslash) is the delimiter between directory and file names.
-
: (colon) is the delimiter between the disk letter and the rest of
the path.
The following characters are called wild cards. They are used in search
operations:
-
? (question mark) matches any single character.
-
* (asterisk) matches any sequence of characters, including an empty one.
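The two wildcards can be illustrated with a small matcher. This is a generic sketch of `?`/`*` matching using the common greedy-backtracking technique, not the routine DOS itself uses, and it does not try to reproduce every quirk of the DOS command interpreter; `wildcard_match` is a hypothetical name. Both strings are uppercased first because short filenames are case insensitive.

```python
def wildcard_match(pattern: str, name: str) -> bool:
    """Return True if name matches pattern, where '?' matches exactly one
    character and '*' matches any sequence of characters (possibly empty)."""
    p, s = pattern.upper(), name.upper()
    pi = si = 0
    star_pi = -1   # position of the most recent '*' in the pattern
    star_si = 0    # position in the name where that '*' started matching
    while si < len(s):
        if pi < len(p) and (p[pi] == "?" or p[pi] == s[si]):
            pi += 1
            si += 1
        elif pi < len(p) and p[pi] == "*":
            star_pi, star_si = pi, si   # let '*' match nothing for now
            pi += 1
        elif star_pi != -1:
            star_si += 1                # let the last '*' absorb one more character
            pi, si = star_pi + 1, star_si
        else:
            return False
    # Any trailing '*' can match the empty remainder of the name.
    while pi < len(p) and p[pi] == "*":
        pi += 1
    return pi == len(p)


print(wildcard_match("*.TXT", "readme.txt"))  # True
```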
You are best advised not to allow any other characters in filenames and
to use the special characters only according to their meaning. If an existing
filename contains illegal characters, either ignore the file or replace the
invalid characters with valid ones.
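The rules for short filenames can be collected into a validity check plus the uppercase conversion used for case insensitivity. This is a minimal sketch with hypothetical function names; it works on Python strings rather than code-table bytes, so treating every character above 127 as valid is an assumption standing in for "characters from the current code table".

```python
# Characters listed as valid in short filenames, besides letters, digits and space.
SHORT_SPECIALS = set("$%-_@~`!(){}^#&")


def is_valid_short_char(c: str) -> bool:
    if ord(c) > 127:
        return True  # stand-in for code-table characters (assumption)
    return ("A" <= c.upper() <= "Z") or c.isdigit() or c == " " or c in SHORT_SPECIALS


def is_valid_short_name(name: str) -> bool:
    """True if name fits the 8.3 scheme and uses only valid characters."""
    stem, _, ext = name.partition(".")
    return (
        "." not in ext              # at most one dot
        and 1 <= len(stem) <= 8
        and len(ext) <= 3
        and all(is_valid_short_char(c) for c in stem + ext)
    )


def normalize_short_name(name: str) -> str:
    # Only a..z are uppercased; characters above 127 are left alone,
    # because their handling is case sensitive.
    return "".join(c.upper() if "a" <= c <= "z" else c for c in name)


print(is_valid_short_name("README.TXT"), is_valid_short_name("A.B.C"))  # True False
```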
Long names are up to 256 characters long, including the extension and the
terminating null character. This limitation is artificial: long names
on the disk can actually be longer than 256 characters. Again, these limits
come from the software that serves FAT, not from the on-disk format. The maximum
length of a directory path, including the drive letter, colon, and leading
backslash, but excluding the trailing backslash, null terminator, filename, and
extension, is 246 bytes. The maximum length of a full path is 260 characters,
including the null terminator. This is four characters more than I recommend.
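A check against the two limits above might look like the following sketch. The function name is hypothetical, the constants are simply the figures given in the text, and lengths are counted in characters on the assumption that each character of the path takes one byte.

```python
MAX_DIR_PATH = 246    # drive, colon and leading backslash; no trailing backslash or null
MAX_FULL_PATH = 260   # includes the terminating null character


def path_within_limits(path: str) -> bool:
    """Check an absolute path such as 'C:\\DIR\\FILE.TXT' against both limits."""
    directory, _, _ = path.rpartition("\\")
    if directory.endswith(":"):
        directory += "\\"   # file in the root: keep the leading backslash
    # len(path) + 1 accounts for the null terminator.
    return len(directory) <= MAX_DIR_PATH and len(path) + 1 <= MAX_FULL_PATH
```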
Long filenames are stored in Unicode.
Each character is two bytes long. There are two important things to remember
about Unicode:
-
Only for characters in the lower half of the ASCII table (codes 0 through
127) is the high byte of the Unicode character guaranteed to be zero and the
low byte guaranteed to be the same as in ASCII.
-
Unicode supports case-insensitive operations for all alphabets that
distinguish case, not just Latin.
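Both points can be seen in a few lines of Python. The sketch assumes little-endian byte order (low byte first), which is how multi-byte values are stored on FAT volumes, and uses `str.casefold` as a stand-in for the file system's case-insensitive comparison; the function names are hypothetical.

```python
def lfn_bytes(name: str) -> bytes:
    """Encode a long filename two bytes per character, low byte first."""
    return name.encode("utf-16-le")


def same_long_name(a: str, b: str) -> bool:
    """Case-insensitive comparison that also works outside the Latin alphabet."""
    return a.casefold() == b.casefold()


data = lfn_bytes("File")
print(data)                                # b'F\x00i\x00l\x00e\x00'
print(all(b == 0 for b in data[1::2]))     # True: every high byte is zero
print(same_long_name("Файл", "ФАЙЛ"))      # True
```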
All characters that are valid for short filenames are also valid for long
filenames. In addition, the following characters can be used:
-
+ , ; = [ ]
-
. (dot) can occur more than once in a filename. The extension is the
substring after the last dot.
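The extra characters and the last-dot rule can be sketched as follows. The names are hypothetical, the 256-character limit (including the null terminator) is taken from the text above, and accepting every non-ASCII character is a simplifying assumption of this sketch.

```python
SHORT_SPECIALS = set("$%-_@~`!(){}^#&")   # valid in short names
LONG_ONLY = set("+,;=[]")                 # additionally valid in long names
MAX_LONG_NAME = 256                       # including the terminating null character


def is_valid_long_char(c: str) -> bool:
    if ord(c) > 127:
        return True  # assumption: any non-ASCII character is accepted
    return c.isalnum() or c in " ." or c in SHORT_SPECIALS or c in LONG_ONLY


def is_valid_long_name(name: str) -> bool:
    return 0 < len(name) and len(name) + 1 <= MAX_LONG_NAME and all(map(is_valid_long_char, name))


def long_extension(name: str) -> str:
    """The extension is whatever follows the last dot ('' if there is no dot)."""
    _, dot, ext = name.rpartition(".")
    return ext if dot else ""


print(long_extension("archive.tar.gz"))  # gz
```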
Long filenames are what I call half case sensitive. The case of the characters
is preserved when the file is created, but apart from that, long filenames
are case insensitive. For example, "File" and "file" are treated as the
same string by the file system. They cannot coexist in the same folder, and
either name can be used to reference the file.
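Half case sensitivity can be modeled by keying a folder's entries on a case-folded name while storing the name exactly as it was created. `Folder` is a hypothetical toy class, not a real directory structure; `casefold` stands in for the file system's case-insensitive comparison.

```python
from typing import Dict, Optional


class Folder:
    """Toy model of one directory: names keep their case, lookups ignore it."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}   # case-folded key -> name as created

    def create(self, name: str) -> None:
        key = name.casefold()
        if key in self._names:
            raise FileExistsError(f"{name!r} clashes with {self._names[key]!r}")
        self._names[key] = name

    def lookup(self, name: str) -> Optional[str]:
        """Return the stored (case-preserved) name, or None if absent."""
        return self._names.get(name.casefold())


folder = Folder()
folder.create("File")
print(folder.lookup("FILE"))  # File
```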
Aliasing
Whenever a file with a long filename is created, a short-filename alias
is also created for it. The converse is usually not the case. If the long
filename fits the standard 8.3 scheme and contains only characters that are
valid in short filenames, the following rules are applied:
-
If the name contains one or more lowercase letters, the alias created
is the same as the long filename, converted to uppercase.
-
If all letters are uppercase, only the alias is created; no directory
entry for the long filename itself is created.
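For names that already fit 8.3, the two rules reduce to a comparison with the uppercased name. This is a sketch with hypothetical names; for simplicity it rejects spaces and characters above 127, which is an assumption rather than something the rules above state.

```python
ALIAS_SPECIALS = set("$%-_@~`!(){}^#&")


def fits_short_scheme(name: str) -> bool:
    """True if name fits 8.3 and uses only ASCII letters, digits and the specials.
    Spaces and characters above 127 are deliberately rejected (assumption)."""
    stem, _, ext = name.partition(".")
    chars_ok = all((c.isascii() and c.isalnum()) or c in ALIAS_SPECIALS for c in stem + ext)
    return "." not in ext and 1 <= len(stem) <= 8 and len(ext) <= 3 and chars_ok


def entries_for_short_compatible(long_name: str):
    """Return (alias, needs_long_entry) for a name that already fits 8.3."""
    if not fits_short_scheme(long_name):
        raise ValueError(f"{long_name!r} needs a generated alias instead")
    alias = long_name.upper()
    # A long-name entry is needed only if the original case must be preserved.
    return alias, long_name != alias


print(entries_for_short_compatible("readme.txt"))  # ('README.TXT', True)
```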
If the long name is either too long to fit in 8.3 or contains characters that
are illegal in short filenames, the following rules are applied:
-
First, characters that are illegal in short filenames are deleted from the
long filename. All dots except the last one are also deleted.
-
Then, the first six characters (or fewer, if the name is shorter) are used as
the base, and ~ (tilde) followed by the number 1 is used as the tail.
Base and tail are concatenated to form the alias's filename. The alias's
extension is formed by the first three (or fewer) characters of the long
filename's extension.
-
If an alias with that name already exists, the number in the tail is
increased. This process is repeated until a unique alias is found.
Note that when the number reaches a power of ten, its decimal representation
takes up one more character. If the alias would then exceed eight characters,
the last character of the base is stripped off. The situation in which this
loop overflows (the base is empty, but the number needs to be increased) is
very unlikely. After all, when have you seen more than 9,999,999 files in one
directory?
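The alias-generation steps can be sketched in Python as below. The sketch follows the rules as described here (illegal characters are deleted, the base is trimmed so the name part never exceeds eight characters); `make_alias` is a hypothetical name, dropping spaces and non-ASCII characters is an assumption, and actual implementations may differ in details.

```python
ALIAS_SPECIALS = set("$%-_@~`!(){}^#&")


def _keep(c: str) -> bool:
    # Letters, digits and the listed specials; spaces and non-ASCII are dropped.
    return (c.isascii() and c.isalnum()) or c in ALIAS_SPECIALS


def make_alias(long_name: str, existing: set) -> str:
    """Generate a unique 8.3 alias for long_name, avoiding names in existing."""
    taken = {a.upper() for a in existing}
    # Split at the last dot; every other dot is dropped with the illegal characters.
    stem, dot, ext = long_name.rpartition(".")
    if not dot:
        stem, ext = long_name, ""
    stem = "".join(filter(_keep, stem)).upper()
    ext = "".join(filter(_keep, ext)).upper()[:3]
    suffix = "." + ext if ext else ""

    base = stem[:6]
    n = 1
    while True:
        tail = "~" + str(n)
        if len(tail) > 8:
            raise RuntimeError("alias space exhausted")  # beyond ~9999999
        base = base[: 8 - len(tail)]  # strip base characters as the number grows
        candidate = base + tail + suffix
        if candidate not in taken:
            return candidate
        n += 1


print(make_alias("Long File Name.text", {"LONGFI~1.TEX"}))  # LONGFI~2.TEX
```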
Needless to say, there is no way to determine the alias just by looking at the
long name, and there is no way to retrieve the long name by looking at
the alias. A special directory structure ensures that they are associated
with each other. That is why, when the file is accessed via its long filename
and edited (usually deleted and re-created), the alias may change.
It is especially likely to change if the file is copied to a different folder.
A file can also be accessed using its alias.
-
When the file is created using its alias, the corresponding long name is
not created, with one exception described below.
-
When the file is deleted, either via its alias or via its long name, all
directory entries for the long name and the alias are marked as deleted.
-
Some extra steps are taken to preserve the long filename. The long filename
is obviously lost when the file is copied using its alias. But there are
less obvious cases. For example, an application may delete and re-create
the file using its alias. To avoid losing the long filename, Windows 95
remembers files deleted via their aliases for fifteen seconds; if an attempt
is made to re-create the file using the same alias during this time, the old
long filename is associated with this alias again.
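The fifteen-second rule can be modeled as a small cache keyed by alias. This is a toy sketch, not how Windows 95 actually stores this information; the class and method names are hypothetical, and the clock is injectable only so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple

TUNNEL_SECONDS = 15.0  # window described in the text


class TunnelCache:
    """Remembers long names of files deleted via their aliases for a short time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._recent: Dict[str, Tuple[str, float]] = {}  # alias -> (long name, time)

    def deleted_via_alias(self, alias: str, long_name: str) -> None:
        self._recent[alias.upper()] = (long_name, self._clock())

    def recreated_via_alias(self, alias: str) -> Optional[str]:
        """Return the long name to re-associate with alias, if still remembered."""
        entry = self._recent.pop(alias.upper(), None)
        if entry is None:
            return None
        long_name, deleted_at = entry
        if self._clock() - deleted_at <= TUNNEL_SECONDS:
            return long_name
        return None
```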
Finally, only VFAT filesystems support long filenames and aliasing.
Author: Alex Verstak
3/10/1998