XP file sorting rant
Ever since file systems on computers have existed, file names have
been text strings, composed of some variable number of characters,
with various limits on length and valid characters, and sometimes
subdivided into parts. Early file systems had no provision
for sorting the names... they appeared in the order they were
allocated in the file system's directory, which was sometimes the
order in which the files were created... but if files were deleted
and others created, sometimes gaps were left, later filled, and the
order appeared to be quite random (but was generally quite
deterministic, if only one knew the algorithm, and the history of
file creation and deletion operations).
It was a great step forward when directory name display commands
(often called "dir", sometimes called "ls") provided options to
sort the output in various ways... options to sort by name, by
date, by size, and other attributes were created.
Sorts by name were always by character code of each character,
beginning at the first and proceeding to the last, either for the
filename as a whole, or for each syntactical unit of the filename.
It has been quite a few years now, since long filenames have been
available in most desktop operating systems, and most software
handles them comfortably at this point. The incentive to
create encoding schemes to squeeze unique, meaningful data into 8
or 14 character filenames has been greatly reduced with the
availability of long filenames... however people still do embed a
variety of numeric information into filenames. And there are
still embedded systems running old operating systems with limited
filename lengths.
One problem with text string sorting of filenames containing
sequence numbers is if the sequence numbers are composed of a
variable number of digits (think more than 9, or more than 99,
etc.). If one does not plan ahead, the order of the
text-sorted string filenames might include things like...
Now as a sequence of files, numbered say, from 1 to 500, this
ordering is not intuitive. One would want the files to be
ordered by the value of the digit-substring taken as a whole, and
that can easily be achieved by using leading zero digits to make
the numeric portion of the name have a fixed number of digits,
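To make the leading-zero fix concrete, here is a minimal Python sketch. The `file` prefix, the `.txt` extension, and the particular sequence numbers are made up for illustration; only the technique (a plain character-code sort, with and without zero padding) comes from the text above.

```python
# Hypothetical sequence-numbered names; "file" and ".txt" are illustrative only.
numbers = [1, 2, 9, 10, 11, 100]

# Plain text sort: compares character codes left to right.
unpadded = sorted(f"file{n}.txt" for n in numbers)

# Same sort, but the numeric part is padded to a fixed width of 3 digits.
padded = sorted(f"file{n:03d}.txt" for n in numbers)

print(unpadded)  # file1, file10, file100, file11, file2, file9
print(padded)    # file001, file002, file009, file010, file011, file100
```

With a fixed number of digits, the text order and the numeric order coincide, and no help from the operating system is needed.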
A Microsoft-style improvement
In XP, Microsoft decided to solve the above problem in a different
way. Instead of treating the filename as a plain text string,
and sorting it left to right, and letting users supply simple
solutions like the above to simple problems they might encounter,
XP now treats each filename as a sequence of substring fields, and applies text
sorting rules to the text substrings, and numeric sorting rules to
the numeric substrings. The text and numeric fields are
variable length, determined by whether or not the next character is
an ASCII digit from 0 through 9, or anything else. This
"solves" the above problem, resulting in a sort order for the above
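The field-splitting rule just described can be sketched in a few lines of Python. This is only a model of the rule as stated here, not Microsoft's code: the function name `natural_key`, the case-insensitive text comparison, and the sample filenames are all assumptions made for the sketch.

```python
import re

def natural_key(name):
    # Split into alternating fields: text, digits, text, digits, ...
    # Only the ASCII digits 0-9 start or continue a numeric field.
    parts = re.split(r"([0-9]+)", name)
    # Odd positions hold the digit runs; compare those by numeric value.
    # Text fields are compared case-insensitively (an assumption).
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

# Hypothetical filenames, for illustration only.
names = ["file1.txt", "file10.txt", "file100.txt",
         "file11.txt", "file2.txt", "file9.txt"]

print(sorted(names, key=natural_key))
```

Because the split always alternates text and digit fields, two keys never end up comparing a number against a string at the same position.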
Clearly the enhancement to XP is an improvement, right?
Let us consider other uses of digit strings in file names besides
simple ordered sequences. Another common type of number to
embed in filenames is the version of a distributed product.
Version numbers are often variable in length, and often
contain punctuation. Here are some typical version
numbers...
v1  v1.5  v2  v2.1.7  v2.1.8  v2.2
Often, the version is encoded in a filename
without the punctuation characters... resulting, on most operating
This works great, and even sorts right... except now, in XP, it
This is not nearly as useful as it was.
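A small Python sketch shows the effect on the version numbers listed above, with the punctuation stripped. The `natural_key` function is a hypothetical model of the text-field/numeric-field rule, not XP's actual implementation, and real filenames would normally carry a product name and extension as well.

```python
import re

def natural_key(name):
    # Model of the rule: runs of ASCII digits compare by numeric value.
    parts = re.split(r"([0-9]+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

versions = ["v1", "v1.5", "v2", "v2.1.7", "v2.1.8", "v2.2"]
names = [v.replace(".", "") for v in versions]  # v1, v15, v2, v217, v218, v22

text_order = sorted(names)                    # plain character-code sort
field_order = sorted(names, key=natural_key)  # numeric-field sort

print(text_order)   # v1, v15, v2, v217, v218, v22 (release order preserved)
print(field_order)  # v1, v2, v15, v22, v217, v218 (release order lost)
```

The plain text sort keeps the releases in order because each digit is a separate version component; reading "217" as the number two hundred seventeen throws that away.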
In some applications, fixed-number-of-digits hexadecimal numbers
are placed into filenames. Because hexadecimal numbers may
contain the letters A, B, C, D, E, and F in any digit position, a
sequence of hexadecimal numbers may be treated as multiple
fields. What would appear in numerical order on most
appears on XP as
What's more, if you add 7000 (hex) to these numbers, the order
changes again, to
Knowing how to count in hexadecimal isn't going to help anyone find
a file in the "ordered" list XP's Windows Explorer presents!
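The hexadecimal case can be reproduced with the same kind of sketch. The values below are chosen for illustration and `natural_key` is a hypothetical model of the digit-field rule, so the exact order Windows Explorer shows may differ in detail; the point is only that letters inside a hex number break it into several fields.

```python
import re

def natural_key(name):
    # Model of the rule: runs of ASCII digits compare by numeric value,
    # everything else (including the hex letters A-F) is a text field.
    parts = re.split(r"([0-9]+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

values = [0x0001, 0x001A, 0x0098, 0x009A, 0x00A0, 0x0100]  # illustrative

base = [f"{v:04X}" for v in values]
shifted = [f"{v + 0x7000:04X}" for v in values]

print(sorted(base))                       # plain sort: true numeric order
print(sorted(base, key=natural_key))      # 00A0, 0001, 001A, 009A, 0098, 0100
print(sorted(shifted, key=natural_key))   # 70A0, 701A, 709A, 7001, 7098, 7100
```

In this model the plain sort of fixed-width hex names matches numeric order, the field-based sort does not, and adding 7000 (hex) to every value rearranges the field-based order yet again.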
And Windows XP supports Unicode in filenames... but the numeric
digits for Thai, for Chinese, and other numeration systems, all
available in Unicode, are not used to delineate numeric subfields
for filenames. So this solution only works (to the extent
that you consider the above working) for people using the ASCII
digits for their numeric subfields.
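As a rough illustration of the ASCII-only limitation, the Python sketch below compares two hypothetical key functions: one that recognizes only 0 through 9, as described above, and one that accepts any Unicode decimal digit. The function names and the `doc` filenames are invented for the example, and neither function is claimed to match what XP does internally.

```python
import re

def ascii_key(name):
    # Only ASCII 0-9 delimit numeric fields (the behavior described above).
    parts = re.split(r"([0-9]+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

def unicode_key(name):
    # \d matches any Unicode decimal digit in a str pattern, and int()
    # accepts those digits too. Chinese numerals such as U+4E00 are not
    # decimal digits in this sense, so they would still be treated as text.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

one, two, ten = "doc\u0e51", "doc\u0e52", "doc\u0e51\u0e50"  # Thai 1, 2, 10
names = [one, two, ten]

ascii_order = sorted(names, key=ascii_key)      # 1, 10, 2
unicode_order = sorted(names, key=unicode_key)  # 1, 2, 10
```

Under the ASCII-only rule the Thai-numbered files fall back to plain text order, so the feature simply does not apply to them.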
And Windows XP still has a command prompt, and the command prompt
still supports the "dir" command... which still sorts the way
every other operating system sorts filenames... leading to further
confusion and inconsistencies.
And why, oh why, didn't Windows XP figure out that I want the
following list of files sorted in the order given below?
seven hundred sixty eight
The final word?
It is not clear that the final word on this topic has been
written. But in SP1, Microsoft quietly added a new
policy. Adding the following key or keys to the registry will
undo the filename sorting feature introduced in XP:
Use the first one if you wish to undo it for all users of the
machine, or the second one if you wish to undo it just for the
currently logged in user.
It appears that some customers must have spoken, and that they
didn't like the feature, and that they were important customers
(since Microsoft listened).