RP9 and TOSEC File Names
This document describes the default format used for RP9 file names, and the mapping to and from TOSEC file names.
RP9 was designed to simplify the distribution, use and organization of retrogames and other classic content, while recognizing the individual strengths of different media image formats and naming conventions.
Because they are used in players that also have access to a local or online database, RP9 files may be freely renamed without compromising functionality, much as MP3 files can be (and unlike TOSEC files, whose names carry the metadata). RP9 Toolbox, a component of RetroPlatform Player, even includes a customizable feature to name and rename files according to user preferences. At the same time, however, a default format is used for consistency. In its simplest form, it looks like this:
- Asteroid Invader II (Acme Games, 1986, Amiga).rp9
This format respects the fact that the application title (rather than, for example, the publisher entity) has emerged as the preferred initial part of commonly used naming methodologies, and adds a minimum of information to visually identify or search for items based on the elements in the file name. The following fields are always indicated:
- Item title
- Entity name (commonly used name, not corporate registration)
- Release year
- System family
Two additional fields may be added:
- Extended data (e.g. all additional TOSEC attributes), enclosed in one "master" set of [square brackets]
- Final enumeration suffix (number between round brackets), in case of multiple items with the same name in the same directory, e.g. (2), (3), etc.
The extended format is the default, but it may not be visible in well-cataloged games and demos, because in an ideal situation there is only one optimal entry for each title, and no need for extra information. This is what is most desirable from a usability point of view. Nevertheless, the extended format aims to preserve the full set of additional TOSEC fields, with no loss of information, while improving parsing and reducing undesirable naming differences.
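As an illustration of the layout described above, the following minimal Python sketch assembles a file name from the four mandatory fields plus the two optional ones. The function name `build_rp9_name` and its parameters are hypothetical and are not part of RP9 Toolbox or any published API.

```python
from typing import Optional


def build_rp9_name(title: str, entity: str, year: int, system: str,
                   extended: Optional[str] = None,
                   enumeration: Optional[int] = None) -> str:
    """Assemble a default-format RP9 file name (hypothetical helper)."""
    # Mandatory part: title, then entity, year and system family in round brackets.
    name = f"{title} ({entity}, {year}, {system})"
    # Optional extended data goes inside one "master" pair of square brackets.
    if extended:
        name += f"[{extended}]"
    # Optional enumeration suffix, e.g. (2), for same-named items in a directory.
    if enumeration is not None:
        name += f"({enumeration})"
    return name + ".rp9"


print(build_rp9_name("Asteroid Invader II", "Acme Games", 1986, "Amiga"))
```

Passing `extended="(v2.1)(demo)(US)[a]"` and `enumeration=2` to the same function produces the extended form with the enumeration suffix.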
The main differences between RP9 and TOSEC names:
- RP9 does not rearrange the "The" prefix to the end of a title or subtitle (if a title begins with "The" or any other article in any language, the file name also begins in the same way)
- Like TOSEC, RP9 also maps ":" (illegal in most file systems) to "-", but it enforces typesetting accuracy by adding proper spacing if necessary (the TOSEC 1.0 specification indicates the opposite, but in practice most TOSEC files that were based on that spec had the space added, resulting in two possible versions of the same file)
- The "vx.xx" version attribute is isolated from the title and placed in its own pair of round parentheses (the "v" is preserved)
- The original Roman and Arabic numerals are preserved if available, rather than being normalized (this type of normalization can always be done internally in the search layer, if desired)
This results in RP9 file names like the following:
- Asteroid Invader II (Acme Games, 1986, Amiga)[(v2.1)(demo)(US)[a]].rp9
- Asteroid Invader II (Acme Games, 1986, Amiga)[(v2.1)(demo)(US)[a]](2).rp9
- Asteroid Invader II (Acme Games, 1986, Amiga)[(v2.1)(demo)(US)[a]](3).rp9
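Names of this shape can be taken apart from right to left. The sketch below is one possible approach in Python, not the player's actual parser: `Rp9Name` and `parse_rp9_name` are hypothetical names, and the code assumes that the year and system family contain no ", " sequence and that the brackets of the extended field are balanced.

```python
import re
from typing import NamedTuple, Optional


class Rp9Name(NamedTuple):
    title: str
    entity: str
    year: str
    system: str
    extended: Optional[str]      # content of the master square brackets, if any
    enumeration: Optional[int]   # value of the final (n) suffix, if any


def parse_rp9_name(file_name: str) -> Rp9Name:
    stem = file_name[:-4] if file_name.lower().endswith(".rp9") else file_name

    # Optional enumeration suffix such as "(2)" at the very end.
    enumeration = None
    m = re.search(r"\((\d+)\)$", stem)
    if m:
        enumeration = int(m.group(1))
        stem = stem[:m.start()]

    # Optional extended data: scan backwards to find the "[" that matches the final "]".
    extended = None
    if stem.endswith("]"):
        depth = 0
        for i in range(len(stem) - 1, -1, -1):
            if stem[i] == "]":
                depth += 1
            elif stem[i] == "[":
                depth -= 1
                if depth == 0:
                    extended = stem[i + 1:-1]
                    stem = stem[:i]
                    break

    # Mandatory "(Entity, Year, System)" group at the end of what remains.
    m = re.match(r"^(.*) \(([^()]*)\)$", stem)
    if not m:
        raise ValueError(f"not in default RP9 format: {file_name!r}")
    parts = m.group(2).rsplit(", ", 2)   # the entity may itself contain commas
    if len(parts) != 3:
        raise ValueError(f"expected entity, year and system: {file_name!r}")
    entity, year, system = parts
    return Rp9Name(m.group(1), entity, year, system, extended, enumeration)


print(parse_rp9_name(
    "Asteroid Invader II (Acme Games, 1986, Amiga)[(v2.1)(demo)(US)[a]](2).rp9"))
```

Working from the end of the name keeps the title free-form: anything before the final mandatory group is treated as the title, including round brackets it may contain.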
In general, RP9 naming follows a goal of rigorous simplicity and elegance, also taking into account undesirable variations observed in the reality of tens of thousands of TOSEC 1.0 files. This is why there are some small differences between TOSEC and RP9, whereby for RP9 the aim was to reduce the presence of inconsistent exceptions and to make parsing easier.
In particular, the choice to not rearrange titles beginning with an article (as in "Das Boot" changing to "Boot, Das") was based on the following considerations:
- This transformation originates from traditional library cataloging rules, but is less useful in a context where automated search (usually "live" as-you-type search) is pervasive
- Any rearrangement is a modification of the original title, introducing more work for humans, an inevitable duality (where there was one title, there are now two) and the possibility of further unintended consequences (errors, borderline cases, difficulty in reconstructing the original, etc.)
- The presence of such a rule opens the doors to a "because you can" approach to editing, often with inconsistent results (which articles of which languages should be rearranged? should these be rearranged even within the context of another language? if the title has a subtitle separated by a dash, does the rule apply to both parts, or not?)
- Examples of "difficult" cases: "The Halley Mission - A Shuttle Simulation" vs. "Halley Mission, The - A Shuttle Simulation" vs. "Halley Mission, The - Shuttle Simulation, A" (four possible combinations to search for); "Live, Die - The German Rocket" vs. "Die Live - The German Rocket, The" (not only four possible combinations to choose from, but also not clear whether "Die" is article or not); "The Elphs, the Devils and the Blue Angel" vs. "Elphs, the Devils and the Blue Angel, The" vs. "the The Elphs Devils and the Blue Angel" (incorrect automatic reconstruction based on comma followed by article).
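The reconstruction problem mentioned in the last point can be made concrete with a small Python sketch. Both functions and the `ARTICLES` list are hypothetical and deliberately simplistic; they are not taken from any TOSEC or RP9 tool, and the list of articles is far from complete.

```python
import re

# Illustrative only: a real list would need articles for many more languages.
ARTICLES = ("The", "A", "An", "Der", "Die", "Das")


def restore_trailing_article(title: str) -> str:
    """Move a trailing ', Article' back to the front of the whole title."""
    for article in ARTICLES:
        suffix = ", " + article
        if title.endswith(suffix):
            return article + " " + title[:-len(suffix)]
    return title


def restore_first_comma_article(title: str) -> str:
    """Naive variant: move the first ', article' found anywhere to the front."""
    pattern = r", (" + "|".join(ARTICLES) + r")\b"
    m = re.search(pattern, title, flags=re.IGNORECASE)
    if not m:
        return title
    return m.group(1) + " " + title[:m.start()] + title[m.end():]


rearranged = "Elphs, the Devils and the Blue Angel, The"
print(restore_trailing_article(rearranged))
print(restore_first_comma_article(rearranged))
```

The first function recovers the original title here, while the second one latches onto the comma inside the title and produces a wrong result. Neither can decide on its own whether "Die" in "Live, Die" is a German article or an English verb, which is the kind of ambiguity that leaving titles untouched avoids.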
The system family property as used in the RP9 file name aims to indicate the widest set of compatible systems, not just one sample configuration. For example, for Amiga systems the supported names are "Amiga", "CDTV" and "CD32", because these reflect three important device branches both from a technical and a recognition perspective. Additional configuration details (e.g. preference for A-500 vs. A-1200) are embedded in the RP9 manifest. For CBM (8-bit) systems the platform names are model-specific (C64, VIC 20, etc.), because the differences (and software incompatibilities) between the various models were more distinct.
The following character-level rules apply to RP9 file names:
- By default, space characters are used (not underscore characters)
- Illegal characters (such as "?") are converted to underscore characters (except ":" which becomes "-", with an initial space added if necessary)
- Multiple space characters are condensed into one
- Leading and trailing space characters are stripped
- The extended information field is always included in a pair of square brackets, which may include any other combination and nesting of paired round and/or square brackets
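The character rules above can be sketched as a short Python function. This is an illustration under stated assumptions, not the RP9 Toolbox implementation: the function name `sanitize_rp9_text` is hypothetical, and the set of "illegal" characters is assumed to be the one reserved by Windows file systems, since the text only names "?" and ":" explicitly.

```python
import re

# Assumption: characters treated as illegal, other than ":" (handled separately).
ILLEGAL_CHARS = r'[\\/*?"<>|]'


def sanitize_rp9_text(text: str) -> str:
    """Apply the character rules to a title or other file name component."""
    # ":" becomes "-", with exactly one space in front of it.
    text = re.sub(r" *:", " -", text)
    # Remaining illegal characters become underscores.
    text = re.sub(ILLEGAL_CHARS, "_", text)
    # Multiple spaces are condensed into one.
    text = re.sub(r" {2,}", " ", text)
    # Leading and trailing spaces are stripped.
    return text.strip(" ")


print(sanitize_rp9_text("  Asteroid Invader:  The   Return "))
print(sanitize_rp9_text("Who Dares Wins?"))
```

The colon is handled before the generic replacement so that it never turns into an underscore, and spaces are condensed last so that any double space created by the earlier steps is also removed.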
In the player implementation, if there is no database match for an item the file name information is used as follows:
- Underscore characters are converted to spaces
- If the extended information includes a version field, that is extracted as such
- If the extended information includes a demo status field, that is extracted as such
- If the extended information includes additional fields, these are extracted for further processing (without the "master" square brackets, and without the already-extracted version field)
- If the player has dedicated columns or fields for the version or demo status, this information is displayed there. Otherwise, the information is, by default, displayed together with the title.
- Any additional extended information, if present, is either shown in any dedicated columns or fields the player may have, or, only if necessary for disambiguation purposes (e.g. if there are two otherwise identical entries), is displayed after the title information.
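The extraction steps above might look roughly like the following Python sketch. It is not the player's code: `extract_extended_fields` is a hypothetical name, the version pattern assumes the "(vx.xx)" form described earlier, and the demo pattern assumes TOSEC-style values such as "(demo)" or "(demo-playable)". Following the wording of the list, only the version field is removed from the remaining data.

```python
import re
from typing import Optional, Tuple


def extract_extended_fields(extended: str) -> Tuple[Optional[str], Optional[str], str]:
    """Return (version, demo_status, remaining) for the content of the
    master square brackets, passed in without the brackets themselves."""
    # Underscore characters are converted to spaces.
    extended = extended.replace("_", " ")

    # Version field, e.g. "(v2.1)"; it is removed from the remaining data.
    version = None
    m = re.search(r"\((v\d[^()]*)\)", extended)
    if m:
        version = m.group(1)
        extended = extended[:m.start()] + extended[m.end():]

    # Demo status field, e.g. "(demo)" or "(demo-playable)" (assumed forms).
    demo_status = None
    m = re.search(r"\((demo(?:-[a-z]+)?)\)", extended)
    if m:
        demo_status = m.group(1)

    return version, demo_status, extended


print(extract_extended_fields("(v2.1)(demo)(US)[a]"))
```

A player with dedicated version and demo columns would show the first two values there; one without would append them to the displayed title, and would fall back to the remaining data only when two entries are otherwise identical.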