The filesysobjects API

The filesysobjects API provides a variety of interfaces for processing resource path addresses and for searching resources. The initial set of interfaces focuses on filesystem resources in a basic distributed environment. In particular, it defines a basic set of call parameters that are common to a subset of the call interfaces.

API

Interfaces

  • process paths - normalize, split, and join resource paths
  • manage search paths - create, add, remove, and clear search paths
  • search for files, directories, and branches - find resources by search on media
  • match files, directories, and branches into path strings - search resources on strings only
  • configuration - abstraction of specific configuration directories for various platforms
  • os - abstraction of specific OS directories for various platforms
  • user - abstraction of specific user directories for various platforms

Major Options by Interfaces

The cross-platform related options are

  • apppre - application prefix - scheme
  • pathsep - special path separator
  • spf - source platform
  • tpf - target platform

The following table displays the support of interfaces with some additional path separator processing options. For the full set of options refer to the interfaces.

interface                       spath  apppre  spf  tpf  pathsep  appsplit  keepsep  strip
addpath_to_searchpath           x      -       x    x    x        -         -        x
clearpath                       -      x       x    x    -        -         x        -
delpath_from_searchpath         x      -       -    x    x        -         -        x
escapepathx                     x      -       x    x    -        -         -        x
expandpath                      x      -       x    x    -        -         -        -
findpattern                     x      -       -    -    -        -         -        -
findrelpath_in_searchpath_iter  x      -       -    -    -        -         -        -
findrelpath_in_searchpath       x      -       -    -    -        -         -        -
findrelpath_in_uppertree_iter   x      -       -    -    -        -         -        -
findrelpath_in_uppertree        x      -       -    -    -        -         -        -
get_subpath_product             -      -       -    -    -        -         -        -
gettop_from_pathstring          x      -       x    x    x        -         x        -
join_apppathx_entry             -      x       x    -    -        -         -        -
normapppathx                    x      x       x    x    x        x         -        x
normpathx                       x      -       x    x    x        -         -        x
set_uppertree_searchpath        -      -       x    x    x        -         -        x
splitpathx                      x      -       x    x    -        -         -        x
splitapppathx                   x      x       x    x    x        x         x        x
splitapppathx_getlocalpath      -      -       x    x    -        -         -        x
unescapepathx                   x      -       x    x    -        -         -        x

tpf and spf

The filesysobjects provides cross-platform conversion of resource addresses for a variety of platforms, including ordinary local filesystems, network filesystems, UNC, URI, URL, and others. The source and target platforms can therefore be set by the parameters spf for the source and tpf for the target.

The source and target platform options define the separators ‘sep’ and ‘pathsep’, and some structural context such as drive letters on the path layer, and additional application schemes as prefixes and additional structural information such as fragments and queries for URLs. While the target platform parameter supports one value only, the source platform parameter enables the combination of multiple platforms. The latter enables in particular the cross-conversion of mixed terms by accurate handling of the structural components of all platforms.

type      constant                      tpf          spf         scheme        host      share         path  option
URL/URI   RTE_URI, RTE_HTTP, RTE_HTTPS  http, https  -           http/https    hostname  -             path  query, fragment
file-URI  RTE_FILEURI                   posix, win   posix, win  file          hostname  drive, share  path  -
ldsys     RTE_WIN32                     win          win         -             -         drive         path  -
lfsys     RTE_POSIX, RTE_WIN32          posix, win   posix, win  -             -         -             path  -
RAW       -                             all          all         -             -         -             -     -
share     -                             win          win         //, \\        hostname  share         path  -
SMB/CIFS  RTE_SMB                       win, posix   -           smb           hostname  share         path  -
UNC       RTE_UNC                       win, posix   -           file, //, \\  hostname  share         path  -
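
As a rough illustration of the structural components named in the table, the standard library function `urllib.parse.urlsplit` separates a URL into scheme, host, path, query, and fragment. This is a stdlib sketch only, not the filesysobjects implementation, and the example addresses are made up.

```python
from urllib.parse import urlsplit

# Hypothetical example addresses, chosen only to show the components.
url = urlsplit("https://example.com/a/b/c?key=value#section")
print(url.scheme, url.netloc, url.path, url.query, url.fragment)

# A file URI with a host part; the share is the first path component.
file_uri = urlsplit("file://fileserver/share/dir/file.txt")
print(file_uri.scheme, file_uri.netloc, file_uri.path)
```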

Options

The call interface provides groups of functions and classes that share a set of common parameters, with additional context-specific modifications.

The provided function sets comprise the categories:

  • Filesystem Positions and Navigation
  • Canonical Node Address

Various common options are supported, which may not be available for each interface.

spath

The path to be processed, either to be added to the search list or to be found in a search list.

plist

A path list for search operations, default is ‘sys.path’.
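
The interplay of spath and plist can be pictured with a small stdlib sketch. The helper name `find_in_searchpath` is hypothetical and is not part of filesysobjects; it only mirrors the documented default of falling back to ‘sys.path’.

```python
import os
import sys


def find_in_searchpath(relpath, plist=None):
    """Return the first existing join of a search-list entry and relpath.

    Hypothetical helper; returns None when no entry contains relpath.
    """
    if plist is None:
        plist = sys.path  # documented default of the plist parameter
    for base in plist:
        candidate = os.path.join(base, relpath)
        if os.path.exists(candidate):
            return candidate
    return None
```

A call such as find_in_searchpath(os.path.join("pkg", "mod.py"), ["/opt/a", "/opt/b"]) returns the first hit in list order, so the order of the search list is significant.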

kargs

apppre

Application prefix, application scheme.

apppre := (
     True     # add application scheme
   | False    # remove application scheme
)

default := False

appsplit

Split into scheme and components of an application path.

delnulpsep

Constrains the strip option. Keeps the separators after reduction.

default := True

keepsep

Modifies the behavior of the ‘strip’ parameter. If ‘False’, the trailing separator is dropped.

normapppathx('/a/b', keepsep=False)   => '/a/b'
normapppathx('/a/b/', keepsep=False)  => '/a/b'

For ‘True’, trailing separators are kept as a directory marker:

normapppathx('/a/b', keepsep=True)    => '/a/b'
normapppathx('/a/b/', keepsep=True)   => '/a/b/'

default := True    # for URIs except: file://, smb://
default := False   # for file path names
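
The effect of keepsep can be approximated with the standard library. The helper `norm_keepsep` below is a hypothetical sketch for POSIX-style paths, not the implementation of normapppathx; it relies on `posixpath.normpath`, which always drops a trailing separator.

```python
import posixpath


def norm_keepsep(path, keepsep=False):
    """Normalize a POSIX-style path; optionally keep a trailing separator.

    Hypothetical helper built on posixpath.normpath.
    """
    normalized = posixpath.normpath(path)
    # Restore the directory marker only if the input carried one.
    if keepsep and path.endswith("/") and not normalized.endswith("/"):
        normalized += "/"
    return normalized


for p in ("/a/b", "/a/b/"):
    print(p, norm_keepsep(p, keepsep=False), norm_keepsep(p, keepsep=True))
```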

pathsep

Forces the use of the provided ‘pathsep‘ for the analysis of the input. While the application-related interfaces in general accept multiple paths separated by ‘pathsep‘, the path-related interfaces support this optionally. Thus, for path interfaces the parameter takes effect only when provided.

The value may be a string of the characters ‘:;‘, which are applied literally as generic tokens, whereas the options ‘posix‘ and ‘win‘ also recognize the structural context of the elements.

pathsep := (
    True            # uses default from *spf*
  | False           # keeps unchanged
  | [:;]*           # replaces ':' and/or ';'
  | 'win'           # replaces ';'
  | 'posix'         # replaces ':'
)

default:=False

The value True just activates the application of the pathsep as defined by the spf, which defaults to os.pathsep.
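
The difference between literal separator characters and the platform keywords can be sketched as follows. The function `split_multipath` is hypothetical and strongly simplified: it only splits the input, whereas the real option also covers replacement and further structural context.

```python
import os
import re


def split_multipath(value, pathsep=True):
    """Split a string holding several paths; hypothetical, simplified semantics.

    True     -> split on os.pathsep
    False    -> keep the input unchanged
    'posix'  -> split on ':'
    'win'    -> split on ';' (drive letters such as 'c:' stay intact)
    other    -> every character of the string is a literal separator
    """
    if pathsep is True:
        seps = os.pathsep
    elif not pathsep:
        return [value]
    elif pathsep == "posix":
        seps = ":"
    elif pathsep == "win":
        seps = ";"
    else:
        seps = pathsep
    return re.split("[" + re.escape(seps) + "]", value)


print(split_multipath(r"c:\a;d:\b", "win"))  # drives are preserved
print(split_multipath(r"c:\a;d:\b", ":;"))   # literal tokens cut the drives
```

The second call shows why the literal form ‘:;‘ is only safe when the input is known not to contain DOS drives.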

raw

Suppresses the normalization by the call of ‘os.path.normpath’.

strip

Removes resulting null-entries. This relies on the chosen target platform, because some characters are used for multiple syntax elements and thus lead to ambiguity without context information. An example is again the DOS drive. The interpretation of the separator characters by the underlying OS is the basis for the processing of the strip operation.

spf='posix':  d:/  => [ 'd',  '/']  # two directories
spf='win':    d:/  => [ 'd:', '/']  # drive 'd:'. and the root directory on 'd:'
                                    # as node-name

Thus the option strip leads to the following results:

spf='posix':  d:/  => [ 'd']        # one directory
spf='win':    d:/  => [ 'd:', '/']  # drive 'd:'. and the root directory on 'd:'
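
The platform-dependent reading of ‘d:/’ can be reproduced with the standard library modules `ntpath` and `posixpath`. The function `split_posix_entries` is a hypothetical sketch of the POSIX reading, where ‘:’ acts as pathsep; it is not the filesysobjects implementation.

```python
import ntpath
import posixpath

# Windows rules: 'd:' is a drive, '/' is the root directory on it.
print(ntpath.splitdrive("d:/"))     # ('d:', '/')

# POSIX rules: there are no drives, the whole string is a path.
print(posixpath.splitdrive("d:/"))  # ('', 'd:/')


def split_posix_entries(value, strip=False):
    """Hypothetical sketch: treat ':' as pathsep, optionally drop null entries."""
    entries = value.split(":")
    if strip:
        # An entry consisting only of separators carries no node name.
        entries = [e for e in entries if e.strip("/")]
    return entries


print(split_posix_entries("d:/"))              # ['d', '/']
print(split_posix_entries("d:/", strip=True))  # ['d']
```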

When strip is set to True, the following elements are removed for the supported target platforms:

  • URL / URI

    • multiple occurrences of sep and pathsep
  • ldsys

    • multiple occurrences of sep and pathsep
    • terminating sep and pathsep
  • lfsys

    • multiple occurrences of sep and pathsep
    • terminating sep and pathsep
  • RAW

    n.a.

  • share

    • multiple occurrences of sep and pathsep
    • terminating sep and pathsep
  • SMB

    • multiple occurrences of sep and pathsep
    • terminating sep and pathsep
  • UNC

    • multiple occurrences of sep and pathsep
    • terminating sep and pathsep

default := True

spf - Source Platform

spf      string                  constant                                   compatible to normpath  cross platform  behaviour
local    posix, win              RTE_POSIX, RTE_WIN32                       yes                     no              calls local os.path.normpath()
posix    posix                   RTE_POSIX                                  no (*pchar)             yes (*)         transforms all separators to ‘/’ or ‘:’
uri      http, https, smb, file  RTE_HTTP, RTE_HTTPS, RTE_SMB, RTE_FILEURI  no                      yes             transforms all separators to ‘/’
win      win                     RTE_WIN32                                  no                      no              transforms all separators to ‘\\’ or ‘;’
default  -                       RTE                                        no                      no              adapts ‘win’ (on win) or ‘posix’ (on posix) to local os

(*) Portable for IEEE1003.1, 2013/3.276 Portable Character Set

Defines the characters to be used as path separator, and the additional specific semantics. Default is the current platform, with a single character for the ‘os.pathsep’ and the support for the specific semantics like DOS drives for the platform ‘win’.

Application and URL/URI file prefixes and tags like ‘smb://‘, ‘file://‘, ‘file:////‘, and ‘file://///‘ are detected in any case. Thus these are treated as reserved words.
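
Detecting these reserved prefixes requires testing the longest variants first, because the shorter ones are prefixes of the longer ones. The sketch below assumes only the four prefixes named above; the helper `split_reserved_prefix` is hypothetical and not part of filesysobjects.

```python
# Reserved prefixes from the text, longest first so that 'file://///'
# is not mistaken for the shorter 'file://'.
RESERVED_PREFIXES = ("file://///", "file:////", "file://", "smb://")


def split_reserved_prefix(address):
    """Return (prefix, rest); prefix is '' when no reserved prefix matches."""
    for prefix in RESERVED_PREFIXES:
        if address.startswith(prefix):
            return prefix, address[len(prefix):]
    return "", address


print(split_reserved_prefix("file://///host/share/x"))
print(split_reserved_prefix("/local/path"))
```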

The syntax is:

spf := (<macros>|<char-string>)

macros := ('posix'|'win')[, macros]
char-string := (':'|';'|'')*

default := ('posix'|'win')
  • posix:

    String for POSIX based platforms. Is aware of the POSIX syntax and semantics, single character ‘/’ as separator. Ignores pattern for potential DOS drives.

  • win:

    String for the Windows platform. Is aware of the Windows syntax and semantics, single character ‘\’ as separator. Recognizes string pattern of DOS drives. Understands in addition the separator ‘/’.

  • URI char-string:

    Applies the selected URI.

For mixed input, for example:

spf := ':;'

tpf - Target Platform

Due to some deviations from the expected behavior in case of cross platform development the following options are defined:

tpf      constant   compatible to normpath  cross platform  behaviour
cnp      -          yes(posix)              no              calls posixpath.normpath()
cnw      -          yes(win)                no              calls ntpath.normpath()
local    -          yes                     no              calls local os.path.normpath()
posix    RTE_POSIX  no (*pchar)             yes (*)         transforms all separators to ‘/’ or ‘:’
uri      -          no                      yes             transforms all separators to ‘/’
win      RTE_WIN32  no                      no              transforms all separators to ‘\\’ or ‘;’
default  RTE        no                      no              adapts ‘win’ (on win) or ‘posix’ (on posix) to local os

(*) Portable for IEEE1003.1, 2013/3.276 Portable Character Set

See also tpf and spf, for a detailed comparison refer to ‘filesysobjects.normpathx’ vs. ‘os.path.normpath’.

  • cnw

    Calls transparently

    ntpath.normpath()
    
  • cnp

    Calls transparently

    posixpath.normpath()
    
  • default

    Adapts ‘os.path.sep‘ and ‘os.pathsep‘ to the local native OS; otherwise the behavior is the same as for the modes ‘posix‘ or ‘win‘.

    In terms of structure, including the drives of Windows-based file systems, this mode is compatible across all supported platforms. However, os.path.sep and os.pathsep are adapted to the local platform.

    On POSIX platforms(Linux, MacOS, ...), e.g.:

    d:/  => d:
    d:\\  => d:\\
    

    On Windows platforms, e.g.:

    d:/  => d:\\
    d:\\  => d:\\
    

local:

Compatible with the local ‘os.path.normpath()‘; this includes the originally permitted input characters.

On POSIX platforms(Linux, MacOS, ...), e.g.:

d:/  => d:
d:\\  => d:\\

On Windows platforms, e.g.:

d:/  => d:\\
d:\\  => d:\\
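
The examples above match what the standard library yields when the platform-specific modules are called directly: `posixpath.normpath` for POSIX rules and `ntpath.normpath` for Windows rules, the two functions the tpf table names for the modes cnp and cnw. The snippet runs on any platform because it does not depend on the local ‘os.path‘.

```python
import ntpath
import posixpath

for raw in ("d:/", "d:\\"):
    # posixpath follows POSIX rules: only '/' is a separator, no drives.
    print("posix:", repr(raw), "=>", repr(posixpath.normpath(raw)))
    # ntpath follows Windows rules: '/' and '\' are separators, 'd:' is a drive.
    print("win:  ", repr(raw), "=>", repr(ntpath.normpath(raw)))
```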

posix:

POSIX-based style with “os.path.sep = ‘/’” and “os.pathsep = ‘:’”. The special escape characters are kept, and additionally the following characters when escaped: “[/\\:;]”. The special case of an ambiguous, unintended escape character ‘\\\\‘ can be avoided by an odd number of ‘\\‘.

E.g. Linux, MacOS/OS-X, BSD, Solaris, etc.:

d:/  => d:/
d:\\  => d:/

This mode is literally compatible across all supported platforms.

uri:

win:

MS-Windows style with “os.path.sep = ‘\\’” and “os.pathsep = ‘;’”. The special escape characters are kept, and additionally the following characters when escaped: “[/\\:;]”. The special case of an ambiguous, unintended escape character ‘\\\\‘ can be avoided by an odd number of ‘\\‘.

d:/  => d:\\
d:\\  => d:\\

This mode is literally compatible across all supported platforms.
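
The separator transformation of the modes ‘posix‘ and ‘win‘ can be pictured by a plain character replacement. The helper `convert_separators` is hypothetical and deliberately ignores escaping, schemes, and UNC prefixes, all of which the real modes have to consider.

```python
def convert_separators(path, tpf):
    """Rewrite the directory separators for the target platform.

    Hypothetical, simplified helper: no escape handling, no scheme detection.
    """
    if tpf == "posix":
        return path.replace("\\", "/")
    if tpf == "win":
        return path.replace("/", "\\")
    raise ValueError("unsupported tpf: %r" % (tpf,))


print(repr(convert_separators("d:\\", "posix")))  # 'd:/'
print(repr(convert_separators("d:/", "win")))     # 'd:\\'
```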

Literals, RegExpr, and Glob

The common variants of pathnames as parameters are provided by one of the categories:

  • Literal - literal names

    The applicability varies with the scope. Literals can be applied in any scope, but they are the least flexible search pattern. They provide native matches only, either on single nodes or on single groups in the case of directories represented as containers.

  • Regular Expression - specific match patterns, which are implementation dependent

    Regular expressions are in general part of applications, either special autonomous conversion filters, or embedded into a larger application and/or programming environment.

    These are strongly implementation dependent, even though a broad common set is generally provided. In particular, they lack native filesystem support and thus can only be used as input and/or output filters.

  • Glob - platform dependent native filesystem match

    The ‘glob’ expressions are a very basic type of regular expressions, which are platform dependent. They can be applied at the interface of the filesystem and influence the response of the filesystem interface directly.

    Globs can span multiple levels of directory paths.

    r"a[0-9]*/[!xy]*/???/*"
    

    Use e.g. the following patterns to restrict on a single node name:

    os.sep+r"a*b"+os.sep
    os.sep+r"a*b"
    
  • Combined - literal names combined with glob, or regexpr

    The pathnames are processed internally depending on the category of the interface. Interfaces operating in memory on strings apply regular expressions only. Interfaces accessing the filesystem, e.g. for existence tests and name resolution, use literal matches and glob.

    The Semi-Literal type arose from the design decision that ‘glob’ and ‘regexpr’ must not be intermixed because of possible ambiguities. One of the main differences is the scope of a match: the ‘glob’ functions are aware of separators, whereas for a regexpr a separator is simply a character. The patterns themselves differ too, e.g.

    regexpr:  F[0-2]*  := (F, F0, F1, F2, F00, ...)
    glob:     F[0-2]*  := (F0.*, F1.*, F2.*)
    

    So this basically also prohibits functions like ‘fnmatch.translate()’ on intermixed expressions. However, the user can prepare a string as a regexpr before calling the interface. Be aware that for filesystem evaluation only the glob style is applied.

    Literals and only one of the pattern types can be intermixed arbitrarily.

    Owing to the implemented algorithm, an intermix of both pattern types is nevertheless possible when the following holds:

    • ‘regexpr’ and ‘glob’ can be intermixed when the ‘glob’ compiled by ‘re’ does not match. Then the algorithm simply keeps it as an unknown node and continues with the ‘glob’ expansion. Thus the following strict pattern of path names is supported, where the order is significant:

      <mixed-regexpr-glob> := <literal-or-regexpr><literal-or-glob>
      

    The rule of thumb is given by the following combinations:

    • literal + glob:

      a literal path part matching the search path, a glob to be applied to the filesystem.

    • regexpr + literal:

      a regexpr to be applied onto the in-memory path, followed by a literal applicable in any case.

    • literal + regexpr + literal + glob + literal:

      this order is supported by the algorithm, but the input pattern is not verified to be of a consistent type, and thus the applicability of the syntax has to be assured by the caller

    • path-pattern := [path-pattern] + (literal|regexpr|glob):

      Arbitrary pathname patterns are supported by default by trying to match the longest valid part for each type. This is inherently ambiguous when glob and regexpr have to be distinguished. Thus, in cases where arbitrary types are required, a grouping type-keyword has to be provided.

      path-pattern := [path-pattern] + (
              literal   | 'literal(' literal ')' | 'l(' literal ')'
              | regexpr | 'regexpr(' regexpr ')' | 'r(' regexpr ')'
              | glob    | 'glob(' glob ')'       | 'g(' glob ')'
              )
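
The different reading of one and the same pattern string by the two pattern types can be checked with the standard library. The sketch uses `re.fullmatch` for the regexpr reading and `fnmatch.fnmatchcase` for the glob reading; the candidate names are made up. Note that `fnmatch` only illustrates the pattern syntax, it is not separator-aware the way filesystem globbing is.

```python
import fnmatch
import re

names = ["F", "F0", "F1", "F00", "F0.txt", "F9", "Fx"]
pattern = "F[0-2]*"

# As a regular expression: 'F' followed by zero or more of the digits 0-2.
as_regexpr = [n for n in names if re.fullmatch(pattern, n)]

# As a glob: 'F', exactly one of the digits 0-2, then any characters.
as_glob = [n for n in names if fnmatch.fnmatchcase(n, pattern)]

print(as_regexpr)  # ['F', 'F0', 'F1', 'F00']
print(as_glob)     # ['F0', 'F1', 'F00', 'F0.txt']

# fnmatch.translate() shows the regular expression equivalent of the glob.
print(fnmatch.translate(pattern))
```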
      

The overall applicability for specific execution and call contexts is depicted in the following table.

The automatic resolution in the absence of a type-keyword starts in general with literals and globs from the left. Then the resolution of the regexpr is attempted.

Application Processing Scope  Literals  Semi-Literals  RegExpr  glob
In-Memory Path Strings        yes       yes            yes      no (1)
Filesystem                    yes       yes            no       yes
Filesystem-Extension (3)      yes       yes            yes (2)  yes

  1. is treated as a ‘regexpr’; when it matches, it is resolved, otherwise the re-processing ends and it is applied as a ‘glob’
  2. is handled by caching the file system component into memory and applying the ‘regexpr’ onto the in-memory string
  3. in case of ambiguity, type-keywords have to be provided on the syntax parts, see ‘path-pattern’
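
How the type-keywords of ‘path-pattern’ remove the ambiguity can be sketched with a small tokenizer. Everything in it is an assumption made for illustration: the function name `parse_typed_pattern`, the direct concatenation of the wrapped parts, and the restriction that a wrapped part must not contain ‘)’ itself.

```python
import re

# Long and short type-keywords from the path-pattern grammar.
KEYWORDS = {"literal": "literal", "l": "literal",
            "regexpr": "regexpr", "r": "regexpr",
            "glob": "glob", "g": "glob"}

# Simplification: the wrapped part must not contain ')' itself.
TOKEN = re.compile(r"(literal|regexpr|glob|l|r|g)\(([^)]*)\)")


def parse_typed_pattern(pattern):
    """Return [(type, text), ...] for a fully keyword-wrapped pattern."""
    parts, pos = [], 0
    for match in TOKEN.finditer(pattern):
        if match.start() != pos:
            raise ValueError("unwrapped text at offset %d" % pos)
        parts.append((KEYWORDS[match.group(1)], match.group(2)))
        pos = match.end()
    if pos != len(pattern):
        raise ValueError("unwrapped text at offset %d" % pos)
    return parts


print(parse_typed_pattern("l(/home/user/)r(proj[0-9]+)g(/*.py)"))
```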

Thus the application of RegExpr is implemented as an optional parameter applied to in-memory strings only. When searches for unknown filesystem nodes by re-patterns are required, the workflow is:

  1. filter and collect filesystem entries by ‘literal’ and ‘glob’ parameters
  2. post-filter the collected set by ‘literal’ and ‘RegExpr’ type parameters.
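
The two steps above can be sketched with the standard library modules `glob` and `re`. The function `find_nodes` and its parameters are hypothetical and only illustrate the order of the steps; they are not the filesysobjects search interface.

```python
import glob
import os
import re


def find_nodes(directory, glob_pattern, regexpr):
    """Collect entries by glob, then post-filter their names by regexpr."""
    # Step 1: the filesystem resolves the literal directory and the glob.
    collected = glob.glob(os.path.join(glob.escape(directory), glob_pattern))
    # Step 2: the regular expression is applied to in-memory strings only.
    name_filter = re.compile(regexpr)
    return sorted(p for p in collected
                  if name_filter.fullmatch(os.path.basename(p)))
```

For example, find_nodes(".", "report_*.txt", r"report_[0-9]+\.txt") first lets the filesystem narrow the candidates and then keeps only the names with a numeric suffix.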

Common Path-Rules

The filesysobjects supports an abstract API for the transparent application of resource paths. In reality this involves a number of redundancies and ambiguities of tokens and structures. Therefore some minor ‘esoteric’ limits are accepted, while escaping and masking is still required for some application URIs. The basic concept relies on specific tokens and token characters, which indicate the applications and the separators of the syntax elements. These are required in order to reliably distinguish between the various path syntaxes. When such characters, e.g. ‘;’ or ‘:’, are required as parts of the path constructs of URLs, they have to be masked or replaced by their character codes and processed by the requesting application. An overall generic processing for all applications, e.g. by splitapppathx(), cannot be provided reliably. Even masking by ‘backslash’ finally runs into ambiguity, e.g. due to the supported Windows filesystems. Therefore the package contains, at the time of writing, already more than 4000 unit tests.
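
Replacing separator characters by their character codes can be done with percent-encoding from the standard library. This is one possible masking on the side of the requesting application, shown as a sketch; the node name is made up.

```python
from urllib.parse import quote, unquote

segment = "report;v2:final"       # hypothetical node name with ';' and ':'
masked = quote(segment, safe="")  # ';' becomes %3B, ':' becomes %3A
print(masked)                     # report%3Bv2%3Afinal

# The requesting application restores the original name after path processing.
print(unquote(masked) == segment)
```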

The benefit gained for these practically minor constraints, if they matter at all, is seamless access to various resources by mixed resource paths, where the major part of the management and the assignment of the resource locators is provided by filesysobjects.

This syntax represents e.g. the following valid file pathnames with the current position (PWD) as the reference point for relative positions. For the conversion API refer to splitapppathx and splitapppathx_getlocalpath:

/local/path/access

Where the following equivalent transformations result

/local/path/access/dir/ => /local/path/access/dir/ # forces a directory
/local/path/access/dir  => /local/path/access/dir  # could be a file too
./dir/                  => /local/path/access/dir/ # forces a directory
./dir                   => /local/path/access/dir  # could be a file too
../dir/                 => /local/path/dir/        # forces a directory
../dir                  => /local/path/dir         # could be a file too
dir/                    => /local/path/access/dir/ # forces a directory
dir                     => /local/path/access/dir  # could be a file too
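
The transformations listed above can be reproduced for POSIX-style paths with `posixpath.join` and `posixpath.normpath`. The helper `resolve` is a hypothetical sketch with the reference point passed explicitly; it is not the conversion API itself.

```python
import posixpath


def resolve(path, pwd="/local/path/access"):
    """Resolve path against pwd; a trailing '/' is kept as directory marker.

    Hypothetical helper using POSIX rules only.
    """
    resolved = posixpath.normpath(posixpath.join(pwd, path))
    if path.endswith("/") and not resolved.endswith("/"):
        resolved += "/"
    return resolved


for p in ("./dir/", "./dir", "../dir/", "../dir", "dir/", "dir"):
    print(p.ljust(8), "=>", resolve(p))
```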

Where the following basic paths are equivalent

/local/path/access/dir/ == /local////////path/access//dir/////
/local/path/access/dir/ == /local//.///./././path/access//dir//././/
/local/path/access/dir/ == /local//../path/////path/access/../../path/access/dir/////
/local/path/access/dir  != /local////////path/access//dir/////
/local/path/access/dir  != //local////////path/access//dir/////
/local/path/access/dir  != //local////////path/access//dir
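
Several of these comparisons can be checked with `posixpath.normpath`, which collapses repeated separators and ‘.’ entries but keeps exactly two leading separators. The helper `canonical` is a hypothetical sketch that additionally preserves the trailing directory marker; it covers only the cases shown in the code.

```python
import posixpath


def canonical(path):
    """Normalized path plus a trailing '/' if the input marked a directory."""
    normalized = posixpath.normpath(path)
    if path.endswith("/") and not normalized.endswith("/"):
        normalized += "/"
    return normalized


reference_dir = "/local/path/access/dir/"
reference_any = "/local/path/access/dir"

print(canonical("/local////////path/access//dir/////") == reference_dir)
print(canonical("/local//.///./././path/access//dir//././/") == reference_dir)
print(canonical("/local////////path/access//dir/////") != reference_any)
print(canonical("//local////////path/access//dir") != reference_any)
```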

Where the following basic URI paths are still equivalent

file:///local/path/access/dir/ == /local////////path/access//dir/////
file:///local/path/access/dir/ == /local//.///./././path/access//dir//././/
file:///local/path/access/dir/ == /local//../path/////path/access/../../path/access/dir/////
file:///local/path/access/dir  != file:///local////////path/access//dir/////
file:///local/path/access/dir  != //local////////path/access//dir/////
file:///local/path/access/dir  != //local////////path/access//dir

Same for the first basic URI paths on both sides

file:///local/path/access/dir/ == file:///local////////path/access//dir/////
file:///local/path/access/dir/ == file:///local//.///./././path/access//dir//././/
file:///local/path/access/dir/ == file:///local//../path/////path/access/../../path/access/dir/////
file:///local/path/access/dir  != file:///local////////path/access//dir/////

But for the last two no longer, because ‘2SEP’ has to be at the beginning of the string, which is interpreted here as the beginning of the raw string representation

file:///local/path/access/dir  == file:////local////////path/access//dir/////
file:///local/path/access/dir  == file:////local////////path/access//dir

Resources