‘filesysobjects.paths’ - Module

The filesysobjects.paths module provides advanced operations on paths and search paths.

Implementation Details

Glob Parameters

The ‘glob’ wildcard definitions are a subset of regular expressions which, in addition, deviate slightly in their semantics. In the case of a dot, for example, this can additionally cause ambiguity:

# file path name: /a/b/xname
path0 = '/a/b/x.*'

# regexpr: matches
# glob:    does not match

Therefore, path elements containing ‘glob’ expressions are resolved dynamically by applying the glob module to the file system nodes.
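The ambiguity can be reproduced with the standard library alone. The following minimal sketch applies the example pattern once as a regular expression (‘re’) and once as a glob via ‘fnmatch’, the pattern-matching layer used by the glob module. It does not call filesysobjects.

```python
import fnmatch
import re

name = "/a/b/xname"
pattern = "/a/b/x.*"

# As a regular expression, '.' matches any character and '*' repeats it.
as_regex = re.fullmatch(pattern, name) is not None
# As a glob, '.' is a literal dot and '*' matches any run of characters.
as_glob = fnmatch.fnmatchcase(name, pattern)

print(as_regex, as_glob)  # True False
```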

Regular Expressions

The regular expressions support the full scope of the standard Python ‘re’ module. They are applied as a post-filter to a set of fetched file system node path names.
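A minimal sketch of the post-filter step: candidate path names, e.g. as fetched by glob, are filtered by a compiled regular expression. The helper post_filter and the sample names are hypothetical and not part of the module.

```python
import re

def post_filter(pathnames, regexpr):
    """Hypothetical helper: keep only names fully matched by regexpr."""
    compiled = re.compile(regexpr)
    return [p for p in pathnames if compiled.fullmatch(p)]

# Path names as they might have been fetched from the file system.
fetched = ["/data/log/app-01.txt", "/data/log/app-02.txt", "/data/log/readme.md"]
print(post_filter(fetched, r"/data/log/app-[0-9]+\.txt"))
```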

By default, the regular expressions are compiled statically when the module is loaded. The regular expressions for the path analysis contain the ‘os.path.sep’ of the current platform, which some interfaces allow to be altered by a parameter. In this case the function interfaces compile the adapted expression into a local variable; the compiled pattern is cached by the ‘re’ module, but may potentially be compiled again on each call. The object interface may not have to deal with this within the lifetime of its instances.
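The trade-off between module-level compiled expressions, function interfaces with an altered separator, and the object interface can be sketched as follows. The names split_with_sep and PathSplitter are hypothetical; the sketch only splits on the separator and is not the module's actual path analysis.

```python
import os
import re

# Compiled once at module load time, using the separator of the running platform.
_SPLIT_RTE = re.compile(re.escape(os.path.sep))

def split_with_sep(path, sep=None):
    """Hypothetical function interface: sep may override os.path.sep."""
    if sep is None:
        pattern = _SPLIT_RTE  # static, precompiled
    else:
        # Local compile; 're' caches recent patterns, but may recompile per call.
        pattern = re.compile(re.escape(sep))
    return pattern.split(path)

class PathSplitter:
    """Hypothetical object interface: compiles once per instance."""

    def __init__(self, sep):
        self._pattern = re.compile(re.escape(sep))  # kept for the instance lifetime

    def split(self, path):
        return self._pattern.split(path)

print(split_with_sep(r"a\b\c", sep="\\"))  # ['a', 'b', 'c']
print(PathSplitter("/").split("a/b/c"))    # ['a', 'b', 'c']
```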

Common Call Parameters

apppre

keepsep

pathsep

strip

Strips null-entries.

default := True

stripquote

Strips the special Python-style filesysobjects triple-quotes.

/my/path/with/"""double-quoted"""/dirs => /my/path/with/double-quoted/dirs
/my/path/with/'''single-quoted'''/dirs => /my/path/with/single-quoted/dirs

The current version supports neither homogeneous nor heterogeneous nested triple-quotes.
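A minimal sketch of the stripquote behavior with a non-greedy regular expression, assuming no nesting as stated above. strip_triple_quotes is a hypothetical helper name, not the library's implementation.

```python
import re

# Paired triple-quotes, either """...""" or '''...'''; non-greedy, no nesting.
_TRIPLE = re.compile(r'"""(.*?)"""|' + r"'''(.*?)'''")

def strip_triple_quotes(path):
    """Hypothetical stand-in for stripquote=True: drop the quotes, keep the content."""
    return _TRIPLE.sub(
        lambda m: m.group(1) if m.group(1) is not None else m.group(2), path)

print(strip_triple_quotes('/my/path/with/"""double-quoted"""/dirs'))
print(strip_triple_quotes("/my/path/with/'''single-quoted'''/dirs"))
```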

spf

Source platform, defines the input syntax domain. For the syntax refer to the API in the manual at spf. For additional details refer to tpf and spf, filesysobjects.getspf(), filesysobjects.gettpf(), paths.normpathx(), paths.splitpathx(), and apppaths.normapppathx().

default := RTE

tpf

Target platform. For the syntax refer to the API in the manual at tpf. For additional details refer to tpf and spf, filesysobjects.getspf(), filesysobjects.gettpf(), paths.normpathx(), paths.splitpathx(), and apppaths.normapppathx().

default := spf ( == RTE)

Supported Conversions

The filesysobjects package provides functions for the normalization and the split of resource paths. This includes the cross-conversion between the major file system platforms Posix and Windows. In addition, various other conversions are supported. These comprise the equivalent conversion between semantically compatible representations, and the conversion between syntactically compatible representations without assured semantic equivalence. For semantic equivalence the latter requires additional conventions, e.g. for the conversion of ‘z:mypath’ from Windows to Posix.

Paths of all supported platforms can also be split for custom applications.

normpathx

normpathx supports the normalization of resource paths for the native platform, and the cross-conversion for a limited set of other platforms when the call parameters spf and tpf are used.

The following basic cross-conversions are supported.

  posix win cygwin file (0) file (2) file (3) file (4) file (5) unc
posix c f c c c c      
win f c c c c c      
cygwin c c c c c c      
file (0) c c c c c c      
file (2)         c        
file (3) c c c c c c      
file (4)             c c c
file (5)             c c c
unc   f f       c c c

c: Can be converted into an equivalent representation.

f: Format conversion without assured equivalency. The accuracy depends on additional conventions.

(0): file uri - short form:

'file:/my/path'   == '/my/path'
'file:c:/my/path' == 'c:/my/path' == 'c:\my\path'

(2): file uri - remote file:

'file://myhost/my/path' == '//myhost/my/path'

(3): file uri - traditional:

'file:///my/path' == '/my/path'

(4): file uri - UNC:

'file:////myhost/myshare/my/path' == '\\myhost\myshare\my\path'

(5): file uri - UNC:

'file://///myhost/myshare/my/path' == '\\myhost\myshare\my\path'

For details of the sub-conversions, e.g. file URIs with 4 or 5 slashes, refer to the API.
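The slash counting behind notes (0) to (5) can be sketched as follows. classify_file_uri is a hypothetical helper, not part of filesysobjects; it covers only the example forms above and performs no host validation, percent-decoding, or conversion for the other table columns.

```python
import re

def classify_file_uri(uri):
    """Hypothetical helper: map a file URI to the path forms (0) to (5) listed above."""
    if not uri.startswith("file:"):
        raise ValueError("not a file URI: " + uri)
    rest = uri[len("file:"):]
    body = rest.lstrip("/")
    slashes = len(rest) - len(body)  # number of leading slashes after 'file:'
    if slashes == 0 and re.match(r"[A-Za-z]:", body):
        return "(0) short form, DOS drive", body
    if slashes == 1:
        return "(0) short form", "/" + body
    if slashes == 2:
        return "(2) remote file", "//" + body
    if slashes == 3:
        return "(3) traditional", "/" + body
    if slashes in (4, 5):
        # UNC share: \\host\share\path
        return "({}) UNC".format(slashes), "\\\\" + body.replace("/", "\\")
    raise ValueError("unsupported file URI form: " + uri)

for uri in ("file:/my/path", "file:c:/my/path", "file://myhost/my/path",
            "file:///my/path", "file:////myhost/myshare/my/path"):
    print(uri, "->", classify_file_uri(uri))
```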

Module

The ‘filesysobjects.paths’ module provides operations on static file resource paths.

Constants

Note

The displayed numeric values of the enums are for debugging support only and may change; use the symbolic names only.

Simple Path Scanner - Parser

Scanner and parser for file path names in accordance with the Posix/IEEE-1003.1 and NTFS/FAT path name specifications. Adds some minor causal constraints on esoteric cases to resolve ambiguities.

It is the base for the path compilers of filesysobjects.paths.normpathx, filesysobjects.paths.splitpathx, filesysobjects.paths.escapepathx, and filesysobjects.paths.unescapepathx.

  • PATHSCANNER

    Generic tokenizer for file path names, supports Posix/IEEE-1003.1 and NTFS/FAT path names.

  • Control Constants - Tokens:

    • SC_BSPAIR(1000) - ‘\’ pair
    • SC_CIFS(1010) - cifs:
    • SC_CRMASK(1020) - masked ‘\n’
    • SC_DOIT(1030) - out of range
    • SC_DQUOTED(1040) - ‘"’
    • SC_DRIVE(1050) - DOS DRIVE LETTER - OR A DIRECTORY ON POSIX !!!
    • SC_DRIVENPSEP(1060) - dos drive letter following n * posix_sep
    • SC_DRIVENWSEP(1070) - dos drive letter following n * win_sep
    • SC_DUMMY(1080) - for tests
    • SC_EACHOF(1090) - assure for each
    • SC_ESCCHAR(1100) - ‘\[abf...]’
    • SC_FABS(1110) - file:///path - absolute path - rfc8089 rfc1738
    • SC_FILE(1120) - file:
    • SC_FMIN(1130) - file:/path - min rfc8089 - Appendix B
    • SC_FNONLOCAL(1140) - file://host/path non-local - rfc8089 - Appendix B / maps to Posix-App
    • SC_FSHORT(1150) - file:<dos-drive>:path - short-form - rfc8089
    • SC_FUNC(1160) - file:///// | file://// - share/netapp - rfc8089 - Appendix E.3.2
    • SC_HTTP(1170) - http:
    • SC_KEEP(1180) - keep literally
    • SC_MASKALL(1190) - keep literally
    • SC_NULLDIR(1200) - ‘/./’
    • SC_PAPP(1210) - Posix-Net-App * 2 * ‘/’ + posix-rules + “causal constraints”
    • SC_PDOM(1220) - ‘//’ share/posix-app - 2 * ‘/’ + domain-rules
    • SC_PSEPP(1230) - ‘:’
    • SC_PSEPW(1240) - ‘;’
    • SC_REPLACE(1250) - replace an equal set of chars e.g. ‘/’ or ‘\’
    • SC_SEPP(1260) - n * Posix path.sep
    • SC_SEPW(1270) - 1 * win path.sep
    • SC_SLASH(1280) - ‘/’
    • SC_SLASHPREB(1290) - ‘\’ + ‘/’
    • SC_SMB(1300) - smb:
    • SC_SQUOTED(1310) - ‘'’
    • SC_TOEVEN(1320) - assure count is even
    • SC_U16(1330) - unicode-16
    • SC_U16R(1340) - unicode-16 raw
    • SC_U32(1350) - unicode-32
    • SC_U32R(1360) - unicode-32 raw
    • SC_UNC(1370) - unc:
    • SC_UPDIR(1380) - ‘/../’
    • SC_WDOM(1390) - Win-Domain - 2 * ‘\’ + domain-rules
  • Context Maps:

    • ASCII_SC_CTRL

      Maps match groups to the appropriate control tokens.

  • Scanner - Parser Registry

    Set of currently registered scanners/parsers, to be used by re functions.

    sub_path_calls = {  #: vector for escaping and normalization
       'b': sub_win,
       'k': sub_keep,
       'keep': sub_keep,
       'posix': sub_posix,
       's': sub_posix,
       'uri': sub_posix,
       'win': sub_win,
       'win32': sub_win,
    }
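The registry maps tpf keywords to re.sub() callbacks. The following sketch shows only the dispatch pattern; the three callbacks are hypothetical, simplified stand-ins that receive just the match object (the documented sub_posix, sub_win, and sub_keep take further parameters), and the separator pattern _SEPS is an assumption. ‘:’ is deliberately not matched, so drive letters survive.

```python
import re

# Hypothetical, simplified callbacks: they receive only the match object.
def sub_posix_demo(m):
    return "/" if m.group(0) in "/\\" else ":"

def sub_win_demo(m):
    return "\\" if m.group(0) in "/\\" else ";"

def sub_keep_demo(m):
    return m.group(0)

# Same keys as the registry above, pointing to the demo callbacks.
sub_path_calls = {
    "b": sub_win_demo, "k": sub_keep_demo, "keep": sub_keep_demo,
    "posix": sub_posix_demo, "s": sub_posix_demo, "uri": sub_posix_demo,
    "win": sub_win_demo, "win32": sub_win_demo,
}

# Path separators and the ';' search path separator; ':' is not matched.
_SEPS = re.compile(r"[/\\;]")

def convert(path, tpf):
    """Dispatch to the registered callback for the target platform keyword."""
    return _SEPS.sub(sub_path_calls[tpf], path)

print(convert(r"c:\a\b;c:\d", "posix"))  # c:/a/b:c:/d
```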
    

Miscellaneous

  • _CPSPLIT - static compiled: split pathnames with literal + glob + regexpr.
  • _ENV_SPLIT - Split-out environment variables for substitution.
  • _ENV_SPLITg - Entry points into substrings: environment variables and literals.
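Substitution of environment variables can be sketched with a pattern for $NAME and ${NAME}. The pattern _ENV_VAR and the helper expand_env are hypothetical, not the module's _ENV_SPLIT; unknown variables are kept literally, similar to the standard os.path.expandvars().

```python
import os
import re

# Hypothetical pattern for $NAME and ${NAME}; not the module's _ENV_SPLIT.
_ENV_VAR = re.compile(r"\$\{(\w+)\}|\$(\w+)")

def expand_env(path, env=None):
    """Substitute known variables; unknown ones are kept literally."""
    env = os.environ if env is None else env

    def repl(m):
        name = m.group(1) or m.group(2)
        return env.get(name, m.group(0))

    return _ENV_VAR.sub(repl, path)

print(expand_env("${HOME}/src/$PROJ/lib", {"HOME": "/home/user", "PROJ": "demo"}))
```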

Functions

escapepathx

filesysobjects.paths.escapepathx(spath, tpf=None, **kargs)

Escapes special characters within path names. Supports cross-platform processing and knows the special escape characters of Python and re. The characters can be masked by quoting and/or by enclosing them in character classes.

input        esc           unesc
\abc"\n"     \\abc"\n"     \abc"\n"
\"abc\n"     \\"abc\n"     \"abc\n"
\abc\n       \\abc\\n      \abc\n
\xy" "z      \\xy" "z      \xy" "z
\"xy z"      \\"xy z"      \"xy z"
\xy z        \\xy\ z       \xy z
Args:
spath:

The path to be escaped.

spath := (
     <path-string>
   | <path-array>
)

path-string := (str | unicode)
path-array := (list | tuple)
  • path-string

    The string representation of a complete path, which may contain literal, glob, and re expressions. The supported character representation is str or unicode for Python 2.7 and Python 3.5+.

  • path-array

    The component representation of a path, which consists of its items, either as a list or as a tuple. Each item may contain literal, glob, and re expressions.

tpf:
Target platform, currently not used.

kargs:

charback:

Escapes all backslashes within character classes. Can be combined with force and freeback.

\[\] => \[\\]
force:

Controls the scope of escaping. Quoted strings and character classes are excluded. Can be combined with charback.

force = (
     True    # escape characters and any free backslash
   | False   # defined escape characters only
)

force == True

   \a\X\n => \\a\\X\\n

force == False

   \a\X\n => \\a\X\\n

default := False

freeback:

Escapes backslashes outside character classes. Can be combined with charback.

\[\] => \b\[\]
Returns:

The escaped path with added ‘\’ in accordance with the rules and chosen options. The return type is the same as the input type.

str      =>  str
unicode  =>  unicode
list     =>  list
tuple    =>  tuple
Raises:

PathError

FileSysObjectsError

TypeError

pass-through
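For illustration, the force option can be approximated with plain ‘re’. This is a simplified sketch, not the escapepathx implementation: the helper escape_path and the set of "defined escape characters" (here ‘abfnrtv’) are assumptions, and quoted strings, character classes, charback, and freeback are not handled. It reproduces the force examples above and the type-preserving return.

```python
import re

# Hypothetical subset of Python escape letters treated as "defined escape characters".
_ESC_LETTERS = "abfnrtv"
_DEFINED = re.compile(r"\\(?=[" + _ESC_LETTERS + "])")  # backslash before such a letter
_ANY = re.compile(r"\\")                                # any backslash

def escape_path(spath, force=False):
    """Double backslashes; returns the same container type as the input."""
    pattern = _ANY if force else _DEFINED

    def esc(s):
        return pattern.sub(lambda m: "\\\\", s)

    if isinstance(spath, (list, tuple)):
        return type(spath)(esc(item) for item in spath)
    return esc(spath)

print(escape_path(r"\a\X\n"))              # \\a\X\\n
print(escape_path(r"\a\X\n", force=True))  # \\a\\X\\n
```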

normpathx

filesysobjects.paths.normpathx(spath, **kargs)

Normalizes paths, similar to ‘os.path.normpath()’, with optional extensions for paths with basic application schemes and search paths, DOS drives, and the split of paths into directories. The various representations can be converted on the fly.

smb, cifs, file, http/https, UNC, POSIX-network apps

For advanced processing of application schemes refer to normapppathx() and ‘splitapppathx()’. The path can include re and glob expressions, literals, and masked parts.

  • regular expressions

    The supported regular expressions are native Python regular expressions as supported by ‘re’ with support of expressions spanning multiple directories.

  • globs

    Standard module glob.

  • literals:

    Any literal path.

Regular expressions and globs can be masked as quoted strings, which are kept unchanged.

normpathx provides these features as a simple interface for normalization across multiple platforms. The companion interfaces provide further features, e.g. escapepathx and unescapepathx for path names that include re and glob expressions.
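Why a regular expression spanning multiple directories cannot always be checked per directory can be seen with the standard ‘re’ module alone. A minimal sketch; the path and pattern are made-up examples.

```python
import re

path = "/a/b/x/c"
pattern = r"/a/.*/c"  # '.*' may span several directories

# Matching the whole path lets '.*' cover 'b/x'.
whole = re.fullmatch(pattern, path) is not None

# Matching component by component fails: the component counts differ,
# and '.*' would only ever be compared with a single directory name.
p_parts, n_parts = pattern.split("/"), path.split("/")
per_component = len(p_parts) == len(n_parts) and all(
    re.fullmatch(p, n) for p, n in zip(p_parts, n_parts))

print(whole, per_component)  # True False
```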

Args:
spath:
A single path entry, which must not contain a semantic ‘os.pathsep’. For search paths that include a semantic ‘os.pathsep’ use ‘splitapppathx()’.

kargs:

apppre:

Application prefix.

default := False

keepsep:
Keeps significant separators, in particular the trailing path separator ‘sep’ and the trailing search path separator ‘pathsep’.
strip:

Strips redundancies from path names:

"a/.//./b/c/../" => "a/b"

With the related option ‘keepsep’:

"a/.//./b/c/../" => "a/b/"

default := True

stripquote:

Removes paired triple-quotes of protected/masked string sections.

"/a/'''head:'''/c" => "/a/head:/c"

default := False

spf:

Source platform, defines the input syntax domain. For the syntax refer to the API in the manual at spf.

For additional details refer to tpf and spf, paths.getspf(), normapppathx(), normpathx().

tpf:

Target platform, defines the output syntax domain. For the syntax refer to the API in the manual at tpf.

For additional details refer to tpf and spf, paths.gettpf(), normapppathx(), normpathx().

pathsep:

Changes the search path separator of the source platform.

pathsep := (
      (: | ;)         # replaced by ':' or ';'
    | <keyword>
    | <#enum>
)
Returns:
Normalized path.
Raises:

PathError

pass-through
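For the plain Posix and Windows cases, the effect of strip and keepsep can be approximated with the standard posixpath and ntpath modules, which are importable on every platform. normalize is a hypothetical helper, not normpathx: it knows no application schemes, file URIs, quoting, glob, or re, and posixpath does not convert ‘\’.

```python
import ntpath
import posixpath

def normalize(path, tpf="posix", keepsep=False):
    """Hypothetical sketch: strip redundancies, emit the target platform's separators."""
    mod = ntpath if tpf == "win" else posixpath
    result = mod.normpath(path)
    # normpath drops a trailing separator; restore it when keepsep is requested.
    if keepsep and path.endswith(("/", "\\")) and not result.endswith(mod.sep):
        result += mod.sep
    return result

print(normalize("a/.//./b/c/../"))                # a/b
print(normalize("a/.//./b/c/../", keepsep=True))  # a/b/
print(normalize("a/.//./b/c/../", tpf="win"))     # a\b
```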

splitpathx

filesysobjects.paths.splitpathx(spath, **kw)

Splits path names into a list/tuple of items, one for each directory. For example:

In [15]: filesysobjects.paths.splitpathx("/a/b/c")
Out[15]: ('', 'a', 'b', 'c')

In [16]: filesysobjects.paths.splitpathx("x:/a/b/c")
Out[16]: ('x:', 'a', 'b', 'c')

In [17]: filesysobjects.paths.splitpathx(r"x:\a\b\c")
Out[17]: ('x:', 'a', 'b', 'c')

For URIs and search paths refer to splitapppathx().

Supports the directory name types ‘literal’, ‘glob’, and ‘re/regexpr’, and the same syntax elements as normpathx. The result is prepared for simple rejoining with the built-in join() and os.sep.

Not aware of application tags, except for network shares, Posix applications, and file URIs.

REMARK:
The intention is to replace the ‘str.split()’ method for splitting a path into its parts; thus it differs from ‘os.path.split()’.
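The difference from ‘os.path.split()’ can be shown with the standard posixpath module, used here instead of os.path so that the result does not depend on the running platform.

```python
import posixpath

path = "/a/b/c"
head_tail = posixpath.split(path)  # ('/a/b', 'c'): head and last component only
items = tuple(path.split("/"))     # ('', 'a', 'b', 'c'): one item per directory
rejoined = "/".join(items)         # '/a/b/c': the items rejoin to the original path
print(head_tail, items, rejoined)
```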

Args:

spath:
Path to split.

kw:

apppre:

Application prefix. When ‘True’, the scheme is included; otherwise it is dropped.

apppre=(True|False)
keepsep:

Modifies the behavior of the ‘strip’ parameter. If ‘False’, the trailing separator is dropped.

splitpathx('/a/b', keepsep=False)   => ('', 'a', 'b')
splitpathx('/a/b/', keepsep=False)  => ('', 'a', 'b')

If ‘True’, trailing separators are kept as a directory marker:

splitpathx('/a/b', keepsep=True)    => ('', 'a', 'b')
splitpathx('/a/b/', keepsep=True)   => ('', 'a', 'b', '')
pathsep:

Optional search path separator.

posix: ‘:’

win32: ‘;’

default := os.pathsep

sep:

Optional path separator.

posix: ‘/’

win32: ‘\’

default := os.path.sep

strip:

Removes null-entries.

default := False

stripquote:

Removes paired triple-quotes of protected/masked string sections.

"/a/'''head:'''/c" => "/a/head:/c"

default := False

spf:

Source platform, defines the input syntax domain. For the syntax refer to the API in the manual at spf.

For additional details refer to tpf and spf, paths.getspf(), normapppathx(), normpathx().

tpf:

Target platform. Although the split form of a resource path is basically canonical, some details of the specifications for slight variations require fine-grained tuning. Thus, in case of ambiguity, tpf defines the scheme used for apppre=True. Only the following values are accepted:

tpf := (
     RTE_FILEURI0 | 'fileuri0'  # RFC8089 - minimal
   | RTE_FILEURI4 | 'fileuri4'  # RFC8089 - 4-slash UNC/POSIX-app
   | RTE_FILEURI5 | 'fileuri5'  # RFC8089 - 5-slash UNC/POSIX-app
   | RTE_FILEURI  | 'fileuri'   # RFC8089 - canonical
)
Returns:

A list containing the path split into its components. The list is prepared to be concatenated by join().

The interface is aware of the os.path.sep character, but a contained regular expression may span multiple path components; these have to be handled dynamically when the path pattern is applied, e.g. by findpattern.

Raises:
pass-through
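The effect of keepsep on a trailing separator, and of strip on null entries, can be sketched with plain str.split(). The helper split_path is hypothetical and handles a single separator only, without drives, URIs, quoting, glob, or re; the interaction of keepsep and strip shown here is an assumption modeled on the examples above.

```python
def split_path(path, sep="/", keepsep=False, strip=False):
    """Hypothetical helper: one item per directory, ready for sep.join()."""
    parts = path.split(sep)
    trailing = len(parts) > 1 and parts[-1] == ""
    if trailing:
        parts.pop()  # drop the empty item created by a trailing separator
    if strip:
        # Remove null entries, but keep the leading '' that marks an absolute path.
        parts = parts[:1] + [p for p in parts[1:] if p]
    if trailing and keepsep:
        parts.append("")  # keep the trailing separator as a directory marker
    return tuple(parts)

print(split_path("/a/b/", keepsep=False))  # ('', 'a', 'b')
print(split_path("/a/b/", keepsep=True))   # ('', 'a', 'b', '')
print(split_path("/a//b", strip=True))     # ('', 'a', 'b')
```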

splitpathx_posix

filesysobjects.paths.splitpathx_posix(p, **kw)

Splits path names containing ‘literal’, ‘glob’, and ‘re/regexpr’. Serves the source platform POSIX and similar platforms. For the call interface see splitpathx().

Args:
p:
The path name to split.
kw:
apppre:

Application prefix.

default := False

keepsep:

Keeps separators, in particular the trailing one.

default := False

strip:

Strips separators, in particular the trailing one.

default := False

stripquote:

Strips filesysobjects triple-quotes.

default := False

tpf:

Target platform. Defines some fine-tuning, e.g. for the file-URI, see splitpathx.

default := current OS.

Returns:
The split path, else [].
Raises:
pass-through

splitpathx_win

filesysobjects.paths.splitpathx_win(p, **kw)

Splits Windows path names containing ‘literal’, ‘glob’, and ‘re/regexpr’. Serves the source platform Windows and similar platforms. For the call interface see splitpathx().

Args:
p:
The path name to split.
kw:
apppre:

Application prefix.

default := False

keepsep:

Keeps separators, in particular the trailing one.

default := False

strip:

Strips separators, in particular the trailing one.

default := False

stripquote:

Strips filesysobjects triple-quotes.

default := False

tpf:

Target platform. Defines some fine-tuning, e.g. for the file-URI, see splitpathx.

default := current OS.

Returns:
The split path, else [].
Raises:
pass-through

sub_esc

filesysobjects.paths.sub_esc(it, spf=8256, strip=False, pathsep='', state=[], **kw)

To be used by re.sub() - escapes backslashes and non-printable characters.

Args:
it:
iterator from re.sub.
spf:

Source platform, defines the input syntax domain. For the syntax refer to the API in the manual at spf.

For additional details refer to tpf and spf, paths.getspf(), normapppathx(), normpathx().

strip:

pathsep:

state:

kw:
charback:
Escapes all backslashes within character classes. Can be combined with force and freeback.
force:

If True, escapes all backslashes; otherwise only the special characters. Processing DOS paths on Unix requires all separators to be escaped.

default := False

freeback:
Escapes backslashes outside character classes. Can be combined with charback.

Returns:

Converted format win. E.g.

C:\Windows\system32\cmd.exe;C:\Windows\system32\notepad.exe
Raises:
pass-through

sub_keep

filesysobjects.paths.sub_keep(it, spf=8256, strip=True, pathsep='')

To be used by re.sub() - keeps mixed representations unchanged.

sub_posix

filesysobjects.paths.sub_posix(it, spf=8256, strip=True, pathsep=':', state=None, **kw)

To be used by re.sub() - converts to posix.

Replaces ‘[/\]’ with ‘/’, and ‘[;:]’ with ‘:’.

Posix has no drives; the drive property is ignored and drive letters are treated as ordinary characters. When drives are required as syntax tokens, refer to ‘Cygwin’.

Args:
it:
Iterator from re.sub.
spf:

Source platform, defines the input syntax domain. For the syntax refer to the API in the manual at spf.

For additional details refer to tpf and spf, paths.getspf(), normapppathx(), normpathx().

strip:
Strip redundancies.
pathsep:

Input separator ‘pathsep’ to be replaced.

pathsep := ':' ';' ''

One or more characters are allowed; they are used as a set for the containment check of replacements. An empty string disables the replacement.

state:
Compile states.
kw:
apppre:
Application prefix.
keepsep:
Keeps separators, in particular the trailing one.
stripquote:
Strips filesysobjects triple-quotes.
Returns:

Converted format posix. E.g.:

c:/Windows/system32/cmd.exe:c:/Windows/system32/notepad.exe
Raises:
pass-through

sub_unesc

filesysobjects.paths.sub_unesc(it, _t=None, spf=None, state=[], **kw)

To be used by re.sub() - unescapes backslashes and non-printable characters.

Args:
it:
iterator from re.sub.

_t:

spf:

Source platform, defines the input syntax domain. For the syntax refer to the API in the manual at spf.

For additional details refer to tpf and spf, paths.getspf(), normapppathx(), normpathx().

state:

kw:
all or force:

If True, unescapes all backslashes; otherwise only the special characters. Processing DOS paths on Unix requires all separators to be escaped, and therefore eventually to be unescaped as well.

default := False

Returns:

Converted format win. E.g.:

C:\Windows\system32\cmd.exe;C:\Windows\system32\notepad.exe

Raises:

pass-through

sub_win

filesysobjects.paths.sub_win(it, spf=8256, strip=True, pathsep=';', state=None, **kw)

To be used by re.sub() - converts to windows.

Replaces ‘[/\]’ with ‘\’, and ‘[;:]’ with ‘;’.

Args:
it:
iterator from re.sub.
spf:

Source platform, defines the input syntax domain. For the syntax refer to the API in the manual at spf.

For additional details refer to tpf and spf, paths.getspf(), normapppathx(), normpathx().

strip:
Strip redundancies.
pathsep:

Input separator ‘pathsep’ to be replaced.

pathsep := ':' ';' ''

One or more characters are allowed; they are used as a set for the containment check of replacements. An empty string disables the replacement.

state:
Compile states.
kw:
apppre:
Application prefix.
keepsep:
Keeps separators, in particular the trailing one.
stripquote:
Strips filesysobjects triple-quotes.
Returns:

Converted format win. E.g.:

C:\Windows\system32\cmd.exe;C:\Windows\system32\notepad.exe
Raises:
pass-through

unescapepathx

filesysobjects.paths.unescapepathx(spath, **kargs)

Unescapes a path that has been escaped before. The path can be represented either as a string/unicode or as split components in a list or tuple.

Warning

Only strings previously processed by escapepathx() are processed accurately; otherwise the result may be erroneous, in particular for Windows paths due to the ambiguity of the ‘\’.

The same masking rules apply as for the normpathx() and escapepathx() calls. Escape sequences can be protected by quoting, which keeps the content literal. See pathtools.stripquotes.

Args:
spath:

The path to be unescaped.

spath := (
     <path-string>
   | <path-array>
)

path-string := (str | unicode)
path-array := (list | tuple)
  • path-string

    The string representation of a complete path, which may contain literal, glob, and re expressions. The supported character representation is str or unicode for Python 2.7 and Python 3.5+.

  • path-array

    The component representation of a path, which consists of its items, either as a list or as a tuple. Each item may contain literal, glob, and re expressions.

kargs:
tpf:
Target platform, currently not used.
netpath:

When True, double prefix separators are treated as a share and/or network application; otherwise they are assumed to be the result of escaping with force.

default := False

Returns:

The unescaped path with removed ‘\’ in accordance with the rules and chosen options. The return type is the same as the input type.

str      =>  str
unicode  =>  unicode
list     =>  list
tuple    =>  tuple
Raises:

PathError

FileSysObjectsError

TypeError

pass-through
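A simplified inverse of escaping can be sketched as follows. unescape_path is hypothetical and only collapses ‘\\’ to ‘\’ and ‘\ ’ to a space, which is enough to turn the esc column of the escapepathx table back into its unesc column. As the warning above states, this is only meaningful for input that was escaped before; netpath and quoting are not handled.

```python
import re

# Hypothetical rule: a backslash followed by a backslash or a space is collapsed.
_ESCAPED = re.compile(r"\\([\\ ])")

def unescape_path(spath):
    """Simplified inverse of escaping; keeps the input container type."""
    def unesc(s):
        return _ESCAPED.sub(lambda m: m.group(1), s)

    if isinstance(spath, (list, tuple)):
        return type(spath)(unesc(item) for item in spath)
    return unesc(spath)

print(unescape_path(r"\\abc\\n"))  # \abc\n
print(unescape_path(r"\\xy\ z"))   # \xy z
```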

Exceptions

PathException

PathTargetAccessException