‘filesysobjects.paths’ - Module¶
The filesysobjects.paths module provides advanced operations on paths and search paths.
Implementation Details¶
Glob Parameters¶
The ‘glob’ wildcard definitions are a subset of regular expressions, but their semantics deviate slightly. A dot, for example, is ambiguous: a regular expression reads it as a wildcard for any character, whereas a glob reads it as a literal dot.
# file path name: /a/b/xname
path0 = '/a/b/x.*'
# regexpr: matches
# glob: does not match
Therefore path elements that contain ‘glob’ expressions are resolved dynamically by applying the glob module to the file system nodes.
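The difference can be reproduced with the standard library alone. The following minimal sketch assumes the file name xname from the example above and compares re with fnmatch, which implements the pattern rules used by glob. It does not call filesysobjects itself.

```python
import fnmatch
import re

name = 'xname'     # last element of /a/b/xname
pattern = 'x.*'    # read either as a regular expression or as a glob

# As a regular expression: 'x' followed by any characters.
re_hit = re.fullmatch(pattern, name) is not None

# As a glob: 'x', a literal dot, then any characters.
glob_hit = fnmatch.fnmatchcase(name, pattern)

print('regexpr matches:', re_hit)
print('glob matches:', glob_hit)
```

Because the same string can be read under both rule sets, the interpretation has to be chosen explicitly for each path element.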
Regular Expressions¶
The regular expressions support the full scope of the standard Python ‘re’ module. The expressions are applied as a post-filter to a set of fetched file system node path names.
By default the regular expressions are compiled statically at load time of the module. The regular expressions for the path analysis contain the ‘os.path.sep’ of the current platform, which some interfaces allow to be changed by a parameter. In this case the function interfaces compile the adapted regular expression into a local variable. The compiled pattern object is cached by the ‘re’ module, but may be compiled again for each call. The object interface may not have to deal with this during the lifetime of its instances.
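The trade-off between a pattern compiled at load time and a pattern compiled per call can be sketched as follows. The names build_split_pattern, _SPLIT_NATIVE, and split_path are hypothetical and only illustrate the idea; they are not part of filesysobjects.

```python
import os
import re

def build_split_pattern(sep):
    """Return the regex source matching one or more separators 'sep'."""
    return '(?:' + re.escape(sep) + ')+'

# Compiled once at import time for the separator of the current platform.
_SPLIT_NATIVE = re.compile(build_split_pattern(os.path.sep))

def split_path(path, sep=None):
    """Split on the native separator, or on a separator passed per call."""
    if sep is None or sep == os.path.sep:
        rx = _SPLIT_NATIVE
    else:
        # Compiled into a local variable for this call. The 're' module
        # keeps an internal cache, so a repeated call with the same
        # separator usually avoids a full recompile.
        rx = re.compile(build_split_pattern(sep))
    return rx.split(path)

parts_win = split_path('c:\\a\\\\b', sep='\\')
parts_posix = split_path('/a//b', sep='/')
```

An object that stores its compiled pattern as an attribute would pay the compile cost only once per instance.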
Common Call Parameters¶
appre¶
keepsep¶
pathsep¶
stripquote¶
Strips special Python style filesysobject triple-quotes.
/my/path/with/"""double-quoted"""/dirs => /my/path/with/double-quoted/dirs
/my/path/with/'''single-quoted'''/dirs => /my/path/with/single-quoted/dirs
Current version does not support nested triple-quotes, neither homogeneous, nor heterogeneous.
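The behavior described above can be approximated with a single regular expression. This is a minimal sketch under the assumption that quotes are paired and not nested. The helper strip_triple_quotes is hypothetical and is not the implementation used by the library.

```python
import re

# Matches """...""" or '''...''' without nesting. The back-reference \1
# requires the closing quotes to be of the same kind as the opening ones.
_TRIPLE = re.compile(r"""("{3}|'{3})(.*?)\1""", re.DOTALL)

def strip_triple_quotes(path):
    """Remove paired triple-quotes and keep the quoted content."""
    return _TRIPLE.sub(lambda m: m.group(2), path)

double = strip_triple_quotes('/my/path/with/"""double-quoted"""/dirs')
single = strip_triple_quotes("/my/path/with/'''single-quoted'''/dirs")
```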
spf¶
Source platform, defines the input syntax domain. For the syntax refer to the API in the manual at spf. For additional details refer to tpf and spf, filesysobjects.getspf(), filesysobjects.gettpf(), paths.normpathx(), paths.splitpathx(), and apppaths.normapppathx().
default := RTE
tpf¶
Target platform. For the syntax refer to the API in the manual at tpf. For additional details refer to tpf and spf, filesysobjects.getspf(), filesysobjects.gettpf(), paths.normpathx(), paths.splitpathx(), and apppaths.normapppathx().
default := spf ( == RTE)
Supported Conversions¶
The filesysobjects package provides functions for the normalization and the split of resource paths. This includes the cross-conversion between the major file system platforms Posix and Windows. In addition, various further conversions are supported. These comprise the conversion between semantically equivalent representations, and the conversion between syntactically compatible representations without assured semantic equivalency. The latter requires additional conventions for semantic equivalency, e.g. for the conversion of ‘z:mypath’ from Windows to Posix.
The split of all supported platforms for custom applications is supported.
normpathx¶
The normpathx supports the normalization of resource paths for the native platform, and the cross-conversion for a limited set of other platforms when the call parameters spf and tpf are used.
The following basic cross-conversions are supported.
posix | win | cygwin | file (0) | file (2) | file (3) | file (4) | file (5) | unc | |
---|---|---|---|---|---|---|---|---|---|
posix | c | f | c | c | c | c | |||
win | f | c | c | c | c | c | |||
cygwin | c | c | c | c | c | c | |||
file (0) | c | c | c | c | c | c | |||
file (2) | c | ||||||||
file (3) | c | c | c | c | c | c | |||
file (4) | c | c | c | ||||||
file (5) | c | c | c | ||||||
unc | f | f | c | c | c |
c: Could be converted into equivalent representation.
f: Format conversion without assured equivalency. The accuracy depends on additional conventions.
(0): file uri - short form:
'file:/my/path' == '/my/path'
'file:c:/my/path' == 'c:/my/path' == 'c:\my\path'
(2): file uri - remote file:
'file://myhost/my/path' == '//myhost/my/path'
(3): file uri - traditional:
'file:///my/path' == '/my/path'
(4): file uri - UNC:
'file:////myhost/myshare/my/path' == '\\myhost\myshare\my\path'
(5): file uri - UNC:
'file://///myhost/myshare/my/path' == '\\myhost\myshare\my\path'
For details of the sub-conversion, e.g. file-URI with 4 or 5 slashes refer to the API.
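The numbered file-URI forms above differ only in the number of slashes that follow the scheme. The following sketch classifies a URI by that count. The function file_uri_form is a hypothetical helper for illustration and makes no claim about how the library distinguishes the forms internally.

```python
def file_uri_form(uri):
    """Return the form number (0, 2, 3, 4, or 5) used in the list above."""
    if not uri.startswith('file:'):
        raise ValueError('not a file URI: %r' % (uri,))
    rest = uri[len('file:'):]
    # Count the slashes directly after the scheme.
    slashes = len(rest) - len(rest.lstrip('/'))
    if slashes <= 1:
        return 0  # short form: file:/my/path or file:c:/my/path
    if slashes <= 5:
        return slashes  # 2: remote, 3: traditional, 4 and 5: UNC
    raise ValueError('unsupported number of slashes: %d' % slashes)

forms = [
    file_uri_form('file:/my/path'),
    file_uri_form('file:c:/my/path'),
    file_uri_form('file://myhost/my/path'),
    file_uri_form('file:///my/path'),
    file_uri_form('file:////myhost/myshare/my/path'),
    file_uri_form('file://///myhost/myshare/my/path'),
]
```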
Module¶
The ‘filesysobjects.paths’ module provides operations on static file resource paths.
Constants¶
Note
The displayed numeric values of the enums are for debugging support only and may change without notice; use the symbolic names only.
Simple Path Scanner - Parser¶
The scanner and parser implement the file path name rules in accordance with the Posix/IEEE-1003.1 and NTFS/FAT path name specifications. They add some minor causal constraints on esoteric cases for ambiguity resolution.
They are the base for the path compilers of filesysobjects.paths.normpathx, filesysobjects.paths.splitpathx, filesysobjects.paths.escapepathx, and filesysobjects.paths.unescapepathx.
PATHSCANNER
Generic tokenizer for file path names, supports Posix/IEEE-1003.1 and NTFS/FAT path names.
Control Constants - Tokens:
- SC_BSPAIR(1000) - ‘\’ pair
- SC_CIFS(1010) - cifs:
- SC_CRMASK(1020) - masked ‘\n’
- SC_DOIT(1030) - out of range
- SC_DQUOTED(1040) - “
- SC_DRIVE(1050) - DOS DRIVE LETTER - OR A DIRECTORY ON POSIX !!!
- SC_DRIVENPSEP(1060) - dos drive letter following n * posix_sep
- SC_DRIVENWSEP(1070) - dos drive letter following n * win_sep
- SC_DUMMY(1080) - for tests
- SC_EACHOF(1090) - assure for each
- SC_ESCCHAR(1100) - ‘\[abf...]’
- SC_FABS(1110) - file:///path - absolute path - rfc8089 rfc1738
- SC_FILE(1120) - file:
- SC_FMIN(1130) - file:/path - min rfc8089 - Appendix B
- SC_FNONLOCAL(1140) - file://host/path non-local - rfc8089 - Appendix B / maps to Posix-App
- SC_FSHORT(1150) - file:<dos-drive>:path - short-form - rfc8089
- SC_FUNC(1160) - file:///// | file://// - share/netapp - rfc8089 - Appendix E.3.2
- SC_HTTP(1170) - http:
- SC_KEEP(1180) - keep literally
- SC_MASKALL(1190) - keep literally
- SC_NULLDIR(1200) - ‘/./’
- SC_PAPP(1210) - Posix-Net-App * 2 * ‘/’ + posix-rules + “causal constraints”
- SC_PDOM(1220) - ‘//’ share/posix-app - 2 * ‘/’ + domain-rules
- SC_PSEPP(1230) - ‘:’
- SC_PSEPW(1240) - ‘;’
- SC_REPLACE(1250) - replace an equal set of chars e.g. ‘/’ or ‘\’
- SC_SEPP(1260) - n * Posix path.sep
- SC_SEPW(1270) - 1 * win path.sep
- SC_SLASH(1280) - ‘/’
- SC_SLASHPREB(1290) - ‘\’ + ‘/’
- SC_SMB(1300) - smb:
- SC_SQUOTED(1310) - ‘
- SC_TOEVEN(1320) - assure count is even
- SC_U16(1330) - unicode-16
- SC_U16R(1340) - unicode-16 raw
- SC_U32(1350) - unicode-32
- SC_U32R(1360) - unicode-32 raw
- SC_UNC(1370) - unc:
- SC_UPDIR(1380) - ‘/../’
- SC_WDOM(1390) - Win-Domain - 2 * ‘\’ + domain-rules
Context Maps:
ASCII_SC_CTRL
Maps match groups to the appropriate control tokens.
Scanner - Parser Registry
Set of currently registered scanners and parsers, to be used with the re functions.
sub_path_calls = {  #: vector for escaping and normalization
    'b': sub_win,
    'k': sub_keep,
    'keep': sub_keep,
    'posix': sub_posix,
    's': sub_posix,
    'uri': sub_posix,
    'win': sub_win,
    'win32': sub_win,
}
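A registry of this kind is typically used to select a callback for re.sub() by key. The following sketch mirrors the key layout shown above with deliberately simplified, hypothetical callbacks (demo_sub_posix, demo_sub_win, demo_sub_keep) that only replace single directory separators. The real callbacks take further parameters and handle many more cases.

```python
import re

# Any single directory separator, Posix or Windows style.
_ANY_SEP = re.compile(r'[\\/]')

def demo_sub_posix(match):
    """re.sub() callback: replace the matched separator with '/'."""
    return '/'

def demo_sub_win(match):
    """re.sub() callback: replace the matched separator with a backslash."""
    return '\\'

def demo_sub_keep(match):
    """re.sub() callback: keep the matched text literally."""
    return match.group(0)

# Hypothetical registry, modelled on the key layout shown above.
demo_calls = {
    'b': demo_sub_win,
    'k': demo_sub_keep,
    'keep': demo_sub_keep,
    'posix': demo_sub_posix,
    's': demo_sub_posix,
    'uri': demo_sub_posix,
    'win': demo_sub_win,
    'win32': demo_sub_win,
}

def convert_separators(path, target):
    """Look up the callback for 'target' and apply it through re.sub()."""
    return _ANY_SEP.sub(demo_calls[target], path)

as_posix = convert_separators('c:\\Windows\\system32', 'posix')
as_win = convert_separators('a/b/c', 'win32')
kept = convert_separators('a/b\\c', 'keep')
```

Passing a function as the replacement argument lets re.sub() insert the returned string literally, so backslashes in the result need no additional escaping.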
Miscellaneous¶
- _CPSPLIT - static compiled: split pathnames with literal + glob + regexpr.
- _ENV_SPLIT - Split-out environment variables for substitution.
- _ENV_SPLITg - Entry points into sub strings environment variables and literals.
Functions¶
escapepathx¶
filesysobjects.paths.escapepathx(spath, tpf=None, **kargs)¶
Escapes special characters within path names. Supports cross-platform processing and knows the special escape characters of Python and re. The characters can be masked by quoting and/or by enclosing them in character classes.
input | esc | unesc
---|---|---
\abc”\n” | \\abc”\n” | \abc”\n”
\”abc\n” | \\”abc\n” | \”abc\n”
\abc\n | \\abc\\n | \abc\n
\xy” “z | \\xy” “z | \xy” “z
\”xy z” | \\”xy z” | \”xy z”
\xy z | \\xy\ z | \xy z
- Args:
- spath:
The path to be escaped.
spath := ( <path-string> | <path-array> )
path-string := (str | unicode)
path-array := (list | tuple)
path-string
The string representation of a complete path, which may contain literal, glob, and re expressions. The supported character representation is str or unicode for Python2.7 and Python3.5+.
path-array
The component representation of a path, which consists of its items, either as a list or as a tuple. Each item may contain literal, glob, and re expressions.
- tpf:
- Target path separator, currently not used.
kargs:
- charback:
Escapes all backslashes within character classes. Could be combined with force and freeback.
\[\] => \[\\]
- force:
Controls the escaped scope. Excludes quoted strings and character classes. Could be combined with charback.
force = (
    True     # escape characters and any free backslash
    | False  # defined escape characters only
)
force == True
\a\X\n => \\a\\X\\n
force == False
\a\X\n => \\a\X\\n
default := False
- freeback:
Escapes backslashes outside character classes. Could be combined with charback.
\[\] => \b\[\]
- Returns:
The escaped path with added ‘\’ in accordance with the rules and chosen options. The return type of the representation is the same as the input representation.
str => str
unicode => unicode
list => list
tuple => tuple
- Raises:
PathError
FileSysObjectsError
TypeError
pass-through
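The effect of the force option documented above can be illustrated with a small stand-alone sketch. The function escape_backslashes is hypothetical, and it assumes that the "defined escape characters" are the single-letter Python escapes a, b, f, n, r, t, and v. The library's own character set may differ, and quoting and character classes are ignored here.

```python
import re

# A backslash followed by a letter that starts a Python escape sequence.
_SPECIAL = re.compile(r'\\(?=[abfnrtv])')

def escape_backslashes(path, force=False):
    """Double backslashes: all of them with force, else special ones only."""
    if force:
        return path.replace('\\', '\\\\')
    return _SPECIAL.sub(lambda m: '\\\\', path)

sample = r'\a\X\n'
forced = escape_backslashes(sample, force=True)
default = escape_backslashes(sample, force=False)
```

With force the backslash in front of X is doubled as well, which matches the two mappings listed for the force option.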
normpathx¶
filesysobjects.paths.normpathx(spath, **kargs)¶
Normalizes paths similar to ‘os.path.normpath()’, with optional extensions for paths with basic application schemes, search paths, DOS drives, and the split of paths into directories. The various representations can be converted on the fly.
smb, cifs, file, http/https, UNC, POSIX-network apps
For advanced processing of application schemes refer to normapppathx() and ‘splitapppathx()’. The path could include regular expressions re and glob, literals and masked parts.
regular expressions
The supported regular expressions are native Python regular expressions as supported by ‘re’ with support of expressions spanning multiple directories.
globs
Standard module glob.
literals:
Any literal path.
Regular expressions and globs could be masked as quoted strings, which are kept unchanged.
The normpathx provides these features as a simple interface for the normalization across multiple platforms. The companion interfaces provide additional features, e.g. escapepathx and unescapepathx for path names including re and glob.
- Args:
- spath:
- A single path entry, which must not contain a semantically valid ‘os.pathsep’. For search paths that include a semantic ‘os.pathsep’ use ‘splitapppathx()’.
kargs:
- apppre:
Application prefix.
default:=False
- keepsep:
- Keeps significant separators, in particular the trailing path separator ‘sep’, and the trailing search path separator ‘pathsep’.
- strip:
Strips redundancies from path names:
"a/.//./b/c/../" => "a/b"
With ‘keepsep’ the trailing separator is kept:
"a/.//./b/c/../" => "a/b/"
default:=True
- stripquote:
Removes paired triple-quotes of protected/masked string sections.
"/a/'''head:'''/c" => "/a/head:/c"
default := False
- spf:
Source platform, defines the input syntax domain. For the syntax refer to API in the manual at spf.
For additional details refer to tpf and spf, paths.getspf(), normapppathx(), normpathx().
- tpf:
Target platform, defines the output syntax domain. For the syntax refer to the API in the manual at tpf.
For additional details refer to tpf and spf, paths.gettpf(), normapppathx(), normpathx().
- pathsep:
Changes path separator for the source platform.
pathsep := (
    (: | ;)      # replaces by ':' or ';'
    | <keyword>
    | <#enum>
)
- Returns:
- Normalized path.
- Raises:
PathError
pass-through
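The strip and keepsep examples documented above can be related to the standard library. The following sketch uses posixpath.normpath() and a hypothetical wrapper norm_keepsep that re-appends a trailing separator. It only illustrates the documented input and output pairs and does not cover schemes, drives, or the spf and tpf conversions.

```python
import posixpath

def norm_keepsep(path, keepsep=False):
    """Normalize a Posix path; optionally keep a trailing separator."""
    result = posixpath.normpath(path)
    if keepsep and path.endswith('/') and not result.endswith('/'):
        result += '/'
    return result

stripped = norm_keepsep('a/.//./b/c/../')
kept = norm_keepsep('a/.//./b/c/../', keepsep=True)
```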
splitpathx¶
filesysobjects.paths.splitpathx(spath, **kw)¶
Splits pathnames into a list/tuple of items for each directory. For example:
In [15]: filesysobjects.paths.splitpathx("/a/b/c")
Out[15]: ('', 'a', 'b', 'c')
In [16]: filesysobjects.paths.splitpathx("x:/a/b/c")
Out[16]: ('x:', 'a', 'b', 'c')
In [17]: filesysobjects.paths.splitpathx("x:\a\b\c")
Out[17]: ('x:', 'a', 'b', 'c')
For URIs and search paths refer to splitapppathx.
Supports the directory name types ‘literal’, ‘glob’, and ‘re/regexpr’. Supports the same syntax elements as normpathx, and the result is prepared for the simple application of the built-in join() with os.sep.
Is not aware of application tags except Network-Shares, Posix-Applications, and file-URI.
- REMARK:
- The intention is to replace the ‘str.split()’ method for the split of the path parts, thus this is different to the method ‘os.path.split()’.
Args:
- spath:
- Path to split.
kw:
- apppre:
Application prefix, when ‘True’ the scheme is included, else dropped.
apppre=(True|False)
- keepsep:
Modifies the behavior of ‘strip’ parameter. If ‘False’, the trailing separator is dropped.
splitpathx('/a/b', keepsep=False) => ('', 'a', 'b')
splitpathx('/a/b/', keepsep=False) => ('', 'a', 'b')
For ‘True’, trailing separators are kept as directory markers:
splitpathx('/a/b', keepsep=True) => ('', 'a', 'b')
splitpathx('/a/b/', keepsep=True) => ('', 'a', 'b', '')
- pathsep:
Optional search path separator.
posix: ‘:’
win32: ‘;’
default := os.pathsep
- sep:
Optional path separator.
posix: ‘/’
win32: ‘\’
default := os.path.sep
- strip:
Removes null-entries.
default := False
- stripquote:
Removes paired triple-quotes of protected/masked string sections.
"/a/'''head:'''/c" => "/a/head:/c"
default := False
- spf:
Source platform, defines the input syntax domain. For the syntax refer to the API in the manual at spf.
For additi0onal details refer to tpf and spf, paths.getspf(), normapppathx(), normpathx().
- tpf:
Target platform. Even though the split form of a resource path is basically canonical, slight variations between the specifications require some fine-tuning. In case of ambiguity, tpf therefore defines the scheme for apppre=True. Only the following values are accepted.
tpf := (
      RTE_FILEURI0 | 'fileuri0'  # RFC8089 - minimal
    | RTE_FILEURI4 | 'fileuri4'  # RFC8089 - 4-slash UNC/POSIX-app
    | RTE_FILEURI5 | 'fileuri5'  # RFC8089 - 5-slash UNC/POSIX-app
    | RTE_FILEURI  | 'fileuri'   # RFC8089 - canonical
)
- Returns:
A list containing the path split into its components. The list is prepared to be concatenated by join().
The interface is aware of the os.path.sep character, but a present regular expression may span multiple path components, which have to be handled dynamically when applying the path pattern e.g. by findpattern.
- Raises:
- pass-through
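The remark above contrasts the component split with ‘os.path.split()’. The difference, and the round trip through join(), can be shown with the standard library alone. This sketch uses plain str.split() on a Posix path as a stand-in and does not call splitpathx.

```python
import posixpath

path = '/a/b/c'

# Component split: one item per directory, with a leading '' for the root.
parts = tuple(path.split('/'))

# The components can be concatenated again by join().
rejoined = '/'.join(parts)

# os.path.split() style: only head and tail.
head_tail = posixpath.split(path)

# A trailing separator shows up as a trailing empty item.
with_marker = tuple('/a/b/'.split('/'))
```

The trailing empty item corresponds to the directory marker described for keepsep=True.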
splitpathx_posix¶
filesysobjects.paths.splitpathx_posix(p, **kw)¶
Split pathnames containing ‘literal’, ‘glob’, and ‘re/regexpr’. Serves the source platform POSIX and alike. For the call interface see splitpathx().
- Args:
- p:
- The path name to split.
- kargs:
- apppre:
Application prefix.
default := False
- keepsep:
Keeps the separator, in particular the trailing one.
default := False
- strip:
Strip separators, in particular the trailing.
default := False
- stripquote:
Strips filesysobjects triple-quotes.
default := False
- tpf:
Target platform. Defines some fine-tuning, e.g. for the file-URI, see splitpathx.
default := current OS.
- Returns:
- The split path, else [].
- Raises:
- pass-through
splitpathx_win¶
filesysobjects.paths.splitpathx_win(p, **kw)¶
Split windows pathnames containing ‘literal’, ‘glob’, and ‘re/regexpr’. Serves the source platform windows and alike. For the call interface see splitpathx().
- Args:
- p:
- The path name to split.
- kargs:
- apppre:
Application prefix.
default := False
- keepsep:
Keeps the separator, in particular the trailing one.
default := False
- strip:
Strip separators, in particular the trailing.
default := False
- stripquote:
Strips filesysobjects triple-quotes.
default := False
- tpf:
Target platform. Defines some fine-tuning, e.g. for the file-URI, see splitpathx.
default := current OS.
- Returns:
- The split path, else [].
- Raises:
- pass-through
sub_esc¶
filesysobjects.paths.sub_esc(it, spf=8256, strip=False, pathsep='', state=[], **kw)¶
To be used by re.sub() - escapes backslashes and non-printable characters.
- Args:
- it:
- iterator from re.sub.
- spf:
Source platform, defines the input syntax domain. For the syntax refer to API in the manual at spf.
For additional details refer to tpf and spf, paths.getspf(), normapppathx(), normpathx().
strip:
pathsep:
state:
- kw:
- charback:
- Escapes all backslashes within character classes. Could be combined with force and freeback.
- force:
Escapes all back-slashes, else the special characters only. Unix processing of DOS paths requires all separators to be escaped.
default := False
- freeback:
- Escapes backslashes outside character classes. Could be combined with charback.
Returns:
Converted format win. E.g.
C:\Windows\system32\cmd.exe;C:\Windows\system32\notepad.exe
- Raises:
- pass-through
sub_keep¶
sub_posix¶
filesysobjects.paths.sub_posix(it, spf=8256, strip=True, pathsep=':', state=None, **kw)¶
To be used by re.sub() - converts to posix.
Replaces ‘[/\]’ with ‘/’, and ‘[;:]’ with ‘:’.
Posix does not have drives, so the drive property is ignored and drive letters are treated as ordinary characters. When drives are required as syntax tokens refer to ‘Cygwin’.
- Args:
- it:
- Iterator from re.sub.
- spf:
Source platform, defines the input syntax domain. For the syntax refer to API in the manual at spf.
For additional details refer to tpf and spf, paths.getspf(), normapppathx(), normpathx().
- strip:
- Strip redundancies.
- pathsep:
Input separator ‘pathsep’ to be replaced.
pathsep := ':' ';' ''
One or more characters are allowed; they are used as a set for the containment check of the replacement. An empty string disables the replacement.
- state:
- Compile states.
- kw:
- apppre:
- Application prefix.
- keepsep:
- Keeps the separator, in particular the trailing one.
- stripquote:
- Strips filesysobjects triple-quotes.
- Returns:
Converted format posix. E.g.:
c:/Windows/system32/cmd.exe:c:/Windows/system32/notepad.exe
- Raises:
- pass-through
sub_unesc¶
filesysobjects.paths.sub_unesc(it, _t=None, spf=None, state=[], **kw)¶
To be used by re.sub() - unescapes backslashes and non-printable characters.
- Args:
- it:
- iterator from re.sub.
_t:
- spf:
Source platform, defines the input syntax domain. For the syntax refer to API in the manual at spf.
For additional details refer to tpf and spf, paths.getspf(), normapppathx(), normpathx().
state:
- kw:
- all or force:
Unescapes all backslashes, else the special characters only. Unix processing of DOS paths requires all separators to be escaped and therefore eventually to be unescaped too.
default := False
Returns:
Converted format win. E.g.:
C:\Windows\system32\cmd.exe;C:\Windows\system32\notepad.exe
Raises:
pass-through
sub_win¶
filesysobjects.paths.sub_win(it, spf=8256, strip=True, pathsep=';', state=None, **kw)¶
To be used by re.sub() - converts to windows.
Replaces ‘[/\]’ with ‘\’, and ‘[;:]’ with ‘;’.
- Args:
- it:
- iterator from re.sub.
- spf:
Source platform, defines the input syntax domain. For the syntax refer to API in the manual at spf.
For additional details refer to tpf and spf, paths.getspf(), normapppathx(), normpathx().
- strip:
- Strip redundancies.
- pathsep:
Input separator ‘pathsep’ to be replaced.
pathsep := ':' ';' ''
One or more characters are allowed; they are used as a set for the containment check of the replacement. An empty string disables the replacement.
- state:
- Compile states.
- kw:
- apppre:
- Application prefix.
- keepsep:
- Keeps the separator, in particular the trailing one.
- stripquote:
- Strips filesysobjects triple-quotes.
- Returns:
Converted format win. E.g.:
C:\Windows\system32\cmd.exe;C:\Windows\system32\notepad.exe
- Raises:
- pass-through
unescapepathx¶
filesysobjects.paths.unescapepathx(spath, **kargs)¶
Unescapes a path which has been escaped before. The path can be given either as a string/unicode or as split components in a list or tuple.
Warning
Strings which were processed by escapepathx() before are processed accurately; for any other input the result could be erroneous, in particular for windows paths due to the ambiguity of the ‘\’!
The same masking rules apply as for the normpathx() and escapepathx() calls. Escape sequences could be protected by quoting, which keeps the content literally. See pathtools.stripquotes.
- Args:
- spath:
The path to be unescaped.
spath := ( <path-string> | <path-array> )
path-string := (str | unicode)
path-array := (list | tuple)
path-string
The string representation of a complete path, which may contain literal, glob, and re expressions. The supported character representation is str or unicode for Python2.7 and Python3.5+.
path-array
The component representation of a path, which consists of its items, either as a list or as a tuple. Each item may contain literal, glob, and re expressions.
- kargs:
- tpf:
- Target platform, currently not used.
- netpath:
When True considers double prefix separators as share and/or network application, else assumes these are the result of escaping with force.
default := False
- Returns:
The unescaped path with removed ‘\’ in accordance with the rules and chosen options. The return type of the representation is the same as the input representation.
str => str
unicode => unicode
list => list
tuple => tuple
- Raises:
PathError
FileSysObjectsError
TypeError
pass-through
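The ambiguity mentioned in the warning, and the purpose of netpath, can be illustrated with a stand-alone sketch. The function unescape_backslashes is hypothetical and only halves doubled backslashes. It is not the library's algorithm and ignores quoting and character classes.

```python
def unescape_backslashes(path, netpath=False):
    """Replace each doubled backslash with a single one.

    With netpath=True a leading double backslash is kept, on the
    assumption that it marks a share or network application prefix.
    """
    prefix = ''
    if netpath and path.startswith('\\\\'):
        prefix, path = '\\\\', path[2:]
    return prefix + path.replace('\\\\', '\\')

# A path that was escaped before is restored as expected.
restored = unescape_backslashes(r'C:\\Windows\\system32')

# A UNC path that was never escaped loses its prefix ...
damaged = unescape_backslashes(r'\\host\share')

# ... unless the double prefix is declared as a network path.
preserved = unescape_backslashes(r'\\host\share', netpath=True)
```

The input alone does not reveal whether a doubled backslash is an escaped separator or a share prefix, which is why the caller has to state the intent.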