-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reimplement path functions around ADT #53
Comments
@ndmitchell Any news on this front? Is the intention to have strongly typed paths eventually? |
No progress on this front at all, not looked at it at all. There is no intention to change the API with this work - it's about cleaning up the core and ensuring consistency. |
@ndmitchell I came up with the following. This is for windows only for now, doesn't use a real parser since it's rather small and passes the data Lexeme = NS NameSpace
| Disk Char
| Device String
| Share String
| Separators [Char]
| FileName String
deriving Show
data NameSpace = FileNameSpace
| DeviceNameSpace
| NTNameSpace
deriving Show
-- | Parse a filepath into lexemes.
--
-- > display (parse x) === x
parse :: String -> [Lexeme]
parse fp'
-- '\\?\UNC\share\path'
| Just (lx, rest) <- parseExtendedUNC fp' = lx ++ parsePath rest
-- '\\?\C:\path', '\\.\C:\path', '\??\C:\path'
| Just (lx1, r1) <- parseNameSpace fp'
, Just (lx2, r2) <- parseDisk r1 = lx1 ++ lx2 ++ parsePath r2
-- '\\.\COM1'
| Just (lx1@(NS DeviceNameSpace:_), r1) <- parseNameSpace fp'
, Just (lx2, r2) <- parseDevice r1 = lx1 ++ lx2 ++ parsePath r2
-- '\\?\some\other', '\\.\some\other', '\??\some\other'
-- detects no device/disk
| Just (lx, rest) <- parseNameSpace fp' = lx ++ parsePath rest
-- 'C:\path'
| Just (lx, rest) <- parseDisk fp' = lx ++ parsePath rest
-- '\\share\path'
| Just (lx, rest) <- parseDriveShare fp' = lx ++ parsePath rest
-- 'relative\path' and everything else
| otherwise = parsePath fp'
where
parsePath :: String -> [Lexeme]
parsePath fp =
case parseFileName fp <|> parseSeparators fp of
Nothing -> []
Just (lx, []) -> lx
Just (lx, rest) -> lx ++ parsePath rest
parseFileName :: String -> Maybe ([Lexeme], String)
parseFileName fp =
case break isPathSeparator fp of
([], _) -> Nothing
(a, r) -> Just ([FileName a], r)
parseSeparators :: String -> Maybe ([Lexeme], String)
parseSeparators fp
| null a = Nothing
| otherwise = Just ([Separators a], b)
where (a, b) = span isPathSeparator fp
-- prefix only
parseExtendedUNC :: String -> Maybe ([Lexeme], String)
parseExtendedUNC ('\\':'\\' :'?':'\\':'U':'N':'C':s4:r1) | isPathSeparator s4 = do
(lx, r2) <- parseSeparators (s4:r1)
(share, r3) <- parseDriveShareName r2
pure ([NS FileNameSpace] ++ lx ++ share, r3)
parseExtendedUNC _ = Nothing
-- prefix only
parseNameSpace :: String -> Maybe ([Lexeme], String)
parseNameSpace ('\\':'?' :'?':'\\':rest) = Just ([NS NTNameSpace ], rest)
parseNameSpace ('\\':'\\':'?':'\\':rest) = Just ([NS FileNameSpace ], rest)
parseNameSpace ('\\':'\\':'.':'\\':rest) = Just ([NS DeviceNameSpace], rest)
parseNameSpace _ = Nothing
-- prefix or after any NameSpace
parseDisk :: String -> Maybe ([Lexeme], String)
parseDisk (x:':':rest) | isLetter x = Just ([Disk x], rest)
parseDisk _ = Nothing
parseDevice :: String -> Maybe ([Lexeme], String)
parseDevice fp =
case break isPathSeparator fp of
([], _) -> Nothing
(a, r) -> Just ([Device a], r)
-- \\sharename\
parseDriveShare :: String -> Maybe ([Lexeme], String)
parseDriveShare (s1:s2:xs) | isPathSeparator s1 && isPathSeparator s2 = do
(lx, rest) <- parseDriveShareName xs
pure ([Separators [s1, s2]] ++ lx, rest)
parseDriveShare _ = Nothing
-- assume you have already seen \\
-- share\bob -> "share\", "bob"
parseDriveShareName :: String -> Maybe ([Lexeme], String)
parseDriveShareName name = do
case break isPathSeparator name of
([], _) -> Nothing
(a, b) -> let (lx, r) = optional b parseSeparators
in Just ([Share a] ++ lx, r)
optional :: String -> (String -> Maybe ([Lexeme], String)) -> ([Lexeme], String)
optional input parser = maybe ([], input) id (parser input)
display :: [Lexeme] -> String
display lxs = case lxs of
(ns@(NS FileNameSpace):sep@(Separators _):share@(Share _):rest) ->
d ns ++ "UNC" ++ d sep ++ d share ++ display rest
other -> concatMap d other
where
d :: Lexeme -> String
d (NS FileNameSpace) = "\\\\?\\"
d (NS DeviceNameSpace) = "\\\\.\\"
d (NS NTNameSpace) = "\\??\\"
d (Disk c) = c:":"
d (Device dev) = dev
d (Share share) = share
d (Separators sep) = sep
d (FileName fn) = fn |
@hasufell that looks great and handles all the paths I'd expected. The only question I have is how does it handle Also This next one is arcane so feel free not to support it, but on a non-hierarchical path the Lastly I'll leave it up to you whether to support file streams or not, they a they aren't very common https://docs.microsoft.com/en-us/windows/win32/fileio/file-streams |
That actually makes sense in the light of a security vulnerability, which broke the handling of file extensions. E.g. current filepath behavior is this (which is incorrect):
There'll be some questions on how to preserve the
Right... but the lexer should probably parse it anyway. And then we reject it in e.g. EDIT: I tested it and
Right... but it seems both separators are valid in Other things I'm not sure about/still testing:
|
I have digged deeper into this and it appears that:
As such... I'm not sure there's a reasonable way to define what a Drive is in light of these possible expressions. Also relevant: https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getvolumepathnamew |
Split-drive behavior comparison to python:
|
Probably can't be maintained for non-default streams.
Indeed, I was wrong, forgot that it becomes a relative path. So they're all valid. Just not absolute paths.
Yes with valid I didn't mean you get an error, I meant it doesn't return what a user might expect. It also returns something the user can't remove using any normal OS tools like Explorer. Hmm I don't think the
Yes, eventually they all access the same kernel objects, but the way there differs.
It's only special in that it's to replace the definition of the original
Or
I think a drive should stay what 99% of users expect, namely just the user level object mappings for local paths. e.g. I would expect most users to use the well known PATH syntax and these differences are only important for GHC or specialized program where I assume the programmer knows what they're doing and so wouldn't call
The one makes more sense to me than the Haskell one,
These don't make sense to me, I would have expected
hmm somewhat inconsistent |
What if we do this:
I think that's somewhat consistent. So everything after the first
To clarify: The rest of the intricacies can be left to the user, who chose to go down this path.
Well, yes potentially... because the question is what we consider a Drive. I think there are a couple of choices:
Yes, I think that our current |
That makes some sense, but needs to be documented, it may not be what one intuitively expects.
I find 3. the most logical. Handling the common cases should get the most bang for bucks. I would expect very little, if any programs to use But I personally think that it should treat the first
To me says Store in the With this inconsistency a user would need to know whether the patch is a local or UNC one when they call something that returns the drive, in order to get the
Wouldn't it make more sense to release it as a major version and take the breaking change? or a new name for the new behavior? Could always ask what people think about it. |
Alright... I tried to come up with a lax grammar around windows filepaths, based on the UNC grammar from MS docs: https://gist.github.com/hasufell/ce7c18ef3ad44d1d59b548206cdcb59d |
For info, @hasufell, I never expected this to use a real parser (and for dependency reasons it probably won't) - your notes seem plausible along the lines of what I was thinking, but if the ADT parser works better that seems reasonable too. This was more a direction of travel I thought would be good than precise thoughts. |
I just handrolled a parser that will work for both |
Pulling out from #12 (comment), I think most path/drive based functions (but not extension functions) should first parse their path, then modify, then render it. That gives a much smaller "trusted core" of FilePath.
We would rely on the property
display . parse == id
, so likely UNC would need extending to say which type of separators it was, whether it had?
etc.The text was updated successfully, but these errors were encountered: