Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strict left folds of Stream (ByteStream m) m r? #41

Open
vdukhovni opened this issue Oct 19, 2020 · 1 comment
Open

Strict left folds of Stream (ByteStream m) m r? #41

vdukhovni opened this issue Oct 19, 2020 · 1 comment

Comments

@vdukhovni
Copy link
Contributor

vdukhovni commented Oct 19, 2020

While working on the expanded WIP README.md, I ended up thinking about what a general-purpose strict left fold would look like for a sub-divided bytestream (i.e. Stream (ByteStream m) m r, a stream of monadic bytestreams over the same monad).

The streaming library provides an excellent lazy right-fold for this type: streamFold, and also provides multiple strict left-folds, but only for Stream (Of a) m r and the like, because fitting into the composable parallel Fold paradigm from Control.Foldl requires pure stream values (the folds themselves can be monadic via FoldM, impurely, ... but the stream elements need to be pure to match that interface).

So types like Stream (ByteStream m) m r don't appear to be supported for strict left folds, which are often useful.

I came up with the below, which perhaps belongs in Streaming rather than Streaming.ByteString (if it is generally useful to add to the ecosystem). It has the disadvantage that it does match the Fold or FoldM signatures, so there is not yet a way to get parallel fold support, though I suspect that too could be added (by extending Control.Foldl with a new mechanism for stream-of-stream folds, where the pure values are one level down, and each sub-stream returns the next sub-stream head along with the monadic accumulator).

Anyway, this is the proof-of-concept function:

nestedFoldM :: Monad m
            => (forall k. x -> f k -> m (Of x k))
            -> m x
            -> (x -> m b)
            -> Stream f m r
            -> m (Of b r)
nestedFoldM step begin done = (begin >>=) . flip loop
  where
    loop !x = \case
      Return r -> (:> r) <$> done x
      Effect m -> m >>= loop x
      Step f   -> step x f >>= \(x' :> rest) -> loop x' rest

It is able to fold Stream (ByteString m) m r as follows (rewrite of README.md example that counts lines in an input file that start with the letter i as a fold, rather than a series of stream transformations):

{-# LANGUAGE BangPatterns, RankNTypes, LambdaCase #-}
module Main where
import qualified Streaming.ByteString as Q
import qualified Streaming.ByteString.Char8 as Q8
import Control.Monad.Trans.Resource (runResourceT)
import Data.Maybe (fromMaybe, listToMaybe)
import Streaming (Of(..))
import Streaming.Internal (Stream(..))
import Data.Word (Word8)
import System.Environment (getArgs)

countStarts :: Monad m => Word8 -> Int -> Q.ByteStream m r -> m (Of Int r)
countStarts !w !acc = \ !mbs -> Q.nextByte mbs >>= \case
    Right (c, t) | c == w    -> (:>) (acc+1) <$> Q.effects t
                 | otherwise -> (:>) acc <$> Q.effects t
    Left       r             -> return $ acc :> r

-- insert nestedFoldM here --

main :: IO ()
main = do
    fname <- listToMaybe <$> getArgs
    (n :> _) <- runResourceT
        $ nestedFoldM (countStarts 0x69) (return 0) return
        $ Q8.lines
        $ fromMaybe Q.stdin (Q.readFile <$> fname)
    print n

So my questions (issues) are: Is something like nestedFoldM a sensible interface to add to either Streaming or Streaming.ByteString? Can it be improved, or is it about right? And, finally, would it make sense to pursue composable parallel folds for this type of fold?

cc: @chessai , @cartazio, @archaephyrryx, @Bodigrim (feel free to remain silent, or say you don't care if this is of no interest...)

[ EDIT: It occurs to me that another approach might to expose the isomorphism f (Stream f m r) -> Stream (Of a) m r and then use some of the existing left fold machinery on that, with helper functions to combine the results of the inner folds into the outer accumulator, ... This might then make it possible to use the parallel fold machinery without changes to Control.Foldl, I'll explore this a bit further. Suggestions welcome... ]

@vdukhovni
Copy link
Contributor Author

Following up on the initial notes, I think the below comes closer to the mark.

  • Since the outer fold of the overall stream consumes monadic elements, while Control.Foldl does not consume its inputs linearly, we can't expect (and don't really need) any parallel folds of the outer stream
  • But the inner sub-streams are just streams of strict chunks, so they do support parallel folds, and all the machinery for that is already available
  • And since we have a stream of streams, and not just a single undifferentiated stream, it actually makes sense to start the fold of each inner stream with its own fresh accumulator, and then the outer stream just folds the results of the individual folds.
  • Therefore, it looks like the right type for inner fold parameter of the outer fold is just (forall k. f k -> m (Of a k)), which is compatible with the existing left folds available in the ecosystem.
  • The below defines just nestedFold, but it is clear how to define an analogous nestedFoldM...
{-# LANGUAGE BangPatterns, BlockArguments, RankNTypes, LambdaCase #-}
module Main where
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as B
import qualified Streaming.ByteString as Q
import qualified Streaming.ByteString.Char8 as Q8
import Control.Monad.Trans.Resource (runResourceT)
import Data.Maybe (fromMaybe, listToMaybe)
import Streaming (Of(..))
import Streaming.Internal (Stream(..))
import Data.Word (Word8)
import System.Environment (getArgs)

countStarts :: Monad m => Word8 -> Q.ByteStream m r -> m (Of Int r)
countStarts w = Q.chunkFold step Nothing \case { Just 1 -> 1; _ -> 0 }
  where
    step Nothing bs | B.length bs == 0 = Nothing
                    | B.unsafeHead bs == w = Just 1
                    | otherwise = Just 0
    step x _ = x

nestedFold :: Monad m
           => (forall k. f k -> m (Of a k))
           -> (b -> a -> b)
           -> b
           -> Stream f m r
           -> m (Of b r)
nestedFold inner merge = loop
  where
    loop !b = \case
        Return r -> return $ b :> r
        Effect m -> m >>= loop b
        Step f   -> inner f >>= \(a :> rest) -> loop (merge b a) rest

main :: IO ()
main = do
    fname <- listToMaybe <$> getArgs
    (n :> _) <- runResourceT
        $ nestedFold (countStarts 0x69) (+) 0
        $ Q8.lines
        $ fromMaybe Q.stdin (Q.readFile <$> fname)
    print n

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant