From 48b3dfc05fe5894dc2423f896586b910f95228c6 Mon Sep 17 00:00:00 2001 From: Mateusz Kowalczyk Date: Thu, 22 Dec 2016 01:07:49 +0000 Subject: [PATCH] Render text less There is a big performance hit w.r.t. rendering text. Here is flamegraph of ghc-events-analyze after this patch which illustrates the problem: http://fuuzetsu.co.uk/images/1482368246.png The reason for this is https://github.com/diagrams/SVGFonts/issues/16 ; For now this commit tries to mitigate the issue: * use a simpler font for vector graphics, `bit`, which loads much, much faster; the downside is it's ugly and has fewer characters; once upstream is fixed, we can revert and we should be fast(-ish) * Use `text` from diagrams for thread names; there is a problem here in that this `text` has no space as far as diagrams is concerned which means we can not tell how much space it occupies and therefore how much to move the boxes. We work around this by finding longest thread name, rendering it, taking width of that and assuming that's the widest one. Of course that's not true in general, there are wide glyphs &c. but worst-case scenario, we just overlap the first few blocks (and even then probably just the first, unused one). * For rendering the timeline numbers, memoize last rendering. The times on timeline are in strictly increasing order which means for `bucketTimes` values, we only render `length (nub bucketTimes)` times and not `length (bucketTimes)`. By the way here is the current heap: http://fuuzetsu.co.uk/images/1482368318.png and prof ``` Thu Dec 22 00:52 2016 Time and Allocation Profiling Report (Final) ghc-events-analyze +RTS -p -L40 -hc -s -RTS message-handling.eventlog total time = 6.70 secs (6697 ticks @ 1000 us, 1 processor) total alloc = 11,355,497,656 bytes (excludes profiling overheads) COST CENTRE MODULE %time %alloc dot Linear.V2 7.2 14.4 fmap Linear.V2 6.1 5.0 atParam Diagrams.Segment 4.3 7.5 moveOriginTo.\ Diagrams.Core.Envelope 2.9 5.5 floatToDigits.gen Data.Text.Lazy.Builder.RealFloat 2.4 1.5 <> Diagrams.Core.Envelope 2.4 1.2 renderAttr.\ Graphics.Rendering.SVG 2.3 1.6 floatToDigits Data.Text.Lazy.Builder.RealFloat 2.3 2.5 getEnvelope.\ Diagrams.Segment 2.0 4.3 formatRealFloat.doFmt Data.Text.Lazy.Builder.RealFloat 2.0 1.0 transferFunction Data.Colour.SRGB 1.7 1.2 ^ Diagrams.Solve.Polynomial 1.7 1.3 getAttr Diagrams.Core.Style 1.7 0.8 getAttr.ty Diagrams.Core.Style 1.7 0.9 buildAttr Graphics.Svg.Core 1.6 2.7 measure Diagrams.Segment 1.5 1.4 floatToDigits.gen.(...) Data.Text.Lazy.Builder.RealFloat 1.5 2.0 getEnvelope.\.\ Diagrams.Segment 1.4 2.2 formatRealFloat Data.Text.Lazy.Builder.RealFloat 1.3 0.6 quadForm Diagrams.Solve.Polynomial 1.1 1.2 ^+^ Linear.Vector 1.1 0.8 concat.ts' Data.Text 1.0 1.0 getEnvelope.\ Diagrams.Segment 1.0 1.6 formatRealFloat.doFmt.(...) Data.Text.Lazy.Builder.RealFloat 0.7 1.2 writeReport'.writeLine GHC.RTS.Events.Analyze.Reports.Timed 0.7 1.5 fromHtmlEscapedText Blaze.ByteString.Builder.Html.Utf8 0.6 2.1 ``` ``` Generated message-handling.totals.txt using default script Generated message-handling.timed.svg using default script Generated message-handling.timed.txt using default script 18,049,345,672 bytes allocated in the heap 16,302,984,912 bytes copied during GC 204,379,936 bytes maximum residency (114 sample(s)) 1,706,848 bytes maximum slop 455 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 34700 colls, 0 par 0.976s 0.975s 0.0000s 0.0003s Gen 1 114 colls, 0 par 12.782s 12.796s 0.1122s 0.2042s INIT time 0.001s ( 0.001s elapsed) MUT time 9.758s ( 15.550s elapsed) GC time 7.984s ( 7.989s elapsed) RP time 0.000s ( 0.000s elapsed) PROF time 5.774s ( 5.782s elapsed) EXIT time 0.002s ( 0.002s elapsed) Total time 23.522s ( 23.542s elapsed) %GC time 33.9% (33.9% elapsed) Alloc rate 1,849,658,888 bytes per MUT second Productivity 41.5% of total user, 41.5% of total elapsed ``` This was ran on 819KB eventlog. In comparison, here is the situation before this and the previous commit (but mostly this): Heap: http://fuuzetsu.co.uk/images/1482369579.png ; many more times memory usage Flamegraph http://fuuzetsu.co.uk/images/1482369638.png ; looks a bit different because all that complex text that it's generating has to be stuck together but if we zoom in a little it's easy to see where time is going http://fuuzetsu.co.uk/images/1482369694.png Finally profiling and GC stats: ``` Thu Dec 22 01:08 2016 Time and Allocation Profiling Report (Final) ghc-events-analyze +RTS -p -L40 -hc -s -RTS message-handling.eventlog total time = 41.95 secs (41955 ticks @ 1000 us, 1 processor) total alloc = 81,752,383,184 bytes (excludes profiling overheads) COST CENTRE MODULE %time %alloc atParam Diagrams.Segment 9.3 15.8 fmap Linear.V2 8.3 6.6 dot Linear.V2 6.6 9.9 ^ Diagrams.Solve.Polynomial 5.7 3.6 getEnvelope.\ Diagrams.Segment 4.2 5.0 quadForm Diagrams.Solve.Polynomial 4.0 3.4 floatToDigits.gen Data.Text.Lazy.Builder.RealFloat 3.7 2.4 ^+^ Linear.Vector 3.1 2.1 floatToDigits.gen.(...) Data.Text.Lazy.Builder.RealFloat 2.7 3.0 quadForm.q Diagrams.Solve.Polynomial 2.5 2.3 writeReport'.showValue GHC.RTS.Events.Analyze.Reports.Timed 2.4 3.3 moveOriginTo.\ Diagrams.Core.Envelope 2.1 2.8 >>= Text.Parsec.Prim 1.9 2.4 floatToDigits Data.Text.Lazy.Builder.RealFloat 1.9 2.7 getEnvelope.\.\ Diagrams.Segment 1.9 2.4 formatRealFloat.doFmt Data.Text.Lazy.Builder.RealFloat 1.7 1.0 measure Diagrams.Segment 1.5 1.0 <> Diagrams.Core.Envelope 1.4 0.7 quadForm.d Diagrams.Solve.Polynomial 1.1 1.3 <*> Linear.V2 1.1 0.2 floatToDigits.gen.mUpN' Data.Text.Lazy.Builder.RealFloat 1.1 0.4 satisfy Text.Parsec.Char 1.1 1.5 formatRealFloat Data.Text.Lazy.Builder.RealFloat 1.0 0.6 fromHtmlEscapedText Blaze.ByteString.Builder.Html.Utf8 0.3 1.2 append.x Data.Text 0.2 1.5 ``` ``` Generated message-handling.totals.txt using default script Generated message-handling.timed.svg using default script Generated message-handling.timed.txt using default script 129,320,970,480 bytes allocated in the heap 499,460,983,080 bytes copied during GC 1,516,879,816 bytes maximum residency (587 sample(s)) 8,188,512 bytes maximum slop 2968 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 248630 colls, 0 par 4.016s 4.011s 0.0000s 0.0003s Gen 1 587 colls, 0 par 419.172s 420.109s 0.7157s 1.4724s INIT time 0.002s ( 0.002s elapsed) MUT time 56.979s (262.965s elapsed) GC time 217.769s (218.280s elapsed) RP time 0.000s ( 0.000s elapsed) PROF time 205.420s (205.839s elapsed) EXIT time 0.027s ( 0.027s elapsed) Total time 480.199s (481.274s elapsed) %GC time 45.3% (45.4% elapsed) Alloc rate 2,269,632,625 bytes per MUT second Productivity 11.9% of total user, 11.8% of total elapsed ``` We have around an order of magnitude improvement in time and space We can do more of course but I would consider these short-term fixes. I plan to see what's happening in SVGFonts before pursuing more issues here. I don't want to have to go for a walk when the input size is few tens of MBs. --- .../RTS/Events/Analyze/Reports/Timed/SVG.hs | 99 +++++++++++++------ 1 file changed, 67 insertions(+), 32 deletions(-) diff --git a/src/GHC/RTS/Events/Analyze/Reports/Timed/SVG.hs b/src/GHC/RTS/Events/Analyze/Reports/Timed/SVG.hs index 69756ab..d19ab8b 100644 --- a/src/GHC/RTS/Events/Analyze/Reports/Timed/SVG.hs +++ b/src/GHC/RTS/Events/Analyze/Reports/Timed/SVG.hs @@ -4,7 +4,7 @@ module GHC.RTS.Events.Analyze.Reports.Timed.SVG ( writeReport ) where -import Data.Maybe (catMaybes) +import Data.List (foldl') import Data.Monoid ((<>)) import Diagrams.Backend.SVG (B, renderSVG) import Diagrams.Prelude (QDiagram, Colour, V2, N, Any, (#), (|||)) @@ -15,6 +15,7 @@ import qualified Data.Map as Map import qualified Diagrams.Prelude as D import qualified Graphics.SVGFonts.Fonts as F import qualified Graphics.SVGFonts.Text as F +import qualified Diagrams.TwoD.Text as TT #if !MIN_VERSION_base(4,8,0) import Data.Monoid (mempty, mconcat) @@ -24,7 +25,7 @@ import GHC.RTS.Events.Analyze.Types import GHC.RTS.Events.Analyze.Reports.Timed hiding (writeReport) writeReport :: Options -> Quantized -> Report -> FilePath -> IO () -writeReport options quantized report path = +writeReport options quantized report path = do uncurry (renderSVG path) $ renderReport options quantized report type D = QDiagram B V2 (N B) Any @@ -44,7 +45,6 @@ renderReport options@Options{..} rendered = D.vcat $ map (uncurry renderSVGFragment) $ zip (cycle [D.white, D.ghostwhite]) (SVGTimeline : fragments) - fragments :: [SVGFragment] fragments = map (renderFragment options) $ zip report (cycle allColors) @@ -53,7 +53,7 @@ renderReport options@Options{..} padHeader (2 * optionsBucketHeight) title renderSVGFragment bg (SVGLine header blocks) = -- Add empty block at the start so that the whole thing doesn't shift up - (padHeader optionsBucketHeight header ||| (blocks <> (block options 0 # D.lw D.none))) + (padHeader optionsBucketHeight (renderText header (optionsBucketHeight + 2)) ||| (blocks <> (block options 0 # D.lw D.none))) `D.atop` (D.rect lineWidth optionsBucketHeight # D.alignL # D.fc bg # D.lw D.none) renderSVGFragment _ SVGTimeline = @@ -68,29 +68,37 @@ renderReport options@Options{..} D.translateX (0.5 * optionsBucketWidth) h <> D.rect headerWidth height # D.alignL # D.lw D.none + -- optimisation: find the longest text header, render + -- it then check the rendered size and use that for + -- width; it does not necessarily mean it's the right + -- width to use but it's good enough considering speed + -- trade-off headerWidth :: Double - headerWidth = optionsBucketWidth -- extra padding - + (maximum . catMaybes . map headerWidthOf $ fragments) + headerWidth = optionsBucketWidth + widestHeader -- extra padding - headerWidthOf :: SVGFragment -> Maybe Double - headerWidthOf (SVGLine header _) = Just (D.width header) - headerWidthOf _ = Nothing + widestHeader :: Double + widestHeader = + let headers = [ (header, length header) | SVGLine header _ <- fragments ] + (maxHeader, _) = foldl' (\(s, l) (s', l') -> + if l' > l then (s', l') else (s, l)) + ("", 0) headers + in D.width $! mkSVGText maxHeader (optionsBucketHeight + 2) data SVGFragment = SVGTimeline | SVGSection D - | SVGLine D D + | SVGLine String D renderFragment :: Options -> (ReportFragment, Colour Double) -> SVGFragment renderFragment options@Options{..} = go where go :: (ReportFragment, Colour Double) -> SVGFragment - go (ReportSection title,_) = SVGSection (renderText title (optionsBucketHeight + 2)) + go (ReportSection title,_) = SVGSection (mkSVGText title (optionsBucketHeight + 2)) go (ReportLine line,c) = uncurry SVGLine $ renderLine options c line -renderLine :: Options -> Colour Double -> ReportLine -> (D, D) +renderLine :: Options -> Colour Double -> ReportLine -> (String, D) renderLine options@Options{..} lc line@ReportLineData{..} = - ( renderText lineHeader (optionsBucketHeight + 2) + ( lineHeader -- renderText lineHeader (optionsBucketHeight + 2) , blocks lc <> bgBlocks options lineBackground ) where @@ -119,16 +127,20 @@ bgBlocks options = go | b <- [fr .. to] ] -renderText :: String -> Double -> D -renderText str size = - D.stroke textSVG # D.fc D.black # D.lc D.black # D.alignL # D.lw D.none +-- | Create a text diagram that is sized (as opposed to 'renderText'). +-- The problem with this function is that it's *extremely* slow and +-- memory hungry in comparison to something simple like 'TT.text'. +-- This function should therefore be used as little as possible. +mkSVGText :: String -> Double -> D +mkSVGText str size = + D.stroke textSVG # D.fc D.black # D.lc D.black # D.alignL # D.lw D.none where textSVG = F.textSVG' (textOpts size) str textOpts :: Double -> TextOpts Double textOpts size = TextOpts { - textFont = F.lin + textFont = F.bit , mode = F.INSIDE_H , spacing = F.KERN , underline = False @@ -136,6 +148,13 @@ textOpts size = , textHeight = size } +-- | Render text with diagram's own engine. The issue with this text +-- is that it has no size: we can not tell how wide it is. For a +-- sized-text see 'mkSVGText'. +renderText :: String -> Double -> D +renderText str size = + TT.fontSizeL (size / 2) $ TT.alignedText 0 0.5 str + -- | Translate quantized value to opacity -- -- For every event and every bucket we record the percentage of that bucket @@ -160,27 +179,43 @@ block Options{..} i = timeline :: Options -> Int -> Timestamp -> D timeline Options{..} numBuckets bucketSize = - mconcat [ timelineBlock tb # D.translateX (fromIntegral tb * timelineBlockWidth) - | -- bucket number - b <- [0 .. numBuckets - 1] - -- timeline block number, index within this timeline block @(0 .. optionsTickEvery - 1)@ - , let (tb, tidx) = b `divMod` optionsTickEvery - -- we show the timeline block when the index is 0 - , tidx == 0 - ] + let timeBlocks = [ tb + | b <- [0 .. numBuckets - 1] + -- timeline block number, index within this timeline block @(0 .. optionsTickEvery - 1)@ + , let (tb, tidx) = b `divMod` optionsTickEvery + -- we show the timeline block when the index is 0 + , tidx == 0 ] + + -- memoize the rendering of the last time label: if it's the same + -- for the next 10 displays, why render it 10 times? Text is expensive. + in case foldl' (\acc tb -> timelineBlock acc tb) mempty timeBlocks of + (_, _, fullDiag) -> fullDiag + where timelineBlockWidth :: Double timelineBlockWidth = fromIntegral optionsTickEvery * optionsBucketWidth + moveAlongTimeline :: Int -> D -> D + moveAlongTimeline tb = D.translateX (fromIntegral tb * timelineBlockWidth) + -- Single block on the time-line; every 5 blocks a larger line and a time -- label; for the remainder just a shorter line - timelineBlock :: Int -> D - timelineBlock tb - | tb `rem` 5 == 0 - = D.strokeLine bigLine # D.lw (D.local 0.5) - <> (renderText (bucketTime tb) optionsBucketHeight # D.translateY (optionsBucketHeight - 2)) - | otherwise - = D.strokeLine smallLine # D.lw (D.local 0.5) # D.translateY 1 + timelineBlock :: (String, D, D) -> Int -> (String, D, D) + timelineBlock (lastStr, lastNumD, fullDiag) tb + | tb `rem` 5 == 0 = + let btime = bucketTime tb + myNum = if lastStr == btime + then lastNumD + else mkSVGText btime optionsBucketHeight # D.translateY (optionsBucketHeight - 2) + myDiag = D.strokeLine bigLine # D.lw (D.local 0.5) <> myNum + in (btime, myNum, fullDiag <> myDiag # moveAlongTimeline tb) + | otherwise = + let myDiag :: D + myDiag = D.strokeLine smallLine + # D.lw (D.local 0.5) + # D.translateY 1 + # moveAlongTimeline tb + in (lastStr, lastNumD, fullDiag <> myDiag) bucketTime :: Int -> String bucketTime tb = case optionsGranularity of