Extract contents of a code fence #246
-
Have you been able to solve this? I'm looking at doing something similar. I need to extract all code blocks (they're all YAML) and return them, e.g. as a slice of maps (one item per code block).
-
Late to this post, but here's something that works for me. It `Walk()`s the parsed document instead of rendering it, extracting the text of each code block every time one is encountered:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"

	"github.com/yuin/goldmark"
	"github.com/yuin/goldmark/ast"
	"github.com/yuin/goldmark/extension"
	"github.com/yuin/goldmark/text"
)

// CodeBlock is a copy of a code block (fenced or indented) from a Markdown document.
type CodeBlock struct {
	Lang string // info-string language; "" for indented blocks and bare fences
	Code string
}

func main() {
	// Get the Goldmark README file as an example.
	resp, err := http.Get("https://raw.githubusercontent.com/yuin/goldmark/master/README.md")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		panic(fmt.Sprintf("unexpected HTTP status: %s", resp.Status))
	}
	source, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}

	// Parse the input into an AST. Nothing is rendered to HTML.
	md := goldmark.New(
		goldmark.WithExtensions(extension.GFM),
	)
	doc := md.Parser().Parse(text.NewReader(source))
	// doc.Dump(source, 0) // Uncomment to see the whole AST!

	// Walk the AST and extract the code blocks.
	blocks := make([]CodeBlock, 0)
	err = ast.Walk(doc, func(node ast.Node, entering bool) (ast.WalkStatus, error) {
		if entering && (node.Kind() == ast.KindFencedCodeBlock || node.Kind() == ast.KindCodeBlock) {
			block := CodeBlock{}
			// Get the language if present. Will be "" otherwise.
			if fcb, ok := node.(*ast.FencedCodeBlock); ok {
				block.Lang = string(fcb.Language(source))
			}
			// Copy the code. Goldmark stores each line as a text.Segment that
			// points into source; the trailing '\n' is included in each segment.
			var code bytes.Buffer
			lines := node.Lines()
			for i := 0; i < lines.Len(); i++ {
				line := lines.At(i)
				code.Write(line.Value(source))
			}
			block.Code = code.String()
			blocks = append(blocks, block)
		}
		return ast.WalkContinue, nil
	})
	if err != nil {
		panic(err)
	}

	// Print the results.
	for i, block := range blocks {
		fmt.Printf("\nCode block #%d, lang='%s':\n", i, block.Lang)
		fmt.Println("````````````````````````````````````````")
		fmt.Print(block.Code) // already ends with a newline
		fmt.Println("````````````````````````````````````````")
	}
}
```
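For the YAML-only case asked about earlier in the thread, one option is to filter the extracted slice on `Lang` before doing anything else with it. This is a minimal sketch, assuming the YAML blocks are tagged `yaml` or `yml` in their info strings; `filterByLang` is a hypothetical helper, not part of goldmark, and the sample data stands in for the output of the AST walk.

```go
package main

import (
	"fmt"
	"strings"
)

// CodeBlock has the same shape as the struct in the AST-walk answer.
type CodeBlock struct {
	Lang string
	Code string
}

// filterByLang is a hypothetical helper: it keeps the blocks whose Lang
// matches any of the wanted names, ignoring case, and preserves their order.
func filterByLang(blocks []CodeBlock, wanted ...string) []CodeBlock {
	out := make([]CodeBlock, 0, len(blocks))
	for _, b := range blocks {
		for _, w := range wanted {
			if strings.EqualFold(b.Lang, w) {
				out = append(out, b)
				break
			}
		}
	}
	return out
}

func main() {
	// Stand-in data shaped like the output of the AST walk.
	blocks := []CodeBlock{
		{Lang: "yaml", Code: "name: a\n"},
		{Lang: "go", Code: "package main\n"},
		{Lang: "YML", Code: "name: b\n"},
		{Lang: "", Code: "indented block\n"},
	}
	for i, b := range filterByLang(blocks, "yaml", "yml") {
		fmt.Printf("yaml block #%d:\n%s", i, b.Code)
	}
}
```

Turning each `Code` string into a map would still need a YAML decoding library; that step is deliberately left out of this sketch.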
-
I have been looking at this for a little bit and don't seem to be able to grok it.
Is it possible to extract the contents of all code fences within a file, and return those as text ignoring all the other contents of the markdown file?
I looked at utilising just the `CodeBlockParser`; however, when I set that in the options like below, it still returns all the content and renders it as HTML:

I assume there is no simple way to do this, so I would love some advice on what direction I should be looking in 😃.
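If pulling in a Markdown parser is not wanted, a rough standard-library approximation is to scan line by line and collect the text between fence lines. This is a swapped-in technique, not goldmark and not a complete CommonMark implementation: `extractFences`, `fenceRun` and `Fence` are hypothetical names, and the sketch assumes fences start at column 0. It ignores indented code blocks and does not understand fences nested inside lists or blockquotes. Because it uses `bufio.Scanner`, lines longer than 64 KiB make it return an error.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Fence is one fenced code block found by the scanner (hypothetical type).
type Fence struct {
	Lang string // first word of the info string, "" if none
	Code string // body text, with '\n' after every line
}

// extractFences is a simplified, hypothetical scanner, NOT a CommonMark parser.
// It only recognises fences of 3+ backticks or tildes starting at column 0 and
// ignores indented code blocks and fences nested in lists or blockquotes.
func extractFences(src string) ([]Fence, error) {
	var (
		fences  []Fence
		inFence bool
		marker  byte // '`' or '~' of the currently open fence
		width   int  // length of the opening run
		cur     Fence
		body    strings.Builder
	)
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := sc.Text()
		c, n := fenceRun(line)
		if !inFence {
			info := strings.TrimSpace(line[n:])
			// A backtick fence may not have backticks in its info string.
			if n >= 3 && !(c == '`' && strings.Contains(info, "`")) {
				inFence, marker, width = true, c, n
				cur = Fence{}
				if fields := strings.Fields(info); len(fields) > 0 {
					cur.Lang = fields[0]
				}
				body.Reset()
			}
			continue
		}
		// A closing fence uses the same character, is at least as long as
		// the opener, and has only whitespace after it.
		if c == marker && n >= width && strings.TrimSpace(line[n:]) == "" {
			cur.Code = body.String()
			fences = append(fences, cur)
			inFence = false
			continue
		}
		body.WriteString(line)
		body.WriteByte('\n')
	}
	// An unclosed fence runs to the end of the document.
	if inFence {
		cur.Code = body.String()
		fences = append(fences, cur)
	}
	return fences, sc.Err()
}

// fenceRun reports the fence character and run length at the start of line,
// or (0, 0) if the line does not start with a backtick or tilde.
func fenceRun(line string) (byte, int) {
	if line == "" || (line[0] != '`' && line[0] != '~') {
		return 0, 0
	}
	c := line[0]
	n := 0
	for n < len(line) && line[n] == c {
		n++
	}
	return c, n
}

func main() {
	doc := "# Title\n\nSome prose.\n\n```yaml\nname: demo\n```\n\n~~~\nplain text\n~~~\n"
	fences, err := extractFences(doc)
	if err != nil {
		panic(err)
	}
	for i, f := range fences {
		fmt.Printf("block #%d, lang=%q:\n%s", i, f.Lang, f.Code)
	}
}
```

For real-world documents, walking a parsed AST is the more robust route, since a line scanner cannot see the block structure that decides whether a fence line actually opens a code block.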