A structured Markdown engine that supports Go and JavaScript
Lute is a structured Markdown engine that fully implements the latest GFM/CommonMark specifications, with better support for Chinese context.
Visit the official Lute discussion forum to learn more.
I had been using other Markdown engines before, and all of them were "defective" to some degree:
- Inconsistent support for the standard specifications
- Processing "strange" text is very slow and can even hang
- Support for Chinese is not good enough
Lute's goal is to build a structured Markdown engine that implements the GFM/CM specifications and provides better support for Chinese. "Structured" means that Lute first builds an abstract syntax tree from the input Markdown text, and then produces HTML output, formatted text, and so on by operating on that tree. Implementing the specifications keeps Markdown rendering unambiguous: the same Markdown text yields the same result in any conforming engine, which is very important.
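To make "structured" concrete, here is a minimal Go sketch of the idea: the input is represented once as a tree, and different outputs are produced by walking that same tree. The `Node` type, the `NodeType` constants and the `renderHTML`/`renderMarkdown` functions are hypothetical and deliberately tiny; they are not Lute's actual AST API, and the tree is built by hand instead of by a parser.

```go
package main

import (
	"fmt"
	"html"
)

// NodeType is a hypothetical, heavily reduced set of Markdown node kinds.
type NodeType int

const (
	Document NodeType = iota
	Paragraph
	Strong
	Text
)

// Node is a minimal AST node: a kind, optional literal text, and children.
type Node struct {
	Type     NodeType
	Value    string
	Children []*Node
}

// renderHTML walks the tree and produces HTML.
func renderHTML(n *Node) string {
	inner := ""
	for _, c := range n.Children {
		inner += renderHTML(c)
	}
	switch n.Type {
	case Paragraph:
		return "<p>" + inner + "</p>\n"
	case Strong:
		return "<strong>" + inner + "</strong>"
	case Text:
		return html.EscapeString(n.Value)
	default: // Document
		return inner
	}
}

// renderMarkdown walks the same tree and produces normalized Markdown.
func renderMarkdown(n *Node) string {
	inner := ""
	for _, c := range n.Children {
		inner += renderMarkdown(c)
	}
	switch n.Type {
	case Paragraph:
		return inner + "\n"
	case Strong:
		return "**" + inner + "**"
	case Text:
		return n.Value
	default: // Document
		return inner
	}
}

func main() {
	// Hand-built tree for: **Lute** - A structured Markdown engine.
	doc := &Node{Type: Document, Children: []*Node{
		{Type: Paragraph, Children: []*Node{
			{Type: Strong, Children: []*Node{{Type: Text, Value: "Lute"}}},
			{Type: Text, Value: " - A structured Markdown engine."},
		}},
	}}
	fmt.Print(renderHTML(doc))     // <p><strong>Lute</strong> - A structured Markdown engine.</p>
	fmt.Print(renderMarkdown(doc)) // **Lute** - A structured Markdown engine.
}
```

Because both outputs share one tree, adding another output (plain text, JSON, a formatter) means writing another walk, not another parser.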
Few engines actually implement the specifications, and wanting to see whether I could write one was one of Lute's motivations. There are many opinions online about how to implement a Markdown engine:
- Some say Markdown is well suited to parsing with regular expressions, because its syntax rules are simple
- Some say Markdown should be handled with compiler techniques, because regular expressions are too hard to maintain
I agree with the latter: regular expressions are indeed hard to maintain and slow to execute. More importantly, the core parsing algorithm of a GFM/CM-conforming Markdown engine cannot be written with regular expressions, because the rules defined by the specifications are too complex.
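As an illustration of how context-dependent these rules are, the CommonMark spec decides whether a run of `*` or `_` can open emphasis by checking whether it is *left-flanking*, which depends on the characters on both sides of the run. The Go sketch below checks only that one definition. It is not Lute's code: it approximates the spec's "Unicode whitespace" with `unicode.IsSpace` and "Unicode punctuation" with `unicode.IsPunct` plus the ASCII punctuation characters, and it treats the start and end of the line as whitespace, as the spec does.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// asciiPunct lists the ASCII punctuation characters named in the CommonMark spec.
const asciiPunct = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"

// isPunct approximates the spec's "Unicode punctuation character".
func isPunct(r rune) bool {
	return unicode.IsPunct(r) || strings.ContainsRune(asciiPunct, r)
}

// isLeftFlanking reports whether the delimiter run line[start:end] is
// left-flanking: (1) not followed by whitespace, and either (2a) not followed
// by punctuation, or (2b) followed by punctuation and preceded by whitespace
// or punctuation. The start and end of the line count as whitespace.
func isLeftFlanking(line string, start, end int) bool {
	before, after := ' ', ' '
	if start > 0 {
		before, _ = utf8.DecodeLastRuneInString(line[:start])
	}
	if end < len(line) {
		after, _ = utf8.DecodeRuneInString(line[end:])
	}
	if unicode.IsSpace(after) {
		return false
	}
	if !isPunct(after) {
		return true
	}
	return unicode.IsSpace(before) || isPunct(before)
}

func main() {
	cases := []struct {
		line       string
		start, end int
	}{
		{"*abc", 0, 1},       // followed by a letter
		{`**"abc"`, 0, 2},    // followed by punctuation, preceded by line start
		{`a**"foo"**`, 1, 3}, // followed by punctuation, preceded by a letter
		{"a * b", 2, 3},      // followed by whitespace
	}
	for _, c := range cases {
		fmt.Printf("%-12q left-flanking: %v\n", c.line, isLeftFlanking(c.line, c.start, c.end))
	}
}
```

A real parser still has to pair such runs through the spec's delimiter-stack algorithm (including the "multiple of 3" rule), which is the kind of stateful logic that is hard to express and maintain as regular expressions.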
Finally, another important motivation is that the B3log open source community needs its own Markdown engine:
- Solo, Pipe, and Sym need Markdown rendering with consistent results, and performance is very important
- Vditor needs a structured engine underneath it to build a next-generation Markdown editor
Features:
- Implements the latest GFM/CM specifications
- Zero regular expressions, very fast
- Built-in code block syntax highlighting
- Better support for Chinese context
- Terminology spelling correction
- Markdown formatting
- Emoji parsing
- HTML to Markdown conversion
- Custom rendering functions
- JavaScript support
- Enhanced automatic link recognition
- Automatic insertion of spaces between Chinese and Western characters
- Replacement of English punctuation with Chinese punctuation
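A minimal sketch of automatic spacing between Chinese and Western characters, assuming the simplest possible rule: insert one space wherever a Han character directly touches an ASCII letter or digit. This is not Lute's implementation, which covers more cases, and `autoSpace` is a hypothetical name. A real engine would apply such a rule to text nodes only (the formatting example below keeps code blocks unchanged).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isWestern reports whether r is an ASCII letter or digit.
func isWestern(r rune) bool {
	return r <= unicode.MaxASCII && (unicode.IsLetter(r) || unicode.IsDigit(r))
}

// isHan reports whether r is a Chinese (Han) character.
func isHan(r rune) bool {
	return unicode.Is(unicode.Han, r)
}

// autoSpace inserts one space wherever a Han character directly touches an
// ASCII letter or digit, in either order. Existing spaces are left alone.
func autoSpace(s string) string {
	var b strings.Builder
	var prev rune // zero value means "nothing before the first rune"
	for _, r := range s {
		if (isHan(prev) && isWestern(r)) || (isWestern(prev) && isHan(r)) {
			b.WriteByte(' ')
		}
		b.WriteRune(r)
		prev = r
	}
	return b.String()
}

func main() {
	fmt.Println(autoSpace("要成为Markdown程序员并不容易，同理PPT架构师也是。"))
	// 要成为 Markdown 程序员并不容易，同理 PPT 架构师也是。
}
```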
The formatting function rewrites "untidy" Markdown text in a unified style. When a document is edited by many people, a unified typographic style makes collaboration easier.
Markdown:
# ATX 标题也有可能需要格式化的 ##
一个简短的段落。
Setext 说实话我不喜欢 Setext 标题
----
0. 有序列表可以从 0 开始
0. 应该自增序号的
1. 对齐对齐对齐
我们再来看看另一个有序列表。
1. 没空行的情况下序号要从 1 开始才能打断段落开始一个新列表
3. 虽然乱序不影响渲染
2. 但是随意写序号容易引起误解
试下贴段代码:
```go
package main
import "fmt"
func main() {
fmt.Println("Hello, 世界")
}
```
对了,缩进代码块建议换成围栏代码块:

    缩进代码块太隐晦了
    也没法指定编程语言,容易导致代码高亮失效
    所以建议大家用 ``` 围栏代码块
试下围栏代码块匹配场景:
````markdown
围栏代码块只要开头的 ` 和结束的 ` 数量匹配即可,这样可以实现在围栏代码块中显示围栏代码块:
```
这里只有 3 个 `,所以不会匹配markdown代码块结束
```
下面匹配到就真的结束了。
````
以上块级内容都挤在一坨了,插入合理的空行也很有必要。
但是过多的空行分段也不好啊,用来分段的话一个空行就够了。
接下来让我们试试稍微复杂点的场景,比如列表项包含多个段落的情况:
1. 列表项中的第一段

   这里是第二个段落,贴段代码:
   ```markdown
   要成为Markdown程序员并不容易,同理PPT架构师也是。
   注意代码块中的中西文间并没有插入空格。
   ```
   这里是最后一段了。
1. 整个有序列表是“松散”的:列表项内容要用 `<p>` 标签
最后,我们试下对 GFM 的格式化支持:
|col1|col2 | col3 |
--- |---------------|--
col1 without left pipe | this is col2 | col3 without right pipe
||need align cell|
**以上就是为什么我们需要Markdown Format,而且是带中西文自动空格的格式化。**
Formatted:
# ATX 标题也有可能需要格式化的

一个简短的段落。

## Setext 说实话我不喜欢 Setext 标题

0. 有序列表可以从 0 开始
1. 应该自增序号的
2. 对齐对齐对齐

我们再来看看另一个有序列表。

1. 没空行的情况下序号要从 1 开始才能打断段落开始一个新列表
2. 虽然乱序不影响渲染
3. 但是随意写序号容易引起误解

试下贴段代码:

```go
package main
import "fmt"
func main() {
fmt.Println("Hello, 世界")
}
```

对了,缩进代码块建议换成围栏代码块:

```
缩进代码块太隐晦了
也没法指定编程语言,容易导致代码高亮失效
所以建议大家用 ``` 围栏代码块
```

试下围栏代码块匹配场景:

````markdown
围栏代码块只要开头的 ` 和结束的 ` 数量匹配即可,这样可以实现在围栏代码块中显示围栏代码块:
```
这里只有 3 个 `,所以不会匹配markdown代码块结束
```
下面匹配到就真的结束了。
````

以上块级内容都挤在一坨了,插入合理的空行也很有必要。

但是过多的空行分段也不好啊,用来分段的话一个空行就够了。

接下来让我们试试稍微复杂点的场景,比如列表项包含多个段落的情况:

1. 列表项中的第一段

   这里是第二个段落,贴段代码:

   ```markdown
   要成为Markdown程序员并不容易,同理PPT架构师也是。
   注意代码块中的中西文间并没有插入空格。
   ```

   这里是最后一段了。

2. 整个有序列表是“松散”的:列表项内容要用 `<p>` 标签

最后,我们试下对 GFM 的格式化支持:

| col1                   | col2            | col3                    |
| ---------------------- | --------------- | ----------------------- |
| col1 without left pipe | this is col2    | col3 without right pipe |
|                        | need align cell |                         |

**以上就是为什么我们需要 Markdown Format,而且是带中西文自动空格的格式化。**
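The table in the example above is normalized by padding every cell to the width of the widest cell in its column. Below is a minimal Go sketch of that step, assuming the rows have already been split into cells. `alignTable` is a hypothetical helper; widths are counted in runes, so double-width CJK characters and alignment colons such as `:---:` are not handled, and the three-dash minimum is a readability choice, not a GFM requirement.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// alignTable renders a GFM table whose header is rows[0], padding every cell
// with spaces to the widest cell of its column. Missing cells count as empty.
// It assumes at least one (header) row.
func alignTable(rows [][]string) string {
	cols := 0
	for _, r := range rows {
		if len(r) > cols {
			cols = len(r)
		}
	}
	widths := make([]int, cols)
	for i := range widths {
		widths[i] = 3 // keep at least "---" in the delimiter row
	}
	for _, r := range rows {
		for i, c := range r {
			if w := utf8.RuneCountInString(c); w > widths[i] {
				widths[i] = w
			}
		}
	}

	var b strings.Builder
	writeRow := func(cells []string) {
		b.WriteString("|")
		for i := 0; i < cols; i++ {
			c := ""
			if i < len(cells) {
				c = cells[i]
			}
			pad := strings.Repeat(" ", widths[i]-utf8.RuneCountInString(c))
			b.WriteString(" " + c + pad + " |")
		}
		b.WriteString("\n")
	}

	writeRow(rows[0])
	delim := make([]string, cols)
	for i, w := range widths {
		delim[i] = strings.Repeat("-", w)
	}
	writeRow(delim)
	for _, r := range rows[1:] {
		writeRow(r)
	}
	return b.String()
}

func main() {
	fmt.Print(alignTable([][]string{
		{"col1", "col2", "col3"},
		{"col1 without left pipe", "this is col2", "col3 without right pipe"},
		{"", "need align cell"},
	}))
}
```

Fed the example table's cells, this produces the same column widths as the formatted output (22, 15 and 23 characters).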
Terminology spelling correction example (input Markdown):
Doing open source projects on github is a very happy thing, please don't spell Github as `github`!
In particular, this should never happen in your resume:
> Proficient in using JAVA, Javascript, GIT, have a certain understanding of android, ios development, proficient in using Mysql, postgresql database.
After correction:
Doing open source projects on GitHub is a very happy thing, please don't spell Github as `github`!
In particular, this should never happen in your resume:
> Proficient in using Java, JavaScript, Git, have a certain understanding of Android, iOS development, proficient in using MySQL, PostgreSQL database.
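Terminology correction can be sketched as a dictionary lookup on the words of plain text. In the Go sketch below, `terms` is a small hypothetical table (not Lute's actual term list), and words are limited to ASCII letters and digits so that Chinese characters act as word boundaries. Note that `` `github` `` inside the code span above is not corrected; in an AST-based engine that falls out naturally from applying the fix to text nodes only.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// terms maps a lower-cased word to its canonical spelling (hypothetical sample).
var terms = map[string]string{
	"github":     "GitHub",
	"git":        "Git",
	"java":       "Java",
	"javascript": "JavaScript",
	"android":    "Android",
	"ios":        "iOS",
	"mysql":      "MySQL",
	"postgresql": "PostgreSQL",
}

// isWordRune limits words to ASCII letters and digits, so Chinese text and
// punctuation act as word boundaries.
func isWordRune(r rune) bool {
	return r <= unicode.MaxASCII && (unicode.IsLetter(r) || unicode.IsDigit(r))
}

// fixTerms rewrites every word of a plain-text fragment that appears in terms.
// It is meant for text nodes only, never for code spans or code blocks.
func fixTerms(text string) string {
	var b strings.Builder
	start := -1 // byte offset of the current word, or -1 outside a word
	flush := func(end int) {
		if start < 0 {
			return
		}
		word := text[start:end]
		if fixed, ok := terms[strings.ToLower(word)]; ok {
			word = fixed
		}
		b.WriteString(word)
		start = -1
	}
	for i, r := range text {
		if isWordRune(r) {
			if start < 0 {
				start = i
			}
			continue
		}
		flush(i)
		b.WriteRune(r)
	}
	flush(len(text))
	return b.String()
}

func main() {
	fmt.Println(fixTerms("Proficient in using JAVA, Javascript, GIT, have a certain understanding of android, ios development."))
	// Proficient in using Java, JavaScript, Git, have a certain understanding of Android, iOS development.
}
```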
For performance figures, see the Golang Markdown engine performance benchmark.
Lute handles all Markdown processing on LianDi, serving millions of parsing and rendering requests every day, and it runs stably.
Lute does not implement GFM's Disallowed Raw HTML (extension), because that extension still has certain holes (for example, it does not filter `<input>`).
We recommend using a dedicated library (such as bluemonday) for HTML sanitization, which can be tuned to fit your application scenario.
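For context, the GFM Disallowed Raw HTML extension only neutralizes a fixed list of nine tags, by replacing their leading `<` with `&lt;`; every other tag, including `<input>`, passes through unchanged. The simplified Go sketch below shows that behavior on a raw HTML string. It is not Lute code and not a substitute for a real sanitizer such as bluemonday; the tag-name boundary check (whitespace, `>` or `/>`) is modeled on the cmark-gfm tag filter.

```go
package main

import (
	"fmt"
	"strings"
)

// disallowedTags are the tags filtered by GFM's "Disallowed Raw HTML" extension.
var disallowedTags = []string{
	"title", "textarea", "style", "xmp", "iframe",
	"noembed", "noframes", "script", "plaintext",
}

// tagFilter replaces the leading '<' of every opening or closing disallowed
// tag with "&lt;", leaving all other markup untouched.
func tagFilter(html string) string {
	var b strings.Builder
	for i := 0; i < len(html); i++ {
		if html[i] == '<' && startsDisallowedTag(html[i+1:]) {
			b.WriteString("&lt;")
			continue
		}
		b.WriteByte(html[i])
	}
	return b.String()
}

// startsDisallowedTag reports whether s (the text right after a '<') is an
// optional '/', a disallowed tag name (case-insensitive), and then
// whitespace, '>' or "/>".
func startsDisallowedTag(s string) bool {
	s = strings.TrimPrefix(s, "/")
	for _, name := range disallowedTags {
		if len(s) <= len(name) || !strings.EqualFold(s[:len(name)], name) {
			continue
		}
		rest := s[len(name):]
		if strings.ContainsRune(" \t\n\v\f\r>", rune(rest[0])) || strings.HasPrefix(rest, "/>") {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(tagFilter(`<script>alert(1)</script>`))
	// &lt;script>alert(1)&lt;/script>
	fmt.Println(tagFilter(`<input autofocus onfocus="alert(1)">`))
	// <input autofocus onfocus="alert(1)">  (not on the list, so unchanged)
}
```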
There are three ways to use Lute:
- Backend: import the `github.com/88250/lute` package in Go
- Backend: start Lute as an HTTP service process for other processes to call (see here)
- Front end: include lute.min.js from the js directory; Node.js is supported as well
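The HTTP-service option boils down to exposing a render function over HTTP. The sketch below uses only the Go standard library and takes the renderer as a parameter whose shape, `func(name, markdown string) string`, matches the `MarkdownStr` call in the Go example of this README. The handler, the request format (raw Markdown in a POST body) and the 1 MiB limit are assumptions for illustration, not the protocol of Lute's own HTTP server.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// RenderFunc has the same shape as luteEngine.MarkdownStr(name, markdown).
type RenderFunc func(name, markdown string) string

// markdownHandler accepts raw Markdown in a POST body and responds with HTML.
// The request/response format is a hypothetical choice for this sketch.
func markdownHandler(render RenderFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST Markdown text in the request body", http.StatusMethodNotAllowed)
			return
		}
		md, err := io.ReadAll(io.LimitReader(r.Body, 1<<20)) // cap input at 1 MiB
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		io.WriteString(w, render("http", string(md)))
	}
}

// serveOnce pushes one in-memory request through h and returns status and body.
func serveOnce(h http.Handler, method, body string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, "/", strings.NewReader(body)))
	return rec.Code, rec.Body.String()
}

func main() {
	// Stand-in renderer so the sketch runs without Lute installed.
	fake := func(name, md string) string { return "<p>" + md + "</p>" }
	code, body := serveOnce(markdownHandler(fake), http.MethodPost, "hello")
	fmt.Println(code, body) // 200 <p>hello</p>
}
```

To serve it for real, assuming the `MarkdownStr` signature shown in the Go example, something like `http.ListenAndServe(":8080", markdownHandler(lute.New().MarkdownStr))` would do; the port is arbitrary.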
Add the Lute library:
```shell
go get -u github.com/88250/lute
```
Minimal working example:
```go
package main

import (
	"fmt"

	"github.com/88250/lute"
)

func main() {
	luteEngine := lute.New() // GFM support and Chinese context optimization are enabled by default
	html := luteEngine.MarkdownStr("demo", "**Lute** - A structured markdown engine.")
	fmt.Println(html)
	// <p><strong>Lute</strong> - A structured Markdown engine.</p>
}
```
About code block syntax highlighting:
- An external style sheet is used by default, with the github.css theme. Copy the style file from the chroma-styles directory into your project and include it
- Highlighting parameters such as inline styles, line numbers, and the theme can be set through `luteEngine.SetCodeSyntaxHighlightXXX()`
For a simple example, please refer to the demo in the JavaScript directory. For complete usage with the front-end editor, please refer to the demo in Vditor.
Some details:
- lute.js has no built-in syntax highlighting
- lute.js is about 2MB after compilation; compressed with `brotli -o lute.min.js.br lute.min.js` it is about 200KB, and with regular GZip compression about 300KB
```ts
// Type of JSONRenderer
type JSONRendererType = Array<JSONRendererItemType>

// Flag Node
type FlagType =
  | "Paragraph"
  | "Emphasis"
  | "Strong"
  | "Blockquote"
  | "ListItem"
  | "Strikethrough"
  | "TableHead"
  | "Table"
  | "TableRow"
  | "Mark"
  | "Sub"
  | "Sup"
  | "Tag"
  | "BlockRef"

// Non-Flag Node
type NotFlagType =
  | "Heading"
  | "ThematicBreak"
  | "List"
  | "HTMLBlock"
  | "InlineHTML"
  | "CodeBlock"
  | "Text"
  | "CodeSpan"
  | "HardBreak"
  | "SoftBreak"
  | "Link"
  | "Image"
  | "HTMLEntity"
  | "TaskListItemMarker"
  | "TableCell"
  | "EmojiUnicode"
  | "EmojiImg"
  | "MathBlock"
  | "InlineMath"
  | "YamlFrontMatter"
  | "Backslash"
  | "BlockEmbed"
  | "BlockQueryEmbed"

interface JSONRendererItemType {
  type?: string
  value?: string
  flag?: string
  title?: string
  language?: string
  mindmap?: string
  children?: Array<JSONRendererItemType>
}

// Node has 4 types: Normal Node, Flag Node, Link Node, Codeblock Node
interface NormalNodeType {
  type: string
  value: string
  children?: Array<JSONRendererItemType>
}

interface FlagNodeType {
  flag: string
  children?: Array<JSONRendererItemType>
}

// Link or Image
interface LinkNodeType {
  type: string
  value: string
  title: string
  children?: Array<JSONRendererItemType>
}

interface CodeBlockType {
  type: string
  value: string
  language: string
  mindmap?: string // if language is "mindmap"
}
```
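If you consume this JSON from Go, the documented shape maps directly onto a recursive struct. The sketch below mirrors `JSONRendererItemType` field for field, decodes it with `encoding/json`, and collects the text of the tree depth first. The sample JSON is hand-written for illustration (not captured Lute output), and treating `"Text"` nodes as the carriers of text is an assumption based on the `NotFlagType` list above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Item mirrors JSONRendererItemType; every field is optional.
type Item struct {
	Type     string `json:"type,omitempty"`
	Value    string `json:"value,omitempty"`
	Flag     string `json:"flag,omitempty"`
	Title    string `json:"title,omitempty"`
	Language string `json:"language,omitempty"`
	Mindmap  string `json:"mindmap,omitempty"`
	Children []Item `json:"children,omitempty"`
}

// parseTree decodes a JSONRendererType document (an array of items).
func parseTree(data []byte) ([]Item, error) {
	var items []Item
	err := json.Unmarshal(data, &items)
	return items, err
}

// textOf concatenates the value of every "Text" node, depth first.
func textOf(items []Item) string {
	var b strings.Builder
	for _, it := range items {
		if it.Type == "Text" {
			b.WriteString(it.Value)
		}
		b.WriteString(textOf(it.Children))
	}
	return b.String()
}

func main() {
	// Hand-written sample in the documented shape, not captured Lute output.
	data := []byte(`[
	  {"flag": "Paragraph", "children": [
	    {"flag": "Strong", "children": [{"type": "Text", "value": "Lute"}]},
	    {"type": "Text", "value": " - a structured Markdown engine"}
	  ]},
	  {"type": "CodeBlock", "value": "fmt.Println(1)", "language": "go"}
	]`)
	tree, err := parseTree(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(textOf(tree))     // Lute - a structured Markdown engine
	fmt.Println(tree[1].Language) // go
}
```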
- Interpretation of CommonMark specifications
- Lute Implementation Postscript
- Markdown parsing and Markdown AST
Lute is licensed under the Mulan Permissive Software License, Version 2.
- commonmark.js: CommonMark parser and renderer in JavaScript
- goldmark: A markdown parser written in Go
- golang-commonmark: A CommonMark-compliant markdown parser and renderer in Go
- Chroma: A general purpose syntax highlighter in pure Go
- 中文文案排版指北: Chinese copywriting guidelines for better written communication
- GopherJS: A compiler from Go to JavaScript for running Go code in a browser