Add support for converting parsed tables into Markdown or other LLM-friendly formats #1417

GISStd · 2025-01-06T08:34:57Z

Is your feature request related to a problem? Please describe.
Currently, MinerU only supports saving parsed tables as images. While this is useful for visual representation, it poses challenges when integrating with large language models (LLMs), as LLMs cannot directly interpret images. Users often need to extract table content into formats like Markdown, CSV, or JSON to enable better understanding and reasoning by LLMs.

Describe the solution you'd like
Provide functionality to export parsed tables into text-based formats that are easily interpretable by LLMs, such as:
Markdown Table Format: A simple, human-readable format that can be directly processed by LLMs. Example:

| Column 1 | Column 2 | Column 3 |
| --- | --- | --- |
| Value 1 | Value 2 | Value 3 |

Plain Text (Tab-Delimited): A minimalistic format for lightweight processing. Example:

Column 1	Column 2	Column 3
Value 1	Value 2	Value 3

JSON: A structured format suitable for hierarchical data or API integrations. Example:

[
  {"Column 1": "Value 1", "Column 2": "Value 2", "Column 3": "Value 3"}
]

CSV: For users who want to integrate with spreadsheet tools.
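
To make the request concrete, here is a minimal sketch of what such exporters could look like once a table is available as a header plus a list of rows. It uses only the Python standard library. The function names (`to_markdown`, `to_delimited`, `to_json`) are hypothetical and are not part of MinerU's API; the sketch also assumes a flat table with no merged cells.

```python
import csv
import io
import json


def to_markdown(header, rows):
    """Render a header and rows as a pipe-delimited Markdown table."""
    def fmt(cells):
        # Escape pipes so cell text cannot break the column layout.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines.extend(fmt(r) for r in rows)
    return "\n".join(lines)


def to_delimited(header, rows, delimiter=","):
    """Render as CSV (default) or, with delimiter='\\t', as tab-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def to_json(header, rows):
    """Render as a JSON array with one object per row, keyed by column name."""
    return json.dumps([dict(zip(header, r)) for r in rows], ensure_ascii=False)


# Hypothetical sample data matching the examples in this request.
header = ["Column 1", "Column 2", "Column 3"]
rows = [["Value 1", "Value 2", "Value 3"]]

print(to_markdown(header, rows))
print(to_delimited(header, rows, delimiter="\t"))
print(to_delimited(header, rows))
print(to_json(header, rows))
```

The `csv` module handles quoting of cells that contain the delimiter or newlines, which is why the tab-delimited and CSV variants share one function.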

Describe alternatives you've considered
Manually extracting text from parsed images, which is tedious and error-prone.
Using external tools to convert table images to text, which disrupts the workflow and reduces productivity.

myhloli · 2025-01-22T09:39:35Z

Markdown tables cannot represent merged cells, so we currently use HTML to represent tables. HTML can be easily rendered in various Markdown readers. The demands for merged-cell support and LLM (large language model) compatibility are difficult to reconcile, so for now we prioritize format fidelity.
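
For readers who still want a flat, LLM-friendly form, the following is a minimal sketch (not MinerU's implementation) of how an HTML table with `rowspan`/`colspan` could be expanded into a plain grid using Python's standard `html.parser`. The names `TableGrid` and `html_table_to_grid` and the sample markup are hypothetical. Merged cells are expanded by repeating their text in every slot they cover, which is exactly the fidelity trade-off described above: the grid becomes rectangular, but the information that cells were merged is lost. Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableGrid(HTMLParser):
    """Collect table cells row by row, together with their span attributes."""

    def __init__(self):
        super().__init__()
        self.rows = []       # each row is a list of (text, rowspan, colspan)
        self._cell = None    # text fragments of the open cell, or None
        self._span = (1, 1)

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:            # tolerate a cell without a <tr>
                self.rows.append([])
            a = dict(attrs)
            self._span = (int(a.get("rowspan") or 1), int(a.get("colspan") or 1))
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())  # collapse whitespace
            self.rows[-1].append((text, *self._span))
            self._cell = None


def html_table_to_grid(html):
    """Return a rectangular list of rows with merged cells repeated."""
    parser = TableGrid()
    parser.feed(html)
    grid = {}  # (row, col) -> text
    for r, cells in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in cells:
            while (r, c) in grid:        # skip slots filled by a rowspan above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


# Hypothetical sample with one colspan header and one rowspan cell.
sample = (
    '<table><tr><th colspan="2">Name</th><th>Age</th></tr>'
    '<tr><td>Ada</td><td>Lovelace</td><td rowspan="2">36</td></tr>'
    '<tr><td colspan="2">Unknown</td></tr></table>'
)
for i, row in enumerate(html_table_to_grid(sample)):
    print("| " + " | ".join(row) + " |")
    if i == 0:
        print("| " + " | ".join("---" for _ in row) + " |")
```

The resulting grid can then be passed to any Markdown, CSV, or JSON serializer.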

GISStd added the enhancement (New feature or request) label on Jan 6, 2025

myhloli closed this as completed Jan 22, 2025
