Is your feature request related to a problem? Please describe.
Currently, MinerU only supports saving parsed tables as images. While this is useful for visual representation, it poses challenges when integrating with large language models (LLMs), as LLMs cannot directly interpret images. Users often need to extract table content into formats like Markdown, CSV, or JSON to enable better understanding and reasoning by LLMs.
Describe the solution you'd like
Provide functionality to export parsed tables into text-based formats that are easily interpretable by LLMs, such as:
Markdown Table Format: A simple, human-readable format that can be directly processed by LLMs. Example:
| Column 1 | Column 2 | Column 3 |
|----------|----------|----------|
| Value 1  | Value 2  | Value 3  |
Plain Text (Tab-Delimited): A minimalistic format for lightweight processing. Example:
Column 1	Column 2	Column 3
Value 1	Value 2	Value 3
JSON: A structured format suitable for hierarchical data or API integrations. Example:
]
CSV: For users who want to integrate with spreadsheet tools.
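As a rough illustration of the four requested formats, the sketch below serializes one parsed table (a header plus rows of cell text) using only the Python standard library. The function names and the `header`/`rows` inputs are hypothetical and are not part of MinerU. The JSON shape (one object per row, keyed by column name) is also an assumption, since the JSON example in this issue did not survive.

```python
import csv
import io
import json


def to_markdown(header, rows):
    """Render a header and rows as a pipe-delimited Markdown table."""
    def line(cells):
        # Escape pipes so cell text cannot break the column layout.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    sep = "| " + " | ".join("---" for _ in header) + " |"
    return "\n".join([line(header), sep] + [line(r) for r in rows])


def to_tsv(header, rows):
    """Render the table as tab-delimited plain text, one row per line."""
    return "\n".join("\t".join(str(c) for c in r) for r in [header] + rows)


def to_csv(header, rows):
    """Render the table as CSV; the csv module handles quoting of commas."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def to_json(header, rows):
    """Render the table as a JSON array with one object per row (assumed shape)."""
    return json.dumps([dict(zip(header, r)) for r in rows], ensure_ascii=False)


# Hypothetical input mirroring the placeholder table used in this issue.
header = ["Column 1", "Column 2", "Column 3"]
rows = [["Value 1", "Value 2", "Value 3"]]

print(to_markdown(header, rows))
print(to_tsv(header, rows))
print(to_csv(header, rows), end="")
print(to_json(header, rows))
```

All four functions assume a rectangular table with no merged cells. A table with merged cells would first need to be flattened into such a grid.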
Describe alternatives you've considered
Manually extracting text from parsed images, which is tedious and error-prone.
Using external tools to convert table images to text, which disrupts the workflow and reduces productivity.
Markdown tables cannot represent merged cells, so we currently use HTML to represent tables. HTML tables render without trouble in many Markdown readers. Support for merged cells and compatibility with large language models (LLMs) are difficult to reconcile, so for now we prioritize format fidelity.
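For readers who need a Markdown table despite this trade-off, one lossy workaround is to flatten the HTML output themselves. The sketch below is not a MinerU API. It assumes the table's HTML string is already in hand, uses only the standard-library `html.parser`, and copies a merged cell's text into every grid slot it spans. The names `TableGrid` and `html_table_to_markdown` are hypothetical, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableGrid(HTMLParser):
    """Collect table cells into a (row, col) -> text grid, expanding spans."""

    def __init__(self):
        super().__init__()
        self.grid = {}
        self.row = -1
        self.col = 0
        self.span = None  # (rowspan, colspan) of the cell currently open
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row += 1
            self.col = 0
        elif tag in ("td", "th"):
            a = dict(attrs)
            self.span = (max(1, int(a.get("rowspan") or 1)),
                         max(1, int(a.get("colspan") or 1)))
            self.text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self.span is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.span is not None:
            # Skip slots already claimed by a rowspan from an earlier row.
            while (self.row, self.col) in self.grid:
                self.col += 1
            rowspan, colspan = self.span
            value = " ".join("".join(self.text).split())
            # Duplicate the cell text into every slot the merged cell covers.
            for r in range(self.row, self.row + rowspan):
                for c in range(self.col, self.col + colspan):
                    self.grid[(r, c)] = value
            self.col += colspan
            self.span = None


def html_table_to_markdown(html):
    """Flatten an HTML table into Markdown; the first row becomes the header."""
    parser = TableGrid()
    parser.feed(html)
    if not parser.grid:
        return ""
    n_rows = max(r for r, _ in parser.grid) + 1
    n_cols = max(c for _, c in parser.grid) + 1
    rows = [[parser.grid.get((r, c), "") for c in range(n_cols)]
            for r in range(n_rows)]

    def line(cells):
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    sep = "| " + " | ".join("---" for _ in range(n_cols)) + " |"
    return "\n".join([line(rows[0]), sep] + [line(r) for r in rows[1:]])


# Hypothetical sample with one colspan and one rowspan.
sample = """
<table>
<tr><th colspan="2">Name</th><th>Score</th></tr>
<tr><td rowspan="2">A</td><td>x</td><td>1</td></tr>
<tr><td>y</td><td>2</td></tr>
</table>
"""
print(html_table_to_markdown(sample))
```

Duplicating the merged value keeps every row self-contained for an LLM, but the result no longer records which cells were merged. That is the fidelity loss described above.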