Feature Request: Docx native capabilities #34

tomduck · 2016-12-01T03:33:28Z

Internal links now use docx's native capabilities (closing Issue #25, submitted by @krnlyng). Figure and reference numbers should do the same. Currently, the numbers are hard-coded.

tomduck · 2016-12-01T06:33:49Z

Below is a copy of the docx document.xml produced by pandoc for a figure with a caption. I have formatted the xml for easier viewing. It looks like the caption is outside of the Figure and in its own paragraph. Is this how Word does it?

<?xml version="1.0" encoding="UTF-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing">
  <w:body>

    <w:p>
      <w:pPr><w:pStyle w:val="FigureWithCaption" /></w:pPr>
      <w:r>
        <w:drawing>
          <wp:inline>
            <wp:extent cx="1473200" cy="1460500" />
            <wp:effectExtent b="0" l="0" r="0" t="0" />
            <wp:docPr descr="Plot." id="1" name="Picture" />
            <a:graphic>
              <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
                <pic:pic>
                  <pic:nvPicPr>
                    <pic:cNvPr descr="plot.png" id="0" name="Picture" />
                    <pic:cNvPicPr>
                      <a:picLocks noChangeArrowheads="1" noChangeAspect="1" />
                    </pic:cNvPicPr>
                  </pic:nvPicPr>
                  <pic:blipFill><a:blip r:embed="rId21" /><a:stretch><a:fillRect /></a:stretch></pic:blipFill>
                  <pic:spPr bwMode="auto">
                    <a:xfrm><a:off x="0" y="0" /><a:ext cx="1473200" cy="1460500" /></a:xfrm>
                    <a:prstGeom prst="rect"><a:avLst /></a:prstGeom>
                    <a:noFill />
                    <a:ln w="9525"><a:noFill /><a:headEnd /><a:tailEnd /></a:ln>
                  </pic:spPr>
                </pic:pic>
              </a:graphicData>
            </a:graphic>
          </wp:inline>
        </w:drawing>
      </w:r>
    </w:p>
    
    <w:p>
      <w:pPr><w:pStyle w:val="ImageCaption" /></w:pPr>
      <w:r><w:t xml:space="preserve">Plot.</w:t></w:r>
    </w:p>
    
    <w:sectPr />

  </w:body>
</w:document>

krnlyng · 2016-12-03T00:17:34Z

this seems to be true, figure captions are just paragraphs with a different style afaik

iandol · 2016-12-03T12:54:49Z

There was talk of using a similar mechanism as the MMD ODT writer (where figure and caption are contained in a text box). But I think that never really happened as using keep-with-next works well enough...

tomduck · 2016-12-05T04:28:55Z

Hi @iandol. I'm not really sure what you mean by "keep-with-next". Can you please explain?

My understanding is that docx natively supports figure captions and automatic figure numbering. Pandoc-fignos and friends should be using that. What I need is a model document.xml to emulate. If either you or @krnlyng could supply it, that would be very helpful (I don't have a copy of Word myself). There are some tricks I can use to work around the current behaviour of the docx pandoc writer.

The model docx file should include a single captioned figure with automatic figure numbering turned on (i.e., nothing hard-coded). A short sentence with a reference to that figure would be helpful as well. To obtain the document.xml, simply unzip the docx file and look in the word directory.
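
The unzip step can also be scripted. Below is a minimal sketch using only Python's standard library; `read_document_xml` is a hypothetical helper name for illustration and is not part of pandoc-fignos.

```python
import zipfile


def read_document_xml(docx):
    """Return the text of word/document.xml from a .docx file.

    A .docx file is a zip archive; `docx` may be a path or a
    file-like object opened in binary mode.
    """
    with zipfile.ZipFile(docx) as archive:
        return archive.read("word/document.xml").decode("utf-8")
```

Calling `read_document_xml("model.docx")` (the file name is only an example) should return the same XML you would see after unzipping by hand.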

iandol · 2016-12-05T05:46:18Z

I'm not using fignos yet, so I can't comment on whether it is outputting correctly formed DOCX files yet. But just in case, here is a document.xml. Generated in Word 2016. I dragged an image into a new blank document, then added a default caption with some caption text "This is a simple test". Then added a paragraph in front of the picture "This is a simple test to see if [Figure 1] is hyperlinked?" — The bracketed text is a cross reference I inserted. Line 101 of the XML is where the caption paragraph starts:

https://gist.github.com/iandol/a3d7a456776002719e2ea139e681790e

What I meant above is that MultiMarkDown wraps the figure and caption in a text box, but Pandoc doesn't. Pandoc uses a paragraph style mechanism to keep the caption always underneath the figure. In Word, selecting "keep-with-next" makes two adjacent paragraphs "stick together", so they behave as if they were grouped in a text box. MMD uses a text frame/box in DOCX and ODT. I got confused in my comment above and mentioned the Pandoc ODT writer, but I meant MMD (MMD also uses auto-numbering for figures).

Here is some tangential discussion about using frames for captioned figures in Pandoc (for ODT output but some discussion on DOCX):

jgm/pandoc#2401

And possibly: jgm/pandoc#3177 may have an influence on fignos as it is resolved...

tomduck · 2016-12-05T12:00:23Z

Thank you, @iandol. This is enormously helpful. I will look at the docx and see what pandoc-fignos can do to better support docx. We can have a look into odt after that.

Thanks for the heads up on jgm/pandoc#3177. I have subscribed to it and will have pandoc-fignos and friends adjust to the new behaviours as they emerge.

iandol · 2016-12-06T00:49:52Z

I can also add the docx/odt output from MMD which does generate text frame and auto-numbered figure legend if it helps. As pointed out on jgm/pandoc#2401 one issue is hard coding the English "Figure" text in the legend, and I'm not sure if the easiest solution isn't allowing a YAML variable for the user to change.

tomduck · 2016-12-10T18:16:13Z

A possible solution is to replace entirely pandoc's Image element with custom ooxml (including the caption), and then insert custom ooxml for the figure reference. For this I need a model document.xml file.
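
A rough sketch of what that replacement could look like at the pandoc JSON level. It assumes the pandoc 2.x AST, where a captioned figure arrives as a Para holding a single Image, and relies on the docx writer passing raw `openxml` content through. The helper names and the `figure_xml` callback are hypothetical, and the OOXML itself still has to come from a model document.xml.

```python
def raw_openxml_block(xml):
    """Pandoc JSON for a RawBlock in the 'openxml' format."""
    return {"t": "RawBlock", "c": ["openxml", xml]}


def raw_openxml_inline(xml):
    """Pandoc JSON for a RawInline, e.g. for a figure reference field."""
    return {"t": "RawInline", "c": ["openxml", xml]}


def is_lone_image_para(block):
    """True for a Para whose only inline is an Image (assumed figure form)."""
    return (
        block.get("t") == "Para"
        and len(block["c"]) == 1
        and block["c"][0].get("t") == "Image"
    )


def replace_figures(blocks, figure_xml):
    """Swap each figure paragraph for raw OOXML built by figure_xml(image).

    figure_xml is a caller-supplied (hypothetical) function that returns
    the OOXML string for the image and its caption.
    """
    return [
        raw_openxml_block(figure_xml(b["c"][0])) if is_lone_image_para(b) else b
        for b in blocks
    ]
```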

One difficulty with what you posted, @iandol (although much appreciated), is that it does not appear that the native cross-referencing mechanism for docx was used. The figure number appears to be hard-coded. I could be mistaken. OOXml is pretty hard for a human to read.

I tried to generate it anew using LibreOffice. Unfortunately, when I imported the resulting docx file back into LibreOffice, the cross-references were broken. It turns out that this is a long-standing issue (since 2011!) with no end in sight: see here.

So, I am still needing the following to make progress: A docx with a captioned figure (using Word's native caption feature), automatic numbering of that figure, and a native reference in some text to the figure (i.e., not hard-coded).

Cheers,
Tom

tstenner · 2017-02-16T11:16:25Z

I've generated a sample document, hope this helps.

torstenberg · 2017-05-05T09:27:01Z

I just saw this pandoc filter, which would be a huge improvement to my workflow if it used the right way of producing the references in docx. However, there are many different ways to represent a figure caption number, both in the caption itself and in the text referring to it.

The document supplied by @tstenner is a good one, but I typically do not include the whole caption text in the reference but just the number. Here’s @tstenner’s example the way I typically use figures and references: Figure_example2.docx

The style for the figure caption is called „Beschriftung“ in document.xml (a German term for caption).

I guess the formatting of the figure number is encoded in Word’s format for this caption type and can be changed in Word itself. Thus, this is nothing pandoc-fignos must do if the user wants a different numbering scheme. Right? Pandoc-fignos could provide these different schemes but it would be OK to just use one default setting.

In the text reference to the figure, however, if the user chooses to include the caption text, this text is part of the content of the <w:fldChar …> tag. This is something I can’t figure out how to change directly in Word, so perhaps this is something to be decided in the pandoc filtering? You will see the difference when you compare the two Word files.

I hope this gives you sufficient material to actually implement the feature. That would be great!

Torsten

ghost · 2017-12-21T20:38:33Z

Hello. I am trying to export from Org to DOCX, but I don't get a numeric reference. I would like to know if someone can help me. I create the LaTeX file with Org, and then do pandoc --filter pandoc-fignos -f latex+smart fignos.tex -t docx+smart -o fignos.docx. Thanks.

org-latex-pandoc+fignos-docx.tar.gz

pandoc 2.0.5 Compiled with pandoc-types 1.17.3, texmath 0.10, skylighting 0.5
Org mode version 9.1.4 (9.1.4-elpa 20171205/)
pandoc-fignos (1.0.0rc3)
pandoc-xnos (0.13)
pandocfilters (1.4.2)

tomduck · 2018-01-14T16:39:23Z

Hi, @broncodev. Pandoc-fignos only works for markdown input files.

tomduck · 2018-01-14T18:36:16Z

I have created a Wiki page to help formulate docx support in pandoc-fignos:

https://github.com/tomduck/pandoc-fignos/wiki/Development#docx-output

This page describes the progress and what needs to be done next to solve the docx problem.

shixuguo · 2018-01-15T02:36:46Z

in https://github.com/tomduck/pandoc-eqnos/wiki/Development @tomduck mentioned the issue below:
Is the bookmarking done correctly? (Note to self: Why aren't the bookmark start and end encapsulated in a single paragraph? What I have coded in pandoc_eqnos.py seems to come out differently.)

Yes, it is correct.

There is documentation about this issue:
https://msdn.microsoft.com/en-us/library/office/documentformat.openxml.wordprocessing.bookmarkstart.aspx
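
Since a bookmark's start and end do not have to sit in the same paragraph, one way to sanity-check generated output is to pair them up by id across the whole document. This is only a sketch using Python's standard library; `unmatched_bookmarks` is a made-up name.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def unmatched_bookmarks(document_xml):
    """Return bookmark ids that have a start without an end, or vice versa.

    The search ignores paragraph boundaries on purpose, because the
    start and end markers may live in different paragraphs.
    """
    root = ET.fromstring(document_xml)
    starts = {el.get(f"{{{W}}}id") for el in root.iter(f"{{{W}}}bookmarkStart")}
    ends = {el.get(f"{{{W}}}id") for el in root.iter(f"{{{W}}}bookmarkEnd")}
    return sorted(starts ^ ends)
```

An empty list means every bookmark id found has both markers.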

shixuguo · 2018-01-15T02:47:15Z

in https://github.com/tomduck/pandoc-eqnos/wiki/Development @tomduck mentioned the issue below:

Is the equation ooxml in paragraph 2 correct? Should the equation be encapsulated in another element? What is best? I will likely need to override pandoc's implementation in any case.

I created a blank docx file and wrote an OMML equation manually.
I wrote y=f(x),
and the XML document looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document 
    xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" 
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" 
    xmlns:o="urn:schemas-microsoft-com:office:office" 
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" 
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" 
    xmlns:v="urn:schemas-microsoft-com:vml" 
    xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" 
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" 
    xmlns:w10="urn:schemas-microsoft-com:office:word" 
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" 
    xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" 
    xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" 
    xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" 
    xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" 
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" 
    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 wp14">
    <w:body>
        <w:p w:rsidR="00AA3459" w:rsidRDefault="00F438B5">
            <w:pPr>
                <w:rPr>
                    <w:rFonts w:hint="eastAsia"/>
                </w:rPr>
            </w:pPr>
            <m:oMathPara>
                <m:oMath>
                    <m:r>
                        <w:rPr>
                            <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
                        </w:rPr>
                        <m:t>y=f</m:t>
                    </m:r>
                    <m:r>
                        <w:rPr>
                            <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
                        </w:rPr>
                        <m:t>(x)</m:t>
                    </m:r>
                </m:oMath>
            </m:oMathPara>
            <w:bookmarkStart w:id="0" w:name="_GoBack"/>
            <w:bookmarkEnd w:id="0"/>
        </w:p>
        <w:sectPr w:rsidR="00AA3459">
            <w:pgSz w:w="11906" w:h="16838"/>
            <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="851" w:footer="992" w:gutter="0"/>
            <w:cols w:space="425"/>
            <w:docGrid w:type="lines" w:linePitch="312"/>
        </w:sectPr>
    </w:body>
</w:document>

What I suggest is not overriding pandoc's implementation of equations. I think there is much more work involved than you have considered.

shixuguo · 2018-01-15T03:16:01Z

in https://github.com/tomduck/pandoc-eqnos/wiki/Development @tomduck mentioned the issue below:

How should the equation number be attached to the equation? At present the number is hard-coded. Docx's native capabilities should be used instead (i.e., something equivalent to LaTeX's \label macro).

How should the reference number be written into paragraph 4? At present the number is hard-coded. Docx's native capabilities should be used instead (i.e., something equivalent to LaTeX's \ref macro).

I have tested the built-in equation numbering in docx and I have to say, it is a very stupid design.
I can create an equation.
I can insert a caption for this equation.
But when I cross-reference this caption, it shows not only the equation number but also the equation itself.

To get the correct form, I have to do a lot of dirty work, as follows:

Create a three-column table.
Insert the equation in the second column of the table.
Insert a caption for the equation. After this step, the equation number shows above the table.
Move the equation number to the third column of the table.
Adjust other style details.

Then I can cite the equation correctly.

After doing this, the XML document looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document 
    xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" 
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" 
    xmlns:o="urn:schemas-microsoft-com:office:office" 
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" 
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" 
    xmlns:v="urn:schemas-microsoft-com:vml" 
    xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" 
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" 
    xmlns:w10="urn:schemas-microsoft-com:office:word" 
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" 
    xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" 
    xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" 
    xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" 
    xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" 
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" 
    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 wp14">
    <w:body>
        <w:bookmarkStart w:id="0" w:name="_Ref503754848"/>
        <w:p w:rsidR="00E836C0" w:rsidRPr="00E836C0" w:rsidRDefault="00E836C0" w:rsidP="00E836C0">
            <w:pPr>
                <w:pStyle w:val="Caption"/>
            </w:pPr>
            <w:r>
                <w:fldChar w:fldCharType="begin"/>
            </w:r>
            <w:r>
                <w:instrText xml:space="preserve"> MACROBUTTON MTEditEquationSection2 </w:instrText>
            </w:r>
            <w:r w:rsidRPr="00E836C0">
                <w:rPr>
                    <w:rStyle w:val="MTEquationSection"/>
                </w:rPr>
                <w:instrText>Equation Chapter 1 Section 1</w:instrText>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="begin"/>
            </w:r>
            <w:r>
                <w:instrText xml:space="preserve"> SEQ MTEqn \r \h \* MERGEFORMAT </w:instrText>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="end"/>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="begin"/>
            </w:r>
            <w:r>
                <w:instrText xml:space="preserve"> SEQ MTSec \r 1 \h \* MERGEFORMAT </w:instrText>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="end"/>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="begin"/>
            </w:r>
            <w:r>
                <w:instrText xml:space="preserve"> SEQ MTChap \r 1 \h \* MERGEFORMAT </w:instrText>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="end"/>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="end"/>
            </w:r>
            <w:bookmarkEnd w:id="0"/>
        </w:p>
        <w:p w:rsidR="00E836C0" w:rsidRDefault="00E836C0" w:rsidP="00E836C0">
            <w:pPr>
                <w:pStyle w:val="Caption"/>
                <w:keepNext/>
            </w:pPr>
        </w:p>
        <w:tbl>
            <w:tblPr>
                <w:tblStyle w:val="TableGrid"/>
                <w:tblW w:w="0" w:type="auto"/>
                <w:tblLook w:val="04A0" w:firstRow="1" w:lastRow="0" w:firstColumn="1" w:lastColumn="0" w:noHBand="0" w:noVBand="1"/>
            </w:tblPr>
            <w:tblGrid>
                <w:gridCol w:w="2765"/>
                <w:gridCol w:w="2765"/>
                <w:gridCol w:w="2766"/>
            </w:tblGrid>
            <w:tr w:rsidR="00E836C0" w:rsidTr="00E836C0">
                <w:tc>
                    <w:tcPr>
                        <w:tcW w:w="2765" w:type="dxa"/>
                    </w:tcPr>
                    <w:p w:rsidR="00E836C0" w:rsidRDefault="00E836C0" w:rsidP="00E836C0">
                        <w:pPr>
                            <w:pStyle w:val="Caption"/>
                            <w:rPr>
                                <w:rFonts w:hint="eastAsia"/>
                            </w:rPr>
                        </w:pPr>
                    </w:p>
                </w:tc>
                <w:tc>
                    <w:tcPr>
                        <w:tcW w:w="2765" w:type="dxa"/>
                    </w:tcPr>
                    <w:p w:rsidR="00E836C0" w:rsidRDefault="00E836C0" w:rsidP="00E836C0">
                        <w:pPr>
                            <w:pStyle w:val="Caption"/>
                            <w:rPr>
                                <w:rFonts w:hint="eastAsia"/>
                            </w:rPr>
                        </w:pPr>
                        <m:oMathPara>
                            <m:oMath>
                                <m:r>
                                    <w:rPr>
                                        <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
                                    </w:rPr>
                                    <m:t>y=f(x)</m:t>
                                </m:r>
                            </m:oMath>
                        </m:oMathPara>
                    </w:p>
                </w:tc>
                <w:tc>
                    <w:tcPr>
                        <w:tcW w:w="2766" w:type="dxa"/>
                    </w:tcPr>
                    <w:p w:rsidR="00E836C0" w:rsidRDefault="00E836C0" w:rsidP="00E836C0">
                        <w:pPr>
                            <w:pStyle w:val="Caption"/>
                            <w:rPr>
                                <w:rFonts w:hint="eastAsia"/>
                            </w:rPr>
                        </w:pPr>
                        <w:r>
                            <w:fldChar w:fldCharType="begin"/>
                        </w:r>
                        <w:r>
                            <w:instrText xml:space="preserve"> SEQ </w:instrText>
                        </w:r>
                        <w:r>
                            <w:instrText>公弝</w:instrText>
                        </w:r>
                        <w:r>
                            <w:instrText xml:space="preserve"> \* ARABIC </w:instrText>
                        </w:r>
                        <w:r>
                            <w:fldChar w:fldCharType="separate"/>
                        </w:r>
                        <w:bookmarkStart w:id="1" w:name="_Ref503754939"/>
                        <w:r>
                            <w:rPr>
                                <w:noProof/>
                            </w:rPr>
                            <w:t>1</w:t>
                        </w:r>
                        <w:bookmarkEnd w:id="1"/>
                        <w:r>
                            <w:fldChar w:fldCharType="end"/>
                        </w:r>
                    </w:p>
                </w:tc>
            </w:tr>
        </w:tbl>
        <w:p w:rsidR="00E836C0" w:rsidRPr="00E836C0" w:rsidRDefault="00E836C0" w:rsidP="00E836C0">
            <w:pPr>
                <w:pStyle w:val="Caption"/>
                <w:rPr>
                    <w:rFonts w:hint="eastAsia"/>
                </w:rPr>
            </w:pPr>
            <w:r>
                <w:t xml:space="preserve">Ref to </w:t>
            </w:r>
            <w:bookmarkStart w:id="2" w:name="_GoBack"/>
            <w:bookmarkEnd w:id="2"/>
            <w:r>
                <w:fldChar w:fldCharType="begin"/>
            </w:r>
            <w:r>
                <w:instrText xml:space="preserve"></w:instrText>
            </w:r>
            <w:r>
                <w:rPr>
                    <w:rFonts w:hint="eastAsia"/>
                </w:rPr>
                <w:instrText>REF _Ref503754939</w:instrText>
            </w:r>
            <w:r>
                <w:instrText xml:space="preserve"></w:instrText>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="separate"/>
            </w:r>
            <w:r>
                <w:rPr>
                    <w:noProof/>
                </w:rPr>
                <w:t>1</w:t>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="end"/>
            </w:r>
        </w:p>
        <w:sectPr w:rsidR="00E836C0" w:rsidRPr="00E836C0">
            <w:pgSz w:w="11906" w:h="16838"/>
            <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="851" w:footer="992" w:gutter="0"/>
            <w:cols w:space="425"/>
            <w:docGrid w:type="lines" w:linePitch="312"/>
        </w:sectPr>
    </w:body>
</w:document>

Most docx users don't use this stupid method, because there is third-party software such as MathType.
Obviously, pandoc should not convert equations to MathType format.

In my opinion, pandoc-xnos should still use hard-coded numbers but use the table layout for the equation.

ghost · 2018-01-18T05:17:19Z

@tomduck Thanks. I will try by first exporting latex to markdown then (or something similar).

notZaki · 2018-05-26T11:26:49Z

@shixuguo
Instead of a three-column table, you could try adding a '#' before the equation number to quickly format it.
I've included a gif below to show the process.

@tomduck
Regarding the xml in the Eqnos wiki: In case it helps, below is a cleaned up xml (for the math part) that produces the end product of the above gif:

<m:oMathPara>
  <m:oMathParaPr><m:jc m:val="center" /></m:oMathParaPr>
  <m:oMath>
    <m:eqArr>
      <m:eqArrPr>
          <m:maxDist m:val="1"/>
      </m:eqArrPr>
      <m:e>
        <m:r><m:t>y</m:t></m:r>
        <m:r><m:t>=</m:t></m:r>
        <m:r><m:t>f</m:t></m:r>
        <m:r><m:t>(</m:t></m:r>
        <m:r><m:t>x</m:t></m:r>
        <m:r><m:t>)</m:t></m:r>
        <m:r><m:t>#</m:t></m:r>
        <m:r><m:t>(</m:t></m:r>
        <m:r><m:t>1</m:t></m:r>
        <m:r><m:t>)</m:t></m:r>
      </m:e>
    </m:eqArr>
  </m:oMath>
</m:oMathPara>

So the changes are:

Include an <m:eqArr> environment that covers the entire equation
Add the <m:eqArrPr> ... </m:eqArrPr> part
Instead of a space before the equation label, place a #

The equation number (between the brackets) remains hard-coded, but I don't have a problem with that.
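
For reference, the three changes above could be generated along these lines. This is a sketch only: `numbered_equation_omml` is a hypothetical helper, it handles plain text runs rather than general math, and the number is still hard-coded by the caller.

```python
from xml.sax.saxutils import escape


def numbered_equation_omml(runs, number):
    """Build an <m:oMathPara> with the label placed after '#' in an <m:eqArr>.

    runs   -- list of strings, one per <m:r> run of the equation
    number -- the equation number shown between the brackets
    """
    def run(text):
        return "<m:r><m:t>%s</m:t></m:r>" % escape(str(text))

    body = "".join(run(t) for t in runs)
    # '#' separates the equation from its label, as in the XML above.
    label = run("#") + run("(") + run(number) + run(")")
    return "".join([
        "<m:oMathPara>",
        '<m:oMathParaPr><m:jc m:val="center"/></m:oMathParaPr>',
        "<m:oMath><m:eqArr>",
        '<m:eqArrPr><m:maxDist m:val="1"/></m:eqArrPr>',
        "<m:e>", body, label, "</m:e>",
        "</m:eqArr></m:oMath>",
        "</m:oMathPara>",
    ])
```

For example, `numbered_equation_omml(["y", "=", "f", "(", "x", ")"], 1)` should give the same run sequence as the XML above.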

correaa · 2019-02-25T04:33:17Z

"Pandoc-fignos only works for markdown input files." It would be great if you clarified this at the beginning of the home page. That would save a lot of frustration. Also, would it work for the native (native-pandoc) format? (I couldn't make it work.)

ociule · 2019-06-14T09:23:21Z

I am very interested in what pandoc-fignos does, but I work with RST. It would be great if fignos handled RST input as well.

ociule · 2019-07-01T13:25:33Z

@tomduck FWIW, I tried to solve the To Do task on the Docx output wiki page.

I've created a Word document that comes close to what the desired docx output should be. It uses native docx capabilities for numbering the figures and for referencing. As asked on the wiki page, the doc has one captioned figure and one reference to it. I replaced the hard-coded 1 in "Figure 1: Caption" with a numbered caption (In Word: Reference > Insert Caption). I created the reference using Reference > Insert reference.

Here's the xml from word/document.xml:

<?xml version="1.0" encoding="UTF-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se wp14">
    <w:body>
        <w:p w:rsidR="00BD015B" w:rsidRDefault="00EF2D29">
            <w:pPr>
                <w:pStyle w:val="CaptionedFigure" />
            </w:pPr>
            <w:bookmarkStart w:id="0" w:name="fig:1" />
            <w:r>
                <w:rPr>
                    <w:noProof />
                    <w:lang w:val="fr-FR" w:eastAsia="fr-FR" />
                </w:rPr>
                <w:drawing>
                    <wp:inline distT="0" distB="0" distL="0" distR="0">
                        <wp:extent cx="914400" cy="457200" />
                        <wp:effectExtent l="0" t="0" r="0" b="0" />
                        <wp:docPr id="1" name="Picture" descr="Figure 1: Caption" />
                        <wp:cNvGraphicFramePr />
                        <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
                            <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
                                <pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
                                    <pic:nvPicPr>
                                        <pic:cNvPr id="0" name="Picture" descr="img1.jpg" />
                                        <pic:cNvPicPr>
                                            <a:picLocks noChangeAspect="1" noChangeArrowheads="1" />
                                        </pic:cNvPicPr>
                                    </pic:nvPicPr>
                                    <pic:blipFill>
                                        <a:blip r:embed="rId8" />
                                        <a:stretch>
                                            <a:fillRect />
                                        </a:stretch>
                                    </pic:blipFill>
                                    <pic:spPr bwMode="auto">
                                        <a:xfrm>
                                            <a:off x="0" y="0" />
                                            <a:ext cx="914400" cy="457200" />
                                        </a:xfrm>
                                        <a:prstGeom prst="rect">
                                            <a:avLst />
                                        </a:prstGeom>
                                        <a:noFill />
                                        <a:ln w="9525">
                                            <a:noFill />
                                            <a:headEnd />
                                            <a:tailEnd />
                                        </a:ln>
                                    </pic:spPr>
                                </pic:pic>
                            </a:graphicData>
                        </a:graphic>
                    </wp:inline>
                </w:drawing>
            </w:r>
        </w:p>
        <w:p w:rsidR="00BD015B" w:rsidRDefault="00EF2D29" w:rsidP="00B227A1">
            <w:pPr>
                <w:pStyle w:val="Image Caption" />
            </w:pPr>
            <w:bookmarkStart w:id="1" w:name="_Ref12885797" />
            <w:r>
                <w:t xml:space="preserve">Figure </w:t>
            </w:r>
            <w:fldSimple w:instr=" SEQ Figure \* ARABIC ">
                <w:r w:rsidR="00B227A1">
                    <w:rPr>
                        <w:noProof />
                    </w:rPr>
                    <w:t>1</w:t>
                </w:r>
            </w:fldSimple>
            <w:bookmarkEnd w:id="1" />
            <w:r>
                <w:t>: Cap</w:t>
            </w:r>
            <w:bookmarkStart w:id="2" w:name="_GoBack" />
            <w:bookmarkEnd w:id="2" />
            <w:r>
                <w:t>tion</w:t>
            </w:r>
        </w:p>
        <w:bookmarkEnd w:id="0" />
        <w:p w:rsidR="00BD015B" w:rsidRDefault="00B227A1">
            <w:pPr>
                <w:pStyle w:val="Corpsdetexte" />
            </w:pPr>
            <w:r>
                <w:t xml:space="preserve">Reference to </w:t>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="begin" />
            </w:r>
            <w:r>
                <w:instrText xml:space="preserve"> REF _Ref12885797 \h </w:instrText>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="separate" />
            </w:r>
            <w:r>
                <w:t xml:space="preserve">Figure </w:t>
            </w:r>
            <w:r>
                <w:rPr>
                    <w:noProof />
                </w:rPr>
                <w:t>1</w:t>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="end" />
            </w:r>
            <w:r w:rsidR="00EF2D29">
                <w:t>.</w:t>
            </w:r>
        </w:p>
        <w:sectPr w:rsidR="00BD015B">
            <w:pgSz w:w="12240" w:h="15840" />
            <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0" />
            <w:cols w:space="720" />
        </w:sectPr>
    </w:body>
</w:document>

Both the numbering and the reference use fields. See the <w:fldSimple w:instr=" SEQ Figure \* ARABIC "> line for an example. In Word, you can view the field codes with the Show Fields (Alt-F9) option. Here's the Microsoft doc page about the available fields.

Answering the questions from your wiki page:

Word seems to bookmark only the "Figure {field}" part of the caption; the colon and the caption text sit outside the bookmark (see the <w:bookmarkStart w:id="1" w:name="_Ref12885797" /> and the matching <w:bookmarkEnd w:id="1" />).
Can't say; I'm not familiar enough with OOXML.
The number definitely should not be hard-coded. It should use the field-based caption system shown above, which numbers the figure caption with a SEQ field that creates a "Figure" sequence.
The reference should not be hard-coded either. It should use the field system again, specifically the REF field. Unsurprisingly, the bookmark id that the REF points to is hard-coded.
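
To make the two mechanisms concrete, here is a minimal Python sketch that builds the same pieces as the XML above: a bookmarked "Figure {SEQ field}" caption number and a complex REF field that points at the bookmark. The function names are hypothetical (they are not part of pandoc or pandoc-fignos), and the bookmark name is simply the one from the sample document.

```python
from xml.sax.saxutils import escape, quoteattr


def caption_number_xml(bookmark_id, bookmark_name, label="Figure", cached="1"):
    """Build 'Figure <SEQ field>' wrapped in a bookmark (hypothetical helper)."""
    seq_instr = " SEQ %s \\* ARABIC " % label
    return "".join([
        "<w:bookmarkStart w:id=%s w:name=%s />"
        % (quoteattr(str(bookmark_id)), quoteattr(bookmark_name)),
        '<w:r><w:t xml:space="preserve">%s </w:t></w:r>' % escape(label),
        "<w:fldSimple w:instr=%s>" % quoteattr(seq_instr),
        # The cached number is only a placeholder; Word recomputes it on field update.
        "<w:r><w:rPr><w:noProof /></w:rPr><w:t>%s</w:t></w:r>" % escape(cached),
        "</w:fldSimple>",
        "<w:bookmarkEnd w:id=%s />" % quoteattr(str(bookmark_id)),
    ])


def ref_field_xml(bookmark_name, cached="Figure 1"):
    """Build a complex REF field: begin / instrText / separate / result / end."""
    instr = " REF %s \\h " % bookmark_name
    return "".join([
        '<w:r><w:fldChar w:fldCharType="begin" /></w:r>',
        '<w:r><w:instrText xml:space="preserve">%s</w:instrText></w:r>' % escape(instr),
        '<w:r><w:fldChar w:fldCharType="separate" /></w:r>',
        "<w:r><w:t>%s</w:t></w:r>" % escape(cached),
        '<w:r><w:fldChar w:fldCharType="end" /></w:r>',
    ])


caption = caption_number_xml(1, "_Ref12885797")
reference = ref_field_xml("_Ref12885797")
```

Both strings are fragments meant to sit inside a <w:p> of a document that declares the w: namespace; they are not standalone XML documents.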

Hope this helps. If you want, I can paste the XML into the wiki. Here's the docx file.

fignos_demo_fixed_output.docx

tomduck · 2019-08-26T16:56:11Z

Thanks for your work on this @ociule. I'm planning to release pandoc-fignos 2.0.0 soon. I expect to revisit the issue of docx support this fall, and the leg-work you have done will be helpful.

Cheers,
Tom

ebranlard · 2019-09-01T03:22:22Z

Thank you @tomduck, this is indeed a much needed feature for all of us who don't want to use Word but have to generate Word documents.

@ociule made the point that Word's standard insertion of tables, figures, and documents relies on some "special" bookmarks. I feel the important part is the following:

<w:fldSimple w:instr=" SEQ Figure \* ARABIC ">

I tried a simple docx document with one equation, and this showed up in the xml:

<w:instrText xml:space="preserve"> SEQ Equation \* ARABIC </w:instrText>

I guess this is relevant for pandoc-eqnos. For a table it's SEQ Table \* ARABIC .
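
As a rough sketch of how the three sequences could share one code path, the snippet below maps a label prefix to the SEQ instruction observed above. It assumes the usual pandoc-xnos label prefixes (fig:, eq:, tbl:); the names SEQ_NAMES and seq_instruction are hypothetical.

```python
# Sequence names as they appear in the Word-generated XML quoted above.
SEQ_NAMES = {"fig": "Figure", "eq": "Equation", "tbl": "Table"}


def seq_instruction(label):
    """Return the SEQ field instruction for a label such as 'fig:plot'."""
    prefix = label.split(":", 1)[0]
    if prefix not in SEQ_NAMES:
        raise ValueError("no SEQ sequence known for label %r" % label)
    return " SEQ %s \\* ARABIC " % SEQ_NAMES[prefix]


figure_instr = seq_instruction("fig:plot")
equation_instr = seq_instruction("eq:energy")
table_instr = seq_instruction("tbl:results")
```

The returned string is what would go into the w:instr attribute of a <w:fldSimple> (or into a <w:instrText> for the complex-field form).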

free41 · 2022-01-05T14:30:18Z

Is there anything more needed for this feature? Is there anything I could do to help this along?

This is a feature I would use heavily so I would be willing to help out.

Thanks for all the work on this and many other features.

free41 · 2022-01-07T22:01:51Z

So I did some playing around and the label in the Figure caption is currently working correctly. It has the form:

<w:fldSimple w:instr=" SEQ Figure \* ARABIC "><w:r><w:t>1</w:t></w:r></w:fldSimple>

In order to make the reference to the figure work the following xml code is needed:

<w:fldSimple w:instr="REF fig1 \h"><w:r><w:t>Figure 1</w:t></w:r></w:fldSimple>

I was able to make this work by just manually changing the xml.

I think I have what I need to make the changes, but I'm having trouble actually figuring out where the changes need to be made in the code. I don't see any of the existing markup in the code, so I am a bit confused about where it comes from.

If someone can help me out with where to look, I can try to implement the changes.

free41 · 2022-01-08T05:21:52Z

I was able to get the native figure numbering working on my computer. I think I have a good start on this, but I am guessing that these changes would break things for other users. Let me know if you think I should make a pull request to work out some of these issues or if it is easier to do it here (I've never written a pull request before).

Next, you have to use the native_numbering extension which then creates the necessary field in the figure caption that can be "cross referenced" (linked) to. I have been running:

pandoc --filter pandoc-fignos -t docx+native_numbering -o test.docx test.md

This will create the necessary field in the caption:

<w:fldSimple w:instr=" SEQ Figure \* ARABIC "><w:r><w:t>1</w:t></w:r></w:fldSimple>

Note that this looks like the figure number is being hard-coded, but when you open the document in Word, all you have to do is select the text and hit "update field" and Word will update the text inside the field.

Also of note: the above field is contained within the following xml block, which creates a bookmark:

<w:bookmarkStart w:id="24" w:name="ref_fig:1" />

This contains the key ref_fig:1 that we need to reference in our in-text reference.

Note that all of the above is created by the pandoc writer without pandoc-xnos. That means that we actually don't need pandoc-fignos to do anything at all to our captions (Otherwise you end up with Figure X: Figure X:). On my machine I added the following to _adjust_caption()

if fmt == 'docx':
    pass  # pandoc's native_numbering already writes the "Figure X:" prefix

This also means that we don't need anything Word-specific under _add_markup(), so there is a code block that can be removed there as well.

The next step is to adjust the reference, which is handled in the pandoc-xnos package. I added the following to _cite_replacement() at line 968:

if fmt == 'docx':
    # Emit a native REF field that points at the bookmark pandoc writes for the caption
    ret = RawInline('openxml', r'<w:fldSimple w:instr="REF ref_%s \h"><w:r><w:t>%s</w:t></w:r></w:fldSimple>' % (label, refname + NBSP + text))

This results in the following xml:

<w:fldSimple w:instr="REF ref_fig:1 \h"><w:r><w:t>Figure 1</w:t></w:r></w:fldSimple>

This again looks like it is hard-coded, but Word can update it if, say, new figures are added in Word after the document's creation.
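
One thing the string interpolation above does not handle is a reference name or label containing characters such as & or <, which would produce invalid XML. Here is a hedged sketch of the same REF field with escaping added, using only the standard library. The helper name ref_field_xml is hypothetical, and NBSP is assumed to be the non-breaking space the existing code already uses.

```python
from xml.sax.saxutils import escape, quoteattr

NBSP = "\u00a0"  # assumption: same non-breaking space as in the filter code


def ref_field_xml(label, refname, text):
    """Return a <w:fldSimple> REF field pointing at pandoc's 'ref_<label>' bookmark."""
    instr = "REF ref_%s \\h" % label
    cached = escape(refname + NBSP + text)  # placeholder text until Word updates the field
    return "<w:fldSimple w:instr=%s><w:r><w:t>%s</w:t></w:r></w:fldSimple>" % (
        quoteattr(instr),
        cached,
    )


xml = ref_field_xml("fig:1", "Figure", "1")
```

In the filter, the resulting string would be what gets wrapped in RawInline('openxml', ...), in place of the hand-formatted one.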

Here are the outstanding issues/TODO items:

I have not tested this for all equation or table numbers. Not sure if it is going to work the same way.
This relies on the native_numbering extension. Maybe we should have a way to detect if that extension is being used, and if not keep the old behavior active.
The above only works on some of the more recent versions of pandoc.
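
For the second item, a fallback could look roughly like the sketch below: emit the native field only when native numbering is known to be on, and otherwise keep the plain-text reference. How native_numbering gets set is left open here. I'm not sure a JSON filter can see the writer's extensions, so this sketch just takes it as a parameter that a hypothetical metadata field or option would supply. The function name reference_markup is hypothetical too.

```python
from xml.sax.saxutils import escape, quoteattr

NBSP = "\u00a0"


def reference_markup(fmt, label, refname, text, native_numbering=False):
    """Return (kind, payload): ('openxml', field XML) or ('text', plain reference)."""
    plain = refname + NBSP + text
    if fmt == "docx" and native_numbering:
        instr = "REF ref_%s \\h" % label
        xml = "<w:fldSimple w:instr=%s><w:r><w:t>%s</w:t></w:r></w:fldSimple>" % (
            quoteattr(instr),
            escape(plain),
        )
        return ("openxml", xml)
    # Old behavior: hard-coded text, used for other formats or when the flag is off.
    return ("text", plain)


native = reference_markup("docx", "fig:1", "Figure", "1", native_numbering=True)
legacy = reference_markup("docx", "fig:1", "Figure", "1")
```

The caller would turn an 'openxml' result into a RawInline and a 'text' result into ordinary inline text.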

tomduck added the enhancement label Dec 1, 2016

tomduck mentioned this issue Dec 1, 2016

docx clickable references #25

Closed

This was referenced Jan 14, 2018

Feature request: native support for docx output tomduck/pandoc-tablenos#5

Open

docx output: extra blank line before table tomduck/pandoc-tablenos#6

Closed

tomduck added the help wanted label Jan 14, 2018

tomduck changed the title ~~Docx native capabilities should be used for figure and reference numbers~~ Feature Request:Docx native capabilities Aug 7, 2018

tomduck removed the enhancement label Aug 7, 2018

tomduck mentioned this issue Jun 25, 2019

Feature Request: RST input #69

Open

iandol mentioned this issue Mar 15, 2021

Support for DOCX native numbering? lierdakil/pandoc-crossref#299

Open

iandol mentioned this issue Jun 10, 2022

Figures and Tables in your Crossref examplae iandol/scrivomatic#48

Open
