Feature Request:Docx native capabilities #34

Open
tomduck opened this issue Dec 1, 2016 · 26 comments

Comments

@tomduck
Owner

tomduck commented Dec 1, 2016

Internal links now use docx's native capabilities (closing Issue #25, submitted by @krnlyng). Figure and reference numbers should do the same. Currently, the numbers are hard-coded.

@tomduck
Owner Author

tomduck commented Dec 1, 2016

Below is a copy of the docx document.xml produced by pandoc for a figure with a caption. I have formatted the xml for easier viewing. It looks like the caption is outside of the Figure and in its own paragraph. Is this how Word does it?

<?xml version="1.0" encoding="UTF-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing">
  <w:body>

    <w:p>
      <w:pPr><w:pStyle w:val="FigureWithCaption" /></w:pPr>
      <w:r>
        <w:drawing>
          <wp:inline>
            <wp:extent cx="1473200" cy="1460500" />
            <wp:effectExtent b="0" l="0" r="0" t="0" />
            <wp:docPr descr="Plot." id="1" name="Picture" />
            <a:graphic>
              <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
                <pic:pic>
                  <pic:nvPicPr>
                    <pic:cNvPr descr="plot.png" id="0" name="Picture" />
                    <pic:cNvPicPr>
                      <a:picLocks noChangeArrowheads="1" noChangeAspect="1" />
                    </pic:cNvPicPr>
                  </pic:nvPicPr>
                  <pic:blipFill><a:blip r:embed="rId21" /><a:stretch><a:fillRect /></a:stretch></pic:blipFill>
                  <pic:spPr bwMode="auto">
                    <a:xfrm><a:off x="0" y="0" /><a:ext cx="1473200" cy="1460500" /></a:xfrm>
                    <a:prstGeom prst="rect"><a:avLst /></a:prstGeom>
                    <a:noFill />
                    <a:ln w="9525"><a:noFill /><a:headEnd /><a:tailEnd /></a:ln>
                  </pic:spPr>
                </pic:pic>
              </a:graphicData>
            </a:graphic>
          </wp:inline>
        </w:drawing>
      </w:r>
    </w:p>
    
    <w:p>
      <w:pPr><w:pStyle w:val="ImageCaption" /></w:pPr>
      <w:r><w:t xml:space="preserve">Plot.</w:t></w:r>
    </w:p>
    
    <w:sectPr />

  </w:body>
</w:document>

@krnlyng

krnlyng commented Dec 3, 2016

this seems to be true, figure captions are just paragraphs with a different style afaik

@iandol

iandol commented Dec 3, 2016

There was talk of using a similar mechanism as the MMD ODT writer (where figure and caption are contained in a text box). But I think that never really happened as using keep-with-next works well enough...

@tomduck
Owner Author

tomduck commented Dec 5, 2016

Hi @iandol. I'm not really sure what you mean by "keep-with-next". Can you please explain?

My understanding is that docx natively supports figure captions and automatic figure numbering. Pandoc-fignos and friends should be using that. What I need is a model document.xml to emulate. If either you or @krnlyng could supply it, that would be very helpful (I don't have a copy of Word myself). There are some tricks I can use to work around the current behaviour of the docx pandoc writer.

The model docx file should include a single-captioned figure with automatic figure numbering turned on (i.e., nothing hard-coded). A short sentence with a reference to that figure would be helpful as well. To obtain the document.xml, simply unzip the docx file and look in the word directory.
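A docx file is a zip archive, so the unzip step can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names `read_document_xml` and `make_stub_docx` are hypothetical and exist only for illustration.

```python
import io
import zipfile


def read_document_xml(docx):
    """Return word/document.xml from a .docx (path or file-like object) as text."""
    with zipfile.ZipFile(docx) as archive:
        return archive.read("word/document.xml").decode("utf-8")


def make_stub_docx(xml_text):
    """Build an in-memory zip holding only word/document.xml (demonstration only;
    a real docx contains more parts than this)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml_text)
    buffer.seek(0)
    return buffer
```

For a real file, `read_document_xml("model.docx")` would return the same XML that unzipping by hand exposes in the word directory (the file name here is a placeholder).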

@iandol

iandol commented Dec 5, 2016

I'm not using fignos yet, so I can't comment on whether it is outputting correctly formed DOCX files yet. But just in case, here is a document.xml. Generated in Word 2016. I dragged an image into a new blank document, then added a default caption with some caption text "This is a simple test". Then added a paragraph in front of the picture "This is a simple test to see if [Figure 1] is hyperlinked?" — The bracketed text is a cross reference I inserted. Line 101 of the XML is where the caption paragraph starts:

https://gist.github.com/iandol/a3d7a456776002719e2ea139e681790e

What I meant above is that MultiMarkDown wraps the figure and caption in a text box, but Pandoc doesn't. Pandoc uses a paragraph style mechanism to keep the caption always underneath the figure. In Word, if you select "keep-with-next", it makes two adjacent paragraphs "stick together", so they behave as if they were grouped in a text box. MMD uses a text frame/box in DOCX and ODT. I got confused: I mentioned the Pandoc ODT writer in my comment above, but I meant MMD (MMD also uses auto-numbering for figures).
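In OOXML, keep-with-next corresponds to a `<w:keepNext/>` element inside the paragraph properties. A minimal sketch of what that looks like when set directly on a paragraph follows; the helper name `keep_with_next_ppr` is hypothetical, and pandoc itself may set the property in the style definition rather than on each paragraph.

```python
from xml.sax.saxutils import quoteattr


def keep_with_next_ppr(style_id):
    """Return a <w:pPr> fragment that applies a paragraph style and sets
    keep-with-next, so the paragraph stays on the same page as the next one."""
    return (
        "<w:pPr>"
        f"<w:pStyle w:val={quoteattr(style_id)}/>"  # quoteattr adds the quotes
        "<w:keepNext/>"
        "</w:pPr>"
    )
```

Applied to the figure paragraph, this would keep the image on the same page as the caption paragraph that follows it.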

Here is some tangential discussion about using frames for captioned figures in Pandoc (for ODT output but some discussion on DOCX):

jgm/pandoc#2401

And possibly: jgm/pandoc#3177 may have an influence on fignos as it is resolved...

@tomduck
Owner Author

tomduck commented Dec 5, 2016

Thank you, @iandol. This is enormously helpful. I will look at the docx and see what pandoc-fignos can do to better support docx. We can have a look into odt after that.

Thanks for the heads up on jgm/pandoc#3177. I have subscribed to it and will have pandoc-fignos and friends adjust to the new behaviours as they emerge.

@iandol

iandol commented Dec 6, 2016

I can also add the docx/odt output from MMD, which does generate a text frame and an auto-numbered figure legend, if it helps. As pointed out on jgm/pandoc#2401, one issue is hard-coding the English "Figure" text in the legend, and I'm not sure whether the easiest solution isn't a YAML variable that the user can change.

@tomduck
Owner Author

tomduck commented Dec 10, 2016

A possible solution is to entirely replace pandoc's Image element with custom ooxml (including the caption), and then insert custom ooxml for the figure reference. For this I need a model document.xml file.
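As a rough sketch of that idea: in pandoc's JSON AST, raw OOXML is carried by a `RawBlock` whose format is `openxml`, which the docx writer passes through. The code below assumes a figure arrives as a `Para` holding a single `Image` (newer pandoc versions may represent figures differently); `replace_figure` and `make_xml` are hypothetical names, and the XML that `make_xml` returns is left to the caller.

```python
def raw_openxml_block(xml):
    """Return the pandoc JSON AST node for a raw OOXML block."""
    return {"t": "RawBlock", "c": ["openxml", xml]}


def replace_figure(block, make_xml):
    """Swap a Para holding exactly one Image for raw OOXML.

    make_xml is a caller-supplied function (hypothetical) that receives the
    Image node and returns the OOXML string for figure plus caption.
    Any other block is returned unchanged.
    """
    if block.get("t") == "Para":
        inlines = block["c"]
        if len(inlines) == 1 and inlines[0]["t"] == "Image":
            return raw_openxml_block(make_xml(inlines[0]))
    return block
```

A filter would apply `replace_figure` to each top-level block of the document before handing the AST back to pandoc.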

One difficulty with what you posted, @iandol (although much appreciated), is that it does not appear that the native cross-referencing mechanism for docx was used. The figure number appears to be hard-coded. I could be mistaken. OOXML is pretty hard for a human to read.
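One way to check a document.xml without reading it by eye is to list the field instructions it contains: native numbering and cross-references show up as SEQ and REF fields. This is a minimal sketch with the standard library; `field_instructions` is a hypothetical helper, and because Word can split one instruction across several `w:instrText` runs, the result may contain fragments rather than whole instructions.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as declared on <w:document>.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def field_instructions(document_xml):
    """Collect field instruction strings from a document.xml string.

    Looks at both complex fields (w:instrText) and simple fields
    (the w:instr attribute of w:fldSimple), in document order.
    """
    root = ET.fromstring(document_xml)
    found = []
    for element in root.iter():
        if element.tag == f"{{{W}}}instrText" and element.text:
            found.append(element.text.strip())
        elif element.tag == f"{{{W}}}fldSimple":
            found.append(element.get(f"{{{W}}}instr", "").strip())
    return [text for text in found if text]
```

An empty result, or one with no SEQ or REF entries, would suggest that the numbers in the document are plain text.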

I tried to generate it anew using LibreOffice. Unfortunately, when I imported the resulting docx file back into LibreOffice, the cross-references were broken. It turns out that this is a long-standing issue (since 2011!) with no end in sight: see here.

So, I still need the following to make progress: a docx with a captioned figure (using Word's native caption feature), automatic numbering of that figure, and a native reference in some text to the figure (i.e., not hard-coded).

Cheers,
Tom

@tstenner

I've generated a sample document, hope this helps.

@torstenberg

I just saw this pandoc filter, which would be a huge improvement to my workflow if it used the right way of producing the references in docx. However, there are many different ways to represent a figure caption number, both in the caption itself and in the text referring to it.

The document supplied by @tstenner is a good one, but I typically do not include the whole caption text in the reference but just the number. Here’s @tstenner’s example the way I typically use figures and references: Figure_example2.docx

The style for the figure caption is called „Beschriftung“ in document.xml (a German term for caption).

I guess the formatting of the figure number is encoded in Word’s format for this caption type and can be changed in Word itself. Thus, this is nothing pandoc-fignos must do if the user wants a different numbering scheme. Right? Pandoc-fignos could provide these different schemes but it would be OK to just use one default setting.

In the text reference to the figure, however, if the user chooses to include the caption text, this text is part of the content of the <w:fldChar …> tag. This is something I can’t figure out how to change directly in Word, so perhaps this is something to be decided in the pandoc filtering? You will see the difference when you compare the two Word files.

I hope this gives you sufficient material to actually implement the feature. That would be great!

Torsten

@ghost

ghost commented Dec 21, 2017

Hello. I am trying to export from Org to DOCX, but I don't get a numeric reference. I would like to know if someone can help me. I create the LaTeX file with Org, and then do pandoc --filter pandoc-fignos -f latex+smart fignos.tex -t docx+smart -o fignos.docx. Thanks.

org-latex-pandoc+fignos-docx.tar.gz

pandoc 2.0.5 Compiled with pandoc-types 1.17.3, texmath 0.10, skylighting 0.5
Org mode version 9.1.4 (9.1.4-elpa 20171205/)
pandoc-fignos (1.0.0rc3)
pandoc-xnos (0.13)
pandocfilters (1.4.2)

@tomduck
Owner Author

tomduck commented Jan 14, 2018

Hi, @broncodev. Pandoc-fignos only works for markdown input files.

@tomduck
Owner Author

tomduck commented Jan 14, 2018

I have created a Wiki page to help formulate docx support in pandoc-fignos:

https://github.com/tomduck/pandoc-fignos/wiki/Development#docx-output

This page describes the progress and what needs to be done next to solve the docx problem.

@shixuguo

shixuguo commented Jan 15, 2018

in https://github.com/tomduck/pandoc-eqnos/wiki/Development @tomduck mentioned the issue below:
Is the bookmarking done correctly? (Note to self: Why aren't the bookmark start and end encapsulated in a single paragraph? What I have coded in pandoc_eqnos.py seems to come out differently.)

Yes, it is correct.

There is documentation about this:
https://msdn.microsoft.com/en-us/library/office/documentformat.openxml.wordprocessing.bookmarkstart.aspx

@shixuguo

in https://github.com/tomduck/pandoc-eqnos/wiki/Development @tomduck mentioned the issue below:

Is the equation ooxml in paragraph 2 correct? Should the equation be encapsulated in another element? What is best? I will likely need to override pandoc's implementation in any case.

I created a blank docx file and wrote an OMML equation manually.
I wrote y=f(x),
and the XML document looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document 
    xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" 
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" 
    xmlns:o="urn:schemas-microsoft-com:office:office" 
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" 
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" 
    xmlns:v="urn:schemas-microsoft-com:vml" 
    xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" 
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" 
    xmlns:w10="urn:schemas-microsoft-com:office:word" 
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" 
    xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" 
    xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" 
    xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" 
    xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" 
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" 
    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 wp14">
    <w:body>
        <w:p w:rsidR="00AA3459" w:rsidRDefault="00F438B5">
            <w:pPr>
                <w:rPr>
                    <w:rFonts w:hint="eastAsia"/>
                </w:rPr>
            </w:pPr>
            <m:oMathPara>
                <m:oMath>
                    <m:r>
                        <w:rPr>
                            <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
                        </w:rPr>
                        <m:t>y=f</m:t>
                    </m:r>
                    <m:r>
                        <w:rPr>
                            <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
                        </w:rPr>
                        <m:t>(x)</m:t>
                    </m:r>
                </m:oMath>
            </m:oMathPara>
            <w:bookmarkStart w:id="0" w:name="_GoBack"/>
            <w:bookmarkEnd w:id="0"/>
        </w:p>
        <w:sectPr w:rsidR="00AA3459">
            <w:pgSz w:w="11906" w:h="16838"/>
            <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="851" w:footer="992" w:gutter="0"/>
            <w:cols w:space="425"/>
            <w:docGrid w:type="lines" w:linePitch="312"/>
        </w:sectPr>
    </w:body>
</w:document>

My suggestion is not to override pandoc's implementation of equations; I think it is much more work than you have considered.

@shixuguo

in https://github.com/tomduck/pandoc-eqnos/wiki/Development @tomduck mentioned the issue below:

How should the equation number be attached to the equation? At present the number is hard-coded. Docx's native capabilities should be used instead (i.e., something equivalent to LaTeX's \label macro).

How should the reference number be written into paragraph 4? At present the number is hard-coded. Docx's native capabilities should be used instead (i.e., something equivalent to LaTeX's \ref macro).

I have tested the built-in equation numbering in docx and I have to say, it is a very stupid design.
I can create an equation.
I can insert a caption for this equation.
But when I cross-reference this caption, it shows not only the equation number but also the equation itself.

To get the correct form, I have to do a lot of dirty work, as follows:

  1. Create a three-column table.
  2. Insert the equation in the second column of the table.
  3. Insert a caption for the equation. After this step, the equation number appears above the table.
  4. Move the equation number to the third column of the table.
  5. Adjust other style details.

Then I can cite the equation correctly.

After doing this, the XML document looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document 
    xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" 
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" 
    xmlns:o="urn:schemas-microsoft-com:office:office" 
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" 
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" 
    xmlns:v="urn:schemas-microsoft-com:vml" 
    xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" 
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" 
    xmlns:w10="urn:schemas-microsoft-com:office:word" 
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" 
    xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" 
    xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" 
    xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" 
    xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" 
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" 
    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 wp14">
    <w:body>
        <w:bookmarkStart w:id="0" w:name="_Ref503754848"/>
        <w:p w:rsidR="00E836C0" w:rsidRPr="00E836C0" w:rsidRDefault="00E836C0" w:rsidP="00E836C0">
            <w:pPr>
                <w:pStyle w:val="Caption"/>
            </w:pPr>
            <w:r>
                <w:fldChar w:fldCharType="begin"/>
            </w:r>
            <w:r>
                <w:instrText xml:space="preserve"> MACROBUTTON MTEditEquationSection2 </w:instrText>
            </w:r>
            <w:r w:rsidRPr="00E836C0">
                <w:rPr>
                    <w:rStyle w:val="MTEquationSection"/>
                </w:rPr>
                <w:instrText>Equation Chapter 1 Section 1</w:instrText>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="begin"/>
            </w:r>
            <w:r>
                <w:instrText xml:space="preserve"> SEQ MTEqn \r \h \* MERGEFORMAT </w:instrText>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="end"/>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="begin"/>
            </w:r>
            <w:r>
                <w:instrText xml:space="preserve"> SEQ MTSec \r 1 \h \* MERGEFORMAT </w:instrText>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="end"/>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="begin"/>
            </w:r>
            <w:r>
                <w:instrText xml:space="preserve"> SEQ MTChap \r 1 \h \* MERGEFORMAT </w:instrText>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="end"/>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="end"/>
            </w:r>
            <w:bookmarkEnd w:id="0"/>
        </w:p>
        <w:p w:rsidR="00E836C0" w:rsidRDefault="00E836C0" w:rsidP="00E836C0">
            <w:pPr>
                <w:pStyle w:val="Caption"/>
                <w:keepNext/>
            </w:pPr>
        </w:p>
        <w:tbl>
            <w:tblPr>
                <w:tblStyle w:val="TableGrid"/>
                <w:tblW w:w="0" w:type="auto"/>
                <w:tblLook w:val="04A0" w:firstRow="1" w:lastRow="0" w:firstColumn="1" w:lastColumn="0" w:noHBand="0" w:noVBand="1"/>
            </w:tblPr>
            <w:tblGrid>
                <w:gridCol w:w="2765"/>
                <w:gridCol w:w="2765"/>
                <w:gridCol w:w="2766"/>
            </w:tblGrid>
            <w:tr w:rsidR="00E836C0" w:rsidTr="00E836C0">
                <w:tc>
                    <w:tcPr>
                        <w:tcW w:w="2765" w:type="dxa"/>
                    </w:tcPr>
                    <w:p w:rsidR="00E836C0" w:rsidRDefault="00E836C0" w:rsidP="00E836C0">
                        <w:pPr>
                            <w:pStyle w:val="Caption"/>
                            <w:rPr>
                                <w:rFonts w:hint="eastAsia"/>
                            </w:rPr>
                        </w:pPr>
                    </w:p>
                </w:tc>
                <w:tc>
                    <w:tcPr>
                        <w:tcW w:w="2765" w:type="dxa"/>
                    </w:tcPr>
                    <w:p w:rsidR="00E836C0" w:rsidRDefault="00E836C0" w:rsidP="00E836C0">
                        <w:pPr>
                            <w:pStyle w:val="Caption"/>
                            <w:rPr>
                                <w:rFonts w:hint="eastAsia"/>
                            </w:rPr>
                        </w:pPr>
                        <m:oMathPara>
                            <m:oMath>
                                <m:r>
                                    <w:rPr>
                                        <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
                                    </w:rPr>
                                    <m:t>y=f(x)</m:t>
                                </m:r>
                            </m:oMath>
                        </m:oMathPara>
                    </w:p>
                </w:tc>
                <w:tc>
                    <w:tcPr>
                        <w:tcW w:w="2766" w:type="dxa"/>
                    </w:tcPr>
                    <w:p w:rsidR="00E836C0" w:rsidRDefault="00E836C0" w:rsidP="00E836C0">
                        <w:pPr>
                            <w:pStyle w:val="Caption"/>
                            <w:rPr>
                                <w:rFonts w:hint="eastAsia"/>
                            </w:rPr>
                        </w:pPr>
                        <w:r>
                            <w:fldChar w:fldCharType="begin"/>
                        </w:r>
                        <w:r>
                            <w:instrText xml:space="preserve"> SEQ </w:instrText>
                        </w:r>
                        <w:r>
                            <w:instrText>公弝</w:instrText>
                        </w:r>
                        <w:r>
                            <w:instrText xml:space="preserve"> \* ARABIC </w:instrText>
                        </w:r>
                        <w:r>
                            <w:fldChar w:fldCharType="separate"/>
                        </w:r>
                        <w:bookmarkStart w:id="1" w:name="_Ref503754939"/>
                        <w:r>
                            <w:rPr>
                                <w:noProof/>
                            </w:rPr>
                            <w:t>1</w:t>
                        </w:r>
                        <w:bookmarkEnd w:id="1"/>
                        <w:r>
                            <w:fldChar w:fldCharType="end"/>
                        </w:r>
                    </w:p>
                </w:tc>
            </w:tr>
        </w:tbl>
        <w:p w:rsidR="00E836C0" w:rsidRPr="00E836C0" w:rsidRDefault="00E836C0" w:rsidP="00E836C0">
            <w:pPr>
                <w:pStyle w:val="Caption"/>
                <w:rPr>
                    <w:rFonts w:hint="eastAsia"/>
                </w:rPr>
            </w:pPr>
            <w:r>
                <w:t xml:space="preserve">Ref to </w:t>
            </w:r>
            <w:bookmarkStart w:id="2" w:name="_GoBack"/>
            <w:bookmarkEnd w:id="2"/>
            <w:r>
                <w:fldChar w:fldCharType="begin"/>
            </w:r>
            <w:r>
                <w:instrText xml:space="preserve"></w:instrText>
            </w:r>
            <w:r>
                <w:rPr>
                    <w:rFonts w:hint="eastAsia"/>
                </w:rPr>
                <w:instrText>REF _Ref503754939</w:instrText>
            </w:r>
            <w:r>
                <w:instrText xml:space="preserve"></w:instrText>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="separate"/>
            </w:r>
            <w:r>
                <w:rPr>
                    <w:noProof/>
                </w:rPr>
                <w:t>1</w:t>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="end"/>
            </w:r>
        </w:p>
        <w:sectPr w:rsidR="00E836C0" w:rsidRPr="00E836C0">
            <w:pgSz w:w="11906" w:h="16838"/>
            <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="851" w:footer="992" w:gutter="0"/>
            <w:cols w:space="425"/>
            <w:docGrid w:type="lines" w:linePitch="312"/>
        </w:sectPr>
    </w:body>
</w:document>

Most docx users do not use this stupid method, because third-party software such as MathType exists.
Obviously, pandoc should not convert equations to MathType format.

In my opinion, pandoc-xnos should still use hard-coded numbers but use the table layout for the equation.

@ghost

ghost commented Jan 18, 2018

@tomduck Thanks. I will try by first exporting latex to markdown then (or something similar).

@notZaki

notZaki commented May 26, 2018

@shixuguo
Instead of a three-column table, you could try adding a '#' before the equation number to quickly format it.
I've included a gif below to show the process.
[Animated GIF: modifydocxeqn]

@tomduck
Regarding the xml in the Eqnos wiki: In case it helps, below is a cleaned up xml (for the math part) that produces the end product of the above gif:

<m:oMathPara>
  <m:oMathParaPr><m:jc m:val="center" /></m:oMathParaPr>
  <m:oMath>
    <m:eqArr>
      <m:eqArrPr>
          <m:maxDist m:val="1"/>
      </m:eqArrPr>
      <m:e>
        <m:r><m:t>y</m:t></m:r>
        <m:r><m:t>=</m:t></m:r>
        <m:r><m:t>f</m:t></m:r>
        <m:r><m:t>(</m:t></m:r>
        <m:r><m:t>x</m:t></m:r>
        <m:r><m:t>)</m:t></m:r>
        <m:r><m:t>#</m:t></m:r>
        <m:r><m:t>(</m:t></m:r>
        <m:r><m:t>1</m:t></m:r>
        <m:r><m:t>)</m:t></m:r>
      </m:e>
    </m:eqArr>
  </m:oMath>
</m:oMathPara>

So the changes are:

  1. Include an <m:eqArr> environment that covers the entire equation
  2. Add the <m:eqArrPr> ... </m:eqArrPr> part
  3. Instead of a space before the equation label, place a #

The equation number (between the brackets) remains hard-coded, but I don't have a problem with that.
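The three changes listed above can be sketched as a small string-building helper. This is only an illustration of the structure shown in the cleaned-up XML: `numbered_equation_omml` is a hypothetical name, the equation runs are assumed to be valid OMML supplied by the caller, and the number stays hard-coded as described.

```python
from xml.sax.saxutils import escape


def numbered_equation_omml(equation_runs, number):
    """Wrap existing OMML runs in an eqArr and append a '#(n)' label.

    equation_runs: string of <m:r>...</m:r> elements for the equation body.
    number: the (hard-coded) equation number to show in brackets.
    """
    label = "".join(
        f"<m:r><m:t>{escape(piece)}</m:t></m:r>"
        for piece in ("#", "(", str(number), ")")  # '#' replaces the space before the label
    )
    return (
        "<m:oMathPara>"
        '<m:oMathParaPr><m:jc m:val="center"/></m:oMathParaPr>'
        "<m:oMath><m:eqArr>"
        '<m:eqArrPr><m:maxDist m:val="1"/></m:eqArrPr>'
        f"<m:e>{equation_runs}{label}</m:e>"
        "</m:eqArr></m:oMath>"
        "</m:oMathPara>"
    )
```

Calling it with the runs for y=f(x) and the number 1 produces the same element layout as the XML above, one run per label character.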

@tomduck tomduck changed the title from "Docx native capabilities should be used for figure and reference numbers" to "Feature Request:Docx native capabilities" on Aug 7, 2018

@correaa

correaa commented Feb 25, 2019

"Pandoc-fignos only works for markdown input files." It would be great if you clarified this at the beginning of the home page. That would save a lot of frustration. Also, would it work for the native (native-pandoc) format? (I couldn't make it work.)

@ociule

ociule commented Jun 14, 2019

I am very interested in what pandoc-fignos does, but I work with RST. It would be great if fignos handled rst input as well.

@ociule

ociule commented Jul 1, 2019

@tomduck FWIW, I tried to solve the To Do task on the Docx output wiki page.

I've created a Word document that comes close to what the desired docx output should be. It uses native docx capabilities for numbering the figures and for referencing. As asked on the wiki page, the doc has one captioned figure and one reference to it. I replaced the hardcoded 1 in "Figure 1: Caption" with a numbered caption (In Word: Reference > Insert Caption). I created the reference using Reference > Insert reference.

Here's the xml from word/document.xml:

<?xml version="1.0" encoding="UTF-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se wp14">
    <w:body>
        <w:p w:rsidR="00BD015B" w:rsidRDefault="00EF2D29">
            <w:pPr>
                <w:pStyle w:val="CaptionedFigure" />
            </w:pPr>
            <w:bookmarkStart w:id="0" w:name="fig:1" />
            <w:r>
                <w:rPr>
                    <w:noProof />
                    <w:lang w:val="fr-FR" w:eastAsia="fr-FR" />
                </w:rPr>
                <w:drawing>
                    <wp:inline distT="0" distB="0" distL="0" distR="0">
                        <wp:extent cx="914400" cy="457200" />
                        <wp:effectExtent l="0" t="0" r="0" b="0" />
                        <wp:docPr id="1" name="Picture" descr="Figure 1: Caption" />
                        <wp:cNvGraphicFramePr />
                        <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
                            <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
                                <pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
                                    <pic:nvPicPr>
                                        <pic:cNvPr id="0" name="Picture" descr="img1.jpg" />
                                        <pic:cNvPicPr>
                                            <a:picLocks noChangeAspect="1" noChangeArrowheads="1" />
                                        </pic:cNvPicPr>
                                    </pic:nvPicPr>
                                    <pic:blipFill>
                                        <a:blip r:embed="rId8" />
                                        <a:stretch>
                                            <a:fillRect />
                                        </a:stretch>
                                    </pic:blipFill>
                                    <pic:spPr bwMode="auto">
                                        <a:xfrm>
                                            <a:off x="0" y="0" />
                                            <a:ext cx="914400" cy="457200" />
                                        </a:xfrm>
                                        <a:prstGeom prst="rect">
                                            <a:avLst />
                                        </a:prstGeom>
                                        <a:noFill />
                                        <a:ln w="9525">
                                            <a:noFill />
                                            <a:headEnd />
                                            <a:tailEnd />
                                        </a:ln>
                                    </pic:spPr>
                                </pic:pic>
                            </a:graphicData>
                        </a:graphic>
                    </wp:inline>
                </w:drawing>
            </w:r>
        </w:p>
        <w:p w:rsidR="00BD015B" w:rsidRDefault="00EF2D29" w:rsidP="00B227A1">
            <w:pPr>
                <w:pStyle w:val="Image Caption" />
            </w:pPr>
            <w:bookmarkStart w:id="1" w:name="_Ref12885797" />
            <w:r>
                <w:t xml:space="preserve">Figure </w:t>
            </w:r>
            <w:fldSimple w:instr=" SEQ Figure \* ARABIC ">
                <w:r w:rsidR="00B227A1">
                    <w:rPr>
                        <w:noProof />
                    </w:rPr>
                    <w:t>1</w:t>
                </w:r>
            </w:fldSimple>
            <w:bookmarkEnd w:id="1" />
            <w:r>
                <w:t>: Cap</w:t>
            </w:r>
            <w:bookmarkStart w:id="2" w:name="_GoBack" />
            <w:bookmarkEnd w:id="2" />
            <w:r>
                <w:t>tion</w:t>
            </w:r>
        </w:p>
        <w:bookmarkEnd w:id="0" />
        <w:p w:rsidR="00BD015B" w:rsidRDefault="00B227A1">
            <w:pPr>
                <w:pStyle w:val="Corpsdetexte" />
            </w:pPr>
            <w:r>
                <w:t xml:space="preserve">Reference to </w:t>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="begin" />
            </w:r>
            <w:r>
                <w:instrText xml:space="preserve"> REF _Ref12885797 \h </w:instrText>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="separate" />
            </w:r>
            <w:r>
                <w:t xml:space="preserve">Figure </w:t>
            </w:r>
            <w:r>
                <w:rPr>
                    <w:noProof />
                </w:rPr>
                <w:t>1</w:t>
            </w:r>
            <w:r>
                <w:fldChar w:fldCharType="end" />
            </w:r>
            <w:r w:rsidR="00EF2D29">
                <w:t>.</w:t>
            </w:r>
        </w:p>
        <w:sectPr w:rsidR="00BD015B">
            <w:pgSz w:w="12240" w:h="15840" />
            <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0" />
            <w:cols w:space="720" />
        </w:sectPr>
    </w:body>
</w:document>

The numbering and the reference use fields. See the <w:fldSimple w:instr=" SEQ Figure \* ARABIC "> line for an example. In Word, you can view the fields with the Show Fields (Alt-F9) option. Here's the Microsoft doc page about the available fields.

Answering the questions from your wiki page:

  1. Word seems to bookmark the "Figure { field}:" part of the caption (see the <w:bookmarkStart w:id="1" w:name="_Ref12885797" />)

  2. Can't say, not familiar enough with OOXML

  3. The number definitely should not be hard-coded. It should use the field-based caption system shown above, which numbers the figure caption with a SEQ field and thereby creates a "Figure" sequence.

  4. The reference should not be hardcoded either, but it should use the field system again, specifically the REF field. Unsurprisingly, the id of the ref is hard-coded.
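
As a rough illustration of points 3 and 4, the two fields can be produced with plain string builders. This is only a sketch that mirrors the XML pasted above; it is not code from pandoc-fignos. The helper names (`seq_field`, `caption_label`, `ref_field`) are hypothetical, and the caption label text is simply assumed to be the same as the sequence name.

```python
from xml.sax.saxutils import escape, quoteattr


def seq_field(seq_name, shown):
    """Simple SEQ field; `shown` is the cached text displayed until Word updates the field."""
    instr = f" SEQ {seq_name} \\* ARABIC "
    return (f"<w:fldSimple w:instr={quoteattr(instr)}>"
            f"<w:r><w:t>{escape(str(shown))}</w:t></w:r></w:fldSimple>")


def caption_label(seq_name, shown, bookmark_name, bookmark_id):
    """Bookmark wrapped around 'Figure <SEQ field>', as in the Word output above."""
    return (f'<w:bookmarkStart w:id="{bookmark_id}" w:name={quoteattr(bookmark_name)}/>'
            f'<w:r><w:t xml:space="preserve">{escape(seq_name)} </w:t></w:r>'
            + seq_field(seq_name, shown)
            + f'<w:bookmarkEnd w:id="{bookmark_id}"/>')


def ref_field(bookmark_name, shown):
    """Simple REF field pointing at the caption bookmark; \\h makes it a hyperlink."""
    instr = f" REF {bookmark_name} \\h "
    return (f"<w:fldSimple w:instr={quoteattr(instr)}>"
            f"<w:r><w:t>{escape(str(shown))}</w:t></w:r></w:fldSimple>")


print(caption_label("Figure", 1, "_Ref12885797", 1))
print(ref_field("_Ref12885797", "Figure 1"))
```

The cached text ("1", "Figure 1") is still written into the file, but Word replaces it whenever the fields are updated, so the numbers are no longer fixed.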

Hope this helps. If you want, I can paste the XML into the wiki. Here's the docx file.

fignos_demo_fixed_output.docx

@tomduck
Owner Author

tomduck commented Aug 26, 2019

Thanks for your work on this @ociule. I'm planning to release pandoc-fignos 2.0.0 soon. I expect to revisit the issue of docx support this fall, and the leg-work you have done will be helpful.

Cheers,
Tom

@ebranlard

Thank you @tomduck, this is indeed a much needed feature for all of us that don't want to use Word but have to generate Word documents.

@ociule made the point that the standard way of inserting tables, figures, and documents relies on "special" bookmarks. I feel like the important part is the following:

<w:fldSimple w:instr=" SEQ Figure \* ARABIC ">

I tried a simple docx document with one equation, and this showed up in the xml:

<w:instrText xml:space="preserve"> SEQ Equation \* ARABIC </w:instrText>

I guess this is relevant for pandoc-eqnos. For a table, it's SEQ Table \* ARABIC.
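
To check which sequences a given document.xml uses, something like the following could be run on the unzipped file. It is a minimal sketch using only the standard library. The function name `seq_names` is made up for illustration, and it assumes each field instruction sits in a single `w:instrText` run (Word sometimes splits instructions across runs, which this does not handle).

```python
import re
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SEQ_RE = re.compile(r"^\s*SEQ\s+(\S+)")


def seq_names(document_xml):
    """Return SEQ identifiers in document order, from simple and complex fields."""
    root = ET.fromstring(document_xml)
    names = []
    for el in root.iter():
        if el.tag == f"{{{W}}}fldSimple":
            # Simple field: the instruction is an attribute.
            instr = el.get(f"{{{W}}}instr", "")
        elif el.tag == f"{{{W}}}instrText":
            # Complex field: the instruction is element text.
            instr = el.text or ""
        else:
            continue
        match = SEQ_RE.match(instr)
        if match:
            names.append(match.group(1))
    return names
```

Called on a document containing the two snippets above, this would list "Figure" and "Equation"; REF fields are ignored.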

@free41

free41 commented Jan 5, 2022

Is there anything more needed for this feature? Is there anything I could do to help this along?

This is a feature I would use heavily so I would be willing to help out.

Thanks for all the work on this and many other features.

@free41

free41 commented Jan 7, 2022

So I did some playing around and the label in the Figure caption is currently working correctly. It has the form:

<w:fldSimple w:instr=" SEQ Figure \* ARABIC "><w:r><w:t>1</w:t></w:r></w:fldSimple>

In order to make the reference to the figure work the following xml code is needed:

<w:fldSimple w:instr="REF fig1 \h"><w:r><w:t>Figure 1</w:t></w:r></w:fldSimple>

I was able to make this work by just manually changing the xml.

I think I have what I need to make the changes, but I'm having trouble actually figuring out where the changes need to be made in the code. I don't see any of the existing markup in the code, so I am a bit confused about where it comes from.

If someone can help me out with where to look, I can try to implement the changes.

@free41

free41 commented Jan 8, 2022

I was able to get the native figure numbering working on my computer. I think I have a good start on this, but I am guessing that these changes would break things for other users. Let me know if you think I should make a pull request to work out some of these issues or if it is easier to do it here (I've never written a pull request before).

To use this, you have to enable the native_numbering extension, which creates the field in the figure caption that can be "cross-referenced" (linked to). I have been running:

pandoc --filter pandoc-fignos -t docx+native_numbering -o test.docx test.md 

This will create the necessary field in the caption:

<w:fldSimple w:instr=" SEQ Figure \* ARABIC "><w:r><w:t>1</w:t></w:r></w:fldSimple>

Note that this looks like the figure number is hard-coded, but when you open the Word document, all you have to do is select the text and choose "Update Field", and Word will refresh the text inside the field.

Also of note: the above field is contained within the following XML block, which creates a bookmark:

<w:bookmarkStart w:id="24" w:name="ref_fig:1" />

This contains the key ref_fig:1 that we need to reference in our in-text reference.

Note that all of the above is created by the pandoc writer without pandoc-xnos. That means pandoc-fignos doesn't need to do anything at all to the captions for docx (otherwise you end up with "Figure X: Figure X:"). On my machine I added the following to _adjust_caption():

if fmt == 'docx':
    pass  # leave the caption alone; pandoc's native_numbering already adds the label

This also means that we don't need anything Word-specific under _add_markup(), so there is a code block that can be removed there as well.

The next step is to adjust the reference, which is handled in the pandoc-xnos package. I added the following to _cite_replacement() at line 968:

if fmt == 'docx':
    # Wrap the reference text in a REF field that targets the caption bookmark.
    field = r'<w:fldSimple w:instr="REF ref_%s \h"><w:r><w:t>%s</w:t></w:r></w:fldSimple>'
    ret = RawInline('openxml', field % (label, refname + NBSP + text))

This results in the following xml:

<w:fldSimple w:instr="REF ref_fig:1 \h"><w:r><w:t>Figure 1</w:t></w:r></w:fldSimple>

This again looks hard-coded, but Word can update it if, say, new figures are added in Word after the document is created.
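
Since a REF field only works when a bookmark with the same name exists, a quick sanity check on the generated document.xml can catch mismatched keys. The sketch below is a standalone illustration, not part of pandoc-xnos. The function name `dangling_refs` is hypothetical, and it only inspects simple fields (`w:fldSimple`), which is the form used in the snippet above.

```python
import re
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
REF_RE = re.compile(r"^\s*REF\s+(\S+)")


def dangling_refs(document_xml):
    """Return REF targets (in document order) that have no matching bookmark."""
    root = ET.fromstring(document_xml)
    # Collect every bookmark name defined in the document.
    bookmarks = {el.get(f"{{{W}}}name")
                 for el in root.iter(f"{{{W}}}bookmarkStart")}
    missing = []
    for el in root.iter(f"{{{W}}}fldSimple"):
        match = REF_RE.match(el.get(f"{{{W}}}instr", ""))
        if match and match.group(1) not in bookmarks:
            missing.append(match.group(1))
    return missing
```

An empty list means every simple REF field in the document points at an existing bookmark such as ref_fig:1.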

Here are the outstanding issues/TODO items:

  • I have not tested this for equation or table numbers, so I'm not sure it will work the same way.
  • This relies on the native_numbering extension. Maybe we should have a way to detect if that extension is being used, and if not keep the old behavior active.
  • The above only works on some of the more recent versions of pandoc.
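
For the version caveat, one possible approach is to parse the first line of `pandoc --version` and compare it against a minimum. This is only a sketch of the version part; it does not detect whether native_numbering was actually requested. The function names are hypothetical, and the minimum version is deliberately left to the caller because this thread does not say which pandoc release is required.

```python
import re


def parse_pandoc_version(version_output):
    """Extract a version tuple, e.g. (2, 14, 1), from `pandoc --version` output."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    match = re.search(r"(\d+(?:\.\d+)*)", first_line)
    if match is None:
        raise ValueError("no version number found in pandoc output")
    return tuple(int(part) for part in match.group(1).split("."))


def is_recent_enough(version_output, minimum):
    """True if the reported version is at least `minimum` (a caller-supplied tuple)."""
    return parse_pandoc_version(version_output) >= tuple(minimum)
```

A filter could fall back to the old hard-coded behavior whenever `is_recent_enough(...)` returns False.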
