Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xlsx fails to load due to non-standard stylesheet and workbook locations #79

Open
laurielounge opened this issue Oct 17, 2017 · 5 comments

Comments

@laurielounge
Copy link

laurielounge commented Oct 17, 2017

Hi gang,

test_sheet_new.xlsx

The attached file is a valid spreadsheet (can open in Excel) but stylesheet.xml and worksheet.xml, rather than being in /xl/, are in / and consequently, the sheet cannot be opened.

To test, run parse on the file. Error message indicates stylesheet.xml cannot be found.

@laurielounge
Copy link
Author

laurielounge commented Oct 18, 2017

More info - my first thought seems to have been wrong.
Error looks like:

no subfile named /stylesheet.xml at ../perl5/Spreadsheet/ParseXLSX.pm line 967

Version
$Spreadsheet::ParseXLSX::VERSION = '0.27';

ParseExcel
our $VERSION = '0.65';

after unzipping the xlsx file into its own directory
find . -type f looks like:

./workbook.xml
./_rels/.rels
./_rels/workbook.xml.rels
./[Content_Types].xml
./stylesheet.xml
./xl/worksheets/data.xml

cat _rels/.rels

<?xml version="1.0" encoding="utf-8" standalone="yes"?>

  <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="workbook.xml" Id="rw_1"/></Relationships>

cat _rels/workbook.xml.rels

<?xml version="1.0" encoding="utf-8" standalone="yes"?>

  <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="/xl/worksheets/data.xml" Id="r_1"/>
    <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="stylesheet.xml" Id="r_2"/></Relationships>

@laurielounge
Copy link
Author

Just for emphasis:

cat _rels/workbook.xml.rels

<?xml version="1.0" encoding="utf-8" standalone="yes"?>

  <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="/xl/worksheets/data.xml" Id="r_1"/>
    <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="stylesheet.xml" Id="r_2"/></Relationships>

and
cat workbook.xml

<?xml version="1.0" encoding="utf-8" standalone="yes"?>

  <workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
    <sheets>
      <sheet name="Data" sheetId="1" r:id="r_1">
        <bookViews>
          <workbookView/></bookViews></sheet></sheets></workbook>

Looking at _rels/workbook.xml.rels, there is no actual Target="stylesheet.xml", at least, not in that folder.

The only other reference to stylesheet:
grep -r stylesheet *
comes from [Content_type].xml:
<Override PartName="/stylesheet.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/>

@laurielounge laurielounge reopened this Oct 18, 2017
@laurielounge
Copy link
Author

laurielounge commented Oct 18, 2017

Further...

If I edit _rels/workbook.xml.rels, and change
<Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="stylesheet.xml" Id="r_2"/></Relationships>

to
<Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="/stylesheet.xml" Id="r_2"/></Relationships>

The file parses.I'm guessing here, but I suspect when Excel opens the file, it has a harder look for stylesheet.xml than ParseXLSX is doing. Another possibilty is that it just throws up it's hand and says open the file anyway, without the stylesheet info. Will experiment. Looking at the info in stylesheet, and the way Excel displays the file, I suspect it's the former (stylesheet seems to have been applied).

@laurielounge
Copy link
Author

Further still...

If we define "valid spreadsheet file" as one successfully read by Excel, then this is a valid spreadsheet file.

Once again
find . -type f reveals

./workbook.xml
./_rels/.rels
./_rels/workbook.xml.rels
./[Content_Types].xml
./stylesheet.xml
./xl/worksheets/data.xml

grep -r Target * reveals

_rels/.rels:    <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="workbook.xml" Id="rw_1"/></Relationships>
_rels/workbook.xml.rels:    <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="/xl/worksheets/data.xml" Id="r_1"/>
_rels/workbook.xml.rels:    <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="stylesheet.xml" Id="r_2"/></Relationships>

We see that, in _rels/, we have a workbook.xml.rels file. _rels/.rels defines a target of workbook.xml (no ".rels").
_rels/workbook.xml.rels defines a "stylesheet.xml". Not "/stylesheet.xml"

I'm not currently sure which target is looked for first, but one assumes it's 'stlyesheet.xml', as that's where it's erroring. One assumes that fixing the location of stylesheet (which I will do manually do next) will cause the error to become "not found workbook.xml".

@laurielounge
Copy link
Author

Right. I can confirm that if I manually edit _rels/workbook.xml.rels and adjust
<Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="stylesheet.xml" Id="r_2"/></Relationships>
to read
<Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="/stylesheet.xml" Id="r_2"/></Relationships>

ParseXLSX can parse the sheet. I can print the value in $ws->get_cell(0,0) correctly.

Excel can still successfully read the file (it is still a "valid workbook")

I see a possible approaches here.
Pre-check to see if all of the targets exist and are in the appropriate places, adjust if necessary. Is the xml read once and then passed from routine to routine in ParseXLSX? If so, it shouldn't be too hard to see if the stylesheet is where the xml says it is.

I'll continue playing with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant