Save to excel #262

Merged
aulemahal merged 17 commits into main from to-excel
Oct 2, 2023

Conversation

aulemahal
Collaborator

@aulemahal aulemahal commented Sep 26, 2023

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
  • (If applicable) Documentation has been added / updated (for bug fixes / features).
  • (If applicable) Tests have been added.
  • This PR does not seem to break the templates.
  • HISTORY.rst has been updated (with summary of main changes).
    • Link to issue (:issue:number) and pull request (:pull:number) has been added.

What kind of change does this PR introduce?

New io.to_table and io.save_to_table functions for saving datasets to DataFrames, CSV, Excel, etc.

This adds multi-column and multi-sheet support on top of ds.to_dataframe().

It also supports adding auxiliary coordinates as columns in the output table, alongside the data variables. This is actually the most complex part of the code, and it might not cover all cases 🙄...

I also sneaked in a little fix for save_to_netcdf, to allow compute=False. And I took coerce_attrs out of the save function to reduce code duplication.
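
To make "multi-column and multi-sheet" concrete, here is a minimal pandas-only sketch of the layout idea. It is not the code from this PR: the dimension names (`time`, `experiment`, `percentile`), the variable name `tg_mean`, and the helper `split_into_sheets` are all hypothetical, and the actual `to_table` arguments may differ.

```python
import pandas as pd

# Hypothetical long-format table, similar in shape to what ds.to_dataframe()
# returns: one row per combination of the dimensions.
index = pd.MultiIndex.from_product(
    [[2030, 2050], ["ssp245", "ssp585"], [10, 50, 90]],
    names=["time", "experiment", "percentile"],
)
long_df = pd.DataFrame({"tg_mean": range(len(index))}, index=index)


def split_into_sheets(df, sheet_dim, column_dim):
    """Return {sheet name: wide table}, one table per value of `sheet_dim`.

    Inside each table, `column_dim` is moved from the row index to the
    columns, giving a (variable, column_dim) MultiIndex on the columns.
    """
    sheets = {}
    for key, sub in df.groupby(level=sheet_dim):
        sub = sub.droplevel(sheet_dim)
        sheets[str(key)] = sub.unstack(column_dim)
    return sheets


sheets = split_into_sheets(long_df, sheet_dim="experiment", column_dim="percentile")
print(sheets["ssp245"])
```

Each table in `sheets` could then be written with `DataFrame.to_excel(writer, sheet_name=name)` inside a `pd.ExcelWriter` context, which requires an Excel engine such as openpyxl to be installed.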

Does this PR introduce a breaking change?

No.

Other information:

More testing and doc to come.

@aulemahal
Collaborator Author

aulemahal commented Sep 28, 2023

The doc issue comes from xclim and has been fixed in Ouranosinc/xclim@c846ca6.

@aulemahal
Collaborator Author

Does @sarahclaude or anyone else already have code to add some kind of table of contents to the Excel file?

I can write something simple if not.

@sarahclaude
Collaborator

I did this in earlier code:

import pandas as pd

def explication_sheet(path):
    """Create an Excel file at `path` containing a 'readme' sheet."""
    # One row explaining how the column titles are built.
    df1 = pd.DataFrame(data={'column_title': ['experiment_indicator_percentile'],
                             'experiment': ['ssp245 or ssp585'],
                             'percentile': ['10 or 50 or 90']})

    # The default mode 'w' creates (or overwrites) the file.
    with pd.ExcelWriter(path, engine='openpyxl') as writer:
        df1.to_excel(writer, sheet_name='readme', index=False)

df1 could instead be passed in by the user as an argument.

And I had this section in the loop that runs through the datasets:

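# List each variable with its long_name, keeping one row per distinct long_name.
df = pd.DataFrame(data={'indicator_abbreviation': list(ds.data_vars),
                        'indicator_long_name': [ds[v].attrs['long_name'] for v in ds.data_vars]})
df = df.drop_duplicates(subset=['indicator_long_name'])
# Drop the first and last '_'-separated parts (experiment and percentile, per the column_title pattern).
df['indicator_abbreviation'] = df['indicator_abbreviation'].apply(lambda name: '_'.join(name.split('_')[1:-1]))
# Append below the explanation table already in the 'readme' sheet.
with pd.ExcelWriter(path, engine='openpyxl', mode='a', if_sheet_exists='overlay') as writer:
    df.to_excel(writer, sheet_name='readme', index=False, startrow=4)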
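
For reference, the `'_'.join(...split('_')[1:-1])` expression in the snippet above drops the first and last underscore-separated tokens of a name. Here is a standalone sketch, assuming variable names follow the `experiment_indicator_percentile` pattern from the readme sheet; the example names and the helper name are made up for illustration.

```python
def strip_experiment_and_percentile(name: str) -> str:
    """Drop the first and last underscore-separated tokens of `name`."""
    return "_".join(name.split("_")[1:-1])


# Hypothetical variable names following experiment_indicator_percentile.
examples = ["ssp245_tg_mean_p50", "ssp585_prcptot_p90"]
short_names = [strip_experiment_and_percentile(n) for n in examples]
print(short_names)  # ['tg_mean', 'prcptot']
```

Note that a name with fewer than three tokens (for example `tg_mean`) comes out as an empty string, so this only makes sense when every variable name carries both the prefix and the suffix.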

@aulemahal
Collaborator Author

aulemahal commented Sep 29, 2023

The latest commits add a simple table of contents (TOC). It can be localized (column names, and long_names when available) with xclim's metadata_locales option.

If the make_toc function is not enough for your use case, you can still pass a DataFrame through the add_toc argument. However, I kept it simple by only allowing a single DataFrame, whereas Sarah's example puts two tables on the first sheet.
I believe a more complex TOC could still be implemented along these lines (in your own script):

import pandas as pd
import xscen as xs

toc_tables = ...  # Complex TOC generation...
# Write the TOC tables to the Content sheet, one below the other.
with pd.ExcelWriter(path, mode='a', if_sheet_exists='overlay') as writer:
    startrow = 0
    for toc_table in toc_tables:
        toc_table.to_excel(writer, sheet_name='Content', startrow=startrow)
        # Header row + data rows, plus one blank row before the next table.
        startrow += len(toc_table) + 2

xs.io.save_to_table(ds, path, mode='a')  # mode='a' is passed on to pandas, to append to the file.
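
The `startrow` bookkeeping in the loop above can be checked in isolation. This is a small sketch with hypothetical names (`toc_start_rows`, `gap`, `header_rows`), assuming each table is written with the default single header row, so a table with n data rows occupies n + 1 spreadsheet rows.

```python
def toc_start_rows(row_counts, gap=1, header_rows=1):
    """Return the 0-based `startrow` for each table stacked on one sheet.

    row_counts: number of data rows in each table, e.g. len(df).
    gap: blank rows left between consecutive tables.
    header_rows: rows used by each table's column header.
    """
    starts = []
    row = 0
    for n in row_counts:
        starts.append(row)
        # Advance past this table's header, its data rows, and the gap.
        row += header_rows + n + gap
    return starts


print(toc_start_rows([1, 3, 2]))  # [0, 3, 8]
```

With the defaults, each step advances by the table length plus 2, which matches the increment used in the loop above.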

Contributor

@juliettelavoie juliettelavoie left a comment

I'm not in the new user request group, so I don't know whether this is fit for purpose. But I played around with to_table on a few of my datasets and it seems to work well.

Collaborator

@sarahclaude sarahclaude left a comment

I think the flexibility of sheet/row/columns is great for future requests, and it works well!
coords could also accept strings, since all the other similar arguments do.

@aulemahal aulemahal merged commit 8299494 into main Oct 2, 2023
6 checks passed
@aulemahal aulemahal deleted the to-excel branch October 2, 2023 21:26