Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Header is messy #184

Open
kwinkunks opened this issue Jan 6, 2022 · 2 comments
Open

Header is messy #184

kwinkunks opened this issue Jan 6, 2022 · 2 comments
Labels

Comments

@kwinkunks
Copy link
Member

The new Pandas approach just build a dataframe from a LAS file -- unless you build the well yourself, in which case it's a dictionary object. The Header class is still there in header.py but I don't think it's used for anything. So it's polymorphic, and confusing.

Key philosophy: welly is not a representation of LAS files. That's what lasio is for.

Proposal: welly should regain its Header object, even if it simply stores everything in a DataFrame. The header object should have good LAS -> Header and Header -> LAS actions, but also be constructable and serializable by other means.

@kwinkunks kwinkunks added the bug label Jan 6, 2022
@kwinkunks
Copy link
Member Author

I had another go at fixing this yesterday, and wanted to capture some notes. Dealing with the current well header tricky for a few reasons:

  • There's no Header object to take care of the header interface, so you have to know how the header is structured.
  • The header is a DataFrame with no useful index, so you have to make complex queries (eg to find records matching a given section and mnemonic)
  • The header is basically a LAS file header, breaking the model of welly as a format-free representation. (lasio is a LAS file representation, but welly is supposed to be format-independent.)
  • The header code is spread out across well.py, utils.py and (maybe? not sure) header.py. The code in utils.py is accessed by curves and locations (I think), that's why it's in there (I think).

One thing I tried, which I thought would help, was using section and mnemonic as a MultiIndex in the header DataFrame (reasoning: the integer index is totally arbitray and not useful to users), but I couldn't quite get the various bits of machinery working with it. This would at least make the view of the header more useful.

I'd be interested to know what you think about this, @patrick-reinhard. I'm torn on whether the header needs to be useful in v0.5.0, but I think it is something I need to fix soon, because it's going to have consequences for plugging in dlisio.

@patrick-reinhard
Copy link
Collaborator

Hi Matt,

Thanks for putting some thought into this. Sharing my thoughts:

  • There's no Header object to take care of the header interface, so you have to know how the header is structured.

I agree there should be interfacing possible for the header attribute of the well. You can now utils.get_header_item() and well.add_header_item(). These should be put into one place. I would propose placing it in header.py and adding wrapper methods on Well.

  • The header is a DataFrame with no useful index, so you have to make complex queries (eg to find records matching a given section and mnemonic)

Agreed, to me it makes sense to set mnemonic and/or section as the index.

  • The header is basically a LAS file header, breaking the model of welly as a format-free representation. (lasio is a LAS file representation, but welly is supposed to be format-independent.)

Header is currently a tabular frame, as I think it should be represented, because that's how you would find it in well reports or well data. A frame is a generic format but I agree it is indeed somewhat tied to LAS format because we copy 1:1 the names of the lasio output.

['original_mnemonic', 'mnemonic', 'unit', 'value', 'descr', 'section']

Those column names to make make sense, and we could parse any other header data (from dlisio) to that same format.

  • The header code is spread out across well.py, utils.py and (maybe? not sure) header.py. The code in utils.py is accessed by curves and locations (I think), that's why it's in there (I think).
  1. All header related code is indeed in well.py, the utils.get_header_item() is in that place because utils.lasio_get() was there before. It makes sense to move it.
  2. header.py is currently not used, imported or called by any of the code. The class, from my view, can be removed. The header related methods could be placed here.

One thing I tried, which I thought would help, was using section and mnemonic as a MultiIndex in the header DataFrame (reasoning: the integer index is totally arbitray and not useful to users), but I couldn't quite get the various bits of machinery working with it. This would at least make the view of the header more useful.

Agreed. If you want to push your branch I can have a look at that multindexing. I think this could be a change we can put in 0.5.1 if we get it right.

I'd be interested to know what you think about this, @patrick-reinhard. I'm torn on whether the header needs to be useful in v0.5.0, but I think it is something I need to fix soon, because it's going to have consequences for plugging in dlisio.

One last thought what I think is missing from the design of Header, it just being a dictionary, is the import/export reproducibility. The Header constructor discards the unit, description and section upon loading, if you want to import a well, do things to it and export, your header will have lost useful information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants