-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
General suggestions for code cleanup/improvements #48
Comments
One practical thought on this. I think the code should have an understanding of what the various analysis methods "mean". Not sure if there's something in place but e,g, all_methods = set(['TI','TI-CUBIC','DEXP','IEXP','GINS','GDEL','BAR','UBAR','RBAR','MBAR']) other methods if applicableselected_method = set(P.methods) # or via argparse if not selected_method & bar_methods: # intersection if u_klt is None: etc. |
Yes, I agree with all of this, and most especially the issue relating to global variables. Probably we should break up this issue into several separate issues which make specific proposals for code changes that can be handled relatively independently, i.e.:
|
Ok, that's certainly a plan but the practical problem is that this affetcs On 18 December 2015 at 19:09, David L. Mobley [email protected]
|
Ok, I have created a branch 'refactor'. Done so far.
|
OK. It may take me a couple of days to get to review this one. What needs Thanks. On Mon, Jan 4, 2016 at 4:00 AM, Hannes Loeffler [email protected]
David Mobley |
The interface i.e. inputs and outputs. At the moment P is passed in which is, frankly, a rather "heavy" data structure. It also seems to be (ab?)used to also contain additional data items beyond what arparse puts in there. The current code heavily relies on this data structure to be present globally so it is in essence another global. I would tend to only pass around data that is really needed. Further cleanup of the code and separation into smaller units will make this obvious. On the output side the code is mostly where it should be. But we will need a mechanism to report back the parsers "capabilities". Maybe a separate data structure that is compared to other data times. e.g. P.methods to determine what analysis can and should be done. Overall, I would think this is not high priority as it will become clearer during refactoring what needs to be done. The new code will need heavy testing before a merge and release. I tend to only test the AMBER side because I am most familiar with it but I will have a look through the example date as well to see if I haven't broken the code in an obvious way. BTW, what is the need to take extras steps for the Desmond parser? Looking into uncorrelate() it seems there is effectively only one line of code different. Can't this be done by the parser itself? |
I think I have an idea how units should be handled. Essentially, the parsers fill up dhdlt and u_klt and report back what units they are in. The parsers should do no further conversion but will need to also report the temperature to compute beta. All conversion are then done in the main code as required and requested by the user |
That sounds reasonable. But I should investigate a bit on HOW the units Otherwise I agree with you though. On Thu, Jan 7, 2016 at 8:01 AM, Hannes Loeffler [email protected]
David Mobley |
Following the duscussion on openmm/openmm#680 On 8 January 2016 at 01:29, David L. Mobley [email protected]
|
I had a quick look into pint and it looks workable. Possible downsides: On 8 January 2016 at 10:15, Hannes Loeffler [email protected]
|
OK, thanks. I did put a follow-up question in that thread. I've used On Fri, Jan 8, 2016 at 2:15 AM, Hannes Loeffler [email protected]
David Mobley |
I have addressed most of these except for th unit problem. The harder part will now be to separate the current functions into more useful chunks (probably by means of analysis method and separte plotting from non-plotting functionality) and a library. I still not convinced that the tool needs a unit managment via a specialised module. Do you expect future functionality that is more than energy conversion at one single point in the code? |
Great. Splitting in the way you describe also makes sense, I think.
I am not totally convinced that it does either, though I think in general I'll probably try to be including these from scratch when I write new code that will be dealing with multiple sets of units - simply because having everything carry units makes it SO MUCH easier to make sure you have appropriate units when/where needed. |
My main worry was that this wouldn't work well with the numpy arrays that I should have taken a closer look though and read On 13 January 2016 at 16:59, David L. Mobley [email protected]
|
Ah, that's a good point. Pint does indeed look promising. On Wed, Jan 13, 2016 at 11:32 AM, Hannes Loeffler [email protected]
David Mobley |
FWIW, seconding all the the points raised in this discussion. I think it's a huge boon if tools written in Python can actually be used directly from a Python session, but as currently written the core component of this module is only really usable as a command-line script. What's more, the code is extremely hard to follow, with e.g. functions that operate on names defined outside of their scope ( Given this, I'm currently ripping out the things I need from it manually to get into a form I can actually use. I'll push the result to GitHub and perhaps that can help to move this issue along. I think this library addresses an important void, but could use some heavy refactoring. |
@dotsdl - I'm very on board with major changes; basically there's a ton that needs to be done along those lines and unfortunately I haven't had anyone in the group who is actively doing organization/improvements/maintenance right now (though @Limn1 will probably be getting involved). @halx has done a lot that we didn't get to review/discuss yet. Perhaps a call would be warranted?
Elaborating - I've had a bit of a look but we haven't really tested much yet, and it raises larger architecture issues/refactoring issues that I haven't been able to think through. Maybe we should do a separate issue to think through the best ways to do this. Or we could schedule a call. Interested in being involved? @halx , are you still thinking about this and interested in being involved? My group doesn't have enough real "code design" experience, I think, yet, so I need to involve outside expertise. The first version of this that we "released" was way too much done just by one student who didn't have the needed expertise. @mrshirts has also mentioned that he's interested in being more involved going forward. |
@dotsdl - do please give us what you end up with regardless, as even if Thanks. On Wed, Jun 15, 2016 at 1:57 PM, Nathan Lim [email protected]
David Mobley |
Hi, all- I'm definitely happy to help review code, especially at the physical correctness and functionality level. I'm not as experienced in larger architecture levels.
I don't think a general unit manager will be needed. There is a very low probability of ever needing anything other than energy units of either kJ/mol, kcal/mol, or kT. Something specialized for this purpose should be fine. |
@davidlmobley I'm happy to be involved in this process. I think converging on a common codebase for doing low-level things like parsing output files from different MD engines and extracting the correct quantities for FEP/TI is important for progress in the larger field, since these should be entirely solvable problems. For my needs at the moment I only need to extract I do have more of a software engineering focus, so I'm happy to help in terms of library architecture. As I said, I'll push the library that results from my work to a repo on GitHub and we'll see if a path forward emerges from it. We'll also then have something concrete to iterate on. That work will definitely require review from @davidlmobley and @mrshirts for physical correctness, since I'm less well-versed in caveats/pitfalls of alchemical free energy calculations. |
Yes, I have started with some light-weight changes in the code and I would think we should use this as a basis for discussion. |
@dotsdl , @halx - I've had this on hold for a while because of a super crazy summer (including a new baby, a visitor on sabbatical, etc.) and current grant proposal deadlines. But I'll be able to deal a bit with it again after the first week in October. Could we talk around that time to develop a better plan for the future, perhaps also with @mrshirts ? |
@davidlmobley absolutely. I'm in the midst of writing my dissertation (defending Nov. 4), so of limited use on development fronts. But been working on a library called tentatively alchemlyb that contains the machinery I used to do data munging in Python from GROMACS output and do MBAR and TI with what appear to be best-practices. I pushed this library to GitHub today, but the API is far from stable, as it's mostly things I built over the course of the past couple of months. Starting November I and two others in our lab will be employed to get this library in a form that is easy for everyone to use, complete with tests, docs, etc., as we have a need for a central library of solid implementations for the different alchemistry projects we are doing ourselves. We will definitely be interested in both you and @mrshirts input to ensure our implementations are scientifically and statistically sound. We hope to keep the library simple and straightforward enough that it can be used both interactively and as part of larger postprocessing pipelines. The goal is simplicity and clarity. Should we set up a time to discuss general needs more directly? |
@dotsdl - this actually sounds really interesting. I'd been thinking to some extent that what this really needs is to be re-framed into a library format rather than a stand-alone tool which attempts to do everything and keeps having to get extended to fit with different codes. There might then still be a front end which would interact with different tools... Maybe move this to e-mail so we can schedule, looping in @mrshirts and @halx ? |
@davidlmobley happy to! You can reach me at [email protected]. Feel free to shoot me a message and we'll get rolling. |
Yes, let's got with email to arrange a discussion in October. |
Looks like I haven't paid much attention to this recently... Where are things standing at the moment? |
@dotsdl - any updates? Should we talk again? I'm still much more enthusiastic about the library approach you were talking about working on, but I haven't seen much action there, and obviously we'll need to do something fairly soon. |
@davidlmobley I've been working on a proof-of-concept set of components and structure for I'll post the issue today and follow it up with the detailed proposal + gist in the coming days. Apologies for the delay. |
Hi, all-
I think my todos are to modularize the calculation in alchemical-analysis
and separate it from reading the information, as we want to try to reuse
things as much as possible (since alchemical-analysis is fairly robust to a
lot of data failure modes, if hard to work with). I will try to get to
that next week -- grades should be submitted today.
…On Wed, Dec 14, 2016 at 10:21 AM, David Dotson ***@***.***> wrote:
@davidlmobley <https://github.com/davidlmobley> I've been working on a
proof-of-concept set of components and structure for alchemlyb
<https://github.com/alchemistry/alchemlyb> this past week, and I'm hoping
to be able to put both the API proposal that it demonstrates as well as a
gist showing how it works in practice during the coming week. Speaking to
@orbeckst <https://github.com/orbeckst>, we also wanted to draft a white
paper that clearly states the problem, aims, and our solution once
consensus on the proposal is reached on the issue tracker.
I'll post the issue today and follow it up with the detailed proposal +
gist in the coming days. Apologies for the delay.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#48 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEE31GrJ1t3TVCXCGD4g_fx3u28FNFDRks5rICWVgaJpZM4G0Vg5>
.
|
The parsers have already been separated in the
https://github.com/MobleyLab/alchemical-analysis/tree/refactor branch .
All that needs to be done now is to define the new interface and contract.
On 14 December 2016 at 17:58, Michael Shirts <[email protected]>
wrote:
… Hi, all-
I think my todos are to modularize the calculation in alchemical-analysis
and separate it from reading the information, as we want to try to reuse
things as much as possible (since alchemical-analysis is fairly robust to a
lot of data failure modes, if hard to work with). I will try to get to
that next week -- grades should be submitted today.
On Wed, Dec 14, 2016 at 10:21 AM, David Dotson ***@***.***>
wrote:
> @davidlmobley <https://github.com/davidlmobley> I've been working on a
> proof-of-concept set of components and structure for alchemlyb
> <https://github.com/alchemistry/alchemlyb> this past week, and I'm
hoping
> to be able to put both the API proposal that it demonstrates as well as a
> gist showing how it works in practice during the coming week. Speaking to
> @orbeckst <https://github.com/orbeckst>, we also wanted to draft a white
> paper that clearly states the problem, aims, and our solution once
> consensus on the proposal is reached on the issue tracker.
>
> I'll post the issue today and follow it up with the detailed proposal +
> gist in the coming days. Apologies for the delay.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#48#
issuecomment-267096348>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/AEE31GrJ1t3TVCXCGD4g_
fx3u28FNFDRks5rICWVgaJpZM4G0Vg5>
> .
>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#48 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AH3aWgl3Y_DsdIsD6zpfrXsenr8PtC7qks5rIC5FgaJpZM4G0Vg5>
.
|
Sounds good, @dotsdl |
There is currently a lot of cruft growing in the code which is basically down to design choices.
Here some suggestions:
The parsers should be designed in a "polymorphic" fashion such that the main code does not need to know anything about the individual parser codes. This implies that a clear interface and a specification must be designed to define what data is passed in and what is passed out and in what form. It is the responsibility of the parser code to signal back what it can and cannot do. E.g., as it stands now, dhdlt should be None if MD code can't do TI or u_klt should be None if it *BAR analysis can't be done. Energies should be converted to a "canonical" form or it should be signalled back to the main code what the units are. This will relieve the main code maintainer to do code-specific clean-up operations.
Optparse should be replaced with argparse.
The globals need to go as this is really bad programming style. The problem with this is that all these variables are accessible throughout all of the module which mean that they can be modified anywhere. This can create hard to track down bugs (all you need is a second person messing around there...).
The text was updated successfully, but these errors were encountered: