Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change nucleicacids results dict #3744

Closed
orbeckst opened this issue Jun 30, 2022 · 1 comment · Fixed by #3958
Closed

change nucleicacids results dict #3744

orbeckst opened this issue Jun 30, 2022 · 1 comment · Fixed by #3958

Comments

@orbeckst
Copy link
Member

Is your feature request related to a problem?

The new experimental nucleicacids analysis module contains the

class NucPairDist(AnalysisBase):
class as its base class. It presents results in a Results dict. The content is different from what other AnalysisBase-based classes use in that

  • it contains "times" which duplicates the .times attribute coming from AnalysisBase
  • has integers as additional keys where "first index is selection, second index is time"

Specifically, for each of the selections, a dict is stored

for i in range(self._n_sel):
self.results[i] = np.array(self._res_dict[i])
and res_dict comes from https://github.com/MDAnalysis/mdanalysis/blob/c01d82bb86b480c8b2ae87f36497e7fc0a726c84/package/MDAnalysis/analysis/nucleicacids.py#L128-L129 where dist is an array of distances.

The "selections" are the individual base pairs so in order to query the distance of base pair j at time frame t I would use results[j][t].

Please correct me, @ALescoulie , if I misrepresented anything here.

Describe the solution you'd like

@IAlibay 's suggestion (from #3735 (review)) :

would it make sense to consider switching this to instead be results.pair_distances (maybe even making it a pairs x times 2D ndarray?

Additionally, I suggest to transpose so that the first axis corresponds to time frame and the second axis to pair index:

results.pair_distances[:, j]    # time series of pair j
results.pair_distances[-1]      # all distances at the last time frame

The "one observation per row" layout is canonical and also used e.g. in the RMSD results.

Describe alternatives you've considered

Don't change anything... but this makes this module behave and feel different from the others and tools such as mdacli might have a hard time working with it — comments by @PicoCentauri and @joaomcteixeira especially welcome on how to best structure results.

Additional context

  • Nuclinfo Major Pair and Minor Pair overhaul #3735 (review)
  • PR Higher Performance nuclinfo #3611 where nucleicacids was implemented
  • I am willing to ignore semver here and change Results and anything else that needs changing in a breaking manner between 2.2 and 2.3, based on the fact that the module is in its infancy and is effectively experimental and likely has not had much uptake yet, so we're not going to hurt many users. It's much more important to define the API well (and not be bogged down with deprecations and crud code).
@orbeckst orbeckst added this to the 2.3.0 milestone Jun 30, 2022
@orbeckst orbeckst changed the title change nuclearacids results dict change nucleicacids results dict Jul 5, 2022
@IAlibay
Copy link
Member

IAlibay commented Aug 27, 2022

I'm moving this to target 2.4.0 instead.

@IAlibay IAlibay modified the milestones: 2.3.0, 2.4.0 Aug 27, 2022
IAlibay added a commit that referenced this issue Dec 16, 2022
…ir_distances [Issue 3744] (#3958)

Fixes #3744
* Deprecated `times` and indices based results
* Remove populating of times separately from the main class
* Add comments to remove deprecations in 2.5.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants