Remove derivation of "circular" variable in tracing #1043

Open
ns-rse opened this issue Dec 10, 2024 · 1 comment

@ns-rse
Collaborator

ns-rse commented Dec 10, 2024

In `tracing/ordered_tracing.py` we call `tracing_stats.pop("circular")` so that the old column indicating whether a molecule is linear or circular is not returned. Instead, the number of grain endpoints (`grain_endpoints`) is reported; if this is 0, the molecule is circular (a closed loop).

Rather than calling `pop()` on the data frame, we should remove the code that creates the column in the first place, as there is no point spending time calculating something if it's not subsequently used.

The initial derivation of this is done by the `tracing.ordered_tracing.linear_or_circular()` function, and it is needed because the value is used to decide whether `tracing.tracingfuncs.reorderTrace.circularTrace()` or `tracing.tracingfuncs.reorderTrace.linearTrace()` is called.

However, at some point this value is added to the dataset that is returned (I think it might be the grainstats dictionaries, but I haven't narrowed it down yet).

@ns-rse ns-rse added the v2.3.0 label Dec 16, 2024
@ns-rse ns-rse added this to the v2.3.0 milestone Dec 16, 2024
@ns-rse ns-rse added v2.4.0 and removed v2.3.0 labels Dec 17, 2024
@ns-rse ns-rse added v2.3.0 and removed v2.4.0 labels Dec 17, 2024
@ns-rse
Collaborator Author

ns-rse commented Dec 17, 2024

I've been looking through this and there are two places where the `topostats.tracing.linear_or_circular()` function is called:

  1. `run_nodestats_tracing()`
  2. `OrderedTraceTopostats().run_topostats_tracing()`

In both cases the result populates the `"circular"` key of a dictionary (`self.mol_tracing_stats`), which is initialised as `"circular": None` (see line 67 for the `OrderedTraceNodestats` class and line 730 for the `OrderedTraceTopostats` class).

It's simple enough to remove these from the dictionaries, but this has consequences further down the processing pipeline when we come to splining, as that step requires knowing whether the molecule is linear or circular.

My initial thought was that we could potentially drop in the `grain_endpoints` value here, because any non-zero value evaluates to `True` and 0 evaluates to `False`...

```python
test = [0, 1, 4]

for x in test:
    if x:
        print(f"{x} evaluates to True")
    else:
        print(f"{x} evaluates to False")
```

```
0 evaluates to False
1 evaluates to True
4 evaluates to True
```

...and so we wouldn't have to change any of the logic that determines whether a molecule is processed as linear or circular.
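One detail worth checking before substituting `grain_endpoints` for `circular`: since 0 endpoints means the molecule is circular, the truth value of `grain_endpoints` is the *inverse* of `circular`. A minimal sketch of the mapping is below; the helper `is_circular` is hypothetical and not part of TopoStats. It suggests that an `if circular:` branch would become `if not grain_endpoints:` (or the two branches would be swapped).

```python
def is_circular(grain_endpoints: int) -> bool:
    """Hypothetical helper: a trace with no endpoints is a closed loop."""
    # bool(grain_endpoints) is True for linear molecules, so circular is its negation
    return not grain_endpoints


for endpoints in [0, 1, 4]:
    print(f"grain_endpoints={endpoints}: circular={is_circular(endpoints)}")
```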

However, after checking where `grain_endpoints` is calculated, it's not within any of the classes or methods of `ordered_tracing`. It's actually calculated earlier, by the `disordered_tracing.trace_image_disordered()` function, from the `conv_pruned_skeleton` property.

I think we might therefore be able to add

```python
disordered_trace_crop_data[f"grain_{cropped_image_index}"]["grain_endpoints"] = np.int32((conv_pruned_skeleton == 2).sum())
```
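The expression `conv_pruned_skeleton == 2` relies on endpoint pixels in the convolved skeleton carrying the value 2. As a minimal, numpy-only sketch of that idea (the functions `label_skeleton` and `count_endpoints` are hypothetical and are not TopoStats' own implementation), each skeleton pixel can be labelled by counting its 8-connected neighbours: exactly one neighbour marks an endpoint (2), more than two a junction (3), otherwise it stays 1.

```python
import numpy as np


def label_skeleton(skeleton: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: 0 = background, 1 = skeleton, 2 = endpoint, 3 = junction."""
    skeleton = (skeleton > 0).astype(np.int32)
    height, width = skeleton.shape
    padded = np.pad(skeleton, 1)  # zero border so edge pixels have 8 neighbours
    # Count 8-connected neighbours by summing the eight shifted copies of the skeleton
    neighbours = sum(
        padded[1 + dy : 1 + dy + height, 1 + dx : 1 + dx + width]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    labelled = skeleton.copy()
    labelled[(skeleton == 1) & (neighbours == 1)] = 2
    labelled[(skeleton == 1) & (neighbours > 2)] = 3
    return labelled


def count_endpoints(skeleton: np.ndarray) -> np.int32:
    # Mirrors the proposed np.int32((conv_pruned_skeleton == 2).sum())
    return np.int32((label_skeleton(skeleton) == 2).sum())


# A straight (linear) trace
line = np.zeros((5, 7), dtype=np.int32)
line[2, 1:6] = 1

# A diamond-shaped closed loop
loop = np.zeros((5, 5), dtype=np.int32)
for r, c in [(0, 2), (1, 1), (1, 3), (2, 0), (2, 4), (3, 1), (3, 3), (4, 2)]:
    loop[r, c] = 1

print(count_endpoints(line), count_endpoints(loop))
```

Here the straight trace has two endpoints and the loop has none, which is the distinction `linear_or_circular()` currently provides.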

...and use this value, which is passed into both `run_nodestats()` and `run_ordered_tracing()` via `topostats_object["disordered_traces"] = disordered_traces_data`. We could then use `disordered_tracing_data["grain_endpoints"]` instead of calling `linear_or_circular()` to determine the shape.

We would also have to make sure that the `ordered_tracing` dictionary returned by `run_ordered_tracing()` contains this value, since it is passed on to `run_splining()`.
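The proposed data flow can be sketched with plain dictionaries. Everything here is hypothetical: `ordered_tracing_sketch` and `splining_sketch` only stand in for `run_ordered_tracing()` and `run_splining()`, the per-grain layout is assumed from the `grain_{cropped_image_index}` keys in the snippet above, and the "closed"/"open" spline labels are an assumption about how splining treats circular and linear molecules. The point is that `grain_endpoints` is computed once during disordered tracing and carried forward, so neither later step needs `linear_or_circular()`.

```python
def ordered_tracing_sketch(disordered_traces: dict) -> dict:
    """Hypothetical stand-in for run_ordered_tracing(): choose the reordering from grain_endpoints."""
    ordered = {}
    for grain, data in disordered_traces.items():
        endpoints = data["grain_endpoints"]
        ordered[grain] = {
            # Carry the value forward so splining can use it too
            "grain_endpoints": endpoints,
            "reorder": "circular" if endpoints == 0 else "linear",
        }
    return ordered


def splining_sketch(ordered_traces: dict) -> dict:
    """Hypothetical stand-in for run_splining(): assumed closed splines for circular molecules."""
    return {
        grain: "closed" if data["grain_endpoints"] == 0 else "open"
        for grain, data in ordered_traces.items()
    }


disordered = {"grain_0": {"grain_endpoints": 0}, "grain_1": {"grain_endpoints": 2}}
ordered = ordered_tracing_sketch(disordered)
splines = splining_sketch(ordered)
print(ordered)
print(splines)
```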

I think this is doable, but it won't get done today: it will take me more than an hour, and I have activities planned for this evening, so I won't be putting in extra hours.

With regard to the v2.3.0 release, this doesn't affect the output because we already return `grain_endpoints` rather than `circular` in the returned data frames. The change would purely remove calculations that we don't need, so it makes no difference to the end user.

@ns-rse ns-rse added v2.4.0 and removed v2.3.0 labels Dec 17, 2024
@ns-rse ns-rse removed this from the v2.3.0 milestone Dec 17, 2024