
ImageObject

ChrisJamesCote edited this page Apr 29, 2020 · 18 revisions

Motivation

This package pertains to the following data analysis problem. The experimenter collects raw data in the form of microscopy images of certain fields of view. Images are collected for multiple channels, for example a nuclear stain, transmission, and several fluorescence stains (in our lab usually single-molecule RNA-FISH). Each field of view contains a few regions of interest (ROIs), such as biological cells, that the scientist wishes to analyze separately.

The image analysis itself may consist of several steps, with later calculations possibly depending on the results of earlier ones. For example, we might computationally identify RNA molecules of two different genes from spots in two different fluorescent dye channels, and later ask whether the RNA molecules of the two genes are co-localized in space. If we then go back and adjust how the RNA molecules were identified, we would like the software to know that the downstream calculation must be redone and to provide a means of performing that recalculation. We would also like to support certain kinds of human-aided image processing.

The core of rajlabimagetools is made up of:

  • A data structure, the "Image Object", one for each ROI, that stores for each analysis step what type of analysis it was and what parameters were used, what other data the analysis depends on, and what the results of the calculation are.

  • A set of tools to view and manipulate the above data structures.

The Image Object Data Structure

The most important part of Image Object is a directed acyclic graph. Here is an example graph from a typical use case:

[Figure: example Image Object graph]

If you followed the worked example, you have already seen how to produce this kind of graph, albeit a simpler one.

The graph represents the following story:

  • We segmented an ROI corresponding to a cell and stored the definition of that ROI in the ImageObject node.

  • We have raw images available in three channels: "cy", "tmr", and "dapi".

  • We ran image processing to estimate the location of the cell nucleus based on the "dapi" nuclear stain image and stored that data in the node "nuclearMask."

  • We ran computer image processing on the "cy" and "tmr" channel images to determine the location and number of single RNA spots and stored that data in nodes "cy:Spots" and "tmr:Spots."

  • We then used computer processing to assess the extent to which the cy-channel RNA spots are spatially close to the tmr-channel RNA spots, stored in the "colocalization" node.

  • Lastly, since automatic RNA detection is occasionally imperfect, we would like to be able to manually inspect the RNA detection and say whether the algorithm did a good job or not. This is a purely manual task, and the results are stored in the "cy:threshQC" and "tmr:threshQC" nodes.
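To make the story concrete, here is a Python sketch that encodes the example graph as a map from each node label to its parent labels. It also computes an order in which every node follows its dependencies. The sketch is not part of rajlabimagetools, which is MATLAB. The edges are inferred from the bullet points above, so treat them as an assumption. In particular, it assumes the channel nodes hang directly off the imageObject root and each threshQC node depends only on its spot node.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Parent labels for each node of the example graph.
# Assumption: edges inferred from the story above, not read from improc2.
EXAMPLE_PARENTS = {
    "imageObject": [],
    "cy": ["imageObject"],
    "tmr": ["imageObject"],
    "dapi": ["imageObject"],
    "nuclearMask": ["dapi"],
    "cy:Spots": ["cy"],
    "tmr:Spots": ["tmr"],
    "colocalization": ["cy:Spots", "tmr:Spots"],
    "cy:threshQC": ["cy:Spots"],
    "tmr:threshQC": ["tmr:Spots"],
}


def processing_order(parents):
    """Return a node order in which every node comes after all its parents.

    graphlib raises CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(parents).static_order())


def children_of(parents, label):
    """Labels of the nodes that list `label` as a direct parent, sorted."""
    return sorted(node for node, ps in parents.items() if label in ps)


order = processing_order(EXAMPLE_PARENTS)
```

The acyclicity requirement is what makes such an order exist. Any recalculation can proceed parents-first.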

Graph Specification

Every node in the graph contains the following:

  1. A unique label string (ex: "imageObject", "cy:Spots")

  2. connectivity information to other nodes

  3. a piece of data

There are three types of data that can be contained in a node:

  • improc2.dataNodes.ImageObjectBaseData contains information that defines the ROI in the field of view that the data structure pertains to. Contained only in the root node: "imageObject".

  • improc2.dataNodes.ChannelStackContainer contains information necessary to find the raw image associated with a particular channel, as well as fields into which the raw image data itself can be inserted. As a convenience, it also contains a field into which the ROI definition can be inserted. This data is used exclusively in the dedicated nodes representing each imaging channel. In our example they are nodes "cy", "tmr", and "dapi".

  • improc2.interfaces.NodeData: All nodes except the base image object node and the dedicated channel nodes must contain data of this type. Any NodeData must at least define the types of data it depends on, descriptions of the reasons for depending on them, and a boolean flag indicating whether or not this piece of data needs to be updated. In our example, the nodes "cy:Spots", "tmr:Spots", "nuclearMask", "cy:threshQC", "colocalization", and "tmr:threshQC" all contain NodeData in their data field.

improc2.interfaces.ProcessedData is a subtype of NodeData for data that represents and stores the results of automated image processing. In addition to everything a NodeData must define, ProcessedData must also define what it means to "run" that particular image processing step. In our example, the data in the nodes "cy:Spots", "tmr:Spots", "nuclearMask", and "colocalization" are all ProcessedData. By contrast, "cy:threshQC" and "tmr:threshQC" are plain NodeData, since they store the results of manual human inspection.
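As a rough illustration of the NodeData/ProcessedData split, the Python sketch below models the two interfaces as abstract classes. Every name in it is a hypothetical stand-in chosen for illustration: NodeData, ProcessedData, SpotCounts, ThresholdQC, dependencies, and needs_update. The real improc2 classes are MATLAB, and their properties may differ.

```python
from abc import ABC, abstractmethod


class NodeData(ABC):
    """Hypothetical analogue of improc2.interfaces.NodeData.

    Declares what it depends on (and why) and carries an update flag.
    """

    # Each entry: (required data type, reason for depending on it).
    dependencies: tuple = ()

    def __init__(self):
        self.needs_update = True


class ProcessedData(NodeData):
    """Hypothetical analogue of ProcessedData: NodeData that can also be run."""

    @abstractmethod
    def run(self, *parent_data):
        """Return a new, processed instance computed from the parents' data."""


class SpotCounts(ProcessedData):
    """Hypothetical processor: counts pixel values above a threshold."""

    dependencies = (("ChannelStack", "raw pixel intensities to threshold"),)

    def __init__(self, threshold):
        super().__init__()
        self.threshold = threshold
        self.count = None

    def run(self, channel_pixels):
        result = SpotCounts(self.threshold)
        result.count = sum(1 for v in channel_pixels if v > self.threshold)
        return result


class ThresholdQC(NodeData):
    """Manual-review data: no run method; a person fills in `is_good`."""

    dependencies = (("SpotCounts", "the spot calls being reviewed"),)

    def __init__(self):
        super().__init__()
        self.is_good = None


spots = SpotCounts(threshold=5).run([1, 6, 7, 3, 9])
```

The key design point is that only the ProcessedData branch has a `run` method. Anything the software can recompute lives there, while human judgments stay in plain NodeData.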

A description of how to define your own data and custom image processing steps is found in a separate page.

In the current implementation of rajlabimagetools, these data structures are saved to disk as MATLAB objects and are therefore directly accessible to any user. However, we strongly discourage direct manipulation of the graph. Instead, restrict yourself to the tools we provide, described below.

Low level tools to manipulate the data structure

Creating the data structure

improc2.buildImageObject creates a bare Image Object, containing only a root imageObject node and the dedicated channel nodes. It takes an ROI mask and the information needed to find the corresponding raw images (the directory and the field-of-view identifier). More commonly, you will run:

#!matlab

>> improc2.segmentGUI.SegmentGUI

from within a directory that contains images. This launches a GUI that lets you define ROIs and then creates and saves the corresponding bare Image Objects.

Image Object Tools

Once a collection of image objects has been created on disk, a set of low-level tools can be used to manipulate it. Launch them by navigating to that directory and running:

#!matlab
>> tools = improc2.launchImageObjectTools();
>> tools

tools = 

              navigator: [1x1 improc2.utils.ImageObjectArrayCollectionNavigator]
               iterator: [1x1 improc2.ImageObjectIterator]
           objectHandle: [1x1 improc2.dataNodes.HandleToGraphBasedImageObject]
          dataRegistrar: [1x1 improc2.dataNodes.ProcessorRegistrarForGraphBasedImageObject]
            annotations: [1x1 improc2.utils.UISynchronizedNamedValuesAndChoices]
    annotationItemAdder: [1x1 improc2.utils.TypeCheckedItemCollectionExtender]
                refresh: @(varargin)navigator.discardUnsavedChangesAndReload(varargin{:})

The tools manage the collection of ImageObjects associated with all ROIs in all fields of view stored in the current directory. There are two types of tools:

  • Tools that manipulate, extract, or display the data in the current ImageObject: tools.objectHandle, tools.dataRegistrar, tools.annotations, and tools.annotationItemAdder.

  • Navigation tools that change which ImageObject is current and save changes to disk: tools.navigator, tools.iterator, and tools.refresh.

Navigating an Image Object collection

Organization of the collection

There is a hierarchy of places where Image Objects live. The saved on-disk data sits at one extreme, and the object currently available for manipulation sits at the other:

  • The full collection: A version of all the image objects from every field of view and every ROI stored on disk. In many of the unit tests included with rajlabimagetools, this on-disk collection is simulated by a full collection stored in memory.

  • The current field of view or "current Array": The navigation tools load all the image objects from a particular field of view into memory. This subset of image objects is referred to as the "currentArray" but may also be referred to as the "current File".

  • The current object: The navigation tools copy one of the image objects in the current array. This copy (the "current object") is what all the manipulation tools operate on.

The navigator

As with many of the other tools, typing the navigator's full variable name at the command line prints a description of its properties and methods.

#!matlab
>> tools.navigator

ans = 

* Class:
    improc2.utils.ImageObjectArrayCollectionNavigator
* Properties:
    currentArrayNum                currentObjNum                  
    numberOfArrays                 numberOfObjectsInCurrentArray  
* Methods:
    addActionAfterMoveAttempt(p, handleToObject, funcToRunOnIt)
    addActionAfterMovingToNewArray(p, handleToObject, funcToRunOnIt)
    addActionBeforeMoveAttempt(p, handleToObject, funcToRunOnIt)
    discardUnsavedChangesAndReload(p)
    disp(p)
    saveIfNeedsSave(p)
    tryToGoToArray(p, requestedArrayNum)
    tryToGoToNextObj(p)
    tryToGoToObj(p, requestedObj)
    tryToGoToPrevObj(p)
(...)

The properties currentArrayNum and currentObjNum are, respectively, the index of the current array (field of view) and the index of the current object within that array. For casual browsing, the most important methods are:

  • tryToGoToNextObj and tryToGoToPrevObj. These methods make the next or previous object, respectively, the current object. If the current object is the last one in the current array, requesting the next object makes the navigator try to go to the first object of the next non-empty array. Similarly, if you request the previous object while at the beginning of an array, it tries to go to the last object of the previous non-empty array. If you request the next object while already at the last object of the entire collection, the current object stays the same and no error is raised.

  • tryToGoToObj(requestedObjNumber). Call this method to change the current object to being any other object within the current array.

  • tryToGoToArray(requestedArrayNum). Call this method to switch to any other array within your collection. However, if the requested array is empty, the navigator will first try to go to the next nonempty array, and if that fails, to the previous nonempty array.
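The movement rules above can be summarized in a small Python model. It is a sketch under stated assumptions, not the navigator's code:

- Indices are 0-based, whereas the MATLAB properties are 1-based.
- Arrays are represented only by their object counts.
- tryToGoToArray is assumed to land on the first object of whichever array it reaches.

```python
class NavigatorModel:
    """Hypothetical model of the navigator's movement rules (0-based).

    counts[i] is the number of objects in array (field of view) i.
    Assumes at least one array is non-empty.
    """

    def __init__(self, counts):
        self.counts = list(counts)
        self.array = next(i for i, n in enumerate(self.counts) if n > 0)
        self.obj = 0

    def _nonempty(self, indices):
        return [i for i in indices if self.counts[i] > 0]

    def try_next(self):
        if self.obj < self.counts[self.array] - 1:
            self.obj += 1
            return
        later = self._nonempty(range(self.array + 1, len(self.counts)))
        if later:                       # first object of next non-empty array
            self.array, self.obj = later[0], 0
        # otherwise: end of the collection, stay put without error

    def try_prev(self):
        if self.obj > 0:
            self.obj -= 1
            return
        earlier = self._nonempty(range(self.array - 1, -1, -1))
        if earlier:                     # last object of previous non-empty array
            self.array = earlier[0]
            self.obj = self.counts[self.array] - 1

    def try_go_to_array(self, k):
        # Requested array if non-empty, else next non-empty, else previous.
        candidates = (self._nonempty([k])
                      or self._nonempty(range(k + 1, len(self.counts)))
                      or self._nonempty(range(k - 1, -1, -1)))
        if candidates:
            self.array, self.obj = candidates[0], 0


nav = NavigatorModel([2, 0, 1])   # array 1 is empty
```

For example, starting at (0, 0), two calls to `try_next` land on (2, 0), skipping the empty array. A third call leaves the position unchanged.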

The best place to see exactly how the navigator handles edge cases is by reading and executing the navigator's unit test, located at:

#!matlab
>> edit improc2.utils.ImageObjectArrayCollectionNavigator_test

When is data saved?

Recall the three-level hierarchy: the full collection, the current array, and the current object.

Changes in the current object are saved to the current array whenever the user attempts to change which object is current. That is, they are saved whenever any of the four "tryToGo" methods is called, even if you call tryToGoToObj to go to the object that is already current.

More importantly, changes in the current array are saved to disk right before the navigator attempts to move to a different array. This occurs on any call to tryToGoToArray, even if you request the array that is already current. It also occurs on any call to tryToGoToNextObj or tryToGoToPrevObj when you are at the end or beginning of the current array, since this makes the navigator attempt to switch to a different array. The current object is always saved to the current array before the current array is saved to disk.

You can also force the changes in the current object and array to be saved to disk by calling:

#!matlab
>> tools.navigator.saveIfNeedsSave()

which, contrary to its name, always saves the current array to disk.

Discarding changes

To discard any changes to the current object or current array and reload the copy that is stored on disk, call:

#!matlab
>> tools.navigator.discardUnsavedChangesAndReload()

Alternatively, you can call tools.refresh() as a shortcut to achieve the same purpose.
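The saving and discarding rules can be modeled with three copies of the data: the on-disk collection, the current array, and the current object. The Python sketch below is a hypothetical model; its class and method names are invented for illustration. It shows why an edit reaches disk only after an array change or an explicit save.

```python
import copy


class SavingNavigatorModel:
    """Hypothetical model of disk -> current array -> current object.

    disk[i] holds the list of objects (here, plain dicts) for array i.
    """

    def __init__(self, disk):
        self.disk = disk
        self.array_num, self.obj_num = 0, 0
        self._load_current()

    def _load_current(self):
        self.current_array = copy.deepcopy(self.disk[self.array_num])
        self.current_obj = copy.deepcopy(self.current_array[self.obj_num])

    def _commit_object(self):
        # Every move attempt first copies the current object into the array.
        self.current_array[self.obj_num] = copy.deepcopy(self.current_obj)

    def save_if_needs_save(self):
        # Despite the name, always writes the current array to disk.
        self._commit_object()
        self.disk[self.array_num] = copy.deepcopy(self.current_array)

    def try_go_to_obj(self, k):
        self._commit_object()                  # object -> array only
        self.obj_num = k
        self.current_obj = copy.deepcopy(self.current_array[k])

    def try_go_to_array(self, i):
        self.save_if_needs_save()              # object -> array -> disk
        self.array_num, self.obj_num = i, 0
        self._load_current()

    def discard_unsaved_changes_and_reload(self):
        self._load_current()                   # reload from the disk copy


disk = [[{"qc": None}, {"qc": None}], [{"qc": None}]]
nav = SavingNavigatorModel(disk)
nav.current_obj["qc"] = True       # edit object 0 of array 0
nav.try_go_to_obj(1)               # edit is now in the array, not on disk
in_array_only = (nav.current_array[0]["qc"], disk[0][0]["qc"])
nav.try_go_to_array(1)             # array 0 is written to disk
```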

Adding events

When building a larger application such as a GUI using these tools, it is often necessary to perform certain actions when users navigate between objects, such as refreshing windows or setting a flag that the object has been manually reviewed.

#!matlab
tools.navigator.addActionAfterMoveAttempt(...)
tools.navigator.addActionAfterMovingToNewArray(...)
tools.navigator.addActionBeforeMoveAttempt(...)

Each method takes as its first argument a reference to a MATLAB handle object, and as its second argument a handle to a method or function to run on it. In a typical situation you may have an object windowDisplayer that draws a window through its method windowDisplayer.redraw(). To have this redraw happen automatically whenever the navigator switches the current object, call:

#!matlab
>> tools.navigator.addActionAfterMoveAttempt(windowDisplayer, @redraw)
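The callback pattern can be sketched in Python as follows. HookedNavigator and WindowDisplayer are hypothetical. The sketch assumes, based only on the method name, that an "after move attempt" action fires even when the move fails, for example at the end of the collection.

```python
class WindowDisplayer:
    """Hypothetical GUI stand-in whose redraw() would repaint a window."""

    def __init__(self):
        self.redraw_count = 0

    def redraw(self):
        self.redraw_count += 1


class HookedNavigator:
    """Hypothetical navigator over one array of n_objects, with hooks.

    Each hook is stored as (target, func) and fired as func(target),
    analogous to MATLAB evaluating @redraw on windowDisplayer.
    """

    def __init__(self, n_objects):
        self.n_objects = n_objects
        self.obj = 0
        self._before = []
        self._after = []

    def add_action_before_move_attempt(self, target, func):
        self._before.append((target, func))

    def add_action_after_move_attempt(self, target, func):
        self._after.append((target, func))

    def try_go_to_next_obj(self):
        for target, func in self._before:
            func(target)
        if self.obj < self.n_objects - 1:   # at the end: stay put, no error
            self.obj += 1
        for target, func in self._after:    # assumed to fire even if no move
            func(target)


display = WindowDisplayer()
nav = HookedNavigator(n_objects=2)
nav.add_action_after_move_attempt(display, WindowDisplayer.redraw)
nav.try_go_to_next_obj()   # moves to object 1, then redraws
nav.try_go_to_next_obj()   # already at the end, still redraws
```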

Iterating through an Image Object Collection

For batch processing it is necessary to repeat some manipulation for every image object in the collection. The iterator makes that simple. Every use of the iterator must follow this template:

#!matlab
tools.iterator.goToFirstObject()

while tools.iterator.continueIteration
    % ... manipulate the current object here, e.g. via tools.objectHandle ...
    tools.iterator.goToNextObject()
end

This will process every Image Object in the entire collection.
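The iterator's contract can be mirrored in Python: start at the first object, loop while iteration should continue, then advance. This is a hypothetical model that only tracks (array, object) positions. In the real toolkit the iterator drives the navigator, so saving follows the navigator's rules.

```python
class IteratorModel:
    """Hypothetical model of the iterator over (array, object) positions.

    counts[i] is the number of objects in array i; empty arrays contribute
    no positions, so the loop skips them automatically.
    """

    def __init__(self, counts):
        self._positions = [(a, o) for a, n in enumerate(counts) for o in range(n)]
        self._index = 0

    def go_to_first_object(self):
        self._index = 0

    @property
    def continue_iteration(self):
        return self._index < len(self._positions)

    @property
    def current(self):
        return self._positions[self._index]

    def go_to_next_object(self):
        self._index += 1


# Same shape as the MATLAB template above.
iterator = IteratorModel([2, 0, 1])
visited = []
iterator.go_to_first_object()
while iterator.continue_iteration:
    visited.append(iterator.current)    # "manipulate the current object"
    iterator.go_to_next_object()
```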

Advanced usage notes: mixing iterator and navigator use. It is rare for an application to need both the iterator and the navigator. If you do mix them, keep in mind that an instance of the "tools" tracks only one current object and one current array. So if you call goToFirstObject on the iterator, the navigator also moves to the first object in the first array of the collection. The rules for when data gets saved are identical to those of the navigator: any time the iterator attempts to change the current array, the existing array's data is saved to disk. (In fact, the iterator is simply a wrapper that asks the navigator to step through every object from first to last.)

Adding data to the Image Object graph

The Data Registrar is the tool to add image processing data, both automated (ProcessedData) and manual (only NodeData), to the Image Object. Do so by calling the registerNewData method:

#!matlab
>> tools.dataRegistrar.registerNewData(dataToAdd, parentNodeLabels, newNodeLabel)
  • dataToAdd : An object of type improc2.interfaces.NodeData

  • parentNodeLabels : A cell array of labels of existing nodes indicating where to find the dependencies of the data the user wishes to add. If your data does not depend on any other nodes this will be an empty cell array.

  • newNodeLabel : A string that will be the label of the new data node.
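Conceptually, registering data appends a node whose parents must already exist. The Python sketch below is a hypothetical model, not the registrar's implementation. It requires exact parent labels, whereas the real registrar accepts more flexible specifications. It also assumes that channel nodes are children of the imageObject root.

```python
class GraphModel:
    """Hypothetical Image Object graph: label -> (data, parent labels).

    Python dicts preserve insertion order, mirroring the order in which
    nodes were added.
    """

    def __init__(self, channel_names):
        self.nodes = {"imageObject": (None, [])}
        for channel in channel_names:
            self.nodes[channel] = (None, ["imageObject"])

    def register_new_data(self, data, parent_labels, new_label):
        if new_label in self.nodes:
            raise ValueError(f"a node labelled {new_label!r} already exists")
        missing = [p for p in parent_labels if p not in self.nodes]
        if missing:
            raise KeyError(f"unknown parent node(s): {missing}")
        # Parents already exist, so adding this node cannot create a cycle.
        self.nodes[new_label] = (data, list(parent_labels))

    def has_data(self, label):
        return label in self.nodes


graph = GraphModel(["cy", "tmr", "dapi"])
graph.register_new_data("spot data", ["cy"], "cy:Spots")
graph.register_new_data("QC result", ["cy:Spots"], "cy:threshQC")
```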

There is some flexibility in how the parentNodeLabels can be specified. For details and examples read and execute the unit test:

#!matlab
>> edit improc2.dataNodes.ProcessorRegistrarForGraphBasedImageObject_test

You can confirm that your new node was added either visually, by running:

#!matlab
>> tools.objectHandle.view();

or programmatically, by checking whether

#!matlab
>> tools.objectHandle.hasData(newNodeLabel)

returns true.

Getting data from the Image Object graph

To get data from a node in the Image Object graph use:

#!matlab
>> data = tools.objectHandle.getData(nodeLabel)
>> data = tools.objectHandle.getData(nodeLabel, dataClassName)

If called with just the nodeLabel argument, the getter fetches the data in the node with exactly that label, as long as that node contains NodeData. However, if nodeLabel names a dedicated channel node or the imageObject root node, the getter instead looks below that node for a node containing ProcessedData. If more than one ProcessedData node is found, the getter returns the one with the fewest degrees of separation (the shallowest) from the starting node. If there is more than one such shallowest node, the getter fails with an error.

If called with nodeLabel as the first argument and a class name string as the second argument, the getter starts at the node labeled nodeLabel. It searches that node and its descendants for a node whose data is of the specified class. The getter returns the shallowest match, or fails if there is more than one shallowest match.
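The shallowest-match rule is a level-by-level breadth-first search. The Python sketch below applies it to the example graph. The data class names (SpotsData, MaskData, and so on) and the edges are hypothetical. In this model, searching from "imageObject" for any ProcessedData fails, because three processed nodes sit at the same depth.

```python
# Hypothetical child lists and data class names for the example graph.
CHILDREN = {
    "imageObject": ["cy", "tmr", "dapi"],
    "cy": ["cy:Spots"],
    "tmr": ["tmr:Spots"],
    "dapi": ["nuclearMask"],
    "cy:Spots": ["colocalization", "cy:threshQC"],
    "tmr:Spots": ["colocalization", "tmr:threshQC"],
}
DATA_CLASS = {
    "imageObject": "ImageObjectBaseData",
    "cy": "ChannelStackContainer", "tmr": "ChannelStackContainer",
    "dapi": "ChannelStackContainer",
    "cy:Spots": "SpotsData", "tmr:Spots": "SpotsData",
    "nuclearMask": "MaskData", "colocalization": "ColocData",
    "cy:threshQC": "ThreshQCData", "tmr:threshQC": "ThreshQCData",
}
PROCESSED = {"SpotsData", "MaskData", "ColocData"}


def is_processed(data_class):
    return data_class in PROCESSED


def find_shallowest(start, wanted):
    """Return the unique shallowest node at or below `start` whose data class
    satisfies `wanted`; raise LookupError if there is no match or a tie."""
    frontier, seen = [start], {start}
    while frontier:
        matches = [n for n in frontier if wanted(DATA_CLASS[n])]
        if len(matches) == 1:
            return matches[0]
        if matches:
            raise LookupError(f"ambiguous shallowest matches: {sorted(matches)}")
        next_level = []
        for n in frontier:                 # collect the next level down
            for c in CHILDREN.get(n, []):
                if c not in seen:
                    seen.add(c)
                    next_level.append(c)
        frontier = next_level
    raise LookupError(f"no matching node at or below {start!r}")
```

Searching from "cy" finds "cy:Spots" at depth one. Searching from "imageObject" for the class "ColocData" finds "colocalization" at depth three, even though that node is reachable along two paths.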

For details and examples, read and execute the unit test:

#!matlab
>> edit improc2.dataNodes.HandleToGraphBasedImageObject_test

Setting data in an existing node of the Image Object graph

Suppose you've taken data out of a node with the getter and modified it in some way. To put the modified data back in the node use:

#!matlab
>> tools.objectHandle.setData(modifiedData, nodeLabel)
>> tools.objectHandle.setData(modifiedData, nodeLabel, dataClassName)

The setter uses the same logic as the getter to figure out which node to work on using the nodeLabel and the class name optional argument.

The modified data must be of the same class as the data that is already in the node or the setter will fail with an error.

More importantly, any time the data in a node is set, all of its descendant nodes are flagged for updating. Every NodeData has a needsUpdate field, and this field is set to true in all descendants of the modified node.
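The setter has two rules: the class must stay the same, and descendants are flagged. Both can be sketched in Python over a plain dictionary model of the graph. The node layout below is hypothetical and covers only the spot, colocalization, and QC nodes from the example. The sketch leaves the modified node's own flag untouched, since the text above only describes what happens to its descendants.

```python
from collections import deque


def set_data(nodes, children, label, new_data):
    """Replace the data in `label`, then flag all of its descendants.

    nodes: label -> {"data": ..., "needs_update": bool}
    children: label -> list of child labels
    """
    old = nodes[label]["data"]
    if type(new_data) is not type(old):        # setter refuses a class change
        raise TypeError(
            f"{label!r} holds {type(old).__name__}, got {type(new_data).__name__}")
    nodes[label]["data"] = new_data
    queue, seen = deque(children.get(label, [])), set()
    while queue:                               # breadth-first over descendants
        n = queue.popleft()
        if n not in seen:
            seen.add(n)
            nodes[n]["needs_update"] = True
            queue.extend(children.get(n, []))


CHILDREN = {
    "cy:Spots": ["colocalization", "cy:threshQC"],
    "tmr:Spots": ["colocalization", "tmr:threshQC"],
}
nodes = {label: {"data": 0, "needs_update": False}
         for label in ["cy:Spots", "tmr:Spots", "colocalization",
                       "cy:threshQC", "tmr:threshQC"]}
set_data(nodes, CHILDREN, "cy:Spots", 42)      # e.g. a new spot count
```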

For details and examples, read and execute the unit test:

#!matlab
>> edit improc2.dataNodes.HandleToGraphBasedImageObject_test

Running an automated image processing step

If a node stores the results of automated image processing (has data of type ProcessedData), that image processing step can be run as follows:

#!matlab
>> tools.objectHandle.runProcessor(imageProviders, nodeLabel)
>> tools.objectHandle.runProcessor(imageProviders, nodeLabel, dataClassName)

The nodeLabel and optional arguments are used in exactly the same way as in the getter and setter to determine which node the user wants to run.

The imageProviders should be a ChannelArray of image providers, one for each channel, that are used to load raw images from disk into any ChannelStackContainer that may be needed to run the image processing step. They can be instantiated as follows:

#!matlab
>> % pathToImages: the directory that holds the raw image files
>> imageProviders = dentist.utils.makeFilledChannelArray(...
    tools.objectHandle.channelNames, ...
    @(channelName) improc2.ImageObjectCroppedStkProvider(pathToImages));

If running the image processing step does not require raw images, the imageProviders input is ignored. You must still supply something, though; imageProviders = [] will suffice.

When you run an image processing step, the tool does the following:

  1. Fetches the node to run.

  2. Finds all immediate parents of the node and fetches their data. If any of these are flagged as themselves needing an update, the tool fails with an error.

  3. If any of the parent data is a ChannelStackContainer, fills it with the cropped image using the user-supplied image providers, and also fills in the cropped ROI mask field.

  4. Calls the run method of the data to process, passing in all of its parents' data as additional arguments, and obtains a new, processed data object.

  5. Flags the newly processed data as up to date (sets needsUpdate = false).

  6. Puts the newly processed data back into the node it came from.

  7. Flags all descendants of that node for updating.
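The seven steps can be traced in a compact Python model. Everything here is hypothetical. ThresholdCount stands in for a ProcessedData class, and load_image stands in for the image providers. The ROI-mask filling of step 3 is omitted.

```python
from collections import deque


class ThresholdCount:
    """Hypothetical ProcessedData: counts pixels above a threshold."""

    def __init__(self, threshold, count=None):
        self.threshold = threshold
        self.count = count

    def run(self, pixels):
        # Returns a new, processed object; the original is left untouched.
        return ThresholdCount(self.threshold, sum(p > self.threshold for p in pixels))


def flag_descendants(nodes, children, label):
    queue = deque(children.get(label, []))
    while queue:
        n = queue.popleft()
        nodes[n]["needs_update"] = True
        queue.extend(children.get(n, []))


def run_processor(nodes, parents, children, label, load_image):
    node = nodes[label]                                  # 1. fetch the node
    inputs = []
    for p in parents[label]:                             # 2. fetch parents
        if nodes[p]["needs_update"]:
            raise RuntimeError(f"parent {p!r} is out of date")
        if nodes[p]["kind"] == "channel":                # 3. load raw image
            inputs.append(load_image(p))
        else:
            inputs.append(nodes[p]["data"])
    result = node["data"].run(*inputs)                   # 4. run
    node["needs_update"] = False                         # 5. mark up to date
    node["data"] = result                                # 6. store the result
    flag_descendants(nodes, children, label)             # 7. flag descendants
    return result


nodes = {
    "cy": {"kind": "channel", "data": None, "needs_update": False},
    "cy:Spots": {"kind": "processed", "data": ThresholdCount(5), "needs_update": True},
    "cy:threshQC": {"kind": "manual", "data": None, "needs_update": False},
}
parents = {"cy": [], "cy:Spots": ["cy"], "cy:threshQC": ["cy:Spots"]}
children = {"cy": ["cy:Spots"], "cy:Spots": ["cy:threshQC"]}
raw_images = {"cy": [1, 6, 7, 3, 9]}
result = run_processor(nodes, parents, children, "cy:Spots", raw_images.__getitem__)
```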

As before, for details and examples, read and execute the unit test:

#!matlab
>> edit improc2.dataNodes.HandleToGraphBasedImageObject_test

Running all processors for data that needs updating

To have the tools walk through the current Image Object graph and run every processor for data that needs updating, run:

#!matlab
>> tools.objectHandle.updateAllProcessedData(imageProviders)

The imageProviders input has the same requirements as when it is used in the runProcessor method described previously.

The updater walks through every node in the data graph in the order in which the nodes were added. A node can only be registered once its parents exist, so this order always visits parents before their children. If a node has ProcessedData and is flagged for updating, the updater runs the corresponding processor, as long as all of its parent nodes are up to date. This guarantees that every processable node is updated, with one exception. Descendants of non-processable nodes (such as human-inspection nodes) are skipped when those non-processable nodes are themselves flagged for updating. The user must update such nodes by other means and then rerun the updateAllProcessedData method.
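The walk can be modeled in Python as below. The node called qcSummary is hypothetical: a processed node fed by a manual QC node, added to show the one case the updater cannot resolve. The rest follows the example graph.

```python
def node(kind, needs_update=False):
    return {"kind": kind, "needs_update": needs_update}


# Insertion order matters: every node is added after its parents.
nodes = {
    "cy": node("channel"),
    "tmr": node("channel"),
    "cy:Spots": node("processed", needs_update=True),
    "tmr:Spots": node("processed"),
    "colocalization": node("processed"),
    "cy:threshQC": node("manual"),
    "qcSummary": node("processed"),        # hypothetical node fed by manual QC
}
parents = {
    "cy": [], "tmr": [],
    "cy:Spots": ["cy"], "tmr:Spots": ["tmr"],
    "colocalization": ["cy:Spots", "tmr:Spots"],
    "cy:threshQC": ["cy:Spots"],
    "qcSummary": ["cy:threshQC"],
}
children = {}
for child, ps in parents.items():
    for p in ps:
        children.setdefault(p, []).append(child)


def flag_descendants(label):
    stack = list(children.get(label, []))
    while stack:
        n = stack.pop()
        nodes[n]["needs_update"] = True
        stack.extend(children.get(n, []))


def update_all_processed(run_step):
    """Walk nodes in insertion order and run every out-of-date processed node
    whose parents are all up to date. Returns the labels left stale."""
    blocked = []
    for label, info in nodes.items():
        if info["kind"] != "processed" or not info["needs_update"]:
            continue
        if any(nodes[p]["needs_update"] for p in parents[label]):
            blocked.append(label)          # waits on an out-of-date parent
            continue
        run_step(label)
        info["needs_update"] = False
        flag_descendants(label)
    return blocked


ran = []
blocked = update_all_processed(ran.append)
```

Running "cy:Spots" flags "colocalization", which is then visited and run in the same pass. "qcSummary" stays stale because the manual "cy:threshQC" node is still flagged.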

Other Things you can get out of the Image Object Handle

The objectHandle in the tools offers a few other potentially useful methods and properties. Type its name in the command line to get more info.

#!matlab
>> tools.objectHandle

ans = 

* Class:
    improc2.dataNodes.HandleToGraphBasedImageObject
* Properties:
    channelNames  
* Methods:
    disp(p)
    bbox getBoundingBox(p)
    objMask getCroppedMask(p)
    [pData, foundNodeLabel] getData(p, nodeLabel, dataClassName)
    dirPath getImageDirPath(p)
    fileName getImageFileName(p, channelName)
    imFileMask getMask(p)
    metadata getMetaData(p)
    boolean hasData(p, nodeLabel, dataClassName)
    runProcessor(p, imageProviderChannelArray, nodeLabel, dataClassName)
    setData(p, pData, nodeLabel, dataClassName)
    updateAllProcessedData(p, imageProviderChannelArray)
    h view(p)

Some of the properties and methods not covered above include:

#!matlab
>> channelNames = tools.objectHandle.channelNames;

returns a cell array of the names of the image channels associated with this image object.

#!matlab
>> imFileMask = tools.objectHandle.getMask();
>> objMask = tools.objectHandle.getCroppedMask();
>> bbox = tools.objectHandle.getBoundingBox();

The first method returns the binary mask that defines this ImageObject's ROI within the full field of view of the images it comes from. The second method returns a cropped version of the ROI. The last method returns an array of four numbers defining the cropping rectangle.
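For intuition about how the three mask-related outputs relate, here is a Python sketch that derives a bounding box and a cropped mask from a full-frame binary mask. The box is returned as (first_row, first_col, n_rows, n_cols) with 0-based indices. That ordering is an assumption for illustration; the four numbers returned by getBoundingBox may be ordered differently.

```python
def bounding_box(mask):
    """Smallest rectangle containing every nonzero pixel of a 2-D mask
    (list of rows). Returns (first_row, first_col, n_rows, n_cols), 0-based;
    this ordering is an assumption, not improc2's format."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("mask is empty")
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1] - rows[0] + 1, cols[-1] - cols[0] + 1


def cropped_mask(mask):
    """The part of the mask that lies inside its bounding box."""
    r0, c0, n_rows, n_cols = bounding_box(mask)
    return [row[c0:c0 + n_cols] for row in mask[r0:r0 + n_rows]]


FULL_MASK = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
bbox = bounding_box(FULL_MASK)
obj_mask = cropped_mask(FULL_MASK)
```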

#!matlab
>> metaData = tools.objectHandle.getMetaData();

returns the Image Object's "meta data" which for the moment has no specified format or contents.

#!matlab
>> dirPath = tools.objectHandle.getImageDirPath();
>> fileName = tools.objectHandle.getImageFileName(channelName);

When an ImageObject is created, it stores the directory path of the raw images it is derived from and the file name of the raw image for each imaging channel. The two methods above give access to that path and those file names.