API reference
celltraj.trajectory
Trajectory modeling class for live-cell imaging.
- class celltraj.trajectory.Trajectory(h5filename=None, data_list=None)[source]
Bases:
object
A toolset for single-cell trajectory modeling. See:
References
Copperman, Jeremy, Sean M. Gross, Young Hwan Chang, Laura M. Heiser, and Daniel M. Zuckerman. “Morphodynamical cell state description via live-cell imaging trajectory embedding.” Communications Biology 6, no. 1 (2023): 484.
Copperman, Jeremy, Ian C. Mclean, Sean M. Gross, Young Hwan Chang, Daniel M. Zuckerman, and Laura M. Heiser. “Single-cell morphodynamical trajectories enable prediction of gene expression accompanying cell state change.” bioRxiv (2024): 2024-01.
- __init__(h5filename=None, data_list=None)[source]
Initializes a Trajectory object, optionally loading metadata and additional data from an HDF5 file.
This constructor sets the HDF5 filename and attempts to load metadata associated with the file. If the file is present, it reads the metadata from a predefined group. If data_list is provided, it will also attempt to load additional data specified in the list from the HDF5 file. Errors during metadata or data loading are caught and logged. Future updates should include better commenting and organizational improvements of class attributes.
- Parameters:
h5filename (
str
, optional) – The path to the HDF5 file from which to load the metadata. If not provided, the instance will be initialized without loading metadata.data_list (
list
ofstr
, optional) – A list of data group paths within the HDF5 file to be loaded along with the metadata. Each entry in the list should specify a path to a dataset or group within the HDF5 file that contains data relevant to the trajectory analysis.
Notes
TODO: - Improve documentation of class attributes. - Reorganize attributes into a more meaningful structure.
Examples
>>> traj = Trajectory('path/to/your/hdf5file.h5') loading path/to/your/hdf5file.h5
If an HDF5 file and data list are provided: >>> data_groups = [‘/group1/data’, ‘/group2/data’] >>> traj = Trajectory(‘path/to/your/hdf5file.h5’, data_list=data_groups) loading path/to/your/hdf5file.h5
- get_Xtraj_celltrajectory(cell_traj, Xtraj=None, traj=None)[source]
Retrieves trajectory segments for a specific cell trajectory from a larger set of trajectory data. This method matches segments of the cell trajectory with those in a pre-computed set of trajectories and extracts the corresponding features or data points.
- Parameters:
cell_traj (
ndarray
) – An array containing indices of a cell’s trajectory over time.Xtraj (
ndarray
, optional) – The trajectory feature matrix from which to extract data. If not provided, the method uses the instance’s attribute Xtraj.traj (
ndarray
, optional) – A matrix of precomputed trajectories used for matching against cell_traj. If not provided, the method uses the instance’s attribute traj.
- Returns:
xt (
ndarray
) – A subset of Xtraj corresponding to the segments of cell_traj that match segments in traj.inds_traj (
ndarray
) – Indices within traj where matches were found, indicating which rows in Xtraj were selected.
- Raises:
ValueError – If the length of cell_traj is less than the length used for trajectories in traj (trajl), making it impossible to match any trajectory segments.
Examples
>>> traj.get_unique_trajectories() >>> cell_trajectory = traj.get_cell_trajectory(10) >>> features, indices = traj.get_Xtraj_celltrajectory(cell_trajectory)
Notes
The method requires trajl, the length of the trajectory segments, to be set either as a class
attribute or passed explicitly. This length determines how the segments are compared for matching. - This function is particularly useful for analyzing time-series data or features extracted from trajectories, allowing for detailed analysis specific to a single cell’s path through time.
- get_alpha(i1, i2)[source]
Calculates the alignment measure, alpha, between two cells identified by their indices. This measure reflects how the movement direction of one cell relates to the direction of the vector connecting the two cells, essentially quantifying the relative motion along the axis of separation.
- Parameters:
i1 (
int
) – Index of the first cell.i2 (
int
) – Index of the second cell.
- Returns:
alpha – The alignment measure between the two cells. This value ranges from -1 to 1, where 1 indicates that the cells are moving directly towards each other, -1 indicates they are moving directly away from each other, and 0 indicates orthogonal movement directions. Returns NaN if the calculation fails (e.g., due to division by zero when normalizing zero-length vectors).
- Return type:
float
- Raises:
Exception – If an error occurs during the trajectory retrieval or normalization process, likely due to missing data or incorrect indices.
Examples
>>> traj = Trajectory('path/to/data.h5') >>> alignment = traj.get_alpha(10, 15) # This computes the alignment measure between cells at index 10 and 15 based on their last movements.
Notes
The function computes the movement vectors of both cells from their previous positions in their
respective trajectories and uses these vectors to determine their alignment relative to the vector connecting the two cells at their current positions.
- get_beta(i1, i2)[source]
Calculates the cosine of the angle (beta) between the movement directions of two cells. This measure quantifies the directional similarity or alignment between two moving cells, with values ranging from -1 to 1.
- Parameters:
i1 (
int
) – Index of the first cell.i2 (
int
) – Index of the second cell.
- Returns:
beta – The cosine of the angle between the movement vectors of the two cells, indicating their directional alignment. A value of 1 means the cells are moving in exactly the same direction, -1 means they are moving in exactly opposite directions, and 0 indicates orthogonal movement directions. Returns NaN if the calculation fails, typically due to a division by zero when attempting to normalize zero-length vectors.
- Return type:
float
- Raises:
Exception – If an error occurs during the trajectory retrieval or normalization process, likely due to missing data or incorrect indices.
Examples
>>> alignment = traj.get_beta(10, 15) # This computes the directional alignment between cells at index 10 and 15 based on their last movements.
Notes
The function calculates movement vectors for both cells from their positions at the last two time points
in their trajectories. It then computes the cosine of the angle between these vectors as the beta value, providing an indication of how parallel their movements are.
- get_border_properties_dict(iS, cell_states=None, secretion_rates=None, surfaces=None, surface_fmask_channels=None, surface_states_baseid=1000, add_label0_surf=True, make_cells_and_surfaces_disjoint=True, border_scale=None, border_resolution=None, vdist_scale=0.3, radius=2.0, order=1, visual=False, z_viz=None)[source]
Calculate border properties for cells and surfaces at a given frame in a trajectory.
This function extracts data required to characterize the borders of cells within a specified frame (iS) in a 3D or 2D cell image sequence. The borders are characterized based on geometric and physical properties, with optional visualization and multipole moment expansion up to the specified order. Supports the inclusion of surface contacts, tracking cell borders over time, and calculating secreted ligand distributions.
- Parameters:
iS (
int
) – Frame index within the trajectory for which to calculate border properties.cell_states (
ndarray
, optional) – Array of states for each cell in the current frame. If not provided, all cells are assigned state 1 by default.secretion_rates (
ndarray
, optional) – Array of secretion rates for each cell, used to calculate secreted ligand distribution along cell borders. If not provided, no secretion is considered.surfaces (
list
ofndarrays
, optional) – List of binary masks for surfaces adjacent to cells (e.g., membranes or other boundaries). Surfaces are optionally assigned states via surface_fmask_channels.surface_fmask_channels (
list
ofint
, optional) – List of channels in the frame data specifying each surface’s mask. Used if surfaces is None. Each channel mask is assigned a state from surface_states_baseid onward.surface_states_baseid (
int
, optional) – Base state ID for assigning states to surfaces. Defaults to 1000.add_label0_surf (
bool
, optional) – If True, includes cells labeled as 0 (background) as a surface layer. Default is True.make_cells_and_surfaces_disjoint (
bool
, optional) – If True, cells and surfaces are treated as mutually exclusive regions. Default is True.border_scale (
float
orlist
offloat
, optional) – Scale for border calculations. If None, it defaults to the border_resolution attribute if available.border_resolution (
float
, optional) – Resolution of the border calculations in units of pixels. If None, the value from the object’s border_resolution attribute is used if present.vdist_scale (
float
, optional) – Scale factor for calculating the secreted ligand distribution. Default is 0.3.radius (
float
, optional) – Radius for identifying contact neighbors when calculating border properties. Default is 2.0.order (
int
, optional) – Order of multipole moments to compute for cell border properties. Default is 1.visual (
bool
, optional) – If True, generates a visualization of the borders, including displacements from the previous and next frames.z_viz (
int
, optional) – Specific z-plane to visualize in 3D data. If None, a central z-plane is chosen.
- Returns:
border_dict – Dictionary containing computed properties for cell borders, including:
pts: Coordinates of each border point.
index: Cell IDs associated with each border point.
nn_states: States of neighboring cells at each border point.
c: Curvature at each border point.
global_index: Global indices for each border point in the dataset.
border_dx_prev, border_dx_next: Displacement vectors for borders to previous and
next frames, if applicable. - contact_properties: Multipole moment properties based on cell-state contacts. - vdist: Secreted ligand density distribution, if secretion_rates is provided.
- Return type:
dict
Notes
Tracking: If previous (cell_labels_prev) or next frame (cell_labels_next) is provided,
the function calculates the optimal transport vector between consecutive frames for each cell. - Surface Assignment: When surface_fmask_channels is provided, each channel is used to define a distinct surface mask, and surface_states are assigned based on surface_states_baseid. - Multipole Moments: Higher-order moments are calculated based on border charges determined from neighboring cell states.
Examples
Compute border properties for frame 10 with secreted ligand distribution and visualize:
>>> border_properties = self.get_border_properties_dict( ... iS=10, cell_states=my_cell_states, secretion_rates=my_secretion_rates, ... surface_fmask_channels=[0, 1], border_resolution=1.0, vdist_scale=0.5, order=2, ... visual=True, z_viz=15 ... )
- get_cell_blocks(label, return_label_ids=False)[source]
Extracts bounding box information for each cell from a labeled mask image. This function returns the minimum and maximum indices for each labeled cell, useful for operations such as cropping around a cell or analyzing specific cell regions. The function supports both 2D and 3D labeled images.
- Parameters:
label (
ndarray
) – A labeled image array where each unique non-zero integer represents a unique cell.- Returns:
cellblocks – An array containing the bounding boxes for each cell. The array has shape (number_of_labels, number_of_dimensions, 2), where each cell’s bounding box is represented by [min_dim1, min_dim2, …, max_dim1, max_dim2, …].
- Return type:
ndarray
Examples
>>> label_image = np.array([[0, 0, 1, 1], [0, 2, 2, 1], [2, 2, 2, 0]]) >>> blocks = traj.get_cell_blocks(label_image) >>> blocks.shape (2, 2, 2) # Example output shape for a 2D label image with two labels.
- get_cell_channel_crosscorr(indcells=None, mskchannel=None, imgchannel1=None, imgchannel2=None, save_h5=False, overwrite=False)[source]
Computes the cross-correlation between two image channels within labeled cells, using masks defined by a specific mask channel. This method is particularly useful for analyzing the relationship between different signal channels at a cellular level.
- Parameters:
indcells (
ndarray
ofint
, optional) – Indices of cells for which to calculate cross-correlations. If None, calculations are performed for all indexed cells.mskchannel (
int
) – Mask channel used to define cellular regions.imgchannel1 (
int
) – First image channel for correlation analysis.imgchannel2 (
int
) – Second image channel for correlation analysis.save_h5 (
bool
, optional) – If True, saves the calculated cross-correlations to an HDF5 file under the specified directory and file names.overwrite (
bool
, optional) – If True and save_h5 is True, existing data in the HDF5 file will be overwritten.
- Returns:
corrc – Array of cross-correlation coefficients for each cell. The length of the array corresponds to the number of cells specified by indcells.
- Return type:
ndarray
- Raises:
ValueError – If required parameters are not set or if no cell index is available, prompting the user to set necessary parameters or perform required prior steps.
Examples
>>> corrc = traj.get_cell_channel_crosscorr(indcells=[1,2,3], mskchannel=0, imgchannel1=1, imgchannel2=2)
- get_cell_children(icell)[source]
Get the child cells for a given cell in the next frame.
This function identifies the child cells of a given parent cell icell by tracking the lineage data across consecutive frames. The lineage is determined from the parent cell’s index and the lineage set for the next frame.
- Parameters:
icell (
int
) – The index of the cell for which to find the children.- Returns:
ind_children – An array of indices representing the child cells of the given cell icell in the next frame.
- Return type:
ndarray
Notes
The function looks at the current frame of icell and identifies its child cells in the subsequent frame
using the lineage tracking set (linSet). - This method assumes that the lineage set (linSet) and cell indexing (cells_indimgSet, cells_indSet) are properly initialized and populated.
Examples
>>> cell_index = 10 >>> children = model.get_cell_children(cell_index) >>> print(f'Children of cell {cell_index}: {children}')
- get_cell_compartment_ratio(indcells=None, imgchannel=None, mskchannel1=None, mskchannel2=None, fmask_channel=None, make_disjoint=True, remove_background_perframe=False, fmask_channel_background=0, background_percentile=1, erosion_footprint1=None, erosion_footprint2=None, combined_and_disjoint=False, intensity_sum=False, intensity_ztransform=False, noratio=False, inverse_ratio=False, save_h5=False, overwrite=False)[source]
Calculates the ratio of features between two cellular compartments, optionally adjusted for image intensity and morphology transformations. This method allows for complex comparisons between different mask channels or modified versions of these channels to derive cellular compartment ratios.
- Parameters:
indcells (
ndarray
ofint
, optional) – Indices of cells for which to calculate the feature ratios.imgchannel (
int
, optional) – Index of the image channel used for intensity measurements.mskchannel1 (
int
, optional) – Mask channel index for the numerator in the ratio calculation.mskchannel2 (
int
, optional) – Mask channel index for the denominator in the ratio calculation. Overlaps with mskchannel1 are removed.fmask_channel (
int
, optional) – Mask channel index used to adjust mskchannel2 if no separate mskchannel2 is provided.make_disjoint (
bool
, optional) – If True, ensures that masks from mskchannel1 and mskchannel2 do not overlap by adjusting mskchannel1.erosion_footprint1 (
ndarray
, optional) – Erosion footprint for the first mask, modifies the mask by eroding it before calculations.erosion_footprint2 (
ndarray
, optional) – Erosion footprint for the second mask, modifies the mask by eroding it before calculations.combined_and_disjoint (
bool
, optional) – If True, combines and then separates the masks to only include disjoint areas in calculations.intensity_sum (
bool
, optional) – If True, sums the intensity over the area rather than averaging it, before ratio calculation.intensity_ztransform (
bool
, optional) – If True, applies a z-transformation based on standard deviations and means stored in the object.noratio (
bool
, optional) – If True, returns only the numerator intensity mean without forming a ratio.inverse_ratio (
bool
, optional) – If True, calculates the inverse of the normal ratio.save_h5 (
bool
, optional) – If True, saves the calculated ratios to an HDF5 file.overwrite (
bool
, optional) – If True and save_h5 is also True, overwrites existing data in the HDF5 file.
- Returns:
cratio (
ndarray
) – Array of calculated compartment ratios for each specified cell.feature_list (
ndarray
ofstr
, optional) – Descriptions of the cell features, returned if return_feature_list is set to True.
Examples
>>> cratio = traj.get_cell_compartment_ratio(indcells=[1,2,3], imgchannel=0, mskchannel1=1, mskchannel2=2)
- get_cell_data(ic, frametype='boundingbox', boundary_expansion=None, return_masks=True, relabel_masks=True, relabel_mskchannels=None, delete_background=False)[source]
Retrieves image and mask data for specific cells based on various configuration options. This method can extract data for single cells, their neighborhoods, or connected cell groups, and offers options to expand the extraction region, relabel masks, and more.
- Parameters:
ic (
int
orlist
ofint
) – Cell ID(s) for which to retrieve data. Can specify a single cell or a list of cells.frametype (
str
, optional) – Type of frame data to retrieve; options include ‘boundingbox’, ‘neighborhood’, or ‘connected’. Default is ‘boundingbox’.boundary_expansion (
ndarray
orint
, optional) – Array specifying how much to expand the bounding box around the cell in each dimension.return_masks (
bool
, optional) – Whether to return the mask data along with the image data. Default is True.relabel_masks (
bool
, optional) – Whether to relabel mask data with movie cell indices. Default is True.relabel_mskchannels (
array
orlist
, optional) – Specifies the mask channels to relabel. If not set, uses the default mask channel.delete_background (
bool
, optional) – If set to True, sets label and image pixels outside the specified cell set to zero. Default is False.
- Returns:
imgc (
ndarray
) – The image data for the specified cell(s) as a NumPy array.mskc (
ndarray
, optional) – The mask data for the specified cell(s) as a NumPy array, returned if return_masks is True.
- Raises:
ValueError – If cells from multiple frames are requested or necessary attributes are not set.
Examples
>>> traj = Trajectory('path/to/your/hdf5file.h5') >>> img_data, mask_data = traj.get_cell_data(5, frametype='boundingbox', boundary_expansion=5, return_masks=True)
- get_cell_features(function_tuple, indcells=None, imgchannel=0, mskchannel=0, use_fmask_for_intensity_image=False, fmskchannel=None, use_mask_for_intensity_image=False, bordersize=10, apply_contact_transform=False, preprocess_functions=None, return_feature_list=False, save_h5=False, overwrite=False, concatenate_features=False)[source]
Extracts complex cell features based on the specified custom functions and imaging data. This method allows customization of the feature extraction process using region properties from segmented cell data, with optional transformations like border erosion or contact transformations. Features can be based on intensity images, mask data, or specific transformations of these data.
- Parameters:
function_tuple (
callable
ortuple
ofcallables
) – Function(s) that take a mask and an intensity image as input and return a scalar or array of features. These functions must be compatible with skimage’s regionprops.indcells (
ndarray
ofint
, optional) – Array of cell indices for which to calculate features. If None, calculates for all cells.imgchannel (
int
, optional) – Index of the image channel used for intensity image feature calculation.mskchannel (
int
, optional) – Index of the mask channel used for single-cell label feature calculation.use_fmask_for_intensity_image (
bool
, optional) – If True, uses foreground mask data for intensity images.fmskchannel (
int
, optional) – Channel index for foreground mask data if used for intensity image.use_mask_for_intensity_image (
bool
, optional) – If True, uses mask data instead of image data for intensity measurements.bordersize (
int
, optional) – Pixel size for erosion of the foreground mask or growth radius for contact boundaries.apply_contact_transform (
bool
, optional) – If True, applies a contact transform to generate segmentation contacts from the mask data.return_feature_list (
bool
, optional) – If True, returns a list of strings describing the calculated features.save_h5 (
bool
, optional) – If True, saves the features and their descriptions to an HDF5 file.overwrite (
bool
, optional) – If True and save_h5 is also True, overwrites existing data in the HDF5 file.concatenate_features (
bool
, optional) – If True, adds the newly calculated features to existing features in the dataset.
- Returns:
Xf (
ndarray
) – Array of features indexed by cells. The shape is (number of cells, number of features).feature_list (
ndarray
ofstr
, optional) – Array of strings describing each feature, returned if return_feature_list is True.
Examples
>>> traj = Trajectory('path/to/your/hdf5file.h5') >>> features, feature_descriptions = traj.get_cell_features(my_feature_funcs, return_feature_list=True)
- get_cell_index(verbose=False, save_h5=False, overwrite=False)[source]
Computes indices and corresponding frame information for each cell in an image stack, capturing this data in several attributes. This method gathers extensive cell data across all frames, including frame indices, image file indices, individual image indices, and bounding boxes for each cell. This information is stored in corresponding attributes, facilitating further analysis or reference.
- Parameters:
verbose (
bool
, optional) – If True, prints detailed logging of the processing for each frame. Default is False.save_h5 (
bool
, optional) – If True, saves the computed data to an HDF5 file using the specified mskchannel. Default is False.overwrite (
bool
, optional) – If True and save_h5 is True, existing data in the HDF5 file will be overwritten. Default is False.
- Returns:
True if the computation and any specified data saving are successful, False if there is an error due to missing prerequisites or during saving.
- Return type:
bool
- cells_frameSet
Array storing the frame index for each cell, shape (ncells_total,).
- Type:
ndarray
- cells_imgfileSet
Array storing the image file index for each cell, shape (ncells_total,).
- Type:
ndarray
- cells_indSet
Array storing a unique index for each cell in the trajectory, shape (ncells_total,).
- Type:
ndarray
- cells_indimgSet
Array storing the image-specific index for each cell, shape (ncells_total,).
- Type:
ndarray
- cellblocks
Array of bounding boxes for each cell, shape (ncells_total, ndim, 2).
- Type:
ndarray
- Raises:
AttributeError – If necessary attributes (like nmaskchannels, ncells_total, mskchannel, maxFrame) are not set prior to invoking this method.
Examples
>>> traj = Trajectory('path/to/your/hdf5file.h5') >>> success = traj.get_cell_index(verbose=True, save_h5=True, overwrite=True) True
- get_cell_parents(icell)[source]
Get the child cells for a given cell in the next frame.
This function identifies the parent cells of a given child cell icell by tracking the lineage data across consecutive frames. The lineage is determined from the parent cell’s index and the lineage set for the previous frame.
- Parameters:
icell (
int
) – The index of the cell for which to find the children.- Returns:
ind_children – An array of indices representing the child cells of the given cell icell in the next frame.
- Return type:
ndarray
Notes
The function looks at the current frame of icell and identifies its child cells in the subsequent frame
using the lineage tracking set (linSet). - This method assumes that the lineage set (linSet) and cell indexing (cells_indimgSet, cells_indSet) are properly initialized and populated.
Examples
>>> cell_index = 10 >>> children = model.get_cell_children(cell_index) >>> print(f'Children of cell {cell_index}: {children}')
- get_cell_positions(mskchannel=0, save_h5=False, overwrite=False)[source]
Calculate the center of mass for cells in each frame of the mask channel and optionally save these positions to an HDF5 file. This method processes mask data to find cell positions across frames and can store these positions back into the HDF5 file associated with the Trajectory instance.
- Parameters:
mskchannel (
int
, optional) – The index of the mask channel from which to calculate cell positions. Default is 0.save_h5 (
bool
, optional) – If True, the calculated cell positions will be saved to the HDF5 file specified by h5filename. Default is False.overwrite (
bool
, optional) – If True and save_h5 is also True, existing data in the HDF5 file will be overwritten. Default is False.
- Returns:
An array of cell positions calculated from the mask channel. The shape of the array is (number of cells, number of dimensions).
- Return type:
ndarray
- Raises:
RuntimeError – If the stack has not been transformed, indicated by tf_matrix_set not being set.
Examples
>>> traj = Trajectory('path/to/your/hdf5file.h5') >>> traj.get_stack_trans() # Ensure transformation matrix is set >>> positions = traj.get_cell_positions(mskchannel=1, save_h5=True, overwrite=True) getting positions from mask channel 1, default mskchannel is 0 loading cells from frame 0 loading cells from frame 1 ...
- get_cell_sandwich(ic, msk_channel=0, boundary_expansion=None, trajl_past=1, trajl_future=1)[source]
Extracts a sequence of image and mask “sandwiches” for a given cell, including past and future frames.
This function creates a set of 2D or 3D image and mask stacks for a specified cell, tracking the cell across multiple frames into the past and future. It includes boundary expansion around the cell if specified and gathers the images and masks for the cell trajectory. The function is useful for analyzing the temporal behavior of a cell within its local neighborhood.
- Parameters:
ic (
int
) – Index of the target cell.msk_channel (
int
, optional) – Channel of the mask image where the cell is identified (default is 0).boundary_expansion (
int
orNone
, optional) – Number of pixels to expand the boundary around the cell block (default is None, no expansion).trajl_past (
int
, optional) – Number of past frames to include in the sandwich (default is 1).trajl_future (
int
, optional) – Number of future frames to include in the sandwich (default is 1).
- Returns:
imgs (
list
ofndarray
) – A list of image stacks (2D or 3D) for each frame in the trajectory sandwich.msks (
list
ofndarray
) – A list of binary masks corresponding to the same frames in imgs, where the cell and its descendants are highlighted.
Notes
The function retrieves the cell trajectory from the current frame, including both past and future cells,
as well as their children in future frames. - The images and masks are collected and returned in two separate lists, with the past, present, and future frames in sequential order. - The function supports both 2D and 3D image data based on the dimensions of the input data.
Examples
>>> imgs, msks = model.get_cell_sandwich(ic=42, boundary_expansion=10, trajl_past=2, trajl_future=2) >>> print(f'Retrieved {len(imgs)} images and masks for cell 42 across past and future frames.')
- get_cell_trajectory(cell_ind, n_hist=-1)[source]
Retrieves the trajectory of a specified cell across previous frames, tracing back from the current frame to the point of its first appearance or until a specified number of history steps.
- Parameters:
cell_ind (
int
) – The index of the cell for which to retrieve the trajectory.n_hist (
int
, optional) – The number of historical steps to trace back. If set to -1 (default), the function traces back to the earliest frame in which the cell appears.
- Returns:
cell_traj – An array of cell indices representing the trajectory of the specified cell across the tracked frames. The array is ordered from the earliest appearance to the current frame.
- Return type:
ndarray
- Raises:
IndexError – If the cell index provided is out of the bounds of the available data.
ValueError – If the provided cell index does not correspond to any tracked cell, possibly due to errors in lineage tracking.
Examples
>>> cell_trajectory = traj.get_cell_trajectory(10) >>> print(cell_trajectory) [23, 45, 67, 89] # Example output, actual values depend on cell tracking results.
Notes
The trajectory is computed by accessing the lineage data (linSet), which must be computed beforehand via methods such as get_lineage_btrack. Each index in the resulting trajectory corresponds to a position in previous frames where the cell was identified, stepping backwards until the cell’s first detection or the limit of specified history steps.
- get_cellboundary_library(frames=None, indcells=None, secretion_rates=None, cell_states=None, surface_fmask_channels=[0], surface_states_baseid=None, border_resolution=1.0, vdist_scale=0.3, visual=False, save_h5=False, overwrite=False, **border_property_args)[source]
Collect and create a library of boundary properties for cells across specified frames in a trajectory.
This function gathers cell and surface boundary properties across frames or specific cell indices. It collects properties such as position, displacement, neighboring point states, curvatures, and secreted ligand distributions, if applicable. This data is returned in a library format and optionally saved to an HDF5 file for later use.
- Parameters:
frames (
list
ofint
, optional) – Frame indices to include in the boundary library. If None, includes all frames in cells_frameSet.indcells (
ndarray
, optional) – Array of specific cell indices to include in the library. If None, all cells are included.secretion_rates (
ndarray
, optional) – Array of secretion rates for each cell, used to calculate ligand distribution along borders.cell_states (
ndarray
, optional) – Array of cell states for each cell. If not provided, cells are assigned state 1.surface_fmask_channels (
list
ofint
, default[0]
) – List of channels indicating each surface’s mask in the frame data.surface_states_baseid (
int
, optional) – Starting ID for surface states. Defaults to one more than the maximum cell_states.border_resolution (
float
, optional) – Resolution for border properties in units of pixels. Default is 1.0.vdist_scale (
float
, optional) – Scale for secreted ligand distribution. Default is 0.3.visual (
bool
, optional) – If True, displays a visualization of the boundaries. Default is False.save_h5 (
bool
, optional) – If True, saves the resulting boundary library to an HDF5 file. Default is False.overwrite (
bool
, optional) – If True, overwrites the HDF5 data file if it exists. Default is False.**border_property_args – Additional arguments passed to the get_border_properties_dict function.
- Returns:
boundary_library – Dictionary containing collected boundary properties for the specified cells and frames:
global_index: Global cell indices for each boundary point.
states: Cell or surface state at each boundary point.
pts: Coordinates of each boundary point.
dx_prevs, dx_nexts: Displacement vectors for boundaries to previous and next frames.
surface_normals: Surface normal vectors at each boundary point.
curvatures: Curvature at each boundary point.
nn_pts: Coordinates of neighboring points.
nn_states: States of neighboring points.
nn_pts_states: States of neighboring points with finer resolution.
surf_vars: Variance of surface displacement in the previous frame.
vdists: Secreted ligand density distribution, if secretion_rates is provided.
- Return type:
dict
Notes
Frames and Cell Selection: If both frames and indcells are provided, only frames derived from
indcells are used. - Surfaces and States: surface_fmask_channels and surface_states_baseid define and initialize surfaces in each frame, using additional states based on surface_states_baseid. - Ligand Distribution: Secreted ligand density distribution (vdists) is included if secretion_rates is specified.
Examples
Generate a cell boundary library for frames 10, 15, and 20, with visualization enabled:
>>> boundary_library = self.get_cellboundary_library( ... frames=[10, 15, 20], secretion_rates=my_secretion_rates, cell_states=my_cell_states, ... surface_fmask_channels=[0, 1], visual=True, border_resolution=2.0 ... )
Save the boundary library to HDF5 with overwrite enabled:
>>> boundary_library = self.get_cellboundary_library( ... indcells=my_indcells, save_h5=True, overwrite=True ... )
- get_cells_nchildren()[source]
Compute the number of children for each cell across all frames.
This function calculates the number of child cells each parent cell has across consecutive time frames. The result is an array where each element corresponds to the number of child cells for a given parent cell in the next frame. Cells with no children will have a count of 0.
- Returns:
cells_nchildren – An array where each element represents the number of child cells for each cell in the dataset. The indices correspond to the cell indices in self.cells_indSet.
- Return type:
ndarray
Notes
The function iterates through all time frames (nt) to determine the lineage of each cell
using the linSet attribute, which holds the lineage information between frames. - Cells that are not tracked between frames (i.e., not assigned a child in the next frame) will have a count of 0 children. - The method assumes that linSet, cells_indSet, and cells_indimgSet are properly initialized and populated.
Examples
>>> cell_children_counts = model.get_cells_nchildren() >>> print(f'Number of children for each cell: {cell_children_counts}')
- get_dx(i1)[source]
Calculates the displacement vector of a cell between its current position and its previous position in the trajectory. This vector represents the movement of the cell between two consecutive time points.
- Parameters:
i1 (
int
) – Index of the cell for which to calculate the displacement.- Returns:
dx1 – A vector representing the displacement of the cell. The vector is given in the coordinate space of the cell positions. If the calculation fails (e.g., due to missing data), returns a vector of NaNs.
- Return type:
ndarray
- Raises:
Exception – If an error occurs during the trajectory retrieval or calculation, typically due to missing data or incorrect indices.
Examples
>>> displacement = traj.get_dx(10) # This calculates the displacement vector for the cell at index 10 between its current and previous positions.
Notes
The method attempts to retrieve the last position from the cell’s trajectory using get_cell_trajectory.
If the cell’s trajectory does not have a previous position or the data is missing, the displacement vector will contain NaN values to indicate the failure of the calculation.
- get_fmask_data(n_frame, channel=None)[source]
Retrieve the foreground mask data for a specific frame using different methods depending on the set attributes. This method determines the foreground mask either by selecting a specific mask channel (fmskchannel), by applying a threshold to an image channel (fmsk_threshold and fmsk_imgchannel), or directly from specified channels in the HDF5 file (fmask_channels).
- Parameters:
n_frame (
int
) – The frame number from which to retrieve the foreground mask data.channel (
int
, optional) – The specific channel to use when fmask_channels attribute is set. If not provided, the default ‘foreground’ channel is used if available.
- Returns:
fmsk (
ndarray
,bool
) – The foreground (cells) / background mask array, indicating cell locations as True and background as False.Methods for Determining `fmsk`
------------------------------
1. If `fmskchannel
is set`- The method uses the specified channel from the mask data (not fmask data).
2. If `fmsk_threshold`
and fmsk_imgchannel are set- The method thresholds the image data at the specified channel using the given threshold.
3. If `fmask_channels
is set` andthe channel parameter is provided
ora default is available
- The method retrieves the mask from the specified
or defaultchannel in the `fmsk
dataset.`
Examples
>>> traj = Trajectory('path/to/your/hdf5file.h5') >>> fmask_data = traj.get_fmask_data(5) >>> fmask_data.shape (1024, 1024) # Example shape, actual may vary.
- get_frames()[source]
Scans an HDF5 file for image and mask datasets to determine the total number of frames, images per frame, and the total number of cells across all frames. This method updates the instance with attributes for the number of images, the maximum frame index, and the total cell count. It handles multiple mask channels by requiring the mskchannel attribute to be set if more than one mask channel exists.
- Returns:
Returns True if the frames were successfully scanned and the relevant attributes set. Returns False if the HDF5 file is not set, no image data is found, or if there are multiple mask channels but mskchannel is not specified.
- Return type:
bool
- Raises:
AttributeError – If mskchannel needs to be specified but is not set when nmaskchannel is greater than zero.
Examples
>>> traj = Trajectory('path/to/your/hdf5file.h5') >>> success = traj.get_frames() True
- get_image_data(n_frame)[source]
Retrieve the image data for a specified frame from the HDF5 file associated with this instance. This method accesses the HDF5 file, navigates to the specific dataset for the given frame, and extracts the image data.
- Parameters:
n_frame (
int
) – The frame number from which to retrieve image data.- Returns:
img – The image data as a NumPy array. The shape and type of the array depend on the structure of the image data in the HDF5 file (e.g., may include dimensions for channels or z-stacks).
- Return type:
ndarray
Examples
>>> traj = Trajectory('path/to/your/hdf5file.h5') >>> image_data = traj.get_image_data(5) >>> image_data.shape (1024, 1024, 3) # Example shape, actual may vary.
- get_image_shape(n_frame=0)[source]
Determine the dimensions of the image and mask data from the HDF5 file at a specified frame index and store these as attributes. This method retrieves the dimensions of both the image and mask datasets, discerning whether the data includes channels or z-stacks, and updates the object’s attributes accordingly. Attributes updated include the number of dimensions (ndim), image axes layout (axes), image dimensions (nx, ny, [nz]), number of channels in the image and mask (nchannels, nmaskchannels), and the full image shape (image_shape).
- Parameters:
n_frame (
int
, optional) – The frame index from which to retrieve the image and mask dimensions. Default is 0.- Returns:
Returns True if the dimensions were successfully retrieved and stored as attributes, False otherwise, such as when the file is not found or an error occurs in reading data.
- Return type:
bool
- axes
The layout of axes in the image data, e.g., ‘xy’, ‘xyc’, ‘zxy’, ‘zxyc’.
- Type:
str
- nx
Width of the image in pixels.
- Type:
int
- ny
Height of the image in pixels.
- Type:
int
- nz
Number of z-stacks in the image, if applicable.
- Type:
int
, optional
- image_shape
Array representing the dimensions of the image.
- Type:
ndarray
- nchannels
Number of channels in the image data.
- Type:
int
- nmaskchannels
Number of channels in the mask data.
- Type:
int
- ndim
Number of spatial dimensions in the image data.
- Type:
int
Examples
>>> traj = Trajectory('path/to/your/hdf5file.h5') >>> success = traj.get_image_shape(1) True
- get_lineage_btrack(mskchannel=0, distcut=5.0, framewindow=6, visual_1cell=False, visual=False, max_search_radius=100, save_h5=False, overwrite=False)[source]
Tracks cell lineages over an image stack using Bayesian tracking with visual confirmation options. This method registers transformed masks and applies Bayesian tracking to link cell identities across frames, storing the lineage information. Use of btrack software requires a cell_config.json file stored in the directory, see btrack documentation.
- Parameters:
mskchannel (
int
) – Mask channel used to identify cell labels from which cell centers are extracted.distcut (
float
) – Maximum distance between cell centers in consecutive frames for cells to be considered the same.framewindow (
int
) – Number of frames over which to look for cell correspondences.visual_1cell (
bool
) – If True, displays visual tracking information for single cell matches during processing.visual (
bool
) – If True, displays visual tracking information for all cells during processing.max_search_radius (
int
) – The maximum search radius in pixels for linking objects between frames.save_h5 (
bool
) – If True, saves the lineage data (linSet) to an HDF5 file.overwrite (
bool
) – If True and save_h5 is True, overwrites existing data in the HDF5 file.
- Returns:
linSet – A list of arrays where each array corresponds to a frame and contains indices that map each cell to its predecessor in the previous frame. Cells with no predecessor are marked with -1. The data saved in linSet thus represents the lineage of each cell over the stack.
- Return type:
list
ofndarray
- Raises:
AttributeError – If tf_matrix_set is not set, indicating that stack transformation matrices are required for tracking but have not been calculated.
Examples
>>> lineage_data = traj.get_lineage_btrack(mskchannel=1, visual=True)
- get_lineage_min_otcost(distcut=5.0, ot_cost_cut=inf, border_scale=None, border_resolution=None, return_cost=False, visual=False, save_h5=False, overwrite=False)[source]
Tracks cell lineages over multiple time points using optimal transport cost minimization.
This method uses centroid distances and optimal transport costs to identify the best matches for cell trajectories between consecutive time points, ensuring accurate tracking even in dense or complex environments.
- Parameters:
distcut (
float
, optional) – Maximum distance between cell centroids to consider a match (default is 5.0).ot_cost_cut (
float
, optional) – Maximum optimal transport cost allowed for a match (default is np.inf).border_scale (
list
offloat
, optional) – Scaling factors for the cell border in the [z, y, x] dimensions. If not provided, the scaling is determined from self.micron_per_pixel and border_resolution.border_resolution (
float
, optional) – Resolution for the cell border, used to determine border_scale if it is not provided. If not set, uses self.border_resolution.visual (
bool
, optional) – If True, plots the cells and their matches at each time point for visualization (default is False).save_h5 (
bool
, optional) – If True, saves the lineage data to the HDF5 file (default is False).overwrite (
bool
, optional) – If True, overwrites existing data in the HDF5 file when saving (default is False).
- Returns:
The function updates the instance’s linSet attribute, which is a list of arrays containing lineage information for each time point. If save_h5 is True, the lineage data is saved to the HDF5 file.
- Return type:
None
Notes
This function assumes that cell positions have already been extracted using the get_cell_positions method.
The function uses the spatial.get_border_dict method to compute cell borders and spatial.get_ot_dx
to compute optimal transport distances. - Visualization is available for 2D and 3D data, with different handling for each case.
Examples
>>> traj.get_lineage_min_otcost(distcut=10.0, ot_cost_cut=50.0, visual=True) Frame 1 tracked 20 of 25 cells Frame 2 tracked 22 of 30 cells ...
- get_lineage_mindist(distcut=5.0, visual=False, save_h5=False, overwrite=False)[source]
Tracks cell lineage based on the minimum distance between cells across consecutive frames. This method assesses cell positions to establish lineage by identifying the nearest cell in the subsequent frame within a specified distance threshold.
- Parameters:
distcut (
float
, optional) – The maximum distance a cell can move between frames to be considered the same cell. Cells moving a distance greater than this threshold will not be tracked from one frame to the next.visual (
bool
, optional) – If True, displays a visual representation of the tracking process for each frame, showing the cells and their movements between frames.save_h5 (
bool
, optional) – If True, saves the lineage data (linSet) to an HDF5 file.overwrite (
bool
, optional) – If True and save_h5 is True, overwrites existing data in the HDF5 file.
- Returns:
linSet – A list where each entry corresponds to a frame and contains cell indices that map each cell to its predecessor in the previous frame. Cells with no identifiable predecessor are marked with -1. This list provides a complete lineage map of cells across all analyzed frames.
- Return type:
list
ofndarray
- Raises:
AttributeError – If the cell positions (x) are not calculated prior to running this method, indicating that get_cell_positions needs to be executed first.
Examples
>>> traj = Trajectory('path/to/your/data.h5') >>> lineage_data = traj.get_lineage_mindist(distcut=10, visual=True)
- get_mask_data(n_frame)[source]
Retrieve the mask data for a specified frame from the HDF5 file associated with this instance. This method accesses the HDF5 file, navigates to the specific dataset for the given frame, and extracts the mask data.
- Parameters:
n_frame (
int
) – The frame number from which to retrieve mask data.- Returns:
msk – The mask data as a NumPy array. The structure of the array will depend on the mask setup in the HDF5 file, such as whether it includes dimensions for multiple channels or z-stacks.
- Return type:
ndarray
Examples
>>> traj = Trajectory('path/to/your/hdf5file.h5') >>> mask_data = traj.get_mask_data(5) >>> mask_data.shape (1024, 1024, 2) # Example shape, actual may vary.
- get_motility_features(indcells=None, mskchannel=None, radius=None, save_h5=False, overwrite=False)[source]
Extracts motility features for individual cells and their neighbors. This method calculates both single-cell and neighbor-averaged motility characteristics, such as displacement and interaction with neighboring cells, based on tracking data and cell label information.
- Parameters:
indcells (
ndarray
ofint
, optional) – Indices of cells for which to calculate motility features. If None, features are calculated for all cells in the dataset.mskchannel (
int
) – Mask channel used to define cell labels.radius (
int
, optional) – Size of morphological expansion in pixels to find neighboring cells.save_h5 (
bool
, optional) – If True, saves the calculated motility features to an HDF5 file specified in the trajectory object.overwrite (
bool
, optional) – If True and save_h5 is True, overwrites existing data in the HDF5 file.
- Returns:
Xf_com (
ndarray
) – An array of computed motility features for each specified cell. The array dimensions are (number of cells, number of features).feature_list (
ndarray
ofstr
, optional) – Descriptions of each motility feature computed. This is returned if return_feature_list is set to True in the method call.
- Raises:
ValueError – If required data such as mask channels or cell indices are not set, or if cell tracking has not been performed prior to feature extraction.
Examples
>>> motility_features = traj.get_motility_features(indcells=[1, 2, 3], mskchannel=0)
- get_pair_rdf(cell_indsA=None, cell_indsB=None, rbins=None, nr=50, rmax=500)[source]
Calculates the radial distribution function (RDF) between two sets of cells, identifying the frequency of cell-cell distances within specified radial bins. This method is commonly used in statistical physics and materials science to study the spatial distribution of particles.
- Parameters:
cell_indsA (
array
ofint
, optional) – Indices of the first set of cells. If None, considers all cells.cell_indsB (
array
ofint
, optional) – Indices of the second set of cells. If None, uses the same indices as cell_indsA.rbins (
ndarray
, optional) – Array of radial bins for calculating RDF. If None, bins are generated linearly from nearly 0 to rmax.nr (
int
, optional) – Number of radial bins if rbins is not specified. Default is 50.rmax (
float
, optional) – Maximum radius for the radial bins if rbins is not specified. Default is 500 units.
- Returns:
rbins (
ndarray
) – The radial bins used for the RDF calculation, adjusted to remove the zero point and ensure proper binning.paircorrx (
ndarray
) – RDF values corresponding to each radial bin, normalized to the total number of pairs and the bin volumes.
Examples
>>> traj = Trajectory('path/to/data.h5') >>> rbins, rdf = traj.get_pair_rdf(cell_indsA=[1, 2, 3], cell_indsB=[4, 5, 6], nr=100, rmax=200) # This will calculate the RDF between two specified sets of cells with 100 radial bins up to a maximum radius of 200.
Notes
The RDF gives a normalized measure of how often pairs of points (cells) appear at certain distances from each other,
compared to what would be expected for a completely random distribution at the same density. - This function is useful for examining the spatial organization and clustering behavior of cells in tissues or cultures.
- get_secreted_ligand_density(frame, mskchannel=0, scale=2.0, npad=None, indz_bm=0, secretion_rate=1.0, D=None, flipz=False, visual=False)[source]
Simulates the diffusion of secreted ligands from cells, providing a spatial distribution of ligand density across a specified frame. The simulation considers specified boundary conditions and secretion rates to model the ligand concentration in the vicinity of cells.
- Parameters:
frame (
int
) – The frame index from which image and mask data are extracted.mskchannel (
int
, optional) – The channel of the mask that identifies the cells.scale (
float
, optional) – The scaling factor for the resolution of the simulation. Default is 2.0.npad (
array-like
, optional) – Padding to add around the simulation area to avoid edge effects. Defaults to [0, 0, 0] if None.indz_bm (
int
, optional) – The index of the bottom-most slice to consider in the z-dimension.secretion_rate (
float
orarray-like
, optional) – The rate at which ligands are secreted by the cells. Can be a single value or an array specifying different rates for different cells.D (
float
, optional) – Diffusion coefficient. If not specified, it is calculated based on the pixel size and z-scaling.flipz (
bool
, optional) – If True, flips the z-dimension of the image and mask data, useful for certain imaging orientations.visual (
bool
, optional) – If True, displays visualizations of the simulation process and results.
- Returns:
vdist – A 3D array representing the volumetric distribution of the ligand density around cells in the specified frame.
- Return type:
ndarray
Examples
>>> traj = Trajectory('path/to/data.h5') >>> ligand_density = traj.get_secreted_ligand_density(frame=10, mskchannel=1, scale=1.5, secretion_rate=0.5, D=15) # This will simulate and return the ligand density around cells in frame 10 with specified parameters.
- Raises:
ValueError – If any of the provided indices or parameters are out of the expected range or if there is a mismatch in array dimensions.
Notes
The method performs a complex series of image processing steps including scaling, padding, flipping, and 3D mesh generation.
It uses finite element methods to solve diffusion equations over the generated mesh, constrained by the cellular boundaries and secretion rates.
- get_signal_contributions(S, time_lag=0, x_pos=None, rmax=5.0, R=None, zscale=None, rescale_z=False)[source]
Computes the spatial contributions of signaling between cells over a specified time lag. This method averages signals from nearby cells, weighted inversely by their distances, to assess local signaling interactions.
- Parameters:
S (
ndarray
) – A binary array indicating the signaling status of cells (1 for active, 0 for inactive).time_lag (
int
, optional) – The time lag over which to assess signal contributions, defaulting to 0 for immediate interactions.x_pos (
ndarray
, optional) – Positions of cells. If None, the positions are taken from the instance’s x attribute.rmax (
float
, optional) – The maximum radius within which to consider signal contributions from neighboring cells, default is 5.R (
float
, optional) – Normalization radius, typically set to the average cell diameter; defaults to the instance’s cellpose_diam.zscale (
float
, optional) – The scaling factor for the z-dimension, used if rescale_z is True.rescale_z (
bool
, optional) – If True, scales the z-coordinates of positions by zscale.
- Returns:
S_r – An array where each element is the averaged spatial signal contribution received by each cell, normalized by distance and weighted by the signaling status of neighboring cells.
- Return type:
ndarray
Examples
>>> traj = Trajectory('path/to/data.h5') >>> S = np.random.randint(0, 2, size=traj.cells_indSet.size) >>> signal_contributions = traj.get_signal_contributions(S, time_lag=1, rmax=10, R=15) # This computes the signal contributions for each cell, considering interactions within a radius of 10 units.
Notes
This method is useful for understanding the influence of cell-cell interactions within a defined spatial range
and can be particularly insightful in dynamic cellular environments where signaling is a key factor. - The distances are normalized by the cell radius R to provide a relative measure of proximity, and the contributions are weighted by the inverse of these normalized distances.
- Raises:
ValueError – If necessary parameters are missing or incorrectly formatted.
- get_stack_trans(mskchannel=0, ntrans=20, maxt=10, dist_function=<function get_pairwise_distance_sum>, zscale=None, save_h5=False, overwrite=False, do_global=False, **dist_function_keys)[source]
Computes translations across an image stack using a brute force optimization method to align cell centers from frame to frame. This method can apply both local and global alignment strategies based on the distribution of cell centers.
- Parameters:
mskchannel (
int
) – Mask channel to use for extracting cell centers from labels.ntrans (
int
orndarray
) – Number of translations to try in each dimension during optimization.maxt (
float
orndarray
) – Maximum translation distance to consider in each dimension.dist_function (
function
) – Optimization function that takes cell centers from two frames and a translation vector, returning a score where lower values indicate better alignment.zscale (
float
, optional) – Scaling factor for the z-dimension to normalize it with x and y dimensions.save_h5 (
bool
, optional) – If True, saves the computed transformation matrices to an HDF5 file.overwrite (
bool
, optional) – If True and save_h5 is True, overwrites existing data in the HDF5 file.do_global (
bool
, optional) – If True, performs a global alignment using the center of mass of all masks prior to brute force optimization.dist_function_keys (
dict
) – Additional keyword arguments to pass to the dist_function.
- Returns:
tf_matrix_set – An array of shape (nframes, ndim+1, ndim+1) containing the transformation matrices for aligning each frame to the first frame based on the computed translations.
- Return type:
ndarray
Examples
>>> transformations = traj.get_stack_trans(mskchannel=1, ntrans=10, maxt=5, do_global=False)
- get_trajAB_segments(xt, stateA=None, stateB=None, clusters=None, states=None, distcutA=None, distcutB=None)[source]
Identifies segments within trajectories that transition between specified states, A and B. This method can be used to analyze transitions or dwell times in specific states within a trajectory dataset.
- Parameters:
xt (
ndarray
) – An array representing trajectories, either as direct state assignments or continuous data.stateA (
int
orarray-like
, optional) – The state or states considered as ‘A’. Transitions from this state are analyzed.stateB (
int
orarray-like
, optional) – The state or states considered as ‘B’. If defined, transitions from state A to state B are analyzed.clusters (
object
, optional) – A clustering object with an ‘assign’ method that can be used to discretize continuous trajectory data into states.states (
ndarray
, optional) – An array defining all possible states. Used to map states in ‘xt’ if it contains direct state assignments.distcutA (
float
, optional) – The distance cutoff for determining membership in state A if ‘xt’ is continuous.distcutB (
float
, optional) – The distance cutoff for determining membership in state B if ‘xt’ is continuous and ‘stateB’ is defined.
- Returns:
slices – A list of slice objects representing the indices of ‘xt’ where transitions between specified states occur. If only ‘stateA’ is specified, returns segments where the trajectory is in state A.
- Return type:
list
ofslice
- Raises:
ValueError – If required parameters for defining states or transitions are not provided or if the provided parameters are incompatible (e.g., ‘distcutA’ without a corresponding ‘stateA’).
Examples
>>> traj = Trajectory('path/to/data.h5') >>> xt = np.random.rand(100, 2) # Example continuous trajectory data >>> clusters = KMeans(n_clusters=3).fit(xt) # Example clustering model >>> segments = traj.get_trajAB_segments(xt, stateA=0, stateB=1, clusters=clusters) # Analyze transitions from state 0 to state 1 using cluster assignments
Notes
If ‘xt’ contains direct state assignments, ‘states’ must be provided to map these to actual state values.
For continuous data, ‘clusters’ or distance cutoffs (‘distcutA’, ‘distcutB’) must be used to define states.
This function is useful for analyzing kinetic data where transitions between states are of interest.
- get_traj_segments(seg_length)[source]
Divides each trajectory into multiple overlapping segments of a specified length. This method is useful for analyzing sections of trajectories or for preparing data for machine learning models that require fixed-size input.
- Parameters:
seg_length (
int
) – The length of each segment to be extracted from the trajectories. Segments are created by sliding a window of this length along each trajectory.- Returns:
traj_segSet – A 2D array where each row represents a segment of a trajectory. The number of columns in this array equals seg_length. Each segment includes consecutive cell indices from the original trajectories.
- Return type:
ndarray
Notes
This method requires that the trajectories attribute has been populated, typically by
a method that computes full trajectories such as get_unique_trajectories. - Only trajectories that are at least as long as seg_length will contribute segments to the output. Shorter trajectories are ignored.
Examples
>>> traj = Trajectory('path/to/data.h5') >>> traj.get_unique_trajectories() >>> segments = traj.get_traj_segments(5) >>> print(segments.shape) (number of segments, 5) # Example shape, actual values depend on trajectory lengths and seg_length.
- Raises:
ValueError – If seg_length is larger than the length of any available trajectory, resulting in no valid segments being produced.
- get_trajectory_steps(inds=None, traj=None, Xtraj=None, get_trajectories=True, nlag=1)[source]
Extracts sequential steps from cell trajectories and retrieves corresponding features from a feature matrix. This method is useful for analyses that require step-wise comparison of trajectories, such as calculating changes or transitions over time.
- Parameters:
inds (
array
ofint
, optional) – Indices of cells for which to get trajectory steps. If None, processes all cells.traj (
ndarray
, optional) – The trajectory data matrix. If None, uses the instance’s traj attribute.Xtraj (
ndarray
, optional) – The feature data matrix corresponding to trajectories. If None, uses the instance’s Xtraj attribute.get_trajectories (
bool
, optional) – If True, computes unique trajectories for the specified indices before processing steps.nlag (
int
, optional) – The lag between steps in a trajectory to consider. A value of 1 means consecutive steps.
Notes
The method assumes that the trajectory and feature data matrices (traj and Xtraj, respectively)
are indexed in the same way. - This function can optionally calculate unique trajectories before extracting steps, making it versatile for both freshly calculated and pre-computed trajectory datasets.
Examples
>>> traj = Trajectory('path/to/data.h5') >>> traj.get_trajectory_steps(get_trajectories=True, nlag=2) # This will compute unique trajectories for all cells and then extract every second step.
- Raises:
IndexError – If any index in inds is out of bounds of the available data.
ValueError – If traj or Xtraj data matrices are not set and not provided as arguments.
- get_unique_trajectories(cell_inds=None, verbose=False, extra_depth=None, save_h5=False, overwrite=False)[source]
Computes unique trajectories for a set of cells over multiple frames, minimizing redundancy by ensuring that no two trajectories cover the same cell path beyond a specified overlap (extra_depth).
- Parameters:
cell_inds (
array
ofint
, optional) – Array of cell indices for which to calculate trajectories. If None, calculates trajectories for all cells.verbose (
bool
, optional) – If True, provides detailed logs during the trajectory calculation process.extra_depth (
int
, optional) – Specifies how many frames of overlap to allow between different trajectories. If not set, uses the pre-set attribute ‘trajl’ minus one as the depth; if ‘trajl’ is not set, defaults to 0.
Notes
This method identifies unique trajectories by tracking each cell backward from its last appearance
to its first, recording the trajectory, and then ensuring subsequent trajectories do not retread the same path beyond the allowed overlap specified by ‘extra_depth’. - Each trajectory is tracked until it either reaches the start of the dataset or an earlier part of another trajectory within the allowed overlap. - This function updates the instance’s ‘trajectories’ attribute, storing each unique trajectory.
Examples
>>> traj.get_unique_trajectories(verbose=True)
- load_from_h5(path)[source]
Load data from a specified path within an HDF5 file. This method attempts to read records recursively from the given path in the HDF5 file specified by the h5filename attribute of the instance.
- Parameters:
path (
str
) – The base path in the HDF5 file from which to load data.- Returns:
Returns True if the data was successfully loaded, False otherwise.
- Return type:
bool
Examples
>>> traj = Trajectory('path/to/your/hdf5file.h5') >>> traj.load_from_h5('/data/group1') loading path/to/your/hdf5file.h5 True
- manual_fate_validation(indcells, fate_attr, trajl_future=2, trajl_past=2, stride=1, restart=False, val_tracks=False, rep_channel=2, bf_channel=0, nuc_channel=1, show_nuc=True, msk_channel=0, projection=False, nuclow=98, nuchigh=100, show_allcells=True, show_linked=True, pathto='./', save_pic=False, boundary_expansion=[1, 40, 40], save_attr=True, save_h5=False, overwrite=False)[source]
Manually validates cell fate by reviewing images and tracking data for individual cells over time. Provides an interactive validation interface to assess whether cells follow a specific fate or not.
- Parameters:
indcells (
array-like
ofint
) – Indices of cells to review for fate validation.fate_attr (
str
) – The name of the fate attribute being reviewed and validated.trajl_future (
int
, optional) – Number of future frames to include in the trajectory review (default is 2).trajl_past (
int
, optional) – Number of past frames to include in the trajectory review (default is 2).restart (
bool
, optional) – Whether to restart validation from the beginning (default is True). If False, previously reviewed cells are re-evaluated.val_tracks (
bool
, optional) – If True, validates the cell tracking in addition to fate validation (default is True).rep_channel (
int
, optional) – The channel used for representation images (default is 2).bf_channel (
int
, optional) – The bright-field channel (default is 0).nuc_channel (
int
, optional) – The nucleus channel (default is 1).msk_channel (
int
, optional) – The mask channel used for cell identification (default is 0).pathto (
str
, optional) – The path to save images if save_pic is True (default is ‘./’).save_pic (
bool
, optional) – Whether to save the images for each validated cell (default is False).boundary_expansion (
list
ofint
, optional) – The number of pixels to expand the boundary of the cell block in the image (default is [1, 40, 40]).save_attr (
bool
, optional) – Whether to save the validated attributes during the process (default is True).save_h5 (
bool
, optional) – Whether to save the updated attributes to an HDF5 file (default is False).overwrite (
bool
, optional) – Whether to overwrite existing data when saving to HDF5 (default is False).
- Returns:
vals_fate (
ndarray
ofint
) – Validated fate values for each cell. 1 indicates fate, 0 indicates not fate, and -1 indicates unclear or indeterminate fate.inds_fate (
ndarray
ofint
) – Indices of the cells that were confirmed to follow the fate of interest.
Notes
This function provides an interactive review interface, where users can manually validate the fate and tracking of cells.
It allows users to interactively break cell lineage links if necessary and stores the results of each review session.
The images and masks for each cell across its past and future trajectory are visualized for validation.
Examples
>>> vals_fate, inds_fate = model.manual_fate_validation(indcells, 'apoptosis', trajl_future=3, trajl_past=3, save_pic=True, pathto='/output/images') >>> print(f'Validated fates for {len(inds_fate)} cells.')
- save_to_h5(path, attribute_list, overwrite=False)[source]
Save specified attributes to an HDF5 file at the given path. This method saves attributes from the current instance to a specified location within the HDF5 file, creating or overwriting data as necessary based on the overwrite parameter.
- Parameters:
path (
str
) – The base path in the HDF5 file where attributes will be saved.attribute_list (
list
ofstr
) – A list containing the names of attributes to save to the HDF5 file.overwrite (
bool
, optional) – If True, existing data at the specified path will be overwritten. Default is False.
- Returns:
Returns True if the attributes were successfully saved, False otherwise, such as when the HDF5 file does not exist or attributes cannot be written.
- Return type:
bool
Examples
>>> traj = Trajectory('path/to/your/hdf5file.h5') >>> traj.some_attribute = np.array([1, 2, 3]) >>> traj.save_to_h5('/data/', ['some_attribute']) saving attributes ['some_attribute'] to /data/ in path/to/your/hdf5file.h5 saved some_attribute to path/to/your/hdf5file.h5/data/ True
celltraj.features
Featurization functions for cell images.
- celltraj.features.apply3d(img, function2d, dtype=None, **function2d_args)[source]
Applies a 2D function across each slice of a 3D image stack or directly to a 2D image, allowing for specific operations like filtering or transformation to be uniformly applied across all spatial slices.
- Parameters:
img (
ndarray
) – The input image which can be either 2D or 3D. If the image is 3D, the function is applied slice by slice.function2d (
callable
) – A function that is applied to each 2D slice of the image. This function must accept an image as its first argument and can accept additional named arguments.dtype (
data-type
, optional) – The desired data-type for the output image. If None, the dtype of img is used. Specifying a dtype can be useful for managing memory or computational requirements.**function2d_args (
dict
) – Additional keyword arguments to pass to function2d.
- Returns:
img_processed – The image resulting from the application of function2d to each slice of img or directly to img if it is 2D.
- Return type:
ndarray
Examples
>>> import numpy as np >>> img = np.random.rand(5, 100, 100) # Example 3D image >>> result = apply3d(img, np.mean, axis=0) # Apply np.mean across axis 0 of each 2D slice >>> print(result.shape) (5, 100)
Notes
This function is particularly useful for processing 3D data where an operation is intended to be repeated across each 2D section. For example, applying edge detection or blurring slice-by-slice.
The performance of this function depends on the complexity of function2d and the size of the image.
- Raises:
ValueError – If img is not 2D or 3D, or if function2d cannot be applied as specified.
- celltraj.features.boundaryCB_FFT(msk, fmsk, ncomp=15, center=None, nth=256, bordersize=0)[source]
Computes the Fourier Transform of boundary data for a mask distinguishing between core-cell and cell-surrounding regions, encoding the shape information in frequency space.
This function identifies boundaries within a mask and differentiates between core-cell (cc) and cell-surrounding (cs) regions. It then calculates the Fourier Transform of these boundary classifications relative to a center, capturing the spatial distribution of core and surrounding areas.
- Parameters:
msk (
ndarray
) – A binary mask where the non-zero region defines the cells.fmsk (
ndarray
) – A foreground mask, used to define foreground regions for identifying core and surrounding cell regions.ncomp (
int
, optional) – Number of components to return from the Fourier Transform (default is 15).center (
array-like
, optional) – The center of the image from which to calculate radial coordinates. If None, it defaults to the image center.nth (
int
, optional) – Number of angular steps to interpolate over the [0, 2π] interval, default is 256.bordersize (
int
, optional) – The size of the border around cells to consider for differentiation between core and surrounding areas, default is 1.
- Returns:
rtha – An array containing the first ncomp normalized magnitudes of the Fourier components of the boundary data.
- Return type:
ndarray
Examples
>>> msk = np.zeros((100, 100), dtype=bool) >>> msk[30:70, 30:70] = True >>> fmsk = np.zeros_like(msk) >>> fmsk[35:65, 35:65] = True >>> fft_result = boundaryCB_FFT(msk, fmsk) >>> print(fft_result.shape) (15,)
Notes
The function first distinguishes between core-cell and cell-surrounding regions using morphological operations.
It then maps these regions onto a polar coordinate system centered on center and computes the FFT of this radial binary function, which describes the presence of core-cell versus cell-surrounding regions as a function of angle.
- Raises:
Exception – If there is an error during processing, possibly due to issues with input data shapes or computation failures.
- celltraj.features.boundaryFFT(msk, ncomp=15, center=None, nth=256)[source]
Computes the normalized Fast Fourier Transform (FFT) of the boundary of a mask. The boundary is first represented in polar coordinates (radius as a function of angle), and the FFT is used to capture the frequency components of this boundary representation, providing a spectral description of the shape.
- Parameters:
msk (
ndarray
) – A binary mask where the non-zero region defines the shape whose boundary will be analyzed.ncomp (
int
, optional) – The number of Fourier components to return. Default is 15.center (
array-like
, optional) – The center of the mask from which radial distances are measured. If None, the geometric center of the mask is used.nth (
int
, optional) – The number of points to interpolate along the boundary before computing the FFT. More points can improve the smoothness of the interpolation. Default is 256.
- Returns:
rtha – An array of the first ncomp normalized magnitudes of the Fourier components of the boundary.
- Return type:
ndarray
- Raises:
Exception – If there is an error in computing the Fourier transform, possibly due to issues with the boundary extraction or interpolation.
Examples
>>> msk = np.zeros((100, 100)) >>> msk[30:70, 30:70] = 1 # Define a square region >>> fft_components = boundaryFFT(msk) >>> print(fft_components.shape) (15,)
Notes
This function first identifies the boundary pixels of the mask using image processing techniques.
It then converts these boundary coordinates into polar coordinates centered around center.
After sorting and unique filtering of angular coordinates, it interpolates the radial distance as a function of angle and computes the FFT, returning the normalized magnitudes of its components.
- celltraj.features.featBoundary(regionmask, intensity)[source]
Calculates boundary-based Fourier Transform features for specified regions within an image. This function applies a Fourier Transform to the boundaries of regions defined by regionmask to capture the shape characteristics in the frequency domain.
- Parameters:
regionmask (
ndarray
) – A binary mask where non-zero values indicate the region of interest whose boundary is analyzed. The mask can be either 2D or 3D.intensity (
ndarray
) – The intensity image corresponding to regionmask. This parameter is currently not used in the function but is included for compatibility with other feature extraction functions.
- Returns:
xf – An array of Fourier Transform features of the boundary. If the regionmask is 3D, the function returns the mean of the Fourier Transform features computed across all slices.
- Return type:
ndarray
Examples
>>> regionmask = np.zeros((100, 100), dtype=bool) >>> regionmask[30:70, 30:70] = True # Define a square region >>> intensity = np.random.rand(100, 100) # Not used in this function >>> boundary_features = featBoundary(regionmask, intensity) >>> print(boundary_features.shape) (15,)
Notes
The function computes boundary features by first extracting the boundary of the masked region using image processing techniques and then applying a Fourier Transform to describe the shape in the frequency domain.
If no valid region is found in regionmask (i.e., all values are zero), the function returns an array of zeros with a length defined by the number of components used in the boundaryFFT function.
- Raises:
ValueError – If the regionmask is empty or does not contain any regions to process.
- celltraj.features.featBoundaryCB(regionmask, intensity)[source]
Computes boundary-based Fourier Transform features for a region mask distinguishing between core-cell and surrounding areas by using intensity to define active regions. This function applies a binary erosion to the region mask to refine the core region and then calculates the Fourier Transform features based on the refined mask and intensity data. Currently there is no way to pass parameters to the boundaryCB_FFT function.
- Parameters:
regionmask (
ndarray
) – A binary mask indicating the presence of cells. This mask is eroded to focus more on the core region of cells.intensity (
ndarray
) – An intensity image where non-zero values indicate active regions. This is used to distinguish between core-cell and surrounding areas.
- Returns:
xf – An array containing the Fourier Transform features of the boundary data between core and surrounding regions. If the input regionmask is 3D, the function returns the mean of features computed across all slices.
- Return type:
ndarray
Examples
>>> regionmask = np.zeros((100, 100), dtype=bool) >>> regionmask[30:70, 30:70] = True >>> intensity = np.random.randint(0, 2, (100, 100)) >>> boundary_features = featBoundaryCB(regionmask, intensity) >>> print(boundary_features.shape) (15,)
Notes
The function first applies a binary erosion to the regionmask to slightly reduce the region size, aiming to focus more on the core regions.
It then uses these regions along with intensity data to calculate Fourier Transform features that describe the spatial relationship between core-cell areas and their surrounding based on intensity.
Visualization commands within the function (commented out) can be enabled for debugging or understanding the process by visual inspection.
- celltraj.features.featHaralick(regionmask, intensity)[source]
Computes Haralick texture features for a specified region within an image, offering a statistical view of texture based on the image’s gray-level co-occurrence matrix (GLCM).
- Parameters:
regionmask (
ndarray
) – A binary mask where non-zero values define the region of interest for feature calculation.intensity (
ndarray
) – The intensity image corresponding to regionmask. Texture features are calculated from this image within the boundaries defined by regionmask.
- Returns:
xf – An array of computed Haralick features. If regionmask is 3-dimensional, returns the mean of the features calculated for each slice. If regionmask is 2-dimensional, returns the Haralick features for that single slice.
- Return type:
ndarray
Examples
>>> regionmask = np.zeros((100, 100), dtype=bool) >>> regionmask[30:70, 30:70] = True # Defining a square region >>> intensity = np.random.rand(100, 100) >>> haralick_features = featHaralick(regionmask, intensity) >>> print(haralick_features.shape) (13,)
Notes
Haralick features are calculated using Mahotas library functions based on the GLCM of the image.
The intensity image is quantized into several levels which are then used to compute the GLCM.
Feature 5 (sum average) is normalized by dividing by the number of quantization levels to match the scale of other features.
- Raises:
ValueError – If the regionmask and intensity arrays do not match in dimensions or if other processing errors occur.
- celltraj.features.featNucBoundary(regionmask, intensity)[source]
Computes Fourier Transform features from the boundaries of a specified region within an intensity image. This function is primarily used to analyze the structural properties of nuclear boundaries in biological imaging data.
- Parameters:
regionmask (
ndarray
) – A binary mask indicating the presence of nuclear regions. The mask can be 2D or 3D.intensity (
ndarray
) – The intensity image corresponding to regionmask, which is binarized within the function to delineate boundaries more clearly.
- Returns:
xf – An array containing Fourier Transform features derived from the boundary of the specified region. If no valid region or intensity is detected, an array of NaNs is returned.
- Return type:
ndarray
Examples
>>> regionmask = np.zeros((100, 100), dtype=bool) >>> regionmask[40:60, 40:60] = True # Define a square region >>> intensity = np.random.rand(100, 100) # Random intensity image >>> features = featNucBoundary(regionmask, intensity) >>> print(features.shape) (15,)
Notes
If the regionmask is 3D and contains multiple slices, the function calculates the Fourier Transform features for slices with non-zero intensity, then averages these features across the active slices.
- Raises:
ValueError – If regionmask and intensity do not have the same dimensions or if they are neither 2D nor 3D arrays.
- celltraj.features.featSize(regionmask, intensity)[source]
Calculates the size of a region specified by a mask.
This function computes the total number of pixels (or voxels) within a region defined by a non-zero mask. The intensity parameter is included for compatibility with skimage’s regionprops, which requires this parameter signature even if it’s not used in the computation.
- Parameters:
regionmask (
ndarray
) – A binary mask where non-zero values indicate the region of interest.intensity (
ndarray
) – The intensity image; not used in this function but required for consistency with regionprops function signatures.
- Returns:
size – The total count of non-zero pixels in the regionmask, representing the size of the region.
- Return type:
int
Examples
>>> regionmask = np.array([[0, 1, 1], [0, 1, 0], [0, 0, 0]]) >>> intensity = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) # Example intensity array (not used) >>> size = featSize(regionmask, intensity) >>> print(size) 3
- celltraj.features.featZernike(regionmask, intensity)[source]
Calculates the Zernike moments for a specified region within an image, quantifying the region’s shape and texture. This method uses Zernike polynomials to create a set of features that are invariant to rotation, making them particularly useful for shape analysis in image processing tasks.
- Parameters:
regionmask (
ndarray
) – A binary mask where non-zero values define the region of interest. The function computes Zernike moments for this specified region.intensity (
ndarray
) – An intensity image corresponding to regionmask. The function calculates moments based on the intensities within the region defined by regionmask.
- Returns:
xf – An array of computed Zernike moments. If regionmask is 3-dimensional, returns the mean of Zernike moments calculated for each slice. If regionmask is 2-dimensional, returns the Zernike moments for that single slice.
- Return type:
ndarray
Examples
>>> regionmask = np.zeros((100, 100), dtype=bool) >>> regionmask[30:70, 30:70] = True # Defining a square region >>> intensity = np.random.rand(100, 100) >>> zernike_features = featZernike(regionmask, intensity) >>> print(zernike_features.shape) (91,)
Notes
Zernike moments are calculated using a radius determined by the average dimensions of the regionmask.
The intensity values outside the regionmask are set to zero, and the intensities within the region are normalized before calculation to improve accuracy.
This function is useful for characterizing the morphological features of cellular structures or other similar objects in biomedical images.
- Raises:
ValueError – If the regionmask and intensity arrays do not match in dimensions.
- celltraj.features.get_cc_cs_border(mskcell, fmskcell, bordersize=0)[source]
Identifies core-cell (cc) and cell-surrounding (cs) borders within a given cell mask by applying morphological operations and boundary detection.
This function defines two regions within a cell mask: the core-cell border, which is far enough from the background to be considered central, and the cell-surrounding border, which is close to the background. Morphological erosion and dilation are used to refine these borders.
- Parameters:
mskcell (
ndarray
) – A binary mask indicating the presence of cells.fmskcell (
ndarray
) – A binary mask indicating foreground regions likely to include cells; this mask is modified by morphological operations to define borders.bordersize (
int
, optional) – The size of the border around cell regions to consider in the analysis. Default is 10.
- Returns:
ccborder (
ndarray
) – A binary mask where 1 indicates core-cell borders.csborder (
ndarray
) – A binary mask where 1 indicates cell-surrounding borders.
Examples
>>> mskcell = np.zeros((100, 100), dtype=bool) >>> mskcell[30:70, 30:70] = True >>> fmskcell = np.zeros_like(mskcell) >>> fmskcell[35:65, 35:65] = True >>> ccborder, csborder = get_cc_cs_border(mskcell, fmskcell, bordersize=5) >>> print(ccborder.sum(), csborder.sum()) (900, 100)
Notes
The function first finds the boundaries of the mskcell using the inner boundary mode.
It then applies sequential erosion and dilation to fmskcell to adjust the extent of the foreground mask.
Distances from the boundaries to the background are calculated to segregate core-cell and cell-surrounding regions.
- Raises:
ValueError – If the input masks are not of the same shape or if other processing errors occur.
- celltraj.features.get_contact_boundaries(labels, radius=10, boundary_only=True)[source]
Identifies contact boundaries within labeled regions, highlighting the edges where different labels meet. This function can apply a dilation operation to expand the regions before comparing them, which helps in identifying contact areas even if they are not immediately adjacent.
- Parameters:
labels (
ndarray
) – A labeled image where each unique integer (non-zero) represents a distinct region.radius (
int
, optional) – The radius of the structuring element used for dilation, which can expand the boundaries of the labels to identify near-contact areas. Default is 10.boundary_only (
bool
, optional) – If True, the function will return only the boundaries of the contact areas. If False, it will return the entire area affected by the dilation process where contacts occur. Default is True.
- Returns:
msk_contact – A binary mask indicating the areas where different labels are in contact. If boundary_only is True, this mask will only cover the actual boundaries; otherwise, it covers the dilated areas where contacts occur.
- Return type:
ndarray
Examples
>>> labels = np.array([[1, 1, 0, 2, 2], [1, 1, 0, 2, 2], [1, 1, 0, 0, 0], [0, 3, 3, 3, 0]]) >>> contact_msk = get_contact_boundaries(labels, radius=1, boundary_only=True) >>> print(contact_msk) [[False False False False False] [False False False False False] [False False True True False] [False True True True True]]
Notes
This function is particularly useful in cell imaging where identifying the boundaries between cells can help in analyzing cell interactions and morphology.
The dilation process helps to identify contacts even if cells (or other labeled regions) are not physically touching but are within a close proximity defined by radius.
- Raises:
ValueError – If labels is not a 2D or 3D ndarray.
- celltraj.features.get_contact_labels(labels0, radius=10)[source]
Identifies and returns labels that are in contact with each label in a segmented image. This function uses morphological dilation to find neighboring regions and constructs a mask that indicates the labels which each region is in contact with.
- Parameters:
labels0 (
ndarray
) – A labeled image where each unique positive integer represents a different segmented region.radius (
int
, optional) – The radius of the structuring element used for dilation. This determines how far out from the original label’s boundaries the function will look to identify contacts. Default is 10.
- Returns:
contact_labels – An image of the same shape as labels0 where each pixel in a contact region contains the label of the neighboring region it is in contact with. Pixels not in contact with different labels remain zero.
- Return type:
ndarray
Examples
>>> labels = np.array([[1, 1, 0, 0, 2, 2], [1, 1, 1, 2, 2, 2], [1, 1, 0, 0, 2, 2], [0, 0, 0, 0, 0, 0], [3, 3, 3, 3, 4, 4], [3, 3, 3, 3, 4, 4]]) >>> contact_labels = get_contact_labels(labels, radius=1) >>> print(contact_labels) [[0 0 0 0 0 0] [0 0 2 1 0 0] [0 0 0 0 0 0] [0 0 0 0 0 0] [0 0 0 4 3 0] [0 0 0 4 3 0]]
Notes
The function uses dilation to expand each label’s area and then checks for overlaps with other labels.
It works for both 2D and 3D images.
The resulting contact_labels map only shows where different labels meet; the rest of the area remains zero.
- Raises:
ValueError – If labels0 is not 2D or 3D, or if there are issues with dilation or label matching.
- celltraj.features.get_neighbor_feature_map(labels, neighbor_function=None, contact_labels=None, radius=10, dtype=<class 'numpy.float64'>, **neighbor_function_args)[source]
Constructs a map where each cell’s pixels are annotated with a feature value that quantifies some aspect of its relationship with neighboring cells. This is typically used in image analysis to evaluate how cells or segments interact with each other based on defined criteria.
- Parameters:
labels (
ndarray
) – A labeled image where each unique positive integer represents a distinct region or cell.neighbor_function (
callable
) – A function that computes a feature value given two labels. This function should accept at least two arguments, the labels of two neighboring regions, and return a scalar value that quantifies some aspect of their relationship.contact_labels (
ndarray
, optional) – A precomputed array the same shape as labels where each cell in a contact region contains the label of the neighboring region it is in contact with. If None, it will be computed within this function using get_contact_labels.dtype (
data-type
, optional) – The desired data-type for the output feature map. Default is np.float64.**neighbor_function_args (
dict
) – Additional keyword arguments to pass to neighbor_function.
- Returns:
neighbor_feature_map – An image of the same shape as labels where each pixel in a contact region is annotated with the feature value computed by neighbor_function.
- Return type:
ndarray
Notes
neighbor_function should be chosen based on the specific analysis required, e.g., calculating the distance, overlap, or other relational metrics between neighboring regions.
If contact_labels is not provided, the function calculates it internally, which may increase computational time.
- Raises:
ValueError – If labels does not have at least one dimension or if neighbor_function is not provided.
- celltraj.features.get_pca_fromdata(data, dim=-1, var_cutoff=0.95)[source]
A wrapper for sklearn Principal Component Analysis (PCA) on the provided dataset.
- Parameters:
data (
array-like
,shape (n_samples
,n_features)
) – The data matrix on which to perform PCA. Each row corresponds to a sample, and each column corresponds to a feature.dim (
int
, optional) – The specific number of principal components to retain. If -1, dim is ignored and var_cutoff is used instead.var_cutoff (
float
, optional) – The proportion of variance to retain. If dim is not -1, this parameter is ignored, and dim components are kept. Otherwise, the number of components is chosen to retain the specified variance proportion.
- Returns:
Xpca (
ndarray
) – The transformed data in the principal component space.pca (
PCA object
) – The PCA object from sklearn that contains the variance and principal component information.
Notes
If var_cutoff is used and set to less than 1, PCA selects the minimum number of principal components such that at least the specified variance proportion is retained.
- Raises:
ValueError – If var_cutoff is not between 0 and 1, or if dim is less than -1 or more than the number of features in the data.
- celltraj.features.meanIntensity(regionmask, intensity)[source]
Calculates the mean intensity of a specified region in an image, based on a given mask.
This function computes the mean value of pixel intensities within the area defined by the mask, where the mask contains non-zero values indicating the region of interest. The function handles regions without valid pixels (i.e., all zero mask or masked pixels) by returning NaN for those cases.
- Parameters:
regionmask (
ndarray
) – A binary mask where non-zero values delineate the region of interest over which the mean intensity is calculated.intensity (
ndarray
) – The intensity image where each pixel’s value represents its intensity, typically derived from grayscale or other types of imaging.
- Returns:
mean_intensity – The average intensity across all pixels within the region defined by regionmask. Returns NaN if the mask does not cover any valid pixels.
- Return type:
float
Examples
>>> regionmask = np.array([[0, 1, 1], [0, 1, 0], [0, 0, 0]]) >>> intensity = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) # Example intensity array >>> mean_intensity = meanIntensity(regionmask, intensity) >>> print(mean_intensity) 4.0
- celltraj.features.totalIntensity(regionmask, intensity)[source]
Computes the total intensity of a specified region within an image, using a mask to define the region.
This function sums the intensities of all pixels that fall within the region of interest specified by the mask. Pixels in the mask with non-zero values are considered part of the region. It is robust against NaN values in the intensity array, ignoring them in the sum.
- Parameters:
regionmask (
ndarray
) – A binary or boolean mask where non-zero or True values indicate the pixels to be included in the total intensity calculation.intensity (
ndarray
) – An array of the same shape as regionmask containing intensity values for each pixel.
- Returns:
total_intensity – The sum of the intensities of the pixels identified by regionmask. If all relevant pixels are NaN, returns 0.
- Return type:
float
Examples
>>> regionmask = np.array([[0, 1, 1], [0, 1, 0], [0, 0, 0]]) >>> intensity = np.array([[1, 2, 3], [4, np.nan, 6], [7, 8, 9]]) # Example intensity array >>> total_intensity = totalIntensity(regionmask, intensity) >>> print(total_intensity) 2.0
celltraj.imageprep
Image and mask processing tools.
- celltraj.imageprep.clean_labeled_mask(masks_nuc, remove_borders=False, remove_padding=False, edge_buffer=0, minsize=None, maxsize=None, verbose=False, fill_holes=True, selection='largest', test_map=None, test_cut=0.0)[source]
Cleans up a labeled mask by removing small or large objects based on size thresholds, handling image borders, and optionally filling holes within the objects. This function can also trim padding around the image and filter objects based on a secondary map.
- Parameters:
masks_nuc (
ndarray
) – An integer-labeled mask where each unique positive integer represents a separate object, and 0 represents the background.remove_borders (
bool
, optional) – If True, removes objects touching the image border. Default is False.remove_padding (
bool
, optional) – If True, removes padding around the image, focusing the mask on the central region. Default is False.edge_buffer (
int
, optional) – Defines a buffer zone around the edges when removing border-touching objects. Default is 0.minsize (
int
, optional) – The minimum size threshold for objects to be retained. Objects smaller than this are removed. Default is None, which disables this filter.maxsize (
int
, optional) – The maximum size threshold for objects. Objects larger than this are removed. Default is None, which disables this filter.verbose (
bool
, optional) – If True, print details about object removal. Default is False.fill_holes (
bool
, optional) – If True, fills holes within each labeled object. Default is True.selection (
str
, optional) – The method for selecting objects within a connected component. Supported values are ‘largest’ to keep only the largest object. Default is ‘largest’.test_map (
ndarray
, optional) – An additional map used to test objects for a secondary criterion, such as intensity thresholding. Default is None.test_cut (
float
, optional) – The cutoff value used along with test_map to decide whether an object should be retained. Default is 0.
- Returns:
A cleaned labeled mask with the same shape as masks_nuc, where retained objects are relabeled consecutively starting from 1, and background remains 0.
- Return type:
ndarray
Examples
>>> labeled_mask = np.array([[0, 1, 1], [1, 2, 2], [2, 2, 0]]) >>> cleaned_mask = clean_labeled_mask(labeled_mask, minsize=2, fill_holes=True) >>> print(cleaned_mask) [[0 1 1] [1 0 0] [0 0 0]]
Notes
The function is useful for post-processing segmentation outputs where removal of noise and small artifacts is necessary.
If remove_padding is used, ensure that the indices provided match the actual data layout to avoid misalignment.
Combining test_map and test_cut allows for sophisticated filtering based on specific measurement criteria, such as fluorescence intensity or other cell properties.
- celltraj.imageprep.create_h5(filename, dic, overwrite=False)[source]
Creates an HDF5 file and stores data from a dictionary into it under a specified group. The function checks if the file already exists and handles it based on the ‘overwrite’ parameter.
- Parameters:
filename (
str
) – The name of the file to create. This should include the path if the file is not to be created in the current directory.dic (
dict
) – The dictionary containing the data to be stored. This dictionary will be saved in the HDF5 file under the ‘/metadata/’ group.overwrite (
bool
, optional) – If True, if the file exists it will be overwritten. If False and the file exists, the function will return an error and not overwrite the file.
- Returns:
Returns 0 if the file was created and data was successfully saved. Returns 1 if an error occurred, such as if the file already exists and ‘overwrite’ is False, or if there is an issue in writing the data to the file.
- Return type:
int
Examples
>>> data = {'key1': 'value1', 'key2': 'value2'} >>> result = create_h5('data.h5', data, overwrite=False) >>> print(result) # If file does not exist or overwrite is True 0
Notes
This function uses the ‘utilities.save_dict_to_h5’ to save the dictionary into the HDF5 file.
It is important to handle exceptions during the file operation to avoid partial writes or file corruption.
- celltraj.imageprep.crop_image(img, tshift, nx, ny)[source]
Crop and resize an image based on specified translation shifts and dimensions.
- Parameters:
img (
ndarray
) – The original image to be cropped.tshift (
tuple
orndarray
) – A tuple or array indicating the x and y translation shifts where cropping should start.nx (
int
) – The desired width of the cropped image.ny (
int
) – The desired height of the cropped image.
- Returns:
img_cropped (
ndarray
) – The cropped and resized image.Description
-----------
This function crops the image starting from a point defined by `tshift
(top-left corner` ofthe crop)
and extends the crop
tothe specified width (`nx
)` andheight (`ny
). After cropping`,it resizes the
cropped portion back
tothe dimensions (`nx`
, ny) using an anti-aliasing filter tomaintain the quality.
Example
# Example of using the function to crop and resize an image: >>> img = np.random.rand(100, 100) # Create a random image of size 100x100 >>> tshift = (10, 10) # Start the crop 10 pixels down and right >>> nx, ny = 50, 50 # Dimensions of the cropped and resized image >>> cropped_img = crop_image(img, tshift, nx, ny)
- celltraj.imageprep.dist_to_contact(r, r0, d0, n=6, m=12)[source]
Calculate a contact potential value based on distance, using a Lennard-Jones-like formula.
- Parameters:
r (
float
orndarray
) – The radial distance or distances at which the potential is evaluated. Can be a single value or an array.r0 (
float
) – Characteristic distance scale, typically representing the distance beyond which the potential significantly decreases.d0 (
float
) – Offset distance, representing a threshold below which the potential is set to 1 (indicating maximum interaction).n (
int
, optional) – Power of the repulsive component of the potential. Default is 6.m (
int
, optional) – Power of the attractive component of the potential. Default is 12.
- Returns:
c – Computed potential values at each distance r. If r is an array, c will be an array of the same size.
- Return type:
float
orndarray
Notes
This function computes a value based on the generalized Lennard-Jones potential form: c(r) = (1 - w^n) / (1 - w^m) if r >= d0, c(r) = 1 if r < d0, where w = (r - d0) / r0.
Examples
>>> dist_to_contact(5, 1, 3) 0.25
>>> r = np.array([1, 2, 3, 4, 5]) >>> dist_to_contact(r, 1, 3) array([1. , 1. , 1. , 0.75 , 0.25])
- celltraj.imageprep.expand_registered_images(imgs, tSet)[source]
Applies transformations to a stack of images and expands them so that they align according to the provided transformation set. This function is useful for aligning images based on calculated translations and optionally rotations.
- Parameters:
imgs (
ndarray
orlist
ofndarrays
) – A stack of images where each image has dimensions (Z, X, Y). If a list is provided, it will be converted to an ndarray.tSet (
ndarray
) – An array of transformations for each image. Each transformation is a tuple or list of (radial angle, x-translation, y-translation), where angle is in degrees and translations are in pixels.
- Returns:
An ndarray containing the expanded and registered image stack. The dimensions of the output images will be adjusted to accommodate the maximum translation offsets to ensure all images fit within the new dimensions.
- Return type:
ndarray
Example
>>> import numpy as np >>> imgs = [np.random.rand(100, 100) for _ in range(10)] # Create a list of random images >>> tSet = np.array([[0, 10, -5] for _ in range(10)]) # Example transformations >>> registered_imgs = expand_registered_images(imgs, tSet) >>> print(registered_imgs.shape) (10, 105, 100) # Output dimensions may vary based on transformations
Notes
The transformations are applied using an affine transformation, where translations are adjusted to ensure no image content is lost.
The function automatically pads images based on the maximum translations specified in tSet to prevent image cropping.
- celltraj.imageprep.get_cell_centers(labels)[source]
Calculates the centers of mass for labeled regions in an image.
- Parameters:
labels (
ndarray
) – An array where each labeled region (cell) is marked with a distinct integer. The background should be labeled as 0.- Returns:
centers – An array of coordinates representing the centers of mass for each labeled region. Each row corresponds to a label, and the columns correspond to the coordinates along each dimension.
- Return type:
ndarray
Notes
This function returns the center of mass for each distinct label found in the labels array. The function will return an empty array if there are no labels greater than zero.
Examples
>>> labels = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]]) >>> centers = get_cell_centers(labels) >>> print(centers) [[2.5, 2.5], [3. , 0.5], [3. , 2.5]]
- celltraj.imageprep.get_cell_intensities(img, labels, averaging=False, is_3D=False)[source]
Calculate the sum or average intensity for each cell in a labeled image or image stack. This function handles both 2D and 3D images and can operate on multi-channel data, summing or averaging the intensities for each label in each channel.
- Parameters:
img (
ndarray
) – The image or image stack from which to calculate intensities. Can be 2D, 3D, or higher dimensions if channels are involved.labels (
ndarray
) – An integer array of the same shape as img where each unique non-zero value indicates a distinct cell region.averaging (
bool
, optional) – If True, calculate the mean intensity for each cell. If False, calculate the total intensity. Default is False.is_3D (
bool
, optional) – Set to True if img includes 3D spatial data (as opposed to 2D images with multiple channels). Default is False.
- Returns:
A 1D array of intensities for each cell. If img includes multiple channels, the result will be a 2D array with one row per cell and one column per channel.
- Return type:
ndarray
Examples
>>> img = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]) >>> labels = np.array([[1, 1], [2, 2]]) >>> get_cell_intensities(img, labels, averaging=True) array([2.5, 6.5])
Notes
If averaging is False, the function sums the pixel values for each cell; if True, it averages them.
The function handles multi-channel images correctly for both 2D and 3D cases, adjusting its behavior based on the is_3D parameter.
- celltraj.imageprep.get_contactsum_dev(centers1, centers2, img2, rp1, nt=None, savefile=None)[source]
Calculate a grid-based sum of contact deviations for center points translated across an image.
- Parameters:
centers1 (
ndarray
) – Array of original center points (x, y coordinates).centers2 (
ndarray
) – Array of target center points for comparison (x, y coordinates).img2 (
ndarray
) – The image data used for setting the boundary conditions for translations.rp1 (
float
) – Interaction potential radius to determine the contact potential calculation.nt (
int
, optional) – Number of translations along each axis, if None it defaults to 1/20th of the image dimension.savefile (
str
, optional) – Path to save the resulting deviation grid as a NumPy binary file.
- Returns:
nncs_dev (
ndarray
) – A grid of normalized deviations of contact sums from their local average across the translation space.Description
-----------
This function creates a grid
ofpotential translation points across the image. For each point in this grid,
it shifts the
'centers1'
coordinates andcalculates the minimum distances
to'centers2'
within the confinesof the translated box. It then calculates a contact potential using these distances
andcompares the sum
to the local average
toassess deviations in potential interactions. This can help in understanding how
interactions vary spatially within an image. The function optionally saves the output grid
toa file for
further analysis.
Example
# Example of using the function to calculate contact deviations: >>> centers1 = np.array([[10, 10], [20, 20], [30, 30]]) >>> centers2 = np.array([[15, 15], [25, 25], [35, 35]]) >>> img = np.zeros((100, 100)) >>> rp1 = 10 >>> deviations = get_contactsum_dev(centers1, centers2, img, rp1, nt=10)
- celltraj.imageprep.get_cyto_minus_nuc_labels(labels_cyto, labels_nuc)[source]
Generates new cytoplasmic labels where nuclear labels are excluded. This function adjusts cytoplasmic labels to ensure they do not overlap with nuclear labels by dilating nuclear areas and subtracting them from corresponding cytoplasmic regions. This method helps in distinguishing between nuclear and cytoplasmic components of a cell, often necessary for detailed cellular analysis.
- Parameters:
labels_cyto (
ndarray
) – A 2D array where each integer represents a unique cytoplasmic region.labels_nuc (
ndarray
) – A 2D array of the same shape as labels_cyto, where each integer represents a unique nuclear region.
- Returns:
A 2D array of the same shape as labels_cyto, containing the refined cytoplasmic labels with nuclear regions excluded.
- Return type:
ndarray
Examples
>>> labels_cyto = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]]) >>> labels_nuc = np.array([[0, 1, 0], [0, 1, 0], [0, 0, 0]]) >>> labels_cyto_new = get_cyto_minus_nuc_labels(labels_cyto, labels_nuc) >>> print(labels_cyto_new) [[1 0 0] [1 0 0] [0 0 0]]
Notes
This function ensures that nuclear regions are excluded from the cytoplasmic labels by first dilating the nuclear masks and then eroding them before subtracting from the cytoplasmic masks.
The output labels for cytoplasmic areas are adjusted to ensure no overlap with nuclear regions.
- Raises:
Exception – Raises an exception if there is an error during the label processing steps.
- celltraj.imageprep.get_feature_map(features, labels)[source]
Maps an array of features to their corresponding labels in a labeled image. Each feature is assigned to the region of the mask identified by the same label index. This function ensures that each discrete label in the labeled mask gets the corresponding feature value from the features array.
- Parameters:
features (
ndarray
) – An array of feature values where each entry corresponds to a label in the labels mask. The length of features must match the highest label value in the labels mask.labels (
ndarray
) – An integer-labeled mask where each unique positive integer represents a different region. Each region (label) will be assigned the corresponding feature value from the features array based on its label index.
- Returns:
An array of the same shape as labels where each labeled region is filled with its corresponding feature value from the features array.
- Return type:
ndarray
Examples
>>> labels = np.array([[1, 1, 0], [0, 2, 2], [2, 2, 0]]) >>> features = np.array([10, 20]) >>> feature_map = get_feature_map(features, labels) >>> print(feature_map) [[10 10 0] [ 0 20 20] [20 20 0]]
Notes
This function is particularly useful in imaging and machine learning applications where each segmented region’s properties need to be mapped back onto the original labeled mask for visualization or further analysis.
Ensure that the number of features matches the maximum label in the labels mask to avoid mismatches and errors.
- Raises:
ValueError – If the size of the features array does not match the highest label value in the labels mask.
- celltraj.imageprep.get_images(filelist)[source]
Reads a list of image files and loads them into memory as arrays. This function is useful for batch processing images for analysis or input into machine learning models.
- Parameters:
filelist (
list
ofstr
) – A list containing the file paths of images to be loaded. Each element in the list should be a string specifying the full path to an image file.- Returns:
A list of image arrays, where each array corresponds to an image file from filelist. The format and dimensions of each image array depend on the image file format and its content.
- Return type:
list
ofndarray
Examples
>>> filelist = ['path/to/image1.jpg', 'path/to/image2.png'] >>> images = get_images(filelist) >>> print(type(images[0]), images[0].shape) (<class 'numpy.ndarray'>, (height, width, channels))
Notes
This function uses skimage.io.imread to load images, which supports various image formats including JPEG, PNG, and TIFF among others.
The function directly reads images into memory, which may consume a lot of resources for large image files or long lists of images.
- Raises:
IOError – If any file in the list cannot be opened or read. This could be due to the file not existing, being unreadable, or being corrupted.
- celltraj.imageprep.get_intensity_centers(img, msk=None, footprint_shape=None, rcut=None, smooth_sigma=None, pad_zeros=True)[source]
Identifies centers of intensity within an image, optionally constrained by a mask. This function is useful for detecting features like local maxima that represent points of interest within an image, such as cell centers in microscopy images.
- Parameters:
img (
ndarray
) – The image in which intensity centers are to be identified.msk (
ndarray
, optional) – A boolean mask of the same shape as img that specifies regions within which centers should be identified. If None, the entire image is considered.footprint_shape (
tuple
, optional) – The size of the neighborhood considered for the local maximum. Should be a tuple corresponding to the image dimensions. If None, a minimal footprint of shape (1,1,…) for each dimension is used.rcut (
float
, optional) – The minimum allowed distance between centers. If centers are closer than this value, they will be merged. If None, no merging is performed.smooth_sigma (
float
orsequence
offloats
, optional) – The standard deviation for Gaussian smoothing applied to the image before identifying centers. This helps to reduce noise and improve the robustness of center detection.pad_zeros (
bool
, optional) – If True, the image will be padded with zeros on all sides by the width specified in footprint_shape. This helps to handle edge effects during local maximum detection.
- Returns:
An array of coordinates for the detected intensity centers.
- Return type:
ndarray
Examples
>>> img = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) >>> centers = get_intensity_centers(img, smooth_sigma=1, footprint_shape=(1, 1)) >>> print(centers) [[2 2]]
Notes
The function is particularly useful for preprocessing steps in image analysis where features need to be extracted from local intensity variations.
Adjusting rcut and smooth_sigma according to the scale and noise characteristics of the image can significantly affect the accuracy and reliability of the detected centers.
- celltraj.imageprep.get_label_largestcc(label, fill_holes=True)[source]
Processes a labeled mask to keep only the largest connected component (CC) for each unique label in the mask, optionally filling holes within those components. This function is useful for cleaning segmentation results by removing smaller fragments of labels and ensuring continuity in the detected objects.
- Parameters:
label (
ndarray
) – An integer-labeled mask where each unique positive integer represents a separate object, and 0 represents the background.fill_holes (
bool
, optional) – If True, fills holes within the labeled objects before identifying the largest connected component. This can help in creating more robust and continuous object detections. Default is True.
- Returns:
A labeled mask similar in shape to the input label, but with only the largest connected component retained for each label, and all other components removed.
- Return type:
ndarray
Notes
This function is particularly useful when segmentation algorithms produce noisy results or when labels are fragmented. Cleaning up the labels to retain only the largest component can significantly improve the quality of the final analysis, especially in quantitative measurements where object integrity is crucial.
If using 3D data, the function will process each slice independently unless the mask is inherently volumetric, in which case 3D hole filling and labeling is applied.
- celltraj.imageprep.get_labeled_mask(b_imgr, imgM=None, apply_watershed=False, fill_holes=True, dist_footprint=None, zscale=None)[source]
Processes a binary image to label connected components, optionally applying the watershed algorithm to separate closely touching objects. This function can also fill holes within binary objects and mask out areas from an exclusion mask.
- Parameters:
b_imgr (
ndarray
) – A binary image where True represents the foreground (objects to label) and False represents the background.imgM (
ndarray
, optional) – An exclusion mask where True values specify areas to ignore during labeling, such as known noise or artifacts. If provided, any foreground in these areas will be set to False. Default is None.apply_watershed (
bool
, optional) – Whether to apply the watershed algorithm to separate overlapping or touching objects using a distance transform. Default is False.fill_holes (
bool
, optional) – If True, fills holes within the binary objects. This is often useful for cleaning up segmentation artifacts. Default is True.dist_footprint (
int
, optional) – The size of the footprint used for the distance transform if applying the watershed. Specifies the connectivity of the neighborhood used in the local maximum detection. Default is None, which uses a 3x3 square.zscale (
float
, optional) – The scaling factor for z-dimension in volumetric data (3D). It compensates for the difference in resolution between xy-plane and z-axis and is used only if the image is three-dimensional. Default is None.
- Returns:
A labeled image where each unique integer (starting from 1) corresponds to a separate object, with 0 representing the background.
- Return type:
ndarray
Examples
>>> img = np.random.randint(0, 2, size=(100, 100), dtype=bool) >>> labeled_mask = get_labeled_mask(img, apply_watershed=True, fill_holes=True, dist_footprint=5) >>> print(np.unique(labeled_mask)) [0 1 2 3 ...] # Example of labels found in the mask
Notes
The watershed algorithm can help in separating objects that touch each other but requires careful setting of the dist_footprint and zscale in case of volumetric data.
Exclusion masks are useful in experiments where certain areas need to be systematically ignored, such as damaged regions on a slide or expected artifacts.
- celltraj.imageprep.get_mask_2channel_ilastik(file_ilastik, fore_channel=0, holefill_area=0, growthcycles=0, pcut=0.8)[source]
Processes a pixel classification output from Ilastik to generate a binary mask for a specified foreground channel. This function includes options to fill holes, apply morphological operations, and threshold the probability maps to create a final binary mask.
- Parameters:
file_ilastik (
str
) – The path to the HDF5 file containing the Ilastik classification output.fore_channel (
int
, optional) – The index of the channel in the Ilastik output that represents the foreground probability. Default is 0.holefill_area (
int
, optional) – The minimum area threshold for opening and closing operations to fill holes in the foreground mask. If 0, no hole filling is performed. Default is 0.growthcycles (
int
, optional) – The number of cycles of dilation followed by erosion to grow and then shrink the foreground mask. This can help in smoothing the mask edges. Default is 0, which means no growth or erosion cycles.pcut (
float
, optional) – The probability cutoff threshold to convert the probability map to a binary mask. Values above this threshold will be considered foreground. Default is 0.8.
- Returns:
A 2D binary mask where pixels classified as foreground based on the specified channel and probability threshold are marked as True, and all other pixels are False.
- Return type:
ndarray
Examples
>>> binary_mask = get_mask_2channel_ilastik('output_from_ilastik.h5', fore_channel=1, holefill_area=500, growthcycles=2, pcut=0.5) >>> print(binary_mask.shape) (1024, 1024) # Example shape for a typical output mask
Notes
The function uses skimage’s morphological operations for hole filling and size adjustments, which are highly effective in post-processing segmentation masks.
Appropriate tuning of holefill_area, growthcycles, and pcut parameters is crucial for achieving optimal segmentation results based on the specific characteristics of the image data.
- celltraj.imageprep.get_masks(masklist, fore_channel=0, holefill_area=0, growthcycles=0, pcut=0.8)[source]
Processes a list of Ilastik prediction files to generate binary masks based on specified foreground channels and other morphological processing parameters. This function is useful for batch processing multiple segmentation outputs, applying uniform post-processing steps such as hole filling, growth cycles, and probability thresholding.
- Parameters:
masklist (
list
ofstr
) – A list of file paths to Ilastik prediction outputs (HDF5 files).fore_channel (
int
, optional) – The index of the channel in the Ilastik output that represents the foreground probabilities. Default is 0.holefill_area (
int
, optional) – The minimum area threshold for opening and closing operations to fill holes in the masks. If set to 0, no hole filling is performed. Default is 0.growthcycles (
int
, optional) – The number of dilation followed by erosion cycles applied to the masks to enhance mask boundaries. Default is 0, meaning no growth or erosion cycles are applied.pcut (
float
, optional) – The probability threshold above which a pixel is considered as foreground (mask is set to True). Default is 0.8.
- Returns:
A list of 2D binary masks where each mask corresponds to the processed output of each file in masklist. Each mask has pixels marked as True for foreground and False for background based on the provided parameters.
- Return type:
list
ofndarray
Examples
>>> mask_files = ['path/to/ilastik_output1.h5', 'path/to/ilastik_output2.h5'] >>> masks = get_masks(mask_files, fore_channel=1, holefill_area=500, growthcycles=2, pcut=0.5) >>> print(len(masks), masks[0].shape) 2, (1024, 1024) # Assuming the masks are from 1024x1024 pixel images
Notes
This function is particularly useful in large-scale image processing workflows where consistent mask processing across multiple images or conditions is required.
Ensure that all files in masklist are accessible and properly formatted as Ilastik output HDF5 files.
- Raises:
FileNotFoundError – If any file in masklist does not exist or cannot be read.
ValueError – If pcut is not between 0 and 1, or other parameter constraints are violated.
- celltraj.imageprep.get_nndist_sum(self, tshift, centers1, centers2, rcut=None)[source]
Calculates the sum of the nearest neighbor distances between two sets of points, potentially shifted by a vector, with an optional distance cutoff to consider only close points.
- Parameters:
tshift (
ndarray
) – Translation vector to apply to the first set of centers before calculating distances.centers1 (
ndarray
) – Array of coordinates for the first set of points.centers2 (
ndarray
) – Array of coordinates for the second set of points.rcut (
float
, optional) – Cutoff distance beyond which points are not considered as neighbors. If not provided, it will default to infinity, considering all points.
- Returns:
nnd – The sum of the nearest neighbor distances after considering the translation and cutoff.
- Return type:
float
Notes
This function is particularly useful in optimization problems where one needs to minimize the distance between two configurations of points subject to translations. The distance matrix calculations are optimized by only considering points within a specified cutoff.
Examples
>>> centers1 = np.array([[1, 1], [2, 2], [3, 3]]) >>> centers2 = np.array([[1, 2], [2, 3], [3, 4]]) >>> tshift = np.array([1, 1]) >>> rcut = 5 >>> nnd = get_nndist_sum(tshift, centers1, centers2, rcut) >>> print(nnd)
- celltraj.imageprep.get_pair_rdf_fromcenters(self, centers, rbins=None, nr=50, rmax=500)[source]
Calculate the radial distribution function (RDF) from a set of center points. The RDF provides a measure of the density distribution of a set of points as a function of distance.
- Parameters:
centers (
ndarray
) – Array containing the coordinates of the center points for which the RDF is to be calculated.rbins (
ndarray
, optional) – Array of radii to define the bins for RDF calculation. If None, bins are automatically generated.nr (
int
, optional) – Number of bins if rbins is not provided. Default is 50.rmax (
float
, optional) – Maximum radius to consider if rbins is not provided. Default is 500.
- Returns:
rbins (
ndarray
) – The radii at which the RDF is evaluated, corresponding to the bin edges.paircorrx (
ndarray
) – Radial distribution function values corresponding to rbins.
Notes
The radial distribution function g(r) describes how density varies as a function of distance from a reference particle, and it is typically normalized such that g(r) approaches 1 at large distances.
Examples
>>> centers = np.array([[1, 1], [2, 2], [3, 3]]) >>> rbins, rdf = get_pair_rdf_fromcenters(centers) >>> print(rbins) >>> print(rdf)
- celltraj.imageprep.get_registration_expansions(tf_matrix_set, *imgdims)[source]
Calculate the new padded dimensions for an image based on the translations found in a set of transformation matrices. Adjusts the transformation matrices to centralize the image after applying translations.
- Parameters:
tf_matrix_set (
ndarray
) – An array of transformation matrices of shape (N, D+1, D+1) where N is the number of frames and D is the number of dimensions.*imgdims (
int
) – Variable length argument list specifying the original dimensions of the images (Z, Y, X) or (Y, X).
- Returns:
A tuple containing: - tf_matrix_set : ndarray, the adjusted transformation matrices centered based on the maximum translation. - pad_dims : tuple, the new dimensions for padding the image to accommodate all translations.
- Return type:
tuple
Example
>>> import numpy as np >>> tf_matrix_set = np.array([[[1, 0, 10], [0, 1, 20], [0, 0, 1]], ... [[1, 0, -5], [0, 1, 15], [0, 0, 1]]]) >>> imgdims = (100, 200) # Y, X dimensions >>> adjusted_tf, pad_dims = get_registration_expansions(tf_matrix_set, *imgdims) >>> print(pad_dims) (105, 225)
Notes
The function automatically adjusts the translation vectors in tf_matrix_set to ensure the entire image remains visible within the new dimensions after transformation.
The calculated pad_dims is large enough to fit the original image plus the maximum translation offsets.
- celltraj.imageprep.get_registrations(imgs)[source]
Apply the pystackreg library’s StackReg algorithm to compute translations needed to register a stack of images along the Z-axis. This function assumes the stack is in the form (Z, X, Y) and uses the ‘previous’ image as a reference for registration.
- Parameters:
imgs (
ndarray
) – A 3D numpy array representing a stack of 2D images. The stack’s first dimension corresponds to the Z-axis, and each slice (X, Y) is a 2D image.- Returns:
A 2D numpy array with shape (NZ, 3), where NZ is the number of images in the stack. Each row contains three values: - the radial angle (currently unused and set to 0), - the x-translation, - the y-translation. These translations are computed to register each image with respect to the previous one in the stack.
- Return type:
ndarray
Notes
The radial angle computation is commented out in the current implementation and could be included for more complex transformations such as rotation. The function primarily outputs translations in the x and y directions as computed by the StackReg algorithm.
Example
>>> imgs = np.random.rand(10, 256, 256) # Simulated stack of 10 images >>> registrations = get_registrations(imgs) >>> print(registrations) array([[ 0. , 5.1, -3.2], [ 0. , 2.3, -1.5], ..., [ 0. , 0.2, -0.1]])
- celltraj.imageprep.get_slide_image(imgs, nrows=None, ncols=None, image_inds=None, foverlap=0.0, histnorm=True)[source]
Constructs a single composite image from a list of tiled images based on specified row and column information, overlap, and optional histogram normalization. This function is useful for reconstructing large images from smaller segmented parts, such as in digital microscopy or image stitching applications.
- Parameters:
imgs (
list
ofndarray
) – A list of 2D arrays, where each array is an image tile.nrows (
int
, optional) – The number of rows in the tiled image layout. If None, it is assumed that the tiling is square, and nrows is calculated as the square root of the number of images. Defaults to None.ncols (
int
, optional) – The number of columns in the tiled image layout. If None and nrows is also None, ncols is set to the same value as nrows, assuming a square layout. Defaults to None.image_inds (
ndarray
, optional) – A 2D array indicating the ordering of image tiles within the grid. If None, ordering is generated by get_tile_order. Defaults to None.foverlap (
float
, optional) – The fraction of overlap between adjacent images, expressed as a decimal between 0 and 1. Defaults to 0.histnorm (
bool
, optional) – If True, histogram stretching is applied to each tile before assembly to normalize contrast across the slide. Defaults to True.
- Returns:
A 2D array representing the assembled slide image from the given tiles.
- Return type:
ndarray
Examples
>>> img_list = [np.random.rand(100, 100) for _ in range(16)] >>> slide_image = get_slide_image(img_list, nrows=4, ncols=4, foverlap=0.1, histnorm=False) >>> print(slide_image.shape) (370, 370)
Notes
The function adjusts the position of each tile based on the overlap specified and stitches them together to form a larger image.
The images in imgs should be of the same dimensions. Variable dimensions across tiles may lead to unexpected results.
- Raises:
ValueError – If the dimensions of the tiles in imgs do not match or if the number of provided images does not fit the specified nrows and ncols layout.
- celltraj.imageprep.get_tile_order(nrows, ncols, snake=False)[source]
Constructs an ordering matrix for assembling image tiles, often used to arrange microscope image tiles into a single large image. This function generates a 2D array where each element represents the positional index of a tile in a grid layout. The layout can be in a standard or snaked pattern.
- Parameters:
nrows (
int
) – The number of rows in the tile grid.ncols (
int
) – The number of columns in the tile grid.snake (
bool
, optional) – If True, the order of tiles will alternate directions across rows to form a snaking pattern. Specifically, odd-numbered rows (0-indexed) will be flipped. Default is False, where tiles are ordered in a standard left-to-right, top-to-bottom pattern.
- Returns:
A 2D array of integers, with each value representing the index of a tile. The dimensions of the array are determined by nrows and ncols.
- Return type:
ndarray
Examples
>>> nrows = 3 >>> ncols = 4 >>> get_tile_order(nrows, ncols) array([[11, 10, 9, 8], [ 7, 6, 5, 4], [ 3, 2, 1, 0]])
>>> get_tile_order(nrows, ncols, snake=True) array([[11, 10, 9, 8], [ 4, 5, 6, 7], [ 3, 2, 1, 0]])
Notes
This ordering is particularly useful in scenarios where image tiles must be stitched together in a specific sequence to correctly reconstruct the original scene, such as in microscopy imaging where individual fields of view are captured in a grid pattern.
- celltraj.imageprep.get_voronoi_masks(labels, imgM=None)[source]
Generates Voronoi masks based on the centers of mass of labeled regions within an image. This function is typically used in image segmentation tasks where each label represents a distinct object or region, and the goal is to create a Voronoi diagram to partition the space among the nearest labels.
- Parameters:
labels (
ndarray
) – A 2D array where each unique non-zero integer represents a distinct labeled region.imgM (
ndarray
, optional) – A binary mask defining the foreground of the image. If None, the entire image is considered as the foreground.
- Returns:
A 2D array of the same shape as labels, where each cell contains the label of the nearest labeled region, forming Voronoi regions.
- Return type:
ndarray
Examples
>>> labels = np.array([[0, 0, 1], [0, 2, 0], [3, 0, 0]]) >>> voronoi_masks = get_voronoi_masks(labels) >>> print(voronoi_masks) [[2 2 1] [2 2 1] [3 3 3]]
Notes
The function uses Euclidean distance to determine the nearest labeled center for each pixel.
Voronoi masks are useful for delineating boundaries between adjacent regions based on proximity to their respective centers of mass.
- Raises:
Exception – Raises an exception if there is an error in calculating the centers of mass or assigning Voronoi regions.
- celltraj.imageprep.get_voronoi_masks_fromcenters(nuc_centers, imgM, selection='closest')[source]
Generates Voronoi masks from provided nucleus centers within a given image mask. The function assigns each pixel to the nearest nucleus center, creating distinct regions (Voronoi tessellation). Optionally, the user can choose to select the largest or the closest connected component within each Voronoi region as the final mask.
- Parameters:
nuc_centers (
ndarray
) – An array of nucleus center coordinates where each row represents a center (z, y, x) for 3D or (y, x) for 2D.imgM (
ndarray
) – A binary image mask defining the area within which the Voronoi tessellation is to be computed. True values indicate the region of interest where tessellation is applicable.selection (
str
, optional) – Method for selecting the final mask within each tessellated region. Options include: - ‘closest’: Selects the connected component closest to the nucleus center. - ‘largest’: Selects the largest connected component within the tessellated region. Default is ‘closest’.
- Returns:
A labeled mask with the same dimensions as imgM. Each pixel’s value corresponds to the region number it belongs to, with 0 representing background or areas outside the regions of interest.
- Return type:
ndarray
Examples
>>> nuc_centers = np.array([[10, 10], [30, 30]]) >>> imgM = np.zeros((50, 50), dtype=bool) >>> imgM[5:45, 5:45] = True # Define an area of interest >>> voronoi_masks = get_voronoi_masks_fromcenters(nuc_centers, imgM, selection='largest') >>> print(voronoi_masks.shape) (50, 50)
Notes
This function is useful in cell imaging where cells are identified by their nuclei, and each cell’s region needs to be delineated based on the proximity to these nuclei.
The Voronoi tessellation is constrained by the binary mask imgM, which means that no tessellation occurs outside the specified mask area.
- Raises:
ValueError – If the dimensions of nuc_centers do not match the dimensions of imgM or if selection is not a recognized option.
- celltraj.imageprep.histogram_stretch(img, lp=1, hp=99)[source]
Performs histogram stretching on an input array or image to enhance the contrast by scaling the pixel intensity values to the specified lower and upper percentile bounds. This method spreads out the most frequent intensity values, improving the perceptual contrast of the image.
- Parameters:
img (
ndarray
) – The input image or array to be processed. This array should contain real numbers.lp (
float
, optional) – The lower percentile to use for scaling the histogram. Default is 1, which uses the 1st percentile.hp (
float
, optional) – The upper percentile to use for scaling the histogram. Default is 99, which uses the 99th percentile.
- Returns:
The histogram stretched version of img where pixel values are scaled between the values at the lp and hp percentiles.
- Return type:
ndarray
Examples
>>> import numpy as np >>> img = np.array([50, 100, 150, 200, 250]) >>> stretched_img = histogram_stretch(img, lp=10, hp=90) >>> print(stretched_img) [ 0. 0.2 0.4 0.8 1. ]
Notes
This function is useful for enhancing features in an image that are difficult to see due to poor contrast between high and low intensity.
If the specified percentiles result in a divide-by-zero (when plow is equal to phigh), the output will contain NaNs.
- Raises:
ValueError – If lp or hp are not within the range [0, 100] or if lp is greater than hp.
- celltraj.imageprep.list_images(imagespecifier)[source]
Lists image files in a directory matching a specified pattern by executing a shell command. This function constructs a command to list files using Unix ‘ls’ based on the given pattern, which includes both the path and the file matching pattern (e.g., ‘/path/to/images/*.jpg’). It then executes this command and parses the output to return a list of file names.
- Parameters:
imagespecifier (
str
) – A string that specifies the directory and pattern to match for image files. This should be a path including a wildcard expression to match files, for example, ‘/path/to/images/*.png’.- Returns:
A list containing the names of files that match the specified pattern. If no files match, the list will be empty.
- Return type:
list
ofstr
Examples
>>> image_files = list_images('/path/to/images/*.jpg') >>> print(image_files) ['image1.jpg', 'image2.jpg', ...]
Notes
This function relies on the Unix ‘ls’ command, which makes it platform-specific and not portable to Windows without modification.
The function requires that the shell used to execute commands has access to ‘ls’, which is typical for Unix-like systems.
- Raises:
OSError – If the ‘ls’ command fails or the specified directory does not exist or cannot be accessed.
- celltraj.imageprep.load_for_viewing(fname)[source]
Loads data and optional metadata from a specified file that was saved using Python’s pickle serialization. This function is useful for retrieving saved datasets and their associated metadata for further processing or analysis.
- Parameters:
fname (
str
) – The file name or path from which the data will be loaded. If only a name is provided, it assumes the file is in the current working directory.- Returns:
Returns a list containing the data and metadata if the file is successfully loaded. Returns 1 if there was an error during the loading process.
- Return type:
list
orint
Examples
>>> datalist = load_for_viewing('example.pkl') >>> data, metadata = datalist[0], datalist[1] >>> print(metadata) {'description': 'Sample data'}
Notes
Ensure that the file specified exists and was written in the appropriate format by save_for_viewing or another function using Python’s pickle module.
This function attempts to handle exceptions gracefully and will notify the user if the load operation fails.
- Raises:
Exception – Raises an exception if the file cannot be opened, if reading the file fails, or if the data cannot be deserialized. Specific errors during the loading process are not caught explicitly but will prompt a general failure message.
- celltraj.imageprep.load_ilastik(file_ilastik)[source]
Loads pixel classification results from an HDF5 file generated by Ilastik. This function reads the dataset containing pixel predictions and returns it as a numpy array.
- Parameters:
file_ilastik (
str
) – The path to the HDF5 file containing the Ilastik prediction data. This file typically contains segmented or classified image data where each pixel is assigned a label.- Returns:
A multi-dimensional array extracted from the Ilastik HDF5 file. The shape of the array is typically 2D (for image data) extended by the number of label classes predicted by Ilastik. Each slice along the third dimension corresponds to a different label class.
- Return type:
ndarray
Examples
>>> prediction = load_ilastik('path/to/ilastik/output.h5') >>> print(prediction.shape) (1024, 1024, 3) # Example shape, indicating an image of 1024x1024 pixels and 3 label classes
Notes
The function assumes that the dataset is stored under the key ‘exported_data’ in the HDF5 file, which is the default output configuration for Ilastik predictions.
Users should ensure that the HDF5 file exists and is not corrupted before attempting to load it.
- Raises:
OSError – If the file cannot be opened, possibly due to being nonexistent or corrupted.
KeyError – If the expected dataset ‘exported_data’ is not found in the file.
- celltraj.imageprep.local_threshold(imgr, block_size=51, z_std=1.0)[source]
Applies a local thresholding algorithm to an image using adaptive threshold values computed from each pixel’s local neighborhood, adjusted by a global threshold defined as a multiple of the image’s standard deviation.
- Parameters:
imgr (
ndarray
) – The input image array for which local thresholding is to be performed. Typically, this should be a 2D grayscale image.block_size (
int
, optional) – The size of the neighborhood block used for calculating the local threshold for each pixel. This value should be an odd integer. Default is 51, which balances responsiveness to local variations with noise reduction.z_std (
float
, optional) – The standard deviation multiplier to adjust the global thresholding offset. Default is 1.0, which sets the offset to one standard deviation of the image’s intensity values.
- Returns:
A binary image of the same shape as imgr, where pixels are True if their intensity is greater than the local threshold value, otherwise False.
- Return type:
ndarray
Examples
>>> import numpy as np >>> imgr = np.random.rand(100, 100) * 255 # Create a random grayscale image >>> binary_image = local_threshold(imgr, block_size=51, z_std=1.5) >>> print(binary_image.shape) (100, 100)
Notes
Local thresholding is particularly useful in images with varying lighting conditions where global thresholding might fail.
The block_size determines the adaptability of the thresholding algorithm to local changes in lighting and should be chosen based on the specific spatial scale of features of interest.
- Raises:
ValueError – If block_size is even, as an odd-sized block is required to have a central pixel.
- celltraj.imageprep.make_odd(x)[source]
Converts a number to the nearest odd integer. If the number is even, it will be increased to the next odd number. If the number is odd, it will be returned as is.
- Parameters:
x (
float
orint
) – The number to be converted to the nearest odd integer.- Returns:
The nearest odd integer.
- Return type:
int
Examples
>>> print(make_odd(4)) 5 >>> print(make_odd(5)) 5 >>> print(make_odd(2.7)) 3
Notes
This function can be used where algorithm parameters such as kernel sizes need to be odd numbers (e.g., for median filtering or convolution operations in image processing).
The function works by rounding up to the next integer if the input is not an integer, ensuring the result is odd.
- celltraj.imageprep.organize_filelist_fov(filelist, fov_pos=None, fov_len=2)[source]
Organizes a list of image files by sorting them according to the field of view (FOV) identifier specified within each file name. This function is useful for grouping and sorting files that include a numeric FOV identifier at a known position within their names.
- Parameters:
filelist (
list
ofstr
) – A list containing file names to be organized.fov_pos (
int
, optional) – The position in the file name string where the FOV identifier begins. If not provided, the function will request this parameter explicitly.fov_len (
int
, optional) – The number of characters in the file name that make up the FOV identifier (default is 2).
- Returns:
A list of file names sorted by their FOV identifiers in ascending order.
- Return type:
list
ofstr
Examples
>>> filelist = ['image_fov01.tif', 'image_fov02.tif', 'image_fov10.tif'] >>> sorted_files = organize_filelist_fov(filelist, fov_pos=11, fov_len=2) >>> print(sorted_files) ['image_fov01.tif', 'image_fov02.tif', 'image_fov10.tif']
Notes
The function assumes that the FOV identifiers in the file names are numeric and located in a fixed position.
It is crucial to correctly specify fov_pos and fov_len to ensure files are correctly identified and sorted.
- Raises:
ValueError – If fov_pos is None, indicating the position of the FOV specifier was not set.
- celltraj.imageprep.organize_filelist_time(filelist, time_pos=None)[source]
Organizes a list of image files by sorting them based on timestamps contained in the filenames. The expected timestamp format is “??d??h??m” (e.g., “02d11h30m” for 2 days, 11 hours, and 30 minutes).
- Parameters:
filelist (
list
ofstr
) – A list containing filenames to be organized, each containing a timestamp.time_pos (
int
, optional) – The starting position in the filename where the timestamp pattern begins. If None, the function searches for a timestamp anywhere in the filename.
- Returns:
A list of filenames sorted by their timestamps in ascending order.
- Return type:
list
ofstr
Examples
>>> filelist = ['image_02d11h30m.jpg', 'image_01d05h00m.jpg', 'image_03d12h15m.jpg'] >>> sorted_files = organize_filelist_time(filelist) >>> print(sorted_files) ['image_01d05h00m.jpg', 'image_02d11h30m.jpg', 'image_03d12h15m.jpg']
Notes
The function converts each timestamp into seconds to compare and sort them effectively.
It is essential that the timestamp format strictly follows the “??d??h??m” pattern for correct processing.
- Raises:
ValueError – If no timestamp can be found in a filename, or if time_pos is provided but incorrect, leading to unsuccessful timestamp parsing.
- celltraj.imageprep.pad_image(img, *maxdims, padvalue=0)[source]
Pads an image to the specified dimensions using a constant value, with optional padding value specification. The function ensures the new image has central alignment relative to the original image dimensions.
- Parameters:
img (
ndarray
) – The image array to be padded. Can be 2D or 3D.*maxdims (
int
) – Variable length argument list specifying the target dimensions for padding. The number of dimensions provided should match the dimensionality of img.padvalue (
int
orfloat
, optional) – The value used to fill in the padding areas. Default is 0.
- Returns:
img – The padded image array, now resized to maxdims. If the dimensions of maxdims are less than or equal to the original dimensions, the image will be trimmed instead.
- Return type:
ndarray
- Raises:
ValueError – If the number of dimensions provided in maxdims does not match the dimensionality of img.
Example
>>> import numpy as np >>> img = np.array([[1, 2], [3, 4]]) >>> padded_img = pad_image(img, 4, 4) >>> print(padded_img) [[0 0 0 0] [0 1 2 0] [0 3 4 0] [0 0 0 0]]
- celltraj.imageprep.save_for_viewing(data, fname, metadata=None, overwrite=False)[source]
Saves data and optional metadata to a specified file using serialization. The function uses Python’s pickle module to serialize the data and metadata into a single file. It allows for optional overwriting of existing files.
- Parameters:
data (
any serializable object
) – The primary data to be saved. This can be any object that pickle can serialize.fname (
str
) – The file name or path where the data will be saved. If only a name is provided, the file will be saved in the current working directory.metadata (
dict
, optional) – Additional metadata to be saved along with the main data. This should be a dictionary containing the metadata.overwrite (
bool
, optional) – If True, will overwrite the existing file without any warnings. If False, the function will not overwrite an existing file and will return 1 if the file already exists.
- Returns:
Returns 0 if the file was successfully saved. Returns 1 if the file already exists and overwrite is False.
- Return type:
int
Examples
>>> data = {'a': 1, 'b': 2} >>> metadata = {'description': 'Sample data'} >>> save_for_viewing(data, 'example.pkl', metadata=metadata) 0
Notes
The function is particularly useful for saving intermediate processing stages in data analysis pipelines where both data and contextual metadata are important.
Care should be taken with the overwrite parameter to avoid unintentional data loss.
- Raises:
Exception – Raises an exception if there are issues during the file opening or writing process not related to overwriting existing files.
- celltraj.imageprep.save_frame_h5(filename, frame, img=None, msks=None, fmsk=None, features=None, overwrite=False, timestamp=None)[source]
Saves data related to a specific frame into an HDF5 file. This function can handle images, masks, foreground masks, and features. Each type of data is saved into a distinct dataset within the file. Existing data can be overwritten if specified.
- Parameters:
filename (
str
) – The name of the HDF5 file to which the data will be saved.frame (
int
) – The frame number associated with the data to be saved.img (
ndarray
, optional) – The image data to save. If provided, it will be saved under ‘/images/img_<frame>/image’.msks (
ndarray
, optional) – The mask data to save. If provided, it will be saved under ‘/images/img_<frame>/mask’.fmsk (
ndarray
, optional) – The foreground mask data to save. If provided, it will be saved under ‘/images/img_<frame>/fmsk’.features (
ndarray
, optional) – The features data to save. If provided, it will be saved under ‘/images/img_<frame>/features’.overwrite (
bool
, optional) – Whether to overwrite existing datasets. If False and a dataset exists, it will not overwrite and will print a message.timestamp (
float
, optional) – The timestamp to associate with the data. If not provided, the frame number is used as the timestamp.
Examples
>>> img_data = np.random.rand(256, 256) >>> mask_data = np.random.randint(0, 2, (256, 256)) >>> save_frame_h5('example.h5', frame=1, img=img_data, msks=mask_data)
Notes
This function opens the HDF5 file in append mode (‘a’), which allows adding new data without deleting existing data. Each type of data is stored in a specific dataset structured as ‘/images/img_<frame>/<datatype>’. If overwrite is True and the dataset already exists, it will delete the old dataset before creating a new one.
- celltraj.imageprep.transform_image(img, tf_matrix, inverse_tform=False, pad_dims=None, **ndimage_args)[source]
Applies a geometric transformation to an image using a specified transformation matrix. It can handle 2D and 3D transformations, supports padding, and can invert the transformation.
- Parameters:
img (
ndarray
) – The image to be transformed. Can be 2D or 3D.tf_matrix (
ndarray
) – The transformation matrix, which must be either 3x3 for 2D transformations or 4x4 for 3D.inverse_tform (
bool
, optional) – If True, the inverse of the transformation matrix is applied. Default is False.pad_dims (
tuple
, optional) – Dimensions for padding the image before applying the transformation. Expected format is (pad_before, pad_after) for each axis.**ndimage_args (
dict
) – Additional keyword arguments passed to scipy.ndimage.affine_transform.
- Returns:
img_tf – The transformed image, with the same data type as the input image.
- Return type:
ndarray
- Raises:
ValueError – If an invalid transformation matrix is provided or if the image array is flat (1D).
Example
>>> import numpy as np >>> img = np.random.rand(100, 100) >>> tf_matrix = np.array([[1, 0, 10], [0, 1, -10], [0, 0, 1]]) >>> transformed_img = transform_image(img, tf_matrix)
- celltraj.imageprep.znorm(img)[source]
Performs variance normalization (Z-normalization) on an input array or image, scaling it by its mean and standard deviation to achieve a mean of zero and a standard deviation of one.
- Parameters:
img (
ndarray
) – The input array or image to be normalized. The input should be a real array where operations such as mean and standard deviation can be computed.- Returns:
The Z-normalized version of img where each element has been scaled by the mean and standard deviation of the original array.
- Return type:
ndarray
Examples
>>> import numpy as np >>> img = np.array([1, 2, 3, 4, 5]) >>> normalized_img = znorm(img) >>> print(normalized_img) [-1.41421356 -0.70710678 0. 0.70710678 1.41421356]
Notes
This function handles NaN values in the input by ignoring them in the computation of the mean and standard deviation, which prevents NaN propagation but assumes NaNs are missing data points.
- Raises:
ZeroDivisionError – If the standard deviation of the input array is zero.
celltraj.model
Modeling functions for single-cell trajectories.
- celltraj.model.clean_clusters(clusters, P)[source]
Clean clusters by removing isolated clusters based on connectivity in a transition probability matrix.
This function identifies the largest connected component in the cluster transition graph and retains only the clusters that are part of this component. This is used to filter out clusters that are not well connected to the main body of data, potentially representing outliers or noise.
- Parameters:
clusters (
object
) – A clustering object with an attribute clustercenters which is an ndarray where each row represents the center of a cluster.P (
ndarray
) – A transition probability matrix where P[i, j] represents the probability of transitioning from cluster i to cluster j.
- Returns:
A clustering object similar to the input but with cluster centers filtered to only include those in the largest connected component of the transition graph.
- Return type:
object
Examples
>>> from sklearn.cluster import KMeans >>> from scipy.sparse import csr_matrix >>> x = np.random.rand(100, 5) >>> clusters = KMeans(n_clusters=10).fit(x) >>> P = (np.random.rand(10, 10) > 0.8).astype(float) # Random transition matrix >>> cleaned_clusters = clean_clusters(clusters, P) >>> print(cleaned_clusters.clustercenters.shape) (n, 5) # Where n is the number of clusters in the largest connected component
- celltraj.model.get_H_eigs(Mt)[source]
Calculates the eigenvalues and eigenvectors of the Hermitian matrix formed from a given Markov transition matrix.
The function constructs a Hermitian matrix, H, by symmetrizing the input matrix Mt and computes its eigenvalues and eigenvectors. The Hermitian matrix is constructed as H = 0.5 * (Mt + Mt.T) + 0.5j * (Mt - Mt.T), where Mt.T is the transpose of Mt.
- Parameters:
Mt (
ndarray
) – A square numpy array representing a Markov transition matrix from which the Hermitian matrix H is derived.- Returns:
w (
ndarray
) – An array of real eigenvalues of the Hermitian matrix, sorted in ascending order.v (
ndarray
) – An array of the corresponding eigenvectors, where each column corresponds to an eigenvalue in w.
Examples
>>> Mt = np.array([[0.8, 0.2], [0.4, 0.6]]) >>> eigenvalues, eigenvectors = get_H_eigs(Mt) >>> print("Eigenvalues:", eigenvalues) >>> print("Eigenvectors:", eigenvectors)
Notes
The function is designed to work with stochastic matrices, such as those used in Markov models, providing an alternative matrix decomposition with real eigenvalues and unambiguous sorting of components.
- celltraj.model.get_avdx_clusters(x_clusters, Mt)[source]
Calculates the average directional changes between clusters weighted by transition probabilities, based on cluster embeddings and a transition matrix. The result captures the average directional movement expected from one cluster to another.
- Parameters:
x_clusters (
ndarray
) – A 2D array containing the embedded coordinates of each cluster. Each row corresponds to a cluster and the columns to the coordinates in the reduced space.Mt (
ndarray
) – A 2D array (transition matrix) where each element (i, j) represents the probability of transitioning from cluster i to cluster j.
- Returns:
dx_clusters – A 2D array where each row represents a cluster and the columns contain the sum of weighted directional changes to all other clusters, indicating the net direction and magnitude of transitions for each cluster.
- Return type:
ndarray
Examples
>>> x_clusters = np.array([[1, 2], [3, 4], [5, 6]]) # Example coordinates of clusters >>> Mt = np.array([[0.1, 0.2, 0.7], [0.3, 0.4, 0.3], [0.2, 0.3, 0.5]]) # Example transition matrix >>> dx_clusters = get_avdx_clusters(x_clusters, Mt) >>> print(dx_clusters)
Notes
The function is useful in analyzing the overall directional dynamics of a system where clusters represent different states or configurations, and the transition matrix describes the likelihood of transitions between these states.
This function assumes the transition matrix is properly normalized such that each row sums to one.
- celltraj.model.get_committor(Tmatrix, indTargets, indSource, conv=0.001, verbose=False)[source]
Computes the committor probabilities for a Markov state model, which represent the probability of reaching a set of target states before returning to any source state.
- Parameters:
Tmatrix (
ndarray
) – A 2D array representing the transition probability matrix of the Markov state model, where Tmatrix[i, j] is the probability of transitioning from state i to state j.indTargets (
array-like
) – An array of indices representing the target states, i.e., the states to which the committor probabilities are calculated.indSource (
array-like
) – An array of indices representing the source states, which are treated as absorbing states for the calculation of committor probabilities.conv (
float
, optional) – The convergence threshold for the iterative solution of the committor probabilities. The iteration stops when the change in probabilities between successive iterations is below this threshold.
- Returns:
q – An array of committor probabilities, where each entry q[i] gives the probability of reaching any of the target states before any of the source states, starting from state i.
- Return type:
ndarray
Examples
>>> Tmatrix = np.array([[0.8, 0.2], [0.1, 0.9]]) >>> indTargets = [1] >>> indSource = [0] >>> committor_probabilities = get_committor(Tmatrix, indTargets, indSource) >>> print(committor_probabilities)
Notes
This function modifies the transition matrix to make the source states absorbing and sets the target states to have a committor probability of 1.
The algorithm iteratively updates the committor probabilities until changes between iterations are less than the specified convergence threshold.
It is essential that the transition matrix is stochastic, and the sum of probabilities from each state equals 1.
- celltraj.model.get_gaussianKernelM(X, Y, M, h)[source]
Computes a Gaussian kernel matrix scaled by a Mahalanobis distance between two observation matrices X and Y. Each element of the kernel matrix represents the Gaussian kernel between samples from X and Y with scaling matrix M and bandwidths h.
- Parameters:
X (
ndarray
) – Observation matrix at time t, where each row is a sample and each column is a feature.Y (
ndarray
) – Observation matrix at time t+1, similar in structure to X.M (
ndarray
) – Mahalanobis scaling matrix, a square matrix of dimensions equal to the number of features in X and Y, used to scale the features for the distance calculation.h (
ndarray
) – A vector of sigma scalings for the Gaussian kernel, typically computed using get_kernel_sigmas. The length of h should match the number of features in X and Y.
- Returns:
k – A matrix of dimensions (n_samples_X, n_samples_Y) where each element [i, j] is the Gaussian kernel value between the i-th sample of X and the j-th sample of Y, scaled according to M and h.
- Return type:
ndarray
Examples
>>> X = np.random.rand(5, 3) >>> Y = np.random.rand(6, 3) >>> M = np.eye(3) >>> h = np.array([1.0, 1.0, 1.0]) >>> K = get_gaussianKernelM(X, Y, M, h) >>> print(K.shape) (5, 6)
Notes
The function applies a Mahalanobis transformation to X and Y before computing the Euclidean distance for the Gaussian kernel. This accounts for the correlation between different features and adjusts distances accordingly. This is particularly useful in multivariate data analysis where feature scaling and normalization are critical.
- celltraj.model.get_kernel_sigmas(X, M, s=0.05, vector_sigma=True)[source]
Computes a vector of bandwidths (sigmas) for each feature in the observation matrix X, scaled by a Mahalanobis matrix M, which are used to scale observations in a kernel.
- Parameters:
X (
ndarray
) – Observation matrix where each row is a sample and each column is a feature.M (
ndarray
) – Mahalanobis scaling matrix, which is a square matrix of dimension equal to the number of features in X.s (
float
, optional) – Bandwidth scaling factor, by default 0.05.vector_sigma (
bool
, optional) – If True, returns a vector of sigmas for each feature; otherwise, returns a single sigma based on the aggregate statistics.
- Returns:
h – If vector_sigma is True, returns an array of bandwidths (sigmas) for each feature, otherwise a single float value representing the overall bandwidth.
- Return type:
ndarray
Examples
>>> X = np.array([[1, 2], [3, 4], [5, 6]]) >>> M = np.eye(2) >>> sigmas = get_kernel_sigmas(X, M) >>> print(sigmas) [value1, value2] # Example output; actual values will depend on input data and parameters.
Notes
The function utilizes the Mahalanobis distance to adjust the typical Euclidean distance measure, taking into account the covariance among different features, thus scaling the input features in a way that reflects their statistical properties.
- celltraj.model.get_kineticstates(vkin, nstates_final, nstates_initial=None, pcut_final=0.01, seed=0, max_states=100, return_nstates_initial=False, cluster_ninit=10)[source]
Determines kinetic states from dynamical motifs using an iterative k-means clustering approach, aiming to find a specified number of states with sufficient representation. This function attempts to find a user-specified number of final kinetic states (nstates_final) by iteratively applying k-means clustering and increasing the number of clusters until the desired number of states with a probability above a certain threshold (pcut_final) is achieved or the maximum limit of states (max_states) is reached. It refines the clustering by merging less probable states into their nearest more probable states.
- Parameters:
vkin (
ndarray
) – A 2D array of dynamical motifs, where each row corresponds to a sample and columns correspond to features.nstates_final (
int
) – The desired number of final states to achieve with sufficient sample representation.nstates_initial (
int
, optional) – The initial number of states to start clustering. If None, it is set equal to nstates_final.pcut_final (
float
, optional) – The probability cutoff to consider a state as sufficiently populated. States below this cutoff are considered sparsely populated and are merged.seed (
int
, optional) – Seed for random number generator for reproducibility of k-means clustering.max_states (
int
, optional) – The maximum number of states to try before stopping the clustering process.return_nstates_initial (
bool
, optional) – If True, returns the number of initial states along with the state labels.cluster_ninit (
int
, optional) – The number of times the k-means algorithm will be run with different centroid seeds.
- Returns:
stateSet (
ndarray
) – An array of state labels for each sample in vkin.nstates_initial (
int
, optional) – The initial number of states tried, returned only if return_nstates_initial is True.
Examples
>>> vkin = np.random.rand(100, 10) # Randomly generated dynamical motifs >>> states = get_kineticstates(vkin, 5, seed=42, pcut_final=0.05, max_states=50) >>> print(states)
Notes
The function ensures that all final states have a probability greater than pcut_final by merging underpopulated states into their nearest populated neighbors.
The process is stochastic due to the initialization of k-means; thus, setting a seed can help in achieving reproducible results.
- celltraj.model.get_koopman_eig(X, Y, M=None, s=0.05, bta=1e-05, h=None, psi_X=None, psi_Y=None)[source]
Computes the Koopman operator and its eigendecomposition, which describes the evolution of observables in a dynamical system. This method utilizes a kernel-based approach to approximate the forward map F(X) = Y using observations X and Y.
- Parameters:
X (
ndarray
) – Observation matrix at initial time, with samples as rows and features as columns.Y (
ndarray
) – Observation matrix at a subsequent time, aligned with X.M (
ndarray
, optional) – Mahalanobis scaling matrix for distance computation in feature space. If None, the identity matrix is used.s (
float
, optional) – Scaling factor for the bandwidth of the Gaussian kernel used in the computations.bta (
float
, optional) – Regularization parameter for the least-squares solution to stabilize the inversion.h (
ndarray
, optional) – Bandwidths for the Gaussian kernel. If None, they are computed internally using the scaling factor s.psi_X (
ndarray
, optional) – Precomputed Gaussian kernel matrix for X. If None, it is computed within the function.psi_Y (
ndarray
, optional) – Precomputed Gaussian kernel matrix for the transformation of X to Y. If None, it is computed within the function.
- Returns:
K (
ndarray
) – Approximated Koopman operator matrix, which is the linear transformation matrix in the lifted space.Xi (
ndarray
) – Left eigenvectors of the Koopman operator.Lam (
ndarray
) – Eigenvalues (diagonal matrix) of the Koopman operator, representing the dynamics’ temporal evolution.W (
ndarray
) – Right eigenvectors of the Koopman operator.
Examples
>>> X = np.random.normal(size=(100, 3)) >>> Y = X + 0.1 * np.random.normal(size=(100, 3)) >>> K, Xi, Lam, W = get_koopman_eig(X, Y, s=0.1, bta=1e-4)
Notes
The computation involves: - Constructing kernel matrices for X and Y using a Gaussian kernel with Mahalanobis distance scaling. - Solving a regularized linear system to find the Koopman operator. - Performing eigendecomposition on the Koopman operator to extract its spectral properties, which reveal
the dynamics of the underlying system.
- celltraj.model.get_koopman_inference_multiple(starts, steps, phi_X, V, Lam, nmodes=2)[source]
Predicts future states of observables using the Koopman operator framework over multiple starting indices and time steps.
This function uses the precomputed Koopman eigenfunctions, modes, and eigenvalues to propagate an initial state through the dynamical system defined by the Koopman operator. The prediction considers a set of initial points and performs the evolution for a specified number of time steps.
- Parameters:
starts (
ndarray
) – Array of indices specifying the starting points for the predictions. Shape should be (n_starts,).steps (
int
) – Number of future time steps to predict.phi_X (
ndarray
) – Koopman eigenfunctions, with shape (samples, samples).V (
ndarray
) – Koopman modes of the observables, with shape (observables, samples).Lam (
ndarray
) – Diagonal matrix of Koopman eigenvalues, with shape (samples, samples).nmodes (
int
orarray_like
, optional) – Number of modes to include in the prediction or indices of specific modes to use. Default is 2.
- Returns:
X_pred – Predicted values of the observables for each start index and each time step, with shape (n_starts, steps, observables).
- Return type:
ndarray
Examples
>>> starts = np.array([10, 20, 30]) # Example starting indices >>> steps = 5 # Predict 5 steps into the future >>> predictions = get_koopman_inference_multiple(starts, steps, phi_X, V, Lam, nmodes=3) >>> print(predictions.shape) (3, 5, number_of_observables)
Notes
The function assumes that phi_X, V, and Lam are derived from the same Koopman analysis and are consistent in dimensions.
The evolution is that of an ensemble of identical systems initiated from the same starting point.
- celltraj.model.get_koopman_modes(psi_X, Xi, W, X_obs, bta=1e-05)[source]
Computes the Koopman modes for specified observables using the Koopman operator’s eigendecomposition. Koopman modes represent the spatial structures associated with the dynamics captured by the Koopman eigenfunctions.
- Parameters:
psi_X (
ndarray
) – The kernel matrix corresponding to the data, usually derived from the Gaussian kernel of the observation matrix. Shape should be (samples, samples).Xi (
ndarray
) – Right eigenvectors of the Koopman operator matrix. Shape should be (samples, samples).W (
ndarray
) – Left eigenvectors of the Koopman operator matrix. Shape should be (samples, samples).X_obs (
ndarray
) – Observables of interest corresponding to the observations. These could be the same as the original observations or some function/feature of them. Shape should be (samples, observables).bta (
float
, optional) – Regularization parameter for the least-squares problem, default is 1.e-5.
- Returns:
phi_X (
ndarray
) – Koopman eigenfunctions, computed as the product of the kernel matrix and the right eigenvectors. Shape is (samples, samples).V (
ndarray
) – Koopman modes of the observables, indicating how each mode contributes to the observables. Shape is (observables, samples).
Examples
>>> psi_X = get_gaussianKernelM(X, X, M, h) >>> K, Xi, Lam, W = get_koopman_eig(X, Y) >>> phi_X, V = get_koopman_modes(psi_X, Xi, W, X)
Notes
The function solves a regularized linear system to stabilize the inversion when calculating the Koopman modes. The modes are useful for understanding complex dynamics in the data, capturing the essential patterns associated with changes in observables.
- celltraj.model.get_kscore(Mt, eps=0.001)[source]
Calculates the k-score for a given transition matrix. The k-score measures the kinetic separability of states within the transition matrix, which is derived from the eigenvalues of the matrix. It provides an indication of how well-separated the dynamics of the system are, based on the time it takes to reach equilibrium from non-equilibrium states.
- Parameters:
Mt (
ndarray
) – The transition matrix, which should be square and represent the probability of transitioning from one state to another.eps (
float
, optional) – A small threshold to determine the relevance of eigenvalues close to 1 (default is 1.e-3).
- Returns:
The calculated k-score, which quantifies the kinetic separability of states in the transition matrix. If the eigenvalues are such that no significant non-equilibrium dynamics are detected, it returns np.nan.
- Return type:
float
Notes
The eigenvalues are used to calculate the time constants associated with the decay modes of the system.
Only the modes with eigenvalues less than 1 and significantly different from 1 (as determined by eps) are considered. - Eigenvalues exactly equal to 1 correspond to steady-state or equilibrium conditions and are excluded from the k-score calculation. - A higher k-score indicates that the system has more slow modes and hence more kinetic separability.
Examples
>>> Mt = np.array([[0.9, 0.1], [0.05, 0.95]]) # Example transition matrix >>> kscore = get_kscore(Mt) >>> print(f"K-score: {kscore:.2f}")
- celltraj.model.get_landscape_coords_umap(vkin, **embedding_args)[source]
Just a wrapper for UMAP.
- Parameters:
vkin (
ndarray
) – A 2D array where each row contains dynamical motifs or any other high-dimensional data. Each row is treated as an individual data point.embedding_args (
dict
, optional) – Additional keyword arguments to pass to the UMAP constructor, allowing customization of the UMAP behavior (e.g., n_neighbors, min_dist).
- Returns:
x_clusters – A 2D array with two columns, representing the 2D embedded coordinates of the input data obtained via UMAP.
- Return type:
ndarray
Examples
>>> v = np.array([[1, 2], [3, 4], [5, 6]]) >>> x_clusters = get_landscape_coords_umap(v, min_dist=0.1) >>> print(x_clusters)
Notes
UMAP is a powerful method for embedding high-dimensional data into a lower-dimensional space, preserving both local and global structure of the data.
The flexibility to specify additional parameters allows for tuning the algorithm based on specific dataset characteristics or analysis requirements.
- celltraj.model.get_motifs(v, ncomp, w=None)[source]
Extracts and scales the last ncomp components of complex eigenvectors from a given set of eigenvectors, optionally weighted by given weights, eigenvalues can be used as weights for a kinetic scaling.
- Parameters:
v (
ndarray
) – A 2D array containing eigenvectors where each column represents an eigenvector. The array can be complex-valued.ncomp (
int
) – The number of components from the end of each eigenvector to process.w (
ndarray
, optional) – A 1D array of weights to scale the components of the eigenvectors. If not provided, the components are processed without scaling.
- Returns:
vkin – A 2D array where each row represents the concatenated scaled real and imaginary parts of the last ncomp components of the eigenvectors from v.
- Return type:
ndarray
Examples
>>> v = np.array([[1+1j, 2+2j, 3+3j], [4+4j, 5+5j, 6+6j]]) >>> ncomp = 2 >>> weights = np.array([0.5, 1.5]) >>> motifs = get_motifs(v, ncomp, weights) >>> print(motifs)
Notes
The function is useful for to describe or classify a complex system based upon its dynamics as described by a stochastic matrix yielding H-eigs stored as columns in v.
- celltraj.model.get_path_entropy_2point(x0, x1, Mt, clusters=None, exclude_stays=False)[source]
Calculates the entropy of transitions between states over a single step for a set of trajectories, using a given transition matrix. The entropy is calculated based on the negative logarithm of the transition probabilities.
- Parameters:
x0 (
array_like
) – The initial states of the trajectories.x1 (
array_like
) – The final states of the trajectories after one transition.Mt (
ndarray
) – A square matrix representing the transition probabilities between states. The element Mt[i, j] is the probability of transitioning from state i to state j.clusters (
Clustering object
, optional) – A clustering object (e.g., from scikit-learn) that can assign states to x0 and x1 data points. If None, x0 and x1 are assumed to be already in the form of state indices (default: None).exclude_stays (
bool
, optional) – If True, transitions where the state does not change (indc1[itraj] == indc0[itraj]) are excluded from the entropy calculation (default: False).
- Returns:
The calculated entropy value for the transitions in the trajectories. Returns np.nan if the calculation fails due to empty arrays or other errors.
- Return type:
float
- Raises:
ValueError – If x0 and x1 have different lengths, or if Mt is not a square matrix.
Examples
>>> x0 = np.array([0, 1, 1, 2]) >>> x1 = np.array([1, 1, 2, 0]) >>> Mt = np.array([[0.1, 0.9, 0], [0.5, 0.5, 0], [0.3, 0, 0.7]]) >>> entropy = get_path_entropy_2point(x0, x1, Mt) >>> print(f"Calculated entropy: {entropy:.2f}")
Notes
The function assumes that Mt is properly normalized such that each row sums to 1.
Entropy is a measure of uncertainty or randomness. In this context, it quantifies the unpredictability
in the transitions between states.
- celltraj.model.get_path_ll_2point(self, x0, x1, exclude_stays=False)[source]
Calculates the log-likelihood of observing specific transitions between states over one step for a set of trajectories, using a provided transition matrix. The log-likelihood is computed as the logarithm of transition probabilities.
- Parameters:
x0 (
array_like
) – The initial states of the trajectories, assumed to be indices corresponding to the rows in the transition matrix.x1 (
array_like
) – The final states of the trajectories after one transition, assumed to be indices corresponding to the columns in the transition matrix.exclude_stays (
bool
, optional) – If True, transitions where the state does not change (where indc1[itraj] == indc0[itraj]) are excluded from the log-likelihood calculation (default: False).
- Returns:
The calculated log-likelihood value for the observed transitions in the trajectories. Returns np.nan if the calculation fails due to empty arrays or other errors.
- Return type:
float
- Raises:
ValueError – If x0 and x1 have different lengths or if the transition probabilities cannot be computed because Mt is not correctly set in the scope of this function.
Examples
>>> x0 = np.array([0, 1, 1, 2]) >>> x1 = np.array([1, 1, 2, 0]) >>> Mt = np.array([[0.1, 0.9, 0], [0.5, 0.5, 0], [0.3, 0, 0.7]]) # Example transition matrix >>> log_likelihood = get_path_ll_2point(x0, x1) >>> print(f"Calculated log likelihood: {log_likelihood:.2f}")
Notes
The function assumes that the transition matrix Mt is correctly normalized such that each row sums to 1.
The log-likelihood measure provides insights into the predictability of the transitions, with higher values indicating
more predictable transitions based on the model’s transition probabilities. - The function clusters.assign must be correctly defined to map data points x0 and x1 to state indices used in Mt.
- celltraj.model.get_steady_state_matrixpowers(Tmatrix, conv=0.001)[source]
Computes the steady-state distribution of a Markov chain by repeatedly multiplying the transition matrix by itself and averaging the rows until convergence.
- Parameters:
Tmatrix (
ndarray
) – A 2D array representing the transition matrix of the Markov chain, where Tmatrix[i, j] is the probability of transitioning from state i to state j.conv (
float
, optional) – The convergence threshold for the iterative solution. The iteration stops when the change in the steady-state distribution between successive iterations is below this threshold.
- Returns:
pSS – An array representing the steady-state distribution, where pSS[i] is the long-term probability of being in state i.
- Return type:
ndarray
Examples
>>> Tmatrix = np.array([[0.1, 0.9], [0.5, 0.5]]) >>> steady_state_distribution = get_steady_state_matrixpowers(Tmatrix) >>> print(steady_state_distribution)
Notes
This function uses a matrix power method, where the transition matrix is repeatedly squared to accelerate convergence to the steady state.
The convergence is checked every 10 iterations, comparing the average of the resulting matrix’s rows to the average from the previous iteration.
If the maximum number of iterations (max_iters) is reached without achieving the desired convergence, the last computed distribution is returned.
Ensure that the transition matrix is stochastic (rows sum to 1) and ergodic to guarantee convergence.
- Raises:
ValueError – If Tmatrix is not a square matrix or if any rows sum to more than 1.
- celltraj.model.get_traj_ll_gmean(self, xt, exclude_stays=False, states=None)[source]
Calculates the geometric mean of the log-likelihoods for the transitions of trajectories based on their assignments to clusters and a transition matrix.
- Parameters:
xt (
ndarray
) – An array of trajectories’ data points or features from which states are derived.exclude_stays (
bool
, optional) – If True, transitions where the state does not change (stays in the same state) are excluded from the calculation.states (
ndarray
, optional) – An array indicating the state assignment for each data point in xt. If None, states are assumed to be a sequence from 0 to Mt.shape[0] - 1.
- Returns:
The geometric mean of the log-likelihoods of transitions between states. Returns np.nan if the calculation fails due to empty input arrays or other computational issues.
- Return type:
float
- Raises:
IndexError – If the length of states does not match the expected size based on Mt.
Notes
The log-likelihood for each transition is taken from a Markov transition matrix Mt, which must
be accessible within the method’s scope. - This function is particularly useful for analyzing the stability or persistence of states in Markovian models of dynamic systems.
Examples
>>> xt = np.random.rand(100, 10) # Example trajectory data >>> states = np.random.randint(0, 5, size=100) # Random state assignments >>> traj_ll_mean = model.get_traj_ll_gmean(xt, states=states) >>> print(f"Geometric mean of log-likelihoods: {traj_ll_mean:.4f}")
- celltraj.model.get_transition_matrix(x0, x1, clusters)[source]
Calculate the transition matrix from the cluster assignments of two consecutive time points.
This function computes a transition matrix that represents the probabilities of transitions between clusters from one state (x0) to the next (x1). Each element of the matrix indicates the probability of a cell transitioning from a cluster at time t (represented by x0) to another cluster at time t+1 (represented by x1).
- Parameters:
x0 (
ndarray
) – The dataset representing the state of each cell at time t, where each row is a cell and its columns are features (e.g., gene expression levels, morphological features).x1 (
ndarray
) – The dataset representing the state of each cell at time t+1, with the same structure as x0.clusters (
object
) – A clustering object which must have a clustercenters attribute representing the centers of each cluster and an assign method to assign each instance in x0 and x1 to a cluster. This object typically comes from a clustering library or a custom implementation that supports these functionalities.
- Returns:
A 2D numpy array where element (i, j) represents the probability of transitioning from cluster i at time t to cluster j at time t+1.
- Return type:
ndarray
Examples
>>> from sklearn.cluster import KMeans >>> x0 = np.random.rand(100, 5) >>> x1 = np.random.rand(100, 5) >>> clusters = KMeans(n_clusters=5) >>> clusters.fit(np.vstack((x0, x1))) # Fitting on the combined dataset >>> transition_matrix = get_transition_matrix(x0, x1, clusters) >>> print(transition_matrix.shape) (5, 5)
- celltraj.model.get_transition_matrix_CG(x0, x1, clusters, states)[source]
Calculate the coarse-grained transition matrix from the cluster assignments of two consecutive time points, considering predefined states.
This function constructs a transition matrix based on states defined by cluster assignments in x0 and x1. It counts transitions between these states to calculate probabilities, allowing for analysis of more abstracted dynamics than direct cluster-to-cluster transitions.
- Parameters:
x0 (
ndarray
) – The dataset representing the state of each cell at time t, where each row is a cell and its columns are features (e.g., gene expression levels, morphological features).x1 (
ndarray
) – The dataset representing the state of each cell at time t+1, with the same structure as x0.clusters (
object
) – A clustering object with clustercenters attribute representing the centers of each cluster and an assign method to map instances in x0 and x1 to a cluster index.states (
ndarray
) – An array where each element is a state assignment for the corresponding cluster index, providing a mapping from cluster index to a higher-level state.
- Returns:
A 2D numpy array where element (i, j) represents the probability of transitioning from state i at time t to state j at time t+1.
- Return type:
ndarray
Examples
>>> from sklearn.cluster import KMeans >>> x0 = np.random.rand(100, 5) >>> x1 = np.random.rand(100, 5) >>> clusters = KMeans(n_clusters=5) >>> clusters.fit(np.vstack((x0, x1))) # Fitting on the combined dataset >>> states = np.array([0, 1, 2, 2, 1]) # Coarse-graining clusters into states >>> transition_matrix = get_transition_matrix_CG(x0, x1, clusters, states) >>> print(transition_matrix.shape) (3, 3) # Assuming states are labeled from 0 to 2
- celltraj.model.update_mahalanobis_matrix_J(Mprev, X, Xi, V, lam, h=None, s=0.05)[source]
Update the Mahalanobis matrix based on Koopman operator analysis, using the eigenfunctions and eigenvalues derived from the Koopman operator. This update aims to tune the kernel for better feature scaling in further analyses.
- Parameters:
Mprev (
ndarray
) – The previous Mahalanobis matrix, with shape (features, features), used for scaling the input data.X (
ndarray
) – The observation matrix with shape (samples, features).Xi (
ndarray
) – Right eigenvectors of the Koopman operator, with shape (samples, samples).V (
ndarray
) – Left eigenvectors of the Koopman operator, with shape (samples, samples).lam (
ndarray
) – Eigenvalues of the Koopman operator, arranged in a diagonal matrix with shape (samples, samples).h (
ndarray
, optional) – Vector of sigma scalings for the Gaussian kernel; if not provided, it will be computed inside the function.s (
float
, optional) – Scaling factor for kernel bandwidth, default is 0.05.
- Returns:
M – The updated Mahalanobis matrix, used for scaling the input data in the kernel.
- Return type:
ndarray
Notes
The function computes an updated Mahalanobis matrix by evaluating the gradients of the Koopman eigenfunctions. These gradients are used to compute fluxes in the eigenspace, which are then used to adjust the Mahalanobis matrix to ensure that the observed flux is isotropic in all dimensions.
- celltraj.model.update_mahalanobis_matrix_J_old(Mprev, X, phi_X, V, Lam, h=None, s=0.05)[source]
Update estimation of mahalanobis matrix for kernel tuning
- Parameters:
Mprev (
ndarray
,features x features
) – Koopman eigenfunctionsX (
ndarray
) – samples by featuresphi_X (
ndarray
) – Koopman eigenfunctions, samples x samples
- Returns:
M – updated mahalanobis matrix using Koopman eigenfunction gradients
- Return type:
ndarray
- celltraj.model.update_mahalanobis_matrix_flux(Mprev, X, phi_X, V, Lam, h=None, s=0.05)[source]
Update estimation of mahalanobis matrix for kernel tuning
- Parameters:
Mprev (
ndarray
,features x features
) – Koopman eigenfunctionsX (
ndarray
) – samples by featuresphi_X (
ndarray
) – Koopman eigenfunctions, samples x samples
- Returns:
M – updated mahalanobis matrix using Koopman eigenfunction gradients
- Return type:
ndarray
- celltraj.model.update_mahalanobis_matrix_grad(Mprev, X, phi_X, h=None, s=0.05)[source]
Update estimation of mahalanobis matrix for kernel tuning
- Parameters:
Mprev (
ndarray
,features x features
) – Koopman eigenfunctionsX (
ndarray
) – samples by featuresphi_X (
ndarray
) – Koopman eigenfunctions, samples x samples
- Returns:
M – updated mahalanobis matrix using Koopman eigenfunction gradients
- Return type:
ndarray
celltraj.spatial
Modeling functions for boundary-resolved and physics-based single-cell modeling.
- celltraj.spatial.constrain_volume(border_dict, target_vols, exclude_states=None, **volconstraint_args)[source]
Adjusts the positions of boundary points to achieve target volumes for different regions.
This function iterates through different regions identified by their indices and adjusts the boundary points to match specified target volumes. The adjustments are performed using the get_volconstraint_com function, which modifies the boundary points to achieve the desired volume.
- Parameters:
border_dict (
dict
) – A dictionary containing boundary information, typically with keys: - ‘pts’: ndarray of shape (N, 3), coordinates of the boundary points. - ‘index’: ndarray of shape (N,), indices identifying the region each point belongs to. - ‘n’: ndarray of shape (N, 3), normals at the boundary points.target_vols (
dict
orndarray
) – A dictionary or array where each key or index corresponds to a region index, and the value is the target volume for that region.exclude_states (
array-like
, optional) – States to be excluded from volume adjustment. If not provided, all states will be adjusted. Default is None.**volconstraint_args (
dict
, optional) – Additional arguments to pass to the get_volconstraint_com function, such as maximum iterations or convergence criteria.
- Returns:
border_pts_c – An array of shape (N, 3) representing the adjusted coordinates of the boundary points.
- Return type:
ndarray
Notes
This function uses volume constraints to adjust the morphology of different regions based on specified target volumes.
The regions are identified by the ‘index’ values in border_dict.
Points belonging to excluded states are not adjusted.
Examples
>>> border_dict = {'pts': np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2]]), 'index': np.array([1, 1, 2]), 'n': np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]])} >>> target_vols = {1: 10.0, 2: 5.0} >>> adjusted_pts = constrain_volume(border_dict, target_vols) >>> print(adjusted_pts) array([[ ... ]]) # Adjusted coordinates for regions with target volumes
- celltraj.spatial.get_LJ_force(r, eps, R=1.0, max_repulsion=True)[source]
Computes the Lennard-Jones (LJ) force for a given set of distances.
The Lennard-Jones potential models interactions between a pair of neutral atoms or molecules, capturing both the attractive and repulsive forces. This function calculates the LJ force based on the provided distances, interaction strength, and characteristic distance.
- Parameters:
r (
array-like
orfloat
) – The distance(s) at which to calculate the Lennard-Jones force. Can be a single float or an array of distances.eps (
float
) – The depth of the potential well, representing the strength of the interaction.R (
float
, optional) – The characteristic distance parameter, which influences the distance at which the potential well occurs. Default is 1.max_repulsion (
bool
, optional) – If True, the force is limited to a maximum repulsion by setting distances below the sigma value to sigma, where sigma is the distance at which the potential crosses zero (point of maximum repulsion). Default is True.
- Returns:
force – The computed Lennard-Jones force at each distance provided in r. The shape of the output matches the shape of r.
- Return type:
array-like
orfloat
Examples
>>> distances = np.array([0.5, 1.0, 1.5]) >>> interaction_strength = 1.0 >>> characteristic_distance = 1.0 >>> forces = get_LJ_force(distances, interaction_strength, R=characteristic_distance) >>> print(forces) [ 0. -24. 0.7410312]
Notes
The Lennard-Jones force is computed using the formula: force = 48 * eps * [(sigma^12 / r^13) - 0.5 * (sigma^6 / r^7)], where eps is the interaction strength and sigma is the effective particle diameter, calculated as sigma = R / 2^(1/6).
The max_repulsion option ensures that no distances smaller than sigma are considered, effectively limiting the maximum repulsive force.
This function can handle both scalar and array inputs for r.
- celltraj.spatial.get_adhesive_displacement(border_dict, surf_force_function, eps, alpha=1.0, maxd=None, rmin=None, rmax=None, active_neighbor_states=array([1]), active_displacement_states=array([], dtype=float64), symmetrize=True, **force_args)[source]
Computes the adhesive displacement between cell surfaces using a specified surface force function.
This function calculates the displacement of cell surfaces based on adhesive forces. It uses the states and positions of neighboring cells to determine active interfaces and apply force-based displacements. Optionally, the displacements can be symmetrized to ensure consistency across cell borders.
- Parameters:
border_dict (
dict
) – A dictionary containing border information, including: - ‘pts’: ndarray of shape (N, 3), coordinates of border points. - ‘nn_pts’: ndarray of shape (N, 3), coordinates of nearest neighbor points. - ‘states’: ndarray of shape (N,), states of the border points. - ‘nn_states’: ndarray of shape (N,), states of the nearest neighbor points. - ‘nn_inds’: ndarray of shape (N,), indices of the nearest neighbor points.surf_force_function (
callable
) – A function that computes the surface force based on distance and other parameters. Should take distance, epsilon, and additional arguments as inputs.eps (
ndarray
) – A 2D array where eps[i, j] represents the interaction strength between state i and state j.alpha (
float
, optional) – A scaling factor for the displacement magnitude (default is 1.0).maxd (
float
, optional) – The maximum allowed displacement. Displacements will be scaled if any calculated displacements exceed this value.rmin (
float
, optional) – The minimum interaction distance. Displacements calculated from distances smaller than rmin will be set to rmin.rmax (
float
, optional) – The maximum interaction distance. Displacements calculated from distances larger than rmax will be set to rmax.active_neighbor_states (
ndarray
, optional) – An array specifying the states of neighbors that are active for interaction (default is np.array([1])).active_displacement_states (
ndarray
, optional) – An array specifying the states of cells that are active for displacement (default is an empty array, which means all states are active).symmetrize (
bool
, optional) – If True, the displacements are symmetrized to ensure consistency across borders (default is True).**force_args (
dict
, optional) – Additional arguments to be passed to the surf_force_function.
- Returns:
dr – A 2D array of shape (N, 3) representing the displacements of the border points.
- Return type:
ndarray
Notes
The function filters out inactive or excluded states before computing the displacement.
Displacement is scaled using the surface force and optionally capped by maxd.
Symmetrization ensures that the displacement is consistent from both interacting cells’ perspectives.
Examples
>>> border_dict = { ... 'pts': np.random.rand(100, 3), ... 'nn_pts': np.random.rand(100, 3), ... 'states': np.random.randint(0, 2, 100), ... 'nn_states': np.random.randint(0, 2, 100), ... 'nn_inds': np.random.randint(0, 100, 100) ... } >>> surf_force_function = lambda r, eps: -eps * (r - 1) >>> eps = np.array([[0.1, 0.2], [0.2, 0.3]]) >>> dr = get_adhesive_displacement(border_dict, surf_force_function, eps, alpha=0.5) >>> dr.shape (100, 3)
- celltraj.spatial.get_border_dict(labels, states=None, radius=10, vdist=None, return_nnindex=True, return_nnvector=True, return_curvature=True, nn_states=None, scale=None, **border_args)[source]
Computes the border properties of labeled regions in a segmented image.
This function identifies the borders of labeled regions in a given image and calculates various properties such as nearest neighbor indices, vectors, and curvature. It can also return the scaled distances if specified.
- Parameters:
labels (
ndarray
) – A 2D or 3D array where each element represents a label, identifying different regions in the image.states (
ndarray
, optional) – An array indicating the state of each labeled region. If provided, states are used to differentiate regions. If None, all regions are assumed to have the same state.radius (
float
, optional) – The radius for finding nearest neighbors around each border point (default is 10).vdist (
ndarray
, optional) – An array representing a scalar value in the image, such as estimated ligand concentration, to store in the border point dictionary.return_nnindex (
bool
, optional) – If True, returns the nearest neighbor index for each border point (default is True).return_nnvector (
bool
, optional) – If True, returns the vector pointing to the nearest neighbor for each border point (default is True).return_curvature (
bool
, optional) – If True, calculates and returns the curvature at each border point (default is True).scale (
list
orndarray
, optional) – Scaling factors for each dimension of the labels array. If provided, scales the labels accordingly.**border_args (
dict
, optional) – Additional arguments to control border property calculations, such as ‘knn’ for the number of nearest neighbors when computing curvature.
- Returns:
border_dict – A dictionary containing the computed border properties: - ‘pts’: ndarray of float, coordinates of the border points. - ‘index’: ndarray of int, indices of the regions to which each border point belongs. - ‘states’: ndarray of int, states of the regions to which each border point belongs. - ‘nn_index’: ndarray of int, nearest neighbor indices for each border point (if return_nnindex is True). - ‘nn_states’: ndarray of int, states of the nearest neighbors (if return_nnindex is True). - ‘nn_pts’: ndarray of float, coordinates of the nearest neighbors (if return_nnvector is True). - ‘nn_inds’: ndarray of int, indices of the nearest neighbors (if return_nnvector is True). - ‘n’: ndarray of float, normals at each border point (if return_curvature is True). - ‘c’: ndarray of float, curvature at each border point (if return_curvature is True). - ‘vdist’: ndarray of float, scaled distances at each border point (if vdist is provided).
- Return type:
dict
Notes
This function is useful for analyzing cell shapes and their interactions in spatially resolved images.
The nearest neighbor indices and vectors can help understand cell-cell interactions and local neighborhood structures.
The curvature values can provide insights into the geometrical properties of cell boundaries.
Examples
>>> labels = np.array([[0, 1, 1, 0], [0, 1, 1, 0], [2, 2, 0, 0], [0, 0, 0, 0]]) >>> scale = [1.0, 1.0] >>> border_dict = get_border_dict(labels, scale=scale, return_nnindex=True, return_nnvector=True) >>> print(border_dict['pts']) [[0., 1.], [0., 2.], [1., 1.], [2., 0.], [2., 1.]] >>> print(border_dict['nn_index']) [2, 2, 2, 1, 1]
- celltraj.spatial.get_border_properties(cell_labels, surfaces=None, cell_states=None, surface_states=None, cell_labels_next=None, cell_labels_prev=None, lin_next=None, lin_prev=None, vdist=None, border_scale=1.0, radius=2.0, order=1, return_border_properties_list=False)[source]
Calculate geometric and physical properties of cell borders, with optional tracking between frames and surface contact details.
This function computes properties of cell borders based on cell_labels, surface contacts, and multipole moments up to a specified order. It supports tracking cell border displacement across frames using optional next and previous frame labels (cell_labels_next, cell_labels_prev). Outputs include multipole moments and border displacements useful in characterizing cell interactions and tracking cell behaviors.
- Parameters:
cell_labels (
ndarray
) – Array of labels identifying distinct cells within the domain.surfaces (
list
ofndarrays
, optional) – List of binary masks indicating surface layers adjacent to cells (e.g., membranes or other boundaries). Each is assigned a unique identifier.cell_states (
ndarray
ofint
, optional) – Array specifying a state for each cell in cell_labels. Defaults to 1 if not provided.surface_states (
ndarray
ofint
, optional) – Array of states corresponding to each surface in surfaces. These states must not overlap with cell_states.cell_labels_next (
ndarray
, optional) – Label array for the next frame to track cell borders over time.cell_labels_prev (
ndarray
, optional) – Label array for the previous frame, used for backward tracking of cells.lin_next (
list
oflists
, optional) – Mapping of cells in cell_labels to corresponding cells in cell_labels_next. Each entry contains IDs of linked cells in the next frame.lin_prev (
list
oflists
, optional) – Mapping of cells in cell_labels to corresponding cells in cell_labels_prev, used for backward tracking.border_scale (
float
, optional) – Resolution in microns applied to border point extraction (default is 1.0).radius (
float
, optional) – Radius in pixels for calculating border properties such as contact neighbors (default is 2.0).order (
int
, optional) – Order of multipole moments to calculate for each border contact (default is 1).return_border_properties_list (
bool
, optional) – If True, returns a list of property names for multipole moments alongside the border dictionary.
- Returns:
border_dict (
dict
) – Dictionary containing calculated properties of each cell’s border, including:pts : Array of border points.
index : Label of each cell at the border.
nn_states : Neighboring cell states at each border point.
contact_properties : Array of multipole moment properties calculated up to order.
border_dx_next, border_dx_prev : Displacements of each border in the next and previous frames, if applicable.
dx_next_properties, dx_prev_properties : Arrays of multipole moments based on border displacement for next and previous frames.
border_properties_list (
list
ofstr
, optional) – List of property names for contact_properties array if return_border_properties_list is True.
Notes
Multipole Moment Calculation: Moments up to the specified order are computed based on border charges (representing contact fraction) for each border point.
Cell Tracking: When provided with cell_labels_next or cell_labels_prev, the function calculates optimal transport (OT) vectors to track border movement between frames.
Examples
Calculate properties of cell borders and multipole moments with next-frame tracking:
>>> border_dict, property_list = get_border_properties( ... cell_labels=my_cell_labels, surfaces=my_surfaces, cell_states=my_states, ... cell_labels_next=my_next_frame_labels, lin_next=my_next_frame_links, ... border_scale=0.8, radius=1.5, order=2, return_border_properties_list=True ... )
- celltraj.spatial.get_flux_displacement(border_dict, border_features=None, flux_function=None, exclude_states=None, n=None, fmeans=0.0, fsigmas=0.0, random_seed=None, alpha=1.0, maxd=None, **flux_function_args)[source]
Calculates the displacement of border points using flux information.
This function computes the displacement of border points by applying a flux function or random sampling based on mean and standard deviation values. The displacements can be controlled by normal vectors, excluded for certain states, and scaled to a maximum displacement.
- Parameters:
border_dict (
dict
) – A dictionary containing information about the current cell borders, including: - ‘n’: ndarray of shape (N, 3), normal vectors at the border points. - ‘states’: ndarray of shape (N,), states of the border points.border_features (
ndarray
, optional) – Features at the border points used as input to the flux function (default is None).flux_function (
callable
, optional) – A function that takes border_features and additional arguments to compute mean (fmeans) and standard deviation (fsigmas) of the flux at each border point (default is None, meaning random sampling is used).exclude_states (
array-like
, optional) – A list or array of states to exclude from displacement calculations (default is None, meaning no states are excluded).n (
ndarray
, optional) – Normal vectors at the border points. If None, normal vectors are taken from border_dict[‘n’] (default is None).fmeans (
float
orarray-like
, optional) – Mean flux value(s) for random sampling (default is 0.). If flux_function is provided, this value is ignored.fsigmas (
float
orarray-like
, optional) – Standard deviation of flux value(s) for random sampling (default is 0.). If flux_function is provided, this value is ignored.random_seed (
int
, optional) – Seed for the random number generator to ensure reproducibility (default is None).alpha (
float
, optional) – A scaling factor for the displacement magnitude (default is 1.0).maxd (
float
, optional) – The maximum allowed displacement. If specified, the displacement is scaled to ensure it does not exceed this value (default is None).**flux_function_args (
dict
, optional) – Additional arguments to pass to the flux_function.
- Returns:
dx – A 2D array of shape (N, 3) representing the displacements of the border points.
- Return type:
ndarray
Notes
The function can use a flux function to calculate displacements based on border features or perform random sampling with specified mean and standard deviation values.
Displacement deviations are scaled by normal vectors and can be controlled by alpha and capped by maxd.
Excludes displacements for specified states, if exclude_states is provided.
The random number generator can be seeded for reproducibility using random_seed.
Examples
>>> border_dict = { ... 'n': np.random.rand(100, 3), ... 'states': np.random.randint(0, 2, 100) ... } >>> dx = get_flux_displacement(border_dict, fmeans=0.5, fsigmas=0.1, random_seed=42, alpha=0.8, maxd=0.2) >>> dx.shape (100, 3)
- celltraj.spatial.get_flux_ligdist(vdist, cmean=1.0, csigma=0.5, center=True)[source]
Calculate the mean and standard deviation of flux values based on ligand distribution.
This function computes the flux mean and standard deviation for a given ligand distribution, using specified parameters for the mean and scaling factor for the standard deviation. Optionally, it can center the mean flux to ensure the overall flux is balanced.
- Parameters:
vdist (
ndarray
) – A 3D array representing the ligand concentration distribution in a tissue volume.cmean (
float
, optional) – A scaling factor for the mean flux. Default is 1.0.csigma (
float
, optional) – A scaling factor for the standard deviation of the flux. Default is 0.5.center (
bool
, optional) – If True, centers the mean flux distribution around zero by subtracting the overall mean. Default is True.
- Returns:
fmeans (
ndarray
) – A 3D array representing the mean flux values based on the ligand distribution.fsigmas (
ndarray
) – A 3D array representing the standard deviation of flux values based on the ligand distribution.
Examples
>>> ligand_distribution = np.random.random((100, 100, 50)) >>> mean_flux, sigma_flux = get_flux_ligdist(ligand_distribution, cmean=1.2, csigma=0.8) >>> print(mean_flux.shape, sigma_flux.shape) (100, 100, 50) (100, 100, 50)
Notes
The function uses the absolute value of vdist to calculate the standard deviation of the flux.
Centering the mean flux helps in ensuring there is no net flux imbalance across the tissue volume.
- celltraj.spatial.get_labels_fromborderdict(border_dict, labels_shape, active_states=None, surface_labels=None, connected=True, random_seed=None)[source]
Generates a label mask from a dictionary of border points and associated states.
This function creates a 3D label array by identifying regions enclosed by the boundary points in border_dict. It assigns unique labels to each region based on the indices of the border points.
- Parameters:
border_dict (
dict
) – A dictionary containing the border points and associated information. Expected keys include: - ‘pts’: ndarray of shape (N, D), coordinates of the border points. - ‘index’: ndarray of shape (N,), labels for each border point. - ‘states’: ndarray of shape (N,), states associated with each border point.labels_shape (
tuple
ofints
) – The shape of the output labels array.active_states (
array-like
, optional) – A list or array of states to include in the labeling. If None, all unique states in border_dict[‘states’] are used.surface_labels (
ndarray
, optional) – A pre-existing label array to use as a base. Regions with non-zero values in this array will retain their labels.connected (
bool
, optional) – If True, ensures that labeled regions are connected. Uses the largest connected component labeling method.random_seed (
int
, optional) – A seed for the random number generator to ensure reproducibility.
- Returns:
labels – An array of the same shape as labels_shape with labeled regions. Each unique region enclosed by border points is assigned a unique label.
- Return type:
ndarray
Notes
This function utilizes convex hull and Delaunay triangulation to determine the regions enclosed by the border points.
It can be used to generate labels for 3D volumes, based on the locations and states of border points.
The function includes options for randomization and enforcing connectivity of labeled regions.
Examples
>>> border_dict = { ... 'pts': np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2]]), ... 'index': np.array([1, 1, 2]), ... 'states': np.array([1, 1, 2]) ... } >>> labels_shape = (3, 3, 3) >>> labels = get_labels_fromborderdict(border_dict, labels_shape) >>> print(labels) array([[[1, 1, 0], [1, 1, 0], [0, 0, 0]], [[1, 1, 0], [1, 1, 0], [0, 0, 0]], [[0, 0, 0], [0, 0, 0], [0, 0, 0]]])
- celltraj.spatial.get_morse_force(r, eps, R=1.0, L=4.0)[source]
Computes the Morse force for a given set of distances.
The Morse potential is used to model the interaction between a pair of atoms or molecules, capturing both the attractive and repulsive forces more realistically than the Lennard-Jones potential. This function calculates the Morse force based on the provided distances, interaction strength, characteristic distance, and interaction range.
- Parameters:
r (
array-like
orfloat
) – The distance(s) at which to calculate the Morse force. Can be a single float or an array of distances.eps (
float
) – The depth of the potential well, representing the strength of the interaction.R (
float
, optional) – The equilibrium distance where the potential reaches its minimum. Default is 1.L (
float
, optional) – The width of the potential well, determining the range of the interaction. A larger value of L indicates a narrower well, meaning the potential changes more rapidly with distance. Default is 4.
- Returns:
force – The computed Morse force at each distance provided in r. The shape of the output matches the shape of r.
- Return type:
array-like
orfloat
Examples
>>> distances = np.array([0.8, 1.0, 1.2]) >>> interaction_strength = 2.0 >>> equilibrium_distance = 1.0 >>> interaction_range = 4.0 >>> forces = get_morse_force(distances, interaction_strength, R=equilibrium_distance, L=interaction_range) >>> print(forces) [ 1.17328042 0. -0.63212056]
Notes
The Morse force is derived from the Morse potential and is calculated using the formula: force = eps * [exp(-2 * (r - R) / L) - exp(-(r - R) / L)], where eps is the interaction strength, R is the equilibrium distance, and L is the interaction range.
This function can handle both scalar and array inputs for r.
- celltraj.spatial.get_nuc_displacement(border_pts_new, border_dict, Rset, nuc_states=array([1]), **nuc_args)[source]
- celltraj.spatial.get_ot_displacement(border_dict, border_dict_prev, parent_index=None)[source]
Computes the optimal transport (OT) displacement between two sets of boundary points.
This function calculates the optimal transport displacements between the points in the current boundary (border_dict) and the points in the previous boundary (border_dict_prev). It finds the optimal matches and computes the displacement vectors for each point.
- Parameters:
border_dict (
dict
) – A dictionary containing the current boundary points and related information. Expected keys include: - ‘index’: ndarray of shape (N,), unique labels of current boundary points. - ‘pts’: ndarray of shape (N, D), coordinates of the current boundary points.border_dict_prev (
dict
) – A dictionary containing the previous boundary points and related information. Expected keys include: - ‘index’: ndarray of shape (M,), unique labels of previous boundary points. - ‘pts’: ndarray of shape (M, D), coordinates of the previous boundary points.parent_index (
ndarray
, optional) – An array of unique labels (indices) to use for matching previous boundary points. If not provided, index1 from border_dict will be used to match with border_dict_prev.
- Returns:
inds_ot (
ndarray
) – A 1D array containing the indices of the optimal transport matches for the current boundary points from the previous boundary points.dxs_ot (
ndarray
) – A 2D array of shape (N, D), representing the displacement vectors from the current boundary points to the matched previous boundary points.
Notes
The function uses the get_ot_dx function to compute the optimal transport match and displacement between boundary points.
If parent_index is not provided, it defaults to using the indices of the current boundary points (index1).
Examples
>>> border_dict = { ... 'index': np.array([1, 2, 3]), ... 'pts': np.array([[0, 0], [1, 1], [2, 2]]) ... } >>> border_dict_prev = { ... 'index': np.array([1, 2, 3]), ... 'pts': np.array([[0, 1], [1, 0], [2, 1]]) ... } >>> inds_ot, dxs_ot = get_ot_displacement(border_dict, border_dict_prev) >>> inds_ot array([0, 0, 0]) >>> dxs_ot array([[ 0, -1], [ 0, 1], [ 0, 1]])
- celltraj.spatial.get_ot_dx(pts0, pts1, return_dx=True, return_cost=False)[source]
Computes the optimal transport (OT) displacement and cost between two sets of points.
This function calculates the optimal transport map between two sets of points pts0 and pts1 using the Earth Mover’s Distance (EMD). It returns the indices of the optimal transport matches and the displacement vectors, as well as the transport cost if specified.
- Parameters:
pts0 (
ndarray
) – A 2D array of shape (N, D), representing the first set of points, where N is the number of points and D is the dimensionality.pts1 (
ndarray
) – A 2D array of shape (M, D), representing the second set of points, where M is the number of points and D is the dimensionality.return_dx (
bool
, optional) – If True, returns the displacement vectors between matched points (default is True).return_cost (
bool
, optional) – If True, returns the total transport cost (default is False).
- Returns:
inds_ot (
ndarray
) – A 1D array of shape (N,), representing the indices of the points in pts1 that are matched to the points in pts0 according to the optimal transport map.dx (
ndarray
, optional) – A 2D array of shape (N, D), representing the displacement vectors from the points in pts0 to the matched points in pts1. Returned only if return_dx is True.cost (
float
, optional) – The total optimal transport cost, calculated as the sum of the transport cost between matched points. Returned only if return_cost is True.
Notes
The function uses the Earth Mover’s Distance (EMD) for computing the optimal transport map, which minimizes the cost of moving mass from pts0 to pts1.
The cost is computed as the sum of the pairwise distances weighted by the transport plan.
Displacement vectors are computed as the difference between points in pts0 and their matched points in pts1.
Examples
>>> pts0 = np.array([[0, 0], [1, 1], [2, 2]]) >>> pts1 = np.array([[0, 1], [1, 0], [2, 1]]) >>> inds_ot, dx, cost = get_ot_dx(pts0, pts1, return_dx=True, return_cost=True) >>> inds_ot array([0, 1, 2]) >>> dx array([[ 0, -1], [ 0, 1], [ 0, 1]]) >>> cost 1.0
- celltraj.spatial.get_secreted_ligand_density(msk, scale=2.0, zscale=1.0, npad=None, indz_bm=0, secretion_rate=1.0, D=None, micron_per_pixel=1.0, visual=False)[source]
Calculate the spatial distribution of secreted ligand density in a 3D tissue model.
This function simulates the diffusion and absorption of secreted ligands in a 3D volume defined by a binary mask. It uses finite element methods to solve the diffusion equation for ligand concentration, taking into account secretion from cell surfaces and absorption at boundaries.
- Parameters:
msk (
ndarray
) – A 3D binary mask representing the tissue, where non-zero values indicate the presence of cells.scale (
float
, optional) – The scaling factor for spatial resolution in the x and y dimensions. Default is 2.zscale (
float
, optional) – The scaling factor for spatial resolution in the z dimension. Default is 1.npad (
array-like
ofint
, optional) – Number of pixels to pad the mask in each dimension. Default is None, implying no padding.indz_bm (
int
, optional) – The index for the basal membrane in the z-dimension, where diffusion starts. Default is 0.secretion_rate (
float
orarray-like
, optional) – The rate of ligand secretion from the cell surfaces. Can be a scalar or array for different cell types. Default is 1.0.D (
float
, optional) – The diffusion coefficient for the ligand. If None, it is set to a default value based on the pixel size. Default is None.micron_per_pixel (
float
, optional) – The conversion factor from pixels to microns. Default is 1.visual (
bool
, optional) – If True, generates visualizations of the cell borders and diffusion process. Default is False.
- Returns:
vdist – A 3D array representing the steady-state concentration of the secreted ligand in the tissue volume.
- Return type:
ndarray
Examples
>>> tissue_mask = np.random.randint(0, 2, size=(100, 100, 50)) >>> ligand_density = get_secreted_ligand_density(tissue_mask, scale=2.5, zscale=1.2, secretion_rate=0.8) >>> print(ligand_density.shape) (100, 100, 50)
Notes
This function uses fipy for solving the diffusion equation and skimage.segmentation.find_boundaries for identifying cell borders.
The function includes various options for handling different boundary conditions, cell shapes, and secretion rates.
- celltraj.spatial.get_surface_displacement(border_dict, sts=None, c=None, n=None, alpha=1.0, maxd=None)[source]
Computes the surface displacement of cells based on their curvature and normal vectors.
This function calculates the displacement of cell surfaces using the curvature values and normal vectors. The displacement can be scaled by a factor alpha, and optionally constrained by a maximum displacement value.
- Parameters:
border_dict (
dict
) – A dictionary containing information about the cell borders, including: - ‘n’: ndarray of shape (N, 3), normal vectors at the border points. - ‘c’: ndarray of shape (N,), curvature values at the border points. - ‘states’: ndarray of shape (N,), states of the border points.sts (
ndarray
, optional) – An array of scaling factors for each state, used to modify the curvature. If provided, sts is multiplied with the curvature values based on the state of each border point (default is None, meaning no scaling is applied).c (
ndarray
, optional) – Curvature values at the border points. If None, it uses the curvature from border_dict (default is None).n (
ndarray
, optional) – Normal vectors at the border points. If None, it uses the normal vectors from border_dict (default is None).alpha (
float
, optional) – A scaling factor for the displacement magnitude (default is 1.0).maxd (
float
, optional) – The maximum allowed displacement. If specified, the displacement is scaled to ensure it does not exceed this value.
- Returns:
dx – A 2D array of shape (N, 3) representing the displacements of the border points.
- Return type:
ndarray
Notes
The displacement is calculated as a product of curvature, normal vectors, and the scaling factor alpha.
If sts is provided, curvature values are scaled according to the states of the border points.
Displacement magnitude is capped by maxd if specified, ensuring that no displacement exceeds this value.
Examples
>>> border_dict = { ... 'n': np.random.rand(100, 3), ... 'c': np.random.rand(100), ... 'states': np.random.randint(0, 2, 100) ... } >>> sts = np.array([1.0, 0.5]) >>> dx = get_surface_displacement(border_dict, sts=sts, alpha=0.2, maxd=0.1) >>> dx.shape (100, 3)
- celltraj.spatial.get_surface_displacement_deviation(border_dict, border_pts_prev, exclude_states=None, n=None, knn=12, use_eigs=False, alpha=1.0, maxd=None)[source]
Calculates the surface displacement deviation using optimal transport between current and previous border points.
This function computes the displacement of cell surface points based on deviations from previous positions. The displacement can be modified by normal vectors, filtered by specific states, and controlled by curvature or variance in displacement.
- Parameters:
border_dict (
dict
) – A dictionary containing information about the current cell borders, including: - ‘pts’: ndarray of shape (N, 3), current border points. - ‘states’: ndarray of shape (N,), states of the border points.border_pts_prev (
ndarray
) – A 2D array of shape (N, 3) containing the positions of border points from the previous time step.exclude_states (
array-like
, optional) – A list or array of states to exclude from displacement calculations (default is None, meaning no states are excluded).n (
ndarray
, optional) – Normal vectors at the border points. If None, normal vectors are calculated based on the optimal transport displacement (default is None).knn (
int
, optional) – The number of nearest neighbors to consider when computing variance or eigen decomposition for curvature calculations (default is 12).use_eigs (
bool
, optional) – If True, use eigen decomposition to calculate the displacement deviation; otherwise, use variance (default is False).alpha (
float
, optional) – A scaling factor for the displacement magnitude (default is 1.0).maxd (
float
, optional) – The maximum allowed displacement. If specified, the displacement is scaled to ensure it does not exceed this value (default is None).
- Returns:
dx – A 2D array of shape (N, 3) representing the displacements of the border points.
- Return type:
ndarray
Notes
The function uses optimal transport to calculate deviations between current and previous border points.
The surface displacement deviation is inspired by the “mother of all non-linearities”– the Kardar-Parisi-Zhang non-linear surface growth universality class.
Displacement deviations are scaled by the normal vectors and can be controlled by alpha and capped by maxd.
If use_eigs is True, eigen decomposition of the displacement field is used to calculate deviations, otherwise variance is used.
Excludes displacements for specified states, if exclude_states is provided.
Examples
>>> border_dict = { ... 'pts': np.random.rand(100, 3), ... 'states': np.random.randint(0, 2, 100) ... } >>> border_pts_prev = np.random.rand(100, 3) >>> dx = get_surface_displacement_deviation(border_dict, border_pts_prev, exclude_states=[0], alpha=0.5, maxd=0.1) >>> dx.shape (100, 3)
- celltraj.spatial.get_surface_points(msk, return_normals=False, return_curvature=False, knn=20, rscale=0.1)[source]
Computes the surface points of a labeled mask and optionally calculates normals and curvature.
This function identifies the surface (border) points of a given labeled mask using segmentation techniques. It can also compute normals (perpendicular vectors to the surface) and curvature values at these points if requested.
- Parameters:
msk (
ndarray
) – A 3D binary or labeled array representing the mask of regions of interest. Non-zero values represent the regions.return_normals (
bool
, optional) – If True, computes and returns the normals at each surface point (default is False).return_curvature (
bool
, optional) – If True, computes and returns the curvature at each surface point (default is False).knn (
int
, optional) – The number of nearest neighbors to consider when calculating normals and curvature (default is 20).
- Returns:
border_pts (
ndarray
) – A 2D array of shape (N, 3) containing the coordinates of the border points, where N is the number of border points found.n (
ndarray
, optional) – A 2D array of shape (N, 3) containing the normal vectors at each border point. Only returned if return_normals is True.c (
ndarray
, optional) – A 1D array of length N containing the curvature values at each border point. Only returned if return_curvature is True.
Notes
The function uses eigen decomposition on the neighborhood of each surface point to compute normals and curvature.
The normals are adjusted to face outward from the surface. If normals face inward, they are flipped.
Curvature is calculated as the ratio of the smallest eigenvalue to the sum of all eigenvalues, giving an estimate of local surface bending.
Examples
>>> msk = np.zeros((100, 100, 100), dtype=int) >>> msk[40:60, 40:60, 40:60] = 1 # A cube in the center >>> border_pts = get_surface_points(msk) >>> border_pts.shape (960, 3)
>>> border_pts, normals = get_surface_points(msk, return_normals=True) >>> border_pts.shape, normals.shape ((960, 3), (960, 3))
>>> border_pts, normals, curvature = get_surface_points(msk, return_normals=True, return_curvature=True) >>> border_pts.shape, normals.shape, curvature.shape ((960, 3), (960, 3), (960,))
- celltraj.spatial.get_volconstraint_com(border_pts, target_volume, max_iter=1000, converror=0.05, dc=1.0)[source]
Adjusts the positions of boundary points to achieve a target volume using a centroid-based method.
This function iteratively adjusts the positions of boundary points to match a specified target volume. The adjustment is done by moving points along the direction from the centroid to the points, scaled by the difference between the current and target volumes.
- Parameters:
border_pts (
ndarray
) – An array of shape (N, 3) representing the coordinates of the boundary points.target_volume (
float
) – The desired volume to be achieved.max_iter (
int
, optional) – Maximum number of iterations to perform. Default is 1000.converror (
float
, optional) – Convergence error threshold. Iterations stop when the relative volume error is below this value. Default is 0.05.dc (
float
, optional) – A scaling factor for the displacement calculated in each iteration. Default is 1.0.
- Returns:
border_pts – An array of shape (N, 3) representing the adjusted coordinates of the boundary points that approximate the target volume.
- Return type:
ndarray
Notes
The method assumes a 3D convex hull can be formed by the points, which is adjusted iteratively.
The convergence is based on the relative difference between the current volume and the target volume.
If the boundary points are collinear in any dimension, the method adjusts them to ensure a valid convex hull.
Examples
>>> border_pts = np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2]]) >>> target_volume = 10.0 >>> adjusted_pts = get_volconstraint_com(border_pts, target_volume) >>> print(adjusted_pts) array([[ ... ]]) # Adjusted coordinates to approximate the target volume
- celltraj.spatial.get_yukawa_force(r, eps, R=1.0)[source]
Computes the Yukawa force for a given set of distances.
The Yukawa force is a screened Coulomb force often used to describe interactions in plasmas and other systems with screened potentials. This function calculates the Yukawa force based on the provided distances, interaction strength, and screening length.
- Parameters:
r (
array-like
orfloat
) – The distance(s) at which to calculate the Yukawa force. Can be a single float or an array of distances.eps (
float
) – The interaction strength (or potential strength) parameter, determining the amplitude of the force.R (
float
, optional) – The screening length parameter, which determines how quickly the force decays with distance. Default is 1.
- Returns:
force – The computed Yukawa force at each distance provided in r. The shape of the output matches the shape of r.
- Return type:
array-like
orfloat
Examples
>>> distances = np.array([0.5, 1.0, 1.5]) >>> interaction_strength = 2.0 >>> screening_length = 1.0 >>> forces = get_yukawa_force(distances, interaction_strength, R=screening_length) >>> print(forces) [4. e+00 1.47151776e+00 4.48168907e-01]
Notes
The Yukawa force is computed using the formula: force = eps * exp(-r / R) * (r + R) / (R * r^2), where eps is the interaction strength and R is the screening length.
The function handles both scalar and array inputs for r.
celltraj.translate
Multi-domain translation functions between single-cell trajectory models and omics and others.
- celltraj.translate.get_null_correlations(x_fc, x_fc_states, x_fc_predicted, nrandom=500, seed=None, tmfSet=None)[source]
Calculate null correlations for predicted and real fold changes.
- Parameters:
x_fc (
ndarray
) – Measured fold change matrix (conditions x genes).x_fc_states (
ndarray
) – State-specific fold change matrix (states x genes).x_fc_predicted (
ndarray
) – Predicted fold change matrix (conditions x genes).nrandom (
int
, optional) – Number of random permutations for generating null distributions (default is 500).seed (
int
, optional) – Random seed for reproducibility (default is None).tmfSet (
ndarray
, optional) – Array of treatment names or identifiers (default is None).
- Returns:
corrSet_pred (
ndarray
) – Correlations between predicted and real fold changes for each condition.corrSet_rand (
ndarray
) – Null correlations between randomly generated state probabilities and real fold changes.corrSet_predrand (
ndarray
) – Null correlations between predicted fold changes and fold changes from randomly generated state probabilities.
Notes
This function generates null distributions by randomly permuting state probabilities and calculating the corresponding fold changes. The correlations between these null fold changes and the real/predicted fold changes are computed to evaluate the significance of the predictions.
Examples
>>> x_fc = np.random.rand(10, 5000) # Example fold change data >>> x_fc_states = np.random.rand(3, 5000) # Example state-specific fold changes >>> x_fc_predicted = get_predictedFC(state_probs, x_fc_states) # Example predicted fold changes >>> corr_pred, corr_rand, corr_predrand = get_null_correlations(x_fc, x_fc_states, x_fc_predicted)
- celltraj.translate.get_predictedFC(state_probs, statesFC)[source]
Predict fold changes based on state probabilities and state-specific fold changes.
- Parameters:
state_probs (
ndarray
) – State probability matrix (conditions x states).statesFC (
ndarray
) – State-specific fold change matrix (states x genes).
- Returns:
x_FC_predicted – Predicted fold change matrix (conditions x genes).
- Return type:
ndarray
Examples
>>> state_probs = np.random.rand(10, 3) # Example state probability data >>> statesFC = np.random.rand(3, 5000) # Example state-specific fold change data >>> predicted_fc = get_predictedFC(state_probs, statesFC)
- celltraj.translate.get_state_decomposition(x_fc, state_probs, npermutations=500, inds_tm_training=None, save_file=None, visual=False, verbose=True, nchunk=100, gene_names=None, lb=None, ub=None)[source]
Decompose paired bulk average data (e.g. bulk RNAseq or gene expression measurement) into state-specific contributions using least squares optimization.
- Parameters:
x_fc (
ndarray
) – Fold change matrix (samples x genes).state_probs (
ndarray
) – State probability matrix (samples x states).npermutations (
int
, optional) – Number of permutations for training set decompositions (default is 500).inds_tm_training (
ndarray
, optional) – Indices of training set conditions (default is None).save_file (
str
, optional) – File path to save the state-specific fold changes (default is None).visual (
bool
, optional) – If True, visualizes the decomposition process (default is False).verbose (
bool
, optional) – If True, provides detailed logs during the decomposition process (default is True).nchunk (
int
, optional) – Chunk size for logging and saving intermediate results (default is 100).gene_names (
ndarray
, optional) – Names of the genes (default is None).lb (
ndarray
, optional) – Lower bounds for the linear least squares optimization (default is None, which sets to zeros).ub (
ndarray
, optional) – Upper bounds for the linear least squares optimization (default is None, which sets to infinity).
- Returns:
x_fc_states – State-specific fold change matrix (states x genes).
- Return type:
ndarray
Notes
If the state corresponds to the same RNA level regardless of the ligand treatment, then the measured average fold change for gene g in condition t can be decomposed into a linear combination of state-specific fold changes s_g and state probabilities p_t, such that:
\[x_{tg} = \sum_{i=1}^{n} p_{ti} s_{ig}\]where: - x_{tg} is the measured fold change for gene g in condition t. - p_{ti} is the probability of state i in condition t. - s_{ig} is the state-specific fold change for state i and gene g. - n is the number of states.
Examples
>>> x_fc = np.random.rand(10, 5000) # Example fold change data >>> state_probs = np.random.rand(10, 3) # Example state probability data >>> x_fc_states = get_state_decomposition(x_fc, state_probs)
celltraj.utilities
Utility functions.
- celltraj.utilities.get_linear_batch_normalization(feat0_all, feat1_all, nbins=100, return_coef=False, stdcut=5)[source]
- celltraj.utilities.get_pairwise_distance_sum(tshift, centers1, centers2, contact_transform=False, r0=100.0, d0=100.0, n=6, m=12)[source]