lib5c.algorithms.thresholding module

lib5c.algorithms.thresholding.color_confusion(d)

Extract the across-condition color confusion matrix.

Parameters:d (Dataset) – Dataset processed by two_way_thresholding().
Returns:The 2x2 confusion matrix.
Return type:np.ndarray
lib5c.algorithms.thresholding.concordance_confusion(d)

Extract the within-condition concordance confusion matrices.

Parameters:d (Dataset) – Dataset processed by two_way_thresholding().
Returns:The keys are condition names, the values are the 2x2 confusion matrices.
Return type:dict
lib5c.algorithms.thresholding.count_clusters(d)

Extract the final cluster counts.

Parameters:d (Dataset) – Dataset processed by two_way_thresholding() called with report_clusters=True.
Returns:The keys are the color names as strings, the values are integers representing the cluster counts.
Return type:dict
lib5c.algorithms.thresholding.filter_near_diagonal(df, distance=24000, drop=True)

Drops rows from df whose ‘distance’ column is less than distance.

With drop=True (the default), dropping occurs in-place.

Parameters:
  • df (pd.DataFrame) – Must have a ‘distance’ column.
  • distance (int) – Threshold for distance (in bp).
  • drop (bool) – Pass True to drop the filtered rows in-place. Pass False to return an index subset for the filtered rows instead.
lib5c.algorithms.thresholding.kappa(d)

Compute Cohen’s kappa between the replicates of each condition.

Parameters:d (Dataset) – Dataset processed by two_way_thresholding().
Returns:The keys are condition names, the values are the kappa values.
Return type:dict
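To illustrate the statistic itself, the sketch below computes Cohen's kappa from a 2x2 confusion matrix of the kind returned by concordance_confusion(). It is a standalone pure-Python sketch, not lib5c's implementation; the name cohens_kappa and the example counts are hypothetical.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix given as nested lists.

    Hypothetical helper for illustration only; not part of lib5c.
    Undefined (division by zero) when the expected agreement equals 1.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of items on the diagonal.
    p_observed = sum(confusion[i][i] for i in range(k)) / n
    # Expected agreement if the two replicates were independent,
    # computed from the row and column marginals.
    p_expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(k)
    ) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up counts: rows are calls in replicate 1, columns in replicate 2.
confusion = [[20, 5],
             [10, 15]]
kappa_value = cohens_kappa(confusion)
print(round(kappa_value, 3))
```

Here the observed agreement is 0.7 and the expected agreement is 0.5, so kappa is approximately 0.4.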
lib5c.algorithms.thresholding.label_connected_components(colors, color)

Labels the connected components of a specific loop color.

Parameters:
  • colors (np.ndarray with string dtype) – The matrix of colors.
  • color (str) – The color to label.
Returns:

Same size and shape as colors. Entries are integer component labels; entries whose color does not match color are 0.

Return type:

np.ndarray

Examples

>>> colors = np.array([['a', 'a', 'b', 'a'],
...                    ['a', 'a', 'b', 'b'],
...                    ['b', 'b', 'b', 'a'],
...                    ['a', 'b', 'a', 'a']])
>>> print(label_connected_components(colors, 'a'))
[[1 1 0 2]
 [1 1 0 0]
 [0 0 0 3]
 [2 0 3 3]]
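In the example above, the entries at (0, 3) and (3, 0) share label 2 even though they do not touch. One reading consistent with that output is that components are labeled on the upper triangle of the symmetric matrix and then mirrored. The pure-Python sketch below follows that reading; the upper-triangle approach, the 4-connectivity, and the name label_symmetric_components are all assumptions, not lib5c's confirmed implementation.

```python
def label_symmetric_components(colors, color):
    """Label 4-connected components of `color` in a symmetric square matrix.

    Hypothetical sketch: labels the upper triangle (diagonal included)
    by flood fill, then mirrors the labels onto the lower triangle.
    """
    n = len(colors)
    labels = [[0] * n for _ in range(n)]
    next_label = 1
    for i in range(n):
        for j in range(i, n):  # upper triangle only
            if colors[i][j] != color or labels[i][j]:
                continue
            # Flood-fill a new component starting at (i, j).
            labels[i][j] = next_label
            stack = [(i, j)]
            while stack:
                r, c = stack.pop()
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    # `rr <= cc` keeps the fill inside the upper triangle.
                    if (0 <= rr <= cc < n and colors[rr][cc] == color
                            and not labels[rr][cc]):
                        labels[rr][cc] = next_label
                        stack.append((rr, cc))
            next_label += 1
    # Mirror the upper-triangle labels onto the lower triangle.
    for i in range(n):
        for j in range(i):
            labels[i][j] = labels[j][i]
    return labels


colors = [['a', 'a', 'b', 'a'],
          ['a', 'a', 'b', 'b'],
          ['b', 'b', 'b', 'a'],
          ['a', 'b', 'a', 'a']]
labels = label_symmetric_components(colors, 'a')
for row in labels:
    print(row)
```

On this input the sketch reproduces the labels shown in the example above.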
lib5c.algorithms.thresholding.size_filter(calls, threshold)

Removes calls which are in connected components smaller than a threshold.

Parameters:
  • calls (np.ndarray) – Boolean matrix of calls.
  • threshold (int) – Connected components smaller than this will be removed.
Returns:

The filtered calls.

Return type:

np.ndarray

Examples

>>> calls = np.array([[ True,  True, False,  True],
...                   [ True,  True, False, False],
...                   [False, False, False,  True],
...                   [ True, False,  True,  True]])
>>> size_filter(calls, 3)
array([[ True,  True, False, False],
       [ True,  True, False, False],
       [False, False, False, False],
       [False, False, False, False]])
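The filtering step can be sketched without lib5c as follows. This is a pure-Python illustration under two assumptions: component size is counted on the upper triangle of the symmetric matrix, and neighbors are 4-connected. Under that reading the lower-right group in the example above has size 2 and is removed. The name size_filter_sketch is hypothetical.

```python
def size_filter_sketch(calls, threshold):
    """Keep only calls in components with at least `threshold` entries.

    Hypothetical sketch: components are found on the upper triangle
    (diagonal included) and the surviving calls are mirrored.
    """
    n = len(calls)
    seen = [[False] * n for _ in range(n)]
    kept = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if not calls[i][j] or seen[i][j]:
                continue
            # Collect every cell of the component containing (i, j).
            seen[i][j] = True
            component, stack = [], [(i, j)]
            while stack:
                r, c = stack.pop()
                component.append((r, c))
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= rr <= cc < n and calls[rr][cc] and not seen[rr][cc]:
                        seen[rr][cc] = True
                        stack.append((rr, cc))
            # Components smaller than the threshold are discarded.
            if len(component) >= threshold:
                for r, c in component:
                    kept[r][c] = kept[c][r] = True
    return kept


calls = [[True, True, False, True],
         [True, True, False, False],
         [False, False, False, True],
         [True, False, True, True]]
filtered = size_filter_sketch(calls, 3)
for row in filtered:
    print(row)
```

With a threshold of 3, only the top-left block survives, matching the example output above.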
lib5c.algorithms.thresholding.two_way_thresholding(pvalues_superdict, primermap, conditions=None, significance_threshold=1e-15, bh_fdr=False, two_tail=False, concordant=False, distance_threshold=24000, size_threshold=3, background_threshold=0.6, report_clusters=True)

All-in-one heavy-lifting function for thresholding.

Parameters:
  • pvalues_superdict (dict of dict of np.ndarray) – The p-values to threshold.
  • primermap (primermap) – The primermap associated with the pvalues_superdict.
  • conditions (list of str, optional) – The list of condition names. Pass None to skip condition comparisons.
  • significance_threshold (float) – The p-value or q-value to threshold significance with.
  • bh_fdr (bool) – Pass True to apply multiple testing correction (BH-FDR) before checking the significance_threshold.
  • two_tail (bool) – If bh_fdr=True, pass True here to perform the BH-FDR on two-tailed p-values, but only report the significant right-tail events as loops. Note that two-tailed p-values are only accurate if p-values were called using a continuous distribution.
  • concordant (bool) – Pass True to report only those interactions which are significant in all replicates in each condition. Pass False to combine evidence from all replicates within each condition instead.
  • distance_threshold (int) – Interactions with interaction distance (in bp) shorter than this will not be called.
  • size_threshold (int) – Interactions within connected components smaller than this will not be called.
  • background_threshold (float, optional) – The p-value threshold to use to call a background loop class. Pass None to skip calling a background class.
  • report_clusters (bool) – Pass True to perform a second pass of connected component counting at the very end, reporting the numbers of clusters in each color category to the returned Dataset.
Returns:

The results of the thresholding.

Return type:

Dataset
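The bh_fdr option refers to Benjamini-Hochberg multiple testing correction, after which significance_threshold is compared against q-values rather than raw p-values. The sketch below shows the standard BH adjustment on a flat list of p-values in pure Python. It is an illustration only: the name bh_qvalues and the example values are hypothetical, and it does not cover how lib5c arranges p-values across regions and replicates or the two_tail handling.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values) for a flat list.

    Hypothetical helper for illustration only; not part of lib5c.
    """
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the
    # minimum of p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


pvalues = [0.01, 0.04, 0.03, 0.20]  # made-up values
qvalues = bh_qvalues(pvalues)
# Threshold the q-values, as significance_threshold would be applied
# when bh_fdr=True (0.05 is an arbitrary example threshold).
significant = [q <= 0.05 for q in qvalues]
print(qvalues)
print(significant)
```

In this example only the first p-value remains significant at 0.05 after correction, although three of the four raw p-values are below 0.05.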