Trimming
========

.. toctree::
   :hidden:

An important early preprocessing step is the removal of low-quality primers
from the dataset.

Command-line interface
----------------------

Primer trimming can be accomplished on the command line by running::

    $ lib5c trim

For complete details on the usage of this command, see the output of::

    $ lib5c trim -h

Exposed functionality
---------------------

The algorithms that make up the primer trimming framework can be found in the
:mod:`lib5c.algorithms.trimming` subpackage.

The core API is exposed in the following convenience functions:

* :func:`lib5c.algorithms.trimming.trim_primers`
* :func:`lib5c.algorithms.trimming.wipe_counts`
* :func:`lib5c.algorithms.trimming.trim_counts`

The functions ``wipe_counts()`` and ``trim_counts()`` also have convenience
wrappers that apply them over a counts superdict (a dict of counts dicts whose
first-level keys are replicate names):

* :func:`lib5c.algorithms.trimming.wipe_counts_superdict`
* :func:`lib5c.algorithms.trimming.trim_counts_superdict`

Workflow
~~~~~~~~

The general workflow is to trim primers first (based on the quality of the
counts matrices in the dataset) and then either trim or wipe those counts
matrices::

    from lib5c.algorithms.trimming import trim_primers, trim_counts_superdict

    trimmed_primermap, trimmed_indices = trim_primers(primermap, counts_superdict)
    trimmed_counts_superdict = trim_counts_superdict(counts_superdict,
                                                     trimmed_indices)

The call to ``trim_primers()`` does not modify ``counts_superdict``; it only
returns the trimmed primermap and the indices that were removed, leaving the
client to decide what to do next.

Trimming versus wiping
~~~~~~~~~~~~~~~~~~~~~~

``trim_counts()`` removes rows and columns from the matrices in the counts
dict, so that the dimensions of these matrices match the lengths of the values
of ``trimmed_primermap``. This is the recommended way to handle the removal of
low-quality fragments.
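
What trimming does to a single matrix can be sketched in plain Python. This is
a minimal illustration, not lib5c's implementation: ``trim_matrix()`` is a
hypothetical helper, plain nested lists stand in for the counts matrices, and
the primer names and counts are made up. The same indices are dropped from the
matrix and from the list of primers, which is why the two stay the same
length::

    def trim_matrix(matrix, trimmed_indices):
        """Return a copy of a square matrix with the given rows and columns removed.

        Hypothetical helper for illustration only; not part of lib5c.
        """
        drop = set(trimmed_indices)
        keep = [i for i in range(len(matrix)) if i not in drop]
        # Keep only the surviving rows, and within each row the surviving columns.
        return [[matrix[i][j] for j in keep] for i in keep]


    # Hypothetical 3-primer region; primer 1 is assumed to be low-quality.
    primers = ['p0', 'p1', 'p2']
    counts = [
        [10, 2, 0],
        [2, 0, 0],
        [0, 0, 7],
    ]
    trimmed_indices = [1]

    trimmed_counts = trim_matrix(counts, trimmed_indices)
    trimmed_primers = [p for i, p in enumerate(primers) if i not in trimmed_indices]

    print(trimmed_counts)   # [[10, 0], [0, 7]]
    print(trimmed_primers)  # ['p0', 'p2']

The original ``counts`` is left untouched, mirroring the way the counts
superdict is not modified in place.
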

``wipe_counts()`` does not change the dimensions of any matrix; instead, it
overwrites the rows and columns at the removed indices with the value given by
its ``wipe_value`` kwarg. This can be useful when removing low-quality regions
from already-binned data, for example::

    from lib5c.algorithms.trimming import trim_primers, wipe_counts_superdict

    _, trimmed_indices = trim_primers(pixelmap, counts_superdict)
    wiped_counts_superdict = wipe_counts_superdict(counts_superdict,
                                                   trimmed_indices)

Notice that we discard the trimmed pixelmap returned by the first call, because
its dimensions no longer match those of the wiped counts matrices; the original
``pixelmap`` still describes them.

Trimming options
~~~~~~~~~~~~~~~~

There are two different ways to assess the quality of a primer: its total
*cis* contact count (its row sum in the counts matrix) or the fraction of its
possible interactions that are nonzero. These two quality metrics are
thresholded by the two kwargs of ``trim_primers()``, ``min_sum`` and
``min_frac``, respectively.
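
The two quality metrics can be sketched for a single matrix as follows. This is
a hypothetical illustration, not lib5c's code: ``low_quality_indices()`` is a
made-up helper, it assumes a primer is flagged when it falls below *either*
threshold (an assumption about ``trim_primers()``, not a documented
guarantee), and it does not show how the metrics are combined across the
replicates of a counts superdict::

    def low_quality_indices(matrix, min_sum=0.0, min_frac=0.0):
        """Return indices of rows that fail either quality threshold.

        Hypothetical helper for illustration only; not lib5c's implementation.
        A row fails if its sum is below ``min_sum`` or if the fraction of its
        entries that are nonzero is below ``min_frac``. With the defaults,
        neither threshold removes anything from a nonnegative matrix.
        """
        failing = []
        for i, row in enumerate(matrix):
            row_sum = sum(row)  # total cis contact count of primer i
            frac_nonzero = sum(1 for x in row if x != 0) / len(row)
            if row_sum < min_sum or frac_nonzero < min_frac:
                failing.append(i)
        return failing


    # Hypothetical symmetric 4x4 counts matrix.
    counts = [
        [10, 2, 0, 5],
        [2, 0, 0, 0],
        [0, 0, 20, 0],
        [5, 0, 0, 9],
    ]

    print(low_quality_indices(counts, min_sum=5))                # [1]
    print(low_quality_indices(counts, min_frac=0.5))             # [1, 2]
    print(low_quality_indices(counts, min_sum=5, min_frac=0.5))  # [1, 2]

In this example, primer 2 has a large row sum but only one nonzero entry, so
only the ``min_frac`` threshold flags it; the two metrics catch different
kinds of low-quality primers.
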