lib5c.tools.helpers module
-
lib5c.tools.helpers.infer_level_mapping(rep_names, triggers)
Infers a mapping from replicate names to level names (i.e., classes or conditions) using a simple trigger substring approach.
A replicate is assigned to a level if the level’s trigger substring is a substring of the replicate name.
Parameters: - rep_names (list of str) – The replicate names to assign levels to.
- triggers (dict or list of str) – Pass a dict mapping trigger substrings to level names, or pass a list of level names to use the level names as their own trigger substrings.
Returns: A mapping from rep_names to level names.
Return type: dict
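The trigger-substring rule described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `sketch_level_mapping` is a hypothetical name, and the handling of edge cases is an assumption (a replicate that matches no trigger is left out of the result, and the first matching trigger wins when several match).

```python
def sketch_level_mapping(rep_names, triggers):
    """Hypothetical sketch of trigger-substring level assignment."""
    # A plain list of level names means each level name is its own trigger.
    if not isinstance(triggers, dict):
        triggers = {level: level for level in triggers}

    mapping = {}
    for rep in rep_names:
        for trigger, level in triggers.items():
            if trigger in rep:
                # Assumption: the first matching trigger wins.
                mapping[rep] = level
                break
        # Assumption: replicates matching no trigger are simply omitted.
    return mapping


# Level names used as their own triggers.
by_list = sketch_level_mapping(['ES_Rep1', 'ES_Rep2', 'NPC_Rep1'], ['ES', 'NPC'])

# Explicit trigger -> level name mapping.
by_dict = sketch_level_mapping(['wt_1', 'ko_1'], {'wt': 'control', 'ko': 'knockout'})
```

Passing a dict is useful when the replicate names contain short tags (such as `wt`) that differ from the level names you want to report.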
-
lib5c.tools.helpers.infer_replicate_names(infiles, as_dict=False, pattern=None)
Infers replicate names given a list of filenames.
Parameters: - infiles (list of str) – The filenames to consider.
- as_dict (bool) – Pass True to make this function return a dict mapping the infiles to their inferred replicate names.
- pattern (str, optional) – If the infiles are glob-based matches to a pattern containing one wildcard, pass that pattern to use it to reconstruct the replicate names.
Returns: If as_dict is False (the default), this is just the list of inferred rep names, in the same order as infiles. If as_dict is True, this is a dict mapping the original infiles to their inferred replicate names.
Return type: list of str or dict
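One way to reconstruct replicate names from a single-wildcard pattern is to strip the text before and after the wildcard from each filename. The sketch below illustrates that idea only; `sketch_replicate_names` is a hypothetical name, and the library may infer names differently (in particular, what it does when `pattern` is None is not covered here).

```python
def sketch_replicate_names(infiles, pattern, as_dict=False):
    """Hypothetical sketch: recover the part of each filename matched by '*'."""
    if pattern.count('*') != 1:
        raise ValueError('pattern must contain exactly one wildcard')
    prefix, suffix = pattern.split('*')

    names = []
    for infile in infiles:
        matches = (
            len(infile) >= len(prefix) + len(suffix)
            and infile.startswith(prefix)
            and infile.endswith(suffix)
        )
        if not matches:
            raise ValueError('%s does not match %s' % (infile, pattern))
        # Keep only the text that the wildcard stood for.
        names.append(infile[len(prefix):len(infile) - len(suffix)])

    if as_dict:
        return dict(zip(infiles, names))
    return names


files = ['data/ES_Rep1_counts.txt', 'data/NPC_Rep1_counts.txt']
rep_list = sketch_replicate_names(files, 'data/*_counts.txt')
rep_dict = sketch_replicate_names(files, 'data/*_counts.txt', as_dict=True)
```

The list form preserves the order of `infiles`, which matches the documented return value when `as_dict` is False.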
-
lib5c.tools.helpers.resolve_expected_models(expected_model_string, observed_counts, primermap, level=None)
Convenience helper for resolving expected models.
Parameters: - expected_model_string (str) – If None, we expect to estimate fresh expected models from observed_counts. If a path to a specific countsfile, we expect that it contains the expected model to be used for all the observed counts. If a glob-expandable path, we expect that each file matching the pattern is to be used for one of the observed counts (assuming they too are in glob order).
- observed_counts (list of dict of np.ndarray) – Each element in the list is one replicate, represented as a counts dict of observed values.
- primermap (primermap) – The primermap to use for parsing files, etc.
- level ({'bin', 'fragment'}) – The level to use if a fresh expected model must be estimated.
Returns: The resolved expected models.
Return type: list of dict of np.ndarray
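The three cases for expected_model_string (None, a single path, a glob-expandable path) amount to a small dispatch. The sketch below only decides which source each replicate would use; it does not load or estimate anything. `plan_expected_sources` and its `expand` parameter are hypothetical, and treating sorted glob results as "glob order" is an assumption.

```python
import glob


def plan_expected_sources(expected_model_string, n_replicates, expand=glob.glob):
    """Hypothetical sketch: pick one expected-model source per replicate.

    Returns a list of length n_replicates whose entries are either None
    (meaning "estimate a fresh model from the observed counts") or a path.
    """
    if expected_model_string is None:
        # Case 1: nothing supplied, so every replicate gets a fresh estimate.
        return [None] * n_replicates

    if any(ch in expected_model_string for ch in '*?['):
        # Case 3: glob-expandable path, one matching file per replicate.
        # Assumption: "glob order" is taken to mean sorted order.
        matches = sorted(expand(expected_model_string))
        if len(matches) != n_replicates:
            raise ValueError('expected %d files, found %d'
                             % (n_replicates, len(matches)))
        return matches

    # Case 2: a single countsfile shared by all replicates.
    return [expected_model_string] * n_replicates


fresh = plan_expected_sources(None, 2)
shared = plan_expected_sources('expected.counts', 3)
per_rep = plan_expected_sources(
    'expected_*.counts', 2,
    expand=lambda p: ['expected_b.counts', 'expected_a.counts'],  # stand-in for glob
)
```

Injecting `expand` keeps the sketch testable without touching the filesystem; by default it falls back to `glob.glob`.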
-
lib5c.tools.helpers.resolve_level(primermap, level='auto')
Resolves the level of some input data.
Parameters: - primermap (primermap) – Primermap to try to resolve the level of.
- level (str) – If you already know the level, you can pass it as a string here.
Returns: The resolved level.
Return type: str
Notes
This function operates on a “three in a row” heuristic: if the first three bins in the primermap are all of the same size, then we guess that it’s bin-level data.
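The heuristic in the note above can be sketched as follows. Several details are assumptions rather than documented behavior: the primermap is taken to be a dict mapping region names to lists of dicts with 'start' and 'end' keys, only the first region is inspected, and anything that fails the test is reported as 'fragment'. `sketch_resolve_level` is a hypothetical name.

```python
def sketch_resolve_level(primermap, level='auto'):
    """Hypothetical sketch of the "three in a row" level heuristic."""
    # If the caller already knows the level, trust it.
    if level != 'auto':
        return level

    # Assumption: look only at the first region in the primermap.
    first_region = next(iter(primermap.values()))
    sizes = [entry['end'] - entry['start'] for entry in first_region[:3]]

    # Three equally sized entries in a row suggest fixed-width bins.
    if len(sizes) == 3 and len(set(sizes)) == 1:
        return 'bin'
    return 'fragment'


binned = {'regionA': [{'start': 0, 'end': 4000},
                      {'start': 4000, 'end': 8000},
                      {'start': 8000, 'end': 12000}]}
fragments = {'regionA': [{'start': 0, 'end': 1500},
                         {'start': 1500, 'end': 5200},
                         {'start': 5200, 'end': 6100}]}
```

Because restriction fragments rarely share an exact length, three identical sizes in a row is a cheap but reasonable signal for binned data.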
-
lib5c.tools.helpers.resolve_parallel(parser, args, subcommand='', key_arg='infile', root_command='lib5c')
Parallelizes a command via bsub if it is available.
Parameters: - parser (argparse.ArgumentParser) – The parser used to parse the args for the root command.
- args (argparse.Namespace) – The args parsed by the parser.
- subcommand (str) – The particular subcommand of the root command being invoked.
- key_arg (str) – The argument to parallelize over.
- root_command (str) – The string used to invoke the root command.
-
lib5c.tools.helpers.resolve_primerfile(infile, primerfile=None)
Searches for a primerfile next to an infile.
Parameters: - infile (str or list of str) – The infile(s) to look next to.
- primerfile (str, optional) – If you already know where the primerfile is, pass it here to skip the search.
Returns: The primerfile.
Return type: str
-
lib5c.tools.helpers.split_self_regionally(regions, script='lib5c', hang=False)
Allows a command line script that accepts a --region flag to split itself into a separate command run for each region.
Parameters: - regions (list of str) – The regions to split into.
- script (str) – The name of the script to invoke.
- hang (bool) – Pass True to cause the original executing process to hang until all the bsub jobs spawned by this function complete. This does nothing if bsub is not available.
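The idea of splitting one invocation into one command per region can be sketched by building the per-region command lines. This is an illustration under stated assumptions, not the library's logic: `sketch_region_commands`, its `argv` and `use_bsub` parameters, and the way the `--region` flag is appended are all hypothetical, and the `hang` behavior is not modeled. The sketch only constructs commands and does not run them.

```python
import shutil


def sketch_region_commands(regions, argv, script='lib5c', use_bsub=None):
    """Hypothetical sketch: build one command line per region."""
    if use_bsub is None:
        # Submit through bsub only when it is found on the PATH.
        use_bsub = shutil.which('bsub') is not None

    commands = []
    for region in regions:
        # Assumption: the original arguments are reused with --region appended.
        command = [script] + list(argv) + ['--region', region]
        if use_bsub:
            command = ['bsub'] + command
        commands.append(command)
    return commands


# 'some-subcommand' and 'input.counts' are placeholders, not real lib5c arguments.
local = sketch_region_commands(['regionA', 'regionB'],
                               ['some-subcommand', 'input.counts'],
                               use_bsub=False)
queued = sketch_region_commands(['regionA'], ['some-subcommand'], use_bsub=True)
```

Each resulting list could be passed to `subprocess.run` to launch the region-specific job.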