lib5c.tools.helpers module

lib5c.tools.helpers.infer_level_mapping(rep_names, triggers)

Infers a mapping from replicate names to level names (i.e., classes or conditions) using a simple trigger substring approach.

A replicate is assigned to a level if the level’s trigger substring is a substring of the replicate name.

Parameters:
- rep_names (list of str) – The replicate names to assign levels to.
- triggers (dict or list of str) – Pass a dict mapping trigger substrings to level names, or pass a list of level names to use the level names as their own trigger substrings.
Returns:	A mapping from rep_names to level names.
Return type:	dict
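
The trigger-substring rule described above can be sketched in a few lines. This is an illustrative re-implementation, not lib5c's actual code: the name `sketch_infer_level_mapping` is hypothetical, and the choice to stop at the first matching trigger is an assumption, since the documentation does not say how overlapping triggers are resolved.

```python
def sketch_infer_level_mapping(rep_names, triggers):
    """Illustrative sketch only; not the lib5c implementation."""
    # A list of level names means each level name acts as its own trigger.
    if not isinstance(triggers, dict):
        triggers = {level: level for level in triggers}
    mapping = {}
    for rep in rep_names:
        for trigger, level in triggers.items():
            if trigger in rep:
                mapping[rep] = level
                break  # assumption: the first matching trigger wins
    return mapping


reps = ["ES_rep1", "ES_rep2", "NPC_rep1"]
by_list = sketch_infer_level_mapping(reps, ["ES", "NPC"])
by_dict = sketch_infer_level_mapping(reps, {"ES": "stem", "NPC": "progenitor"})
```

In this sketch, replicate names that match no trigger are simply left out of the returned dict; the real function may behave differently in that case.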

lib5c.tools.helpers.infer_replicate_names(infiles, as_dict=False, pattern=None)

Infers replicate names given a list of filenames.

Parameters:
- infiles (list of str) – The filenames to consider.
- as_dict (bool) – Pass True to make this function return a dict mapping the infiles to their inferred replicate names.
- pattern (str, optional) – If the infiles are glob-based matches to a pattern containing one wildcard, pass that pattern to use it to reconstruct the replicate names.
Returns:	If as_dict is False (the default), this is just the list of inferred rep names, in the same order as infiles. If as_dict is True, this is a dict mapping the original infiles to their inferred replicate names.
Return type:	list of str or dict
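
One plausible reading of the `pattern` argument is that the replicate name is whatever the single wildcard matched. The sketch below illustrates that reading only; `sketch_names_from_pattern` is a hypothetical helper, and the fallback of keeping the filename unchanged when it does not match the pattern is an assumption, not documented lib5c behavior.

```python
import re


def sketch_names_from_pattern(infiles, pattern, as_dict=False):
    """Illustrative sketch only; assumes `pattern` contains exactly one '*'."""
    prefix, suffix = pattern.split("*")
    # Capture whatever the wildcard matched between the literal prefix and suffix.
    regex = re.compile(re.escape(prefix) + "(.*)" + re.escape(suffix))
    names = []
    for infile in infiles:
        match = regex.fullmatch(infile)
        # Assumption: fall back to the filename itself if it does not match.
        names.append(match.group(1) if match else infile)
    if as_dict:
        return dict(zip(infiles, names))
    return names


files = ["counts/ES_rep1.counts", "counts/NPC_rep1.counts"]
names = sketch_names_from_pattern(files, "counts/*.counts")
names_dict = sketch_names_from_pattern(files, "counts/*.counts", as_dict=True)
```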

lib5c.tools.helpers.resolve_expected_models(expected_model_string, observed_counts, primermap, level=None)

Convenience helper for resolving expected models.

Parameters:
- expected_model_string (str) – If None, fresh expected models are estimated from `observed_counts`. If a path to a specific countsfile, that file is expected to contain the expected model to use for all the observed counts. If a glob-expandable path, each file matching the pattern is used for one of the observed counts (assuming they too are in glob order).
- observed_counts (list of dict of np.ndarray) – Each element in the list is one replicate, represented as a counts dict of observed values.
- primermap (primermap) – The primermap to use for parsing files, etc.
- level ({'bin', 'fragment'}) – The level to use if a fresh expected model must be estimated.
Returns:	The resolved expected models.
Return type:	list of dict of np.ndarray
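
The three cases for `expected_model_string` amount to a small dispatch. The sketch below only decides which source each replicate's expected model would come from; it does not load or estimate anything. The helper name `sketch_expected_sources` is hypothetical, detecting a glob by the presence of `*`, `?`, or `[` is an assumption, and sorting the matches is one interpretation of "glob order".

```python
import glob


def sketch_expected_sources(expected_model_string, n_replicates):
    """Illustrative sketch only; returns one source per replicate.

    A source of None means "estimate a fresh expected model".
    """
    if expected_model_string is None:
        return [None] * n_replicates
    # Assumption: any glob metacharacter marks the string as a pattern.
    if any(ch in expected_model_string for ch in "*?["):
        paths = sorted(glob.glob(expected_model_string))
        if len(paths) != n_replicates:
            raise ValueError(
                "expected %d files, pattern matched %d"
                % (n_replicates, len(paths))
            )
        return paths
    # A single specific countsfile is shared by every replicate.
    return [expected_model_string] * n_replicates


fresh = sketch_expected_sources(None, 2)
shared = sketch_expected_sources("expected.counts", 3)
```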

lib5c.tools.helpers.resolve_level(primermap, level='auto')

Resolves the level of some input data.

Parameters:
- primermap (primermap) – Primermap to try to resolve the level of.
- level (str) – If you already know the level, you can pass it as a string here.
Returns:	The resolved level.
Return type:	str

Notes

This function operates on a “three in a row” heuristic: if the first three bins in the primermap are all of the same size, then we guess that it’s bin-level data.
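
A minimal sketch of the “three in a row” heuristic follows. It assumes the primermap is a dict mapping region names to ordered lists of dicts with integer `'start'` and `'end'` keys, and that only the first region is inspected; both are assumptions for illustration, and `sketch_resolve_level` is a hypothetical name rather than the library function.

```python
def sketch_resolve_level(primermap, level="auto"):
    """Illustrative sketch only; not the lib5c implementation."""
    # If the caller already knows the level, trust it.
    if level != "auto":
        return level
    # Assumption: look at the first region's first three entries.
    first_region = next(iter(primermap.values()))
    sizes = [entry["end"] - entry["start"] for entry in first_region[:3]]
    # Three equal sizes in a row suggests fixed-width bins.
    if len(sizes) == 3 and len(set(sizes)) == 1:
        return "bin"
    return "fragment"


binned = {"Sox2": [{"start": 0, "end": 4000},
                   {"start": 4000, "end": 8000},
                   {"start": 8000, "end": 12000}]}
fragments = {"Sox2": [{"start": 0, "end": 1500},
                      {"start": 1500, "end": 4200},
                      {"start": 4200, "end": 5000}]}
```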

lib5c.tools.helpers.resolve_parallel(parser, args, subcommand='', key_arg='infile', root_command='lib5c')

Parallelizes a command via bsub if it is available.

Parameters:
- parser (argparse.ArgumentParser) – The parser used to parse the args for the root command.
- args (argparse.Namespace) – The args parsed by the parser.
- subcommand (str) – The particular subcommand of the root command being invoked.
- key_arg (str) – The argument to parallelize over.
- root_command (str) – The string used to invoke the root command.
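
The general idea of fanning a command out over the values of one key argument, and submitting through bsub only when it is installed, might look roughly like the sketch below. It is deliberately simplified: it takes a plain list of input files instead of a parser and namespace, the helper `sketch_parallel_commands` and the subcommand name `some-subcommand` are hypothetical, and it only builds the command lists without running them.

```python
import shutil


def sketch_parallel_commands(root_command, subcommand, infiles,
                             bsub_available=None):
    """Illustrative sketch only; builds one command per input file."""
    if bsub_available is None:
        # Detect bsub by looking for it on the PATH.
        bsub_available = shutil.which("bsub") is not None
    commands = []
    for infile in infiles:
        command = [root_command, subcommand, infile]
        if bsub_available:
            # Prefix with bsub so each file becomes its own cluster job.
            command = ["bsub"] + command
        commands.append(command)
    return commands


with_bsub = sketch_parallel_commands(
    "lib5c", "some-subcommand", ["a.counts", "b.counts"], bsub_available=True)
without_bsub = sketch_parallel_commands(
    "lib5c", "some-subcommand", ["a.counts", "b.counts"], bsub_available=False)
```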

lib5c.tools.helpers.resolve_primerfile(infile, primerfile=None)

Searches for a primerfile next to an infile.

Parameters:
- infile (str or list of str) – The infile(s) to look next to.
- primerfile (str, optional) – If you already know where the primerfile is, pass it here to skip the search.
Returns:	The primerfile.
Return type:	str

lib5c.tools.helpers.split_self_regionally(regions, script='lib5c', hang=False)

Allows a command-line script that accepts a --region flag to split itself into a separate command run for each region.

Parameters:
- regions (list of str) – The regions to split into.
- script (str) – The name of the script to invoke.
- hang (bool) – Pass True to cause the original executing process to hang until all the bsub jobs spawned by this function complete. This does nothing if bsub is not available.