hybkit.analysis

Functions for analysis of HybRecord and FoldRecord objects.

Analysis

class hybkit.analysis.Analysis(analysis_types: Union[Literal['energy', 'type', 'mirna', 'target'], List[Literal['energy', 'type', 'mirna', 'target']]], name: Optional[str] = None, quant_mode: Optional[Literal['single', 'reads', 'records']] = None)

Class for analysis of hybkit HybRecord and FoldRecord objects.

This class contains multiple conceptual analyses for HybRecord/FoldRecord data:

Energy: Analysis of values of predicted intra-hybrid folding energy

Type: Analysis of segment types

miRNA: Analysis of miRNA segment distributions

Target: Analysis of miRNA target segment names and types

Fold: Analysis of folding data included in the analyzed hyb_records.

This class is used by selecting the desired analysis types on object initialization. Records are added to the analyses using either the add_hyb_record() or the add_hyb_records() method. The results of the analyses are then available through the get_all_results(), get_analysis_results(), get_specific_result(), and plot_analysis_results() methods, which can return (or plot) the results of all analyses or of a specific subset of analyses.

Details for each respective analysis are provided here:

Energy Analysis:

This analysis evaluates the energy of each HybRecord object and provides a binned histogram of all represented energy values.

Output Results:

energy_analysis_count (int): Count of energy values evaluated

has_energy_val (int): Count of hyb_records with an energy value

no_energy_val (int): Count of hyb_records without an energy value

energy_min (float): Minimum energy value

energy_max (float): Maximum energy value

energy_mean (float): Mean energy value

energy_std (float): Standard deviation of energy values

binned_energy_vals (Counter): Counter with integer keys of energy values from energy_min to energy_max, storing the count of hyb_records whose energy value falls in each integer bin (values are rounded up to the next-highest integer, e.g. -12.5 -> -12).
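The following is an illustrative sketch of how the energy results above relate to one another; it is not hybkit's implementation. It assumes energies arrive as a plain list in which None marks a record without an energy value, that energy_analysis_count counts every record passed in, and that energy_std is a population standard deviation. The bins are pre-filled from energy_min to energy_max so that empty bins report a count of 0.

```python
import math
import statistics
from collections import Counter
from typing import Dict, List, Optional


def summarize_energies(energies: List[Optional[float]]) -> Dict[str, object]:
    """Illustrative energy summary; None marks a record with no energy value."""
    values = [e for e in energies if e is not None]
    results: Dict[str, object] = {
        "energy_analysis_count": len(energies),
        "has_energy_val": len(values),
        "no_energy_val": len(energies) - len(values),
    }
    if not values:
        return results
    results["energy_min"] = min(values)
    results["energy_max"] = max(values)
    results["energy_mean"] = statistics.mean(values)
    # Assumption: population standard deviation.
    results["energy_std"] = statistics.pstdev(values)
    # Pre-fill every integer bin from min to max so empty bins report 0.
    low, high = math.ceil(min(values)), math.ceil(max(values))
    bins = Counter({k: 0 for k in range(low, high + 1)})
    # Round each value up to the next-highest integer, e.g. -12.5 -> -12.
    bins.update(math.ceil(v) for v in values)
    results["binned_energy_vals"] = bins
    return results
```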

Type Analysis:

This analysis evaluates the counts of each type of segment included in the HybRecord objects. The types of segments are determined by the seg1_type and seg2_type flags, which are set by the hybkit.HybRecord.eval_types() method.

Requirements:

The seg1_type and seg2_type flags must be set for each HybRecord (this can be done with hybkit.HybRecord.eval_types()).

Output Results:

types_analysis_count (int): Count of hybrid types analyzed

hybrid_types (Counter): Counter containing annotated types of seg1 and seg2 (in original 5p / 3p order)

reordered_hybrid_types (Counter): Counter containing annotated types of seg1 and seg2. This is provided in "sorted" order, where types are sorted alphabetically (independent of 5p / 3p position).

mirna_hybrid_types (Counter): Counter containing annotated types of seg1 and seg2. This is provided in "miRNA-prime" order, where a miRNA type is always listed before other types, and then remaining types are sorted alphabetically (independent of 5p / 3p position).

seg1_types (Counter): Counter containing annotated type of segment in position seg1

seg2_types (Counter): Counter containing annotated type of segment in position seg2

all_seg_types (Counter): Counter containing position-independent annotated types
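The three hybrid-type orderings above differ only in how each (seg1_type, seg2_type) pair is ordered before counting. The sketch below illustrates this on plain tuples; it is not hybkit's implementation. The miRNA type label ("microRNA"), the "-" separator used to build Counter keys, and the choice to count both segments in all_seg_types are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumption: the type label used for miRNA segments; adjust to your type settings.
MIRNA_TYPE = "microRNA"


def count_hybrid_types(
    pairs: Iterable[Tuple[str, str]], mirna_type: str = MIRNA_TYPE, sep: str = "-"
) -> Dict[str, object]:
    """Illustrative tally of (seg1_type, seg2_type) pairs in 5p -> 3p order."""
    names = ("hybrid_types", "reordered_hybrid_types", "mirna_hybrid_types",
             "seg1_types", "seg2_types", "all_seg_types")
    counters = {name: Counter() for name in names}
    total = 0
    for seg1_type, seg2_type in pairs:
        total += 1
        pair = (seg1_type, seg2_type)
        # Original 5p / 3p order.
        counters["hybrid_types"][sep.join(pair)] += 1
        # "Sorted" order: alphabetical, independent of position.
        counters["reordered_hybrid_types"][sep.join(sorted(pair))] += 1
        # "miRNA-prime" order: miRNA type first, remaining types alphabetical.
        mirna_first = sorted(pair, key=lambda t: (t != mirna_type, t))
        counters["mirna_hybrid_types"][sep.join(mirna_first)] += 1
        counters["seg1_types"][seg1_type] += 1
        counters["seg2_types"][seg2_type] += 1
        # Position-independent: each segment contributes one count.
        counters["all_seg_types"].update(pair)
    return {"types_analysis_count": total, **counters}
```

Sorting with the key (t != mirna_type, t) places the miRNA type first (False sorts before True) and falls back to alphabetical order for everything else.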

miRNA Analysis:

Analysis of miRNA segments in hybrids.

The mirna_analysis provides an analysis of what miRNA types are present in the hyb records. If a miRNA dimer is present in a hybrid, this is counted in mirna_dimers. If a single miRNA is present in a hybrid, this is counted in mirnas_5p or mirnas_3p depending on the miRNA location.

Requirements:

mirna_seg flag must be set for each HybRecord (can be done by hybkit.HybRecord.eval_mirna()).

Output Results:

mirna_analysis_count (int): Count of miRNA hybrids analyzed

mirnas_5p (int): Count of 5p miRNAs detected

mirnas_3p (int): Count of 3p miRNAs detected

mirna_dimers (int): Count of miRNA dimers (5p + 3p) detected

non_mirna (int): Count of non-miRNA hybrids detected

has_mirna (int): Count of hybrids with a miRNA in the 5p position, the 3p position, or both
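As an illustration of how these counts partition the analyzed hybrids, the sketch below tallies mirna_seg flag values directly. It is not hybkit's implementation, and it assumes the flag takes the values "5p", "3p", "B" (both segments, i.e. a dimer), or "N" (no miRNA).

```python
from collections import Counter
from typing import Dict, Iterable

# Assumption: mirna_seg values are "5p", "3p", "B" (both, a dimer), or "N" (none).
MIRNA_SEG_VALUES = {"5p", "3p", "B", "N"}


def count_mirna_segments(mirna_segs: Iterable[str]) -> Dict[str, int]:
    """Illustrative tally of mirna_seg flag values across hybrids."""
    tally = Counter(mirna_segs)
    unknown = set(tally) - MIRNA_SEG_VALUES
    if unknown:
        raise ValueError(f"Unexpected mirna_seg values: {sorted(unknown)}")
    results = {
        "mirna_analysis_count": sum(tally.values()),
        "mirnas_5p": tally["5p"],
        "mirnas_3p": tally["3p"],
        "mirna_dimers": tally["B"],
        "non_mirna": tally["N"],
    }
    # A hybrid "has a miRNA" if either segment (or both) is a miRNA.
    results["has_mirna"] = (
        results["mirnas_5p"] + results["mirnas_3p"] + results["mirna_dimers"]
    )
    return results
```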

Target Analysis:

Analysis of targets in miRNA-containing hybrids.

The target analysis provides an analysis of what annotated sequences and sequence types are targeted by any miRNA within the hyb records. If a miRNA is not present in a hybrid, the hybrid is not included in the analysis. If a miRNA dimer is present in a hybrid, the 5p miRNA is used for the analysis, and the 3p miRNA is considered the "target."

Requirements:

mirna_seg flag must be set for each HybRecord (can be done by hybkit.HybRecord.eval_mirna()).

Output Results:

target_analysis_count (int): Count of hybrids analyzed

target_evals (int): Count of target evaluations performed

target_names (Counter): Counter containing names of miRNA targets detected.

target_types (Counter): Counter containing types of miRNA targets detected.
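The target-selection rule described above (skip hybrids without a miRNA; for a dimer, treat the 5p miRNA as the miRNA and the 3p miRNA as the "target") can be sketched as follows. SimpleHybrid is a hypothetical stand-in for the HybRecord fields involved, not a hybkit class, and the mirna_seg values are assumed to be "5p", "3p", "B", or "N". The sketch also assumes target_analysis_count counts every hybrid passed in while target_evals counts only those where a target was evaluated.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class SimpleHybrid(NamedTuple):
    """Hypothetical stand-in for the HybRecord fields this sketch needs."""
    mirna_seg: str  # assumed values: "5p", "3p", "B", "N"
    seg1_name: str
    seg1_type: str
    seg2_name: str
    seg2_type: str


def count_targets(hybrids: Iterable[SimpleHybrid]) -> Dict[str, object]:
    """Illustrative tally of miRNA targets by name and by type."""
    target_names: Counter = Counter()
    target_types: Counter = Counter()
    analyzed = 0
    for hyb in hybrids:
        analyzed += 1
        if hyb.mirna_seg == "N":
            continue  # No miRNA: hybrid is not evaluated for targets.
        if hyb.mirna_seg == "3p":
            # 3p miRNA: the 5p segment is the target.
            name, seg_type = hyb.seg1_name, hyb.seg1_type
        else:
            # 5p miRNA, or a dimer ("B") where the 3p miRNA is the "target".
            name, seg_type = hyb.seg2_name, hyb.seg2_type
        target_names[name] += 1
        target_types[seg_type] += 1
    return {
        "target_analysis_count": analyzed,
        "target_evals": sum(target_names.values()),
        "target_names": target_names,
        "target_types": target_types,
    }
```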

Fold Analysis:

This analysis evaluates the predicted binding of miRNAs within hyb records that contain a miRNA and have an associated FoldRecord object as the attribute fold_record. This includes an analysis and plotting of the predicted binding at each position within the miRNA.

Requirements:

The mirna_seg flag must be set for each HybRecord (can be done by hybkit.HybRecord.eval_mirna()).

The fold_record attribute must be set for each HybRecord with a corresponding FoldRecord object. This can be done using the hybkit.HybRecord.set_fold_record() method.

Output Results:

fold_analysis_count (int): Count of miRNA fold predictions analyzed

folds_recorded (int): Count of fold predictions with a miRNA fold

mirna_nt_fold_counts (Counter): Counter with keys of miRNA position index and values of the number of miRNAs with a predicted bound state at that index.

mirna_nt_fold_props (Counter): Counter with keys of miRNA position index and values of the proportion (0.0 - 1.0) of miRNAs with a predicted bound state at that index.

fold_match_counts (Counter): Counter with keys of the number of predicted matches between a miRNA and its target, and values of the count of miRNAs with that number of predicted matches.
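A minimal sketch of how the per-position fold results relate to each other, assuming each miRNA's predicted fold is available as a dot-bracket string covering only the miRNA (with "(" or ")" marking a bound nucleotide and "." an unbound one), that positions are 1-based, and that None marks a record without a miRNA fold. This is illustrative only and does not reproduce how hybkit extracts the miRNA fold from a FoldRecord.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def summarize_mirna_folds(mirna_folds: Iterable[Optional[str]]) -> Dict[str, object]:
    """Illustrative per-position binding summary from miRNA dot-bracket strings."""
    nt_counts: Counter = Counter()
    match_counts: Counter = Counter()
    analyzed = recorded = 0
    for fold in mirna_folds:
        analyzed += 1
        if not fold:
            continue  # No miRNA fold available for this record.
        recorded += 1
        matches = 0
        # Assumption: 1-based miRNA positions; "(" or ")" marks a bound nt.
        for index, char in enumerate(fold, start=1):
            if char in "()":
                nt_counts[index] += 1
                matches += 1
        match_counts[matches] += 1
    # Proportion of recorded miRNA folds bound at each position.
    nt_props = Counter(
        {index: count / recorded for index, count in nt_counts.items()}
    )
    return {
        "fold_analysis_count": analyzed,
        "folds_recorded": recorded,
        "mirna_nt_fold_counts": nt_counts,
        "mirna_nt_fold_props": nt_props,
        "fold_match_counts": match_counts,
    }
```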

Parameters

analysis_types (str or list of str) -- Analysis types to perform
name (str, optional) -- Name of the analysis
quant_mode (str, optional) -- Mode to use for record quantification. Options are "single": One count per record; "reads": If "read_count" flag is set, count all reads in record (else count 1); "records": if the "record_count" flag is set, count all individual records within combined record (else count 1). If not provided, defaults to the value in Analysis.settings['quant_mode'].
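The three quant_mode options above amount to choosing a per-record weight. The sketch below illustrates that rule, assuming the record's flags are available as a plain mapping of flag name to string value; the function name record_weight is hypothetical and not part of hybkit.

```python
from typing import Mapping


def record_weight(flags: Mapping[str, str], quant_mode: str = "single") -> int:
    """Hypothetical helper: how much one record contributes to the counts."""
    if quant_mode == "single":
        return 1  # One count per record, regardless of flags.
    if quant_mode == "reads":
        key = "read_count"
    elif quant_mode == "records":
        key = "record_count"
    else:
        raise ValueError(f"Unknown quant_mode: {quant_mode!r}")
    value = flags.get(key)
    # Fall back to a count of 1 when the relevant flag is not set.
    return int(value) if value is not None else 1
```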

Variables

name (str) -- Name of the analysis
analysis_types (list of str) -- List of analysis types to perform
quant_mode (str) -- Mode to use for record quantification.

settings = {'out_delim': ',', 'quant_mode': 'single'}: Class-level settings. See hybkit.settings.Analysis_settings for descriptions.

analysis_options = ['energy', 'type', 'mirna', 'target', 'fold']

add_hyb_record(hyb_record: HybRecord) → None

Add a HybRecord object to the analysis.

Parameters: hyb_record (HybRecord) -- HybRecord object to be added to the analysis.

add_hyb_records(hyb_records: List[HybRecord], eval_types: bool = False, eval_mirna: bool = False) → None

Add a list of HybRecord objects to the analysis.

Parameters

hyb_records (HybFile or list of HybRecord) -- HybFile to iterate over, or iterable of HybRecord objects to be added to the analysis.
eval_types (bool) -- If True, evaluate the hybrid type of the HybRecord before adding it to the analysis using hybkit.HybRecord.eval_types().
eval_mirna (bool) -- If True, evaluate the miRNA segment of the HybRecord before adding it to the analysis using hybkit.HybRecord.eval_mirna().
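The eval_types and eval_mirna options mean each record is evaluated before it is added, so the flags the analyses require are already set. The loop below sketches that ordering against a hypothetical RecordLike protocol and an arbitrary add_one callback; it does not reproduce hybkit's method signatures (for example, any arguments eval_types() may accept).

```python
from typing import Callable, Iterable, Protocol


class RecordLike(Protocol):
    """Hypothetical stand-in for the HybRecord methods used here."""

    def eval_types(self) -> None: ...

    def eval_mirna(self) -> None: ...


def add_records(
    records: Iterable[RecordLike],
    add_one: Callable[[RecordLike], None],
    eval_types: bool = False,
    eval_mirna: bool = False,
) -> None:
    """Optionally evaluate each record, then hand it to add_one."""
    for record in records:
        if eval_types:
            record.eval_types()  # Sets seg1_type / seg2_type flags.
        if eval_mirna:
            record.eval_mirna()  # Sets the mirna_seg flag.
        add_one(record)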

get_all_results() → dict

Return a dictionary with all results for all active analyses.

See Analyses for details on the results for each analysis type.

Returns

Dictionary with keys of analysis type and values of: dictionaries with results for that analysis type.

Return type

dict

get_analysis_results(analysis: Literal['energy', 'type', 'mirna', 'target']) → Dict

Return a dictionary with all results for a specific analysis.

See Analyses for details on the results for each analysis type.

Parameters

analysis (str) -- Analysis type to return results for.

Returns

Dictionary with results for the specified analysis type.: see :ref:Analyses for details.

Return type

dict

get_specific_result(result_key: str) → Any

Get a specific result from the analysis.

See Analyses for details on the results for each analysis type.

Parameters: result_key (str) -- Result key to return from one of the enabled analyses.
Returns: Result value for the specified result key.
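Because each result key carries an analysis-specific prefix (for example energy_min or has_mirna), a single key can be looked up across the nested dictionary returned by get_all_results(). The helper below is a hypothetical sketch of that lookup on plain dictionaries, not hybkit's implementation.

```python
from typing import Any, Mapping


def find_result(all_results: Mapping[str, Mapping[str, Any]], result_key: str) -> Any:
    """Hypothetical lookup of one result key across per-analysis result dicts."""
    for analysis_results in all_results.values():
        if result_key in analysis_results:
            return analysis_results[result_key]
    raise KeyError(f"No enabled analysis provides {result_key!r}")
```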

get_analysis_delim_str(analysis: Optional[Literal['energy', 'type', 'mirna', 'target']] = None, out_delim: Optional[str] = None) → str

Return a delimited string containing the results of the analysis.

See Analyses for details on the results for each analysis type.

Parameters

analysis (str or list of str) -- Analysis type(s) to return results for. If not provided, return the results for all active analyses.
out_delim (str) -- Delimiter to use for output. If not provided, defaults to the value in settings['out_delim'].
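One plausible way to flatten a results dictionary into delimited text is sketched below. The layout (a "key,value" line for scalar results and a "key,item,count" line for each Counter entry, most common first) is a hypothetical format chosen for illustration; it is not necessarily the format get_analysis_delim_str() produces.

```python
from collections import Counter
from typing import Any, Mapping


def results_to_delim_str(results: Mapping[str, Any], out_delim: str = ",") -> str:
    """Hypothetical flattening of one analysis's results into delimited lines."""
    lines = []
    for key, value in results.items():
        if isinstance(value, Counter):
            # One line per Counter entry, most common first.
            for item, count in value.most_common():
                lines.append(out_delim.join((key, str(item), str(count))))
        else:
            lines.append(out_delim.join((key, str(value))))
    return "\n".join(lines) + "\n"
```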

write_analysis_delim_str(out_file_name: Optional[str] = None, analysis: Optional[Union[Literal['energy', 'type', 'mirna', 'target'], List[Literal['energy', 'type', 'mirna', 'target']]]] = None, out_delim: Optional[str] = None) → None

Write the results of the analysis to a delimited text file.

See Analyses for details on the results for each analysis type.

Parameters

out_file_name (str) -- Path to output file. If not provided, defaults to: ./<analysis_name>_<analysis>.csv if analysis/analyses provided, or ./<analysis_name>_multi_analysis.csv if no analysis/analyses provided.
analysis (str or list of str) -- Analysis type(s) to return results for. If not provided, return the results for all active analyses.
out_delim (str) -- Delimiter to use for output. If not provided, defaults to the value in settings['out_delim'].
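The default output path described for out_file_name can be sketched as below. The helper name is hypothetical, and joining multiple analysis names with underscores is an assumption; the text above only specifies the single-analysis and no-analysis cases.

```python
from typing import List, Optional, Union


def default_delim_file_name(
    analysis_name: str, analysis: Optional[Union[str, List[str]]] = None
) -> str:
    """Hypothetical reconstruction of the default delimited-output path."""
    if not analysis:
        return f"./{analysis_name}_multi_analysis.csv"
    if isinstance(analysis, str):
        analysis = [analysis]
    # Assumption: multiple analysis names are joined with underscores.
    return f"./{analysis_name}_{'_'.join(analysis)}.csv"
```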

write_analysis_results_special(out_basename: Optional[str] = None, analysis: Optional[Union[Literal['energy', 'type', 'mirna', 'target'], List[Literal['energy', 'type', 'mirna', 'target']]]] = None, out_delim: Optional[str] = None) → List[str]

Write the results of the analyses to specialized text files.

See Analyses for details on the results for each analysis type.

Parameters

out_basename (str) -- Base path for output files. Files will be named using the provided path as the base name. If not provided, defaults to: ./<analysis_name>_<analysis> if name is set, or ./Analysis_multi_<analysis> if name is not set.
analysis (str or list of str) -- Analysis type to write results files for. If not provided, write results files for all active analyses.
out_delim (str) -- Delimiter to use for output where applicable. If not provided, defaults to the value in settings['out_delim'].

plot_analysis_results(out_basename: Optional[str] = None, analysis: Optional[Union[Literal['energy', 'type', 'mirna', 'target'], List[Literal['energy', 'type', 'mirna', 'target']]]] = None) → List[str]

Plot the results of the analyses.

See Analyses for details on the results for each analysis type.

Parameters

analysis (str or list of str) -- Analysis type to plot results for. If not provided, plot results for all active analyses.
out_basename (str) -- Path to output file. If not provided, defaults to: ./<analysis_name> if name provided or ./analysis if no name provided.

key = 'fold'