hybkit.analysis

Functions for analysis of HybRecord and FoldRecord objects.

Analysis

class hybkit.analysis.Analysis(analysis_types: Union[Literal['energy', 'type', 'mirna', 'target'], List[Literal['energy', 'type', 'mirna', 'target']]], name: Optional[str] = None, quant_mode: Optional[Literal['single', 'reads', 'records']] = None)

Class for analysis of hybkit HybRecord and FoldRecord objects.

This class contains multiple conceptual analyses for HybRecord/FoldRecord data:

Energy: Analysis of values of predicted intra-hybrid folding energy
Type: Analysis of segment types
miRNA: Analysis of miRNA segment distributions
Target: Analysis of miRNA target segment names and types
Fold: Analysis of folding data included in the analyzed hyb_records.

This class is used by selecting the desired analysis types on object initialization. Analyses are performed by adding records with either the add_hyb_record() or the add_hyb_records() method. The results of the analysis are then available through the get_all_results(), get_analysis_results(), get_specific_result(), and plot_analysis_results() methods, which can return (or plot) the results of all analyses or of a specific subset of analyses.

Details for each respective analysis are provided here:

Energy Analysis:

This analysis evaluates the energy of each HybRecord object and provides a binned histogram of all energy values represented.

Output Results:
energy_analysis_count (int): Count of energy values evaluated
has_energy_val (int): Count of hyb_records with an energy value
no_energy_val (int): Count of hyb_records without an energy value
energy_min (float): Minimum energy value
energy_max (float): Maximum energy value
energy_mean (float): Mean energy value
energy_std (float): Standard deviation of energy values
binned_energy_vals (Counter): Counter with integer keys spanning energy_min to energy_max, each storing the count of hyb_records whose energy value falls into that bin (values are rounded up to the next-highest integer, e.g. -12.5 -> -12).
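
As an illustration of the energy summary described above, the following is a minimal standalone sketch, not hybkit's implementation. It assumes each record contributes one count (quant_mode "single"), that records without an energy value are passed as None, and that energy_std is a population standard deviation; the function name summarize_energies is hypothetical.

```python
import math
import statistics
from collections import Counter
from typing import Dict, Iterable, Optional


def summarize_energies(energies: Iterable[Optional[float]]) -> Dict[str, object]:
    """Hypothetical stand-in for the energy analysis (not hybkit code)."""
    values = []
    no_energy_val = 0
    for energy in energies:
        if energy is None:
            no_energy_val += 1  # record carries no predicted folding energy
        else:
            values.append(float(energy))

    # Bin each value by rounding up to the next-highest integer (-12.5 -> -12).
    binned = Counter(math.ceil(value) for value in values)
    if values:
        # Make every integer bin from min to max present, even if empty.
        for key in range(math.ceil(min(values)), math.ceil(max(values)) + 1):
            binned.setdefault(key, 0)

    return {
        # Assumption: every record evaluated counts once (quant_mode "single").
        "energy_analysis_count": len(values) + no_energy_val,
        "has_energy_val": len(values),
        "no_energy_val": no_energy_val,
        "energy_min": min(values) if values else None,
        "energy_max": max(values) if values else None,
        "energy_mean": statistics.mean(values) if values else None,
        # Assumption: population (not sample) standard deviation.
        "energy_std": statistics.pstdev(values) if values else None,
        "binned_energy_vals": binned,
    }


result = summarize_energies([-12.5, -20.0, None, -12.0])
print(result["binned_energy_vals"][-12])  # 2
```

Pre-populating the empty bins keeps the histogram contiguous, which matches the description of keys running from energy_min to energy_max.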

Type Analysis:

This analysis evaluates the counts of each type of segment included in the HybRecord objects. The types of segments are determined by the seg1_type and seg2_type flags, which are set by the hybkit.HybRecord.eval_types() method.

Requirements:

seg1_type and seg2_type flags must be set for each HybRecord (can be done by hybkit.HybRecord.eval_types()).
Output Results:
types_analysis_count (int): Count of hybrid types analyzed
hybrid_types (Counter): Counter containing annotated types of seg1 and seg2 (in original 5p / 3p order)
reordered_hybrid_types (Counter): Counter containing annotated types of seg1 and seg2. This is provided in "sorted" order, where types are sorted alphabetically (independent of 5p / 3p position).
mirna_hybrid_types (Counter): Counter containing annotated types of seg1 and seg2. This is provided in "miRNA-prime" order, where a miRNA type is always listed before other types, and then remaining types are sorted alphabetically (independent of 5p / 3p position).
seg1_types (Counter): Counter containing annotated type of segment in position seg1
seg2_types (Counter): Counter containing annotated type of segment in position seg2
all_seg_types (Counter): Counter containing position-independent annotated types
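
The three hybrid-type orderings can be sketched in plain Python as below. This is a hypothetical helper, not hybkit code: it takes (seg1_type, seg2_type) pairs, joins type names with "-", and treats "miRNA" and "microRNA" as miRNA types; all three choices are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumption: type labels that count as miRNA for "miRNA-prime" ordering.
MIRNA_TYPES = {"miRNA", "microRNA"}


def type_counters(pairs: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Hypothetical stand-in for the type analysis (not hybkit code)."""
    names = ("hybrid_types", "reordered_hybrid_types", "mirna_hybrid_types",
             "seg1_types", "seg2_types", "all_seg_types")
    counters: Dict[str, Counter] = {name: Counter() for name in names}
    count = 0
    for seg1_type, seg2_type in pairs:
        count += 1
        # Original 5p / 3p order.
        counters["hybrid_types"][f"{seg1_type}-{seg2_type}"] += 1
        # "Sorted" order: alphabetical, independent of position.
        counters["reordered_hybrid_types"]["-".join(sorted((seg1_type, seg2_type)))] += 1
        # "miRNA-prime" order: miRNA types first, then alphabetical.
        mirna_first = sorted((seg1_type, seg2_type),
                             key=lambda seg_type: (seg_type not in MIRNA_TYPES, seg_type))
        counters["mirna_hybrid_types"]["-".join(mirna_first)] += 1
        counters["seg1_types"][seg1_type] += 1
        counters["seg2_types"][seg2_type] += 1
        # Position-independent: both segments are counted.
        counters["all_seg_types"].update((seg1_type, seg2_type))
    return {"types_analysis_count": count, **counters}


res = type_counters([("mRNA", "microRNA"), ("microRNA", "mRNA"), ("lncRNA", "mRNA")])
```

Sorting on the key (not a miRNA type, name) places miRNA types first because False sorts before True; ties fall back to alphabetical order.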

miRNA Analysis:

Analysis of miRNA segments in hybrids.

The mirna_analysis provides an analysis of what miRNA types are present in the hyb records. If a miRNA dimer is present in a hybrid, this is counted in mirna_dimers. If a single miRNA is present in a hybrid, this is counted in mirnas_5p or mirnas_3p depending on the miRNA location.

Requirements:
mirna_seg flag must be set for each HybRecord (can be done by hybkit.HybRecord.eval_mirna()).
Output Results:
mirna_analysis_count (int): Count of miRNA hybrids analyzed
mirnas_5p (int): Count of 5p miRNAs detected
mirnas_3p (int): Count of 3p miRNAs detected
mirna_dimers (int): Count of miRNA dimers (5p + 3p) detected
non_mirna (int): Count of non-miRNA hybrids detected
has_mirna (int): Hybrids with 5p, 3p, or both as miRNA
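
The miRNA counts are mutually exclusive per hybrid, so has_mirna is the sum of the 5p, 3p, and dimer counts. A minimal sketch follows, assuming mirna_seg takes the values "5p", "3p", "B" (both, i.e. a dimer), and "N" (none), and that every record counts once; mirna_counts is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed mirna_seg values: 5p miRNA, 3p miRNA, both (dimer), or none.
VALID_MIRNA_SEG = {"5p", "3p", "B", "N"}


def mirna_counts(mirna_seg_flags: Iterable[str]) -> Dict[str, int]:
    """Hypothetical stand-in for the miRNA analysis (not hybkit code)."""
    flags = Counter(mirna_seg_flags)
    unexpected = set(flags) - VALID_MIRNA_SEG
    if unexpected:
        raise ValueError(f"Unexpected mirna_seg values: {sorted(unexpected)}")
    results = {
        "mirna_analysis_count": sum(flags.values()),
        "mirnas_5p": flags["5p"],    # single miRNA at the 5p position
        "mirnas_3p": flags["3p"],    # single miRNA at the 3p position
        "mirna_dimers": flags["B"],  # miRNA at both positions
        "non_mirna": flags["N"],
    }
    # The categories are exclusive, so has_mirna is their sum.
    results["has_mirna"] = (results["mirnas_5p"] + results["mirnas_3p"]
                            + results["mirna_dimers"])
    return results
```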

Target Analysis:

Analysis of targets in miRNA-containing hybrids.

The target analysis provides an analysis of what annotated sequences and sequence types are targeted by any miRNA within the hyb records. If a miRNA is not present in a hybrid, the hybrid is not included in the analysis. If a miRNA dimer is present in a hybrid, the 5p miRNA is used for the analysis, and the 3p miRNA is considered the "target."

Requirements:
mirna_seg flag must be set for each HybRecord (can be done by hybkit.HybRecord.eval_mirna()).
Output Results:
target_analysis_count (int): Count of hybrids analyzed
target_evals (int): Count of target evaluations performed
target_names (Counter): Counter containing names of miRNA targets detected.
target_types (Counter): Counter containing types of miRNA targets detected.
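
The choice of target segment described above can be sketched as follows. Records are represented as plain dictionaries with hypothetical keys (mirna_seg, seg1_name, seg1_type, seg2_name, seg2_type) standing in for HybRecord attributes, and mirna_seg values are assumed to be "5p", "3p", "B", or "N"; the helper and its "evaluated" key are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def target_counters(records: Iterable[Mapping[str, str]]) -> Dict[str, object]:
    """Hypothetical stand-in for the target analysis (not hybkit code)."""
    target_names: Counter = Counter()
    target_types: Counter = Counter()
    evaluated = 0
    for record in records:
        mirna_seg = record["mirna_seg"]
        if mirna_seg == "N":
            continue  # no miRNA present: hybrid is excluded
        evaluated += 1
        # 5p miRNA, or a dimer (5p treated as the miRNA): target is segment 2.
        # 3p miRNA: target is segment 1.
        target = "seg2" if mirna_seg in ("5p", "B") else "seg1"
        target_names[record[f"{target}_name"]] += 1
        target_types[record[f"{target}_type"]] += 1
    return {"evaluated": evaluated, "target_names": target_names,
            "target_types": target_types}
```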

Fold Analysis:

This analysis evaluates the predicted binding of the miRNA within hyb records that contain a miRNA and have an associated FoldRecord object as the attribute fold_record. This includes an analysis and plotting of the predicted binding at each nucleotide position along the miRNA.

Requirements:
The mirna_seg flag must be set for each HybRecord (can be done by hybkit.HybRecord.eval_mirna()).
The fold_record attribute must be set for each HybRecord with a corresponding FoldRecord object. This can be done using the hybkit.HybRecord.set_fold_record() method.
Output Results:
fold_analysis_count (int): Count of miRNA fold predictions analyzed
folds_recorded (int): Count of fold predictions with a miRNA fold
mirna_nt_fold_counts (Counter): Counter with keys of miRNA position index and values of number of miRNAs with a predicted bound state at that index.
mirna_nt_fold_props (Counter): Counter with keys of miRNA position index and values of proportion (0.0 - 1.0) of miRNAs with a predicted bound state at that index.
fold_match_counts (Counter): Counter with keys of count of predicted matches between miRNA and target with values of count of miRNAs with that number of predicted matches.
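
A rough sketch of the per-position fold statistics follows. It assumes each miRNA's predicted fold is supplied as a dot-bracket string covering only the miRNA, where "." marks an unbound nucleotide, that positions are 1-indexed, and that proportions are taken over all recorded folds; the function and its input format are hypothetical, not hybkit's FoldRecord API.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def fold_position_stats(mirna_folds: Iterable[Optional[str]]) -> Dict[str, object]:
    """Hypothetical stand-in for the fold analysis (not hybkit code).

    Each item is the dot-bracket fold of one miRNA ('.' = unbound), or None
    when no miRNA fold is available for that record.
    """
    nt_counts: Counter = Counter()
    match_counts: Counter = Counter()
    analyzed = recorded = 0
    for fold in mirna_folds:
        analyzed += 1
        if not fold:
            continue
        recorded += 1
        # 1-indexed positions with a predicted bound state.
        bound = [pos for pos, char in enumerate(fold, start=1) if char != "."]
        nt_counts.update(bound)
        match_counts[len(bound)] += 1
    # Proportion of recorded miRNA folds bound at each position.
    nt_props = Counter({pos: n / recorded for pos, n in nt_counts.items()})
    return {
        "fold_analysis_count": analyzed,
        "folds_recorded": recorded,
        "mirna_nt_fold_counts": nt_counts,
        "mirna_nt_fold_props": nt_props,
        "fold_match_counts": match_counts,
    }
```

Dividing by the number of recorded folds (rather than by the number of miRNAs long enough to reach a position) is a simplifying choice; miRNAs of different lengths would make the two differ.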
Parameters
  • analysis_types (str or list of str) -- Analysis types to perform

  • name (str, optional) -- Name of the analysis

  • quant_mode (str, optional) -- Mode to use for record quantification. Options are "single": One count per record; "reads": If "read_count" flag is set, count all reads in record (else count 1); "records": if the "record_count" flag is set, count all individual records within combined record (else count 1). If not provided, defaults to the value in Analysis.settings['quant_mode'].
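
The three quant_mode options reduce to a small per-record weight. In this hypothetical sketch the record's flags are a plain mapping and flag values are converted with int(); record_quantity is illustrative, not the library's code.

```python
from typing import Mapping, Optional, Union

FlagValue = Optional[Union[str, int]]


def record_quantity(flags: Mapping[str, FlagValue], quant_mode: str = "single") -> int:
    """Hypothetical per-record weight for each quant_mode (not hybkit code)."""
    if quant_mode == "single":
        return 1  # one count per record, regardless of flags
    if quant_mode == "reads":
        value = flags.get("read_count")
    elif quant_mode == "records":
        value = flags.get("record_count")
    else:
        raise ValueError(f"Unknown quant_mode: {quant_mode!r}")
    # Fall back to a count of 1 when the relevant flag is not set.
    return int(value) if value is not None else 1
```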

Variables
  • name (str) -- Name of the analysis

  • analysis_types (list of str) -- List of analysis types to perform

  • quant_mode (str) -- Mode to use for record quantification.

settings = {'out_delim': ',', 'quant_mode': 'single'}

Class-level settings. See hybkit.settings.Analysis_settings for descriptions.

analysis_options = ['energy', 'type', 'mirna', 'target', 'fold']

add_hyb_record(hyb_record: HybRecord) -> None

Add a HybRecord object to the analysis.

Parameters

hyb_record (HybRecord) -- HybRecord object to be added to the analysis.

add_hyb_records(hyb_records: List[HybRecord], eval_types: bool = False, eval_mirna: bool = False) -> None

Add a list of HybRecord objects to the analysis.

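Parameters
  • hyb_records (list of HybRecord) -- HybRecord objects to be added to the analysis.

  • eval_types (bool, optional) -- If True, evaluate segment types for each record (as with hybkit.HybRecord.eval_types()) before adding it. Defaults to False.

  • eval_mirna (bool, optional) -- If True, evaluate miRNA segments for each record (as with hybkit.HybRecord.eval_mirna()) before adding it. Defaults to False.

get_all_results() -> dict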

Return a dictionary with all results for all active analyses.

See Analyses for details on the results for each analysis type.

Returns

Dictionary with keys of analysis type and values of dictionaries with results for that analysis type.

Return type

dict

get_analysis_results(analysis: Literal['energy', 'type', 'mirna', 'target']) -> Dict

Return a dictionary with all results for a specific analysis.

See Analyses for details on the results for each analysis type.

Parameters

analysis (str) -- Analysis type to return results for.

Returns

Dictionary with results for the specified analysis type.

Return type

dict

get_specific_result(result_key: str) -> Any

Get a specific result from the analysis.

See Analyses for details on the results for each analysis type.

Parameters

result_key (str) -- Result key to return from one of the enabled analyses.

Returns

Result value for the specified result key.
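
get_all_results() nests each analysis's results under its analysis type, so retrieving a single key amounts to searching the enabled analyses in turn. A minimal sketch, assuming result keys do not repeat across analyses; the hypothetical find_result helper operates on a plain dict of toy values.

```python
from typing import Any, Mapping


def find_result(all_results: Mapping[str, Mapping[str, Any]], result_key: str) -> Any:
    """Hypothetical lookup over a get_all_results()-style nested dict."""
    for analysis_results in all_results.values():
        if result_key in analysis_results:
            return analysis_results[result_key]
    raise KeyError(f"{result_key!r} is not a result of any enabled analysis")


# Toy values for illustration only.
example = {"energy": {"energy_min": -20.0}, "mirna": {"has_mirna": 4}}
print(find_result(example, "has_mirna"))  # 4
```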

get_analysis_delim_str(analysis: Optional[Literal['energy', 'type', 'mirna', 'target']] = None, out_delim: Optional[str] = None) -> str

Return a delimited string containing the results of the analysis.

See Analyses for details on the results for each analysis type.

Parameters
  • analysis (str or list of str) -- Analysis type(s) to return results for. If not provided, return the results for all active analyses.

  • out_delim (str) -- Delimiter to use for output. If not provided, defaults to the value in settings['out_delim'].

write_analysis_delim_str(out_file_name: Optional[str] = None, analysis: Optional[Union[Literal['energy', 'type', 'mirna', 'target'], List[Literal['energy', 'type', 'mirna', 'target']]]] = None, out_delim: Optional[str] = None) -> None

Write the results of the analysis to a delimited text file.

See Analyses for details on the results for each analysis type.

Parameters
  • out_file_name (str) -- Path to output file. If not provided, defaults to: ./<analysis_name>_<analysis>.csv if analysis/analyses provided, or ./<analysis_name>_multi_analysis.csv if no analysis/analyses provided.

  • analysis (str or list of str) -- Analysis type(s) to write results for. If not provided, write the results for all active analyses.

  • out_delim (str) -- Delimiter to use for output. If not provided, defaults to the value in settings['out_delim'].
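
The default output path for write_analysis_delim_str() can be expressed as a small helper. This is a hypothetical sketch of the naming rule stated above; how a list of analyses maps to <analysis> is not specified here, so joining the names with "_" is an assumption.

```python
from typing import List, Optional, Union


def default_delim_out_name(analysis_name: str,
                           analysis: Optional[Union[str, List[str]]] = None) -> str:
    """Hypothetical helper reproducing the documented default file name."""
    if analysis is None:
        suffix = "multi_analysis"
    elif isinstance(analysis, str):
        suffix = analysis
    else:
        suffix = "_".join(analysis)  # assumption: list entries joined by "_"
    return f"./{analysis_name}_{suffix}.csv"


print(default_delim_out_name("run1", "energy"))  # ./run1_energy.csv
```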

write_analysis_results_special(out_basename: Optional[str] = None, analysis: Optional[Union[Literal['energy', 'type', 'mirna', 'target'], List[Literal['energy', 'type', 'mirna', 'target']]]] = None, out_delim: Optional[str] = None) -> List[str]

Write the results of the analyses to specialized text files.

See Analyses for details on the results for each analysis type.

Parameters
  • out_basename (str) -- Path for basename of output files. Files will be named using the provided path as the base name. If not provided, defaults to: ./<analysis_name>_<analysis> if name is set, or ./Analysis_multi_<analysis> if name is not set.

  • analysis (str or list of str) -- Analysis type to write results files for. If not provided, write results files for all active analyses.

  • out_delim (str) -- Delimiter to use for output where applicable. If not provided, defaults to the value in settings['out_delim'].

plot_analysis_results(out_basename: Optional[str] = None, analysis: Optional[Union[Literal['energy', 'type', 'mirna', 'target'], List[Literal['energy', 'type', 'mirna', 'target']]]] = None) -> List[str]

Plot the results of the analyses.

See Analyses for details on the results for each analysis type.

Parameters
  • out_basename (str) -- Base path for output files. If not provided, defaults to: ./<analysis_name> if a name is provided, or ./analysis if no name is provided.

  • analysis (str or list of str) -- Analysis type to plot results for. If not provided, plot results for all active analyses.

key = 'fold'