hybkit.analysis

Functions for analysis of HybRecord and FoldRecord Objects.

Type Analysis

hybkit.analysis.TYPE_DESCRIPTION = 'as_follows...'

The type analysis provides an analysis of what segment types are included in the analyzed hyb files.

Before using the analysis, the seg1_type and seg2_type flags must be set for the record, as is done by hybkit.HybRecord.find_seg_types(). A count is added to the analysis dict for each hybrid type (Ex: “miRNA-mRNA”) with segments placed in sorted order for non-redundant typing. The analysis additionally reports the number of individual segment types.

hybkit.analysis.type_dict()

Create a dictionary whose values are Counter objects, for running type analyses.

Returns:Dict object for type analyses:
{
 'hybrid_type_counts':Counter(),
 'seg1_types':Counter(),
 'seg2_types':Counter(),
 'all_seg_types':Counter(),
}
hybkit.analysis.combine_type_dicts(analysis_dicts)

Combine a list/tuple of dictionaries created from running type analyses.

Parameters:analysis_dicts (list or tuple) – Iterable of dict objects from the type analysis.
Returns:Combined dict object with keys ‘hybrid_type_counts’, ‘seg1_types’, ‘seg2_types’, and ‘all_seg_types’
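
As a rough illustration of the combining step, assuming each analysis dict holds collections.Counter values under the four keys listed above, the counters can be summed key by key. The helper name sketch_combine_type_dicts and the example counts are hypothetical; this is a sketch of the idea, not hybkit's implementation.

```python
from collections import Counter

TYPE_KEYS = ('hybrid_type_counts', 'seg1_types', 'seg2_types', 'all_seg_types')


def sketch_combine_type_dicts(analysis_dicts):
    """Sum per-key Counters from several type-analysis dicts (illustrative only)."""
    combined = {key: Counter() for key in TYPE_KEYS}
    for analysis_dict in analysis_dicts:
        for key in TYPE_KEYS:
            # Counter.update() adds counts to existing entries instead of replacing them.
            combined[key].update(analysis_dict[key])
    return combined


# Two small hand-made analysis dicts standing in for results from separate files.
first = {key: Counter() for key in TYPE_KEYS}
second = {key: Counter() for key in TYPE_KEYS}
first['hybrid_type_counts']['miRNA-mRNA'] += 3
second['hybrid_type_counts']['miRNA-mRNA'] += 2
second['hybrid_type_counts']['miRNA-lncRNA'] += 1

combined = sketch_combine_type_dicts([first, second])
print(combined['hybrid_type_counts'])
```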
hybkit.analysis.addto_type(record, analysis_dict, count_mode='record', type_sep='-', mirna_centric_sorting=True)

Add the information from a HybRecord to a type analysis.

This method is designed to perform the analysis during a single reading of a HybFile so as to minimize memory and time usage. The dict object provided as analysis_dict is modified in-place.

Parameters:
  • record (HybRecord) – Record with information to add.
  • analysis_dict (dict) – Dict for type analysis (see type_dict()).
  • count_mode (str, optional) – Indicates how entries in record should be counted. Options are one of: {‘read’, ‘record’}. See hybkit.HybRecord.count() for further details.
  • type_sep (str, optional) – Separator string to place between the seg_types.
  • mirna_centric_sorting (bool, optional) – Where a type contains an miRNA, place that first in the seg1_type<sep>seg2_type naming scheme. Otherwise seg1_type and seg2_type are ordered alphabetically.
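
The naming scheme controlled by type_sep and mirna_centric_sorting can be sketched as below. This is a minimal illustration under stated assumptions: the helper name sketch_hybrid_type and its mirna_types argument are hypothetical and do not appear in hybkit, and the type strings are examples only.

```python
def sketch_hybrid_type(seg1_type, seg2_type, type_sep='-',
                       mirna_centric_sorting=True, mirna_types=('miRNA',)):
    """Join two segment types into one order-independent hybrid type label."""
    # Alphabetical order makes the label independent of segment order.
    seg_types = sorted((seg1_type, seg2_type))
    if mirna_centric_sorting:
        # Stable sort: miRNA types move to the front, the rest stay alphabetical.
        seg_types.sort(key=lambda seg_type: seg_type not in mirna_types)
    return type_sep.join(seg_types)


print(sketch_hybrid_type('mRNA', 'miRNA'))                               # miRNA-mRNA
print(sketch_hybrid_type('mRNA', 'miRNA', mirna_centric_sorting=False))  # mRNA-miRNA
print(sketch_hybrid_type('mRNA', 'lncRNA', type_sep=':'))                # lncRNA:mRNA
```

Sorting before joining is what keeps “miRNA-mRNA” and “mRNA-miRNA” from being counted as two different hybrid types.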
hybkit.analysis.format_type(analysis_dict, sep=', ')

Return the results of a type analysis in a list of delimited lines.

Parameters:
  • analysis_dict (dict) – Dict for type analysis (see type_dict()).
  • sep (str, optional) – Separator for entries within lines, such as ‘,’ or ‘\t’.
Returns:

list of string objects with a terminating newline character representing the results of the analysis.
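
To show what a list of delimited lines with terminating newlines might look like, here is a small sketch. The line layout (a header line followed by one line per entry) and the helper name sketch_format_counter are assumptions made for illustration; the actual layout produced by format_type() may differ.

```python
from collections import Counter


def sketch_format_counter(title, counter, sep=','):
    """Render one Counter as delimited lines, most common entries first."""
    lines = [sep.join((title, 'count')) + '\n']
    for name, count in counter.most_common():
        lines.append(sep.join((name, str(count))) + '\n')
    return lines


hybrid_type_counts = Counter({'miRNA-mRNA': 5, 'miRNA-lncRNA': 1})
lines = sketch_format_counter('hybrid_type', hybrid_type_counts, sep='\t')
print(''.join(lines), end='')
```

Because every line already ends with a newline, the list can be passed directly to a file object's writelines() method.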

hybkit.analysis.write_type(file_name_base, analysis_dict, name=None, multi_files=False, sep=', ', file_suffix='.csv', make_plots=True)

Write the results of a type analysis to a file, and create plots of the results.

Parameters:
  • file_name_base (str) – “Base” name for output files. Final file names will be generated based on each respective analysis type and provided parameters.
  • analysis_dict (dict) – Dict for type analysis (see type_dict()).
  • name (str, optional) – String to add to title of plot indicating data source.
  • multi_files (bool, optional) – If True, output result lines in separate files. Otherwise write a single delimited file containing results.
  • sep (str, optional) – Separator for entries within lines, such as ‘,’ or ‘\t’.
  • file_suffix (str, optional) – File suffix to add to delimited files. Defaults to “.csv” corresponding to using the delimiter: “,”.
  • make_plots (bool, optional) – If True, plot results using matplotlib via hybkit.plot.hybrid_type_counts(). Otherwise do not make plots.

miRNA Count Analysis

hybkit.analysis.MIRNA_COUNT_DESCRIPTION = 'as_follows...'

The mirna_count analysis determines the type of each record with regard to miRNA and counts the records accordingly. This includes:

5p_mirna_hybrids: Hybrids with a 5p miRNA.
3p_mirna_hybrids: Hybrids with a 3p miRNA.
mirna_dimer_hybrids: Hybrids with both a 5p and 3p miRNA.
no_mirna_hybrids: Hybrids with no miRNA.
(And additionally includes:)
all_mirna_hybrids: Hybrids that fall into the first three categories.

Before using the analysis, the mirna_seg flag must be set for each record as can be done by sequential use of the hybkit.HybRecord.find_seg_types() and hybkit.HybRecord.mirna_analysis() methods.

hybkit.analysis.mirna_count_dict()

Create a dictionary with keys for running miRNA count analyses.

Returns:Dict object for count analyses:
{
 '5p_mirna_hybrids': 0,
 '3p_mirna_hybrids': 0,
 'mirna_dimer_hybrids': 0,
 'all_mirna_hybrids': 0,
 'no_mirna_hybrids': 0,
}
hybkit.analysis.combine_mirna_count_dicts(analysis_dicts)

Combine a list/tuple of dictionaries created from running count analyses.

Parameters:analysis_dicts (list or tuple) – Iterable of dict objects from the count analysis.
Returns:Combined dict object with keys ‘5p_mirna_hybrids’, ‘3p_mirna_hybrids’, ‘mirna_dimer_hybrids’, ‘all_mirna_hybrids’, and ‘no_mirna_hybrids’.
hybkit.analysis.addto_mirna_count(record, analysis_dict, count_mode='record')

Add the information from a HybRecord to a mirna_count analysis.

This method is designed to perform the analysis during a single reading of a HybFile so as to minimize memory and time usage. The dict object provided as analysis_dict is modified in-place.

Parameters:
  • record (HybRecord) – Record with information to add.
  • analysis_dict (dict) – Dict for mirna_count analysis created with mirna_count_dict().
  • count_mode (str, optional) – Indicates how entries in record should be counted. Options are one of: {‘read’, ‘record’}. See hybkit.HybRecord.count() for further details.
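
The counting categories described for this analysis can be sketched as follows. The flag values '5p', '3p', 'both', and 'none' are placeholders chosen for this example and are not claimed to be hybkit's encoding of the mirna_seg flag; the helper name and its count argument (standing in for the value selected by count_mode) are likewise hypothetical.

```python
def sketch_addto_mirna_count(mirna_seg, analysis_dict, count=1):
    """Increment the miRNA count categories for one record (illustrative only)."""
    # Placeholder flag values; not hybkit's actual flag encoding.
    key_by_flag = {
        '5p': '5p_mirna_hybrids',
        '3p': '3p_mirna_hybrids',
        'both': 'mirna_dimer_hybrids',
        'none': 'no_mirna_hybrids',
    }
    key = key_by_flag[mirna_seg]
    analysis_dict[key] += count
    # all_mirna_hybrids totals the first three categories.
    if key != 'no_mirna_hybrids':
        analysis_dict['all_mirna_hybrids'] += count


counts = {
    '5p_mirna_hybrids': 0,
    '3p_mirna_hybrids': 0,
    'mirna_dimer_hybrids': 0,
    'all_mirna_hybrids': 0,
    'no_mirna_hybrids': 0,
}
for flag in ['5p', '5p', '3p', 'both', 'none']:
    sketch_addto_mirna_count(flag, counts)
print(counts)
```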
hybkit.analysis.format_mirna_count(analysis_dict, sep=', ')

Return the results of a mirna_count analysis in a list of delimited lines.

Parameters:
  • analysis_dict (dict) – Dict from mirna_count analysis (see mirna_count_dict()).
  • sep (str, optional) – Separator for entries within lines, such as ‘,’ or ‘\t’.
Returns:

list of string objects with a terminating newline character representing the results of the analysis.

hybkit.analysis.write_mirna_count(file_name_base, analysis_dict, name=None, multi_files=False, sep=', ', file_suffix='.csv', make_plots=True)

Write the results of a mirna_count analysis to a file, and create a plot of the results.

Parameters:
  • file_name_base (str) – “Base” name for output files. Final file names will be generated based on each respective analysis type and provided parameters.
  • analysis_dict (dict) – Dict from mirna_count analysis (see mirna_count_dict()).
  • name (str, optional) – String to add to title of plot indicating data source.
  • multi_files (bool, optional) – Currently has no effect.
  • sep (str, optional) – Separator for entries within lines, such as ‘,’ or ‘\t’.
  • file_suffix (str, optional) – File suffix to add to delimited files. Defaults to “.csv” corresponding to using the delimiter: “,”.
  • make_plots (bool, optional) – If True, plot results using matplotlib via hybkit.plot.mirna_count(). Otherwise do not make plots.

Summary Analysis

hybkit.analysis.SUMMARY_DESCRIPTION = 'as_follows...'

This analysis includes the components of both the Type Analysis and the miRNA Count Analysis, performed simultaneously.

hybkit.analysis.summary_dict()

Return a combined dict from the type_dict() and mirna_count_dict() methods.
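
A combined dict of this kind can be pictured as the union of the keys shown for the type and miRNA count dicts. The sketch below rebuilds both key sets by hand under that assumption; sketch_summary_dict is a hypothetical name and does not call hybkit.

```python
from collections import Counter

TYPE_KEYS = ('hybrid_type_counts', 'seg1_types', 'seg2_types', 'all_seg_types')
MIRNA_COUNT_KEYS = ('5p_mirna_hybrids', '3p_mirna_hybrids', 'mirna_dimer_hybrids',
                    'all_mirna_hybrids', 'no_mirna_hybrids')


def sketch_summary_dict():
    """Build one dict holding both the type Counters and the miRNA counts."""
    summary = {key: Counter() for key in TYPE_KEYS}
    summary.update({key: 0 for key in MIRNA_COUNT_KEYS})
    return summary


summary = sketch_summary_dict()
print(sorted(summary))
```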

hybkit.analysis.combine_summary_dicts(analysis_dicts)

Combine a list/tuple of dictionaries created from running summary analyses.

Parameters:analysis_dicts (list or tuple) – Iterable of dict objects from the summary analysis.
Returns:Combined dict object with keys as in summary_dict() method.
hybkit.analysis.addto_summary(record, analysis_dict, count_mode='record', type_sep='-', mirna_centric_sorting=True)

Add the information from a HybRecord to a summary analysis.

This method is designed to perform the analysis during a single reading of a HybFile so as to minimize memory and time usage. The dict object provided as analysis_dict is modified in-place.

Parameters:
  • record (HybRecord) – Record with information to add.
  • analysis_dict (dict) – Dict for summary analysis (see summary_dict()).
  • count_mode (str, optional) – Indicates how entries in record should be counted. Options are one of: {‘read’, ‘record’}. See hybkit.HybRecord.count() for further details.
  • type_sep (str, optional) – Separator string to place between the seg_types.
  • mirna_centric_sorting (bool, optional) – Where a type contains an miRNA, place that first in the seg1_type<sep>seg2_type naming scheme. Otherwise seg1_type and seg2_type are ordered alphabetically.
hybkit.analysis.format_summary(analysis_dict, sep=', ')

Return the results of a summary analysis in a list of delimited lines.

Parameters:
  • analysis_dict (dict) – Dict for summary analysis (see summary_dict()).
  • sep (str, optional) – Separator for entries within lines, such as ‘,’ or ‘\t’.
Returns:

list of string objects with a terminating newline character representing the results of the analysis.

hybkit.analysis.write_summary(file_name_base, analysis_dict, name=None, multi_files=False, sep=', ', file_suffix='.csv', make_plots=True)

Write the results of a summary analysis to a file, and create plots of the results.

Parameters:
  • file_name_base (str) – “Base” name for output files. Final file names will be generated based on each respective analysis type and provided parameters.
  • analysis_dict (dict) – Dict for summary analysis (see summary_dict()).
  • name (str, optional) – String to add to title of plot indicating data source.
  • multi_files (bool, optional) – If True, output result lines in separate files. Otherwise write a single delimited file containing results.
  • sep (str, optional) – Separator for entries within lines, such as ‘,’ or ‘\t’.
  • file_suffix (str, optional) – File suffix to add to delimited files. Defaults to “.csv” corresponding to using the delimiter: “,”.
  • make_plots (bool, optional) – If True, plot results using matplotlib via hybkit.plot.type() and hybkit.plot.mirna_count(). Otherwise do not make plots.

miRNA Target Analysis

hybkit.analysis.MIRNA_TARGET_DESCRIPTION = 'as_follows...'

The mirna_target analysis provides an analysis of what sequences are targeted by each respective miRNA within the hyb records. The analysis dict has keys of each miRNA, with each value being a dict of targeted sequences and their associated count of times targeted.

Before using the analysis, the seg1_type, seg2_type, and mirna_seg flags must be set for each record as can be done by sequential use of the hybkit.HybRecord.find_seg_types() and hybkit.HybRecord.mirna_analysis() methods.

hybkit.analysis.mirna_target_dict()

Create a dictionary with keys for running miRNA target analyses.

Returns:Dict object for target analyses:
{}
hybkit.analysis.combine_mirna_target_dicts(analysis_dicts)

Combine a list/tuple of dictionaries created from running mirna_target analyses.

Parameters:analysis_dicts (list or tuple) – Iterable of dict objects from the mirna_target analysis.
Returns:Combined dict object with keys of each respective miRNA and their target counts.
hybkit.analysis.addto_mirna_target(record, analysis_dict, count_mode='record', double_count_duplexes=False, mirna_contains=None, mirna_matches=None, target_contains=None, target_matches=None)

Add the information from a HybRecord to a mirna_target analysis. If the record contains a single miRNA, the miRNA and target are identified. The count for this miRNA and its target is then added to the dict.

This method is designed to perform the analysis during a single reading of a HybFile so as to minimize memory and time usage. The dict object provided as analysis_dict is modified in-place.

Parameters:
  • record (HybRecord) – Record with information to add.
  • analysis_dict (dict) – Dict for mirna_target analysis created with mirna_target_dict().
  • count_mode (str, optional) – Indicates how entries in record should be counted. Options are one of: {‘read’, ‘record’}. See hybkit.HybRecord.count() for further details.
  • double_count_duplexes (bool, optional) – If True, count miRNA-miRNA duplexes twice, once for each of the miRNA with the target assigned as the other. This will cause the final count of miRNA not to equal the total record count containing miRNA, as all duplex miRNA records will be counted twice.
  • mirna_contains (str, optional) – If provided, only miRNA with identifiers containing the provided string will be included.
  • mirna_matches (str, optional) – If provided, only miRNA with identifiers matching the provided string will be included.
  • target_contains (str, optional) – If provided, only miRNA with target identifiers containing the provided string will be included.
  • target_matches (str, optional) – If provided, only miRNA with target identifiers matching the provided string will be included.
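
The nested structure of the analysis dict (miRNA identifier, then target identifier, then count) and the effect of the filter arguments can be sketched as below. This is an illustration only: the helper takes plain identifier strings instead of a HybRecord, the identifiers are invented examples, and it assumes that "contains" means a substring test and "matches" means exact equality, which the text above does not spell out.

```python
from collections import Counter


def sketch_addto_mirna_target(mirna_id, target_id, analysis_dict, count=1,
                              mirna_contains=None, mirna_matches=None,
                              target_contains=None, target_matches=None):
    """Add one miRNA/target pair to a nested count dict (illustrative only)."""
    # Assumption: "contains" is a substring test, "matches" is exact equality.
    if mirna_contains is not None and mirna_contains not in mirna_id:
        return
    if mirna_matches is not None and mirna_matches != mirna_id:
        return
    if target_contains is not None and target_contains not in target_id:
        return
    if target_matches is not None and target_matches != target_id:
        return
    # Create the per-miRNA Counter on first sight, then add to the target's count.
    analysis_dict.setdefault(mirna_id, Counter())[target_id] += count


targets = {}
pairs = [
    ('hsa-miR-21-5p', 'TARGET_A'),
    ('hsa-miR-21-5p', 'TARGET_A'),
    ('hsa-miR-21-5p', 'TARGET_B'),
    ('hsa-miR-155-5p', 'TARGET_A'),  # excluded by the filter below
]
for mirna_id, target_id in pairs:
    sketch_addto_mirna_target(mirna_id, target_id, targets, mirna_contains='miR-21')
print(targets)
```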
hybkit.analysis.process_mirna_target(analysis_dict)

Process and sort the results of a mirna_target analysis.

Parameters:analysis_dict (dict) – Dict from target analysis (see mirna_target_dict()).
Returns:A tuple of
(ret_dict, counts, target_type_counts, total_count)
ret_dict - A copy of the analysis dict with keys sorted alphabetically, and with values of counter objects containing counts of each miRNA’s targets.
counts - A dict of miRNA with keys sorted alphabetically, and values of the total sum of the respective miRNA’s targets.
target_type_counts - A dict of miRNA with keys sorted alphabetically, and values of a dict of counts of each respective type targeted by that miRNA.
total_count - An int of the sum total of counted mirna/targets.
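
The sorting and totaling described for ret_dict, counts, and total_count can be sketched roughly as follows. The helper name is hypothetical, the input data are invented, and target_type_counts is left out because it depends on segment type information that this stand-alone example does not model.

```python
from collections import Counter


def sketch_process_mirna_target(analysis_dict):
    """Sort miRNA keys alphabetically and total their target counts."""
    ret_dict, counts = {}, {}
    # dicts preserve insertion order, so inserting in sorted order sorts the keys.
    for mirna_id in sorted(analysis_dict):
        ret_dict[mirna_id] = Counter(analysis_dict[mirna_id])
        counts[mirna_id] = sum(ret_dict[mirna_id].values())
    total_count = sum(counts.values())
    return ret_dict, counts, total_count


raw = {
    'hsa-miR-21-5p': {'TARGET_A': 2, 'TARGET_B': 1},
    'hsa-miR-155-5p': {'TARGET_A': 4},
}
ret_dict, counts, total_count = sketch_process_mirna_target(raw)
print(list(ret_dict), counts, total_count)
```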
hybkit.analysis.format_mirna_target(analysis_dict, counts=None, sep=', ', spacer_line=True)

Return the results of a mirna_target analysis in a list of delimited lines.

Parameters:
  • analysis_dict (dict) – Dict from mirna_target analysis (see mirna_target_dict()).
  • counts (dict, optional) – Dict of total miRNA counts from process_mirna_target(). If provided, adds a line for the total count of each miRNA to the output.
  • sep (str, optional) – Separator for entries within lines, such as ‘,’ or ‘\t’.
  • spacer_line (bool, optional) – If True, add a blank line between each included miRNA’s targets.
Returns:

list of string objects with a terminating newline character representing the results of the analysis.

hybkit.analysis.write_mirna_target(file_name_base, analysis_dict, counts_dict=None, target_types_count_dict=None, name=None, multi_files=False, sep=', ', file_suffix='.csv', spacer_line=True, make_plots=True, max_mirna=10)

Write the results of a mirna_target analysis to a file, and create a plot of the results.

Parameters:
  • file_name_base (str) – “Base” name for output files. Final file names will be generated based on each respective analysis type and provided parameters.
  • analysis_dict (dict) – Dict created from processing a mirna_target analysis (see mirna_target_dict(), and process_mirna_target()).
  • counts_dict (dict, optional) – Dict of total miRNA counts from process_mirna_target(). If provided, adds a line for the total count of each miRNA to the output.
  • target_types_count_dict (dict, optional) – Dict of miRNA target type counts from process_mirna_target(). If provided, enables plotting of types targeted by each miRNA.
  • name (str, optional) – String to add to title of plot indicating data source.
  • multi_files (bool, optional) – If True, output results for each miRNA in separate files. Otherwise write a single delimited file containing results.
  • sep (str, optional) – Separator for entries within lines, such as ‘,’ or ‘\t’.
  • file_suffix (str, optional) – File suffix to add to delimited files. Defaults to “.csv” corresponding to using the delimiter: “,”.
  • make_plots (bool, optional) – If True, plot results using matplotlib via hybkit.plot.mirna_target() and hybkit.plot.mirna_target_type(). Otherwise do not make plots.

miRNA Fold Analysis

hybkit.analysis.MIRNA_FOLD_DESCRIPTION = 'as_follows...'

The mirna_fold analysis evaluates the predicted binding of miRNA within hyb records that contain a miRNA and have an associated FoldRecord object as the attribute fold_record. This includes an analysis and plotting of the predicted binding by position among the provided miRNA.

Before using the analysis, the mirna_seg flag must be set for each record as can be done by sequential use of the hybkit.HybRecord.find_seg_types() and hybkit.HybRecord.mirna_analysis() methods. Note: The hybkit.HybRecord.mirna_analysis() method must be used on the record to set appropriate variables for evaluation.

The analysis dict contains the keys:
base_counts: A by-index count of whether a miRNA is predicted to be base-paired.
base_percent: A placeholder for evaluation of bases by percentage following analysis.
all_evaluated: The number of records evaluated.
all_mirna: The number of records evaluated and determined to contain miRNA.
no_mirna: The number of records evaluated and determined not to contain miRNA.
all_folds: The number of records evaluated, determined to contain mirna, and which contained a fold record.
no_folds: The number of records evaluated, determined to contain mirna, and which did not contain a fold record.
hybkit.analysis.mirna_fold_dict()

Create a dictionary with keys for running mirna_fold analyses.

Returns:Dict object for fold analyses:
{
 'base_counts': Counter({i: 0 for i in range(1, 28)}),
 'base_percent': None,
 'all_evaluated': 0,
 'all_mirna': 0,
 'no_mirna': 0,
 'all_folds': 0,
 'no_folds': 0,
}
hybkit.analysis.combine_mirna_fold_dicts(analysis_dicts)

Combine a list/tuple of dictionaries created from running mirna_fold analyses.

Parameters:analysis_dicts (list or tuple) – Iterable of dict objects from the mirna_fold analysis.
Returns:Combined dict object with keys as in mirna_fold_dict() method.
hybkit.analysis.addto_mirna_fold(record, analysis_dict, count_mode='record', allow_duplexes=False, skip_no_fold_record=False)

Add the information from a HybRecord to a mirna_fold analysis. If the record contains a single miRNA, the miRNA and target are identified. The count for this miRNA and its target is then added to the dict.

This method is designed to perform the analysis during a single reading of a HybFile so as to minimize memory and time usage. The dict object provided as analysis_dict is modified in-place.

Parameters:
  • record (HybRecord) – Record with information to add.
  • analysis_dict (dict) – Dict for mirna_fold analysis created with mirna_fold_dict().
  • count_mode (str, optional) – Indicates how entries in record should be counted. Options are one of: {‘read’, ‘record’}. See hybkit.HybRecord.count() for further details.
  • allow_duplexes (bool, optional) – If True, include miRNA-miRNA duplexes in the analysis, considering the 5p miRNA as the “miRNA” in the hybrid. Duplexes will otherwise be ignored.
  • skip_no_fold_record (bool, optional) – If True, hybkit.HybRecord records that do not contain a FoldRecord object stored in the fold_record attribute will be added to the ‘no_folds’ count. Otherwise an error will be raised.
hybkit.analysis.process_mirna_fold(analysis_dict)

Process the results of a mirna_fold analysis. Add the ‘base_fractions’ key.

Parameters:analysis_dict (dict) – Dict from mirna_fold analysis (see mirna_fold_dict()).
Returns:The original analysis_dict with an added ‘base_fractions’ key.
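
One plausible way to derive per-position fractions from the by-index counts is sketched below. The choice of divisor (the all_folds count) is an assumption made for this example, as is the helper name; the text above only states that a ‘base_fractions’ key is added.

```python
from collections import Counter


def sketch_base_fractions(analysis_dict):
    """Add a 'base_fractions' entry computed from 'base_counts' (illustrative only)."""
    # Assumption: fractions are taken relative to the number of folds evaluated.
    total = analysis_dict['all_folds']
    fractions = {}
    for index, count in analysis_dict['base_counts'].items():
        # Guard against division by zero when no folds were evaluated.
        fractions[index] = count / total if total else 0.0
    analysis_dict['base_fractions'] = fractions
    return analysis_dict


fold_dict = {
    'base_counts': Counter({i: 0 for i in range(1, 28)}),
    'all_folds': 4,
}
fold_dict['base_counts'][2] = 3
fold_dict['base_counts'][3] = 4
sketch_base_fractions(fold_dict)
print(fold_dict['base_fractions'][2], fold_dict['base_fractions'][3])
```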
hybkit.analysis.format_mirna_fold(analysis_dict, sep=', ')

Return the results of a mirna_fold analysis in a list of delimited lines.

Parameters:
  • analysis_dict (dict) – Dict from mirna_fold analysis (see mirna_fold_dict()).
  • sep (str, optional) – Separator for entries within lines, such as ‘,’ or ‘\t’.
Returns:

list of string objects with a terminating newline character representing the results of the analysis.

hybkit.analysis.write_mirna_fold(file_name_base, analysis_dict, name=None, multi_files=False, sep=', ', file_suffix='.csv', spacer_line=True, make_plots=True)

Write the results of a mirna_fold analysis to a file, and create a plot of the results.

Parameters:
  • file_name_base (str) – “Base” name for output files. Final file names will be generated based on each respective analysis type and provided parameters.
  • analysis_dict (dict) – Dict created from processing a mirna_fold analysis (see mirna_fold_dict(), and process_mirna_fold()).
  • name (str, optional) – String to add to title of plot indicating data source.
  • multi_files (bool, optional) – If True, output results for the base counts and base fractions in separate files. Otherwise write a single delimited file containing results.
  • sep (str, optional) – Separator for entries within lines, such as ‘,’ or ‘\t’.
  • file_suffix (str, optional) – File suffix to add to delimited files. Defaults to “.csv” corresponding to using the delimiter: “,”.
  • spacer_line (bool, optional) – If True, add a blank line between the base counts and base fractions entries with combined outputting.
  • make_plots (bool, optional) – If True, plot results using matplotlib via hybkit.plot.mirna_fold(). Otherwise do not make plots.