hybkit (module)

This module contains classes and methods for reading, writing, and manipulating data in the “.hyb” genomic sequence format.

This includes two classes for storage of chimeric sequence information and associated fold-information:

HybRecord Class for storage of records in “.hyb” format
FoldRecord Class for storage of predicted folding information for hyb chimeric sequence reads

It also includes classes for reading, writing, and iterating over files containing that information:

HybFile Class for reading and writing “.hyb”-format files containing chimeric genomic sequence information.
ViennaFile Class for reading and writing “.vienna”-format files containing predicted folding information for a hyb sequence
ViennadFile Class for reading and writing “.viennad”-format files containing predicted folding information for a hyb sequence
CtFile Class for reading “.ct”-format files containing predicted folding information for a hyb sequence
HybFoldIter Class for simultaneous iteration over a HybFile and a ViennaFile, ViennadFile, or CtFile.

HybRecord Class

class hybkit.HybRecord(id, seq, energy=None, seg1_info={}, seg2_info={}, flags={}, read_count=None, fold_record=None, find_seg_types=False, mirna_analysis=False, target_region_analysis=False)

Class for storing and analyzing chimeric hybrid genomics reads in “.hyb” format.

The hyb format is a GFF-related file format described by Travis et al. (see References). Each entry is a single line that contains information about a genomic sequence read identified as a chimera by analysis software. The line contains 15 or 16 columns separated by tabs (“\t”) and provides information on each of the identified components. An example .hyb format line, courtesy of Gay et al. (see References):

2407_718        ATCACATTGCCAGGGATTTCCAATCCCCAACAATGTGAAAACGGCTGTC       .       MIMAT0000078_MirBase_miR-23a_microRNA   1       21      1       21      0.0027  ENSG00000188229_ENST00000340384_TUBB2C_mRNA     23      49      1181    1207    1.2e-06

These columns are respectively described in hybkit as:

id, seq, energy, [seg1_]ref, [seg1_]read_start, [seg1_]read_end, [seg1_]ref_start, [seg1_]ref_end, [seg1_]score, [seg2_]ref, [seg2_]read_start, [seg2_]read_end, [seg2_]ref_start, [seg2_]ref_end, [seg2_]score, [flag1=val1;flag2=val2;flag3=val3…]
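
As a rough illustration of this column layout, the standalone sketch below splits one hyb-format line into the three hybrid columns, two segment dicts, and an optional flags dict. It does not use hybkit: the helper name split_hyb_line is hypothetical, and the int/float conversions follow the types listed under Variables rather than hybkit's actual parsing code.

```python
HYBRID_COLUMNS = ['id', 'seq', 'energy']
SEGMENT_COLUMNS = ['ref', 'read_start', 'read_end', 'ref_start', 'ref_end', 'score']


def split_hyb_line(line, placeholder='.'):
    """Split one hyb-format line into (hybrid, seg1_info, seg2_info, flags)."""
    fields = line.rstrip('\n').split('\t')
    if len(fields) not in (15, 16):
        raise ValueError('expected 15 or 16 columns, got %d' % len(fields))

    def clean(value):
        # Treat the placeholder character as a missing value.
        return None if value == placeholder else value

    hybrid = {key: clean(val) for key, val in zip(HYBRID_COLUMNS, fields[0:3])}
    segments = []
    for start in (3, 9):  # columns 4-9 and 10-15 (0-based offsets 3 and 9)
        seg = {key: clean(val)
               for key, val in zip(SEGMENT_COLUMNS, fields[start:start + 6])}
        for key in ('read_start', 'read_end', 'ref_start', 'ref_end'):
            if seg[key] is not None:
                seg[key] = int(seg[key])
        if seg['score'] is not None:
            seg['score'] = float(seg['score'])
        segments.append(seg)

    flags = {}
    if len(fields) == 16:  # optional flag column: "key=val;key=val;"
        for item in fields[15].split(';'):
            item = item.strip()
            if item:
                key, _, val = item.partition('=')
                flags[key.strip()] = val.strip()
    return hybrid, segments[0], segments[1], flags


example = '\t'.join([
    '2407_718', 'ATCACATTGCCAGGGATTTCCAATCCCCAACAATGTGAAAACGGCTGTC', '.',
    'MIMAT0000078_MirBase_miR-23a_microRNA', '1', '21', '1', '21', '0.0027',
    'ENSG00000188229_ENST00000340384_TUBB2C_mRNA', '23', '49', '1181', '1207',
    '1.2e-06',
])
hybrid, seg1_info, seg2_info, flags = split_hyb_line(example)
print(hybrid['id'], seg1_info['ref'], seg2_info['read_start'])
```

In practice HybRecord.from_line() should be preferred; the sketch is only meant to make the column order concrete.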

The minimum data required for a HybRecord object is the genomic sequence and its corresponding identifier.

Examples

hyb_record_1 = hybkit.HybRecord('1_100', 'ACTG')
hyb_record_2 = hybkit.HybRecord('2_107', 'CTAG', '-7.3')

Details about segments are provided via dict objects with the keys specific to each segment. Data can be provided either as strings or as floats/integers (where relevant). For example, to create a HybRecord object representing the example line given above:

seg1_info = {'ref': 'MIMAT0000078_MirBase_miR-23a_microRNA',
             'read_start': '1',
             'read_end': '21',
             'ref_start': '1',
             'ref_end': '21',
             'score': '0.0027'}
seg2_info = {'ref': 'ENSG00000188229_ENST00000340384_TUBB2C_mRNA',
             'read_start': 23,
             'read_end': 49,
             'ref_start': 1181,
             'ref_end': 1207,
             'score': 1.2e-06}
seq_id = '2407_718'
seq = 'ATCACATTGCCAGGGATTTCCAATCCCCAACAATGTGAAAACGGCTGTC'
energy = None

hyb_record = hybkit.HybRecord(seq_id, seq, energy, seg1_info, seg2_info)
# OR
hyb_record = hybkit.HybRecord(seq_id, seq, seg1_info=seg1_info, seg2_info=seg2_info)

However, the preferred method for reading hyb records from lines is the HybRecord.from_line() constructor:

# line = "2407_718      ATC..."
hyb_record = hybkit.HybRecord.from_line(line)

This constructor allows convenient reading of “.hyb” files using the HybFile wrapper class described below. For example, to print all hybrid identifiers in a “.hyb” file:

with hybkit.HybFile('path/to/file.hyb', 'r') as hyb_file:
    for hyb_record in hyb_file:
        print(hyb_record.id)
Parameters:
  • id (str) – Identifier for the hyb record
  • seq (str) – Nucleotide sequence of the hyb record
  • energy (str, optional) – Predicted energy of record folding in kcal/mol
  • seg1_info (dict, optional) – Information on segment 1 of the record, with possible keys: (‘ref’, ‘read_start’, ‘read_end’, ‘ref_start’, ‘ref_end’, ‘score’)
  • seg2_info (dict, optional) – Information on segment 2 of the record, with possible keys: (‘ref’, ‘read_start’, ‘read_end’, ‘ref_start’, ‘ref_end’, ‘score’)
  • flags (dict, optional) – Dict with keys of flags for the record and their associated values. By default flags must be defined in ALL_FLAGS but custom allowed flags can be added via set_custom_flags(). This setting can also be disabled by setting ‘allow_undefined_flags’ to True in settings.
  • fold_record (FoldRecord, optional) – Set the record’s fold_record attribute as the provided FoldRecord object using set_fold_record() on initialization.
  • find_seg_types (bool, optional) – Perform find_seg_types() analysis on record initialization.
  • mirna_analysis (bool, optional) – Perform mirna_analysis() on record initialization.
  • target_region_analysis (bool, optional) – Perform target_region_analysis() on record initialization.
Variables:
  • id (str) – Identifier for the hyb record (often “<read-num>_<read-count>”)
  • seq (str) – Nucleotide sequence of the hyb record
  • energy (str or None) – Predicted energy of folding
  • seg1_info (dict) – Information on segment 1, contains keys: ‘ref’ (str), ‘read_start’ (int), ‘read_end’ (int), ‘ref_start’ (int), ‘ref_end’ (int), and ‘score’ (float).
  • seg2_info (dict) – Information on segment 2, contains keys: ‘ref’ (str), ‘read_start’ (int), ‘read_end’ (int), ‘ref_start’ (int), ‘ref_end’ (int), and ‘score’ (float).
  • flags (dict) – Dict of flags with possible keys and values as defined in the Flags section of the hybkit Specification.
  • mirna_details (dict or None) – Dict of details on hybrid characteristics related to miRNA filled during mirna_analysis().
  • mirna_info (dict or None) – Link to appropriate seg1_info or seg2_info dict corresponding to a record’s miRNA (if present), assigned during mirna_analysis().
  • target_info (dict or None) – Link to appropriate seg1_info or seg2_info dict corresponding to a record’s target of a miRNA (if present), assigned during mirna_analysis().
  • fold_record (FoldRecord) – Information on the predicted folding of this hybrid sequence, set by set_fold_record().
  • fold_seq_match (bool or None) – Set to True if the sequence contained within a fold record exactly matches the sequence in (this) HybRecord, when the fold_record attribute is set via set_fold_record().
HYBRID_COLUMNS = ['id', 'seq', 'energy']

Record columns 1-3 defining parameters of the overall hybrid, defined by the Hyb format

SEGMENT_COLUMNS = ['ref', 'read_start', 'read_end', 'ref_start', 'ref_end', 'score']

Record columns 4-9 and 10-15, respectively, defining parameters of each respective segment mapping, defined by the Hyb format

ALL_FLAGS = ['count_total', 'count_last_clustering', 'two_way_merged', 'seq_IDs_in_cluster', 'read_count', 'orient', 'seg1_type', 'seg2_type', 'seg1_det', 'seg2_det', 'miRNA_seg', 'target_reg', 'ext', 'source']

Flags defined by the hybkit package. Flags 1-4 are utilized by the Hyb software package. For information on flags, see the Flags portion of the hybkit Specification.

MIRNA_TYPES = {'miRNA', 'microRNA'}

Default miRNA types for use in mirna_analysis().

CODING_TYPES = {'mRNA'}

Default coding sequence types for use in the target_region_analysis().

DEFAULTS = {'allow_fold_record_mismatch': True, 'allow_undefined_flags': False, 'allow_unknown_regions': False, 'allow_unknown_seg_types': False, 'check_complete': False, 'check_complete_seg_types': False, 'placeholder': '.', 'reorder_flags': True, 'warn_fold_record_mismatch': False, 'warn_unknown_regions': True}

Class-level default settings, copied into settings at runtime.

find_type_params = {}

Dict of information required for use by find_seg_types(). Set with the specific information required by the selected method for use by default. Currently supported parameter constructors: make_string_match_parameters() and make_seg_type_id_map().

target_region_info = {}

Dict of information required for use by target_region_analysis(). Make dict using make_region_info() then set via set_region_info(), or do both simultaneously with make_set_region_info().

settings = {'allow_fold_record_mismatch': True, 'allow_undefined_flags': False, 'allow_unknown_regions': False, 'allow_unknown_seg_types': False, 'check_complete': False, 'check_complete_seg_types': False, 'placeholder': '.', 'reorder_flags': True, 'warn_fold_record_mismatch': False, 'warn_unknown_regions': True}

Modifiable settings during usage. Copied at runtime from DEFAULTS.

seg1_id()

Return a copy of the id for segment 1 (5p), or None if not defined.

Specifically returns the ‘ref’ key of the seg1_info attribute.

seg2_id()

Return a copy of the id for segment 2 (3p), or None if not defined.

Specifically returns the ‘ref’ key of the seg2_info attribute.

seg_ids()

Return a tuple of the ids of segment 1 (5p) and segment 2 (3p), or a tuple of None.

set_flag(flag_key, flag_val, allow_undefined_flags=None)

Set the value of flag_key in self.flags to flag_val.

Parameters:
  • flag_key (str) – Key for flag to set.
  • flag_val (any) – Value for flag to set.
  • allow_undefined_flags (bool or None, optional) – Allow inclusion of flags not defined in ALL_FLAGS or by set_custom_flags(). If None, uses setting in settings : allow_undefined_flags.
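
The flag check described for set_flag() can be sketched as follows. This is a minimal standalone illustration, not hybkit's code: the function operates on a plain dict, the custom_flags argument stands in for whatever set_custom_flags() stores on the class, and raising ValueError is an assumption about the error type.

```python
ALL_FLAGS = ['count_total', 'count_last_clustering', 'two_way_merged',
             'seq_IDs_in_cluster', 'read_count', 'orient', 'seg1_type',
             'seg2_type', 'seg1_det', 'seg2_det', 'miRNA_seg', 'target_reg',
             'ext', 'source']


def set_flag(flags, flag_key, flag_val, custom_flags=(),
             allow_undefined_flags=False):
    """Store flag_val under flag_key, rejecting flags that are not allowed."""
    known = flag_key in ALL_FLAGS or flag_key in custom_flags
    if not known and not allow_undefined_flags:
        # Assumption: the real package may raise a different exception type.
        raise ValueError('flag %r is not defined' % flag_key)
    flags[flag_key] = flag_val


flags = {}
set_flag(flags, 'seg1_type', 'microRNA')                   # defined in ALL_FLAGS
set_flag(flags, 'my_flag', 'x', custom_flags=['my_flag'])  # allowed custom flag
set_flag(flags, 'other', 1, allow_undefined_flags=True)    # check disabled
print(sorted(flags))
```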
seg1_type(require=False)

Return the “seg1_type” flag if defined, or return None.

Parameters:require (bool, optional) – If True, raise an error if seg1_type is not defined.
seg2_type(require=False)

Return the “seg2_type” flag if defined, or return None.

Parameters:require (bool, optional) – If True, raise an error if seg2_type is not defined.
seg_types(require=False)

Return a tuple of the (“seg1_type”, “seg2_type”) flags where defined, or None.

Parameters:require (bool, optional) – If True, raise an error if either flag is not defined.
seg_types_sorted(require=False)

Return a sorted tuple of the (“seg1_type”, “seg2_type”) flags where defined, or None.

Parameters:require (bool, optional) – If True, raise an error if either flag is not defined.
set_seg1_type(seg1_type)

Set the “seg1_type” flag in flags.

set_seg2_type(seg2_type)

Set the “seg2_type” flag in flags.

set_seg_types(seg1_type, seg2_type)

Set the “seg1_type” and “seg2_type” flags in flags.

read_count(require=False, as_int=False)

Return the “read_count” flag if defined, otherwise return None.

Parameters:
  • require (bool, optional) – If True, raise an error if the “read_count” flag is not defined.
  • as_int (bool, optional) – If True, return the value as an int (instead of str).
set_read_count(read_count)

Set the “read_count” flag in flags as a str.

record_count(require=False, as_int=False)

If the “count_total” flag is defined, return it, otherwise return ‘1’ (this record).

Parameters:
  • require (bool, optional) – If True, raise an error if the “count_total” flag is not defined.
  • as_int (bool, optional) – If True, return the value as an int (instead of str).
count(count_mode, as_int=False)

Return either the read_count() or record_count().

Parameters:
  • count_mode (str) – Mode for returned count: one of : {‘read’, ‘record’} If ‘read’, require the ‘read_count’ flag to be defined. If ‘record’, return ‘1’ if the ‘count_total’ flag is not defined.
  • as_int (bool, optional) – If True, return the value as an int (instead of str).
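
The relationship between read_count(), record_count(), and count() can be sketched on a plain flags dict. This is an illustrative approximation that follows only the descriptions above: the functions here take the flags dict as an argument instead of being methods, record_count() omits the require argument, and the exception types are assumptions.

```python
def read_count(flags, require=False, as_int=False):
    """Return the 'read_count' flag, or None if it is not defined."""
    value = flags.get('read_count')
    if value is None:
        if require:
            raise KeyError('read_count flag is not defined')
        return None
    return int(value) if as_int else str(value)


def record_count(flags, as_int=False):
    """Return the 'count_total' flag, or '1' (this record) if not defined."""
    value = flags.get('count_total', '1')
    return int(value) if as_int else str(value)


def count(flags, count_mode, as_int=False):
    """Dispatch to read_count() or record_count() based on count_mode."""
    if count_mode == 'read':
        return read_count(flags, require=True, as_int=as_int)
    if count_mode == 'record':
        return record_count(flags, as_int=as_int)
    raise ValueError("count_mode must be 'read' or 'record'")


flags = {'read_count': '718'}
print(count(flags, 'read', as_int=True), count(flags, 'record'))
```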
find_seg_types(allow_unknown=None, check_complete=None)

Find the types of each segment using the method currently set for the class.

This method sequentially provides each segment's info dict (seg1_info, then seg2_info) to the method set as find_type_method() by select_find_type_method() (or by set_find_type_method() for custom methods). The default supplied method is find_seg_type_hyb().

Parameters:
  • allow_unknown (bool, optional) – If True, allow segment types that cannot be identified and set them as “unknown”. Otherwise raise an error. If None, uses setting in settings : allow_unknown_seg_types.
  • check_complete (bool, optional) – If True, check every possibility for the type of a given segment, instead of stopping after finding the first type. If None, uses setting in settings : check_complete_seg_types.
check_fold_seq_match(fold_record)

Return True if the record's seq value matches fold_record.seq.

set_fold_record(fold_record, allow_fold_record_mismatch=None, warn_fold_record_mismatch=None)

Check and set provided fold_record (FoldRecord) as fold_record.

Check to ensure that fold_record argument is an instance of FoldRecord, and that it has a matching sequence to this HybRecord, then set it as self.fold_record. Set fold_seq_match as True if sequence matches, otherwise set as False.

Parameters:
  • fold_record (FoldRecord) – FoldRecord instance to set as fold_record.
  • allow_fold_record_mismatch (bool, optional) – Allow mismatches between HybRecord sequence and the FoldRecord sequence. If None, uses setting in settings : allow_fold_record_mismatch.
  • warn_fold_record_mismatch (bool, optional) – Warn for mismatches between HybRecord sequence and the FoldRecord sequence. If None, uses setting in settings : warn_fold_record_mismatch.
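
How the allow_fold_record_mismatch and warn_fold_record_mismatch options might interact can be sketched as below. This is a standalone approximation working on two sequence strings; the function name check_fold_match, the use of the warnings module, and the ValueError are assumptions rather than hybkit behavior.

```python
import warnings


def check_fold_match(hyb_seq, fold_seq, allow_mismatch=True,
                     warn_mismatch=False):
    """Return True if the sequences match exactly, otherwise False.

    A mismatch raises an error unless allow_mismatch is True, and
    emits a warning when warn_mismatch is True.
    """
    match = (hyb_seq == fold_seq)
    if not match:
        message = 'fold sequence %r does not match %r' % (fold_seq, hyb_seq)
        if not allow_mismatch:
            raise ValueError(message)
        if warn_mismatch:
            warnings.warn(message)
    return match


fold_seq_match = check_fold_match('ACTG', 'ACTG')
mismatch_allowed = check_fold_match('ACTG', 'ACTT')  # allowed, returns False
print(fold_seq_match, mismatch_allowed)
```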
mirna_analysis(mirna_types=None)

Analyze and store miRNA properties from other properties in the hyb record.

Perform an analysis of miRNA properties within the sequence record, set the miRNA_seg flag, and also store the results in the mirna_details dict. This analysis requires the seg1_type and seg2_type flags to be populated, which can be done with the find_seg_types() method.

Parameters:mirna_types (list, tuple, or set, optional) – Iterable of strings of “types” to be considered as miRNA. Otherwise, the default types are used from MIRNA_TYPES.
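
The core decision made by this analysis can be sketched as a comparison of each segment type against the miRNA types. The sketch below is a simplified standalone illustration: the values “5p” and “3p” come from the description of the miRNA_seg flag in this documentation, while the values used here for a miRNA dimer ('B') and for no miRNA ('N') are placeholders chosen for the example and should be checked against the hybkit Specification.

```python
MIRNA_TYPES = {'miRNA', 'microRNA'}


def find_mirna_seg(seg1_type, seg2_type, mirna_types=MIRNA_TYPES):
    """Return which segment(s) of a hybrid are of a miRNA type."""
    seg1_is_mirna = seg1_type in mirna_types
    seg2_is_mirna = seg2_type in mirna_types
    if seg1_is_mirna and seg2_is_mirna:
        return 'B'   # placeholder label for a miRNA dimer (assumption)
    if seg1_is_mirna:
        return '5p'  # miRNA is segment 1 (5p)
    if seg2_is_mirna:
        return '3p'  # miRNA is segment 2 (3p)
    return 'N'       # placeholder label for "no miRNA" (assumption)


print(find_mirna_seg('microRNA', 'mRNA'))
```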
target_region_analysis(region_info=None, coding_types=None, allow_unknown_regions=None, warn_unknown_regions=None)

For miRNA/coding-target pairs, find the region of the coding transcript targeted.

If the record contains an identified miRNA and an identified coding target, find the region in which the targeted sequence resides and store the results in the target_reg flag and the mirna_details dict. This analysis requires the seg1_type, seg2_type, and miRNA_seg flags to be populated, which can be done by sequentially using the find_seg_types() and mirna_analysis() methods. If the miRNA_seg flag is “3p” or “5p” (the record contains a miRNA but is not a miRNA dimer), the target is checked to see whether it is a coding type. If the target is a coding type, the analysis is performed and the target_reg flag is set. The analysis requires a dict containing region information, which can be made using the make_region_info() method. This dict should have transcript identifiers as keys, with each value a dict containing that transcript's region information.

Example

region_info = {'ENST00000372098': {'cdna_coding_start':'45340255',
                                   'cdna_coding_end':'45340388'}}

This can then either be provided directly to the “region_info” argument of this method, or be set on the class via set_region_info(). Constructing the dict and setting it on the class can be done together using make_set_region_info().

Parameters:
  • region_info (dict, optional) – Transcript coding/utr region information. If None, uses class: target_region_info dict.
  • coding_types (iterable, optional) – Iterable of strings representing sequence types to be recognized as coding. If None, uses CODING_TYPES.
  • allow_unknown_regions (bool, optional) – Allow missing identifiers in analysis by skipping sequences instead of raising an error. If None, uses setting in settings : allow_unknown_regions.
  • warn_unknown_regions (bool, optional) – Warn for missing identifiers in analysis by printing a message. If None, uses setting in settings : warn_unknown_regions.
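
One way to picture the region lookup is the standalone sketch below, which compares a target segment's reference coordinates with a transcript's coding start and end. The boundary rule (a segment is assigned to a region only when it lies wholly inside it, otherwise it is called a junction), the returned labels, and the transcript id 'TX1' are all assumptions made for illustration and may differ from hybkit's rules.

```python
def classify_target_region(ref_start, ref_end, coding_start, coding_end):
    """Classify a target span relative to the coding region of a transcript."""
    ref_start, ref_end = int(ref_start), int(ref_end)
    coding_start, coding_end = int(coding_start), int(coding_end)
    if ref_end < coding_start:
        return '5pUTR'
    if ref_start > coding_end:
        return '3pUTR'
    if ref_start >= coding_start and ref_end <= coding_end:
        return 'coding'
    return 'junction'  # span overlaps a boundary of the coding region


def target_region(target_info, transcript_id, region_info,
                  allow_unknown_regions=False):
    """Look up a transcript in region_info and classify the target span."""
    if transcript_id not in region_info:
        if allow_unknown_regions:
            return None  # skip transcripts with no region information
        raise KeyError(transcript_id)
    regions = region_info[transcript_id]
    return classify_target_region(target_info['ref_start'],
                                  target_info['ref_end'],
                                  regions['cdna_coding_start'],
                                  regions['cdna_coding_end'])


# Hypothetical transcript id and coordinates, values stored as strings.
region_info = {'TX1': {'cdna_coding_start': '100', 'cdna_coding_end': '400'}}
print(target_region({'ref_start': 150, 'ref_end': 180}, 'TX1', region_info))
```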
has_property(prop_type, prop_compare=None, allow_unknown=False)

Check if HybRecord has property of “prop_type”, with detail “prop_compare”.

Check property against list of allowed properties in PROPERTIES. If query property has a comparator, provide this in prop_compare.

Parameters:allow_unknown (bool, optional) – If True, allow undefined properties to be checked and provide return value False.

Examples

General Record Properties:

# hyb_record = hybkit.HybRecord(id, seq....)
is_id = hyb_record.has_property('id', 'target_identifier')
seq_is_ATCG = hyb_record.has_property('seq', 'ATCG')
seq_endswith_ATCG = hyb_record.has_property('seq_suffix', 'ATCG')

Record Type Properties:

# hyb_record = hybkit.HybRecord(id, seq....)
has_seg_types = hyb_record.has_property('has_seg_types')  # -> False
hyb_record.find_seg_types()
has_seg_types = hyb_record.has_property('has_seg_types')  # -> True
# Requires Type Analysis
is_5p_mrna = hyb_record.has_property('seg1_type', 'mRNA')
has_mRNA = hyb_record.has_property('seg_type_contains', 'mRNA')

miRNA Properties:

# hyb_record = hybkit.HybRecord(id, seq....)
# hyb_record.find_seg_types()
mirna_analyzed = hyb_record.has_property('has_mirna_seg')  # -> False
hyb_record.mirna_analysis()
mirna_analyzed = hyb_record.has_property('has_mirna_seg')  # -> True
# Requires mirna analysis
has_mirna = hyb_record.has_property('has_mirna')  # Requires miRNA Analysis
has_5p_mirna = hyb_record.has_property('5p_mirna')

Target Region Properties:

# hyb_record = hybkit.HybRecord(id, seq....)
# hyb_record.find_seg_types()
# hyb_record.mirna_analysis()
targets_analyzed = hyb_record.has_property('has_target_reg')  # -> False
hyb_record.target_region_analysis()
targets_analyzed = hyb_record.has_property('has_target_reg')  # -> True
has_coding_target = hyb_record.has_property('target_coding')
prop(prop_type, prop_compare=None, allow_unknown=False)

Convenience Method for has_property()

to_line(newline=False, sep='\t')

Return a hyb-format string representation of the Hyb record.

Parameters:
  • newline (bool, optional) – If True, end the returned string with a newline.
  • sep (str, optional) – Default: “\t”, Provide a different separator (like “,”) for separation of columns.
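
The reverse direction, writing a record back out as a line, can be sketched with plain dicts. This is a standalone illustration rather than hybkit's to_line(): missing values are written with the “.” placeholder from settings, and the “key=val;” layout of the flag column is an assumption based on the column description given for HybRecord.

```python
HYBRID_COLUMNS = ['id', 'seq', 'energy']
SEGMENT_COLUMNS = ['ref', 'read_start', 'read_end', 'ref_start', 'ref_end', 'score']


def to_line(hybrid, seg1_info, seg2_info, flags=None, newline=False,
            sep='\t', placeholder='.'):
    """Join record fields into one delimited line, filling gaps with placeholder."""
    def show(value):
        return placeholder if value is None else str(value)

    items = [show(hybrid.get(key)) for key in HYBRID_COLUMNS]
    for seg in (seg1_info, seg2_info):
        items.extend(show(seg.get(key)) for key in SEGMENT_COLUMNS)
    if flags:
        # Assumed flag layout: each flag written as "key=val;".
        items.append(''.join('%s=%s;' % (key, val) for key, val in flags.items()))
    line = sep.join(items)
    return line + '\n' if newline else line


minimal = to_line({'id': '1_100', 'seq': 'ACTG'}, {}, {})
print(minimal.split('\t'))
```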
classmethod set_find_type_method(find_method, find_params={})

Set the method for use to find seg types with find_seg_types().

This method is for providing a custom function. To use the included functions, use select_find_type_method(). Functions provided to this method must have the signature:

seg_type = custom_method(self, seg_info, find_params)

This method should return the string of the assigned segment type if found, or a None object if the type cannot be found. It can also take a dictionary in the “find_params” argument that specifies additional or dynamic search properties, as desired.

Parameters:
classmethod select_find_type_method(find_method_name, find_params={})

Select method to use with find_seg_types().

Available methods are listed in find_type_methods.

Parameters:
  • find_method_name (str) – Method option from find_type_methods to select for use by the find_seg_types() method.
  • find_params (dict, optional) – Dict object of parameters to use by selected method.
classmethod make_region_info(region_csv_name, sep=', ')

Return dict with information on coding transcript UTR regions from an input csv.

The input csv must contain a header line, and must have the columns:

identifier,cdna_coding_start,cdna_coding_end

Example

Example return dict object:

region_info = {'ENST00000372098': {'cdna_coding_start':'45340255',
                                   'cdna_coding_end':'45340388'}}

The return dict can then be passed to set_region_info() or supplied directly to the target_region_analysis() method.

Parameters:
  • region_csv_name (str) – String of path to csv file to read information from.
  • sep (str, optional) – Separator for columns of input delimited file. (Default: ‘,’)
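
Reading such a csv into the dict layout shown above can be sketched with the standard csv module. This is a hypothetical standalone helper (read_region_info), not hybkit's make_region_info(); it reads from an open file handle so the example can run on an in-memory string, and it keeps the coordinate values as strings to match the example dict.

```python
import csv
import io


def read_region_info(handle, sep=','):
    """Read identifier/coding-start/coding-end rows into a region_info dict."""
    reader = csv.DictReader(handle, delimiter=sep)
    required = ('identifier', 'cdna_coding_start', 'cdna_coding_end')
    missing = [col for col in required if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError('missing required columns: %s' % ', '.join(missing))
    region_info = {}
    for row in reader:
        region_info[row['identifier']] = {
            'cdna_coding_start': row['cdna_coding_start'],
            'cdna_coding_end': row['cdna_coding_end'],
        }
    return region_info


csv_text = ('identifier,cdna_coding_start,cdna_coding_end\n'
            'ENST00000372098,45340255,45340388\n')
region_info = read_region_info(io.StringIO(csv_text))
print(region_info)
```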
classmethod set_region_info(region_info_dict)

Set region_info_dict with information on coding transcript UTR regions.

This dict must have transcript identifiers as keys, with each value a dict containing the keys: cdna_coding_start, cdna_coding_end

Example

region_info = {'ENST00000372098': {'cdna_coding_start':'45340255',
                                   'cdna_coding_end':'45340388'}}
Parameters:region_info_dict (dict) – Dict of region information to set as target_region_info.
classmethod make_set_region_info(region_csv_name, sep=', ')

Convenience wrapper for calling make_region_info() then set_region_info().

Parameters:
  • region_csv_name (str) – String of path to csv file to read information from.
  • sep (str, optional) – Separator for columns of input delimited file. (Default: ‘,’)
classmethod set_custom_flags(custom_flags)

Set the custom flags allowed by instances of HybRecord.

Parameters:custom_flags (iterable) – List or tuple of flags to allow.
classmethod list_custom_flags()

List the class-level allowed custom flags.

classmethod from_line(line, hybformat_id=False, hybformat_ref=False)

Construct a HybRecord instance from a line in “.hyb” format.

The Hyb Software Package contains further information in the “id” field of the line that can be used to infer read counts represented by the hyb record. Additionally, the Hyb Software Package also utilizes a database by default that contains further information in the names of each respective reference sequence.

Parameters:
  • line (str) – Hyb-format line containing record information.
  • hybformat_id (bool, optional) – Read count information from identifier in “<id>_<count>” format. (Default: False)
  • hybformat_ref (bool, optional) – Read additional record information from identifier in “<gene_id>_<transcript_id>_<gene_name>_<seg_type>” format. (Default: False)
Returns:

HybRecord instance containing record information.
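
The “<id>_<count>” convention read by the hybformat_id option can be illustrated with a small standalone helper. The function name read_count_from_id and the choice to split on the last underscore are assumptions for this sketch, not a description of how from_line() is implemented.

```python
def read_count_from_id(record_id):
    """Split an '<id>_<count>' identifier into (read_id, read_count)."""
    read_id, sep, count = record_id.rpartition('_')
    if not sep or not read_id or not count.isdigit():
        raise ValueError('identifier %r is not in <id>_<count> format' % record_id)
    return read_id, int(count)


read_id, read_count = read_count_from_id('2407_718')
print(read_id, read_count)
```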

static find_seg_type_hyb(seg_info, find_type_params={}, check_complete=False)

Return the type of the provided segment, or None if segment cannot be identified.

This method works with sequence / alignment mapping identifiers in the format of the reference database provided by the Hyb Software Package, specifically identifiers of the format:

<gene_id>_<transcript_id>_<gene_name>_<seg_type>

This method returns the fourth component of the identifier, split by “_”, as the identified sequence type.

Example

"MIMAT0000076_MirBase_miR-21_microRNA"  --->  "microRNA".
Parameters:
  • seg_info (dict) – Segment information dict (seg1_info or seg2_info) from a HybRecord.
  • find_type_params (dict, optional) – Unused in this method.
  • check_complete (bool, optional) – Unused in this method.
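
The identifier split described above can be sketched in a few lines. This is a standalone approximation that follows the written description (fourth “_”-separated component); the function name seg_type_from_hyb_ref is hypothetical, and returning None for identifiers with fewer than four components is an assumption.

```python
def seg_type_from_hyb_ref(seg_info):
    """Return the fourth '_'-separated part of seg_info['ref'], or None."""
    ref = seg_info.get('ref')
    if ref is None:
        return None
    parts = ref.split('_')
    if len(parts) < 4:
        return None  # identifier is not in the four-part Hyb database format
    return parts[3]


seg_type = seg_type_from_hyb_ref({'ref': 'MIMAT0000076_MirBase_miR-21_microRNA'})
print(seg_type)
```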
static find_seg_type_string_match(seg_info, find_type_params={}, check_complete=False)

Return the type of the provided segment, or None if the segment cannot be identified.

This method attempts to find a string matching a specific pattern within the identifier of the aligned segment. Search options include “prefix”, “contains”, “suffix”, and “matches”. The required find_type_params dict should contain a key for each desired search type, with a list of 2-tuples for each search-string with assigned-type.

Example

find_type_params = {'suffix': [('_miR', 'microRNA'),
                               ('_trans', 'mRNA')   ]}

This dict can be generated with the associated make_string_match_parameters() method and an associated csv legend file with format:

#commentline
#search_type,search_string,seg_type
suffix,_miR,microRNA
suffix,_trans,mRNA
Parameters:check_complete (bool, optional) – If true, the method will continue checking search options after an option has been found, to ensure that no options conflict (more sure method). If False, it will stop after the first match is found (faster method). (Default: False)
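
A minimal sketch of the string-matching idea, including the difference between stopping at the first match and checking every option, is given below. It is a standalone illustration operating on an identifier string: the function name match_seg_type, the order in which search types are tried, and the ValueError raised on conflicting matches are assumptions.

```python
def match_seg_type(seg_id, find_type_params, check_complete=False):
    """Return the segment type whose search string matches seg_id, or None."""
    tests = {
        'prefix': lambda text, query: text.startswith(query),
        'contains': lambda text, query: query in text,
        'suffix': lambda text, query: text.endswith(query),
        'matches': lambda text, query: text == query,
    }
    found = []
    for search_type, test in tests.items():
        for search_string, seg_type in find_type_params.get(search_type, []):
            if test(seg_id, search_string):
                if not check_complete:
                    return seg_type  # faster: stop at the first match
                found.append(seg_type)
    if len(set(found)) > 1:
        # Assumption: conflicting assignments are treated as an error.
        raise ValueError('conflicting types for %r: %s' % (seg_id, sorted(set(found))))
    return found[0] if found else None


find_type_params = {'suffix': [('_miR', 'microRNA'),
                               ('_trans', 'mRNA')]}
print(match_seg_type('gene1_trans', find_type_params))
```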
static make_string_match_parameters(legend_file='/home/docs/checkouts/readthedocs.org/user_builds/hybkit/checkouts/stable/hybkit/string_match_params.csv')

Read csv and return a dict of search parameters for find_seg_type_string_match().

The legend csv file should have the format:

#commentline
#search_type,search_string,seg_type
suffix,_miR,microRNA
suffix,_trans,mRNA

Search_type options include “prefix”, “contains”, “suffix”, and “matches”. The produced dict object contains a key for each search type, with a list of 2-tuples for each search-string and associated segment-type.

For example:

{'suffix': [('_miR', 'microRNA'),
            ('_trans', 'mRNA')   ]}
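
Turning a legend file of this shape into the dict above can be sketched as follows. The helper name read_string_match_legend is hypothetical and the function reads from an open handle (here an in-memory string) instead of a file path, so it is an approximation of the idea and not the package's own reader.

```python
import io

SEARCH_TYPES = ('prefix', 'contains', 'suffix', 'matches')


def read_string_match_legend(handle):
    """Read 'search_type,search_string,seg_type' rows into a params dict."""
    params = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comment lines
        search_type, search_string, seg_type = [
            item.strip() for item in line.split(',')]
        if search_type not in SEARCH_TYPES:
            raise ValueError('unknown search type: %r' % search_type)
        params.setdefault(search_type, []).append((search_string, seg_type))
    return params


legend_text = ('#commentline\n'
               '#search_type,search_string,seg_type\n'
               'suffix,_miR,microRNA\n'
               'suffix,_trans,mRNA\n')
params = read_string_match_legend(io.StringIO(legend_text))
print(params)
```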
static find_seg_type_from_id_map(seg_info, find_type_params={})

Return the type of the provided segment or None if it cannot be identified.

This method checks to see if the identifier of the segment is present in a mapping provided in find_type_params. find_type_params should be formatted as a dict with sequence identifier names as keys, and the corresponding types as the respective values.

Example

find_type_params = {'MIMAT0000076_MirBase_miR-21_microRNA': 'microRNA',
                    'ENSG00000XXXXXX_NR003287-2_RN28S1_rRNA': 'rRNA'}

This dict can be generated with the associated make_seg_type_id_map() method.

Parameters:find_type_params (dict) – Dict of mapping of sequence identifiers to sequence types.
Returns:Identified sequence type, or None if it cannot be found.
Return type:str
static make_seg_type_id_map(mapped_id_files=None, type_file_pairs=None)

Read file(s) into a mapping of sequence identifiers.

This method reads one or more files into a dict for use with the find_seg_type_from_id_map() method. The method requires passing either a list/tuple of one or more files to mapped_id_files, or a list/tuple of one or more pairs of file lists and file types passed to type_file_pairs. Files listed in the mapped_id_files argument should have the format:

#commentline
#seg_id,seg_type
seg1_unique_id,seg1_type
seg2_unique_id,seg2_type

Entries in the list/tuple passed to type_file_pairs should have the format: (seg1_type, file1_name)

Example

[(seg1_type, file1_name), (seg2_type, file2_name),]

The first entry in each (non-commented, non-blank) file line will be read and added to the mapping dictionary mapped to the provided seg_type.

Parameters:
  • mapped_id_files (list or tuple, optional) – List or tuple of paths to files containing id/type mapping information.
  • type_file_pairs (list or tuple, optional) – List or tuple of (seg_type, file_name) pairs containing id/type mapping information.
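
Both input styles can be sketched together in a standalone helper that builds the identifier-to-type dict used by find_seg_type_from_id_map(). The name build_id_map is hypothetical, the function takes open handles rather than file names so that it can run on in-memory text, and splitting lines on “,” is an assumption about the file layout.

```python
import io


def data_lines(handle):
    """Yield stripped lines that are neither blank nor comments."""
    for line in handle:
        line = line.strip()
        if line and not line.startswith('#'):
            yield line


def build_id_map(mapped_id_handles=(), type_handle_pairs=()):
    """Build a {seg_id: seg_type} dict from either style of input."""
    id_map = {}
    for handle in mapped_id_handles:
        for line in data_lines(handle):
            seg_id, seg_type = [item.strip() for item in line.split(',')[:2]]
            id_map[seg_id] = seg_type
    for seg_type, handle in type_handle_pairs:
        for line in data_lines(handle):
            seg_id = line.split(',')[0].strip()  # first entry on the line
            id_map[seg_id] = seg_type
    return id_map


mapped = io.StringIO('#commentline\n#seg_id,seg_type\nidA,microRNA\nidB,mRNA\n')
typed = io.StringIO('idC\n\nidD\n')
id_map = build_id_map([mapped], [('rRNA', typed)])
print(id_map.get('idA'), id_map.get('idC'), id_map.get('missing'))
```

Looking up a segment then reduces to id_map.get(seg_info['ref']), which returns None when the identifier is not in the mapping.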
static find_type_method(seg_info, find_type_params={}, check_complete=False)

find_type_method is set by default to find_seg_type_hyb()

find_type_methods = {'hyb': <staticmethod object>, 'id_map': <staticmethod object>, 'string_match': <staticmethod object>}

Dict of provided methods available to assign segment types

‘hyb’ find_seg_type_hyb()
‘string_match’ find_seg_type_string_match()
‘id_map’ find_seg_type_from_id_map()
PROPERTIES = {'3p_mirna', '3p_target', '5p_mirna', '5p_target', 'has_fold_record', 'has_mirna', 'has_mirna_details', 'has_mirna_dimer', 'has_mirna_fold', 'has_mirna_not_dimer', 'has_mirna_seg', 'has_seg1_type', 'has_seg2_type', 'has_seg_types', 'has_target', 'has_target_reg', 'id', 'id_contains', 'id_prefix', 'id_suffix', 'seg', 'seg1', 'seg1_contains', 'seg1_prefix', 'seg1_suffix', 'seg1_type', 'seg1_type_contains', 'seg1_type_prefix', 'seg1_type_suffix', 'seg2', 'seg2_contains', 'seg2_prefix', 'seg2_suffix', 'seg2_type', 'seg2_type_contains', 'seg2_type_prefix', 'seg2_type_suffix', 'seg_contains', 'seg_prefix', 'seg_suffix', 'seg_type', 'seg_type_contains', 'seg_type_prefix', 'seg_type_suffix', 'seq', 'seq_contains', 'seq_prefix', 'seq_suffix', 'target_3p_utr', 'target_5p_utr', 'target_coding'}

FoldRecord Class

class hybkit.FoldRecord(id, seq, fold, energy, seg1_fold_info={}, seg2_fold_info={})

Class for storing secondary structure (folding) information for a nucleotide sequence.

This class supports the following file types: (Data courtesy of Gay et al. [see References])

  • The Vienna file format (see References):
    Example:
    34_151138_MIMAT0000076_MirBase_miR-21_microRNA_1_19-...
    TAGCTTATCAGACTGATGTTAGCTTATCAGACTGATG
    .....((((((.((((((......)))))).))))))   (-11.1)
    
  • The Viennad file format utilized in the Hyb Software package:
    Example:
    34_151138_MIMAT0000076_MirBase_miR-21_microRNA_1_19-34-...
    TAGCTTATCAGACTGATGTTAGCTTATCAGACTGATG
    TAGCTTATCAGACTGATGT------------------   miR-21_microRNA 1       19
    -------------------TAGCTTATCAGACTGATG   miR-21_microRNA 1       18
    .....((((((.((((((......)))))).))))))   (-11.1)
    [space-line]
    
  • The Ct file format utilized by the UNAFold Software Package:
    Example:
    41        dG = -8 dH = -93.9      seq1_name-seq2_name
    1 A       0       2       0       1       0       0
    2 G       1       3       0       2       0       0
    ...
    40        G       39      41      11      17      39      41
    41        T       40      0       10      18      40      0
    

The minimum data required for a FoldRecord object is a sequence identifier, a genomic sequence, and its fold representation.
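
To make the three-line Vienna layout concrete, the standalone sketch below splits a record into identifier, sequence, fold string, and energy. It is not FoldRecord.from_vienna_lines(): the helper name parse_vienna_lines, the validation steps, and the record identifier 'example_fold_1' are assumptions made for the example, while the sequence and fold lines are taken from the Vienna example above.

```python
def parse_vienna_lines(lines):
    """Parse 3 Vienna-format lines into a dict of id, seq, fold, and energy."""
    if len(lines) != 3:
        raise ValueError('a vienna record has exactly 3 lines')
    rec_id = lines[0].strip()
    seq = lines[1].strip()
    # The third line holds the fold string, optionally followed by "(energy)".
    fold_parts = lines[2].strip().split(None, 1)
    fold = fold_parts[0]
    energy = None
    if len(fold_parts) == 2:
        energy = float(fold_parts[1].strip().strip('()'))
    if len(fold) != len(seq):
        raise ValueError('fold and sequence lengths differ')
    if set(fold) - set('(.)'):
        raise ValueError('fold contains characters other than "(", ".", ")"')
    return {'id': rec_id, 'seq': seq, 'fold': fold, 'energy': energy}


lines = ['example_fold_1',  # hypothetical identifier
         'TAGCTTATCAGACTGATGTTAGCTTATCAGACTGATG',
         '.....((((((.((((((......)))))).))))))\t(-11.1)']
record = parse_vienna_lines(lines)
print(record['fold'], record['energy'])
```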

Parameters:
  • id (str) – Identifier for record
  • seq (str) – Nucleotide sequence of record.
  • fold (str) – Fold representation of record.
  • energy (str or float, optional) – Energy of folding for record.
  • seg1_fold_info (dict, optional) – Information about first portion (5p) of fold record.
  • seg2_fold_info (dict, optional) – Information about second portion (3p) for fold record.
Variables:
  • id (str) – Sequence Identifier (often seg1name-seg2name)
  • seq (str) – Genomic Sequence
  • fold (str) – Fold Representation, ‘(‘, ‘.’, and ‘)’ characters
  • energy (float or None) – Predicted energy of folding
  • seg1_fold_info (dict) – Information on segment 1, contains keys: ‘ref’ (str), ‘ref_short’ (str), ‘ref_start’ (int), ‘ref_end’ (int), ‘highlight’ (str), and ‘seg_fold’ (str).
  • seg2_fold_info (dict) – Information on segment 2, contains keys: ‘ref’ (str), ‘ref_short’ (str), ‘ref_start’ (int), ‘ref_end’ (int), ‘highlight’ (str), and ‘seg_fold’ (str).
DEFAULTS = {'placeholder': '.', 'skip_bad': False, 'warn_bad': True}

Class-level default settings, copied into settings at runtime.

settings = {'placeholder': '.', 'skip_bad': False, 'warn_bad': True}

Modifiable settings during usage. Copied at runtime from DEFAULTS.

seg1_id()

Return a copy of the id for segment 1 (5p), or None if not defined.

seg2_id()

Return a copy of the id for segment 2 (3p), or None if not defined.

seg_ids()

Return a tuple of the ids of segment 1 (5p) and segment 2 (3p), or a tuple of None.

seg1_detail(detail)

Return a detail for seg1, or if it does not exist return None.

Parameters:detail (str) – Key for information from seg1_fold_info to return.
seg2_detail(detail)

Return a detail for seg2, or if it does not exist return None.

Parameters:detail (str) – Key for information from seg2_fold_info to return.
set_seg1_fold_info(seg_info_obj)

Set fold information for segment 1.

Parameters:seg_info_obj (dict) – Dict object with keys as specified for seg1_fold_info in Variables.
set_seg2_fold_info(seg_info_obj)

Set fold information for segment 2.

Parameters:seg_info_obj (dict) – Dict object with keys as specified for seg2_fold_info in Variables.
to_vienna_lines(newline=False)

Return a list of lines for the record in vienna format.

See (Vienna File Format).

Parameters:newline (bool, optional) – If True, add newline character to the end of each returned line. (Default: False)
to_vienna_string(newline=False)

Return a 3-line string for the record in vienna format.

See (Vienna File Format).

Parameters:newline (bool, optional) – If True, terminate the returned string with a newline character. (Default: False)
to_viennad_lines(newline=False)

Return a list of lines for the record in viennad format.

For an example of the Viennad format, see the documentation for FoldRecord.

Parameters:newline (bool, optional) – If True, add newline character to the end of each returned line. (Default: False)
to_viennad_string(newline=False)

Return a six-line string representation for the record in viennad format.

For an example of the Viennad format, see the documentation for FoldRecord.

Parameters:newline (bool, optional) – If True, add newline character to the end of the string. (Default: False)
check_hyb_record_match(hyb_record)

Check whether this record’s seq attribute matches provided HybRecord.seq.

Return True if the sequence (“.seq”) attribute of a HybRecord instance matches the sequence (“.seq”) attribute of this instance.

classmethod from_vienna_lines(record_lines, hybformat_file=False)

Construct instance from a list of 3 strings of Vienna-format lines.

The Hyb Software Package contains further information in the “name” field of the vienna record that can be used to infer further information about the fold divisions.

Parameters:
  • record_lines (str or tuple) – Iterable of 3 strings corresponding to lines of a vienna-format record.
  • hybformat_file (bool, optional) – If True, extra information stored in the record identifier by Hyb will be parsed.
classmethod from_vienna_string(record_string, hybformat_file=False)

Construct instance from a string representing 3 Vienna-format lines.

The Hyb Software Package contains further information in the “name” field of the vienna record that can be used to infer further information about the fold divisions.

Parameters:
  • record_string (str) – String containing 3 lines corresponding to lines of a vienna-format record.
  • hybformat_file (bool, optional) – If True, extra information stored in the record identifier by Hyb will be parsed.
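
As a rough picture of what constructing a record from a three-line string involves, the sketch below splits such a string and extracts its parts into a plain dict. It is not the hybkit parser: the function name is hypothetical, the assumed layout (">name", sequence, then dot-bracket fold followed by a tab and the energy in parentheses) is an illustration only, and the example values are made up.

```python
def parse_vienna_string_sketch(record_string):
    """Hypothetical parser (not hybkit API) for an assumed 3-line layout."""
    lines = record_string.strip().split('\n')
    if len(lines) != 3:
        raise ValueError('Expected 3 lines, got %i' % len(lines))
    name = lines[0].lstrip('>').strip()
    seq = lines[1].strip()
    # Assumed fold line layout: "<dot-bracket>\t(<energy>)".
    fold, _, energy_text = lines[2].partition('\t')
    energy = float(energy_text.strip('() \t'))
    return {'id': name, 'seq': seq, 'fold': fold, 'energy': energy}


# Illustrative record only.
parsed = parse_vienna_string_sketch('>example_id\nGGGAAACCC\n(((...)))\t(-1.2)\n')
```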
classmethod from_viennad_lines(record_lines, hybformat_file=False, skip_bad=None, warn_bad=None)

Construct instance from a list of 5 or 6 strings of Viennad-format lines.

The Hyb Software Package contains further information in the “name” field of the viennad record that can be used to infer further information about the fold divisions.

Parameters:
  • record_lines (str or tuple) – Iterable of 5 or 6 strings corresponding to lines of a viennad-format record.
  • hybformat_file (bool, optional) – If True, extra information stored in the record identifier by Hyb will be parsed.
  • skip_bad (bool, optional) – If True, return None when parsing badly-formatted entries instead of raising an error. If None, uses setting in settings : skip_bad.
  • warn_bad (bool, optional) – If True, print a warning message when attempting to parse badly-formatted entries. If None, uses setting in settings : warn_bad.
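
The way a None argument falls back to the class settings, and how the two options interact, can be sketched as below. This is a minimal illustration and not the hybkit implementation: SETTINGS, resolve_sketch, and parse_lines_sketch are hypothetical names, and treating "badly-formatted" as "wrong number of lines" is an assumption made to keep the example short.

```python
import warnings

# Stand-in for the class-level settings dict.
SETTINGS = {'skip_bad': False, 'warn_bad': True}


def resolve_sketch(value, key):
    """Use the explicit argument unless it is None, else the stored setting."""
    return SETTINGS[key] if value is None else value


def parse_lines_sketch(record_lines, skip_bad=None, warn_bad=None):
    """Hypothetical parser showing only the skip_bad / warn_bad handling."""
    skip_bad = resolve_sketch(skip_bad, 'skip_bad')
    warn_bad = resolve_sketch(warn_bad, 'warn_bad')
    if len(record_lines) not in (5, 6):
        message = 'Expected 5 or 6 lines, got %i' % len(record_lines)
        if warn_bad:
            warnings.warn(message)
        if skip_bad:
            return None
        raise ValueError(message)
    return [line.rstrip('\n') for line in record_lines]


# With skip_bad=True a bad entry yields None; warn_bad falls back to SETTINGS.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    skipped = parse_lines_sketch(['only one line'], skip_bad=True)
```

Passing an explicit True or False always overrides the stored setting, so a single call can be made stricter or more lenient without changing behavior elsewhere.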
classmethod from_viennad_string(record_string)

Create a FoldRecord entry from a string containing 5 or 6 lines corresponding to lines in the Viennad format.

classmethod from_ct_lines(record_lines, hybformat_file=False)

Create a FoldRecord entry from a list of an arbitrary number of strings corresponding to lines in the “.ct” file format. The Hyb Software Package contains further information in the “name” field of the ct record that can be used to infer further information about the fold divisions. If hybformat_file is provided as True, this extra information will be read.

classmethod from_ct_string(record_string)

Create a FoldRecord entry from a string containing an arbitrary number of lines corresponding to lines in the “.ct” file format.

HybFile Class

class hybkit.HybFile(*args, **kwargs)

File-object wrapper that provides the ability to return file lines as HybRecord entries.

The Hyb Software Package contains further information in the “name” field of the viennad record that can be used to infer further information about the fold divisions. Set this value to True with hybkit.ViennadFile.settings[‘hybformat_file’] = True to read this extra information.

DEFAULTS = {'hybformat_id': False, 'hybformat_ref': False}

Class-level default settings, copied into settings at runtime.

settings = {'hybformat_id': False, 'hybformat_ref': False}

Modifiable settings during usage. Copied at runtime from DEFAULTS.

close()

Close the file.

read_record()

Return next line of hyb file as HybRecord object.

read_records()

Return list of all records in hyb file as HybRecord objects.

write_record(write_record)

Write a HybRecord object to file as a Hyb-format string.

Unlike the file.write() method, this method will add a newline to the end of each written record line.

write_records(write_records)

Write a sequence of HybRecord objects as hyb-format lines to the Hyb file.

Unlike the file.writelines() method, this method will add a newline to the end of each written record line.
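
The newline behavior described for write_record() and write_records() can be contrasted with the plain file methods using an in-memory file. The helper functions below are hypothetical stand-ins that operate on plain strings rather than HybRecord objects; they are not the hybkit methods.

```python
import io


def write_record_sketch(handle, record_string):
    """Hypothetical helper: write one record line and terminate it."""
    handle.write(record_string + '\n')


def write_records_sketch(handle, record_strings):
    """Hypothetical helper: write each record on its own line."""
    for record_string in record_strings:
        write_record_sketch(handle, record_string)


# file.writelines() adds no newlines, so the records run together.
plain = io.StringIO()
plain.writelines(['record_a', 'record_b'])

# The wrapper-style helpers terminate every record with a newline.
wrapped = io.StringIO()
write_records_sketch(wrapped, ['record_a', 'record_b'])
```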

classmethod open(*args, **kwargs)

Return a new HybFile object.

ViennaFile Class

class hybkit.ViennaFile(*args, **kwargs)

Vienna file wrapper that returns “.vienna” file lines as FoldRecord objects.

The Hyb Software Package contains further information in the “name” field of the vienna record that can be used to infer further information about the fold divisions. Set this value to True with hybkit.ViennaFile.settings[‘hybformat_file’] = True to read this extra information.

DEFAULTS = {'hybformat_file': False}
close()

Close the file handle.

classmethod open(*args, **kwargs)

Return a new FoldFile object.

read_record()

Return next FoldRecord based on the appropriate file type.

read_records()

Return list of all FoldRecord objects based on the appropriate file type.

settings = {'hybformat_file': False}
write_record(write_record)

Write a FoldRecord object as the appropriate record/file type.

Unlike the file.write() method, this method will add a newline to the end of each written record line.

write_records(write_records)

Write a sequence of FoldRecord objects as the appropriate record/file type.

Unlike the file.writelines() method, this method will add a newline to the end of each written record line.

ViennadFile Class

class hybkit.ViennadFile(*args, **kwargs)

Viennad file wrapper that returns “.viennad” file lines as FoldRecord objects.

The Hyb Software Package contains further information in the “name” field of the viennad record that can be used to infer further information about the fold divisions. Set this value to True with hybkit.ViennadFile.settings[‘hybformat_file’] = True to read this extra information.

DEFAULTS = {'hybformat_file': False, 'parse_by_blank_line': False}
settings = {'hybformat_file': False, 'parse_by_blank_line': False}
close()

Close the file handle.

classmethod open(*args, **kwargs)

Return a new FoldFile object.

read_record()

Return next FoldRecord based on the appropriate file type.

read_records()

Return list of all FoldRecord objects based on the appropriate file type.

write_record(write_record)

Write a FoldRecord object as the appropriate record/file type.

Unlike the file.write() method, this method will add a newline to the end of each written record line.

write_records(write_records)

Write a sequence of FoldRecord objects as the appropriate record/file type.

Unlike the file.writelines() method, this method will add a newline to the end of each written record line.

CtFile Class

class hybkit.CtFile(*args, **kwargs)

Ct file wrapper that returns “.ct” file lines as FoldRecord objects.

The Hyb Software Package contains further information in the “name” field of the ct record that can be used to infer further information about the fold divisions. Set this value to True with hybkit.CtFile.settings[‘hybformat_file’] = True to read this extra information.

DEFAULTS = {'hybformat_file': False}
close()

Close the file handle.

classmethod open(*args, **kwargs)

Return a new FoldFile object.

read_record()

Return next FoldRecord based on the appropriate file type.

read_records()

Return list of all FoldRecord objects based on the appropriate file type.

settings = {'hybformat_file': False}
write_record(write_record)

Write a FoldRecord object as the appropriate record/file type.

Unlike the file.write() method, this method will add a newline to the end of each written record line.

write_records(write_records)

Write a sequence of FoldRecord objects as the appropriate record/file type.

Unlike the file.writelines() method, this method will add a newline to the end of each written record line.

HybFoldIter Class

class hybkit.HybFoldIter(hybfile_handle, foldfile_handle, combine=False)

Iterator for simultaneous iteration over a HybFile and FoldFile object.

This class provides an iterator to iterate through a HybFile and one of a ViennaFile, ViennadFile, or CtFile simultaneously. It is assumed that the HybRecord and FoldRecord entries in the respective files match (though users are encouraged to verify this themselves). If the “combine” argument is True, the obtained FoldRecord will be set as HybRecord.fold_record of the returned HybRecord object. Otherwise, each iteration will produce a tuple (HybRecord, FoldRecord) containing the obtained information.

Parameters:
  • hybfile_handle (HybFile) – HybFile object for iteration
  • foldfile_handle (FoldFile) – FoldFile object for iteration
  • combine (bool) – Return a combined HybRecord object containing the obtained FoldRecord object as HybRecord.fold_record.
Returns:

Each next() call returns a combined HybRecord object if “combine” is True.
Otherwise returns a tuple of (HybRecord, FoldRecord) if “combine” is False.
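
The two return modes can be illustrated with a small stand-in iterator. This is a sketch of the pairing behavior described above and not the hybkit implementation: PairIterSketch is a hypothetical name, plain dicts take the place of HybRecord and FoldRecord objects, and the example records are made up.

```python
class PairIterSketch:
    """Hypothetical stand-in for paired iteration; not a hybkit class."""

    def __init__(self, hyb_records, fold_records, combine=False):
        # zip() pairs records by position and stops at the shorter input.
        self._pairs = zip(hyb_records, fold_records)
        self.combine = combine

    def __iter__(self):
        return self

    def __next__(self):
        hyb_record, fold_record = next(self._pairs)
        if self.combine:
            # Attach the fold record to the hyb record and return one object.
            hyb_record['fold_record'] = fold_record
            return hyb_record
        return (hyb_record, fold_record)


# Illustrative records only.
hyb_records = [{'id': 'r1', 'seq': 'GGGAAACCC'}, {'id': 'r2', 'seq': 'ACGUACGU'}]
fold_records = [{'seq': 'GGGAAACCC', 'fold': '(((...)))'},
                {'seq': 'ACGUACGU', 'fold': '........'}]

pairs = list(PairIterSketch(hyb_records, fold_records))
# One way to inspect the pairing: compare the sequences of each pair.
all_match = all(hyb['seq'] == fold['seq'] for hyb, fold in pairs)
combined = list(PairIterSketch(hyb_records, fold_records, combine=True))
```

Comparing sequences pair by pair, as in all_match, is one simple way to confirm that the two files are in the same order before relying on the combined records.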