hybkit (module)

Module storing primary hybkit classes and hybkit API.

This module contains classes and methods for reading, writing, and manipulating data in the “.hyb” genomic sequence format ([Travis2014]).

This is primarily based on three classes for storage of chimeric sequence information and associated fold-information:

HybRecord

Class for storage of hybrid sequence records

FoldRecord

Class for storage of predicted RNA secondary structure information for chimeric sequence reads

DynamicFoldRecord

Class for storage of predicted RNA secondary structure information for sequence constructed from aligned portions of chimeric sequence reads

It also includes classes for reading, writing, and iterating over files containing that information:

HybFile

Class for reading and writing “.hyb”-format files [Travis2014] containing chimeric RNA sequence information as HybRecord objects

ViennaFile

Class for reading and writing Vienna (.vienna)-format files [ViennaFormat] containing RNA secondary structure information in dot-bracket format as FoldRecord objects

CtFile

Class for reading Connectivity Table (.ct)-format files [CTFormat], as used by UNAFold, containing predicted RNA secondary-structure information as FoldRecord objects

HybFoldIter

Class for concurrent iteration over a HybFile and one of a ViennaFile or CtFile

HybRecord Class

class hybkit.HybRecord(id, seq, energy=None, seg1_props={}, seg2_props={}, flags={}, read_count=None, fold_record=None)

Class for storing and analyzing chimeric (hybrid) RNA-seq reads in “.hyb” format.

Hyb (“.hyb”) format is a GFF-related file format described by [Travis2014]. Each entry contains information about a genomic sequence read identified as a chimera by analysis software. Each line contains 15 or 16 columns separated by tabs (“\t”) and provides annotations on each component. An example .hyb format line from [Gay2018]:

2407_718\tATCACATTGCCAGGGATTTCCAATCCCCAACAATGTGAAAACGGCTGTC\t.\tMIMAT0000078_MirBase_miR-23a_microRNA\t1\t21\t1\t21\t0.0027\tENSG00000188229_ENST00000340384_TUBB2C_mRNA\t23\t49\t1181\t1207\t1.2e-06

These columns are respectively described in hybkit as:

id, seq, energy, seg1_ref_name, seg1_read_start, seg1_read_end, seg1_ref_start, seg1_ref_end, seg1_score, seg2_ref_name, seg2_read_start, seg2_read_end, seg2_ref_start, seg2_ref_end, seg2_score, flag1=val1;flag2=val2;flag3=val3…
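
To make the column layout concrete, the following sketch splits the example line into the named fields using only the Python standard library. It does not call hybkit: the helper name split_hyb_line and the returned dict layout are assumptions made for illustration, and all values are left as strings.

```python
# Illustrative only: a plain-Python view of the hyb column layout.
# split_hyb_line is a hypothetical helper, not a hybkit function.
HYBRID_COLUMNS = ['id', 'seq', 'energy']
SEGMENT_COLUMNS = ['ref_name', 'read_start', 'read_end',
                   'ref_start', 'ref_end', 'score']


def split_hyb_line(line, sep='\t'):
    """Split one hyb line into hybrid fields, two segment dicts, and flags."""
    fields = line.rstrip('\n').split(sep)
    if len(fields) not in (15, 16):
        raise ValueError('expected 15 or 16 columns, got %d' % len(fields))
    record = dict(zip(HYBRID_COLUMNS, fields[0:3]))
    record['seg1_props'] = dict(zip(SEGMENT_COLUMNS, fields[3:9]))
    record['seg2_props'] = dict(zip(SEGMENT_COLUMNS, fields[9:15]))
    # Optional 16th column: "flag1=val1;flag2=val2;..."
    flags = {}
    if len(fields) == 16:
        for item in fields[15].strip(';').split(';'):
            if item:
                key, _, val = item.partition('=')
                flags[key] = val
    record['flags'] = flags
    return record


example_line = '\t'.join([
    '2407_718', 'ATCACATTGCCAGGGATTTCCAATCCCCAACAATGTGAAAACGGCTGTC', '.',
    'MIMAT0000078_MirBase_miR-23a_microRNA', '1', '21', '1', '21', '0.0027',
    'ENSG00000188229_ENST00000340384_TUBB2C_mRNA', '23', '49', '1181', '1207',
    '1.2e-06',
])
parsed = split_hyb_line(example_line)
print(parsed['id'], parsed['seg2_props']['ref_start'])
```

In practice the hybkit constructors documented in this section are the appropriate tools for parsing; this sketch only shows which column lands in which field.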

The preferred method for reading hyb records from lines is with the HybRecord.from_line() constructor:

# line = "2407_718\tATC..."
hyb_record = hybkit.HybRecord.from_line(line)

This is the constructor that the HybFile class uses when parsing hyb files. For example, to print all hybrid identifiers in a hyb file:

with hybkit.HybFile('path/to/file.hyb', 'r') as hyb_file:
    for hyb_record in hyb_file:
        print(hyb_record.id)

HybRecord objects can also be constructed directly. The minimum data required for a HybRecord object is the genomic sequence and its corresponding identifier.

Examples

hyb_record_1 = hybkit.HybRecord('1_100', 'ACTG')
hyb_record_2 = hybkit.HybRecord('2_107', 'CTAG', '-7.3')

Details about segments are provided via Python dictionaries with keys specific to each segment. Data can be provided either as strings or as floats/integers (where appropriate). For example, to create a HybRecord object representing the example line given above:

seg1_props = {'ref_name': 'MIMAT0000078_MirBase_miR-23a_microRNA',
             'read_start': '1',
             'read_end': '21',
             'ref_start': '1',
             'ref_end': '21',
             'score': '0.0027'}
seg2_props = {'ref_name': 'ENSG00000188229_ENST00000340384_TUBB2C_mRNA',
             'read_start': 23,
             'read_end': 49,
             'ref_start': 1181,
             'ref_end': 1207,
             'score': 1.2e-06}
seq_id = '2407_718'
seq = 'ATCACATTGCCAGGGATTTCCAATCCCCAACAATGTGAAAACGGCTGTC'
energy = None

hyb_record = hybkit.HybRecord(seq_id, seq, energy, seg1_props, seg2_props)
# OR
hyb_record = hybkit.HybRecord(seq_id, seq, seg1_props=seg1_props, seg2_props=seg2_props)
Parameters
  • id (str) – Identifier for the hyb record

  • seq (str) – Nucleotide sequence of the hyb record

  • energy (str, optional) – Predicted energy of sequence folding in kcal/mol

  • seg1_props (dict, optional) – Properties of segment 1 of the record, containing possible segment column keys: (‘ref_name’, ‘read_start’, ‘read_end’, ‘ref_start’, ‘ref_end’, ‘score’)

  • seg2_props (dict, optional) – Properties of segment 2 of the record, containing possible segment column keys: (‘ref_name’, ‘read_start’, ‘read_end’, ‘ref_start’, ‘ref_end’, ‘score’)

  • flags (dict, optional) – Dict with keys of flags for the record and their associated values. By default, flags must be defined in ALL_FLAGS, but custom flags can be supplied in settings['custom_flags']. This check can also be disabled by setting ‘allow_undefined_flags’ to True in HybRecord.settings.

  • fold_record (FoldRecord, optional) – Set the record’s fold_record attribute to the provided FoldRecord object using set_fold_record() on initialization.

Variables
  • id (str) – Identifier for the hyb record (Hyb format: “<read-num>_<read-count>”)

  • seq (str) – Nucleotide sequence of the hyb record

  • energy (str or None) – Predicted energy of folding

  • seg1_props (dict) – Information on chimeric segment 1, contains segment column keys: ‘ref_name’ (str), ‘read_start’ (int), ‘read_end’ (int), ‘ref_start’ (int), ‘ref_end’ (int), and ‘score’ (float).

  • seg2_props (dict) – Information on segment 2, contains segment column keys: ‘ref_name’ (str), ‘read_start’ (int), ‘read_end’ (int), ‘ref_start’ (int), ‘ref_end’ (int), and ‘score’ (float).

  • flags (dict) – Dict of flags with possible flag keys and values as defined in the Flags section of the hybkit Hyb File Specification.

  • mirna_props (dict or None) – Link to appropriate seg1_props or seg2_props dict corresponding to a record’s miRNA (if present), assigned by the eval_mirna() method.

  • target_props (dict or None) – Link to appropriate seg1_props or seg2_props dict corresponding to a record’s target of a miRNA (if present), assigned by the eval_mirna() method.

  • fold_record (FoldRecord) – Information on the predicted secondary structure of the sequence set by set_fold_record().

HYBRID_COLUMNS = ['id', 'seq', 'energy']

Record columns 1-3 defining parameters of the overall hybrid, defined by the Hyb format

SEGMENT_COLUMNS = ['ref_name', 'read_start', 'read_end', 'ref_start', 'ref_end', 'score']

Record columns 4-9 and 10-15, defining annotated parameters of seg1 and seg2, respectively, as defined by the Hyb format

ALL_FLAGS = ['count_total', 'count_last_clustering', 'two_way_merged', 'seq_IDs_in_cluster', 'read_count', 'orient', 'seg1_type', 'seg2_type', 'seg1_det', 'seg2_det', 'miRNA_seg', 'target_reg', 'ext', 'dataset']

Flags defined by the hybkit package. Flags 1-4 are utilized by the Hyb software package. For information on flags, see the Flags portion of the hybkit Hyb File Specification.

settings = {'allow_undefined_flags': False, 'allow_unknown_seg_types': False, 'check_complete_seg_types': False, 'custom_flags': [], 'hyb_placeholder': '.', 'mirna_types': ['miRNA', 'microRNA'], 'reorder_flags': True}

Class-level settings. See settings.HybRecord_settings for descriptions.

TypeFinder

Link to type_finder.TypeFinder class for parsing sequence identifiers in assigning segment types by eval_types().

set_flag(flag_key, flag_val, allow_undefined_flags=None)

Set the value of record flag_key to flag_val.

Parameters
get_seg1_type(require=False)

Return the seg1_type flag if defined, or return None.

Parameters

require (bool, optional) – If True, raise an error if seg1_type is not defined.

get_seg2_type(require=False)

Return the seg2_type flag if defined, or return None.

Parameters

require (bool, optional) – If True, raise an error if seg2_type is not defined.

get_seg_types(require=False)

Return seg1_type, seg2_type flags, or None.

Parameters

require (bool, optional) – If True, raise an error if either flag is not defined.

get_read_count(require=False)

Return the read_count flag if defined, otherwise return None.

Parameters

require (bool, optional) – If True, raise an error if the “read_count” flag is not defined.

get_record_count(require=False)

Return count_total flag if defined, or return 1 (this record).

Parameters

require (bool, optional) – If True, raise an error if the “count_total” flag is not defined.

get_count(count_mode, require=False)

Return either of get_read_count() or get_record_count().

Parameters
  • count_mode (str) –

    Mode for returned count, one of : {‘read’, ‘record’}
    read : Require the ‘read_count’ flag to be defined.
    record : Return ‘1’ if the ‘count_total’ flag is not defined.

  • require (bool, optional) – If True, raise an error if the “count_total” flag is not defined.
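
The counting behavior described for get_read_count(), get_record_count(), and get_count() can be sketched with a plain dict of flags. This is one reading of the descriptions above, not hybkit's implementation: the standalone function, the dict argument, and the exception types are assumptions.

```python
# Hypothetical stand-in for the counting logic described above;
# "flags" is a plain dict rather than a HybRecord.
def get_count(flags, count_mode, require=False):
    """Return a read count or a record count from a flags dict."""
    if count_mode == 'read':
        value = flags.get('read_count')
        if value is None:
            if require:
                raise KeyError('read_count flag is not defined')
            return None
        return int(value)
    if count_mode == 'record':
        value = flags.get('count_total')
        if value is None:
            if require:
                raise KeyError('count_total flag is not defined')
            return 1  # count this record itself
        return int(value)
    raise ValueError("count_mode must be 'read' or 'record'")


print(get_count({'read_count': '718'}, 'read'))
print(get_count({}, 'record'))
```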

eval_types(allow_unknown=None, check_complete=None)

Find the types of each segment using the TypeFinder class.

This method provides seg1_props and seg2_props to the TypeFinder class, linked as attribute HybRecord.TypeFinder. It uses the method TypeFinder.method(), as set by TypeFinder.set_method() or TypeFinder.set_custom_method(), to set the seg1_type and seg2_type flags if they are not already set.

To use a type-finding method other than the default, prepare the TypeFinder class by preparing and setting TypeFinder.params and using TypeFinder.set_method().

Parameters
  • allow_unknown (bool, optional) – If True, allow segment types that cannot be identified and set them as “unknown”. Otherwise raise an error. If None (default), uses setting in settings['allow_unknown_seg_types'].

  • check_complete (bool, optional) – If True, check every possibility for the type of a given segment (where applicable), instead of stopping after finding the first type. If None (default), uses setting in settings['check_complete_seg_types'].

set_fold_record(fold_record)

Check and set provided fold_record (FoldRecord) as fold_record.

Ensure that the fold_record argument is an instance of FoldRecord whose sequence matches that of this HybRecord, then set it as self.fold_record.

Parameters

fold_record (FoldRecord) – FoldRecord instance to set as fold_record.

eval_mirna(mirna_types=None)

Analyze and set miRNA properties from other properties in the hyb record.

If not already done, determine whether a miRNA exists within this record and set the miRNA_seg flag. This evaluation requires the seg1_type and seg2_type flags to be populated, which can be performed by the eval_types() method. If the record contains a miRNA, link the mirna_props and target_props dicts to the corresponding seg1_props / seg2_props dicts as appropriate.

Parameters

mirna_types (list, tuple, or set, optional) – Iterable of strings of “types” to be considered as miRNA. Otherwise, the default types are used from settings['mirna_types'].
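
The assignment described above can be pictured with a small standalone function. This is a sketch under assumptions: assign_mirna is a hypothetical helper, the returned labels reuse the property names listed under MIRNA_PROPS rather than hybkit's internal flag values, and linking the props dicts for a dimer follows the 5p/3p convention stated for mirna_detail().

```python
# Sketch of the miRNA/target assignment described above.
# assign_mirna is a hypothetical helper; hybkit's own flag encodings
# and return values may differ.
DEFAULT_MIRNA_TYPES = ('miRNA', 'microRNA')  # mirrors settings['mirna_types']


def assign_mirna(seg1_props, seg1_type, seg2_props, seg2_type,
                 mirna_types=DEFAULT_MIRNA_TYPES):
    """Return (label, mirna_props, target_props) for a two-segment record."""
    seg1_is_mirna = seg1_type in mirna_types
    seg2_is_mirna = seg2_type in mirna_types
    if seg1_is_mirna and seg2_is_mirna:
        # Dimer: treat the 5p segment as the miRNA, the 3p as the target.
        return 'mirna_dimer', seg1_props, seg2_props
    if seg1_is_mirna:
        return '5p_mirna', seg1_props, seg2_props
    if seg2_is_mirna:
        return '3p_mirna', seg2_props, seg1_props
    return 'no_mirna', None, None


seg1 = {'ref_name': 'MIMAT0000078_MirBase_miR-23a_microRNA'}
seg2 = {'ref_name': 'ENSG00000188229_ENST00000340384_TUBB2C_mRNA'}
label, mirna_props, target_props = assign_mirna(seg1, 'microRNA', seg2, 'mRNA')
print(label, mirna_props['ref_name'])
```

Returning the same dict objects, rather than copies, mirrors the "link" wording above: a later change to seg1_props is visible through mirna_props.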

mirna_detail(detail='all', allow_mirna_dimers=False)

Provide a detail about the miRNA or target following eval_mirna().

Analyze miRNA properties within the sequence record and provide a detail as output. Unless allow_mirna_dimers is True, this method requires the record to contain a non-dimer miRNA; otherwise an error is raised.

Parameters
  • detail (str) –

    Type of detail to return. Options include:
    all : (dict of all properties, default)
    mirna_ref : Identifier for Assigned miRNA
    target_ref : Identifier for Assigned Target
    mirna_seg_type : Assigned seg_type of miRNA
    target_seg_type : Assigned seg_type of target
    mirna_seq : Annotated subsequence of miRNA
    target_seq : Annotated subsequence of target
    mirna_fold : Annotated fold substring of miRNA (requires fold_record set)
    target_fold : Annotated fold substring of target (requires fold_record set)

  • allow_mirna_dimers (bool, optional) – Allow miRNA/miRNA dimers. The 5p-position will be assigned as the “miRNA”, and the 3p-position will be assigned as the “target”.

is_set(prop)

Return True if HybRecord property “prop” is set (if relevant) and is not None.

Options described in SET_PROPS.

Parameters

prop (str) – Property / Analysis to check

has_prop(prop, prop_compare=None)

Return True if HybRecord has property: prop.

Check property against list of allowed properties in HAS_PROPS. If query property has a string comparator, provide this in prop_compare. Raises an error if a prerequisite field is not set (use is_set() to check whether properties are set).

Specific properties available to check are described in attributes:

GEN_PROPS

General Record Properties

STR_PROPS

Field String Comparison Properties

MIRNA_PROPS

miRNA-Associated Record Properties

MIRNA_STR_PROPS

miRNA-Associated String Comparison Properties

TARGET_PROPS

miRNA-Target-Associated Properties

Parameters
  • prop (str) – Property to check

  • prop_compare (str, optional) – Comparator to check.

to_line(newline=False, sep='\t')

Return a hyb format string representation of the record.

Parameters
  • newline (bool, optional) – If True, end the returned string with a newline

  • sep (str, optional) – Separator between fields (Default: “\t”)
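
As a rough picture of what a hyb-format line looks like when built from its parts, the sketch below joins the columns in the order given earlier and substitutes the '.' placeholder (see 'hyb_placeholder' in the class settings) for missing values. The function to_hyb_line, its signature, and the exact flag formatting are assumptions for illustration and are not hybkit's implementation.

```python
# Hypothetical serializer mirroring the hyb column order.
SEGMENT_COLUMNS = ['ref_name', 'read_start', 'read_end',
                   'ref_start', 'ref_end', 'score']


def to_hyb_line(rec_id, seq, energy=None, seg1_props=None, seg2_props=None,
                flags=None, newline=False, sep='\t', placeholder='.'):
    """Join record fields into one delimited line."""
    def fmt(value):
        return placeholder if value is None else str(value)

    fields = [fmt(rec_id), fmt(seq), fmt(energy)]
    for props in (seg1_props or {}, seg2_props or {}):
        fields.extend(fmt(props.get(key)) for key in SEGMENT_COLUMNS)
    if flags:
        # Assumed flag layout: "key=value" pairs joined by ';'
        fields.append(';'.join('%s=%s' % (k, v) for k, v in flags.items()))
    line = sep.join(fields)
    return line + '\n' if newline else line


line = to_hyb_line('1_100', 'ACTG',
                   seg1_props={'ref_name': 'a', 'read_start': 1, 'read_end': 2},
                   flags={'dataset': 'x'})
print(line)
```

Passing sep=',' gives the comma-separated variant in the same way.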

to_csv(newline=False)

Return a comma-separated hyb-format string representation of the record.

Parameters

newline (bool, optional) – If True, end the returned string with a newline.

to_fasta_record(mode='hybrid', annotate=True)

Return nucleotide sequence as BioPython SeqRecord object.

Parameters
  • mode (str, optional) –

    Determines which sequence component to return. Options:
    hybrid: Entire hybrid sequence (default)
    seg1: Sequence 1 (if defined)
    seg2: Sequence 2 (if defined)
    miRNA: miRNA sequence of miRNA/target pair (if defined, else None)
    target: Target sequence of miRNA/target pair (if defined, else None)

  • annotate (bool, optional) – Add name of components to fasta sequence identifier if present.

to_fasta_str(mode='hybrid', annotate=True)

Return nucleotide sequence as a fasta string.

Parameters
  • mode (str, optional) –

    Determines which sequence component to return. Options:
    hybrid: Entire hybrid sequence (default)
    seg1: Sequence 1 (if defined)
    seg2: Sequence 2 (if defined)
    miRNA: miRNA sequence of miRNA/target pair (if defined, else None)
    target: Target sequence of miRNA/target pair (if defined, else None)

  • annotate (bool, optional) – Add name of components to fasta sequence identifier if present.

classmethod from_line(line, hybformat_id=False, hybformat_ref=False)

Construct a HybRecord instance from a single-line hyb-format string.

The Hyb software package ([Travis2014]) records read-count information in the “id” field of the record, which can be read by setting hybformat_id=True. Additionally, the Hyb hOH7 database contains the segment type in the identifier of each reference in the 4th field, which can be read by setting hybformat_ref=True.

Parameters
  • line (str) – Hyb-format string containing record information.

  • hybformat_id (bool, optional) – Read count information from identifier in “<read_id>_<read_count>” format. (Default: False)

  • hybformat_ref (bool, optional) – Read additional record information from identifier in “<gene_id>_<transcript_id>_<gene_name>_<seg_type>” format. (Default: False)

Returns

HybRecord instance containing record information.
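
The two identifier layouts named above can be split with ordinary string methods. The helper names below are hypothetical and the sketch only illustrates the layouts; it is not the parsing code used by from_line().

```python
# Hypothetical helpers showing how the two identifier layouts can be split.
def parse_hybformat_id(record_id):
    """Split '<read_id>_<read_count>' into (read_id, read_count)."""
    read_id, _, read_count = record_id.rpartition('_')
    return read_id, int(read_count)


def parse_hybformat_ref(ref_name):
    """Return the trailing '<seg_type>' field of a reference name."""
    return ref_name.rsplit('_', 1)[-1]


print(parse_hybformat_id('2407_718'))
print(parse_hybformat_ref('MIMAT0000078_MirBase_miR-23a_microRNA'))
```

Splitting from the right keeps the segment type intact even if an earlier part of the reference name contains additional underscores.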

SET_PROPS = ['energy', 'full_seg_props', 'fold_record', 'eval_types', 'eval_mirna', 'eval_target']

Properties for the is_set() method.

  • energy : record.energy is not None

  • full_seg_props : Each seg key is in segN_props dict and is not None

  • fold_record : record.fold_record has been set

  • eval_types : seg1_type and seg2_type flags have been set

  • eval_mirna : miRNA_seg flag has been set

GEN_PROPS = ['has_indels']

General record properties for the has_prop() method.

  • has_indels : either the seg1 or the seg2 alignment has insertions/deletions, shown by differing read/reference lengths for the same alignment
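
The length comparison behind has_indels can be sketched as follows, assuming 1-based inclusive coordinates as in the example record. The helper names are hypothetical, and the segment dicts use the segment column keys documented above.

```python
# Hypothetical check: an alignment has indels if the aligned read span
# and the aligned reference span differ in length.
def seg_has_indels(seg_props):
    read_len = int(seg_props['read_end']) - int(seg_props['read_start']) + 1
    ref_len = int(seg_props['ref_end']) - int(seg_props['ref_start']) + 1
    return read_len != ref_len


def record_has_indels(seg1_props, seg2_props):
    return seg_has_indels(seg1_props) or seg_has_indels(seg2_props)


seg1 = {'read_start': '1', 'read_end': '21', 'ref_start': '1', 'ref_end': '21'}
seg2 = {'read_start': 23, 'read_end': 49, 'ref_start': 1181, 'ref_end': 1207}
print(record_has_indels(seg1, seg2))
```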

STR_PROPS = ['id_is', 'id_prefix', 'id_suffix', 'id_contains', 'seq_is', 'seq_prefix', 'seq_suffix', 'seq_contains', 'seg1_is', 'seg1_prefix', 'seg1_suffix', 'seg1_contains', 'seg2_is', 'seg2_prefix', 'seg2_suffix', 'seg2_contains', 'any_seg_is', 'any_seg_prefix', 'any_seg_suffix', 'any_seg_contains', 'all_seg_is', 'all_seg_prefix', 'all_seg_suffix', 'all_seg_contains', 'seg1_type_is', 'seg1_type_prefix', 'seg1_type_suffix', 'seg1_type_contains', 'seg2_type_is', 'seg2_type_prefix', 'seg2_type_suffix', 'seg2_type_contains', 'any_seg_type_is', 'any_seg_type_prefix', 'any_seg_type_suffix', 'any_seg_type_contains', 'all_seg_type_is', 'all_seg_type_prefix', 'all_seg_type_suffix', 'all_seg_type_contains']

String-comparison properties for the has_prop() method.

  • Field Types:

    • id : record.id

    • seq : record.seq

    • seg1 : record.seg1_props[‘ref_name’]

    • seg2 : record.seg2_props[‘ref_name’]

    • any_seg : record.seg1_props[‘ref_name’] OR record.seg2_props[‘ref_name’]

    • all_seg : record.seg1_props[‘ref_name’] AND record.seg2_props[‘ref_name’]

    • seg1_type : seg1_type flag

    • seg2_type : seg2_type flag

    • any_seg_type : seg1_type OR seg2_type flags

    • all_seg_type : seg1_type AND seg2_type flags

  • Comparisons:

    • is : Comparison string matches field exactly

    • prefix : Comparison string matches beginning of field

    • suffix : Comparison string matches end of field

    • contains : Comparison string is contained within field
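
The property names above combine a field type with a comparison. A minimal sketch of that combination for the segment-name fields is shown below; check_str_prop and the COMPARISONS table are hypothetical names, and the function operates on two plain strings rather than a HybRecord.

```python
# Hypothetical evaluation of "<field>_<comparison>" string properties.
COMPARISONS = {
    'is': lambda field, s: field == s,
    'prefix': lambda field, s: field.startswith(s),
    'suffix': lambda field, s: field.endswith(s),
    'contains': lambda field, s: s in field,
}


def check_str_prop(prop, prop_compare, seg1_name, seg2_name):
    """Evaluate seg1_*, seg2_*, any_seg_*, and all_seg_* properties."""
    field, _, comparison = prop.rpartition('_')
    compare = COMPARISONS[comparison]
    results = [compare(seg1_name, prop_compare),
               compare(seg2_name, prop_compare)]
    if field == 'seg1':
        return results[0]
    if field == 'seg2':
        return results[1]
    if field == 'any_seg':
        return any(results)
    if field == 'all_seg':
        return all(results)
    raise ValueError('unsupported field: %r' % field)


SEG1 = 'MIMAT0000078_MirBase_miR-23a_microRNA'
SEG2 = 'ENSG00000188229_ENST00000340384_TUBB2C_mRNA'
print(check_str_prop('any_seg_suffix', 'mRNA', SEG1, SEG2))
```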

MIRNA_PROPS = ['has_mirna', 'no_mirna', 'mirna_dimer', 'mirna_not_dimer', '5p_mirna', '3p_mirna']

miRNA-evaluation-related properties for the has_prop() method. Requires miRNA_seg field to be set by eval_mirna() method.

  • has_mirna : Seg1 or seg2 has been identified as a miRNA

  • no_mirna : Seg1 and seg2 have been identified as not a miRNA

  • mirna_dimer : Both seg1 and seg2 have been identified as a miRNA

  • mirna_not_dimer : Only one of seg1 or seg2 has been identified as a miRNA

  • 5p_mirna : Seg1 (5p) has been identified as a miRNA

  • 3p_mirna : Seg2 (3p) has been identified as a miRNA

MIRNA_STR_PROPS = ['mirna_is', 'mirna_prefix', 'mirna_suffix', 'mirna_contains', 'target_is', 'target_prefix', 'target_suffix', 'target_contains', 'mirna_seg_type_is', 'mirna_seg_type_prefix', 'mirna_seg_type_suffix', 'mirna_seg_type_contains', 'target_seg_type_is', 'target_seg_type_prefix', 'target_seg_type_suffix', 'target_seg_type_contains']

miRNA-evaluation & string-comparison properties for the has_prop() method. Requires miRNA_seg field to be set by eval_mirna() method.

  • Field Types:

    • mirna : segN_props[‘ref_name’] for identified miRNA segN_props

    • target : segN_props[‘ref_name’] for identified target segN_props

    • mirna_seg_type : segN_type for identified miRNA segN for miRNA/target hybrid

    • target_seg_type : segN_type for identified target segN for miRNA/target hybrid

  • Comparisons:

    • is : Comparison string matches field exactly

    • prefix : Comparison string matches beginning of field

    • suffix : Comparison string matches end of field

    • contains : Comparison string is contained within field

TARGET_PROPS = ['target_none', 'target_unknown', 'target_ncrna', 'target_5p_utr', 'target_3p_utr', 'target_coding']

Target-evaluation-related properties for the has_prop() method. Requires target_reg field to be set.

  • target_none : Identified to have no miRNA target

  • target_unknown : Unknown whether there is a miRNA target

  • target_ncrna : miRNA target is identified as in a noncoding transcript

  • target_5p_utr : miRNA target is identified as in the 5p UnTranslated Region of a coding transcript

  • target_3p_utr : miRNA target is identified as in the 3p UnTranslated Region of a coding transcript

  • target_coding : miRNA target is identified as in coding region of a coding transcript

HAS_PROPS = ['has_indels', 'id_is', 'id_prefix', 'id_suffix', 'id_contains', 'seq_is', 'seq_prefix', 'seq_suffix', 'seq_contains', 'seg1_is', 'seg1_prefix', 'seg1_suffix', 'seg1_contains', 'seg2_is', 'seg2_prefix', 'seg2_suffix', 'seg2_contains', 'any_seg_is', 'any_seg_prefix', 'any_seg_suffix', 'any_seg_contains', 'all_seg_is', 'all_seg_prefix', 'all_seg_suffix', 'all_seg_contains', 'seg1_type_is', 'seg1_type_prefix', 'seg1_type_suffix', 'seg1_type_contains', 'seg2_type_is', 'seg2_type_prefix', 'seg2_type_suffix', 'seg2_type_contains', 'any_seg_type_is', 'any_seg_type_prefix', 'any_seg_type_suffix', 'any_seg_type_contains', 'all_seg_type_is', 'all_seg_type_prefix', 'all_seg_type_suffix', 'all_seg_type_contains', 'has_mirna', 'no_mirna', 'mirna_dimer', 'mirna_not_dimer', '5p_mirna', '3p_mirna', 'mirna_is', 'mirna_prefix', 'mirna_suffix', 'mirna_contains', 'target_is', 'target_prefix', 'target_suffix', 'target_contains', 'mirna_seg_type_is', 'mirna_seg_type_prefix', 'mirna_seg_type_suffix', 'mirna_seg_type_contains', 'target_seg_type_is', 'target_seg_type_prefix', 'target_seg_type_suffix', 'target_seg_type_contains', 'target_none', 'target_unknown', 'target_ncrna', 'target_5p_utr', 'target_3p_utr', 'target_coding']

All allowed properties for the has_prop() method. See GEN_PROPS, STR_PROPS, MIRNA_PROPS, MIRNA_STR_PROPS, and TARGET_PROPS for details.

FoldRecord Class

class hybkit.FoldRecord(id, seq, fold, energy)

Class for storing secondary structure (folding) information for a nucleotide sequence.

This class supports the following file types: (Data courtesy of [Gay2018])

  • The “.vienna” file format used by the ViennaRNA package ([ViennaFormat]; [Lorenz2011]):
    Example:
    34_151138_MIMAT0000076_MirBase_miR-21_microRNA_1_19-...
    TAGCTTATCAGACTGATGTTAGCTTATCAGACTGATG
    .....((((((.((((((......)))))).))))))   (-11.1)
    
  • The “.ct” file format used by UNAFold and other packages ([CTFormat], [Zuker2003]):
    Example:
    41        dG = -8 dH = -93.9      seq1_name-seq2_name
    1 A       0       2       0       1       0       0
    2 G       1       3       0       2       0       0
    ...
    ...
    ...
    40        G       39      41      11      17      39      41
    41        T       40      0       10      18      40      0
    

The minimum data required for a FoldRecord object is a sequence identifier, a genomic sequence, and its fold representation.
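
The three-line vienna layout shown above (identifier, sequence, then dot-bracket fold followed by an energy in parentheses) can be taken apart as sketched below. The function parse_vienna_record, the regular expression, and the placeholder identifier are assumptions for illustration; the from_vienna_lines() constructor documented below is the intended route in hybkit.

```python
import re

# Matches a trailing energy such as "(-11.1)" or "( -8.50)". Fold brackets
# alone cannot match because a number is required inside the parentheses.
ENERGY_RE = re.compile(r'\(\s*([-+]?\d+(?:\.\d+)?)\s*\)\s*$')


def parse_vienna_record(lines):
    """Parse 3 vienna lines into a dict with id, seq, fold, and energy."""
    rec_id, seq, fold_line = (line.strip() for line in lines)
    match = ENERGY_RE.search(fold_line)
    if match:
        energy = float(match.group(1))
        fold = fold_line[:match.start()].strip()
    else:
        energy = None
        fold = fold_line
    if len(fold) != len(seq):
        raise ValueError('fold and sequence lengths differ')
    return {'id': rec_id, 'seq': seq, 'fold': fold, 'energy': energy}


record = parse_vienna_record([
    'example_record',  # placeholder identifier
    'TAGCTTATCAGACTGATGTTAGCTTATCAGACTGATG',
    '.....((((((.((((((......)))))).))))))   (-11.1)',
])
print(record['fold'], record['energy'])
```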

Parameters
  • id (str) – Identifier for record

  • seq (str) – Nucleotide sequence of record.

  • fold (str) – Fold representation of record.

  • energy (str or float, optional) – Energy of folding for record.

Variables
  • id (str) – Sequence Identifier (often seg1name-seg2name)

  • seq (str) – Genomic Sequence

  • fold (str) – Dot-bracket Fold Representation, ‘(’, ‘.’, and ‘)’ characters

  • energy (float or None) – Predicted energy of folding

settings = {'allowed_mismatches': 0, 'fold_placeholder': '.'}

Class-level settings. See settings.FoldRecord_settings for descriptions.

to_vienna_lines(newline=False)

Return a list of lines for the record in vienna format.

See (Vienna File Format).

Parameters

newline (bool, optional) – If True, add newline character to the end of each returned line. (Default: False)

to_vienna_string(newline=False)

Return a 3-line string for the record in vienna format.

See (Vienna File Format).

Parameters

newline (bool, optional) – If True, terminate the returned string with a newline character. (Default: False)

count_hyb_record_mismatches(hyb_record)

Count mismatches between hyb_record.seq and fold_record.seq.

Parameters

hyb_record (HybRecord) – hyb_record for comparison.

matches_hyb_record(hyb_record)

Return True if self.seq == hyb_record.seq.

Parameters

hyb_record (HybRecord) – hyb_record to compare.

ensure_matches_hyb_record(hyb_record)

Ensure self.seq == hyb_record.seq.

Parameters

hyb_record (HybRecord) – hyb_record to compare.
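
The three comparison methods above can be pictured with plain strings. In this sketch the function names are hypothetical, and two behaviors are assumptions rather than documented facts: sequences of different lengths never match, and the allowed_mismatches value from the class settings acts as the tolerance.

```python
# Hypothetical string-level versions of the comparison methods above.
def count_mismatches(hyb_seq, fold_seq):
    """Count positions at which two equal-length sequences differ."""
    if len(hyb_seq) != len(fold_seq):
        raise ValueError('sequences differ in length')
    return sum(1 for a, b in zip(hyb_seq, fold_seq) if a != b)


def seqs_match(hyb_seq, fold_seq, allowed_mismatches=0):
    """Return True if the sequences agree within the allowed tolerance."""
    if len(hyb_seq) != len(fold_seq):
        return False
    return count_mismatches(hyb_seq, fold_seq) <= allowed_mismatches


def ensure_match(hyb_seq, fold_seq, allowed_mismatches=0):
    """Raise ValueError unless the sequences match."""
    if not seqs_match(hyb_seq, fold_seq, allowed_mismatches):
        raise ValueError('fold sequence does not match hyb sequence')


print(count_mismatches('ACTG', 'ACTA'))
```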

classmethod from_vienna_lines(record_lines, error_mode='raise')

Construct instance from a list of 3 strings of vienna-format ([ViennaFormat]) lines.

Parameters
  • record_lines (list or tuple) – Iterable of 3 strings corresponding to lines of a vienna-format record.

  • error_mode (str, optional) –

    Error mode. Options:
    raise : Raise an error when encountered and exit program
    warn_return : Print a warning and return the error_value
    return : Return the error value with no program output.

classmethod from_vienna_string(record_string, error_mode='raise')

Construct instance from a string representing 3 vienna-format ([ViennaFormat]) lines.

Parameters
  • record_string (str or tuple) – 3-line string containing a vienna-format record

  • error_mode (str, optional) –

    Error mode. Options:
    raise : Raise an error when encountered and exit program
    warn_return : Print a warning and return the error_value
    return : Return the error value with no program output.

classmethod from_ct_lines(record_lines, error_mode='raise')

Create a FoldRecord from a list of record lines in “.ct” format ([CTFormat]).

Parameters
  • record_lines (list or tuple) – list containing lines of ct record

  • error_mode (str, optional) –

    Error mode. Options:
    raise : Raise an error when encountered and exit program
    warn_return : Print a warning and return the error_value
    return : Return the error value with no program output.

classmethod from_ct_string(record_string, error_mode='raise')

Create a FoldRecord entry from a multi-line string from “.ct” format ([CTFormat]).

Parameters
  • record_string (str) – String containing lines of ct record

  • error_mode (str, optional) –

    Error mode. Options:
    raise : Raise an error when encountered and exit program
    warn_return : Print a warning and return the error_value
    return : Return the error value with no program output

DynamicFoldRecord Class

class hybkit.DynamicFoldRecord(id, seq, fold, energy)

Class for storing secondary structure (folding) information for a nucleotide sequence.

Instead of expecting the nucleotide sequence to match a potential HybRecord.seq attribute exactly, this type of fold record is expected to be reconstructed from aligned regions of a chimeric read. For chimeras with overlapping alignments, the sequence will be longer. For chimeras with gapped alignments, the sequence will be shorter.

For an example read with overlapping aligned portions:

Original:
seg1: 11111111111111111111
seg2:                   2222222222222222222
seq:  TAGCTTATCAGACTGATGTTAGCTTATCAGACTGATG

Dynamic:
seg1: 11111111111111111111
seg2:                     2222222222222222222
seq:  TAGCTTATCAGACTGATGTTTTAGCTTATCAGACTGATG

For an example read with gapped aligned portions:

Original:
seg1:  1111111111111111
seg2:                    222222222222222222
seq:  TAGCTTATCAGACTGATGTTAGCTTATCAGACTGATG

Dynamic:
seg1: 1111111111111111
seg2:                 222222222222222222
seq:  AGCTTATCAGACTGATTAGCTTATCAGACTGATG

This type of sequence is found in the Hyb program *_hybrids_ua.hyb file type. This is primarily relevant in error-checking when setting a HybRecord.fold_record attribute.
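
The "dynamic" sequences in the two diagrams above can be reproduced by concatenating the aligned read portion of each segment. The sketch below assumes 1-based inclusive read coordinates; the function name dynamic_seq is hypothetical, and the coordinates (1-20 and 19-37 for the overlapping case, 2-17 and 20-37 for the gapped case) are read off the diagrams rather than taken from real records.

```python
# Hypothetical reconstruction of a "dynamic" sequence from read coordinates.
def dynamic_seq(seq, seg1_props, seg2_props):
    """Concatenate the aligned read portions of seg1 and seg2."""
    parts = []
    for props in (seg1_props, seg2_props):
        start = int(props['read_start'])  # 1-based, inclusive
        end = int(props['read_end'])      # 1-based, inclusive
        parts.append(seq[start - 1:end])
    return ''.join(parts)


SEQ = 'TAGCTTATCAGACTGATGTTAGCTTATCAGACTGATG'

# Overlapping alignments: the shared bases appear twice, so the result is longer.
overlap = dynamic_seq(SEQ, {'read_start': 1, 'read_end': 20},
                      {'read_start': 19, 'read_end': 37})

# Gapped alignments: unaligned bases are dropped, so the result is shorter.
gapped = dynamic_seq(SEQ, {'read_start': 2, 'read_end': 17},
                     {'read_start': 20, 'read_end': 37})
print(len(SEQ), len(overlap), len(gapped))
```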

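The primary differences between this class and the base FoldRecord class are modified versions of the sequence-comparison methods documented below: count_hyb_record_mismatches(), matches_hyb_record(), and ensure_matches_hyb_record().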
Parameters
  • id (str) – Identifier for record

  • seq (str) – Nucleotide sequence of record.

  • fold (str) – Fold representation of record.

  • energy (str or float, optional) – Energy of folding for record.

Variables
  • id (str) – Sequence Identifier (often seg1name-seg2name)

  • seq (str) – Genomic Sequence

  • fold (str) – Dot-bracket Fold Representation, ‘(’, ‘.’, and ‘)’ characters

  • energy (float or None) – Predicted energy of folding

count_hyb_record_mismatches(hyb_record)

Count mismatches between dynamic hyb_record.seq and fold_record.seq.

Parameters

hyb_record (HybRecord) – hyb_record for comparison

matches_hyb_record(hyb_record)

Calculate dynamic sequence from hyb record and compare to DynamicFoldRecord seq.

Parameters

hyb_record (HybRecord) – hyb_record for comparison

ensure_matches_hyb_record(hyb_record)

Ensure the dynamic fold record sequence matches hyb_record.seq.

Parameters

hyb_record (HybRecord) – hyb_record for comparison

classmethod from_ct_lines(record_lines, error_mode='raise')

Create a FoldRecord from a list of record lines in “.ct” format ([CTFormat]).

Parameters
  • record_lines (list or tuple) – list containing lines of ct record

  • error_mode (str, optional) –

    Error mode. Options:
    raise : Raise an error when encountered and exit program
    warn_return : Print a warning and return the error_value
    return : Return the error value with no program output.

classmethod from_ct_string(record_string, error_mode='raise')

Create a FoldRecord entry from a multi-line string from “.ct” format ([CTFormat]).

Parameters
  • record_string (str) – String containing lines of ct record

  • error_mode (str, optional) –

    Error mode. Options:
    raise : Raise an error when encountered and exit program
    warn_return : Print a warning and return the error_value
    return : Return the error value with no program output

classmethod from_vienna_lines(record_lines, error_mode='raise')

Construct instance from a list of 3 strings of vienna-format ([ViennaFormat]) lines.

Parameters
  • record_lines (list or tuple) – Iterable of 3 strings corresponding to lines of a vienna-format record.

  • error_mode (str, optional) –

    Error mode. Options:
    raise : Raise an error when encountered and exit program
    warn_return : Print a warning and return the error_value
    return : Return the error value with no program output.

classmethod from_vienna_string(record_string, error_mode='raise')

Construct instance from a string representing 3 vienna-format ([ViennaFormat]) lines.

Parameters
  • record_string (str or tuple) – 3-line string containing a vienna-format record

  • error_mode (str, optional) –

    Error mode. Options:
    raise : Raise an error when encountered and exit program
    warn_return : Print a warning and return the error_value
    return : Return the error value with no program output.

settings = {'allowed_mismatches': 0, 'fold_placeholder': '.'}

Class-level settings. See settings.FoldRecord_settings for descriptions.

to_vienna_lines(newline=False)

Return a list of lines for the record in vienna format.

See (Vienna File Format).

Parameters

newline (bool, optional) – If True, add newline character to the end of each returned line. (Default: False)

to_vienna_string(newline=False)

Return a 3-line string for the record in vienna format.

See (Vienna File Format).

Parameters

newline (bool, optional) – If True, terminate the returned string with a newline character. (Default: False)

HybFile Class

class hybkit.HybFile(*args, **kwargs)

File-object wrapper that provides the ability to return file lines as HybRecord entries.

settings = {'hybformat_id': False, 'hybformat_ref': False}

Class-level settings. See settings.HybFile_settings for descriptions.

close()

Close the file.

read_record()

Return next line of hyb file as HybRecord object.

read_records()

Return list of all records in hyb file as HybRecord objects.

write_record(write_record)

Write a HybRecord object to file as a Hyb-format string.

Unlike the file.write() method, this method will add a newline to the end of each written record line.

write_records(write_records)

Write a sequence of HybRecord objects as hyb-format lines to the Hyb file.

Unlike the file.writelines() method, this method will add a newline to the end of each written record line.

classmethod open(*args, **kwargs)

Return a new HybFile object.

ViennaFile Class

class hybkit.ViennaFile(*args, **kwargs)

Vienna file wrapper that returns “.vienna” file lines as FoldRecord objects.

read_record(error_mode=None)

Read next three lines and return output as FoldRecord object.

Parameters

error_mode (str, optional) – String representing the error mode. If None, defaults to settings['error_mode']. Options: “raise”: Raise an error when encountered and exit program; “warn_return”: Print a warning and return the error_value; “return”: Return the error value with no program output.

close()

Close the file handle.

classmethod open(*args, **kwargs)

Return a new FoldFile object.

read_records()

Return list of all FoldRecord objects based on the appropriate file type.

settings = {'foldfile_error_mode': 'raise', 'foldrecord_type': 'strict'}

Class-level settings. See settings.FoldFile_settings for descriptions.

write_direct(write_string)

Write a string directly to the underlying file handle.

write_record(write_record)

Write a FoldRecord object as the appropriate record/file type.

Unlike the file.write() method, this method will add a newline to the end of each written record line.

write_records(write_records)

Write a sequence of FoldRecord objects as the appropriate record/file type.

Unlike the file.writelines() method, this method will add a newline to the end of each written record line.

CtFile Class

class hybkit.CtFile(*args, **kwargs)

Ct file wrapper that returns “.ct” file lines as FoldRecord objects.

read_record(error_mode=None)

Return the next ct record as a FoldRecord object.

Call next(self.fh) to return the first line of the next entry. Determine the expected number of following lines in the entry, and read that number of lines further. Return lines as a FoldRecord object.

Parameters

error_mode (str, optional) – String representing the error mode. If None, defaults to settings['error_mode']. Options: “raise”: Raise an error when encountered and exit program; “warn_return”: Print a warning and return the error_value; “return”: Return the error value with no program output.
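
The reading procedure described for read_record() (take the header line, determine how many lines follow, then read that many) can be sketched with an in-memory file. The helper names and the toy records are hypothetical. The dot-bracket conversion relies on the usual ct convention that the fifth column holds the index of the paired base, or 0 if unpaired, and ignores pseudoknots.

```python
import io


def read_ct_record_lines(handle):
    """Return the header line plus the base lines of the next ct record."""
    header = next(handle)  # StopIteration here signals end of file
    expected = int(header.split()[0])  # first header field: number of bases
    lines = [header.rstrip('\n')]
    for _ in range(expected):
        try:
            lines.append(next(handle).rstrip('\n'))
        except StopIteration:
            raise ValueError('ct record ended before %d base lines' % expected)
    return lines


def ct_lines_to_fold(lines):
    """Convert ct base lines to a dot-bracket string."""
    fold = []
    for line in lines[1:]:
        cols = line.split()
        index, pair = int(cols[0]), int(cols[4])
        if pair == 0:
            fold.append('.')
        else:
            fold.append('(' if pair > index else ')')
    return ''.join(fold)


# Two toy records (made up for illustration) in one in-memory file.
CT_TEXT = (
    '5\tdG = -1.0\ttoy_record_1\n'
    '1\tG\t0\t2\t5\t1\n'
    '2\tA\t1\t3\t0\t2\n'
    '3\tA\t2\t4\t0\t3\n'
    '4\tA\t3\t5\t0\t4\n'
    '5\tC\t4\t0\t1\t5\n'
    '2\tdG = 0\ttoy_record_2\n'
    '1\tA\t0\t2\t0\t1\n'
    '2\tT\t1\t0\t0\t2\n'
)
handle = io.StringIO(CT_TEXT)
first = read_ct_record_lines(handle)
second = read_ct_record_lines(handle)
print(ct_lines_to_fold(first), ct_lines_to_fold(second))
```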

close()

Close the file handle.

classmethod open(*args, **kwargs)

Return a new FoldFile object.

read_records()

Return list of all FoldRecord objects based on the appropriate file type.

settings = {'foldfile_error_mode': 'raise', 'foldrecord_type': 'strict'}

Class-level settings. See settings.FoldFile_settings for descriptions.

write_direct(write_string)

Write a string directly to the underlying file handle.

write_record(write_record)

Write a FoldRecord object as the appropriate record/file type.

Unlike the file.write() method, this method will add a newline to the end of each written record line.

write_records(write_records)

Write a sequence of FoldRecord objects as the appropriate record/file type.

Unlike the file.writelines() method, this method will add a newline to the end of each written record line.

HybFoldIter Class

class hybkit.HybFoldIter(hybfile_handle, foldfile_handle, combine=False)

Iterator for simultaneous iteration over a HybFile and FoldFile object.

This class provides an iterator that steps through a HybFile and either a ViennaFile or a CtFile simultaneously, returning a HybRecord and a FoldRecord at each step.

Basic error checking / catching is performed based on the value of the settings['error_mode'] setting.

The obtained FoldRecord will be set as HybRecord.fold_record of the returned HybRecord object.

Parameters
  • hybfile_handle (HybFile) – HybFile object for iteration

  • foldfile_handle (FoldFile) – FoldFile object for iteration

  • combine (bool, optional) – Use HybRecord.set_fold_record(FoldRecord) and return only the HybRecord.

Returns

(HybRecord, FoldRecord)
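
The pairing behavior can be sketched with a generator over two iterables of plain dicts. This is an illustration under assumptions: iter_hyb_fold is a hypothetical function, records are dicts rather than HybRecord/FoldRecord objects, and a sequence mismatch simply raises an error instead of following the configurable error handling of the real class.

```python
# Hypothetical lockstep iteration over hyb and fold records.
def iter_hyb_fold(hyb_records, fold_records, combine=False):
    """Yield paired records, attaching each fold record to its hyb record."""
    for hyb_record, fold_record in zip(hyb_records, fold_records):
        if hyb_record['seq'] != fold_record['seq']:
            raise ValueError('sequence mismatch for %s' % hyb_record['id'])
        hyb_record['fold_record'] = fold_record
        if combine:
            yield hyb_record
        else:
            yield hyb_record, fold_record


hyb_list = [{'id': '1_100', 'seq': 'ACTG'}]
fold_list = [{'seq': 'ACTG', 'fold': '(..)'}]
for hyb, fold in iter_hyb_fold(hyb_list, fold_list):
    print(hyb['id'], fold['fold'])
```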

settings = {'error_checks': ['hybrecord_indel', 'foldrecord_nofold', 'max_mismatch'], 'error_mode': 'warn_skip', 'max_sequential_skips': 20}

Class-level settings. See settings.HybFoldIter_settings for descriptions.

report()

Create a report of information from iteration.

print_report()

Print a report of information from iteration.